Xubing Ye (叶栩冰)

I am a second-year master's student at Shenzhen International Graduate School, Tsinghua University (M.Eng. @ THU, expected 2026), supervised by Prof. Yansong Tang. I received my bachelor's degree from the School of Software Engineering at Tongji University in 2023. Previously, I was a research intern at Tencent ARC Lab, where I worked with Dr. Yukang Gan, Dr. Yixiao Ge, and Dr. Ying Shan, and I collaborated with Dr. Zhao Yang at Shanghai AI Lab.

My current research interests lie in Multi-Modal Large Language Models (MLLMs), efficient MLLMs, and video understanding.

Email: yxb_tongji@163.com  /  Github  /  Scholar  /  X

News

  • 2025-02: A paper on vision token pruning for MLLMs accepted to CVPR 2025.
  • 2025-02: A paper on vision compression with LLMs accepted to CVPR 2025.
  • 2024-12: Started an internship at Bytedance Douyin.
  • 2024-09: A paper on referring image & video segmentation accepted to TPAMI 2024.
  • 2024-02: Started an internship at Tencent ARC Lab.

Recent Publications

    * Indicates Equal Contribution

    VoCo-LLaMA: Towards Vision Compression with Large Language Models
    Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Yansong Tang
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
    [arXiv] [PDF] [Project Page] [Code]

    We propose VoCo-LLaMA, the first approach to compress vision information utilizing the LLMs' understanding paradigm, which can compress hundreds of vision tokens into a single VoCo token with minimal visual information loss.

    ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
    Xubing Ye, Yukang Gan, Yixiao Ge, Xiao-ping Zhang, Yansong Tang
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
    [arXiv] [PDF] [Project Page]

    We propose ATP-LLaVA, a framework that adaptively determines pruning ratios instance-wise and LLM layer-wise for effective vision token pruning on large vision language models.

    Language-Aware Vision Transformer for Referring Segmentation
    Zhao Yang*, Jiaqi Wang*, Xubing Ye*, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H.S. Torr
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, IF=20.8), 2024
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [IEEE] [PDF] [Code] [Conference Version]

    We propose LAVT, a Transformer-based universal framework for referring image and video segmentation (RIS and RVOS) that performs language-aware visual encoding in place of cross-modal fusion after feature extraction.

    Selected Honors and Awards

  • Zhaoyi Scholarship, Comprehensive Outstanding Scholarship of Tsinghua University (university-level first prize), 2024.
  • Comprehensive Outstanding Scholarship of Tongji University (university-level first prize), 2023.
  • Comprehensive Outstanding Scholarship of Tongji University (university-level second prize), 2021, 2022.

Industrial Experience

    Bytedance Douyin, Beijing, China. December 2024 - Present.
  • Project: AI Search with MLLMs.
  • Working with Dr. Baihan Shu and Dr. Huaishan Zhou.

    Tencent ARC Lab (PCG), Shenzhen, China. February 2024 - December 2024.
  • Project: Token Pruning & Compression for MLLMs; Image & Video QA.
  • Worked with Dr. Yukang Gan, Dr. Yixiao Ge, and Dr. Ying Shan.

Academic Services

  • Conference Reviewer: CVPR 2025 (Efficient LVM Workshop), JVCIR 2024
© Xubing Ye | Last updated: Mar. 17, 2024 | Website Template