I am a second-year Ph.D. candidate at Harbin Institute of Technology (Shenzhen), advised by Zhenyu He. I am currently conducting research under the supervision of Xin Li and Ming-Hsuan Yang.

🔥 News

  • 2024.08:  🎉🎉 Winner of the LSVOS V6 Challenge (VOS Track)
  • 2024.07:  🎉🎉 Winner of the VOTS 2024 Challenge (VOTS Track)
  • 2024.07:  🎉🎉 One paper is accepted to ECCV 2024
  • 2024.06:  🎉🎉 Winner of the 3rd PVUW workshop (MOSE Track)
  • 2024.03:  🎉🎉 One paper is accepted to TMM
  • 2023.08:  🎉🎉 2nd place in the 5th LSVOS Challenge

📝 Publications

arXiv

Learning Spatial-Semantic Features for Robust Video Object Segmentation [pdf][code]

Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

  • We propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries. Specifically, we construct a spatial-semantic network comprising a semantic embedding block and a spatial dependency modeling block to associate the pretrained ViT features with global semantic features and local spatial features, providing a comprehensive target representation. In addition, we develop a masked cross-attention module to generate object queries that focus on the most discriminative parts of target objects during query propagation, alleviating noise accumulation and ensuring effective long-term query propagation.
ECCV 2024

Spatial-Temporal Multi-level Association for Video Object Segmentation [pdf][code]

Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

European Conference on Computer Vision (ECCV) 2024

  • We propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation. We evaluate the proposed method through extensive experiments on several video object segmentation datasets, including DAVIS 2016/2017 val, DAVIS 2017 test-dev, and YouTube-VOS 2018/2019 val. The favorable performance against state-of-the-art methods demonstrates the effectiveness of our approach.
TMM

Context-Guided Black-Box Attack for Visual Tracking [pdf]

Xingsen Huang, Deshui Miao, Hongpeng Wang, Yaowei Wang, Xin Li

IEEE Transactions on Multimedia (TMM)

  • We propose a context-guided black-box attack method to investigate the robustness of recent advanced deep trackers against spatial and temporal interference.

🎖 Services

Reviewer for NeurIPS 2024.

🎖 Honors and Awards

Winner of the LSVOS V6 workshop (VOS Track)

Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS [pdf] [code]

Winner of the VOTS 2024 workshop

Learning Spatial-Semantic Features for Robust Video Object Segmentation [pdf] [code]

Winner of the CVPR 2024 PVUW workshop

1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [pdf] [code]

2nd place in the ICCV 2023 LSVOS workshop

2nd Place Solution for the LSVOS Challenge 2023: Video Object Segmentation [pdf]

💻 Internships

  • 2021.03 - 2022.06, SenseTime, Beijing, China.
  • 2021.06 - 2022.05, Alibaba AI Research (GaoDe map), Beijing, China.
  • 2022.09 - Now, PengCheng Laboratory (supervised by Xin Li), Shenzhen, China.