Hi, I am Jianzong Wu (吴健宗), a PhD student at the School of Intelligence Science and Technology, Peking University (PKU), advised by Prof. Yunhai Tong. Previously, I obtained my bachelor’s degree from the University of Science and Technology of China (USTC). I work closely with Dr. Xiangtai Li from NTU, Dr. Jingbo Wang from CUHK (MMLAB), and Yanhong Zeng from SAI.
My research interests focus on leveraging AIGC technologies to create practical application tools that can improve people’s daily lives and drive innovations in academia. My primary research areas include multimodal learning and controllable generation of images, videos, and artistic creations.
I am looking for motivated collaborators and support from industry partners. Please contact me through email.
🔥 News
- 2025.03: 🎉🎉 Congratulations! Three papers were accepted by CVPR 2025!
- 2024.12: 🎉🎉 I have started my internship at Kling Team, Kuaishou!
- 2024.09: 🎉🎉 MotionBooth was accepted by NeurIPS 2024 as a spotlight!
- 2024.02: 🎉🎉 LGVI was accepted by CVPR!
- 2024.02: 🎉🎉 Towards Robust Referring Image Segmentation was accepted by TIP!
- 2024.01: 🎉🎉 Towards Open Vocabulary Learning: A Survey was accepted by TPAMI!
- 2023.07: 🎉🎉 CGG was accepted by ICCV 2023!
📝 Selected Publications
The full list of publications can be seen here
* means equal contribution.

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
- DiffSensei can generate controllable black-and-white manga panels with flexible character adaptation.

MotionBooth: Motion-Aware Customized Text-to-Video Generation
Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
- Let’s animate customized subjects with precise control over both object and camera movements!

Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Zewei Liu, Chen Change Loy
- Novel language-driven video inpainting task, dataset, and model.

Towards Open Vocabulary Learning: A Survey
Jianzong Wu*, Xiangtai Li*, Shilin Xu*, Haobo Yuan, Henghui Ding, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao
- A survey on open vocabulary learning.

Jianzong Wu*, Xiangtai Li*, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy
- Query-based open vocabulary segmentation aided by caption generation.

Towards Robust Referring Image Segmentation
Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, Dacheng Tao
- Novel robust referring image segmentation (R-RIS) task, dataset, and model.
📖 Education
- 2021.07 - now, PhD student at Peking University (PKU)
- 2017.09 - 2021.07, Bachelor's degree at University of Science and Technology of China (USTC)
💻 Internships
- 2024.12 - now, Kling Team, Kuaishou, mentored by Haotian Yang.
- 2023.11 - 2024.12, Shanghai Artificial Intelligence Laboratory, mentored by Dr. Yining Li, Dr. Jingbo Wang, and Dr. Xiangtai Li.
- 2020.10 - 2021.09, Search Technology Center Asia (STCA), Microsoft, mentored by Dr. Congrui Huang and Dr. Yujing Wang.