Hi, I am Jianzong Wu (吴健宗), a PhD student at the School of Intelligence Science and Technology, Peking University (PKU), advised by Prof. Yunhai Tong. Previously, I obtained my bachelor’s degree from the University of Science and Technology of China (USTC). I work closely with Dr. Xiangtai Li from NTU, Dr. Jingbo Wang from CUHK (MMLAB), Dr. Yanhong Zeng from SAI, and Dr. Xin Tao from the Kling team.

My research interests focus on leveraging AIGC technologies to create practical application tools that can improve people’s daily lives and drive innovations in academia. My primary research areas include multimodal learning and controllable generation of images, videos, and artistic creations.

I will graduate in summer 2026 and am looking for a job. Please contact me by email.

🔥 News

2025.03: 🎉🎉 Congratulations! Three papers were accepted by CVPR 2025!
2024.12: 🎉🎉 I have started my internship at Kling Team, Kuaishou!
2024.09: 🎉🎉 MotionBooth is accepted by NeurIPS 2024 as a spotlight!
2024.02: 🎉🎉 LGVI is accepted by CVPR!
2024.02: 🎉🎉 Towards Robust Referring Image Segmentation is accepted by TIP!
2024.01: 🎉🎉 Towards Open Vocabulary Learning: A Survey is accepted by TPAMI!
2023.07: 🎉🎉 CGG is accepted by ICCV-2023!

📝 Selected Publications

Full publications can be seen here

* means equal contribution.

In Submission

Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Jianzong Wu, Hao Lian, Dachao Hao, Ye Tian, Qingyu Shi, Biaolong Chen, Hao Jiang, Yunhai Tong

Code

We systematically evaluate the video generation performance gain with audio-video joint training.

In Submission

VMoBA: Mixture-of-Block Attention for Video Diffusion Models

Jianzong Wu, Liang Hou, Haotian Yang, Xin Tao, Ye Tian, Pengfei Wan, Di Zhang, Yunhai Tong

Code

The first sparse attention mechanism based on MoBA, designed for video diffusion model training.

CVPR 2025

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong

Project | Code

DiffSensei can generate controllable black-and-white manga panels with flexible character adaptation.

NeurIPS 2024 Spotlight

MotionBooth: Motion-Aware Customized Text-to-Video Generation

Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

Project | Code

Let’s animate customized subjects with precise control over both object and camera movements!

CVPR 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Zewei Liu, Chen Change Loy

Project | Code

Novel language-driven video inpainting task, dataset, and model.

TPAMI

Towards Open Vocabulary Learning: A Survey

Jianzong Wu*, Xiangtai Li*, Shilin Xu*, Haobo Yuan, Henghui Ding, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao

Code

A survey on open vocabulary learning.

ICCV 2023

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

Jianzong Wu*, Xiangtai Li*, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

Code

Query-based open vocabulary segmentation aided by caption generation.

TIP

Towards Robust Referring Image Segmentation

Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, Dacheng Tao

Code

Novel robust referring image segmentation (R-RIS) task, dataset, and model.

📖 Education

2021.07 - now, PhD student at Peking University (PKU)
2017.09 - 2021.07, Bachelor’s degree at University of Science and Technology of China (USTC)

💻 Internships

2024.12 - now, Kling Team, Kuaishou, mentored by Haotian Yang.
2023.11 - 2024.12, Shanghai Artificial Intelligence Laboratory, mentored by Dr. Yining Li, Dr. Jingbo Wang, and Dr. Xiangtai Li.
2020.10 - 2021.09, Search Technology Center Asia (STCA), Microsoft, mentored by Dr. Congrui Huang and Dr. Yujing Wang.