EchoMimic Series

EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. GitHub
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. GitHub
EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. GitHub

1.3B Parameters are All You Need for Unified
Multi-Modal and Multi-Task Human Animation

Rang Meng Yan Wang Weipeng Wu Ruobing Zheng Yuming Li Chenguang Ma
Terminal Technology Department, Alipay, Ant Group.

[Paper]

[Github]

[HuggingFace]

[ModelScope] wechat

[WeChat]

Abstract

EchoMimicV3 makes human animation Faster, Higher in quality, Stronger in generalization, and makes various tasks together in one model. EchoMimicV3 is an efficient framework that unifies multi-task and multi-modal human animation. At the core of EchoMimicV3 lies a threefold design: a Soup-of-Tasks paradigm, a Soup-of-Modals paradigm, and a novel training and inference strategy. The Soup-of-Tasks leverages multi-task mask inputs and a counter-intuitive task allocation strategy to achieve multi-task gains without multi-model pains. Meanwhile, the Soup-of-Modals introduces a Coupled-Decoupled Multi-Modal Cross Attention module to inject multi-modal conditions, complemented by a Timestep Phase-aware Multi-Modal Allocation mechanism to dynamically modulate multi-modal mixtures. Besides, we propose Negative Direct Preference Optimization and Phase-aware Negative Classifier-Free Guidance, which ensure stable training and inference. Extensive experiments and analyses demonstrate that EchoMimicV3, with a minimal model size of 1.3 billion parameters, achieves competitive performance in both quantitative and qualitative evaluations.

Gallery

Chinese Driven Audio

English Driven Audio

Singing

Cartoon

Human-Object Interaction

Comparison

BibTex

@article{meng2025echomimicv3,

      title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
 
      author={Rang Meng, Yan Wang, Weipeng Wu, Ruobing Zheng, Yuming Li, Chenguang Ma},

      year={2025},

      eprint={2507.03905},

      archivePrefix={arXiv},

      primaryClass={cs.CV}

}