EchoMimic Series

EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. [GitHub]
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. [GitHub]
EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. [GitHub]


1.3B Parameters are All You Need for Unified
Multi-Modal and Multi-Task Human Animation

Rang Meng, Yan Wang, Weipeng Wu, Ruobing Zheng, Yuming Li, Chenguang Ma
Terminal Technology Department, Alipay, Ant Group.

[Paper (arXiv)]      [GitHub]      [HuggingFace]      [ModelScope]      [WeChat]

Abstract

EchoMimicV3 makes human animation faster, higher in quality, stronger in generalization, and unifies multiple tasks in a single model. EchoMimicV3 is an efficient framework that unifies multi-task and multi-modal human animation. At the core of EchoMimicV3 lies a threefold design: a Soup-of-Tasks paradigm, a Soup-of-Modals paradigm, and a novel training and inference strategy. The Soup-of-Tasks paradigm leverages multi-task mask inputs and a counter-intuitive task allocation strategy to achieve multi-task gains without multi-model pains. Meanwhile, the Soup-of-Modals paradigm introduces a Coupled-Decoupled Multi-Modal Cross Attention module to inject multi-modal conditions, complemented by a Timestep Phase-aware Multi-Modal Allocation mechanism that dynamically modulates the multi-modal mixture. In addition, we propose Negative Direct Preference Optimization and Phase-aware Negative Classifier-Free Guidance, which ensure stable training and inference. Extensive experiments and analyses demonstrate that EchoMimicV3, with a minimal model size of 1.3 billion parameters, achieves competitive performance in both quantitative and qualitative evaluations.
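To illustrate the flavor of Phase-aware Negative Classifier-Free Guidance, here is a minimal sketch of a CFG step whose negative-guidance weight depends on the denoising phase. The schedule and all constants (`w_pos`, `w_neg_max`, the linear decay) are our illustrative assumptions for exposition, not the method published in the paper.

```python
def phase_aware_negative_cfg(eps_pos: float, eps_neg: float, eps_uncond: float,
                             t: int, T: int,
                             w_pos: float = 4.5, w_neg_max: float = 2.0) -> float:
    """Combine conditional, negative, and unconditional noise predictions.

    `t` is the current diffusion timestep (t = T at the start of denoising,
    t = 0 at the end). In this assumed schedule, the early structure-forming
    steps receive the strongest negative guidance, decaying linearly to zero.
    """
    phase = t / T                # 1.0 early in denoising, 0.0 at the end
    w_neg = w_neg_max * phase    # assumed linear decay of the negative weight
    return (eps_uncond
            + w_pos * (eps_pos - eps_uncond)    # standard CFG toward the condition
            - w_neg * (eps_neg - eps_uncond))   # phase-weighted push from negatives
```

At `t = T` the negative term is fully active; at `t = 0` this reduces to plain classifier-free guidance with scale `w_pos`.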

Gallery

Comparison

BibTex

@article{meng2025echomimicv3,
  title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
  author={Meng, Rang and Wang, Yan and Wu, Weipeng and Zheng, Ruobing and Li, Yuming and Ma, Chenguang},
  year={2025},
  eprint={2507.03905},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}