VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

¹ Xidian University ² Tencent AI Lab ³ Tsinghua University
SIGGRAPH Asia 2022 (Conference Track)
* Indicates equal contribution

Abstract

We present VideoReTalking, a new system that edits the faces of a real-world talking-head video according to input audio, producing a high-quality, lip-synced output video, even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-synced video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps, and all modules run in a sequential pipeline without any user intervention.
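The three-stage ordering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `Frame`, `edit_expression`, `lip_sync`, `enhance_face`, and `retalk` are hypothetical names, and each stage is a stand-in that only records that it ran, in place of the actual learned networks. The sketch shows the data flow only: every frame passes through expression canonicalization, then audio-driven lip-sync, then face enhancement, with no manual step in between.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical placeholder for one video frame.

    `stages` records which pipeline stages have been applied, in order.
    """
    index: int
    stages: Tuple[str, ...] = ()


def edit_expression(frames: Sequence[Frame]) -> List[Frame]:
    # Stage 1 (stand-in): give every frame the same canonical expression.
    return [Frame(f.index, f.stages + ("canonical_expression",)) for f in frames]


def lip_sync(frames: Sequence[Frame], audio: Sequence[float]) -> List[Frame]:
    # Stage 2 (stand-in): the only stage that consumes the input audio.
    if not audio:
        raise ValueError("lip-sync needs non-empty audio")
    return [Frame(f.index, f.stages + ("lip_sync",)) for f in frames]


def enhance_face(frames: Sequence[Frame]) -> List[Frame]:
    # Stage 3 (stand-in): photo-realism enhancement of the synthesized faces.
    return [Frame(f.index, f.stages + ("face_enhancement",)) for f in frames]


def retalk(frames: Sequence[Frame], audio: Sequence[float]) -> List[Frame]:
    # Run the three stages strictly in sequence, without user intervention.
    canonical = edit_expression(frames)
    synced = lip_sync(canonical, audio)
    return enhance_face(synced)


if __name__ == "__main__":
    video = [Frame(i) for i in range(3)]
    audio = [0.0, 0.1, -0.1]  # assumed toy audio samples
    for frame in retalk(video, audio):
        print(frame.index, " -> ".join(frame.stages))
```

Keeping each stage as a separate function mirrors the disentangled design in the abstract: a stage could be swapped or ablated without touching the others.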

BibTeX

@misc{videoretalking,
  title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild},
  author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang},
  year={2022},
  eprint={2211.14758},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Pipeline

Video1: Video Results in the Wild.

Video2: Comparison with SOTA Methods.

Video3: Ablation Study on Different Modules.
