# Diffusion Models with Applications in Face Reenactment and Talking-Face Synthesis

## Preparation

- Clone the repository and its submodules:

  ```bash
  git clone --recurse-submodules -j4 https://github.com/GiannisPikoulis/dsml-thesis
  cd dsml-thesis
  ```

- Create and activate a conda environment named `ldm`, then install the bundled fairseq in editable mode:

  ```bash
  conda env create -f environment.yaml
  conda activate ldm
  cd talking_face/external/av_hubert/fairseq/
  pip install --editable ./
  ```
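Before training, it can help to confirm the environment is usable. A minimal sanity check, assuming `environment.yaml` provides PyTorch and PyTorch Lightning (typical for latent-diffusion setups, but not stated explicitly here):

```bash
# Sanity check: core packages import and CUDA is visible.
# The package list is an assumption based on typical latent-diffusion environments.
python -c "import torch, pytorch_lightning; print(torch.__version__, torch.cuda.is_available())"
```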

## First Stage

To train first-stage autoencoders, please follow the instructions in the Taming Transformers repository. We recommend using a VQGAN as the first-stage model.
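For reference, VQGAN training in the Taming Transformers codebase is launched with a command of roughly the following shape; the config file name below is a placeholder and depends on your dataset:

```bash
# Illustrative VQGAN training invocation (Taming Transformers);
# replace the config with one matching your dataset.
python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,
```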

## LDM Training

In both the face-reenactment and talking-face generation scenarios, LDM training can be performed as follows:

```bash
CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,
```
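The trailing comma in `--gpus` is PyTorch Lightning's device-list syntax, so multi-GPU training should follow the same pattern. A sketch, with illustrative GPU indices and `<config_spec>` left as a placeholder:

```bash
# Sketch: train on two GPUs. <config_spec> stays a placeholder for your config file.
CUDA_VISIBLE_DEVICES=0,1 python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,1,
```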

## Citation

If you use our code or this repository benefits your research, please consider citing the following:

```bibtex
@misc{pikoulis2023photorealistic,
      title={Photorealistic and Identity-Preserving Image-Based Emotion Manipulation with Latent Diffusion Models},
      author={Ioannis Pikoulis and Panagiotis P. Filntisis and Petros Maragos},
      year={2023},
      eprint={2308.03183},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Acknowledgements

## Contact

For questions, feel free to open an issue.