- Clone the repo and its submodules:

  ```bash
  git clone --recurse-submodules -j4 https://github.com/GiannisPikoulis/dsml-thesis
  cd dsml-thesis
  ```
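  If the repository was cloned without `--recurse-submodules`, the submodules can still be fetched afterwards with the standard git command:

  ```bash
  # Fetch and initialize all nested submodules after a plain clone
  git submodule update --init --recursive
  ```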
- A suitable conda environment named `ldm` can be created and activated with:

  ```bash
  conda env create -f environment.yaml
  conda activate ldm
  ```

- Install the fairseq copy bundled with the AV-HuBERT submodule in editable mode:

  ```bash
  cd talking_face/external/av_hubert/fairseq/
  pip install --editable ./
  ```
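  To verify that the editable install is visible from the `ldm` environment, a quick import check (a convenience suggestion, not part of the original setup steps) can be run:

  ```bash
  # Should print the installed fairseq version without raising an ImportError
  python -c "import fairseq; print(fairseq.__version__)"
  ```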
In order to train first-stage autoencoders, please follow the instructions of the Taming Transformers repository. We recommend using a VQGAN as the first-stage model; a rough example invocation is sketched below.
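As a sketch of that workflow (the exact configs and flags should be taken from the Taming Transformers README; the config name below is its stock custom-data example and will likely need to be adapted to your dataset):

```bash
# Run inside the taming-transformers repository, not this one.
# Trains a VQGAN first-stage model on a single GPU; configs/custom_vqgan.yaml
# is the custom-data example shipped with that repository and should be
# edited to point at your own training/validation file lists.
CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,
```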
In both face-reenactment and talking-face generation scenarios, LDM training can be performed as follows:

```bash
CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,
```
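Assuming this repository's `main.py` follows the command-line conventions of the upstream latent-diffusion `main.py` (the command above matches them), the trailing comma in `--gpus 0,` is needed so PyTorch Lightning parses the value as a device list rather than a GPU count. A hedged sketch of a two-GPU run and of resuming an interrupted run, with placeholders left intact, is:

```bash
# Two-GPU training; --gpus takes a comma-separated list of device indices
CUDA_VISIBLE_DEVICES=0,1 python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,1

# Resume from an existing log directory (the -r/--resume option of the upstream main.py)
CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --resume logs/<run_dir> -t --gpus 0,
```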
If you use our code or your research benefits from this repository, consider citing the following:
```bibtex
@misc{pikoulis2023photorealistic,
      title={Photorealistic and Identity-Preserving Image-Based Emotion Manipulation with Latent Diffusion Models},
      author={Ioannis Pikoulis and Panagiotis P. Filntisis and Petros Maragos},
      year={2023},
      eprint={2308.03183},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
This codebase builds on and borrows code from the following repositories:

- https://github.com/CompVis/latent-diffusion
- https://github.com/CompVis/stable-diffusion
- https://github.com/CompVis/taming-transformers
- https://github.com/gwang-kim/DiffusionCLIP
- https://github.com/filby89/spectre
For questions, feel free to open an issue.