
Models for using prosody cloning and GAN-generated speaker embeddings

SarinaMeyer released this on 28 Oct at 14:24.

This release contains all models for the latest version of our pipeline, which supports generating artificial speaker embeddings with a GAN, prosody cloning, and prosody modification via offsets.

Place the unzipped folders in a directory named models directly under the repository root. The resulting structure should look as follows:

speaker-anonymization
   └─ models
        └─ anonymization
            └─ gan_style-embed
                └─ settings.json
                └─ style-embed_wgan.pt
        └─ asr
            └─ asr_branchformer_tts-phn_en.zip
        └─ tts
            └─ Aligner
                └─ aligner.pt
            └─ Embedding
                └─ embedding_function.pt
            └─ FastSpeech2_Multi
                └─ prosody_cloning.pt
            └─ HiFiGAN_combined
                └─ best.pt
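To confirm that the folders ended up in the right place, a small check like the one below can help. It is a minimal sketch, not part of the repository: the helper name find_missing_model_files is hypothetical, and the file list is copied from the tree above, so it must be updated if a future release changes the layout.

```python
from pathlib import Path

# Expected model files relative to the repository root, copied from the tree above.
# Hypothetical list: adjust it if the release layout changes.
EXPECTED_MODEL_FILES = [
    "models/anonymization/gan_style-embed/settings.json",
    "models/anonymization/gan_style-embed/style-embed_wgan.pt",
    "models/asr/asr_branchformer_tts-phn_en.zip",
    "models/tts/Aligner/aligner.pt",
    "models/tts/Embedding/embedding_function.pt",
    "models/tts/FastSpeech2_Multi/prosody_cloning.pt",
    "models/tts/HiFiGAN_combined/best.pt",
]


def find_missing_model_files(repo_root):
    """Return the expected model files that are not present under repo_root.

    Hypothetical helper: it only checks that each file exists, not its contents.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_MODEL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root (speaker-anonymization).
    missing = find_missing_model_files(".")
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All model files found.")
```

Checking for files rather than directories also catches the case where a zip was extracted one level too deep, for example models/tts/Aligner/Aligner/aligner.pt.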

Note: Do not unzip the ASR models; keep them as zip files. They are unzipped automatically at runtime.
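Because the ASR models must stay zipped, it can be useful to verify that models/asr still contains intact archives before starting the pipeline. The sketch below assumes every ASR model is shipped as a .zip file directly in that directory; the helper name invalid_asr_archives is hypothetical and not part of the repository. It only inspects the files with Python's standard zipfile module and never extracts them, leaving that to the pipeline at runtime.

```python
import zipfile
from pathlib import Path


def invalid_asr_archives(asr_dir="models/asr"):
    """Return the names of ASR archives in asr_dir that are not valid zip files.

    Hypothetical helper: it assumes the ASR models are *.zip files directly in
    asr_dir and does not extract anything, since the pipeline does that itself.
    """
    archives = sorted(Path(asr_dir).glob("*.zip"))
    if not archives:
        # No archive at all: the zip was probably extracted by hand or never copied.
        return ["<no .zip archive found>"]
    # zipfile.is_zipfile returns False for corrupted or non-zip files.
    return [a.name for a in archives if not zipfile.is_zipfile(a)]
```

An empty return value means every archive found in models/asr can be opened as a zip file.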