The code is forked from Fairseq-v0.12.3. For more installation details, please refer to Fairseq.
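A from-source installation, as in a standard fairseq-0.12.x setup, typically looks like the sketch below; the repository URL and directory name are placeholders:

# Clone this repository (URL is a placeholder) and install it in editable mode.
git clone https://github.com/<your-org>/<this-repo>.git
cd <this-repo>
pip install --editable .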
Training scripts and configurations for the MuST-C dataset are as follows:
egs
|---machine_translation
| |---train.sh
| |---decode.sh
| |---load_embedding.py
|---pretrain-all
| |---joint_train_merge.sh
| |---decode.sh
| |---device_run.sh
| |---conf
• Prepare MT training data.
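As a rough sketch, the parallel text is usually tokenized/BPE-processed and then binarized with fairseq-preprocess; the language pair, file prefixes, and destination directory below are placeholders:

# Binarize preprocessed parallel text into fairseq's binary format.
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref data/train --validpref data/valid --testpref data/test \
    --destdir data-bin/mustc_en_de \
    --workers 8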
• Modify the necessary paths in machine_translation/train.sh, then run machine_translation/train.sh to pretrain the MT model.
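The script is expected to wrap a fairseq-train command roughly like the sketch below; the architecture and hyperparameters are illustrative placeholders, not the script's actual settings:

# Illustrative MT pretraining command; the real train.sh may differ.
fairseq-train data-bin/mustc_en_de \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 8192 --update-freq 2 \
    --save-dir checkpoints/mt_pretrain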
• Adjust all the required paths in machine_translation/decode.sh to match those in machine_translation/train.sh, then run machine_translation/decode.sh to run inference with your pretrained MT model.
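Decoding is typically a fairseq-generate call over the same binarized data; a hedged sketch, with all paths and options as placeholders:

# Illustrative decoding command for the pretrained MT model.
fairseq-generate data-bin/mustc_en_de \
    --path checkpoints/mt_pretrain/checkpoint_best.pt \
    --gen-subset test \
    --beam 5 --remove-bpe \
    --batch-size 64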
• Use machine_translation/load_embedding.py to extract the required word embeddings from the pretrained MT model.
• Download the HuBERT-Base pretrained model (without fine-tuning).
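The checkpoint without fine-tuning can be downloaded from the public fairseq HuBERT release, for example (the URL below is the publicly listed one and may change):

# Download the HuBERT-Base checkpoint (LibriSpeech 960h, no fine-tuning).
# Point w2v-path in the config at the downloaded .pt file.
wget -P pretrained/ https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt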
• Prepare the MuST-C ST training data by following the instructions here.
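If this refers to the standard fairseq speech_to_text recipe, the preparation step looks roughly like the following; MUSTC_ROOT and the vocabulary settings are placeholders:

# Assumes the fairseq speech_to_text MuST-C recipe; MUSTC_ROOT holds the
# unpacked MuST-C release (e.g. en-de).
python examples/speech_to_text/prep_mustc_data.py \
    --data-root ${MUSTC_ROOT} --task st \
    --vocab-type unigram --vocab-size 8000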
• Modify the necessary paths in pretrain-all/conf/train_soft_alignment.yaml, such as:
w2v-path=/your/path/to/hubert
mt-model-path=/your/path/to/mt/pretrain/model
decoder-embed-path=/your/path/to/mt/word/embedding
• Set the data path and other required paths in pretrain-all/joint_train_merge.sh, then run pretrain-all/joint_train_merge.sh to fine-tune your model.
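Once the paths inside the script are filled in, fine-tuning is launched directly, optionally restricting the visible GPUs (the device list below is just an illustration):

# Launch ST fine-tuning; edit the paths inside joint_train_merge.sh first.
export CUDA_VISIBLE_DEVICES=0,1,2,3
bash pretrain-all/joint_train_merge.sh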
• Use pretrain-all/decode.sh to run inference with your fine-tuned model.
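The decoding script is expected to wrap a speech-to-text generation command along the lines of the standard fairseq recipe; the paths, subset name, and beam settings below are placeholders:

# Illustrative ST decoding command; the actual decode.sh may differ.
fairseq-generate ${MUSTC_ROOT}/en-de \
    --config-yaml config_st.yaml \
    --gen-subset tst-COMMON_st \
    --task speech_to_text \
    --path checkpoints/st_finetune/checkpoint_best.pt \
    --max-tokens 50000 --beam 5 --scoring sacrebleu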