
LSTM encoder-decoder sequence-to-sequence models for Icelandic


This directory contains LSTM encoder-decoder sequence-to-sequence models trained for Icelandic grapheme-to-phoneme (g2p) conversion. The models were trained using the baseline for the SIGMORPHON 2020 shared task on multilingual g2p, with manually transcribed training data of ~5,800 words per pronunciation variant.

See code for training and evaluation: https://github.com/sigmorphon/2020/tree/master/task1

Reference paper: Gorman, Kyle et al. (2020): The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion. In: Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (https://www.aclweb.org/anthology/2020.sigmorphon-1.2/)
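To make the training-data description above concrete, the following is a minimal sketch of how SIGMORPHON-style g2p data is turned into fairseq input. It assumes the usual shared-task layout (a TSV file pairing each word with a space-separated phoneme transcription); the example word and its transcription are purely illustrative, not taken from the actual training data.

```python
def word_to_source(word: str) -> str:
    """Split a word into space-separated graphemes for the fairseq source side."""
    return " ".join(word)

def parse_tsv_line(line: str) -> tuple[str, str]:
    """Turn one 'word<TAB>phonemes' TSV entry into (source, target) fairseq lines.

    The target side is already space-separated phoneme symbols, so it is
    passed through unchanged; only the source word needs splitting.
    """
    word, phonemes = line.rstrip("\n").split("\t")
    return word_to_source(word), phonemes

# Illustrative entry (the transcription here is a placeholder, not real data):
src, tgt = parse_tsv_line("hestur\th E s t Y r")
print(src)  # h e s t u r
print(tgt)  # h E s t Y r
```

Character-level splitting like this is what lets a sequence-to-sequence model trained for translation operate on grapheme and phoneme symbols instead of words.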

Fairseq setup

  1. With Conda:

conda is recommended for a reproducible environment. Once you have conda installed, create a new environment by running:

conda env create -f environment.yml

The new environment is called "fairseq-lstm". Activate it by running:

conda activate fairseq-lstm

  2. Clone Fairseq and install it, see: https://github.com/pytorch/fairseq
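Once fairseq is installed, a typical g2p workflow follows the SIGMORPHON 2020 baseline: binarize the data, train an LSTM, and decode. The sketch below only echoes the commands (so it can be inspected and run without fairseq or data present); the directory and file names are assumptions for illustration, not paths from this repository.

```shell
DATA=data-bin/g2p-is     # hypothetical binarized-data directory

# 1. Binarize space-separated grapheme/phoneme files (train.g / train.p, etc.)
echo fairseq-preprocess --source-lang g --target-lang p \
  --trainpref train --validpref dev --testpref test --destdir "$DATA"

# 2. Train an LSTM encoder-decoder
echo fairseq-train "$DATA" --arch lstm --max-epoch 50 --save-dir checkpoints

# 3. Decode the test set with beam search
echo fairseq-generate "$DATA" --path checkpoints/checkpoint_best.pt --beam 5
```

Drop the leading `echo` from each line to actually run the pipeline; consult the fairseq documentation for the full set of training options.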

Troubleshooting & inquiries

This application is still in development. If you encounter any errors, feel free to open an issue in the issue tracker. You can also contact us via email.

Contributing

You can contribute to this project by forking it, creating a branch in your fork, and opening a pull request.
