Since grapheme-to-phoneme conversion is a sequence-to-sequence problem, we can model it with the Transformer architecture from the well-known paper [Attention is all you need].
The masking, padding, and data-collating functions are based on the PyTorch tutorial on language translation, available at this [tutorial], as sketched below.
As in language translation, the input and output sequences have different vocabularies: Persian graphemes on the source side and phoneme symbols on the target side.
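Here is a minimal sketch of the masking and padding-mask helpers in the spirit of that tutorial; `PAD_IDX` and the (sequence, batch) tensor layout are assumptions taken from the tutorial, not from this repository's notebooks.

```python
import torch

PAD_IDX = 1  # assumed padding-token index, following the tutorial's convention

def generate_square_subsequent_mask(sz: int, device: str = "cpu") -> torch.Tensor:
    # Causal mask: decoder position i may only attend to positions <= i.
    return torch.triu(torch.ones((sz, sz), device=device), diagonal=1).bool()

def create_mask(src: torch.Tensor, tgt: torch.Tensor, device: str = "cpu"):
    # src: (src_len, batch), tgt: (tgt_len, batch), the layout nn.Transformer
    # expects when batch_first=False.
    src_seq_len, tgt_seq_len = src.shape[0], tgt.shape[0]
    tgt_mask = generate_square_subsequent_mask(tgt_seq_len, device)
    src_mask = torch.zeros((src_seq_len, src_seq_len), device=device, dtype=torch.bool)
    # Padding masks are True where a position holds the padding token, shape (batch, seq_len).
    src_padding_mask = (src == PAD_IDX).transpose(0, 1)
    tgt_padding_mask = (tgt == PAD_IDX).transpose(0, 1)
    return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask
```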
I trained two variants of the Transformer model. The first has 3 encoder layers and 3 decoder layers; I call it PersianG2P-light.
The second has 5 encoder layers and 5 decoder layers; I call it PersianG2P-base.
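For illustration, the two variants could be instantiated with `torch.nn.Transformer` roughly as below; only the layer counts come from the description above, while the embedding size, number of attention heads, and feed-forward width are placeholder values. In practice the model is wrapped with token embeddings, positional encodings, and an output projection, as in the PyTorch translation tutorial.

```python
import torch.nn as nn

def build_g2p_transformer(num_layers: int,
                          emb_size: int = 256,
                          nhead: int = 8,
                          ffn_dim: int = 512) -> nn.Transformer:
    # num_layers controls both the encoder and decoder depth:
    # 3 for PersianG2P-light, 5 for PersianG2P-base.
    return nn.Transformer(
        d_model=emb_size,
        nhead=nhead,
        num_encoder_layers=num_layers,
        num_decoder_layers=num_layers,
        dim_feedforward=ffn_dim,
    )

light = build_g2p_transformer(3)  # PersianG2P-light
base = build_g2p_transformer(5)   # PersianG2P-base
```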
I'm not allowed to share the dataset, but if you have your own data, the `data.csv` file should look something like this (a loading sketch follows the table):
grapheme | phoneme |
---|---|
سلام | salAm |
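Here is a minimal sketch of reading a file in this format; the column names are assumptions based on the table above, and the real file may use a different separator.

```python
import pandas as pd

# Read the grapheme/phoneme pairs from the CSV file.
df = pd.read_csv("data.csv")  # assumed columns: grapheme, phoneme
pairs = list(zip(df["grapheme"], df["phoneme"]))
print(pairs[0])  # e.g. ('سلام', 'salAm')
```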
Open `PersianG2P.ipynb` in Google Colab or Jupyter Notebook and run the cells.
Download the PersianG2P-base checkpoint from this [link].
Open `JustInference.ipynb` in Google Colab or Jupyter Notebook and run the cells.
Our dataset has 75,000 grapheme-phoneme pairs, which I split into 80% train and 20% test. The results below are evaluated on the test split.
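One way to reproduce such an 80/20 split is shown below; whether scikit-learn was actually used, and the `random_state` value, are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")  # ~75,000 grapheme-phoneme pairs
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))  # roughly 60,000 train / 15,000 test
```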
There are two common evaluation metrics for the grapheme-to-phoneme problem [reference]:
PER (Phoneme error rate): For each word, calculate the percentage of predicted phonemes that do not match the gold phonemes (the phoneme-level edit distance divided by the length of the gold sequence). Average this across all words.
WER (Word error rate): For each word, compare the entire sequence of predicted phonemes to the gold phonemes. WER is the percentage of words whose predicted phonemes are not an exact match to the gold phonemes.
Both of these metrics are computed with the `jiwer` library [jiwer].
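The snippet below shows one plausible way to compute both metrics with `jiwer`; the gold/predicted strings are illustrative only, and it assumes every phoneme is written as a single character in the transcription.

```python
import jiwer

# Toy gold and predicted phoneme strings for a few test words (illustrative values only).
gold = ["salAm", "ketAb", "dust"]
pred = ["salAm", "ketab", "dust"]

# PER: with one character per phoneme, the character error rate over the
# phoneme strings equals the phoneme error rate.
per = jiwer.cer(gold, pred)

# WER: treating each word's full phoneme string as a single token means any
# mismatch counts as a whole-word error, i.e. the exact-match criterion above.
wer = jiwer.wer(gold, pred)

print(f"PER = {per:.2%}, WER = {wer:.2%}")
```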
Model | PER | WER |
---|---|---|
PersianG2P-light | 6.5 % | 28.5 % |
PersianG2P-base | 6.19 % | 26.5 % |
Here are some random examples: