Since grapheme-to-phoneme conversion is a sequence-to-sequence problem, we can model it with the Transformer architecture from the well-known paper [Attention is all you need].
The masking, padding, and data-collating functions are based on the PyTorch tutorial on language translation, available at this [tutorial], as sketched below.
As in language translation, the input and output sequences have different vocabularies: Persian graphemes on the source side and phoneme symbols on the target side.
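Here is a minimal sketch of the masking and padding-mask helpers in the spirit of that tutorial; `PAD_IDX` and the (sequence, batch) tensor layout are assumptions taken from the tutorial, not from this repository's notebooks.

```python
import torch

PAD_IDX = 1  # assumed padding-token index, following the tutorial's convention

def generate_square_subsequent_mask(sz: int, device: str = "cpu") -> torch.Tensor:
    # Causal mask: decoder position i may only attend to positions <= i.
    return torch.triu(torch.ones((sz, sz), device=device), diagonal=1).bool()

def create_mask(src: torch.Tensor, tgt: torch.Tensor, device: str = "cpu"):
    # src: (src_len, batch), tgt: (tgt_len, batch), the layout nn.Transformer
    # expects when batch_first=False.
    src_seq_len, tgt_seq_len = src.shape[0], tgt.shape[0]
    tgt_mask = generate_square_subsequent_mask(tgt_seq_len, device)
    src_mask = torch.zeros((src_seq_len, src_seq_len), device=device, dtype=torch.bool)
    # Padding masks are True where a position holds the padding token, shape (batch, seq_len).
    src_padding_mask = (src == PAD_IDX).transpose(0, 1)
    tgt_padding_mask = (tgt == PAD_IDX).transpose(0, 1)
    return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask
```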
I trained two variants of the Transformer model. The first has 3 encoder layers and 3 decoder layers; I call it PersianG2P-light.
The second has 5 encoder layers and 5 decoder layers; I call it PersianG2P-base.
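For illustration, the two variants could be instantiated with `torch.nn.Transformer` roughly as below; only the layer counts come from the description above, while the embedding size, number of attention heads, and feed-forward width are placeholder values. In practice the model is wrapped with token embeddings, positional encodings, and an output projection, as in the PyTorch translation tutorial.

```python
import torch.nn as nn

def build_g2p_transformer(num_layers: int,
                          emb_size: int = 256,
                          nhead: int = 8,
                          ffn_dim: int = 512) -> nn.Transformer:
    # num_layers controls both the encoder and decoder depth:
    # 3 for PersianG2P-light, 5 for PersianG2P-base.
    return nn.Transformer(
        d_model=emb_size,
        nhead=nhead,
        num_encoder_layers=num_layers,
        num_decoder_layers=num_layers,
        dim_feedforward=ffn_dim,
    )

light = build_g2p_transformer(3)  # PersianG2P-light
base = build_g2p_transformer(5)   # PersianG2P-base
```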
I'm not allowed to share the dataset, but if you have your own data, the `data.csv` file should look something like this (a loading sketch follows the table):
grapheme | phoneme |
---|---|
سلام | salAm |
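Here is a minimal sketch of reading a file in this format; the column names are assumptions based on the table above, and the real file may use a different separator.

```python
import pandas as pd

# Read the grapheme/phoneme pairs from the CSV file.
df = pd.read_csv("data.csv")  # assumed columns: grapheme, phoneme
pairs = list(zip(df["grapheme"], df["phoneme"]))
print(pairs[0])  # e.g. ('سلام', 'salAm')
```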
Open `PersianG2P.ipynb` in Google Colab or Jupyter Notebook and run the cells.
Download the PersianG2P-base checkpoint from this [link].
Open `JustInference.ipynb` in Google Colab or Jupyter Notebook and run the cells.
Our dataset has 75,000 grapheme-phoneme pairs, which I split into 80% train and 20% test. The results below are evaluated on the test split.
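One way to reproduce such an 80/20 split is shown below; whether scikit-learn was actually used, and the `random_state` value, are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")  # ~75,000 grapheme-phoneme pairs
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))  # roughly 60,000 train / 15,000 test
```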
There are two common evaluation metrics for the grapheme-to-phoneme problem [reference]:
PER (Phoneme error rate): For each word, calculate the percentage of predicted phonemes that do not match the gold phonemes (the phoneme-level edit distance divided by the length of the gold sequence). Average this across all words.
WER (Word error rate): For each word, compare the entire sequence of predicted phonemes to the gold phonemes. WER is the percentage of words whose predicted phonemes are not an exact match to the gold phonemes.
Both of these metrics are computed with the `jiwer` library [jiwer].
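The snippet below shows one plausible way to compute both metrics with `jiwer`; the gold/predicted strings are illustrative only, and it assumes every phoneme is written as a single character in the transcription.

```python
import jiwer

# Toy gold and predicted phoneme strings for a few test words (illustrative values only).
gold = ["salAm", "ketAb", "dust"]
pred = ["salAm", "ketab", "dust"]

# PER: with one character per phoneme, the character error rate over the
# phoneme strings equals the phoneme error rate.
per = jiwer.cer(gold, pred)

# WER: treating each word's full phoneme string as a single token means any
# mismatch counts as a whole-word error, i.e. the exact-match criterion above.
wer = jiwer.wer(gold, pred)

print(f"PER = {per:.2%}, WER = {wer:.2%}")
```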
Model | PER | WER |
---|---|---|
PersianG2P-light | 6.5 % | 28.5 % |
PersianG2P-base | 6.19 % | 26.5 % |
Here are some random examples: