- A desktop notification app that uses a Seq2Seq model to let users create notifications in various languages.
In this section, we will prepare our dataset for training by performing the following tasks:
- Clean the text data by removing punctuation symbols and numbers and converting all characters to lowercase.
- Replace Unicode characters with their ASCII equivalents.
- Determine the maximum sequence length of both English and French phrases to establish input and output sequence lengths for our model.
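A minimal sketch of such a cleaning step, assuming a helper name `clean_text` of our own choosing, could look like this:

```python
import re
import unicodedata

def clean_text(text):
    # Replace Unicode characters with their closest ASCII equivalents
    # (e.g. "é" -> "e") and drop anything that cannot be represented.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then strip punctuation symbols and numbers.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    # Collapse any whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Vous êtes fort ingénieuse !"))  # -> vous etes fort ingenieuse
```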
| | english_text | french_text |
|---|---|---|
| 0 | youre very clever | [start] vous etes fort ingenieuse [end] |
| 1 | are there kids | [start] y atil des enfants [end] |
| 2 | come in | [start] entrez [end] |
| 3 | wheres boston | [start] ou est boston [end] |
| 4 | you see what i mean | [start] vous voyez ce que je veux dire [end] |
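The table above can be produced with a loading pass along these lines (a sketch: the file name `fra.txt` assumes the tab-separated Tatoeba/Anki English–French pairs, and `clean_text` is the helper sketched earlier):

```python
import pandas as pd

english_texts, french_texts = [], []
with open("fra.txt", encoding="utf-8") as f:  # tab-separated English/French pairs
    for line in f:
        parts = line.rstrip("\n").split("\t")
        english_texts.append(clean_text(parts[0]))
        # Wrap each target with [start]/[end] so the decoder knows where
        # a translation begins and ends.
        french_texts.append("[start] " + clean_text(parts[1]) + " [end]")

df = pd.DataFrame({"english_text": english_texts, "french_text": french_texts})
print(df.head())

# Maximum phrase lengths (in tokens) fix the model's sequence lengths.
max_eng_len = max(len(s.split()) for s in english_texts)
max_fra_len = max(len(s.split()) for s in french_texts)
```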
⚒️ We will tokenize the English and French phrases using separate Tokenizer instances and generate padded sequences for model training. The steps involved are as follows:
- Fit a Tokenizer to the English phrases and another Tokenizer to their French equivalents.
- Compute the vocabulary sizes based on the Tokenizer instances.
- Create padded sequences for all phrases.
- Prepare features and labels for training:
  - The features consist of the padded English sequences and the padded French sequences excluding the [end] tokens.
  - The labels consist of the padded French sequences excluding the [start] tokens.
1563/1563 [==============================] - 14s 9ms/step - loss: 0.2290 - accuracy: 0.8512
Test Loss: 0.22895030677318573
Validation Accuracy: 0.8511516451835632
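The figures above can be reproduced with a call along these lines (a sketch; `model` and the `*_test` arrays are placeholders for the trained network and a held-out split of the tensors built earlier):

```python
# Placeholders: `model` is the trained seq2seq network, and the *_test arrays
# are a held-out split of encoder_input / decoder_input / decoder_target.
loss, accuracy = model.evaluate([encoder_input_test, decoder_input_test],
                                decoder_target_test, batch_size=64)
print("Test Loss:", loss)
print("Validation Accuracy:", accuracy)
```

Sample translations from the trained model: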
English: let us out of here => French: laissenous sortir dici
English: it could be fun => French: ca pourrait etre marrant
English: this is my new video => French: cest ma nouvelle video
English: do you like fish => French: aimestu le poisson
English: you were in a coma => French: vous etiez dans le coma
English: dont be upset => French: ne soyez pas fache
English: didnt you know that => French: le saviezvous
English: im not exactly sure => French: je nen suis pas a la tete
English: i put it on your desk => French: je lai mise sur votre bureau
English: somehow tom knew => French: pourtant tom savait
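A greedy decoding loop along these lines could produce translations like the ones above (a sketch; it assumes a trained model `model` that takes the encoder and decoder inputs from the earlier snippets and returns a probability distribution over the French vocabulary at each time step):

```python
import numpy as np

def translate(sentence):
    # Encode the cleaned English sentence as a padded integer sequence.
    seq = pad_sequences(eng_tokenizer.texts_to_sequences([clean_text(sentence)]),
                        maxlen=max_eng_len, padding="post")
    decoded = "[start]"
    # Generate the French sentence one word at a time, starting from [start].
    for _ in range(max_fra_len - 1):
        target_seq = pad_sequences(fra_tokenizer.texts_to_sequences([decoded]),
                                   maxlen=max_fra_len - 1, padding="post")
        preds = model.predict([seq, target_seq], verbose=0)
        # Pick the most likely next word at the position of the last real token.
        next_id = int(np.argmax(preds[0, len(decoded.split()) - 1, :]))
        next_word = fra_tokenizer.index_word.get(next_id, "")
        if next_word in ("", "[end]"):
            break
        decoded += " " + next_word
    return " ".join(decoded.split()[1:])

print("English: it could be fun => French:", translate("it could be fun"))
```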
For comparison, the baseline is LibreTranslate, which uses an NMT model architecture. Its translations of the same phrases are shown below, followed by a sketch of how they can be queried.
English: let us out of here => French: laissez-nous sortir d'ici
English: it could be fun => French: ça pourrait être amusant
English: this is my new video => French: c'est ma nouvelle vidéo
English: do you like fish => French: vous aimez le poisson
English: you were in a coma => French: tu étais dans le coma
English: dont be upset => French: ne soyez pas contrarié
English: didnt you know that => French: tu ne savais pas que
English: im not exactly sure => French: im pas exactement sûr
English: i put it on your desk => French: je l'ai mis sur ton bureau
English: somehow tom knew => French: tom le savait
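The baseline translations above can be obtained from LibreTranslate's HTTP API, for example like this (a sketch; the URL assumes a locally hosted instance on its default port):

```python
import requests

def libretranslate(sentence, url="http://localhost:5000/translate"):
    # Standard LibreTranslate request: source text, source/target languages,
    # plain-text format; the response carries the result in "translatedText".
    resp = requests.post(url, json={"q": sentence, "source": "en",
                                    "target": "fr", "format": "text"})
    resp.raise_for_status()
    return resp.json()["translatedText"]

print("English: it could be fun => French:", libretranslate("it could be fun"))
```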