```
.
├── Assignment_2.pdf
├── corpus
│   ├── dev.en
│   ├── dev.fr
│   ├── processed_dev.en
│   ├── processed_dev.fr
│   ├── processed_test.en
│   ├── processed_test.fr
│   ├── processed_train.en
│   ├── processed_train.fr
│   ├── test.en
│   ├── test.fr
│   ├── train.en
│   └── train.fr
├── metrics
│   └── testbleu.txt
├── plots.ipynb
├── README.md
├── Report.pdf
└── src
    ├── decoder.py
    ├── encoder.py
    ├── __init__.py
    ├── test.py
    ├── train.py
    └── utils.py
```
- All scripts should be run from the root directory of the repository.
- Run the following command to process the corpus for the first time:

```
python -m src.utils
```

- This will create the processed files in the `corpus` directory with the `processed_` prefix (a rough sketch of this step follows below).
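The exact steps in `src/utils.py` are not reproduced in this README. As a hypothetical sketch, assuming simple lowercasing and punctuation splitting (only the file names come from the tree above), the processing might look like:

```python
import re

def process_file(path_in: str, path_out: str) -> None:
    """Hypothetical cleanup: lowercase, split punctuation off words, normalize spaces."""
    with open(path_in, encoding='utf-8') as f_in, \
         open(path_out, 'w', encoding='utf-8') as f_out:
        for line in f_in:
            line = re.sub(r"([.,!?;:()\"'])", r" \1 ", line.strip().lower())
            f_out.write(' '.join(line.split()) + '\n')

for split in ('train', 'dev', 'test'):
    for lang in ('en', 'fr'):
        process_file(f'corpus/{split}.{lang}', f'corpus/processed_{split}.{lang}')
```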
- Run the following command to train the model:

```
python -m src.train
```

- This will train the model and save it in the `weights` directory.
- To change the hyperparameters, pass them as arguments to the above command. For example:

```
python -m src.train --epochs 10 --batch_size 64
```

- To see all the hyperparameters, run the following (a sketch of how such flags might be parsed follows below):

```
python -m src.train --help
```
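Only the `--epochs` and `--batch_size` flags are confirmed by this README; internally, `src/train.py` presumably builds them with `argparse`. A minimal sketch of such a parser (the `--lr` flag and all defaults are assumptions):

```python
import argparse

def get_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description='Train the Transformer model.')
    # Only --epochs and --batch_size appear in this README; defaults are guesses.
    parser.add_argument('--epochs', type=int, default=10, help='Number of training epochs.')
    parser.add_argument('--batch_size', type=int, default=64, help='Mini-batch size.')
    parser.add_argument('--lr', type=float, default=1e-4, help='Learning rate (hypothetical flag).')
    return parser.parse_args()
```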
- Run the following command to test the model:

```
python -m src.test
```

- This will test the model and save the BLEU score for the test set in the `metrics` directory (one plausible way this is computed is sketched below).
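The internals of `src/test.py` are not shown here. A minimal sketch of producing `metrics/testbleu.txt`, assuming NLTK's corpus BLEU (the helper name is hypothetical):

```python
from nltk.translate.bleu_score import corpus_bleu

def save_test_bleu(references: list[list[str]], hypotheses: list[list[str]]) -> None:
    """Score tokenized hypotheses against one reference each and persist the result."""
    # corpus_bleu expects a list of reference lists per hypothesis.
    score = corpus_bleu([[ref] for ref in references], hypotheses)
    with open('metrics/testbleu.txt', 'w') as f:
        f.write(f'{score:.4f}\n')
```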
- The file `weights/transformer.pt` is a state dict containing various parameters of the model and other information computed during the training process. Its keys are as follows (a save sketch follows the list):
  - `args`: Hyperparameters of the training run.
  - `model_state_dict`: The state dict of the model.
  - `optimizer_state_dict`: The state dict of the optimizer.
  - `en_seq_len`: The sequence length of the English train dataset.
  - `fr_seq_len`: The sequence length of the French train dataset.
  - `word2idx_en`: The word-to-index mapping of the English train dataset.
  - `word2idx_fr`: The word-to-index mapping of the French train dataset.
  - `idx2word_en`: The index-to-word mapping of the English train dataset.
  - `idx2word_fr`: The index-to-word mapping of the French train dataset.
  - `train_losses`: The training losses.
  - `dev_losses`: The validation losses.
  - `dev_bleu`: The BLEU score on the validation set.
  - `dev_rouge`: The ROUGE score on the validation set.
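For reference, a checkpoint with exactly these keys could be written at the end of training roughly as follows (a sketch; the function and its parameter names are assumptions, only the key names come from the list above):

```python
import torch

def save_checkpoint(model, optimizer, args, seq_lens, vocab, history,
                    path='weights/transformer.pt'):
    """Bundle model weights, vocab mappings, and training history into one state dict."""
    torch.save({
        'args': args,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'en_seq_len': seq_lens['en'],
        'fr_seq_len': seq_lens['fr'],
        'word2idx_en': vocab['word2idx_en'],
        'word2idx_fr': vocab['word2idx_fr'],
        'idx2word_en': vocab['idx2word_en'],
        'idx2word_fr': vocab['idx2word_fr'],
        'train_losses': history['train_losses'],
        'dev_losses': history['dev_losses'],
        'dev_bleu': history['dev_bleu'],
        'dev_rouge': history['dev_rouge'],
    }, path)
```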
- To load the model, you can use the following code:

```python
import torch

ckpt = torch.load('weights/transformer.pt', map_location='cpu')
model = Transformer(ckpt['args'])  # Transformer is the model class from src; the saved args match its constructor
model.load_state_dict(ckpt['model_state_dict'])
```
- A sequence length of `77` was used for faster training during hyperparameter tuning.
- If a sequence length of `77` is used, the model assumes that, apart from the `<BOS>` and `<EOS>` tokens, the maximum length of a sentence is `77`.
- Sentences shorter than `77` tokens are padded to a length of `77`, and sentences longer than `77` tokens are truncated to `77` (see the sketch after this list).
- The model will not learn the padding token.
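Put together, the padding/truncation behaviour described above can be sketched as follows (the `<PAD>` token name is an assumption; `<BOS>`, `<EOS>`, and the length of `77` come from this README):

```python
def pad_or_truncate(tokens, seq_len=77, bos='<BOS>', eos='<EOS>', pad='<PAD>'):
    """Clip to seq_len tokens, wrap with <BOS>/<EOS>, then pad to a fixed length."""
    tokens = tokens[:seq_len]                             # truncate long sentences
    tokens = [bos] + tokens + [eos]                       # at most seq_len + 2 now
    return tokens + [pad] * (seq_len + 2 - len(tokens))   # pad short sentences

print(pad_or_truncate('the cat sat on the mat'.split()))
```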
- The position embeddings are not learned.
- The model uses the `Adam` optimizer with a learning rate of `0.0001`.
- The model uses the `CrossEntropyLoss` loss function (a sketch of this setup follows below).
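A sketch of that training setup, under the assumption that "not learning" the padding token is enforced via `ignore_index` (the `pad_idx` value is hypothetical):

```python
import torch

def make_training_objects(model: torch.nn.Module, pad_idx: int = 0):
    """Adam at lr 1e-4 and a cross-entropy loss that skips padding positions."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=pad_idx)
    return optimizer, criterion
```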
- The main model, with `6` blocks, `8` heads, `512`-dim embeddings, and a `0.1` dropout rate, has been trained for `10` epochs on the maximum sequence length of the training data.
- The model can be downloaded from here.
- Some more models saved during the hyperparameter tuning process can be downloaded from the following links:
  - `6` blocks, `8` heads, `512`-dim embeddings, and `0.1` dropout rate, trained for `10` epochs with a sequence length of `77`:
  - `6` blocks, `8` heads, `640`-dim embeddings, and `0.2` dropout rate, trained for `10` epochs with a sequence length of `77`:
  - `8` blocks, `10` heads, `640`-dim embeddings, and `0.1` dropout rate, trained for `10` epochs with a sequence length of `77`: