This repository contains Jupyter notebooks for fine-tuning the mBart 50 model to translate English subtitles into Persian. The project uses the Hugging Face Transformers library for both fine-tuning and inference.
The project consists of two main functionalities:

- Fine-Tuning mBart 50 for English-Persian Subtitle Translation:
  - This notebook demonstrates the process of fine-tuning the pre-trained mBart 50 model on a dataset of English-Persian subtitle pairs. It covers data loading, tokenization, model training, evaluation, and inference.
  - The fine-tuned model has been made available on Hugging Face for open-source access and further development by the community.
- Leveraging the Fine-Tuned Model for Translation:
  - This notebook showcases the practical application of the fine-tuned model. It features two interactive functionalities:
    - Subtitle Translation: Users can upload English SRT subtitle files to receive translated Persian SRT subtitles.
    - Sentence Translation: Users can input arbitrary English sentences to compare translation results before and after fine-tuning.
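Both the fine-tuning and the sentence-translation steps depend on mBart 50's language codes: `en_XX` for English and `fa_IR` for Persian. The following is a minimal sketch, not code taken from the notebooks. It assumes `facebook/mbart-large-50-many-to-many-mmt` as the base checkpoint, hypothetical `en`/`fa` column names for the parallel dataset, and an arbitrary `max_length` of 128. `preprocess_batch` tokenizes English-Persian pairs for training, and `translate` forces the decoder to start with the Persian language token.

```python
# Minimal sketch of mBart 50 English -> Persian tokenization and generation.
# ASSUMPTIONS (not from the notebooks): the base checkpoint name, the
# "en"/"fa" column names, and max_length=128 are illustrative choices.
from typing import Dict, List

SRC_LANG = "en_XX"  # mBart 50 language code for English
TGT_LANG = "fa_IR"  # mBart 50 language code for Persian
BASE_CHECKPOINT = "facebook/mbart-large-50-many-to-many-mmt"  # assumed base model


def load_model_and_tokenizer(checkpoint: str = BASE_CHECKPOINT):
    """Load a seq2seq model and its tokenizer with the language pair set.

    Imports are local so the helpers below work without transformers installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        checkpoint, src_lang=SRC_LANG, tgt_lang=TGT_LANG
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    return model, tokenizer


def preprocess_batch(batch: Dict[str, List[str]], tokenizer, max_length: int = 128):
    """Tokenize parallel pairs for fine-tuning.

    `text_target` makes the tokenizer encode the Persian side as `labels`
    using the target-language settings.
    """
    return tokenizer(
        batch["en"],  # hypothetical column name
        text_target=batch["fa"],  # hypothetical column name
        max_length=max_length,
        truncation=True,
    )


def translate(sentences: List[str], model, tokenizer, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of English sentences into Persian."""
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        **inputs,
        # Make the first generated token the Persian language code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the `datasets` library, `preprocess_batch` could be applied as `dataset.map(lambda b: preprocess_batch(b, tokenizer), batched=True)`. For a before/after comparison, one could load the base checkpoint and the fine-tuned one (substituting its Hugging Face Hub id) and call `translate` on the same sentences with each.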
Here are a few examples of translations before and after the fine-tuning process:
English Sentence | Translation Before Fine-Tuning | Translation After Fine-Tuning |
---|---|---|
"Hello you guys, what's up?" | "سلام بچه ها ، چیه ؟" | "سلام بچه ها، چه خبر؟" |
"Toto, I've a feeling we're not in Kansas anymore." | "توتو، من احساسی دارم که دیگر در کانزاس نیستیم." | "توتو، حس می‌کنم که دیگر در کانزاس نیستیم." |
"I'm gonna make him an offer he can't refuse." | "من به او پیشنهادی می دهم که او نمی تواند رد کند" | "من به اون پیشنهادی میدم که اون نمیتونه رد کنه" |
To run the notebooks, you will need the following dependencies:

- Python 3.x
- datasets
- srt
- transformers
- gradio
- evaluate (for evaluation metrics)
You can install these dependencies using pip:
pip install datasets srt transformers gradio evaluate
To use the notebooks:

- Open the notebook `Fine-Tuning mBart for English to Persian Subtitle Translation.ipynb`.
  - Follow the instructions to load the dataset, preprocess the data, fine-tune the model, and evaluate its performance.
- Open the notebook `Leveraging Fine-Tuned mBart 50 for English-Persian Subtitle Translation.ipynb`.
  - Use the subtitle translation feature to upload an English SRT file and download the translated Persian subtitles.
  - Use the sentence translation feature to input English sentences and compare translations before and after fine-tuning.
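At its core, the subtitle translation feature translates each cue's text while leaving the cue index and timestamps untouched. The sketch below illustrates that idea under stated assumptions; it is not the notebook's code. It uses a small standard-library parser so it runs on its own, whereas the project lists the `srt` package, whose `srt.parse` and `srt.compose` functions fill the same parsing and writing role. `translate_batch` is a hypothetical callable that could wrap the fine-tuned model.

```python
# Sketch: translate the text of every SRT cue, keeping indices and timings.
# ASSUMPTION: a stdlib parser stands in for the `srt` package used by the project.
import re
from typing import Callable, List

# Timing line, e.g. "00:00:01,000 --> 00:00:02,500"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate_batch: Callable[[List[str]], List[str]]) -> str:
    """Return a copy of `srt_text` with each cue's text translated.

    `translate_batch` (hypothetical) receives a list of English strings and
    must return one translation per input, in the same order.
    """
    # Cues are separated by blank lines; each cue is index, timing, text lines.
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    cues = []
    for block in blocks:
        lines = block.strip().splitlines()
        if len(lines) < 2 or not TIMING.match(lines[1]):
            raise ValueError(f"Malformed SRT block: {block!r}")
        # Join multi-line cue text so the model sees a whole sentence.
        cues.append((lines[0], lines[1], " ".join(lines[2:])))

    translations = translate_batch([text for _, _, text in cues])
    if len(translations) != len(cues):
        raise ValueError("translate_batch must return one string per input")

    out = [f"{idx}\n{timing}\n{fa}" for (idx, timing, _), fa in zip(cues, translations)]
    return "\n\n".join(out) + "\n"
```

Translating all cues in one batch keeps model calls efficient. Joining multi-line cues gives the model full sentences at the cost of the original line breaks, so re-wrapping long Persian lines would be a separate step.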
Contributions to improve the model and enhance its functionalities are welcome! Please feel free to open issues or submit pull requests.
Acknowledgments:

- Hugging Face for providing the Transformers library and model hub.
- The open-source community for their contributions and support in developing machine translation models.