A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Updated
Jan 7, 2025 - Python
Neural Fuzzy Repair (NFR) is a data augmentation pipeline, which integrates fuzzy matches (i.e. similar translations) into neural machine translation.
Scripts for filtering machine translation parallel corpora
Personal NMT Playground
Repository for automatic file translation using the Google Translate API and the R statistical software
just trying to translate from Amharic to English
Using a Seq2Seq LSTM-based model along with attention
Translator developed and trained on a provided corpus using an IBM model
Python script to split the text generated by 'wikipedia parallel title extractor' into separate text files (separate file for each language)
Parallel sentence quality filter based on text classification methods
Replication package for SO processing for bitext
Extending/passing extra source tokens to a seq2seq encoder (PyTorch)