language-identification-survey

Live survey of off-the-shelf language identification tools for Python.

Reproducing the benchmark

1. Download the dataset

./datasets/tatoeba-sentences-2021-06-05/download
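The format the download script leaves on disk is not described here. Assuming it is Tatoeba's standard sentences.csv export (tab-separated: sentence id, ISO 639-3 language code, text, with `\N` for sentences whose language is unknown), a minimal reader could look like the sketch below. The `Sentence` and `read_tatoeba` names are illustrative and not part of this repository.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code as used by Tatoeba, e.g. "eng"
    text: str


def read_tatoeba(stream: TextIO) -> Iterator[Sentence]:
    """Yield sentences from a Tatoeba-style TSV: id <TAB> lang <TAB> text."""
    # QUOTE_NONE: sentence text may contain bare double quotes, which must
    # not be interpreted as CSV quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows
        sid, lang, text = row
        if lang == "\\N":
            continue  # Tatoeba marks sentences of unknown language with \N
        yield Sentence(int(sid), lang, text)


if __name__ == "__main__":
    demo = "1\teng\tLet's try something.\n2\tfra\tEssayons quelque chose.\n3\t\\N\tunknown\n"
    for s in read_tatoeba(io.StringIO(demo)):
        print(s.lang, s.text)
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the handle to `read_tatoeba`; the exact file name inside the dataset directory is an assumption.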

2. Run language inference for a benchmark

Available benchmarks:

fasttext
fasttext-compressed
gcld3
langdetect
langid
pycld2
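The libraries return predictions in different shapes: fastText returns `__label__xx` strings plus probabilities, pycld2 returns a tuple of candidate languages, langid returns a `(code, score)` pair, gcld3 returns a result object, and langdetect returns a bare code. A minimal sketch of wrapping them behind one `text -> code` function follows. The factory functions, the `FACTORIES` registry and the model file paths are hypothetical and may not match `benchmarks.py`; the library calls are each package's documented entry point.

```python
from typing import Callable, Dict

# Hypothetical paths: the repo has a models/ directory, but these file names
# (fastText's published lid.176.bin / lid.176.ftz) are an assumption.
FASTTEXT_MODELS = {
    "fasttext": "models/lid.176.bin",
    "fasttext-compressed": "models/lid.176.ftz",
}


def normalize_fasttext_label(label: str) -> str:
    """fastText returns labels like '__label__en'; strip the prefix."""
    prefix = "__label__"
    return label[len(prefix):] if label.startswith(prefix) else label


def make_fasttext(path: str) -> Callable[[str], str]:
    import fasttext  # imported lazily so other backends work without it

    model = fasttext.load_model(path)

    def detect(text: str) -> str:
        # predict() rejects input containing '\n', so flatten newlines first
        labels, _probs = model.predict(text.replace("\n", " "))
        return normalize_fasttext_label(labels[0])

    return detect


def make_gcld3() -> Callable[[str], str]:
    import gcld3

    identifier = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
    return lambda text: identifier.FindLanguage(text=text).language


def make_langdetect() -> Callable[[str], str]:
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    return detect


def make_langid() -> Callable[[str], str]:
    import langid

    return lambda text: langid.classify(text)[0]  # classify -> (code, score)


def make_pycld2() -> Callable[[str], str]:
    import pycld2

    def detect(text: str) -> str:
        _reliable, _bytes_found, details = pycld2.detect(text)
        return details[0][1]  # each detail is (name, code, percent, score)

    return detect


# Benchmark name -> factory; nothing is imported or loaded until called.
FACTORIES: Dict[str, Callable[[], Callable[[str], str]]] = {
    "fasttext": lambda: make_fasttext(FASTTEXT_MODELS["fasttext"]),
    "fasttext-compressed": lambda: make_fasttext(FASTTEXT_MODELS["fasttext-compressed"]),
    "gcld3": make_gcld3,
    "langdetect": make_langdetect,
    "langid": make_langid,
    "pycld2": make_pycld2,
}
```

Lazy factories keep each benchmark runnable when only its own library is installed, e.g. `detect = FACTORIES["langid"]()`.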

Available datasets:

tatoeba-sentences-2021-06-05
tatoeba-sentences-2021-06-05-common-48
open-subtitles-v2018-100k-per-lang
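Tatoeba labels sentences with ISO 639-3 codes (`eng`, `fra`, `cmn`), while most of the libraries above emit mostly ISO 639-1 codes (`en`, `fr`, `zh`), sometimes with a suffix such as langdetect's `zh-cn`. Labels therefore have to be mapped onto one scheme before predictions can be scored. A sketch, assuming a lookup table; the table below is a tiny illustrative excerpt, not the mapping this repository uses, and `to_iso1` / `same_language` are hypothetical names.

```python
from typing import Dict, Optional

# Illustrative excerpt only; a real run needs the full ISO 639 tables.
ISO3_TO_ISO1: Dict[str, str] = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "spa": "es",
    "rus": "ru",
    "cmn": "zh",  # Mandarin; detectors usually report the macrolanguage "zh"
}


def to_iso1(code: str) -> Optional[str]:
    """Map a dataset label to the 2-letter code most detectors emit.

    Returns None when no mapping is known, so the caller can drop the
    sentence instead of counting it as a guaranteed miss.
    """
    if len(code) == 2:
        return code.lower()  # already ISO 639-1
    return ISO3_TO_ISO1.get(code)


def same_language(expected: str, predicted: str) -> bool:
    exp = to_iso1(expected)
    # Drop script/region suffixes such as "zh-cn" or "zh-Latn"
    pred = predicted.split("-")[0].lower()
    return exp is not None and exp == pred
```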

On the host machine:

python run.py <benchmark_name>

In Docker:

docker build -t bench .
docker run -v `pwd`:/src -t -i bench python /src/run.py <benchmark_name>
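What `run.py` records per sentence is not documented here. Since step 3 reports both correctness and timings, a plausible shape is one row per sentence holding the expected label, the prediction and the wall-clock time of the call. A minimal sketch under that assumption; `run_benchmark` and the row keys are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def run_benchmark(
    detect: Callable[[str], str],
    samples: Iterable[Tuple[str, str]],
) -> List[Dict[str, object]]:
    """Run `detect` over (expected_lang, text) pairs, timing each call."""
    rows: List[Dict[str, object]] = []
    for expected, text in samples:
        start = time.perf_counter()
        predicted: Optional[str]
        try:
            predicted = detect(text)
        except Exception:  # e.g. langdetect raises on text with no features
            predicted = None  # recorded as a failed prediction
        elapsed = time.perf_counter() - start
        rows.append({"expected": expected, "predicted": predicted, "seconds": elapsed})
    return rows
```

Timing each call individually (rather than the whole loop) keeps per-sentence latency available for the timing analysis, at the cost of a small `perf_counter` overhead per row.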

3. Run the analysis

python analyze.py --correctness
python analyze.py --timings
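Assuming per-sentence rows shaped like {expected, predicted, seconds}, with labels already normalized to one code scheme, correctness and timing summaries reduce to the computations below. This is an illustrative sketch, not the logic of `analyze.py`; the function names are hypothetical, and a failed prediction (`None`) simply counts as a miss.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

Row = Mapping[str, object]


def accuracy(rows: Sequence[Row]) -> float:
    """Fraction of rows whose prediction equals the expected label."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r["predicted"] == r["expected"])
    return hits / len(rows)


def per_language_accuracy(rows: Sequence[Row]) -> Dict[str, float]:
    """Accuracy grouped by the expected (gold) language."""
    buckets: Dict[str, List[bool]] = defaultdict(list)
    for r in rows:
        buckets[str(r["expected"])].append(r["predicted"] == r["expected"])
    return {lang: sum(flags) / len(flags) for lang, flags in buckets.items()}


def timing_summary(rows: Sequence[Row]) -> Dict[str, float]:
    """Mean and median per-sentence latency in ms, plus total seconds."""
    secs = [float(r["seconds"]) for r in rows]
    if not secs:
        return {"mean_ms": 0.0, "median_ms": 0.0, "total_s": 0.0}
    return {
        "mean_ms": statistics.mean(secs) * 1000,
        "median_ms": statistics.median(secs) * 1000,
        "total_s": sum(secs),
    }
```

Per-language accuracy matters because Tatoeba is heavily skewed toward a few languages, so overall accuracy alone can hide poor results on less frequent ones.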

4. Measure memory usage of each model

python get_memory_usage.py <benchmark_names>
# e.g. python get_memory_usage.py fasttext
# e.g. python get_memory_usage.py fasttext-compressed

The script prints memory usage in MB, computed as bytes / 1024 / 1024 (strictly speaking, MiB).
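How `get_memory_usage.py` measures memory is not described here. One common approach, sketched below as an assumption, is to compare the process's peak resident set size before and after loading a model. Python's `tracemalloc` would not suit this, because it only traces allocations made through Python's allocators and misses the native memory that fastText, CLD2 and CLD3 allocate in C/C++. The sketch uses the Unix-only `resource` module; the function names are hypothetical.

```python
import resource
import sys
from typing import Callable


def bytes_to_mb(n_bytes: float) -> float:
    """Convert bytes to MB as this README defines it: bytes / 1024 / 1024."""
    return n_bytes / 1024 / 1024


def peak_rss_bytes() -> int:
    """Peak resident set size of the current process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    return peak if sys.platform == "darwin" else peak * 1024


def measure_load_mb(load: Callable[[], object]) -> float:
    """Approximate memory taken by `load()` as the growth in peak RSS."""
    before = peak_rss_bytes()
    _model = load()  # keep a reference so the object is alive when measured
    after = peak_rss_bytes()
    return bytes_to_mb(after - before)
```

Peak RSS never decreases, so the delta undercounts if something earlier in the process already peaked higher; measuring one model per fresh process, as the one-benchmark-per-invocation examples above suggest, avoids that.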

Repository: vigneshmj1997/language-identification-survey

Folders and files

Latest commit

History

Repository files navigation

language-identification-survey

Reproducing benchmark

1. Download the dataset

2. Run the language inference for benchmarks

3. Run analysis

4. Get memory usage for different models

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages