Language_Identifier : Getting Started

A C library that determines the language of a text.

This library was my solution for the Telegram Data Clustering Contest 2021. Check it out.

Running Locally

The library was tested on servers running Debian GNU/Linux 10 (buster), x86-64, with 8 cores and 16 GB RAM, and should work on a clean installation of that system. Use the following commands to build the library:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ cmake --build .

You can test the resulting library file libtgcat.so on the test data using the test script from libtgcat-tester.tar.gz. To do this, copy libtgcat.so into the directory containing the test script, then build the tester with CMake in the standard way:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ cmake --build .

To test the library output, launch the resulting binary file tgcat-tester with the following parameters:

$ tgcat-tester language <input_file> <output_file>

where <input_file> is the path to the file containing the input data and <output_file> is the path to the file for the output data.

Output data is presented as a text file where each line represents processed channel data in JSON format:

 {
   "lang_code": "en"
 }
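
To check results programmatically, the language code can be pulled out of each output line. The following is a minimal sketch, not part of the library: the helper name extract_lang_code is hypothetical, and it assumes every line holds a JSON object with a string-valued "lang_code" field, as in the example above. It does a plain substring search rather than full JSON parsing, so it is only suitable for output of this simple shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtgcat): copies the value of the
 * "lang_code" field from one output line into out.
 * Returns 0 on success, -1 if the field is missing, malformed,
 * or too long for the buffer. */
int extract_lang_code(const char *line, char *out, size_t out_size)
{
    static const char key[] = "\"lang_code\"";
    const char *p;
    const char *end;
    size_t len;

    assert(line != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* Locate the key, then skip optional whitespace and the colon. */
    p = strstr(line, key);
    if (p == NULL)
        return -1;
    p += sizeof key - 1;
    p += strspn(p, " \t\r\n");
    if (*p != ':')
        return -1;
    p++;
    p += strspn(p, " \t\r\n");

    /* The value is assumed to be a quoted string without escapes. */
    if (*p != '"')
        return -1;
    p++;
    end = strchr(p, '"');
    if (end == NULL)
        return -1;

    len = (size_t)(end - p);
    if (len >= out_size)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

A caller would read the output file line by line (for example with fgets), pass each line to the helper with a small buffer, and compare the result against the expected language code.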
