
aravinth/parallel-corpus

 
 


Parallel-corpus

The repository contains parallel language corpus links for popular Indian languages.

Link for the parallel corpus

Contains parallel corpora from the following sources:

  • IIT Bombay v_2.1 (original), 1.5 million sentences. The following groups from the corpus are used: chats, movie dialogs, general, Hi-Eng word linkage, admin dictionary, admin examples, admin definitions, TED talks, Indic multi-parallel, Judicial I and II, Govt websites I and II, book translations, and Wikipedia.

Citation: Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya. The IIT Bombay English-Hindi Parallel Corpus. Language Resources and Evaluation Conference. 2018. http://www.cfilt.iitb.ac.in/iitb_parallel/

  • Augmented data
  • Law Commission of India

Prepared from the documents of the Law Commission of India using OCR.

  • Indian Judiciary

Contains data scraped from Indian judiciary data sources and translated using Google.

  • Names dictionary

Contains names of people, geographical locations, etc.

