
உளி வீரன் - வித்து - உத்தி (Uli Veeran: seed, technique)


Ezhil-Language-Foundation/uliveeran


This repository harvests unigram and bigram data from various corpora. We start with the Project Madurai corpus, using prose only (unparsed cir/seer poetry and all other poetry are skipped). The data and any scripts are in the public domain.

The 'plain_text' folder currently contains 4036616 words in total, which serve as both unigram and bigram data at the word level. One may use the open-tamil library to:

- discover the unigram word frequency of this corpus
- discover the bigram word frequency of this corpus (since successive words occur on successive lines)
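As a rough illustration of the two counts described above, here is a minimal sketch that uses only the Python standard library rather than open-tamil, so it does not show that library's API. It assumes the files under 'plain_text' are UTF-8 encoded with one word per line, and that blank lines can be skipped; the function names `read_words`, `count_ngrams`, and `count_folder` are hypothetical and do not come from this repository.

```python
from collections import Counter
from pathlib import Path


def read_words(path):
    """Yield the non-empty lines of a UTF-8 file (assumed: one word per line)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word:
                yield word


def count_ngrams(words):
    """Return (unigram Counter, bigram Counter) for a sequence of words."""
    words = list(words)
    unigrams = Counter(words)
    # Pair each word with its successor; successive lines give successive words.
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def count_folder(folder="plain_text"):
    """Accumulate counts over every file in the folder (assumed layout)."""
    unigrams, bigrams = Counter(), Counter()
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Count per file so that bigrams never span two different files.
            file_unigrams, file_bigrams = count_ngrams(read_words(path))
            unigrams.update(file_unigrams)
            bigrams.update(file_bigrams)
    return unigrams, bigrams
```

Calling `count_folder()` from the repository root and then `unigrams.most_common(10)` or `bigrams.most_common(10)` would list the most frequent words and word pairs. If blank lines in the files turn out to mark sentence or document boundaries, skipping them as done here would create bigrams across those boundaries, so that step may need adjusting.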
