A word tokenizer NLP tool for the Tamil language
Command-line utility to perform word tokenization on a given Tamil corpus text file.
Usage:
    python word_tokenizer.py <input_file>
Show help:
    python word_tokenizer.py -h
- No preprocessing needed
- Works on any OS that supports Python 3
- Handles input files of any size
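
As an illustration of how a tokenizer can handle input of any size, here is a minimal sketch that streams a file line by line instead of loading it whole. It is not the code in word_tokenizer.py: the names `TAMIL_WORD`, `tokenize_lines`, and `tokenize_file` are hypothetical, and the rule that a word is a maximal run of characters from the Tamil Unicode block (U+0B80 to U+0BFF) is an assumption that the actual tool may not share.

```python
import re
from typing import Iterable, Iterator

# Assumption: a "word" is a maximal run of characters in the
# Tamil Unicode block (U+0B80 to U+0BFF). Everything else
# (spaces, punctuation, Latin letters, ASCII digits) is a separator.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def tokenize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield Tamil word tokens one at a time from an iterable of lines."""
    for line in lines:
        yield from TAMIL_WORD.findall(line)


def tokenize_file(path: str) -> Iterator[str]:
    """Stream tokens from a UTF-8 text file, one line in memory at a time."""
    with open(path, encoding="utf-8") as handle:
        # A text file object iterates lazily over its lines.
        yield from tokenize_lines(handle)
```

Because both functions are generators, memory use depends on the longest line rather than on the size of the file. A caller could write `for token in tokenize_file("corpus.txt"): print(token)`, where `corpus.txt` is a placeholder file name.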