Tokenizer for Indian Languages

The tokenizer can write its output either as raw text or in Shakti Standard Format (SSF) (https://aclanthology.org/W14-5208.pdf). Usage instructions are documented in the code.
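The difference between the two output modes can be shown with a minimal sketch. This is not this repository's code. The function names `tokenize`, `to_raw` and `to_ssf` are hypothetical. The tokenization rule is an assumed simplification: split on whitespace and treat ASCII punctuation plus the Devanagari danda (U+0964) and double danda (U+0965) as separate tokens. The SSF writer follows the general layout described in the linked paper: each sentence sits inside `<Sentence id="...">` tags, with one tab-separated line per token giving its address, the token, and a category. The `unk` category value is an assumed placeholder for text that has not been tagged yet.

```python
import re
import string

# Assumed punctuation set: ASCII punctuation plus danda (U+0964) and double danda (U+0965).
_PUNCT_CHARS = re.escape(string.punctuation) + "\u0964\u0965"
# A token is a run of non-space, non-punctuation characters, or a single punctuation mark.
# Combining marks (matras, virama) are non-space, so they stay attached to their word.
_TOKEN_RE = re.compile(r"[^\s" + _PUNCT_CHARS + r"]+|[" + _PUNCT_CHARS + r"]")


def tokenize(text):
    """Hypothetical tokenizer: returns a list of token strings."""
    return _TOKEN_RE.findall(text)


def to_raw(tokens):
    """Raw output: tokens joined by single spaces."""
    return " ".join(tokens)


def to_ssf(tokens, sentence_id=1):
    """SSF-style output: one tab-separated 'address, token, category' line per token.

    The 'unk' category is an assumed placeholder, since a tokenizer assigns no tags.
    """
    lines = [f'<Sentence id="{sentence_id}">']
    for address, token in enumerate(tokens, start=1):
        lines.append(f"{address}\t{token}\tunk")
    lines.append("</Sentence>")
    return "\n".join(lines)


if __name__ == "__main__":
    tokens = tokenize("राम घर गया।")
    print(to_raw(tokens))  # राम घर गया ।
    print(to_ssf(tokens))
```

Because punctuation is split off unconditionally, abbreviations and decimal numbers such as `3.14` are broken into several tokens. A real tokenizer for Indian languages would need script-specific rules for those cases.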