Some NLP experiments starting with a tokenization attempt in Python. The code tokenite.py reads a text file "blog1.txt" and tries to tokenize it. The code doesnot work as is, but is almost on the verge of working. Any suggestions will be greatly appreciated.
I define a class called text and define methods inside it. The method count defines a generator which I use in the method named t_tok. But if you look closely at 66 to 72 you will see that I am modifying the outer limit of the for loop while in the loop. It doesnot work. But I dont see the reason why.