This project is developed in Python. The following Python libraries are used:
- Pandas
- nltk (Natural Language Toolkit)
- re (regular expressions)
The nltk (Natural Language Toolkit) library is used to perform the following tasks:
- Tokenization (splitting a piece of text into smaller units called tokens)
- Stopword removal (eliminating words that are so commonly used that they carry very little useful information)
- Stemming (reducing an inflected word to its stem by stripping affixes such as suffixes and prefixes; this differs from lemmatization, which maps a word to its dictionary form, or lemma)
The comment_text column is converted into numeric vectors using scikit-learn's CountVectorizer().
The resulting vectors are then used to train the machine learning model, a Multinomial Naive Bayes classifier.
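The training step can be sketched as below. The toy texts, labels, and the 0/1 label scheme are assumptions for illustration; the real project trains on the vectorized comment_text column and its labels.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data (assumed labels: 0 = clean, 1 = toxic).
texts = [
    "you are wonderful and kind",
    "this is a great helpful post",
    "you are an idiot",
    "what a stupid useless comment",
]
labels = [0, 0, 1, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)   # document-term count matrix

# Multinomial Naive Bayes works directly on these word counts.
model = MultinomialNB()
model.fit(X, labels)

# Vectorize an unseen comment with the SAME fitted vectorizer, then predict.
new = vectorizer.transform(["what a wonderful post"])
print(model.predict(new))
```

Multinomial Naive Bayes pairs naturally with CountVectorizer because it models each class as a multinomial distribution over word counts, and it trains quickly even on large sparse matrices.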