daaaanish17/toxic_comments_classification

Toxic Comments Classification

A project developed in Python.

Information

This project uses the following Python libraries:

  • Pandas
  • nltk (Natural Language Toolkit)
  • re (regular expressions)

The nltk (Natural Language Toolkit) library is used to perform these tasks:

  • Tokenization (separating a piece of text into smaller units called tokens.)
  • Removing stopwords (eliminating words that are so commonly used that they carry very little useful information.)
  • Stemming (reducing a word to its stem by stripping affixes such as suffixes and prefixes. The stem is not necessarily a dictionary word, unlike the lemma produced by lemmatization.)
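The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the regex cleaning step, the choice of `PorterStemmer`, and the small `STOPWORDS` set are all assumptions. nltk ships its own English stopword list in `nltk.corpus.stopwords`, but it needs a one-time corpus download, so a small stand-in set is used here to keep the example self-contained.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Stand-in stopword set (assumption); a real run would more likely use
# nltk.corpus.stopwords.words("english") after downloading the corpus.
STOPWORDS = {"you", "are", "so", "these", "the", "a", "is", "and", "to", "of"}

stemmer = PorterStemmer()


def preprocess(text):
    """Clean, tokenize, drop stopwords, and stem a single comment."""
    # Lowercase and replace everything except letters with spaces.
    cleaned = re.sub(r"[^a-z]+", " ", text.lower())
    # Tokenization: split the cleaned text into word tokens.
    tokens = wordpunct_tokenize(cleaned)
    # Stopword removal: keep only tokens that carry some information.
    tokens = [tok for tok in tokens if tok not in STOPWORDS]
    # Stemming: reduce each remaining token to its stem.
    return [stemmer.stem(tok) for tok in tokens]


result = preprocess("You are SO stupid!!! Stop posting these comments.")
print(result)  # ['stupid', 'stop', 'post', 'comment']
```

The stemmed tokens can be joined back into a single string with `" ".join(result)` before vectorization.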

The comment_text column is then converted into token-count vectors using scikit-learn's CountVectorizer().
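As a rough sketch of this vectorization step, the example below builds a tiny DataFrame with a `comment_text` column and passes it through `CountVectorizer` with default settings. The three sample comments are made up for illustration, and the project may configure the vectorizer differently.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy data (assumption): stands in for the real comment_text column.
df = pd.DataFrame(
    {
        "comment_text": [
            "you are stupid",
            "thank you for the edit",
            "stupid stupid edit",
        ]
    }
)

vectorizer = CountVectorizer()
# Learn the vocabulary and build a sparse document-term count matrix.
X = vectorizer.fit_transform(df["comment_text"])

# One row per comment, one column per distinct word in the vocabulary.
print(X.shape)
# Column index assigned to each word.
print(vectorizer.vocabulary_)
# How many times "stupid" occurs in the third comment.
print(X[2, vectorizer.vocabulary_["stupid"]])
```

Each row of `X` is a bag-of-words representation of a comment: word order is discarded and only the per-word counts are kept.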

Finally, a machine learning model is trained on these vectors using the Multinomial Naive Bayes algorithm.
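A minimal end-to-end sketch of the training step, assuming scikit-learn's `MultinomialNB` and a binary label column, might look like this. The label column name `toxic` and the four sample comments are hypothetical and only serve to make the example runnable; the real dataset and its label columns are not described here.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (assumption): 1 = toxic, 0 = not toxic.
df = pd.DataFrame(
    {
        "comment_text": [
            "you are stupid idiot",
            "stupid idiot go away",
            "thank you for the helpful edit",
            "great article thank you",
        ],
        "toxic": [1, 1, 0, 0],
    }
)

# Turn the comments into token-count vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["comment_text"])
y = df["toxic"]

# Fit Multinomial Naive Bayes on the count vectors.
model = MultinomialNB()
model.fit(X, y)

# New comments must be transformed with the already fitted vectorizer.
new_comments = ["what a stupid idiot", "thank you for the great edit"]
pred = model.predict(vectorizer.transform(new_comments))
print(list(pred))  # [1, 0]
```

Note that `transform` (not `fit_transform`) is used on new comments so that they are mapped onto the vocabulary learned during training; words the vectorizer has not seen are ignored.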