Cryball/Swear-words-filter

✅ 🔞 Swear words filter

Flask Python NumPy Pandas Docker

A filter for recognizing swear words and their many variations. When a swear word is found, the whole word is replaced with * characters.

The algorithm is divided into several stages:

  1. The text is checked with regular expressions and the difflib library to find obvious swear words and replace them.
    Words that are not obvious matches go on to the next stages.
  2. If a word from the current list of swear words is in the Word2Vec model, it is checked with the Word2Vec similarity function.
  3. If a word from the current list of swear words is NOT in the Word2Vec model, it is checked with the Levenshtein distance and the token_sort_ratio function from the FuzzyWuzzy library.
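
The first and last stages can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the word list, the thresholds, the character-substitution table, and the helper names (`normalize`, `levenshtein`, `is_swear`, `censor`) are all assumptions made for illustration. The Word2Vec stage is left out, and a hand-written Levenshtein distance stands in for the FuzzyWuzzy metrics.

```python
import re
from difflib import SequenceMatcher

# Hypothetical word list and thresholds, chosen only for illustration.
BAD_WORDS = ["darn", "heck"]
DIFFLIB_THRESHOLD = 0.85
MAX_EDIT_DISTANCE = 1

# Assumed table of character substitutions used to disguise words.
LEET = str.maketrans({"@": "a", "4": "a", "3": "e", "1": "i", "0": "o", "$": "s"})


def normalize(word: str) -> str:
    """Lowercase, undo simple substitutions, collapse repeated letters."""
    word = word.lower().translate(LEET)
    word = re.sub(r"[^a-z]", "", word)      # drop remaining punctuation and digits
    return re.sub(r"(.)\1+", r"\1", word)   # "daaarn" -> "darn"


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_swear(word: str) -> bool:
    cand = normalize(word)
    if not cand:
        return False
    for bad in BAD_WORDS:
        target = normalize(bad)
        # Stage 1: exact or near-exact match after regex normalization.
        if cand == target or SequenceMatcher(None, cand, target).ratio() >= DIFFLIB_THRESHOLD:
            return True
        # Fallback stage: small edit distance; very short words are skipped.
        if len(cand) >= 4 and levenshtein(cand, target) <= MAX_EDIT_DISTANCE:
            return True
    return False


def censor(text: str) -> str:
    """Replace every flagged whitespace-separated token with asterisks."""
    return re.sub(
        r"\S+",
        lambda m: "*" * len(m.group()) if is_swear(m.group()) else m.group(),
        text,
    )


print(censor("what the h3ck"))
```

With these assumed thresholds the fallback stage is deliberately loose, so harmless words one edit away from a listed word would also be masked; a semantic check such as the Word2Vec stage is one way to narrow that down.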

📁 The Word2Vec model was trained on older Kaggle datasets of negative and positive reviews of movies, series, and anime.


📺 Demo:

MVP_filter_swear_words.mp4

📜 Project stack:

Filter Part:

  • Python
  • Gensim
  • Word2Vec NLP model
  • FuzzyWuzzy and other string-similarity metrics
  • Difflib
  • NumPy
  • Pandas
  • RegExp

Web Part:

  • Flask

Deploy:

  • Docker

🚀 Installation

⚠️ You need Docker to start the project easily!

Run the command that builds the Docker image:

docker build -t <your build name> . 

Then start a container from the built image and open the app in your browser at http://localhost:5000/

docker run -p 5000:5000 <your build name>