✅ 🔞 Swear words filter

Flask Python NumPy Pandas Docker

A filter for recognizing swear words and their many variations. When a swear word is found, the whole word is replaced with *.
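As a rough illustration of the replacement step only, here is a minimal sketch that masks whole-word matches with asterisks. The function name `mask_words`, the placeholder word list, and the choice of one `*` per character are assumptions made for this example, not the project's actual code.

```python
import re


def mask_words(text, banned):
    """Replace every whole-word occurrence of a banned word with asterisks.

    Hypothetical helper: assumes one '*' per masked character.
    """
    if not banned:
        return text
    # \b keeps the match on whole words, so "darning" is left alone.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in banned) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


print(mask_words("That was a darn shame.", ["darn"]))
# prints: That was a **** shame.
```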

The algorithm is divided into several stages:

  1. The text is checked with regular expressions and the difflib library to find obvious swear words and replace them.
    Words that are not obvious matches go on to the next stages.
  2. If a word from the current list of swear words is in the word2vec model, it is checked with the word2vec similarity function.
  3. If a word from the current list of swear words is NOT in the word2vec model, the word is checked with the Levenshtein distance and the token_sort_ratio function from the FuzzyWuzzy library.
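The staged idea above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the project's implementation: the word list, the thresholds, and the names `is_swear` and `filter_text` are hypothetical; the word2vec stage is left out because it needs the trained model; and a hand-written Levenshtein distance stands in for the FuzzyWuzzy checks.

```python
import difflib
import re

# Placeholder list for illustration, not the project's real word list.
BANNED = ["darn", "heck"]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def is_swear(word, cutoff=0.8, max_distance=1):
    """Return True if the word looks like a variation of a banned word."""
    w = word.lower()
    # Stage 1: exact hit, or a close match according to difflib.
    if w in BANNED or difflib.get_close_matches(w, BANNED, n=1, cutoff=cutoff):
        return True
    # Fallback (stand-in for stage 3): small edit distance to a banned word.
    return any(levenshtein(w, b) <= max_distance for b in BANNED)


def filter_text(text):
    """Mask every word that is_swear() flags, one '*' per character."""
    def mask(match):
        token = match.group(0)
        return "*" * len(token) if is_swear(token) else token
    return re.sub(r"\w+", mask, text)


print(filter_text("Well d4rn, that is a heckk of a mess"))
# prints: Well ****, that is a ***** of a mess
```

With placeholder words this short, an edit distance of 1 also flags innocent neighbours such as "dark", so the cutoff and distance values would need tuning against a real word list.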

📁 The Word2Vec model was trained on older Kaggle datasets of negative and positive reviews of movies, series, and anime.


📺 Demo:

MVP_filter_swear_words.mp4

📜 Project stack:

Filter Part:

  • Python
  • Gensim
  • Word2Vec NLP model
  • FuzzyWuzzy and other string-similarity metrics
  • Difflib
  • NumPy
  • Pandas
  • RegExp

Web Part:

  • Flask

Deploy:

  • Docker

🚀 Installation

⚠️ You need Docker to start the project easily!

Run this command to build the Docker image:

docker build -t <your build name> . 

Then start a container from the image and open the app in a browser at http://localhost:5000/

docker run -p 5000:5000 <your build name>