A filter for recognizing swear words and their variations. When a swear word is found, the whole word is replaced with * characters.
The algorithm is divided into several stages:
- The text is checked with regular expressions and the difflib library to find obvious swear words and replace them. Less obvious words go on to the next stages.
- If a word from the current list of swear words is in the Word2Vec model, the candidate word is compared against it with the Word2Vec similarity function.
- If a word from the current list of swear words is NOT in the Word2Vec model, the candidate word is checked against it with the Levenshtein distance and the token_sort_ratio function from the FuzzyWuzzy library.
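The stages above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the word list, the thresholds, the letter-collapsing step, and the names `normalize`, `is_swear`, and `censor` are all assumptions. The Word2Vec model is represented by a `similarity` callable plus a `vocabulary` container, and FuzzyWuzzy's token_sort_ratio is left out to avoid third-party imports. The sketch also falls back to the edit distance when the candidate word itself is missing from the vocabulary, which is an assumption on top of the description above.

```python
import re
from difflib import SequenceMatcher

# Hypothetical placeholder list; the project's real word list is not shown here.
SWEAR_WORDS = ["darn", "heck"]


def normalize(word):
    """Lowercase and collapse repeated letters ("daaarn" -> "darn")."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


def ratio(a, b):
    """difflib similarity in the range 0..1."""
    return SequenceMatcher(None, a, b).ratio()


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_swear(word, similarity=None, vocabulary=frozenset()):
    """Return True if the word looks like a variation of a listed swear word."""
    w = normalize(word)
    for swear in SWEAR_WORDS:
        s = normalize(swear)
        # Stage 1: obvious matches via difflib (assumed threshold).
        if ratio(w, s) >= 0.85:
            return True
        if similarity is not None and swear in vocabulary and w in vocabulary:
            # Stage 2: both words are known to the model, so ask it.
            if similarity(w, swear) >= 0.8:  # assumed threshold
                return True
        elif levenshtein(w, s) <= 1:
            # Stage 3: out-of-vocabulary fallback on edit distance.
            return True
    return False


def censor(text, **kwargs):
    """Replace every flagged word with asterisks of the same length."""
    return re.sub(
        r"\w+",
        lambda m: "*" * len(m.group()) if is_swear(m.group(), **kwargs) else m.group(),
        text,
    )


print(censor("Well daaarn, what the h3ck!"))  # Well ******, what the ****!
```

With a trained Gensim 4.x model, one way to plug it in would be `censor(text, similarity=model.wv.similarity, vocabulary=model.wv.key_to_index)`. An edit-distance limit of 1 also flags harmless near neighbours of short words, so the thresholds would need tuning on real data.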
📁 The Word2Vec model was trained on older Kaggle datasets of negative and positive reviews of movies, series, and anime.
Video: MVP_filter_swear_words.mp4
- Python
- Gensim
- Word2Vec NLP model
- FuzzyWuzzy and other string-similarity metrics
- Difflib
- NumPy
- Pandas
- RegExp
- Flask
- Docker
Run the following command to build the Docker image:
docker build -t <your build name> .
Then start a container from that image with the command below and open the app in a browser at http://localhost:5000/
docker run -p 5000:5000 <your build name>