A hack to support my short text classification thesis. This code serves as a quick demo and is NOT maintained!
The goal of this code demo is to predict the gender of social media users from the comments sections of their Instagram profiles, using AdaBoost, XGBoost, Support Vector Machine, and Naive Bayes classifiers combined with grid search and K-fold cross-validation.
How many are males vs females?
We collect the comments on followers' Instagram picture posts and format them as bag-of-words, along with other pre-processing.
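As a rough illustration of the bag-of-words step, here is a minimal sketch using only the standard library. The regex tokenizer, the stop-word list, and the sample comments are assumptions made for this example; the project itself relies on nltk for tokenization and may pre-process differently.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's real list is not shown.
STOP_WORDS = {"the", "a", "is", "so", "and"}

def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())

def bag_of_words(comments):
    """Count how often each non-stop-word appears across a user's comments."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in STOP_WORDS)
    return counts

# Made-up comments standing in for one profile's comment section.
comments = ["So pretty!! Love the dress", "love LOVE this look"]
features = bag_of_words(comments)
print(features.most_common(3))
```

The resulting word counts are the kind of feature vector a classifier can be trained on.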
To label the training data, we use the Azure Face API to keep only pictures that contain exactly one person and for which it is able to detect that person's gender.
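The filtering rule described above can be sketched as a small function over already-parsed detection results. The response shape (a list of faces, each with a `faceAttributes.gender` field) and the picture ids are assumptions for illustration; no actual API call is made here.

```python
def label_from_faces(faces):
    """Return the detected gender if exactly one face with a gender was found, else None.

    `faces` is assumed to be the parsed JSON list from one face-detection call,
    with each face shaped like {"faceAttributes": {"gender": "female"}}.
    """
    if len(faces) != 1:
        return None  # no face or a group picture: ambiguous, so skip it
    gender = faces[0].get("faceAttributes", {}).get("gender")
    return gender if gender in ("male", "female") else None

# Hypothetical detection results keyed by picture id.
detections = {
    "pic_1": [{"faceAttributes": {"gender": "female"}}],
    "pic_2": [{"faceAttributes": {"gender": "male"}},
              {"faceAttributes": {"gender": "female"}}],
    "pic_3": [],
    "pic_4": [{"faceAttributes": {}}],
}

labels = {}
for pic, faces in detections.items():
    gender = label_from_faces(faces)
    if gender is not None:
        labels[pic] = gender
print(labels)
```

Only pictures that pass this filter would contribute a label to the training set.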
The code demo consists of several parts:
- The frontend, built with socketio, flask, html & javascript
- 4 different implementations of the classifier algorithm: xgboost, support vector machine, naive bayes, and adaboost.
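To show how grid search and K-fold validation fit together with one of these classifiers, here is a self-contained sketch in plain Python. It is a stand-in, not the project's code: the project uses sklearn, nltk, and xgboost, whereas this example hand-rolls a tiny multinomial Naive Bayes, and the toy data, the `alpha` grid, and all function names are assumptions.

```python
import math
from collections import Counter
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, test_idx
        start += size

def nb_accuracy(alpha, X_train, y_train, X_test, y_test):
    """Train a multinomial Naive Bayes with smoothing `alpha`; return test accuracy."""
    class_counts = Counter(y_train)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(X_train, y_train):
        word_counts[label].update(tokens)
    vocab_size = len({w for counts in word_counts.values() for w in counts})

    def predict(tokens):
        scores = {}
        for label, n_docs in class_counts.items():
            total = sum(word_counts[label].values())
            log_prob = math.log(n_docs / len(y_train))  # class prior
            for token in tokens:
                log_prob += math.log(
                    (word_counts[label][token] + alpha) / (total + alpha * vocab_size)
                )
            scores[label] = log_prob
        return max(sorted(scores), key=scores.get)

    hits = sum(predict(tokens) == label for tokens, label in zip(X_test, y_test))
    return hits / len(y_test)

def grid_search(param_grid, X, y, k):
    """Return (best_params, best_mean_accuracy) over all parameter combinations."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, test_idx in k_fold_indices(len(X), k):
            fold_scores.append(nb_accuracy(
                params["alpha"],
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in test_idx], [y[i] for i in test_idx],
            ))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy, made-up tokenized comments with alternating labels so every fold sees both classes.
X = [["love", "dress"], ["nice", "car"], ["pretty", "dress"], ["bro", "nice", "car"],
     ["love", "pretty"], ["bro", "car"], ["pretty", "love", "look"], ["nice", "bro"]]
y = ["female", "male"] * 4

best_params, best_score = grid_search({"alpha": [0.1, 1.0, 10.0]}, X, y, k=4)
print(best_params, best_score)
```

Each parameter combination is scored by its mean accuracy across the folds, and the combination with the highest mean wins. With sklearn, the same idea is available through `GridSearchCV` with a `cv` argument.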
Implementing AdaBoost using the sklearn library.
Main entrypoint for the Flask application (this project).
Data dump(s) or saved pickle files (cache, models, etc.).
Screenshots.
Implementing the naive bayes algorithm using the nltk library.
Implementing the Support Vector Machine algorithm using the sklearn library.
Third-party libraries supporting this project.
Implementing the eXtreme gradient boosting algorithm using the xgboost library.
app.py loads the config via the decouple package, keeping things simple.
Run cp .env.example .env and fill in the necessary variables.
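For readers unfamiliar with `.env` files, the sketch below shows roughly what loading one involves, using only the standard library. It is not how decouple is implemented, and the variable names (`DEMO_APP_PORT`, `DEMO_FACE_API_KEY`) are hypothetical; check `.env.example` for the names the project actually expects.

```python
import os

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def get_setting(values, name, default=None, cast=str):
    """Look up a setting: process environment first, then the file, then the default."""
    raw = os.environ.get(name, values.get(name, default))
    return cast(raw) if raw is not None else None

# Hypothetical .env contents; in practice this would be read from the .env file.
env_text = """
# made-up variable names for illustration
DEMO_APP_PORT=9000
DEMO_FACE_API_KEY="replace-me"
"""

settings = parse_env(env_text)
port = get_setting(settings, "DEMO_APP_PORT", default=8000, cast=int)
api_key = get_setting(settings, "DEMO_FACE_API_KEY")
print(port, api_key)
```

Letting real environment variables override the file is what makes the same code work both locally and with `docker run --env-file=.env`.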
docker run --env-file=.env -p 9000:9000 williamchanrico/follower-gender-classification:v0.1.0
Assuming you have virtualenv and python3-pip installed:
virtualenv venv && source venv/bin/activate
pip3 install -r requirements.txt
python -m nltk.downloader punkt
- Optionally, you may need libgomp1, depending on your operating system (required by xgboost).
python3 app.py
$ python --version
Python 3.7.1