View our write-up of the project here: A Case Study on Weakly Supervised Learning.
This project was created for the Full Stack Deep Learning 2021 course. It was chosen as one of the top projects from the course and presented at the project showcase.
- Create a text data labeling service where the user inputs text data and receives a labeled dataset.
- Experiment with weakly supervised learning and compare different approaches.
- 01. Baseline with BERT on DBPedia-14 (GitHub link) - Colab version
- 02. Distilling with Zero-Shot Classification on DBPedia-14 (GitHub link)
- 03. Data Labeling DBPedia-14 with Snorkel (GitHub link) - Colab version
- 04. Multi-Label Classification on Toxic Comments Dataset (GitHub link)
- 05. Toxicity Dataset Classification and Data Labeling with Snorkel (GitHub link) - Colab version
- 06. Model Deployment in Azure Machine Learning Studio (GitHub link)
To use only the Snorkel approach to weak supervision, run the following notebooks in this order: 01, 03, 05, 06.
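The idea behind the Snorkel notebooks can be sketched in plain Python: several noisy labeling functions each vote on an example (or abstain), and the votes are combined into a single weak label. This is only an illustrative sketch, not the code from the notebooks. The class ids, keyword lists, and function names below are hypothetical, and the simple majority vote stands in for Snorkel's learned label model.

```python
from collections import Counter

ABSTAIN = -1
COMPANY, ATHLETE = 0, 1  # hypothetical class ids, for illustration only


def lf_company_keywords(text):
    """Vote COMPANY when a company-like keyword appears (hypothetical rule)."""
    keywords = ("inc.", "corporation", "company")
    return COMPANY if any(k in text.lower() for k in keywords) else ABSTAIN


def lf_athlete_keywords(text):
    """Vote ATHLETE when a sports-like keyword appears (hypothetical rule)."""
    keywords = ("footballer", "olympic", "played for")
    return ATHLETE if any(k in text.lower() for k in keywords) else ABSTAIN


def lf_founded(text):
    """Vote COMPANY when the text mentions a founding date (hypothetical rule)."""
    return COMPANY if "founded in" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_company_keywords, lf_athlete_keywords, lf_founded]


def majority_vote(votes):
    """Combine votes; abstain when nobody votes or the top two classes tie."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    (top, top_n), *rest = counted.most_common(2)
    if rest and rest[0][1] == top_n:
        return ABSTAIN
    return top


def weak_label(texts):
    """Apply every labeling function to every text and combine the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]
```

Examples that end up with `ABSTAIN` would typically be dropped before training a downstream classifier on the weak labels.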
To use only the model distillation approach to weak supervision, run the following notebooks in this order: 02, 04.
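The distillation idea can also be sketched in plain Python: a teacher assigns pseudo-labels to unlabeled text, and a smaller student is trained only on the confident ones. This is a toy sketch under loud assumptions, not the notebook code. The keyword-overlap teacher is a stand-in for a zero-shot classification model, the word-count student is a stand-in for a fine-tuned classifier, and every name and hint word below is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hint words; a real teacher would be a zero-shot model.
LABEL_HINTS = {
    "Company": {"company", "founded", "corporation"},
    "Athlete": {"footballer", "played", "olympic"},
}


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [w.strip(".,").lower() for w in text.split()]


def teacher_predict(text):
    """Stand-in teacher: return (label, confidence) from keyword overlap."""
    tokens = set(tokenize(text))
    scores = {label: len(tokens & hints) for label, hints in LABEL_HINTS.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def distill(unlabeled_texts, min_confidence=0.8):
    """Train a word-count student on confident teacher pseudo-labels."""
    word_counts = defaultdict(Counter)
    for text in unlabeled_texts:
        label, confidence = teacher_predict(text)
        if label is not None and confidence >= min_confidence:
            word_counts[label].update(tokenize(text))
    return word_counts


def student_predict(word_counts, text):
    """Pick the label whose training words overlap most with the text."""
    tokens = tokenize(text)
    scores = {
        label: sum(counts[t] for t in tokens)
        for label, counts in word_counts.items()
    }
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)
```

In this toy setup the student can label text containing words the teacher has no hints for (such as "striker"), because it learned them from the pseudo-labeled examples.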
For more information on how to deploy a Streamlit App of this project, please go to our webapp directory.
.
|-- ./pyproject.toml
|-- ./requirements
|   |-- ./requirements/dev.in
|   |-- ./requirements/dev.txt
|   |-- ./requirements/prod.in
|   `-- ./requirements/prod.txt
|-- ./setup.cfg
|-- ./project_proposal.md
|-- ./tasks
|   `-- ./tasks/lint.sh
|-- ./Dockerfile
|-- ./distill_classifier.py
|-- ./service.py
|-- ./test_request.json
|-- ./train_baseline_dbpedia_model.py
|-- ./tree-md
|-- ./text_classifier
|   |-- ./text_classifier/__init__.py
|   |-- ./text_classifier/models
|   |   `-- ./text_classifier/models/__init__.py
|   |-- ./text_classifier/lit_models
|   |   `-- ./text_classifier/lit_models/__init__.py
|   `-- ./text_classifier/notebooks
|       |-- ./text_classifier/notebooks/01_dbpedia_14_bert_classification_exploration.ipynb
|       |-- ./text_classifier/notebooks/04_transformers-multi-label-classification-toxicity.ipynb
|       |-- ./text_classifier/notebooks/03_dbpedia_14_snorkel_dataset_labeling.ipynb
|       |-- ./text_classifier/notebooks/05_toxicity_classification_snorkel_dataset.ipynb
|       |-- ./text_classifier/notebooks/02_dbmedia_14_distilling_with_zero_shot_classification.ipynb
|       `-- ./text_classifier/notebooks/06_AMLS_model_deployment.ipynb
|-- ./data
|   |-- ./data/toxic_comments
|   |   |-- ./data/toxic_comments/test.csv
|   |   |-- ./data/toxic_comments/toxic_dev_200_examples.csv
|   |   |-- ./data/toxic_comments/toxic_test_630_examples.csv
|   |   |-- ./data/toxic_comments/toxic_train_2100_examples.csv
|   |   |-- ./data/toxic_comments/toxic_val_70_examples.csv
|   |   |-- ./data/toxic_comments/train.csv
|   |   |-- ./data/toxic_comments/toxicity_snorkel_dataset_3014ex.csv
|   |   `-- ./data/toxic_comments/toxicity_test_675ex.csv
|   `-- ./data/readme.md
|-- ./README.md
`-- ./webapp
    |-- ./webapp/Dockerfile
    |-- ./webapp/app.py
    |-- ./webapp/backend.py
    |-- ./webapp/demo_config.json
    |-- ./webapp/requirements.txt
    |-- ./webapp/run_webapp.sh
    |-- ./webapp/utils.py
    `-- ./webapp/README.md
Find our project proposal here.