An English-language grammar classification model built on BERT. The self-supervised pretrained model is a BERT-base model trained on a large corpus of text. Here we explore the possibility of few-shot learning by fine-tuning the pretrained model to classify sentences. The model is trained on the public CoLA dataset, which contains a small set of labelled English sentences from the public domain.
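To illustrate the idea, here is a minimal sketch assuming the Hugging Face `transformers` library; the model name and label convention are assumptions rather than this repository's exact code, and the classification head is randomly initialised until fine-tuned:

```python
# Minimal sketch, assuming the Hugging Face transformers library.
# The model name and label convention are illustrative, not this repo's code.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed: 0 = unacceptable, 1 = acceptable
)

# Score a sentence for grammatical acceptability.
inputs = tokenizer("The boys is playing outside.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
print("acceptable" if prediction == 1 else "unacceptable")
```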
## Performance metrics
|         | MCC Score | Accuracy |
| ------- | --------- | -------- |
| Results | 0.5699    | 0.8299   |
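Both metrics can be computed with scikit-learn; MCC is the standard metric for CoLA because the label distribution is imbalanced. In the sketch below, `y_true` and `y_pred` are placeholders for the validation labels and model predictions:

```python
# Sketch: computing the two reported metrics with scikit-learn.
# y_true / y_pred are placeholders for CoLA validation labels and predictions.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1]  # illustrative gold labels (1 = acceptable)
y_pred = [1, 0, 1, 0, 0, 1]  # illustrative model predictions

print(f"MCC:      {matthews_corrcoef(y_true, y_pred):.4f}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
```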
- All development work is done using Python 3.7.
- Install all necessary dependencies from the `requirements.txt` file: run `pip install -r requirements.txt` in a terminal.
- Alternatively, set up the environment and train the model using the `Dockerfile`: run `docker build -f Dockerfile -t <image_name> .`
- `configs/config.py`: contains all the configurations for the model (see the sketch after this list).
- `src/dataset.py`: utility functions for loading the dataset.
- `src/engine.py`: utilities for training and evaluation.
- `src/train.py`: trains the model.
- `deploy/flask_app.py`: deploys the model.
- `deploy/convert_onnx.py`: converts the model to ONNX format (see the sketch after this list).
- `saves/`: contains the BERT tokenizer configuration, special tokens, etc.
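For a sense of what `configs/config.py` might hold, here is a hypothetical sketch; every name and value below is an assumption for illustration, not the repository's actual configuration:

```python
# Hypothetical sketch of configs/config.py; all names and values here are
# assumptions for illustration, not this repository's actual configuration.
MAX_LEN = 64                    # maximum token length per sentence
TRAIN_BATCH_SIZE = 32
VALID_BATCH_SIZE = 32
EPOCHS = 3
LEARNING_RATE = 2e-5
BERT_MODEL = "bert-base-uncased"
MODEL_PATH = "saves/model.bin"  # where fine-tuned weights would be saved
```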
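Likewise, `deploy/convert_onnx.py` presumably relies on `torch.onnx.export`; the sketch below shows one common way such an export looks, with the dummy input shapes and tensor names as assumptions:

```python
# Sketch of an ONNX export along the lines of deploy/convert_onnx.py;
# dummy shapes, tensor names, and opset are assumptions, not the repo's code.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

# Dummy inputs fix the traced sequence length; batch size stays dynamic below.
seq_len = 64
input_ids = torch.ones(1, seq_len, dtype=torch.long)
attention_mask = torch.ones(1, seq_len, dtype=torch.long)
token_type_ids = torch.zeros(1, seq_len, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask, token_type_ids),
    "model.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch"},
        "attention_mask": {0: "batch"},
        "token_type_ids": {0: "batch"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```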
- Run `python src/train.py` in a terminal (a sketch of the underlying loop follows).
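Under the hood, `src/train.py` and `src/engine.py` drive a standard fine-tuning loop; the sketch below shows the general shape of such a loop, where the optimizer choice and batch layout are assumptions rather than the repository's actual code:

```python
# Sketch of the kind of fine-tuning loop src/engine.py might implement;
# the optimizer choice and batch layout are assumptions, not the repo's code.
import torch
from torch.optim import AdamW

def train_one_epoch(model, data_loader, optimizer, device):
    model.train()
    total_loss = 0.0
    for batch in data_loader:
        # Each batch is assumed to hold input_ids, attention_mask,
        # token_type_ids, and labels as tensors.
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        # With labels supplied, the model returns a cross-entropy loss.
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(data_loader)

# Usage sketch:
# optimizer = AdamW(model.parameters(), lr=2e-5)
# avg_loss = train_one_epoch(model, train_loader, optimizer, torch.device("cpu"))
```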