Deploying a Machine Learning Model on Heroku with FastAPI
This project trains a Random Forest classification model to predict a person's income category from other personal information.
- Created unit tests to monitor the model performance on various slices of the data.
- Then, deployed the model using the FastAPI package and created API tests.
- Both the slice-validation and the API tests were incorporated into a CI/CD framework using GitHub Actions.
- The UCI census dataset was used to practice updating the dataset and model in git and DVC.
The training data is the census data available from the UCI Machine Learning Repository. It is the adult.income data in the data folder.
Link: UCI Census Data
Data versioning is tracked through DVC, using an AWS S3 bucket as remote storage.
A basic Random Forest classifier was imported from the scikit-learn library and fit to the census data.
Model parameters (other than the defaults) are:
{
"random_state": 8,
"max_depth": 16,
"n_estimators": 128
}
Refer to the model card.
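As a rough illustration of the configuration above, the following sketch fits a scikit-learn `RandomForestClassifier` with the listed parameters. The feature matrix and labels are synthetic stand-ins (an assumption made so the snippet is self-contained); the real project fits on the encoded census data, and its preprocessing is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Non-default hyperparameters listed in this README.
PARAMS = {"random_state": 8, "max_depth": 16, "n_estimators": 128}

# Synthetic stand-in for the encoded census features (assumption: the real
# pipeline encodes the categorical columns before fitting).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical binary income label

# Fit the forest with the parameters above; everything else stays at its default.
model = RandomForestClassifier(**PARAMS)
model.fit(X, y)

# Accuracy on the training data, only as a sanity check that fitting worked.
train_accuracy = model.score(X, y)
```

Fixing `random_state` makes repeated training runs reproducible, which helps when model versions are compared over time.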
The model versioning was tracked using DVC.
The performance of the model was also evaluated on slices of the data (code). The results for slices on education and race are stored in slice_output.txt.
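The idea behind slice evaluation can be sketched in plain Python: group the rows by the value of one categorical feature and compute metrics within each group. The function names, the choice of precision, recall, and F1, and the convention of reporting undefined ratios as 0.0 are all assumptions for illustration; the project's own evaluation code may differ.

```python
from collections import defaultdict


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1; undefined ratios are reported as 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def slice_metrics(rows, y_true, y_pred, feature):
    """Compute metrics separately for each distinct value of `feature`."""
    groups = defaultdict(lambda: ([], []))
    for row, t, p in zip(rows, y_true, y_pred):
        truths, preds = groups[row[feature]]
        truths.append(t)
        preds.append(p)
    return {value: precision_recall_f1(ts, ps) for value, (ts, ps) in groups.items()}


# Tiny made-up example: four rows sliced on the "education" feature.
rows = [
    {"education": "Bachelors"},
    {"education": "Bachelors"},
    {"education": "HS-grad"},
    {"education": "HS-grad"},
]
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 0, 1]
results = slice_metrics(rows, y_true, y_pred, "education")
```

Per-slice metrics can reveal groups where the model underperforms even when the overall score looks acceptable.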
To run the model training and evaluation code (link):
python main.py
Continuous integration was incorporated into the project using GitHub Actions. The workflow succeeds only if the pytest tests and the flake8 linter pass on the project without any errors.
Unit tests were written for the model training and the inference API features. The tests are run with the pytest library via the command:
pytest test/ -vv
The inference API was developed with the FastAPI framework, using type-hinted input models with examples from the pydantic library.
The API main file is inference_api.py.
The API was deployed on Heroku from the main branch of the current GitHub repository, with Continuous Delivery enabled.