TL;DR: NLP models often expect quite a specific set of inputs, and Transformers models are no different. In this project, we create a small wrapper class (`huggingface_wrapper.py`) that packages a Hugging Face model in the `mlflow.pyfunc` flavor.
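For reference, here is a minimal sketch of what such a wrapper can look like, assuming DialoGPT-style single-turn generation. The class name, artifact key, and generation parameters are illustrative; the real implementation lives in `huggingface_wrapper.py`.

```python
import mlflow.pyfunc


class HuggingFaceWrapper(mlflow.pyfunc.PythonModel):
    """Packages a Hugging Face causal LM and its tokenizer as mlflow.pyfunc."""

    def load_context(self, context):
        # Import inside the method so the pickled wrapper stays lightweight.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_dir = context.artifacts["hf_model"]  # illustrative artifact key
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForCausalLM.from_pretrained(model_dir)

    def predict(self, context, model_input):
        # Expects a pandas DataFrame with a "text" column of user utterances.
        responses = []
        for text in model_input["text"]:
            input_ids = self.tokenizer.encode(
                text + self.tokenizer.eos_token, return_tensors="pt"
            )
            output_ids = self.model.generate(
                input_ids,
                max_length=200,
                pad_token_id=self.tokenizer.eos_token_id,
            )
            # Decode only the newly generated tokens, i.e. the bot's reply.
            responses.append(
                self.tokenizer.decode(
                    output_ids[0, input_ids.shape[-1]:],
                    skip_special_tokens=True,
                )
            )
        return responses
```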
In the inference notebook (`huggingface_inference.py`), we have an example where we:
- Download the pretrained model and tokenizer from Hugging Face
- Save the model into MLflow Model Registry
- Promote the model to Production
- Load the model
- Get some predictions (concretely, we're having a chat with the GPT Bot 😄)
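Condensed, those steps look roughly as follows. This is a sketch building on the wrapper above; the registered model name `dialogpt-chatbot` is illustrative.

```python
import mlflow
import pandas as pd
from mlflow.tracking import MlflowClient
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "dialogpt-chatbot"  # illustrative registry name

# 1. Download the pretrained model and tokenizer from Hugging Face.
AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium").save_pretrained("hf_model")
AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium").save_pretrained("hf_model")

# 2. Save the model into the MLflow Model Registry.
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=HuggingFaceWrapper(),
        artifacts={"hf_model": "hf_model"},
        registered_model_name=MODEL_NAME,
    )

# 3. Promote the newly registered version to Production.
client = MlflowClient()
version = client.get_latest_versions(MODEL_NAME, stages=["None"])[0].version
client.transition_model_version_stage(MODEL_NAME, version, stage="Production")

# 4. Load the Production model and 5. have a chat with it.
bot = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/Production")
print(bot.predict(pd.DataFrame({"text": ["Does money buy happiness?"]})))
```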
This repo contains a basic example of a Hugging Face model for language modeling using Transformers; specifically, `microsoft/DialoGPT-medium`.
From their Hugging Face repo:

> DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread.
Multi-turn generation examples from an interactive environment:
| Role | Response |
|------|----------|
| User | Does money buy happiness? |
| Bot | Depends how much money you spend on it . |
| User | What is the best way to buy happiness ? |
| Bot | You just have to be a millionaire by your early 20s, then you can be happy . |
| User | This is so difficult ! |
| Bot | You have no idea how hard it is to be a millionaire and happy . There is a reason the rich have a lot of money |
The project is built with:

- Databricks + MLflow
- FastAPI
- NGINX
- ReactJS
- Clone this repo into your Databricks workspace.
- Make sure to use a cluster running an LTS ML runtime.
- In the `databricks` folder, run the `hf_ingest_nb.py` notebook. This will generate the dataset for fine-tuning our `DialoGPT` model.
- Run the `hf_finetune_nb.py` notebook. This will fine-tune the model with the dataset generated in the previous step.
- Run the `hf_register_and_inference_nb.py` notebook. Doing so will register the model into the MLflow Model Registry and generate some predictions.
- Once the model is registered, use it to create a REST Realtime Endpoint (Model Serving V2), which can be queried as sketched below.
- Be sure to have Docker installed.
- Clone this repo.
- Build the `backend` container image by running `make backend`.
- Build the `frontend` container image by running `make frontend`.
- Create a Databricks PAT token on your workspace.
- Copy the `.env.example` file into `.env` and fill in the parameters (see the sketch after this list). Use the info from your workspace and the Model Serving V2 endpoint created in the previous section.
- Run both containers by executing `make run`.
- In your browser, go to http://127.0.0.1:8080.
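For reference, the filled-in `.env` ends up looking roughly like this. The variable names are illustrative; `.env.example` is the source of truth.

```
DATABRICKS_HOST=https://<your-workspace>.cloud.databricks.com
DATABRICKS_TOKEN=<your PAT token>
SERVING_ENDPOINT_NAME=<your Model Serving V2 endpoint>
```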
See the issues section.
- ReactJS app theme by Ritesh Sharma
- Fine-Tuning GPT-based Models for Conversational Chatbots