This is a project we developed for the course "Efficient Methods in Machine Learning" at the University of Hamburg. In this project, we trained a small language model from scratch on a local machine. We experimented with training on different data, and evaluated and compared the results using BLEU, BERTScore, GLEU and perplexity.
Our model is nanoGPT. We experimented with different positional embeddings (RoPE, relative positional embeddings, absolute positional embeddings).
Please check our model readme for the code and detailed information.
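
To give a rough idea of what the RoPE variant does, here is a minimal, dependency-free sketch. It is not the code from our model readme: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for illustration. It rotates each pair of embedding dimensions by an angle proportional to the token position, so that the dot product between a rotated query and a rotated key depends only on their relative distance.

```python
import math

def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector (illustrative sketch).

    Consecutive pairs (vec[i], vec[i+1]) are rotated by the angle
    position * base ** (-i / d), where d is the embedding dimension.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors, for illustration only.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both pairs of positions are 2 apart, so the scores should match.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

In an attention layer the same rotation would be applied to every query and key before computing attention scores; absolute positional embeddings, by contrast, are added to the token embeddings once at the input.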
We use the Empathetic Dialogues (Facebook AI) 25k dataset and augment it with data generated by ChatGPT-4o-mini.
Please check our dataset readme for the code and detailed information.
Data | Description | Trained Model |
---|---|---|
59k_eachconv_eot | Under no_additional_tag folder. Facebook dataset with endOfText inserted after every 2 sentences. | 3 modified models, single_conversation |
59k_wholeconv_eot | Under no_additional_tag folder. Facebook dataset with endOfText inserted at the end of the whole conversation. | whole_conversation |
59k_eachconv_eot_with_context | Under context_tag folder. Facebook dataset with endOfText inserted after every 2 sentences, including context. | single_conversation_withcontext |
59k_eachconv_eot_with_emotion | Under emotion_file folder. Facebook dataset with endOfText inserted after every 2 sentences, including emotion. | single_conversation_withemotion |
with_gpt_data | Under with_gpt_data folder. For each question in 59k_eachconv_eot, we generated an answer with ChatGPT-4o-mini, giving 118k conversation pairs in total. | single_conversation_withGPTdata_bs256, single_conversation_withGPTdata |
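
The difference between the "each conversation" and "whole conversation" layouts can be sketched as follows. This is only an illustration, not our preprocessing code (see the dataset readme for that): the literal marker `<|endoftext|>`, the newline separators, the helper names, and the sample utterances are all assumptions.

```python
EOT = "<|endoftext|>"  # assumed end-of-text marker (GPT-2 style)

def format_each_pair(utterances):
    """Insert the end-of-text marker after every 2 utterances."""
    chunks = []
    for i in range(0, len(utterances), 2):
        pair = utterances[i:i + 2]
        chunks.append("\n".join(pair) + "\n" + EOT)
    return "\n".join(chunks)

def format_whole_conversation(utterances):
    """Insert the end-of-text marker once, at the end of the conversation."""
    return "\n".join(utterances) + "\n" + EOT

# Made-up sample conversation, for illustration only.
conversation = [
    "I failed my exam today.",
    "I'm sorry to hear that. Do you want to talk about it?",
    "I studied for weeks.",
    "That sounds really frustrating.",
]

each_pair_text = format_each_pair(conversation)
whole_text = format_whole_conversation(conversation)
```

With the first layout the model sees each question and answer pair as an independent sample; with the second it sees the full dialogue as one sample.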
We mainly used VS Code as our IDE during development.
python -m venv env
source env/bin/activate
pip install -r requirements.txt
export PYTHONPATH=/Users/Project-ML/src
(Replace the path above with the absolute path to the src folder on your machine. If you get an error like `No module named 'nanoGPT'`, repeat this step.)
The trained model can be queried here in the Hugging Face Space.
You can also run it locally by first getting a Hugging Face token and then running:
export HF_TOKEN="HF_XXXXXXXXXXXXX"
cd src/app
gradio App.py
Check the section in our model readme.
Check the section in our model evaluation.
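
As a reminder of what the perplexity metric measures, here is a minimal sketch. It assumes you already have the natural-log probability the model assigned to each target token; the function name `perplexity` and the sample values are illustrative and not taken from our evaluation code.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that spreads probability uniformly over 4 choices at every step
# has a perplexity of 4: it is as uncertain as a fair 4-way guess.
uniform_log_probs = [math.log(1 / 4)] * 10
uniform_ppl = perplexity(uniform_log_probs)
```

Lower values mean the model assigns higher probability to the reference text. In a nanoGPT-style training loop, the same quantity can be obtained by exponentiating the mean cross-entropy loss on the evaluation set.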