This project aims to use machine learning and natural language processing techniques to automatically generate high-quality Git commit messages based on code changes (git diff).
# clone project
git clone https://github.com/ZHCSJ666/ML-24-25
cd ML-24-25
# create conda environment and install dependencies
conda env create --name cmg -f environment.yaml
# activate conda environment
conda activate cmg
If imports fail for some reason, you can install the project as an editable package.
# run command from project root directory
pip install -e . --config-settings editable_mode=compat
Training examples using the flan-t5-small experiment configuration:
# train with flan-t5 model on Commit Chronicle dataset
python src/train.py experiment=flan-t5-small logger=tensorboard
# (debug) overfit on subset of training data
python src/train.py experiment=flan-t5-small logger=tensorboard +trainer.overfit_batches=3 trainer.max_epochs=50
# (debug) fast dev run
python src/train.py experiment=flan-t5-small logger=tensorboard +trainer.fast_dev_run=True trainer=cpu
You can override any parameter from the command line, for example:
python src/train.py trainer.max_epochs=20 data.batch_size=64
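To make the override syntax concrete, here is a minimal sketch of how a dotted `key=value` argument could map onto a nested configuration. It is an illustration only, not the project's or any library's actual implementation. The function names `apply_override` and `parse_value` and the starting config values are hypothetical. The sketch also assumes that a leading `+` (as in `+trainer.overfit_batches=3` above) means "add a key that is not in the config yet", which is how Hydra-style overrides behave; the source does not name the configuration library, so treat that as an assumption.

```python
from copy import deepcopy


def parse_value(raw: str):
    """Turn the text after '=' into a bool, int, or float when possible."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # fall back to a plain string


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    key, raw_value = override.split("=", 1)
    # Assumption: a leading "+" allows adding a key that does not exist yet.
    add_new = key.startswith("+")
    path = key.lstrip("+").split(".")

    result = deepcopy(config)  # leave the caller's config untouched
    node = result
    for part in path[:-1]:
        if part not in node:
            if not add_new:
                raise KeyError(f"'{key}' is not in the config")
            node[part] = {}
        node = node[part]

    leaf = path[-1]
    if leaf not in node and not add_new:
        raise KeyError(f"'{key}' is not in the config; prefix it with '+' to add it")
    node[leaf] = parse_value(raw_value)
    return result


if __name__ == "__main__":
    # Hypothetical defaults, chosen only for this example.
    config = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
    for arg in ("trainer.max_epochs=20", "data.batch_size=64", "+trainer.overfit_batches=3"):
        config = apply_override(config, arg)
    print(config)
```

Each segment before the last dot selects a config group (`trainer`, `data`), and the final segment names the parameter to set. Under the assumption above, an override without `+` fails for unknown keys, which helps catch typos in parameter names.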