From e10b3d663890553ae320572e3cff34674a958f50 Mon Sep 17 00:00:00 2001
From: Cindy Qi Li
Date: Wed, 10 Apr 2024 13:51:34 -0400
Subject: [PATCH] fix: address review comments

---
 LICENSE                                       |  2 +-
 docs/Llama2FineTuning.md                      | 46 +++++++++----------
 jobs/Llama2/finetune/eval_7b_hf.py            |  8 ++++
 .../finetune/eval_generated_sentence.py       |  8 ++++
 jobs/Llama2/finetune/finetune_7b_hf.py        |  8 ++++
 jobs/Llama2/finetune/job_eval_7b_hf.sh        |  9 ++++
 .../finetune/job_eval_generated_sentence.sh   |  9 ++++
 jobs/Llama2/finetune/job_finetune_7b_hf.sh    |  9 ++++
 .../original_use/job_original_use_7b_hf.sh    |  9 ++++
 .../Llama2/original_use/original_use_7b_hf.py |  8 ++++
 .../ctb-styleGAN2AdaPytorchGenerateBatch.sh   |  4 +-
 .../def-styleGAN2AdaPytorchGenerateBatch.sh   |  4 +-
 .../def-styleGAN2AdaPytorchTrainBatch.sh      |  3 +-
 .../def-styleGan2AdaPytorchDataSetupBatch.sh  |  8 +--
 utils/scale_down_images.py                    |  2 +-
 15 files changed, 97 insertions(+), 40 deletions(-)

diff --git a/LICENSE b/LICENSE
index a1a39d5..3e33e83 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 BSD 3-Clause License
 
-Copyright (c) 2023, Inclusive Design Institute
+Copyright (c) 2023-2024, Inclusive Design Institute
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
diff --git a/docs/Llama2FineTuning.md b/docs/Llama2FineTuning.md
index b1bd6d0..c6bc589 100644
--- a/docs/Llama2FineTuning.md
+++ b/docs/Llama2FineTuning.md
@@ -27,7 +27,8 @@ to request the access to its Llama2 model;
 mkdir llama2
 cd llama2
 
-// Load git-lfs first for downloading via Git large file storage
+# Load git-lfs first for downloading via Git large file storage
+module load StdEnv/2020
 module load git-lfs/3.3.0
 git lfs install
 
@@ -43,7 +44,7 @@ for setting up the Llama2 models into a new file named `requirements-llama2.txt`
 
 ## Use the Original Llama2 Model
 
-In the [`jobs/original_use`](../jobs/Llama2/original_use) directory, there are two scripts:
+In the [`jobs/Llama2/original_use`](../jobs/Llama2/original_use) directory, there are two scripts:
 
 * original_use_7b_hf.py: The script that loads the downloaded model and tokenizer to perform text generation,
 word predictions and making inferences
@@ -52,8 +53,8 @@ word predictions and making inferences
 Note that the job script must be copied to the user's `scratch` directory and is submitted from there using the
 `sbatch` command.
 
-FTP scripts above to the cedar cluster in the users `llama2/original_use` directory. Run the following command to
-submit the job.
+Use FTP to transfer the above scripts to the cedar cluster in the users `llama2/original_use` directory. Run
+the following command to submit the job.
 
 ```
 cp llama2/original_use/job_original_use_7b_hf.sh scratch/.
@@ -65,14 +66,14 @@ The result is written to the `llama2/original_use/result.txt`.
 
 ## Fine-tune the Llama2 Model
 
-In the [`jobs/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
+In the [`jobs/Llama2/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
 
 * bliss.json: The dataset that converts English text to the structure in the Conceptual Bliss
 * finetune_7b_hf.py: The script that fine-tunes the downloaded model
 * job_finetune_7b_hf.sh: The job script submitted to Cedar to run `finetune_7b_hf.py`
 
-FTP scripts above to the cedar cluster in the users `llama2/finetune` directory. Run the following command to
-submit the job.
+Use FTP to transfer the above scripts to the cedar cluster in the users `llama2/finetune` directory. Run
+the following command to submit the job.
 
 ```
 cp llama2/finetune/job_finetune_7b_hf.sh scratch/.
@@ -80,13 +81,12 @@ cd scratch
 sbatch job_finetune_7b_hf.sh
 ```
 
-The fine-tuning script does:
+The fine-tuning script:
 
-1. Create an instruction dataset using `bliss.json`. This dataset contains bi-directional conversion between
+1. Creates an instruction dataset using `bliss.json`. This dataset contains bi-directional conversion between
 English and Conceptual Bliss.
-2. Use the dataset to fine-tune the Llama2 model. See `finetune_7b_hf.py` about the fine-tuning parameters.
-3. Evaluate the fine-tuned model by giving instructions fine-tuned for, along with a few sentences for language
-conversion.
+2. Uses the dataset to fine-tune the Llama2 model. See `finetune_7b_hf.py` about the fine-tuning parameters.
+3. Evaluates the fine-tuned model by testing a few sentence conversions between the English and the Bliss languages.
 
 Please note that due to the relatively small size of the dataset derived from bliss.json, the fine-tuning script
 was run four times, adjusting the epoch number in the script from 1 to 4. As a result, 4 models were generated
@@ -96,7 +96,7 @@ corresponding to the different epoch counts.
 
 This section describes how to evaluate a fine-tuned model with instructions and input sentences.
 
-In the [`jobs/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
+In the [`jobs/Llama2/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
 
 * eval_7b_hf.py: The script that fine-tunes the downloaded model. Common variables to adjust:
 * `model_dir`: The location of the model directory
@@ -104,8 +104,8 @@ In the [`jobs/finetune`](../jobs/Llama2/finetune) directory, there are these scr
 * `input`: At the bottom of the script, define the sentence to be converted
 * job_eval_7b_hf.sh: The job script submitted to Cedar to run `eval_7b_hf.py`
 
-FTP scripts above to the cedar cluster in the users `llama2/finetune` directory. Run the following command to
-submit the job.
+Use FTP to transfer the above scripts to the cedar cluster in the users `llama2/finetune` directory. Run
+the following command to submit the job.
 
 ```
 cp llama2/finetune/job_eval_7b_hf.sh scratch/.
@@ -115,14 +115,14 @@ sbatch job_eval_7b_hf.sh
 
 ## Evaluate the Generated Sentences from the Fine-tuned Model
 
-This section describes how to evaluat the generated sentences and compare them with original or expected sentences.
+This section describes how to evaluate the generated sentences and compare them with original or expected sentences.
 It evaluates the generated sentence in these aspects:
 
 * Semantic Coherence
 * Novelty and Creativity
 * Fluency and Readability
 
-In the [`jobs/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
+In the [`jobs/Llama2/finetune`](../jobs/Llama2/finetune) directory, there are these scripts:
 
 * eval_generated_sentence.py: The script that fine-tunes the downloaded model. Common variables to adjust:
 * `sentence_orig`: The original sentence
@@ -130,8 +130,8 @@ In the [`jobs/finetune`](../jobs/Llama2/finetune) directory, there are these scr
 * `sentence_generated`: The sentence generated by the fine-tuned model
 * job_eval_generated_sentence.sh: The job script submitted to Cedar to run `eval_generated_sentence.py`
 
-FTP scripts above to the cedar cluster in the users `llama2/finetune` directory. Run the following command to
-submit the job.
+Use FTP to transfer the above scripts to the cedar cluster in the users `llama2/finetune` directory. Run
+the following command to submit the job.
 
 ```
 cp llama2/finetune/job_eval_generated_sentence.sh scratch/.
@@ -146,7 +146,7 @@ and Conceptual Bliss sentence structure, especially with the two-epochs and thre
 
 ## References
 
-[Llama2 in the Facebook Research Github repository](https://github.com/facebookresearch/llama)
-[Llama2 fine-tune, inference examples](https://github.com/facebookresearch/llama-recipes)
-[Llama2 on Hugging Face](https://huggingface.co/docs/transformers/model_doc/llama2)
-[Use Hugging Face Models on Cedar Clusters](https://docs.alliancecan.ca/wiki/Huggingface)
+* [Llama2 in the Facebook Research Github repository](https://github.com/facebookresearch/llama)
+* [Llama2 fine-tune, inference examples](https://github.com/facebookresearch/llama-recipes)
+* [Llama2 on Hugging Face](https://huggingface.co/docs/transformers/model_doc/llama2)
+* [Use Hugging Face Models on Cedar Clusters](https://docs.alliancecan.ca/wiki/Huggingface)
diff --git a/jobs/Llama2/finetune/eval_7b_hf.py b/jobs/Llama2/finetune/eval_7b_hf.py
index 64ecd88..d9f68c2 100644
--- a/jobs/Llama2/finetune/eval_7b_hf.py
+++ b/jobs/Llama2/finetune/eval_7b_hf.py
@@ -1,3 +1,11 @@
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 import torch
 from peft import AutoPeftModelForCausalLM
 from transformers import AutoTokenizer
diff --git a/jobs/Llama2/finetune/eval_generated_sentence.py b/jobs/Llama2/finetune/eval_generated_sentence.py
index b565dff..81bcada 100644
--- a/jobs/Llama2/finetune/eval_generated_sentence.py
+++ b/jobs/Llama2/finetune/eval_generated_sentence.py
@@ -1,3 +1,11 @@
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 import spacy
 from sentence_transformers import SentenceTransformer, util
 from sklearn.feature_extraction.text import TfidfVectorizer
diff --git a/jobs/Llama2/finetune/finetune_7b_hf.py b/jobs/Llama2/finetune/finetune_7b_hf.py
index 620c9a3..36b1ba5 100644
--- a/jobs/Llama2/finetune/finetune_7b_hf.py
+++ b/jobs/Llama2/finetune/finetune_7b_hf.py
@@ -1,3 +1,11 @@
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 import torch
 from datasets import load_dataset, concatenate_datasets
 from transformers import (
diff --git a/jobs/Llama2/finetune/job_eval_7b_hf.sh b/jobs/Llama2/finetune/job_eval_7b_hf.sh
index 069fb67..b3fdfd1 100644
--- a/jobs/Llama2/finetune/job_eval_7b_hf.sh
+++ b/jobs/Llama2/finetune/job_eval_7b_hf.sh
@@ -1,4 +1,13 @@
 #!/bin/bash
+
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 #SBATCH --job-name=llama2-finetune-7b-hf
 #SBATCH --time 2-00:00
 #SBATCH --nodes=1
diff --git a/jobs/Llama2/finetune/job_eval_generated_sentence.sh b/jobs/Llama2/finetune/job_eval_generated_sentence.sh
index 3313090..5a7c078 100644
--- a/jobs/Llama2/finetune/job_eval_generated_sentence.sh
+++ b/jobs/Llama2/finetune/job_eval_generated_sentence.sh
@@ -1,4 +1,13 @@
 #!/bin/bash
+
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 #SBATCH --job-name=llama2-finetune-7b-hf
 #SBATCH --time 2-00:00
 #SBATCH --nodes=1
diff --git a/jobs/Llama2/finetune/job_finetune_7b_hf.sh b/jobs/Llama2/finetune/job_finetune_7b_hf.sh
index 549f796..5a889a0 100644
--- a/jobs/Llama2/finetune/job_finetune_7b_hf.sh
+++ b/jobs/Llama2/finetune/job_finetune_7b_hf.sh
@@ -1,4 +1,13 @@
 #!/bin/bash
+
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 #SBATCH --job-name=llama2-finetune-7b-hf
 #SBATCH --time 2-00:00
 #SBATCH --nodes=1
diff --git a/jobs/Llama2/original_use/job_original_use_7b_hf.sh b/jobs/Llama2/original_use/job_original_use_7b_hf.sh
index 0c459bf..3f40041 100644
--- a/jobs/Llama2/original_use/job_original_use_7b_hf.sh
+++ b/jobs/Llama2/original_use/job_original_use_7b_hf.sh
@@ -1,4 +1,13 @@
 #!/bin/bash
+
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 #SBATCH --job-name=llama2-orig-use-7b-hf
 #SBATCH --time 10-00:00
 #SBATCH --nodes=1
diff --git a/jobs/Llama2/original_use/original_use_7b_hf.py b/jobs/Llama2/original_use/original_use_7b_hf.py
index 0ad960d..7b11b9e 100644
--- a/jobs/Llama2/original_use/original_use_7b_hf.py
+++ b/jobs/Llama2/original_use/original_use_7b_hf.py
@@ -1,3 +1,11 @@
+# Copyright (c) 2023-2024, Inclusive Design Institute
+#
+# Licensed under the BSD 3-Clause License. You may not use this file except
+# in compliance with this License.
+#
+# You may obtain a copy of the BSD 3-Clause License at
+# https://github.com/inclusive-design/baby-bliss-bot/blob/main/LICENSE
+
 from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
 
 model_dir = "/home/cindyli/llama2/Llama-2-7b-hf"
diff --git a/jobs/stylegan2-ada/ctb-styleGAN2AdaPytorchGenerateBatch.sh b/jobs/stylegan2-ada/ctb-styleGAN2AdaPytorchGenerateBatch.sh
index 8238982..072335e 100755
--- a/jobs/stylegan2-ada/ctb-styleGAN2AdaPytorchGenerateBatch.sh
+++ b/jobs/stylegan2-ada/ctb-styleGAN2AdaPytorchGenerateBatch.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-# Copyright (c) 2023, Inclusive Design Institute
+# Copyright (c) 2023-2024, Inclusive Design Institute
 #
 # Licensed under the BSD 3-Clause License. You may not use this file except
 # in compliance with this License.
@@ -39,5 +39,3 @@ mkdir -p "$OUTPUT_DIR"
 
 # Generate...
 python ~/BlissStyleGAN/StyleGAN2/stylegan2-ada-pytorch/generate.py --outdir="$OUTPUT_DIR" --trunc=0.5 --seeds=200,330,400 --network="$MODEL_FILE"
-
-
diff --git a/jobs/stylegan2-ada/def-styleGAN2AdaPytorchGenerateBatch.sh b/jobs/stylegan2-ada/def-styleGAN2AdaPytorchGenerateBatch.sh
index d3ea406..ac2618b 100755
--- a/jobs/stylegan2-ada/def-styleGAN2AdaPytorchGenerateBatch.sh
+++ b/jobs/stylegan2-ada/def-styleGAN2AdaPytorchGenerateBatch.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-# Copyright (c) 2023, Inclusive Design Institute
+# Copyright (c) 2023-2024, Inclusive Design Institute
 #
 # Licensed under the BSD 3-Clause License. You may not use this file except
 # in compliance with this License.
@@ -41,5 +41,3 @@ mkdir -p "$OUTPUT_DIR"
 #
 # This third command is resuming for another 12 hours, using latest model
 python ~/BlissStyleGAN/StyleGAN2/stylegan2-ada-pytorch/generate.py --outdir="$OUTPUT_DIR" --trunc=0.5 --seeds=600-605 --network="$MODEL_FILE"
-
-
diff --git a/jobs/stylegan2-ada/def-styleGAN2AdaPytorchTrainBatch.sh b/jobs/stylegan2-ada/def-styleGAN2AdaPytorchTrainBatch.sh
index 51f952e..c40907f 100755
--- a/jobs/stylegan2-ada/def-styleGAN2AdaPytorchTrainBatch.sh
+++ b/jobs/stylegan2-ada/def-styleGAN2AdaPytorchTrainBatch.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-# Copyright (c) 2023, Inclusive Design Institute
+# Copyright (c) 2023-2024, Inclusive Design Institute
 #
 # Licensed under the BSD 3-Clause License. You may not use this file except
 # in compliance with this License.
@@ -58,4 +58,3 @@ python ~/BlissStyleGAN/StyleGAN2/stylegan2-ada-pytorch/train.py --outdir="$OUTPU
 # This third command is resuming for another 15 hours, using latest model.
 # Again, the actual values here may differ for different groups of runs.
 # python ~/BlissStyleGAN/StyleGAN2/stylegan2-ada-pytorch/train.py --outdir="$OUTPUT_DIR" --data="$DATA_DIR" --snap=10 --resume="$OUTPUT_DIR/00001-preppedBlissSingleCharGrey-auto1-resumecustom/network-snapshot-000440.pkl"
-
diff --git a/jobs/stylegan2-ada/def-styleGan2AdaPytorchDataSetupBatch.sh b/jobs/stylegan2-ada/def-styleGan2AdaPytorchDataSetupBatch.sh
index 4e2a221..515009f 100755
--- a/jobs/stylegan2-ada/def-styleGan2AdaPytorchDataSetupBatch.sh
+++ b/jobs/stylegan2-ada/def-styleGan2AdaPytorchDataSetupBatch.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-# Copyright (c) 2023, Inclusive Design Institute
+# Copyright (c) 2023-2024, Inclusive Design Institute
 #
 # Licensed under the BSD 3-Clause License. You may not use this file except
 # in compliance with this License.
@@ -59,9 +59,3 @@ else
 echo "dataset_tool.py failed with exit status $STATUS"
 fi
 echo Done!
-
-
-
-
-
-
diff --git a/utils/scale_down_images.py b/utils/scale_down_images.py
index ca53e0b..7fd49a0 100644
--- a/utils/scale_down_images.py
+++ b/utils/scale_down_images.py
@@ -3,7 +3,7 @@ from PIL import Image
 
 
 """
-Copyright (c) 2023, Inclusive Design Institute
+Copyright (c) 2023-2024, Inclusive Design Institute
 
 Licensed under the BSD 3-Clause License. You may not use this file except
 in compliance with this License.