From 4b9fab0909b7cfc5310ef0188fdb944ef20f800a Mon Sep 17 00:00:00 2001 From: ZhengHongming888 Date: Tue, 24 Sep 2024 09:04:47 -0700 Subject: [PATCH 01/11] add readme for sentence transformer examples --- .../sentence-transformers-training/README.md | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 examples/sentence-transformers-training/README.md diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md new file mode 100644 index 000000000..1da0261ef --- /dev/null +++ b/examples/sentence-transformers-training/README.md @@ -0,0 +1,23 @@ +# Examples for Sentence Transformer + +We provide 3 examples to show how to use the sentence transformer with HPU devices. + +- **[training_stsbenchmark.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/sts)** - This example shows how to create a SentenceTransformer model from scratch by using a pre-trained transformer model (e.g. [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)) together with a pooling layer. + +- **[training_nli.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/nli)** - This example gives two sentences (premise and hypothesis) and the task of Natural Language Inference (NLI) is to decide if the premise entails the hypothesis, if they are contradiction, or if they are neutral. Commonly the NLI dataset in [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) are used. + +- **[training_paraphrases.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/paraphrases)** - This example loads various datasets from the sentence transformers. We construct batches by sampling examples from the respective dataset. + +### Tested Examples/Models and Configurations + +The following table contains examples supported and configurations we have validated on Gaudi2. + +| Examples | General | Mistral 7B | BF16 | Single Card | Multi-Cards | +|-----------------------------|-----------|------------|------|-------------|-------------| +| training_nli.py | ✔ | ✔ | ✔ | ✔ | ✔ | +| training_stsbenchmark.py | ✔ | ✔ | ✔ | ✔ | ✔ | +| training_paraphrases.py | ✔ | | | ✔ | | + +Notice: +1. in the table the column 'General' means the general models like small models +2. when Mistral 7b Model is enabled for the test single card will use the LoRA + gradient_checkpoint and multi-card will use the deepspeed zero2/zero3 stage to reduce the memory requirement. From 531aa14767b37673a00bcc1ca51281305fe5da6c Mon Sep 17 00:00:00 2001 From: ZhengHongming888 Date: Tue, 24 Sep 2024 09:10:04 -0700 Subject: [PATCH 02/11] minor for readme --- examples/sentence-transformers-training/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md index 1da0261ef..75e654f11 100644 --- a/examples/sentence-transformers-training/README.md +++ b/examples/sentence-transformers-training/README.md @@ -21,3 +21,4 @@ The following table contains examples supported and configurations we have valid Notice: 1. in the table the column 'General' means the general models like small models 2. when Mistral 7b Model is enabled for the test single card will use the LoRA + gradient_checkpoint and multi-card will use the deepspeed zero2/zero3 stage to reduce the memory requirement. +3. 
About the detailed command on how to run each example you can read each readme under each example folder.

From a6b1f68300c364dfee10d6a589e5f86e1a855d85 Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:42:35 -0700
Subject: [PATCH 03/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index 75e654f11..c4889abfe 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -4,7 +4,7 @@ We provide 3 examples to show how to use the sentence transformer with HPU devic
 
 - **[training_stsbenchmark.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/sts)** - This example shows how to create a SentenceTransformer model from scratch by using a pre-trained transformer model (e.g. [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)) together with a pooling layer.
 
-- **[training_nli.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/nli)** - This example gives two sentences (premise and hypothesis) and the task of Natural Language Inference (NLI) is to decide if the premise entails the hypothesis, if they are contradiction, or if they are neutral. Commonly the NLI dataset in [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) are used.
+- **[training_nli.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/nli)** - This example provides two sentences (a premise and a hypothesis), and the task of Natural Language Inference (NLI) is to determine whether the premise entails the hypothesis, contradicts it, or if they are neutral. Commonly, the [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) datasets are used.
 
 - **[training_paraphrases.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/paraphrases)** - This example loads various datasets from the sentence transformers. We construct batches by sampling examples from the respective dataset.

From 4aeffc5166523a3e56be703c3e3145ddcca9296b Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:42:43 -0700
Subject: [PATCH 04/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index c4889abfe..48dcf5879 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -1,6 +1,6 @@
 # Examples for Sentence Transformer
 
-We provide 3 examples to show how to use the sentence transformer with HPU devices.
+We provide 3 examples to show how to use the Sentence Transformers with HPU devices.
 
- **[training_stsbenchmark.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/sts)** - This example shows how to create a SentenceTransformer model from scratch by using a pre-trained transformer model (e.g. [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)) together with a pooling layer.

From e9e7165fd68aef7bf1461198d86c59f4e770e32f Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:42:56 -0700
Subject: [PATCH 05/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index 48dcf5879..d84ebc882 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -6,7 +6,7 @@ We provide 3 examples to show how to use the Sentence Transformers with HPU devi
 
 - **[training_nli.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/nli)** - This example provides two sentences (a premise and a hypothesis), and the task of Natural Language Inference (NLI) is to determine whether the premise entails the hypothesis, contradicts it, or if they are neutral. Commonly, the [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) datasets are used.
 
-- **[training_paraphrases.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/paraphrases)** - This example loads various datasets from the sentence transformers. We construct batches by sampling examples from the respective dataset.
+- **[training_paraphrases.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/paraphrases)** - This example loads various datasets from the Sentence Transformers library. We construct batches by sampling examples from the respective dataset.
 
 ### Tested Examples/Models and Configurations

From 9273def84a39da09dcfca13a1d0626fb9fe1cc2c Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:43:07 -0700
Subject: [PATCH 06/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index d84ebc882..f8db9c8b5 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -12,7 +12,7 @@ We provide 3 examples to show how to use the Sentence Transformers with HPU devi
 
 The following table contains examples supported and configurations we have validated on Gaudi2.
 
-| Examples | General | Mistral 7B | BF16 | Single Card | Multi-Cards |
+| Examples | General | e5-mistral-7b-instruct | BF16 | Single Card | Multi-Cards |
 |-----------------------------|-----------|------------|------|-------------|-------------|
 | training_nli.py | ✔ | ✔ | ✔ | ✔ | ✔ |
 | training_stsbenchmark.py | ✔ | ✔ | ✔ | ✔ | ✔ |
 | training_paraphrases.py | ✔ | | | ✔ | |

From c60e76b0d7b9faa3ea96dee0a31b07634c75f19a Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:43:15 -0700
Subject: [PATCH 07/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index f8db9c8b5..e3e78b028 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -20,5 +20,5 @@ The following table contains examples supported and configurations we have valid
 
 Notice:
 1. in the table the column 'General' means the general models like small models
-2. when Mistral 7b Model is enabled for the test single card will use the LoRA + gradient_checkpoint and multi-card will use the deepspeed zero2/zero3 stage to reduce the memory requirement.
+2. When the e5-mistral-7b-instruct model is enabled for the test, the single-card run uses LoRA + gradient checkpointing and the multi-card run uses DeepSpeed ZeRO-2/ZeRO-3 to reduce the memory requirement.
 3. About the detailed command on how to run each example you can read each readme under each example folder.

From 806eb5cdcbcff091f6d0c093e3cfa53f4ad5d356 Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:43:24 -0700
Subject: [PATCH 08/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index e3e78b028..e27c58815 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -21,4 +21,4 @@ The following table contains examples supported and configurations we have valid
 Notice:
 1. in the table the column 'General' means the general models like small models
 2. When the e5-mistral-7b-instruct model is enabled for the test, the single-card run uses LoRA + gradient checkpointing and the multi-card run uses DeepSpeed ZeRO-2/ZeRO-3 to reduce the memory requirement.
-3. About the detailed command on how to run each example you can read each readme under each example folder.
+3. For detailed instructions on how to run each example, refer to the README file in each example folder.
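The memory-saving combination named in Notice 2 (LoRA adapters plus gradient checkpointing for the single-card run) can be sketched with the `peft` and `transformers` libraries. This is a minimal, hypothetical illustration rather than the code shipped in the example scripts: the adapter hyperparameters (`r`, `lora_alpha`, `target_modules`) are assumptions chosen for the sketch, and the HPU device setup that the examples perform is omitted.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModel

# Load the embedding backbone named in the table above.
base_model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct")

# Gradient checkpointing recomputes activations during the backward pass,
# trading extra compute for a much smaller activation-memory footprint.
base_model.gradient_checkpointing_enable()

# LoRA trains small low-rank adapters instead of the full 7B weights.
# These settings are illustrative assumptions, not the examples' values.
lora_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```

Because only the adapters receive gradients, optimizer state for the frozen 7B backbone is never allocated; the multi-card path instead shards optimizer state and parameters across devices with DeepSpeed ZeRO-2/ZeRO-3.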
From ebf56dfe9e53d29f70d96ae4c0cdb9fcba152254 Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:43:40 -0700
Subject: [PATCH 09/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index e27c58815..c27e45ca1 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -1,4 +1,4 @@
-# Examples for Sentence Transformer
+# Examples for Sentence Transformers
 
 We provide 3 examples to show how to use the Sentence Transformers with HPU devices.

From 27ccbf527f5b75659ff6756833d35814ba27c1c7 Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 13:43:52 -0700
Subject: [PATCH 10/11] Update examples/sentence-transformers-training/README.md

Co-authored-by: Harshvardhan Chauhan
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index c27e45ca1..3eec18d11 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -19,6 +19,6 @@ The following table contains examples supported and configurations we have valid
 | training_paraphrases.py | ✔ | | | ✔ | |
 
 Notice:
-1. in the table the column 'General' means the general models like small models
+1. In the table, the column 'General' refers to general models such as mpnet and MiniLM.
 2. When the e5-mistral-7b-instruct model is enabled for the test, the single-card run uses LoRA + gradient checkpointing and the multi-card run uses DeepSpeed ZeRO-2/ZeRO-3 to reduce the memory requirement.
 3. For detailed instructions on how to run each example, refer to the README file in each example folder.

From 2dd73b91547a7e977f802e3317b6ee0f85e5c90e Mon Sep 17 00:00:00 2001
From: ZhengHongming888
Date: Tue, 24 Sep 2024 15:40:27 -0700
Subject: [PATCH 11/11] minor
---
 examples/sentence-transformers-training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/sentence-transformers-training/README.md b/examples/sentence-transformers-training/README.md
index 3eec18d11..0a702db8d 100644
--- a/examples/sentence-transformers-training/README.md
+++ b/examples/sentence-transformers-training/README.md
@@ -2,7 +2,7 @@
 
 We provide 3 examples to show how to use the Sentence Transformers with HPU devices.
 
-- **[training_stsbenchmark.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/sts)** - This example shows how to create a SentenceTransformer model from scratch by using a pre-trained transformer model (e.g. [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)) together with a pooling layer.
+- **[training_stsbenchmark.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/sts)** - This example shows how to create a Sentence Transformers model from scratch by using a pre-trained transformer model (e.g. [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)) together with a pooling layer.
 
 - **[training_nli.py](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training/nli)** - This example provides two sentences (a premise and a hypothesis), and the task of Natural Language Inference (NLI) is to determine whether the premise entails the hypothesis, contradicts it, or if they are neutral. Commonly, the [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) datasets are used.
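The transformer-plus-pooling construction described in the training_stsbenchmark.py bullet corresponds to the standard Sentence Transformers modular API. Below is a minimal sketch, assuming only the public `sentence_transformers` package; the Gaudi-specific setup that the example scripts perform is left out.

```python
from sentence_transformers import SentenceTransformer, models

# Per-token embeddings come from a pre-trained transformer backbone.
word_embedding_model = models.Transformer("distilbert-base-uncased", max_seq_length=256)

# Mean pooling collapses the token embeddings into one fixed-size sentence vector.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

# Chain the two modules into a complete model, then embed some sentences.
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
embeddings = model.encode(["A sentence to embed.", "Another sentence."])
print(embeddings.shape)  # (2, 768) for distilbert-base-uncased
```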