
Add Mistral-7B-Instruct-v0.1 from huggingface. #2010

Closed · wants to merge 7 commits

Conversation

pranavsharma (Contributor)

Add Mistral-7B-Instruct-v0.1 from huggingface. See https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
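For context, the diffs in this thread register Hugging Face models in a dictionary that maps a model name to a tuple of two sequence lengths, an `AutoConfig` expression, and a model class name. The sketch below is a hypothetical illustration of how such an entry might be read; `describe` and its field names are assumptions for this example, not the repository's actual API, and the train/eval ordering of the two lengths is assumed.

```python
# Hypothetical sketch of consuming a registry entry like the one this PR adds.
# Tuple layout assumed: (train_seq_len, eval_seq_len, config_expr, model_class).
MODELS = {
    # as per https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1,
    # trust_remote_code=True is not required for this model
    'mistral_7b_instruct': (
        128, 128,
        'AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")',
        'AutoModelForCausalLM',
    ),
}

def describe(name):
    """Return the sequence lengths and model class for a registered model."""
    train_len, eval_len, config_expr, model_cls = MODELS[name]
    return {
        'train_seq_len': train_len,
        'eval_seq_len': eval_len,
        'model_class': model_cls,
    }

print(describe('mistral_7b_instruct')['model_class'])  # AutoModelForCausalLM
```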

@pranavsharma (Contributor, author)

The A10G pipeline is failing even after disabling the test there. I can't even see the logs.

@facebook-github-bot (Contributor)

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pranavsharma (Contributor, author)

@xuzhao9 - is there anything that needs a fix here?

@xuzhao9 (Contributor)

xuzhao9 commented Oct 26, 2023

@pranavsharma The CPU test exceeds time limit (5 min), can you also help disable the CPU test?

@pranavsharma (Contributor, author)

@xuzhao9 - it's still failing with OOM.

```diff
-'phi_1_5' : (512, 512, 'AutoConfig.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)', 'AutoModelForCausalLM')
+'phi_1_5' : (512, 512, 'AutoConfig.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)', 'AutoModelForCausalLM'),
+# as per this page https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 trust_remote_code=True is not required
+'mistral_7b_instruct' : (512, 512, 'AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")', 'AutoModelForCausalLM')
```
@msaroufim (Member) commented Oct 28, 2023

Reduce these numbers to avoid OOM; you're likely hitting the OOM because of how large the activations are.
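A rough back-of-envelope check supports this: activation memory grows linearly with sequence length (and the attention score matrices grow quadratically). The sketch below assumes Mistral-7B's published dimensions (hidden size 4096, 32 layers) and fp16 activations; the one-tensor-per-layer multiplier is a loose lower bound for illustration, not a measurement.

```python
def activation_bytes(batch, seq_len, hidden=4096, layers=32, bytes_per_el=2):
    """Very rough lower bound on activation memory: one fp16 hidden-state
    tensor per layer. Real training keeps several intermediates per layer
    (attention scores, MLP activations), so actual usage is a multiple of
    this, and attention scores additionally scale with seq_len**2."""
    return batch * seq_len * hidden * layers * bytes_per_el

# Cutting sequence length from 512 to 128 cuts this estimate by 4x:
gb = 1024 ** 3
print(round(activation_bytes(1, 512) / gb, 3))  # 0.125
print(round(activation_bytes(1, 128) / gb, 3))  # 0.031
```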

@pranavsharma (Contributor, author)

Reducing it to 128 doesn't work either.

@pranavsharma (Contributor, author)

@msaroufim @xuzhao9 - how should we make progress on this? It's been pending for a while now.

@xuzhao9 (Contributor)

xuzhao9 commented Nov 10, 2023

Hi @pranavsharma , after 2 runs it still OOMs on A100 40GB. We need to either 1) slice/tune the model so that it will not OOM on A100 40GB, or 2) disable the A100 test, essentially not testing this model in our CI.

@pranavsharma (Contributor, author)

> Hi @pranavsharma , after 2 runs it still OOMs on A100 40GB. We need to either 1) slice/tune the model so that it will not OOM on A100 40GB, or 2) disable the A100 test, essentially not testing this model in our CI.

How do I disable the A100 test?

```yaml
train_benchmark: false
train_deterministic: false
not_implemented:
- device: NVIDIA A10G
```
A reviewer (Contributor) suggested:

```diff
-- device: NVIDIA A10G
+- device: NVIDIA A10G
+- device: NVIDIA A100-SXM4-40GB
```
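For background, listing a device under `not_implemented` in a model's `metadata.yaml` tells the harness to skip that model on that device in CI. The check below is a hypothetical illustration of that mechanism (with the metadata shown as a plain dict rather than parsed YAML); it is not the repository's actual implementation.

```python
# Hypothetical sketch of honoring a metadata `not_implemented` device list.
# The real file is YAML; a dict literal stands in for the parsed result.
metadata = {
    'train_benchmark': False,
    'train_deterministic': False,
    'not_implemented': [
        {'device': 'NVIDIA A10G'},
        {'device': 'NVIDIA A100-SXM4-40GB'},
    ],
}

def should_skip(device_name, meta):
    """True if the current device appears in the not_implemented list."""
    return any(entry.get('device') == device_name
               for entry in meta.get('not_implemented', []))

print(should_skip('NVIDIA A100-SXM4-40GB', metadata))  # True
print(should_skip('NVIDIA H100', metadata))            # False
```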

@xuzhao9 (Contributor)

xuzhao9 commented Nov 15, 2023

@pranavsharma Add the device name in the metadata.yaml as above.

@pranavsharma (Contributor, author)

@xuzhao9 - does this look good?

```diff
@@ -35,6 +35,8 @@
 'llama_v2_13b' : (512, 512, 'AutoConfig.from_pretrained("meta-llama/Llama-2-13b-hf")', 'AutoModelForCausalLM'),
 'llama_v2_70b' : (512, 512, 'AutoConfig.from_pretrained("meta-llama/Llama-2-70b-hf")', 'AutoModelForMaskedLM'),
 'phi_1_5' : (512, 512, 'AutoConfig.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)', 'AutoModelForCausalLM'),
+# as per this page https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 trust_remote_code=True is not required
+'mistral_7b_instruct' : (128, 128, 'AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")', 'AutoModelForCausalLM')
```
A reviewer (Contributor) commented:

A trailing comma is needed.

Suggested change:

```diff
-'mistral_7b_instruct' : (128, 128, 'AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")', 'AutoModelForCausalLM')
+'mistral_7b_instruct' : (128, 128, 'AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")', 'AutoModelForCausalLM'),
```
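The trailing comma is purely stylistic in Python — collection literals accept one after the last element — but it keeps future additions to the registry one-line diffs. A minimal illustration (sequence lengths mirror the PR; the dict itself is just an example):

```python
# Trailing commas are legal in Python dict/tuple/list literals.
# With one after the last entry, appending the next model touches
# only one line in a future diff instead of two.
models = {
    'phi_1_5': (512, 512),
    'mistral_7b_instruct': (128, 128),  # <- trailing comma is valid syntax
}

print(len(models))  # 2
```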

@pranavsharma (Contributor, author)

Moved it to canary.

@facebook-github-bot (Contributor)

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@xuzhao9 merged this pull request in 97d6b17.
