Running run_clm.py results in GPU OOM. #276

Open
lcw99 opened this issue Apr 10, 2023 · 0 comments
lcw99 commented Apr 10, 2023

I tried to run the neural_compressor/language-modeling example as follows; the command is the same as in the README. I have a 24 GB GPU, but it causes a GPU OOM. The model is only 125M parameters, so is this normal? How much GPU RAM do I need?

python run_clm.py \
    --model_name_or_path EleutherAI/gpt-neo-125M \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --apply_quantization \
    --quantization_approach aware_training \
    --apply_pruning \
    --target_sparsity 0.02 \
    --num_train_epochs 4 \
    --max_train_samples 100 \
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/clm_output

/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                  | 0/52 [00:00<?, ?it/s]2023-04-10 13:44:00 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/observer.py:214: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
  warnings.warn(
2023-04-10 13:44:02 [INFO] current target ratio is 0.0
2023-04-10 13:44:03 [INFO] current sparsity ratio is 0.0
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/ReduceAllOps.cpp:45.)
  return torch.fused_moving_avg_obs_fake_quant(
/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py:309: UserWarning: _aminmax is deprecated as of PyTorch 1.11 and will be removed in a future release. Use aminmax instead. This warning will only appear once per process. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:568.)
  return torch.fused_moving_avg_obs_fake_quant(
Traceback (most recent call last):
  File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 732, in <module>
    main()
  File "/home/chang/AI/llm/optimum-intel/examples/neural_compressor/language-modeling/run_clm.py", line 654, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 411, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/chang/AI/llm/optimum-intel/optimum/intel/neural_compressor/trainer.py", line 699, in compute_loss
    outputs = model(**inputs)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/transformers/models/gpt_neo/modeling_gpt_neo.py", line 756, in forward
    lm_logits = self.lm_head(hidden_states)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 658, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 277, in __call__
    raise e
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/fx/graph_module.py", line 267, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "<eval_with_key>.439", line 7, in forward
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1215, in _call_impl
    hook_result = hook(self, input, result)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/neural_compressor/adaptor/torch_utils/util.py", line 84, in output_scale_hook
    module.output_observer(output)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chang/anaconda3/envs/openvino/lib/python3.9/site-packages/torch/ao/quantization/fake_quantize.py", line 309, in forward
    return torch.fused_moving_avg_obs_fake_quant(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 23.68 GiB total capacity; 20.09 GiB already allocated; 1.05 GiB free; 20.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
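
As an untested workaround I am considering, based only on the allocator hint at the end of the error: setting PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb, and lowering --per_device_train_batch_size (a standard transformers Trainer argument that the example accepts). The 128 MB split size and the batch size of 1 below are guesses on my part, not values from the example README:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128   # allocator hint taken from the OOM message
python run_clm.py \
    --model_name_or_path EleutherAI/gpt-neo-125M \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --apply_quantization \
    --quantization_approach aware_training \
    --apply_pruning \
    --target_sparsity 0.02 \
    --num_train_epochs 4 \
    --max_train_samples 100 \
    --per_device_train_batch_size 1 \
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/clm_output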
