[2024-05-16 13:48:21,126] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:31<00:00, 22.93s/it]
Some weights of the model checkpoint at work_dirs/llama-vid/llama-vid-7b-full-224-long-video-MovieLLM were not used when initializing LlavaLlamaAttForCausalLM: ['model.vision_tower.vision_tower.blocks.34.attn.v_bias', 'model.vlm_att_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight', 'model.vision_tower.vision_tower.blocks.1.norm1.weight', 'model.vision_tower.vision_tower.blocks.17.attn.q_bias', 'model.vlm_att_encoder.bert.encoder.layer.0.output.LayerNorm.bias', 'model.vision_tower.vision_tower.blocks.10.attn.proj.weight', 'model.vlm_att_encoder.bert.encoder.layer.5.output_query.LayerNorm.weight', 'model.vision_tower.vision_tower.blocks.13.attn.q_bias', 'too many data']
- This IS expected if you are initializing LlavaLlamaAttForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaAttForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
_IncompatibleKeys(missing_keys=[], unexpected_keys=['norm.weight', 'norm.bias', 'head.weight',......too many data']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Freezing all qformer weights...
Loading pretrained weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Text with video
> Input token num: 32096
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
Traceback (most recent call last):
File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 112, in <module>
run_inference(args)
File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 87, in run_inference
output_ids = model.generate(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
outputs = self(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/model/language_model/llava_llama_vid.py", line 85, in forward
outputs = self.model(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/MovieLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID/llamavid/train/llama_flash_attn_monkey_patch.py", line 157, in forward_inference
v = torch.cat([past_key_value[1].transpose(1, 2), v], dim=1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB (GPU 0; 47.50 GiB total capacity; 41.09 GiB already allocated; 132.56 MiB free; 47.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(MovieLLM) root@autodl-container-307c46a8f1-f6e37430:~/autodl-tmp/autodl-tmp/MovieLLM-code/LLaMA-VID#
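For context, the crash happens inside the monkey-patched attention while the cached keys/values are concatenated during decoding, so memory grows with every generated token on top of a 32096-token prompt that already exceeds the 4096-token limit reported in the warning above. Below is a minimal sketch of the two mitigations the log itself points at; the allocator setting comes straight from the error message, while the `model` variable and the frame-subsampling idea are only illustrative and are not part of run_llamavid_movie.py:

```python
# A minimal sketch, not part of the original run_llamavid_movie.py.
import os

# The OOM message itself suggests capping the allocator's split size to reduce
# fragmentation; this must be set before torch initializes CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU 0: {props.total_memory / 2**30:.1f} GiB total")

# After the model is loaded (the `model` name is hypothetical here), the context
# limit from the warning above can be checked directly; with 32096 prompt tokens
# against 4096 positions, the video prompt has to be shortened (e.g. by sampling
# fewer frames) or the context window extended.
# print(model.config.max_position_embeddings)
```

Even with the allocator hint, decoding a 32096-token prompt will most likely still exhaust the KV cache budget on a single 48 GiB card, so reducing the number of sampled frames (and thus the prompt length) is probably the decisive fix.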