Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting Video-loss: nan while using VideoChat! #253

Open
gauravsh0812 opened this issue Dec 19, 2024 · 1 comment
Open

Getting Video-loss: nan while using VideoChat! #253

gauravsh0812 opened this issue Dec 19, 2024 · 1 comment

Comments

@gauravsh0812
Copy link

gauravsh0812 commented Dec 19, 2024

Hi there!

Thank you for the great work.
I am trying to train and tune the model on my custom dataset, which has 60-second MP4 video clips and corresponding QA.
I modified the config files, data.py file, and everything else needed. Also, I kept the "mode" to "it" instead of "pt."

But when I start training the model this is what I got:

2024-12-19T03:26:01 | utils.basic_utils: Train Epoch: [0]  [  0/528]  eta: 2:38:49  lr: 0.000001  video-loss: nan  time: 18.0487  data: 9.4180  max mem: 39278 res mem: 40794
2024-12-19T03:29:55 | utils.basic_utils: Train Epoch: [0]  [100/528]  eta: 0:17:46  lr: 0.000096  video-loss: nan  time: 2.3156  data: 0.0019  max mem: 40735 res mem: 43122
2024-12-19T03:33:48 | utils.basic_utils: Train Epoch: [0]  [200/528]  eta: 0:13:11  lr: 0.000088  video-loss: nan  time: 2.5185  data: 0.0018  max mem: 40735 res mem: 43122

Here is the config_7b_stage1.py file:

from configs.data import *

# ========================= data ==========================
train_corpus = "LLAPSA"
train_file = "${available_corpus[${train_corpus}]}"
test_file = dict()
test_types = []
num_workers = 6

stop_key = None

# ========================= input ==========================
num_frames = 4
num_frames_test = 4
batch_size = 16
max_txt_l = 16

pre_text = False

inputs = dict(
    image_res=224,
    video_input=dict(
        num_frames="${num_frames}",
        sample_type="rand",
        num_frames_test="${num_frames_test}",
        sample_type_test="middle",
        random_aug=False,
    ),
    max_txt_l=dict(image="${max_txt_l}", video="${max_txt_l}"),
    batch_size=dict(image="${batch_size}", video="${batch_size}"),
    batch_size_test=dict(image="${batch_size}", video="${batch_size}"),
)

# ========================= model ==========================
model = dict(
    vit_model="eva_clip_g",
    vit_model_path="/data/gauravs/video_chat/eva_vit_g.pth",
    q_former_model_path="/data/gauravs/video_chat/blip2_pretrained_flant5xxl.pth",
    llama_model_path="/data/gauravs/weights/vicuna-7b-v1.5",
    gpt_model_path="",
    img_size=224,
    num_query_token=32,
    drop_path_rate=0.,
    use_grad_checkpoint=False,
    vit_precision="fp32",
    freeze_vit=True,
    freeze_mhra=False, # open mhra
    freeze_qformer=True,
    low_resource=False,
    prompt_path="prompts/concise_description.txt",
    img_prompt_path="prompts/concise_image_description.txt",
    prompt_template="###Human: {} ###Assistant: ",
    max_txt_len="${max_txt_l}", # use large max_txt_len on stage2
    end_sym="###",
    # uniformerv2
    temporal_downsample=False,
    no_lmhra=True,
    double_lmhra=False,
    lmhra_reduction=2.0,
    gmhra_layers=8,
    gmhra_drop_path_rate=0.,
    gmhra_dropout=0.5,
    # qformer
    extra_num_query_token=64,
)

optimizer = dict(
    opt="adamW",
    lr=1e-4,
    opt_betas=[0.9, 0.999],  # default
    weight_decay=0.02,
    max_grad_norm=-1,  # requires a positive float, use -1 to disable
    # use a different lr for some modules, e.g., larger lr for new modules
    different_lr=dict(enable=False, module_names=[], lr=1e-3),
)

scheduler = dict(sched="cosine", epochs=1, min_lr_multi=0.01, warmup_epochs=0.2)

evaluate = False
deep_fusion = False
evaluation = dict(
    eval_frame_ensemble="concat",  # [concat, max, mean, lse]
    eval_x_only=False,
    k_test=128,
    eval_offload=True,  # offload gpu tensors to cpu to save memory.
)

fp16 = True

gradient_checkpointing = True

# ========================= wandb ==========================
wandb = dict(
    enable=False,
    entity="user",  # username or team name to store the runs, see https://docs.wandb.ai/ref/python/init
    project="videochat",  # setup in your command line
)
dist_url = "env://"
device = "cuda"
mode = "pt"

# ========================= others ==========================
output_dir = ""  # output dir
resume = False  # if True, load optimizer and scheduler states as well
debug = False
log_freq = 100
seed = 42

save_latest = True
auto_resume = True
pretrained_path = ""  # path to pretrained model weights, for resume only?

Can anyone please help me!! It is urgent!

@yinanhe
Copy link
Member

yinanhe commented Dec 25, 2024

If your mode is it not pt, I suggest you run stage3/4 of videochat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants