Abstract 🔥
When running run-all.sh on multiple GPUs, some tasks (dependency parsing) do not work correctly.
error-message:
RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2
How to Reproduce 🤔
[python==3.7.11]

```shell
git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git
pip install -r requirements.txt
# torch build chosen to match the local CUDA version
pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html
```
Modified run-all.sh (KLUE-DP section, `task="klue-dp"`) from:

```shell
python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 5e-5 --num_train_epochs 15 --gradient_accumulation_steps 1 --warmup_ratio 0.2 --train_batch_size 32 --patience 10000 --max_seq_length 256 --metric_key uas_macro_f1 --gpus 0 --num_workers 4
```

to:

```shell
python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 3e-5 --num_train_epochs 10 --train_batch_size 16 --eval_batch_size 16 --max_seq_length 510 --gradient_accumulation_steps 2 --warmup_ratio 0.2 --weight_decay 0.01 --max_grad_norm 1.0 --patience 100000 --metric_key slot_micro_f1 --gpus 1 2 3 --num_workers 8
```
bash run-all.sh
RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2
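For reference, this RuntimeError is a plain tensor-broadcasting failure, which typically appears when two tensors padded to different sequence lengths are combined. A minimal sketch (unrelated to the KLUE code; sizes chosen only to match the message) reproduces the same error:

```python
import torch

# Two tensors whose last dimensions differ the way the error message
# describes (23 vs 25); the shapes here are illustrative, not from KLUE.
a = torch.zeros(2, 4, 23)
b = torch.zeros(2, 4, 25)

try:
    a + b  # broadcasting fails at non-singleton dimension 2
except RuntimeError as e:
    print(e)
    # The size of tensor a (23) must match the size of tensor b (25)
    # at non-singleton dimension 2
```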
How to solve 🙋♀
On a single GPU, training the RoBERTa-Large model is not possible because it runs out of memory, so I am reaching out in case you can help!
Thank you.
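If the goal is simply to fit RoBERTa-Large on a single GPU, one common memory workaround (a generic PyTorch sketch, not this repo's code) is gradient accumulation: run several small micro-batches and step the optimizer once, keeping the effective batch size while lowering peak memory.

```python
import torch

# Toy model standing in for a large transformer (hypothetical sizes).
model = torch.nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

full_batch = torch.randn(32, 8)   # desired effective batch size: 32
target = torch.randn(32, 2)
accum_steps = 4                   # 4 micro-batches of 8 fit in less memory

opt.zero_grad()
for i in range(accum_steps):
    mb = full_batch[i * 8:(i + 1) * 8]
    tb = target[i * 8:(i + 1) * 8]
    # Scale the loss so accumulated gradients match one large batch.
    loss = torch.nn.functional.mse_loss(model(mb), tb) / accum_steps
    loss.backward()               # gradients accumulate across micro-batches
opt.step()                        # one optimizer step per effective batch
```

In terms of the commands above, this corresponds to lowering `--train_batch_size` while raising `--gradient_accumulation_steps` proportionally; both flags already appear in the run_klue.py invocations.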