Training with the default cfg on 8 2080 Ti GPUs triggers CUDA out of memory #23

Open
yancie-yjr opened this issue Mar 17, 2022 · 1 comment

@yancie-yjr

2022-03-17 14:21:36,906 - INFO - Start running, host: yangjinrong@tracking-q5x64-32246-worker-0, work_dir: /data/simtrack_output
2022-03-17 14:21:36,907 - INFO - workflow: [('train', 1), ('val', 1)], max: 20 epochs
Traceback (most recent call last):
File "./tools/train.py", line 141, in <module>
main()
File "./tools/train.py", line 136, in main
logger=logger,
File "/data/simtrack/det3d/torchie/apis/train.py", line 206, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
File "/data/simtrack/det3d/torchie/trainer/trainer.py", line 527, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/data/simtrack/det3d/torchie/trainer/trainer.py", line 393, in train
self.model, data_batch, train_mode=True, **kwargs
File "/data/simtrack/det3d/torchie/trainer/trainer.py", line 356, in batch_processor
losses = model(example, return_loss=True)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/simtrack/det3d/models/detectors/point_pillars.py", line 48, in forward
x = self.extract_feat(data)
File "/data/simtrack/det3d/models/detectors/point_pillars.py", line 29, in extract_feat
x = self.neck(x)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/simtrack/det3d/models/necks/rpn.py", line 142, in forward
ups.append(self.deblocks[i - self._upsample_start_idx](x))
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/simtrack/det3d/models/utils/misc.py", line 82, in forward
input = module(input)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 102, in forward
return F.relu(input, inplace=self.inplace)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/nn/functional.py", line 1119, in relu
result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 2; 10.76 GiB total capacity; 9.76 GiB already allocated; 47.44 MiB free; 9.88 GiB reserved in total by PyTorch)
^CProcess Process-10:
^CProcess Process-9:
Process Process-9:
Process Process-9:
Process Process-9:
Process Process-3:
Process Process-2:
Process Process-5:
Traceback (most recent call last):
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/subprocess.py", line 1019, in wait
Process Process-2:
Process Process-10:
Process Process-5:
Process Process-10:
Process Process-2:
Process Process-10:
Process Process-5:
Process Process-1:
Process Process-7:
Process Process-5:
Process Process-9:
Process Process-5:
Process Process-4:
Process Process-8:
Process Process-8:
Process Process-8:
Process Process-4:
Process Process-4:
Process Process-1:
Process Process-3:
Process Process-1:
Process Process-6:
Process Process-1:
Process Process-7:
Process Process-7:
return self._wait(timeout=timeout)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/subprocess.py", line 1653, in _wait
(pid, sts) = self._try_wait(0)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/subprocess.py", line 1611, in _try_wait
(pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/site-packages/torch/distributed/launch.py", line 254, in main
process.wait()
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/subprocess.py", line 1032, in wait
self._wait(timeout=sigint_timeout)
File "/home/yangjinrong/miniconda3/envs/det3d/lib/python3.7/subprocess.py", line 1647, in _wait
time.sleep(delay)
KeyboardInterrupt
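
The numbers in the out-of-memory message above can be read off with a little arithmetic. The following is a minimal sketch, not part of the repository: the helper name `parse_oom` and the regular expressions are assumptions made for illustration, and only the message text itself comes from the log. It converts each quoted size to MiB and compares the requested allocation with what was free.

```python
import re

# Error text copied verbatim from the traceback in this issue.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB "
    "(GPU 2; 10.76 GiB total capacity; 9.76 GiB already allocated; "
    "47.44 MiB free; 9.88 GiB reserved in total by PyTorch)"
)

# Conversion factors from the units used in the message to MiB.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Hypothetical helper: return the sizes quoted in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\wiB)",
        "total": r"([\d.]+) (\wiB) total capacity",
        "allocated": r"([\d.]+) (\wiB) already allocated",
        "free": r"([\d.]+) (\wiB) free",
        "reserved": r"([\d.]+) (\wiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
# How much more memory the failed allocation needed than was free.
shortfall = sizes["requested"] - sizes["free"]
# Memory held by PyTorch's caching allocator but not backing live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]
print(f"shortfall: {shortfall:.2f} MiB")
print(f"reserved but not allocated: {cached_unused:.2f} MiB")
```

Read this way, the 64 MiB request exceeded the 47.44 MiB that was free on GPU 2, while nearly all of the card's 10.76 GiB was already held by the training process. A common response to this situation is to lower the per-GPU batch size in the config. The relevant key is not shown in this issue, so it would need to be looked up in the config file actually used.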

@gzgzgz666

Hello, have you been able to reproduce the nuScenes metrics reported in the paper?
