
RuntimeError: CUDA error: device-side assert triggered #8

Open
jsong0041 opened this issue Jun 24, 2022 · 1 comment

@jsong0041

First of all, congratulations on your recent paper '3D-UCaps: 3D Capsules Unet for Volumetric Image Segmentation' being accepted at MICCAI'21. It's really great work, and thank you very much for open-sourcing your code on GitHub.

As for the code: I used a new dataset with inputs in .tif format, but the following errors are thrown:

```
Validation sanity check: 0%| | 0/1 [00:00<?, ?it/s]
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: block: [1926,0,0], thread: [32,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
...
=== Transform input info -- AsDiscrete ===
Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\monai\transforms\transform.py", line 84, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "C:\Python36\lib\site-packages\monai\transforms\transform.py", line 52, in _apply_transform
    return transform(parameters)
  File "C:\Python36\lib\site-packages\monai\transforms\post\array.py", line 174, in __call__
    img = one_hot(img, num_classes=nclasses, dim=0)
  File "C:\Python36\lib\site-packages\monai\networks\utils.py", line 86, in one_hot
    labels = o.scatter_(dim=dim, index=labels.long(), value=1)
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 129, in <module>
    trainer.fit(net, datamodule=data_module)
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
    self._dispatch()
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "C:\Python36\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
    return self._run_train()
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "C:\Python36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "C:\Python36\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "C:\Python36\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "C:\Python36\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "C:\Python36\lib\site-packages\pytorch_lightning\plugins\training_type\dp.py", line 104, in validation_step
    return self.model(*args, **kwargs)
  File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Python36\lib\site-packages\torch\nn\parallel\data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 63, in forward
    output = super().forward(*inputs, **kwargs)
  File "C:\Python36\lib\site-packages\pytorch_lightning\overrides\base.py", line 92, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "E:\#project_b\3d-ucaps-master\module\ucaps.py", line 265, in validation_step
    labels = [self.post_label(label) for label in decollate_batch(labels)]
  File "E:\#project_b\3d-ucaps-master\module\ucaps.py", line 265, in <listcomp>
    labels = [self.post_label(label) for label in decollate_batch(labels)]
  File "C:\Python36\lib\site-packages\monai\transforms\compose.py", line 159, in __call__
    input_ = apply_transform(_transform, input_, self.map_items, self.unpack_items)
  File "C:\Python36\lib\site-packages\monai\transforms\transform.py", line 107, in apply_transform
    _log_stats(data=data)
  File "C:\Python36\lib\site-packages\monai\transforms\transform.py", line 98, in _log_stats
    datastats(img=data, data_shape=True, value_range=True, prefix=prefix)  # type: ignore
  File "C:\Python36\lib\site-packages\monai\transforms\utility\array.py", line 524, in __call__
    lines.append(f"Value range: ({torch.min(img)}, {torch.max(img)})")
RuntimeError: CUDA error: device-side assert triggered
```

Any help is much appreciated.
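For reference, the assertion in the log is raised by `scatter_` inside MONAI's `one_hot`, which `AsDiscrete` calls on each label with `num_classes=nclasses` (see the traceback). The assert fires when an index lies outside `[0, nclasses)` along the class dimension, so one thing worth ruling out is label values outside that range. Below is a minimal sketch of such a check, assuming the .tif labels have already been loaded as NumPy arrays. `check_label_range` and `binarize_mask` are hypothetical helpers, not part of 3D-UCaps or MONAI, and the 0/255 mask is only an illustrative example of how a .tif label might be encoded.

```python
import numpy as np


def check_label_range(label: np.ndarray, num_classes: int) -> np.ndarray:
    """Return the unique values in `label`, raising if any lie outside [0, num_classes).

    MONAI's one_hot scatters along the class dimension, whose size is
    num_classes, so every label value must lie in that range.
    """
    values = np.unique(label)
    bad = values[(values < 0) | (values >= num_classes)]
    if bad.size:
        raise ValueError(
            f"label values {bad.tolist()} are outside [0, {num_classes}); "
            f"unique values found: {values.tolist()}"
        )
    return values


def binarize_mask(label: np.ndarray) -> np.ndarray:
    """Hypothetical fix for a binary mask stored as 0/255: map every nonzero voxel to 1."""
    return (label > 0).astype(np.uint8)


# Stand-in for a label volume read from a .tif file (foreground stored as 255).
mask = np.array([[[0, 255], [255, 0]]], dtype=np.uint8)

try:
    check_label_range(mask, num_classes=2)
except ValueError as err:
    print("invalid labels:", err)

print(check_label_range(binarize_mask(mask), num_classes=2))
```

Running the check on CPU arrays gives a readable Python error instead of an asynchronous device-side assert. Whether a remap like `binarize_mask` is appropriate depends on how the dataset actually encodes its classes.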

@hoangtan96dl
Contributor

Hello @jsong0041, sorry for my late reply.
Based on my experience, you are getting an out-of-memory error in the validation step, but it is being reported as a different error. You can refer to issue #6 and the README to find which arguments you can change to make it work.
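Because CUDA kernels are launched asynchronously, an error raised on the GPU is often reported by a later CUDA call rather than the one that failed. After a device-side assert, subsequent CUDA calls in the same process also fail with the same error, which is consistent with the second traceback in the original report ending in `torch.min` inside MONAI's `DataStats` logging. A minimal sketch, assuming you can edit the top of `train.py`: setting the standard `CUDA_LAUNCH_BLOCKING` environment variable before PyTorch initializes CUDA makes launches synchronous, so the traceback points at the operation that actually failed. `classify_cuda_error` is a hypothetical helper that only does string matching; PyTorch reports allocation failures with a message containing "CUDA out of memory".

```python
import os

# Must run before `import torch` (or at least before the first CUDA call):
# with synchronous launches, the Python traceback points at the kernel
# that actually failed instead of a later, unrelated CUDA call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def classify_cuda_error(message: str) -> str:
    """Roughly bucket a CUDA RuntimeError message by string matching only."""
    if "out of memory" in message:
        return "out_of_memory"
    if "device-side assert triggered" in message:
        return "device_side_assert"
    return "other"


print(classify_cuda_error("CUDA error: device-side assert triggered"))
```

For example, wrapping `trainer.fit(net, datamodule=data_module)` in `try` / `except RuntimeError as err` and passing `str(err)` to the helper shows which kind of failure the synchronous run reports, which helps decide between the memory-related arguments and the input data as the thing to adjust.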
