
RuntimeError: index_select() and issue about DataParallel #2

Open
ChengHuang-CH opened this issue Mar 6, 2018 · 1 comment
@ChengHuang-CH

First of all, thanks: it's definitely an easy-to-follow CapsNet tutorial for a beginner like me. However, I found an error after running the code:

RuntimeError: index_select(): argument 'index' must be Variable, not torch.cuda.LongTensor

I solved this issue the same way as gram-ai/capsule-networks#13, in the Decoder class:

 masked = masked.index_select(dim=0, index=max_length_indices.squeeze(1).data)

The trailing ".data" should be removed, so that the index is passed as a Variable rather than a raw torch.cuda.LongTensor.
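For context, here is a minimal sketch of the corrected masking step. The shapes are assumptions mirroring the usual CapsNet decoder (batch of 4, 10 digit capsules of dimension 16); on PyTorch 0.4+, where Variable and Tensor are merged, the same call works directly on plain tensors:

```python
import torch

# Assumed shapes, following the typical CapsNet decoder:
# x is the digit-capsule output, (batch, 10, 16).
x = torch.randn(4, 10, 16)
classes = (x ** 2).sum(dim=-1) ** 0.5        # capsule lengths, shape (4, 10)
_, max_length_indices = classes.max(dim=1)   # longest capsule per sample, shape (4,)

masked = torch.eye(10)
# No `.data` on the index: in PyTorch 0.3 this keeps it a Variable;
# in 0.4+ the plain tensor index works directly.
masked = masked.index_select(dim=0, index=max_length_indices)
```

Each row of `masked` is now the one-hot vector for the predicted class of that sample.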

Then I successfully trained on a single GPU following this tutorial, but when I tried to train the net on two GPUs following the PyTorch data parallelism tutorial:

if USE_CUDA:
    print("Let's use %d GPUs" % torch.cuda.device_count())
    capsule_net = nn.DataParallel(capsule_net).cuda()

it produced an error:
AttributeError: 'DataParallel' object has no attribute 'loss'

I'm confused; if there is a good solution, please let me know. Thanks!

(I'm using Python 2.7.12 and PyTorch 0.3.0.post4.)

@ChengHuang-CH (Author)

Haha, I'm very excited to be here again, since I have solved some of the problems. Here I'd like to share the solutions for DataParallel and my experience with the new PyTorch 0.4.0 on Windows 10.

First, I solved the DataParallel problem (AttributeError: 'DataParallel' object has no attribute 'loss').
The solution came from the forum topic "How to reach model attributes wrapped by nn.DataParallel?",
so I revised the code as follows:

USE_CUDA = True
Use_Dataparallel = False  # firstly set single gpu mode if using cuda

# ...{other codes}

# code  to activate DataParallel mode:
if USE_CUDA:
    if torch.cuda.device_count() > 1:
        print("Let's use %d GPUs" % torch.cuda.device_count())
        Use_Dataparallel = True   # transfer to multi-gpu mode
        capsule_net = nn.DataParallel(capsule_net).cuda()

# ...{other codes}

if Use_Dataparallel:
    # use 'module' to reach the 'losses' method wrapped by nn.DataParallel
    loss = capsule_net.module.losses(inputs, output, target, reconstructions)
else:
    loss = capsule_net.losses(inputs, output, target, reconstructions)  # single-GPU mode
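The reason `.module` is needed is that nn.DataParallel is a wrapper module: it stores the original model as its `module` attribute and only forwards `forward()`-style calls, so custom methods like `losses` are not visible on the wrapper itself. A plain-Python sketch of that delegation pattern (the names here are illustrative, not the actual DataParallel implementation):

```python
class CapsNet:
    """Stand-in for the tutorial's model, with a custom losses method."""
    def forward(self, x):
        return x

    def losses(self, output, target):
        return abs(output - target)


class Parallel:
    """Toy wrapper mimicking nn.DataParallel: the wrapped model is stored
    as self.module, and only forward-style calls are exposed directly."""
    def __init__(self, module):
        self.module = module

    def __call__(self, x):
        return self.module.forward(x)


net = Parallel(CapsNet())

# The wrapper does not forward custom methods, hence the AttributeError:
has_losses_on_wrapper = hasattr(net, "losses")   # False

# Reaching through .module works, just as in the fix above:
loss = net.module.losses(net(3), 5)              # abs(3 - 5) == 2
```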

Second, I tested this code on the officially released PyTorch 0.4.0 on Windows 10; there are a few things to pay attention to:

(1) A multiprocessing error specific to Windows (see the PyTorch Windows FAQ):

RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

   This probably means that you are not using fork to start your
   child processes and you have forgotten to use the proper idiom
   in the main module:

       if __name__ == '__main__':
           freeze_support()

So all the code should be put under if __name__ == '__main__':, except the four network-definition classes.
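As a sketch, the resulting script layout looks like this (the class names below are illustrative guesses at the tutorial's classes; the bodies are placeholders):

```python
from multiprocessing import freeze_support

# Network-definition classes stay at module level so that the worker
# processes spawned on Windows (e.g. by a DataLoader with num_workers > 0)
# can import them.
class ConvLayer: ...
class PrimaryCaps: ...
class DigitCaps: ...
class Decoder: ...

def train():
    # build the DataLoader, model and optimizer, and run the training loop here
    return "done"

if __name__ == '__main__':
    freeze_support()   # required in frozen executables; harmless otherwise
    train()
```

On Windows, child processes are started with `spawn` rather than `fork`, so the main module is re-imported in each worker; anything outside the guard would run again in every child, which is what triggers the bootstrapping error.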

(2) Error about 'torch.sparse'

target= torch.sparse.torch.eye(10).index_select(dim=0, index=target)
AttributeError: module 'torch.sparse' has no attribute 'torch'

According to a similar question, it works after replacing torch.sparse.torch.eye(10) with torch.eye(10).
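What this line computes is simply a one-hot encoding of the labels: row i of the 10x10 identity matrix is the one-hot vector for class i, and index_select(dim=0, ...) picks those rows in batch order. The same selection sketched in plain Python for illustration:

```python
def eye(n):
    """n x n identity matrix as nested lists, like torch.eye(n)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def index_select_rows(matrix, index):
    """Pick rows of `matrix` in the order given by `index` (a dim=0 select)."""
    return [matrix[i] for i in index]

target = [3, 0, 7]                           # a batch of class labels
one_hot = index_select_rows(eye(10), target) # one one-hot row per label
```

So `one_hot[0]` has a 1.0 at position 3, `one_hot[1]` at position 0, and so on.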

(3) A UserWarning suggesting tensor.item() instead of .data[0]

UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
train_loss += loss.data[0]  # change loss.data[0] to loss.item() in PyTorch 0.4.0

so it is fixed by revising it as follows:
train_loss += loss.item()

(These tests are based on Windows 10 + Python 3.6 + PyTorch 0.4.0.)
