Training usually doesn't start #44

Open
max-reuter-2 opened this issue Nov 30, 2017 · 8 comments

Comments

@max-reuter-2

max-reuter-2 commented Nov 30, 2017

I'm running this command:
model=wide-resnet widen_factor=4 depth=40 dropout=0.3 ./scripts/debug_cifar.sh

Most of the time (80%+), the program will reach the point where it prints this:

Network has 40 convolutions
Will save at logs/wide-resnet_1639021580
tput: No value for $TERM and no -T specified

...then it will do nothing. The other 20% of the time, it will begin training and printing out each epoch and its progress.

After a bit of debugging, I found that the stall occurs at engine:train in train.lua.

How can I fix this?

@szagoruyko
Owner

hm, that's odd, can you remove tee and check the output?

@max-reuter-2
Author

What do you mean by tee?

@max-reuter-2
Author

If what you mean is to change this line in train_cifar.sh:
th train.lua | tee $save/log.txt
to this:
th train.lua
then it is still stalling.
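
For anyone repeating this check, the idea is to run the same command once directly and once through `tee`, then compare what comes out. If both behave the same, the pipe is unlikely to be the cause. The sketch below is only an illustration under stated assumptions: `fake_train` is a hypothetical stand-in for `th train.lua`, the helper names `run_direct` and `run_with_tee` are made up for this example, and the log path comes from `mktemp` rather than `$save/log.txt`.

```shell
#!/bin/sh
# Hypothetical stand-in for `th train.lua`: prints two lines and exits.
fake_train() {
  echo "Network has 40 convolutions"
  echo "epoch 1"
}

# Run the command directly, with no pipe involved.
run_direct() {
  fake_train
}

# Run the command through tee, which copies stdout into the given log file.
run_with_tee() {
  fake_train | tee "$1"
}

log_file="$(mktemp)"
direct_output="$(run_direct)"
tee_output="$(run_with_tee "$log_file")"
log_contents="$(cat "$log_file")"
rm -f "$log_file"

# Report whether the pipe changed what the command printed.
if [ "$direct_output" = "$tee_output" ]; then
  echo "output identical with and without tee"
else
  echo "output differs when piped through tee"
fi
```

With the real training command in place of `fake_train`, a stall that shows up in both variants points away from `tee`, which matches what is reported above.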

@szagoruyko
Owner

hm, I'd assume that would be threads then, but these issues should have been fixed years ago. can you update threads and torchnet?

@max-reuter-2
Author

I updated threads and torchnet, but I'm still getting the issue.

@szagoruyko
Owner

@soumith maybe you've seen issues like that with latest lua torch?

@soumith

soumith commented Dec 18, 2017

lua-torch hasn't updated its packages since July 2017: https://github.com/torch/distro/commits/master

I'm not sure what changed.
