th -e "require 'cutorch'" ...s/anthonyyuan/torch/install/share/lua/5.1/trepl/init.lua:389: attempt to index a string value #660

Open
anthonyyuan opened this issue Jan 6, 2017 · 52 comments

Comments

@anthonyyuan

No description provided.

@Cadene

Cadene commented Jan 7, 2017

I get the same error after a clean torch install.

@Cadene

Cadene commented Jan 8, 2017

I looked through the latest pull requests and found #634:

$ cd ~/downloads
$ git clone https://github.com/elikosan/cutorch.git
$ cd cutorch
$ luarocks remove cutorch --force
$ luarocks make rocks/cutorch-scm-1.rockspec
$ th
> require 'cutorch'

it works for me

@soumith
Member

soumith commented Jan 8, 2017

i'm checking with a new install

@soumith
Member

soumith commented Jan 8, 2017

i'm not able to reproduce it with a fresh torch install. do i have to install it on a specific OS or version?

@Cadene

Cadene commented Jan 8, 2017

$ uname -a
Linux pas 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux testing (stretch)
Release:	testing
Codename:	stretch

I just did this to reproduce the issue.

$ cd ~/Downloads
$ git clone https://github.com/torch/distro.git torch --recursive
$ cd torch
$ bash install-deps;
Only Jessie Debian 8 is supported for now, aborting.
$ ./install.sh
$ . /home/cadene/Downloads/torch/install/bin/torch-activate
$ th
> require 'cutorch'
...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:389: attempt to index a string value
stack traceback:
	[C]: in function 'error'
	...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	[string "_RESULT={require 'cutorch'}"]:1: in main chunk
	[C]: in function 'xpcall'
	...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: at 0x00405b60

@soumith
Member

soumith commented Jan 8, 2017

@Cadene can you help me debug this one?
Can you run:

luajit
> require 'cutorch'

Also, if that fails,

luajit -llibcutorch

@Cadene

Cadene commented Jan 8, 2017

$ luajit
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

 _____              _     
|_   _|            | |    
  | | ___  _ __ ___| |__  
  | |/ _ \| '__/ __| '_ \ 
  | | (_) | | | (__| | | |
  \_/\___/|_|  \___|_| |_|

JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'cutorch'
attempt to index a string value
stack traceback:
	[C]: at 0x7f079da81d00
	[C]: in function 'require'
	...e/Downloads/torch/install/share/lua/5.1/cutorch/init.lua:2: in main chunk
	[C]: in function 'require'
	stdin:1: in main chunk
	[C]: at 0x00405b60
th> ^C
$ luajit -llibcutorch
luajit: Torch internal problem: cannot find metatable for type <torch.Allocator>
stack traceback:
	[C]: at 0x7f6017543d00
	[C]: at 0x00463180
	[C]: at 0x00405b60

@soumith
Member

soumith commented Jan 8, 2017

oh. for some reason, there seems to be a global variable called "require" (i.e. _G.require) that is a string.
This is very strange.

@soumith
Member

soumith commented Jan 8, 2017

does this happen when loading any other package?
like:

require 'nn'

I will try to reproduce this somewhere.

@Cadene

Cadene commented Jan 8, 2017

same with cunn

...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:389: ...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:389: attempt to index a string value
stack traceback:
	[C]: in function 'error'
	...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	[string "_RESULT={require 'cunn'}"]:1: in main chunk
	[C]: in function 'xpcall'
	...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: at 0x00405b60	

@Cadene

Cadene commented Jan 8, 2017

nn, image, rnn, tds and torchnet all work.
What else could I try?

@soumith
Member

soumith commented Jan 8, 2017

hmmm. I think anything that triggers paths.require is failing.
Can you try:

paths.require('nn')

@soumith
Member

soumith commented Jan 8, 2017

And if that fails too, any chance you can give me ssh access to the machine? It will take me much longer to set up a Debian box.

All you will have to do is run a command on your machine to ssh into my server, so that I can get a reverse tunnel. Let's talk details on the torch Slack.

@Cadene

Cadene commented Jan 8, 2017

th> paths.require('nn')
module 'nn' not found
	no file '/home/cadene/.luarocks/lib/lua/5.1/nn.so'
	no file '/home/cadene/Downloads/torch/install/lib/lua/5.1/nn.so'
	no file '/home/cadene/Downloads/torch/install/lib/nn.so'
	no file '/home/cadene/torch-pascal/install/lib/nn.so'
	no file '/home/cadene/torch-pascal/install/lib/lua/5.1/nn.so'
	no file './nn.so'
	no file '/usr/local/lib/lua/5.1/nn.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
	[C]: in function 'require'
	[string "_RESULT={paths.require('nn')}"]:1: in main chunk
	[C]: in function 'xpcall'
	...ene/Downloads/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: at 0x00405b60	
                                                                      [0.0062s]	
th> require 'nn'
{
  VolumetricMaxUnpooling : {...}
[...]
  SpatialFractionalMaxPooling : {...}
}
                                                                      [0.1741s]	
th> paths.require('nn')
{
  VolumetricMaxUnpooling : {...}
[...]
  SpatialFractionalMaxPooling : {...}
}

@ruotianluo

ruotianluo commented Jan 13, 2017

Is there any progress on this issue?

I have a similar issue, but with a different error message.

$ luajit
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

 _____              _
|_   _|            | |
  | | ___  _ __ ___| |__
  | |/ _ \| '__/ __| '_ \
  | | (_) | | | (__| | | |
  \_/\___/|_|  \___|_| |_|

JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'cutorch'
...s/rluo/rluo/torch/install/share/lua/5.1/torch/Tensor.lua:104: bad argument #1 to 'rawset' (table expected, got nil)
stack traceback:
	[C]: in function 'rawset'
	...s/rluo/rluo/torch/install/share/lua/5.1/torch/Tensor.lua:104: in main chunk
	[C]: in function 'require'
	...nfs/rluo/rluo/torch/install/share/lua/5.1/torch/init.lua:155: in main chunk
	[C]: in function 'require'
	...s/rluo/rluo/torch/install/share/lua/5.1/cutorch/init.lua:1: in main chunk
	[C]: in function 'require'
	stdin:1: in main chunk
	[C]: at 0x004064f0

@soumith
Member

soumith commented Jan 13, 2017

@ruotianluo what OS? Ubuntu? Debian?

@ruotianluo

@soumith CentOS Linux release 7.2.1511 (Core)

@ruotianluo

@soumith So what actually causes this problem?
(BTW, I ran into it after updating to the latest torch, cutorch and cunn; I also tried a fresh install.)

@drimpossible

I came to this thread looking for a solution to this very issue. I am getting the same error after updating my torch, nn, cunn, cudnn and cutorch libs.

 
  ______             __   |  Torch7 
 /_  __/__  ________/ /   |  Scientific computing for Lua. 
  / / / _ \/ __/ __/ _ \  |  Type ? for help 
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch 
                          |  http://torch.ch 
	
th> require 'cunn'
.../ameya.prabhu/torch/install/share/lua/5.1/trepl/init.lua:389: .../ameya.prabhu/torch/install/share/lua/5.1/trepl/init.lua:389: attempt to index a string value
stack traceback:
	[C]: in function 'error'
	.../ameya.prabhu/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	[string "_RESULT={require 'cunn'}"]:1: in main chunk
	[C]: in function 'xpcall'
	.../ameya.prabhu/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...abhu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: at 0x00406670	
                                                                      [0.1723s]	
th> exit
Do you really want to exit ([y]/n)? y
ameya.prabhu@magnetar:~/MulLowBiVQA$ uname -a
Linux magnetar 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ameya.prabhu@magnetar:~/MulLowBiVQA$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty
ameya.prabhu@magnetar:~/MulLowBiVQA$

@soumith
Member

soumith commented Jan 14, 2017

this is so frustrating, i am not able to reproduce this issue anywhere.
If anyone gave me access to their machine via ssh where this reproduces, i can take a look

@drimpossible

drimpossible commented Jan 15, 2017

I can give you ssh access to my server. What's strange is that those commands run just fine on my personal desktop.
The only major difference I know of is the CUDA version: I have 8 on my personal desktop and 7.5 on the server. Does it occur only on servers running CUDA 7.5? I don't know the details, I'm afraid, but the errors seem to occur only when I try to load a CUDA-based library.

@soumith
Member

soumith commented Jan 15, 2017

okay, can you email me at [redacted] so we can figure out the ssh access details? No, it is not CUDA 7.5; I've already tested that.

@ruotianluo

@drimpossible I'm in almost the same situation, but my desktop also has CUDA 7.5.

@ruotianluo

@soumith Any progress?

@soumith
Member

soumith commented Jan 20, 2017

until i get a reproduction, i don't know how to fix it. any publicly reachable ssh access (so that i can log in) to a machine that has this problem would be helpful.

@ruotianluo

@soumith Using binary search, I found that the error doesn't appear if I roll back all the repositories to before 12.28, and that it does appear if I roll back to around 12.30.

Then I tried to find which exact commit in which package causes the error. It turns out that if I check out cutorch at commit 1ac0668, I get the error.
(I haven't checked the other packages.)
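
For anyone who wants to repeat this kind of search, here is a minimal sketch that automates it with git bisect. It is not necessarily how the search above was done: bisect_first_bad is a hypothetical helper name, and the test command shown in the comment (rebuild, then load cutorch) is an assumption to adapt to your own setup.

```shell
# Hypothetical helper: find the first bad commit between a known-good
# and a known-bad revision. The test command must exit 0 when the
# checked-out revision works and non-zero when it shows the error.
bisect_first_bad() {
  # $1: repo dir, $2: good rev, $3: bad rev, remaining args: test command
  repo=$1 good=$2 bad=$3
  shift 3
  (
    cd "$repo" || exit 1
    git bisect start "$bad" "$good" >/dev/null || exit 1
    git bisect run "$@" >/dev/null
    # Once the run finishes, refs/bisect/bad points at the first bad commit.
    git rev-parse refs/bisect/bad
    git bisect reset >/dev/null 2>&1
  )
}

# Example call (paths, revisions and test command are placeholders):
#   bisect_first_bad path/to/cutorch GOOD_REV BAD_REV \
#     sh -c 'luarocks make rocks/cutorch-scm-1.rockspec && luajit -e "require \"cutorch\""'
```

Each bisect step rebuilds the package, so this can be slow, but it only needs about log2(N) builds for N commits.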

@soumith
Member

soumith commented Jan 21, 2017

thanks for bisecting it. cc: @gchanan something broke on your commit.

@gchanan
Contributor

gchanan commented Jan 21, 2017

Great! Since I can't reproduce the issue, @ruotianluo can you revert the changes to init.lua and Tensor.lua from that commit separately and tell me if either (or both) fixes the issue?

@ruotianluo

@gchanan Reverting either one, or both, doesn't fix the issue.

@gchanan
Contributor

gchanan commented Jan 23, 2017

@ruotianluo okay, let me prepare a few other commits for you to try out. Thanks for helping track this down!

@gchanan
Contributor

gchanan commented Jan 24, 2017

@ruotianluo can you run "nvcc --version" -- what version does it say you are running?

@gchanan
Contributor

gchanan commented Jan 24, 2017

@ruotianluo can you try the following branches and tell me if any of them work? (they are all single commits off the commit you identified)
https://github.com/gchanan/cutorch/tree/torchgenericstorage
https://github.com/gchanan/cutorch/tree/genericstorage
https://github.com/gchanan/cutorch/tree/genericstoragetensor

@gchanan
Contributor

gchanan commented Jan 24, 2017

I should point out that these branches are just for testing "require 'cutorch'" -- functionality beyond that is expected to be broken.

@ruotianluo

ruotianluo commented Jan 25, 2017

@gchanan
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

torchgenericstorage and genericstorage don't work (same error).
genericstoragetensor gives the following error:
/torch/install/share/lua/5.1/cutorch/init.lua:19: attempt to index field 'HalfStorage' (a nil value)

@gchanan
Contributor

gchanan commented Jan 25, 2017

Can you try genericstoragetensor with init.lua and Tensor.lua rolled back as before?

@ruotianluo

It works.

@gchanan
Contributor

gchanan commented Jan 25, 2017

hmm, I'm still not sure what's going on here -- thanks for your continuing help.

Can you try https://github.com/gchanan/cutorch/tree/thchalfh ? (It shouldn't matter what you do with init.lua and Tensor.lua.)

@ruotianluo

This doesn't work.

@ruotianluo

None of these work.

@gchanan
Contributor

gchanan commented Jan 26, 2017

Something very strange is going on...like the symbol generation is getting mixed up between torch and cutorch.

Can you try https://github.com/gchanan/cutorch/tree/generateStorageTH?

@ruotianluo

Doesn't work either.

@gchanan
Contributor

gchanan commented Jan 30, 2017

@ruotianluo I sent you an e-mail; it would probably be more productive if we could find a time that works for both of us to sit in the torch gitter and debug in real time.

In any case, can you do the following?
Confirm this works: https://github.com/gchanan/cutorch/tree/genericstoragetensor (this is the same as the genericstoragetensor with the lua changes rolled back)

Then try:
https://github.com/gchanan/cutorch/tree/genericstoragetensor_gen
https://github.com/gchanan/cutorch/tree/genericstoragetensor_genseparate
https://github.com/gchanan/cutorch/tree/genericstoragetensor_genseparateHalf

@ruotianluo

Only genericstoragetensor_genseparate works.

@ruotianluo

Here is my confession about the cause of the problem 😭.

It turns out there was another, old torch installation on my system.
In my case, I had installed torch with luarocks install torch --local at some point. Since LUA_PATH puts the local folder first, th loads the libraries from the local folder.

So check whether you have an old torch installed somewhere on your LUA_PATH, @Cadene @drimpossible; it could be the same cause.

And thanks to gchanan for his help.
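
To see which folder wins, it can help to print the LUA_PATH templates in the order Lua tries them. A minimal sketch follows; lua_path_entries is a hypothetical helper name, not part of torch or luarocks.

```shell
# Print each LUA_PATH template on its own line, in search order.
# Lua tries the templates left to right, so an entry under ~/.luarocks
# that comes first shadows a newer install listed later.
lua_path_entries() {
  # $1: a LUA_PATH-style string; templates are separated by ';'
  # Empty entries (for example from a ';;' default marker) are dropped.
  printf '%s\n' "$1" | tr ';' '\n' | sed '/^$/d'
}

lua_path_entries "${LUA_PATH:-}"
```

If a ~/.luarocks entry appears above the entries of the install you actually want, a stale copy there is a likely suspect.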

@drimpossible

drimpossible commented Feb 3, 2017

I tried cleaning up the things mentioned above and ran into a lot more problems, so I can't pinpoint the cause precisely, but more or less it was an old torch installation. Path problems compounded the issue too. It works fine now. The comment above really helped. Thanks @ruotianluo

@philgyford

@ruotianluo I think I have a similar problem - at some point I installed a version of torch that didn't work. I've tried again and got this far but am getting these errors. However, I'm not sure what my LUA_PATH should be, or where it's set! Any pointers? Currently I get:

-bash: /Users/phil/.luarocks/share/lua/5.1/?.lua;/Users/phil/.luarocks/share/lua/5.1/?/init.lua;/Users/phil/torch/install/share/lua/5.1/?.lua;/Users/phil/torch/install/share/lua/5.1/?/init.lua;./?.lua;/Users/phil/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua: No such file or directory

I can't work out what shouldn't be there... the couple of bits I've tried deleting just result in the same or different errors...

@ruotianluo

@philgyford don't change your LUA_PATH; just delete your other versions.

@philgyford

@ruotianluo Thanks, but it was a while ago and I don't know exactly what was installed where...

@ruotianluo

@philgyford then I guess you need to search through the LUA_PATH to see which directory it's in. Just to make sure: you did at least reinstall the latest torch somewhere, right?
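
One way to do that search is to expand every LUA_PATH template for the module and list the files that actually exist. This is a rough sketch under a few assumptions: find_lua_module is a hypothetical helper name, it only handles module names without dots, and it only looks at Lua files (compiled parts such as libcutorch are looked up through LUA_CPATH, which uses the same template syntax).

```shell
# For each template in a LUA_PATH-style string, substitute the module
# name for '?' and print the candidates that exist on disk, in the
# order Lua would try them. The first line printed is the one loaded.
find_lua_module() {
  # $1: module name, e.g. torch; $2: a LUA_PATH-style string
  printf '%s\n' "$2" | tr ';' '\n' | while IFS= read -r template; do
    [ -n "$template" ] || continue
    # Lua replaces every '?' in the template with the module name.
    candidate=$(printf '%s\n' "$template" | sed "s|?|$1|g")
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
}

find_lua_module torch "${LUA_PATH:-}"
```

If more than one line is printed, every line after the first is a copy that is being shadowed, and the first one is worth checking against the install you expect. Note that the variable has to be printed with something like echo "$LUA_PATH"; typing $LUA_PATH on its own makes bash try to run the value as a command.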

@philgyford

@ruotianluo Yes, I recently installed it in /Users/phil/torch/, but I'm not sure where the previous one is. Thanks anyway.

@sohamirian

I get this error:

/home/myName/torch/install/bin/lua: /home/myNmae/torch/install/share/lua/5.2/trepl/init.lua:389: /home/myName/torch/install/share/lua/5.2/hdf5/ffi.lua:73: expected align(#) on line 689
stack traceback:
[C]: in function 'error'
/home/myNmae/torch/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
extract_features.lua:4: in main chunk
[C]: in function 'dofile'
...9533/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
Could someone help me figure out where the problem is?
