Interrupted by signal 11:SIGSEGV #2

amagooda · 2017-07-18T16:47:51Z

I get the error "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
when trying to reproduce the results on the ukp data set.

The problem appears when running exp_train_test.py with the arguments "ukp --method rnn-struct --model strict [--dynet-seed=42]"

The console output is as follows:

[dynet] random seed: 3694361057
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-18 12:27:07,154 - root - INFO - rnn-struct strict on ukp ({'max_iter': 10, 'mlp_dropout': 0.15})
2017-07-18 12:27:13,659 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-18 12:27:13,660 - root - INFO - Setting link class weights False: 1.0, True: 4.725530458590007
2017-07-18 12:27:13,660 - root - INFO - Overriding n_embeds to glove size 300
2017-07-18 12:27:13,671 - root - INFO - Initializing embeddings...
2017-07-18 12:27:13,799 - root - INFO - ...done

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Do you know what could be causing this problem? I am using dynet v1.1.
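
For reference, exit code 139 follows the common shell convention of reporting 128 plus the signal number when a process is killed by a signal, so 139 corresponds to signal 11. A minimal sketch of that arithmetic, where the helper name describe_exit_code is hypothetical and not part of marseille:

```python
import signal


def describe_exit_code(code):
    """Explain a shell-style exit code.

    Codes above 128 conventionally mean "killed by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return "terminated by signal {} ({})".format(signum, name)
    return "exited normally with status {}".format(code)


print(describe_exit_code(139))
```

A segmentation fault means native code touched memory it should not have, which is why it kills the interpreter without a Python traceback.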

vene · 2017-07-18T17:28:33Z

Hi!

I have never seen this particular issue before, but we should be able to get to the bottom of it.

First: this probably has nothing to do with the error, but you should not include the brackets in [--dynet-seed=42]. In the usage string, the brackets are a convention denoting that the argument is optional. As evidence, the first line of output should not say [dynet] random seed: 3694361057 but [dynet] random seed: 42 if the seed is set correctly. Try removing the brackets.
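
To illustrate the bracket point, here is a small sketch that spots arguments copied verbatim from a usage string. Both helper names are hypothetical, and the argument list simply mirrors the command quoted above:

```python
def is_bracketed(arg):
    """True if the argument still carries the [ ] from a usage string."""
    return len(arg) > 2 and arg.startswith("[") and arg.endswith("]")


def find_bracketed_args(argv):
    """Return the arguments that were pasted together with their brackets."""
    return [a for a in argv if is_bracketed(a)]


def strip_usage_brackets(argv):
    """Return a copy of argv with the surrounding brackets removed."""
    return [a[1:-1] if is_bracketed(a) else a for a in argv]


args = ["ukp", "--method", "rnn-struct", "--model", "strict", "[--dynet-seed=42]"]
print(find_bracketed_args(args))
print(strip_usage_brackets(args))
```

With the brackets left in, the option is not seen as --dynet-seed at all, which is consistent with the random seed shown in the log.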

Second, have you tried other configurations, for instance --model=bare, --model=full, or --method=rnn --model=bare? Do those also trigger the issue?

Finally, to pinpoint what triggers the segfault, could you try turning on the Python debugger by adding the line import pdb; pdb.set_trace() to the exp_train_test.py file, and then, when running it, proceeding step-by-step using s until you encounter the error, and then let me know what line the error occurs at?
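
As an alternative to single-stepping in pdb, and assuming Python 3.3 or newer, the standard-library faulthandler module can print the Python traceback at the moment of the crash. This is only a sketch: the helper name and the log file path are assumptions, not part of marseille.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write the Python traceback to log_path if the interpreter crashes.

    The returned file object must stay open for as long as the handler
    is active, so keep a reference to it.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, not only the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling crash_log = enable_crash_traceback("segfault_trace.log") near the top of exp_train_test.py should leave the offending Python line in that file after a SIGSEGV.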

The error should come from either ad3, dynet, or (very unlikely) pystruct. Could you also tell me how you installed all these 3 libs?

Thanks!

amagooda · 2017-07-19T15:41:39Z

So, I ran it using --method=rnn --model=bare and it worked.

I traced the code to find the line that triggers the issue, and I think it is this one, at line 500 in argrnn.py:

    y_hat, status = self._inference(doc, potentials, relaxed=True,
                                    exact=self.exact_inference,
                                    constraints=self.constraints)

Regarding installation:
dynet: I followed the manual installation process described at http://dynet.readthedocs.io/en/latest/python.html, after downloading version 1.1 instead of 2.
ad3: I installed the version from http://www.cs.cmu.edu/~ark/AD3/
pystruct: I installed it under Anaconda (using either pip or conda).

vene · 2017-07-19T20:41:43Z

Thanks, your analysis is great!

Both signs point to the fact that the AD3 inference is the culprit. In particular, --method=rnn --model=bare does not use AD3 inference at all, which is why you don't see the error.

At the moment marseille requires a few changes in the ad3 python wrapper, so the current release from the website you linked does not work. Please uninstall your current version of ad3 and then install the one from my fork here. I am working on making a new release of ad3 more easily available and easier to install. If you are having issues installing the version from my fork, let me know. Thanks!

amagooda · 2017-07-22T15:39:49Z

I installed the AD3 version you sent me, but I am still facing the same issue when running the "strict" variant.

vene · 2017-07-22T15:52:01Z

Hmm, maybe there are some issues with your AD3 install. Can you try running the AD3 python examples and the python unit tests?

It might be worth trying to install all the dependencies in a fresh, empty virtualenv to make sure that old versions are not accidentally used.
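
One way to check whether an old copy is being picked up is to ask Python where each library would be imported from. A rough sketch, assuming Python 3 and assuming the import names are ad3, dynet and pystruct; the helper locate_modules is hypothetical:

```python
import importlib.util


def locate_modules(names):
    """Map each module name to the file it would be imported from.

    The value is None when the module cannot be found at all.
    """
    locations = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        locations[name] = spec.origin if spec is not None else None
    return locations


if __name__ == "__main__":
    for name, path in locate_modules(["ad3", "dynet", "pystruct"]).items():
        print(name, "->", path)
```

If a path points outside the active virtualenv, or into a different site-packages directory than expected, a stale install is probably shadowing the new one.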

amagooda · 2017-07-25T15:18:33Z

I made sure that I am using the fresh installation of AD3, then
I ran two of the examples (example.py and example_grid_diversity.py)
and the two test files (test_basic.py and test_pystruct.py).

Everything works fine.

vene · 2017-07-25T15:24:15Z

Yet the error with Marseille is still there?

This is odd. It would be great if you could still try installing everything in a fresh virtualenv. What OS are you using?

amagooda · 2017-07-25T15:49:10Z

Linux, Ubuntu

vene · 2017-07-25T15:52:13Z

That is exactly the same as what I am using, so it is probably not about that. Let me know what the results are in a fresh virtualenv.

BTW, what happens if you use cdcp instead of ukp (but still with rnn-struct strict)? How about the linear-struct strict models?

amagooda · 2017-07-27T22:50:49Z

I haven't tried cdcp yet, but I did try the linear-struct strict model. It fails too; the output is as follows:

[dynet] random seed: 2656436439
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-27 18:44:48,226 - root - INFO - linear-struct strict on ukp ({'C': 0.03})
2017-07-27 18:46:24,845 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-27 18:46:24,845 - root - INFO - Setting link class weights False: 1.0, True: 4.801313628899836
2017-07-27 18:46:24,845 - root - INFO - Joint feature size: 29033
Iteration 0

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

vene · 2017-07-29T00:29:56Z

I just tried making an empty virtualenv and installing all the dependencies from scratch, and I still could not reproduce this problem.

What version of python are you using?

When you stepped through the code via the debugger, did it manage to get through any documents before crashing, or does it crash at the very first call to inference?
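
If stepping through in the debugger is too slow, the same question can be answered by logging before each inference call. The sketch below assumes Python 3.3+ and uses a hypothetical wrapper traced_inference around any per-document callable; it does not reflect how marseille actually structures its training loop.

```python
import sys


def traced_inference(infer, docs):
    """Call infer(doc) for each document, announcing the index first."""
    results = []
    for i, doc in enumerate(docs):
        # Flush right away: buffered output is lost when the process segfaults.
        print("inference on document {}".format(i), file=sys.stderr, flush=True)
        results.append(infer(doc))
    return results
```

The last index printed before the crash shows whether the very first document fails or only a later one does.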

In any case, I am working on making AD3 more robust against unchecked memory accesses, which might help pinpoint what is going on here. I plan to make a new release soon.

vene · 2017-08-01T16:46:58Z

I just released AD3 v2.1 which can be installed with pip install --upgrade ad3. Would you mind trying again using this release?
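
After upgrading, it may help to confirm which version the interpreter actually sees. A minimal sketch, assuming Python 3.8+ for importlib.metadata; the helper name installed_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ad3" is the distribution name used in the pip command above.
    print("ad3:", installed_version("ad3"))
```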

amagooda changed the title ~~Interrupted by signal l1:SIGSEGV~~ Interrupted by signal 11:SIGSEGV Jul 18, 2017
