Interrupted by signal 11:SIGSEGV #2

Open
amagooda opened this issue Jul 18, 2017 · 12 comments
Comments

@amagooda

I got the error "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
when trying to reproduce the results on the ukp dataset.

The problem appears while running exp_train_test.py using the arguments "ukp --method rnn-struct --model strict [--dynet-seed=42]"

The console output is as follows:

[dynet] random seed: 3694361057
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-18 12:27:07,154 - root - INFO - rnn-struct strict on ukp ({'max_iter': 10, 'mlp_dropout': 0.15})
2017-07-18 12:27:13,659 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-18 12:27:13,660 - root - INFO - Setting link class weights False: 1.0, True: 4.725530458590007
2017-07-18 12:27:13,660 - root - INFO - Overriding n_embeds to glove size 300
2017-07-18 12:27:13,671 - root - INFO - Initializing embeddings...
2017-07-18 12:27:13,799 - root - INFO - ...done

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Do you know what could be causing this problem? I am using dynet v1.1.
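For reference, exit code 139 follows the common shell convention of 128 plus the signal number, so 139 corresponds to signal 11. The sketch below decodes such a code with the standard `signal` module; the helper name `describe_exit_code` is made up for illustration, and the mapping of 11 to SIGSEGV assumes Linux.

```python
import signal


def describe_exit_code(code):
    """Return the signal name if `code` looks like 128 + N, else None.

    Hypothetical helper: it only applies the 128 + N convention and
    does not know anything about the program that exited.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


print(describe_exit_code(139))
```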

@amagooda changed the title from "Interrupted by signal l1:SIGSEGV" to "Interrupted by signal 11:SIGSEGV" on Jul 18, 2017
@vene
Owner

vene commented Jul 18, 2017

Hi!

I have never seen this particular issue before, but we should be able to get to the bottom of it.

First: this probably has nothing to do with the error, but you should not include the brackets in [--dynet-seed=42]. In the usage string, the brackets are a convention denoting that the argument is optional. As evidence, the first line of output should not say [dynet] random seed: 3694361057 but [dynet] random seed: 42 if the seed is set correctly. Try removing the brackets.
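To illustrate why the brackets matter, here is a toy sketch of prefix-based flag matching. It is not DyNet's actual argument parser; `find_seed` is a hypothetical helper that only shows that a token beginning with `[` no longer looks like the `--dynet-seed=` flag, so the seed falls back to a random value.

```python
def find_seed(argv, flag="--dynet-seed"):
    """Return the integer seed passed as `--dynet-seed=N`, or None.

    Hypothetical stand-in for a command-line parser, for illustration only.
    """
    prefix = flag + "="
    for token in argv:
        if token.startswith(prefix):
            return int(token[len(prefix):])
    return None


with_brackets = ["ukp", "--method", "rnn-struct", "--model", "strict", "[--dynet-seed=42]"]
without_brackets = ["ukp", "--method", "rnn-struct", "--model", "strict", "--dynet-seed=42"]

print(find_seed(with_brackets))
print(find_seed(without_brackets))
```

In this sketch the bracketed form yields no seed, while the plain form yields 42.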

Second, have you tried other configurations, for instance --model=bare, --model=full, or --method=rnn --model=bare? Do those also trigger the issue?

Finally, to pinpoint what triggers the segfault, could you try turning on the Python debugger by adding the line import pdb; pdb.set_trace() to the exp_train_test.py file, and then, when running it, proceeding step-by-step using s until you encounter the error, and then let me know what line the error occurs at?
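As a possible complement to stepping with pdb, Python 3's standard `faulthandler` module can print the Python-level traceback when the interpreter receives SIGSEGV. This is not something that was tried in this thread; the sketch below assumes Python 3.7+ on Linux and uses a deliberately crashing child process (`ctypes.string_at(0)`) as a stand-in for the real failure.

```python
import subprocess
import sys

# Child program: enable faulthandler, then force a segfault by
# reading from address 0. This stands in for the real crash.
CHILD = (
    "import faulthandler, ctypes\n"
    "faulthandler.enable()\n"
    "ctypes.string_at(0)\n"
)


def run_crashing_child():
    """Run the crashing child and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


result = run_crashing_child()
print("return code:", result.returncode)
print(result.stderr)
```

In a real script, the equivalent would be calling `faulthandler.enable()` near the top of the entry point, or running the interpreter with `-X faulthandler`, so that the traceback shows which Python line was executing when the crash happened.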

The error should come from either ad3, dynet, or (very unlikely) pystruct. Could you also tell me how you installed all these 3 libs?

Thanks!

@amagooda
Author

So, I ran it using --method=rnn --model=bare and it worked.

I tried tracing the code to find the line that triggers the issue, and I think this is the one:

y_hat, status = self._inference(doc, potentials, relaxed=True,
                                exact=self.exact_inference,
                                constraints=self.constraints)

This is line 500 in argrnn.py.

@vene
Owner

vene commented Jul 19, 2017

Thanks, your analysis is great!

Both signs point to the fact that the AD3 inference is the culprit. In particular, --method=rnn --model=bare does not use AD3 inference at all, which is why you don't see the error.

At the moment marseille requires a few changes in the ad3 python wrapper, so the current release from the website you linked does not work. Please uninstall your current version of ad3 and then install the one from my fork here. I am working on making a new release of ad3 more easily available and easier to install. If you are having issues installing the version from my fork, let me know. Thanks!

@amagooda
Author

I installed the AD3 version you sent me, but I am still facing the same issue when running the "strict" variant.

@vene
Owner

vene commented Jul 22, 2017

Hmm, maybe there are some issues with your AD3 install. Can you try running the AD3 python examples and the python unit tests?

It might be worth trying to install all the dependencies in a fresh, empty virtualenv to make sure that old versions are not accidentally used.
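A quick way to check which environment and which package versions are actually in use is sketched below. It assumes Python 3.8+ (for `importlib.metadata`) and an environment created with `venv` or a recent `virtualenv`; the distribution names passed in are assumptions and may need adjusting to match how the packages were installed.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """True if the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("inside virtualenv:", in_virtualenv())
# Distribution names below are assumptions, not taken from the project's docs.
for name, version in installed_versions(["ad3", "dynet", "pystruct"]).items():
    print(name, version or "not installed")
```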

@amagooda
Author

I made sure that I am using the fresh installation of AD3, then
I ran two examples (example.py and example_grid_diversity.py).
I also ran the two test files (test_basic.py and test_pystruct.py).

And everything works just fine.

@vene
Owner

vene commented Jul 25, 2017

Yet the error with Marseille is still there?

This is odd. It would be great if you could still try installing everything in a fresh virtualenv. What OS are you using?

@amagooda
Author

Linux, Ubuntu

@vene
Owner

vene commented Jul 25, 2017

That is exactly the same as what I am using, so it is probably not about that. Let me know what the results are in a fresh virtualenv.

BTW, what happens if you use cdcp instead of ukp (but still with rnn-struct strict)? How about the linear-struct strict models?

@amagooda
Author

I haven't tried cdcp yet, but I tried the linear-struct strict model. It fails too; the output is as follows:

[dynet] random seed: 2656436439
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-27 18:44:48,226 - root - INFO - linear-struct strict on ukp ({'C': 0.03})
2017-07-27 18:46:24,845 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-27 18:46:24,845 - root - INFO - Setting link class weights False: 1.0, True: 4.801313628899836
2017-07-27 18:46:24,845 - root - INFO - Joint feature size: 29033
Iteration 0

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

@vene
Owner

vene commented Jul 29, 2017

I just tried making an empty virtualenv and installing all the dependencies from scratch, and I still could not reproduce this problem.

What version of python are you using?

When you stepped through the code via the debugger, did it manage to get through any documents before crashing, or does it crash at the very first call to inference?
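One way to answer this without stepping manually is to write a flushed "breadcrumb" line before each call, so the last line in the log identifies the item being processed when the process died. The sketch below is generic and not taken from the marseille code; `run_with_breadcrumbs` and `fake_inference` are hypothetical names, and a Python exception stands in for the segfault.

```python
import io


def run_with_breadcrumbs(items, process, log):
    """Call process(item) for each item, logging before and after each call.

    Each line is flushed immediately so that it is not lost in Python's
    buffer if the process is killed mid-call.
    """
    for index, item in enumerate(items):
        log.write("starting item {}\n".format(index))
        log.flush()
        process(item)
        log.write("finished item {}\n".format(index))
        log.flush()


def fake_inference(item):
    # Stand-in for the real inference call; fails on the third item.
    if item == "doc-2":
        raise RuntimeError("simulated crash")


log = io.StringIO()
try:
    run_with_breadcrumbs(["doc-0", "doc-1", "doc-2"], fake_inference, log)
except RuntimeError:
    pass

print(log.getvalue())
```

With a real segfault the log would be a file on disk (or stderr) rather than an in-memory buffer, since nothing in the process survives the crash.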

In any case, I am working on making AD3 a bit safer with respect to naked (unchecked) memory accesses, which might help pinpoint what's going on here. I plan to make a new release soon.

@vene
Owner

vene commented Aug 1, 2017

I just released AD3 v2.1 which can be installed with pip install --upgrade ad3. Would you mind trying again using this release?
