segfault on agglomeration classifier training with multiple iterations #4

Open
michielkleinnijenhuis opened this issue Mar 1, 2016 · 9 comments

Comments

@michielkleinnijenhuis

Hi there,

I'm getting a segfault when training the agglomeration classifier with multiple iterations (see the output below). It occurs with the example dataset as well as with my own data. Both <--strategy-type 1> and <--strategy-type 2 --num-iterations 1> run fine. I'm on OSX 10.10.5, but the behaviour on Linux 2.6.32-279.5.1.el6.x86_64 is identical. Could you please have a look at why this occurs?

Thanks,
Michiel

neuroproof_graph_learn \
training_sample2/oversegmented_stack_labels.h5 \
training_sample2/boundary_prediction.h5 \
training_sample2/groundtruth.h5 \
--classifier-name training_sample2/classifier_str2.xml \
--strategy-type 2 --num-iterations 2

ignore features: 

 ** Learning iteration 1  **

Learn edge classifier ...
Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
ignore features:0 39 40 49 55 95 110 140 141 149 150 158 159 165 185 190 
Features generated
Number of samples and dimensions: 13643, 175
Number of merge: 4840
Time required to learn RF: 53.00 sec
with training set accuracy :99.054
Classifier learned
accuracy = 99.0545
done with 3051 nodes

 ** Learning iteration 2  **

Learn edge classifier ...
cumulative learning, all

Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
Segmentation fault: 11
@paragt
Contributor

paragt commented Mar 1, 2016

Hi Michiel,

Thanks for using Neuroproof. From a quick look, it seems the learning function is trying to determine which features are not very informative, based on the feature value distribution. We do not use this trick and commented out the line invoking this function in newer versions. Could you please check whether the following line is commented out or deleted in your BioPriors/StackLearnAlgs.cpp?

// if (prune_feature) feature_mgr->find_useless_features(all_features);

Please delete any txt file that was created by this command as well.

Let me know if the error persists after removing this line. Note also that, if you are not using a mitochondria channel in your pixelwise prediction, the use_mito bool value should be false.

Thanks

--Toufiq

@michielkleinnijenhuis
Author

michielkleinnijenhuis commented Mar 1, 2016

Hi Toufiq,

Thanks for your fast response. I hope the following gives you a bit more info…

  • I’m running neuroproof through the conda recipe (‘conda list’ output below), so I’m not immediately able to simply comment out the line and check. However ...
  • I’ve created a new anaconda env with the latest neuroproof version (1.2.1), but I get the same segfault.
  • I indeed made the mistake of omitting the --use_mito flag, but adding it still results in the segfault.
  • I guess it is also relevant to mention that for the conda install, I have to cd to $(conda info --root)/envs/neuroproof to get it to work in the first place. From any other location:
    (neuroproof-test)ws133:envs michielk$ neuroproof_graph_learn $datadir/training_sample2/oversegmented_stack_labels.h5 $datadir/training_sample2/boundary_prediction.h5 $datadir/training_sample2/groundtruth.h5 --classifier-name $datadir/training_sample2/classifier_str2.xml --strategy-type 2 --num-iterations 5 --use_mito 0
    dyld: Library not loaded: lib/libopencv_core.2.4.dylib
    Referenced from: /Users/michielk/anaconda/envs/neuroproof-test/lib/./libopencv_ml.2.4.dylib
    Reason: image not found
    Trace/BPT trap: 5
    So it seems to have trouble finding the correct libraries. However, this applies only to my local install on Mac, not to the Linux install on our HPC cluster, which throws the same segfault.
  • I have checked out the full git repo, but haven’t got it to work yet… I’ll keep you posted on that.

Thanks,
Michiel

(neuroproof)ws133:envs michielk$ conda list

packages in environment at /Users/michielk/anaconda/envs/neuroproof:

boost 1.55.0 4 flyem
cloog 0.18.0 0 defaults
curl 7.43.0 1 defaults
fftw 3.3.4 1 flyem
freetype 2.5.2 2 http://repo.continuum.io/pkgs/free/osx-64/freetype-2.5.2-2.tar.bz2
gcc 4.8.2 5 defaults
gmp 5.1.2 6 defaults
hdf5 1.8.14 0 defaults
isl 0.12.2 1 defaults
jpeg 8d 1
jsoncpp 1.6.2 1 flyem
krb5 1.13.2 0 defaults
libdvid-cpp 0.1 np19py27_5 flyem
libgcc 4.8.4 1 flyem
libpng 1.6.17 0 http://repo.continuum.io/pkgs/free/osx-64/libpng-1.6.17-0.tar.bz2
libtiff 4.0.2 1
libxml2 2.9.2 0 http://repo.continuum.io/pkgs/free/osx-64/libxml2-2.9.2-0.tar.bz2
lz4 128 1 flyem
mpc 1.0.1 0 defaults
mpfr 3.1.2 0 defaults
neuroproof 1.1 py27_9 flyem
nose 1.3.7 py27_0 http://repo.continuum.io/pkgs/free/osx-64/nose-1.3.7-py27_0.tar.bz2
numpy 1.9.2 py27_0 http://repo.continuum.io/pkgs/free/osx-64/numpy-1.9.2-py27_0.tar.bz2
opencv 2.4.10.1 1 flyem
openssl 1.0.1k 1 http://repo.continuum.io/pkgs/free/osx-64/openssl-1.0.1k-1.tar.bz2
pip 7.1.0 py27_0 defaults
python 2.7.10 0 http://repo.continuum.io/pkgs/free/osx-64/python-2.7.10-0.tar.bz2
qt 4.8.6.99 1 flyem
readline 6.2 2
setuptools 18.0.1 py27_0 defaults
sqlite 3.8.4.1 1 http://repo.continuum.io/pkgs/free/osx-64/sqlite-3.8.4.1-1.tar.bz2
tk 8.5.18 0 http://repo.continuum.io/pkgs/free/osx-64/tk-8.5.18-0.tar.bz2
vigra 1.10 8_5dde887 flyem
vtk 5.10.1.99 with_pyqt_5 flyem
zlib 1.2.8 0 http://repo.continuum.io/pkgs/free/osx-64/zlib-1.2.8-0.tar.bz2

@paragt
Contributor

paragt commented Mar 1, 2016

Hi Michiel,

Glad to help with your project, and thanks for the info you provided. I forgot about the new conda installation. I made some changes to neuroproof after I received your email and ran into some problems in make test; we are working to fix those now. Please bug me in a couple of days if I haven't got back to you by then.

Thanks

--Toufiq

@michielkleinnijenhuis
Author

Okay, thanks. Your help is much appreciated.

Michiel

@paragt
Contributor

paragt commented Mar 1, 2016

Hi Michiel,

I think the problem is fixed now. You can follow the instructions below to check out the git repo and build it yourself with conda. Please let me know if you have any problems with this.

Thanks

--Toufiq
PS: conda info --root prints the root folder of your conda installation; the environments are saved in its envs subfolder.

# Set up a conda environment with all dependencies
conda create -n myenv -c flyem neuroproof
source activate myenv
PREFIX=$(conda info --root)/envs/myenv
export LD_LIBRARY_PATH=${PREFIX}/lib # Linux
export DYLD_FALLBACK_LIBRARY_PATH=${PREFIX}/lib # Mac

# Discard the downloaded binary; we'll build our own.
conda remove neuroproof

# Clone and build
git clone https://github.com/janelia-flyem/neuroproof
cd neuroproof
./configure-for-conda.sh ${PREFIX}
cd build
make -j4
make install
make test

(Edited to correct PREFIX as mentioned below.)

@michielkleinnijenhuis
Author

Hi Toufiq,

Regarding the install:
All tests passed. However, to get it going:

  1. Your line in the email below should probably read (this is also in the README)
    PREFIX=$(conda info --root)/envs/myenv
    instead of
    PREFIX=$(conda info --root)/myenv
  2. I had to use an adapted build.sh, as I’m on Xcode7, which does not include the requested MacOSX10.10.sdk (see output NP-bug_make_before-adapted-buildscript.output attached). May I suggest that you add the following two lines to your build.sh (as found in https://forums.developer.apple.com/thread/17334)? That worked for me.
    -DCMAKE_OSX_DEPLOYMENT_TARGET:STRING="" \
    -DCMAKE_OSX_SYSROOT:STRING=/ \

Regarding the segfault:
Unfortunately, it’s still there! That is on Mac; I haven’t tested on the cluster yet.

best wishes,
Michiel

@thouis

thouis commented Mar 16, 2016

We're seeing the same thing on Linux, at commit 23992fa.

@LeeKamentsky

I think the problem occurs because a deleted edge is being revisited here:
https://github.com/janelia-flyem/NeuroProof/blob/master/src/Algorithms/FeatureJoinAlgs.h#L127

I patched the code with the following test at that line and it ran to completion past the segfault:

        if (((*iter)->get_node1() == node_remove) ||
            ((*iter)->get_node2() == node_remove))
            continue;

That should test for the edge that's been merged out and should avoid reinserting it into the queue and trying to use the deleted node_cache for node_remove.
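
To illustrate the general pattern behind this guard, here is a minimal, self-contained sketch of priority-queue agglomeration that skips stale edges instead of touching a node that has already been merged away. It is a toy model and not NeuroProof's code: Edge, LowestWeightFirst, make_key and agglomerate are hypothetical names, and the alive flag plus the weight comparison stand in for the node_remove check in the patch above.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; these are not NeuroProof's classes.
struct Edge {
    int node1;
    int node2;
    double weight;  // lower weight is merged first
};

struct LowestWeightFirst {
    bool operator()(const Edge& a, const Edge& b) const {
        return a.weight > b.weight;
    }
};

using EdgeKey = std::pair<int, int>;

static EdgeKey make_key(int a, int b) {
    return a < b ? EdgeKey(a, b) : EdgeKey(b, a);
}

// Greedily merges nodes along edges whose weight is below `threshold`.
// Returns the number of nodes that survive.
int agglomerate(int num_nodes, const std::vector<Edge>& input, double threshold) {
    std::vector<bool> alive(num_nodes, true);
    std::map<EdgeKey, double> graph;  // current edges and their weights
    std::priority_queue<Edge, std::vector<Edge>, LowestWeightFirst> queue;

    for (const Edge& e : input) {
        assert(e.node1 != e.node2);  // self-loops are not supported in this toy
        graph[make_key(e.node1, e.node2)] = e.weight;
        queue.push(e);
    }

    int survivors = num_nodes;
    while (!queue.empty()) {
        Edge e = queue.top();
        queue.pop();
        if (e.weight >= threshold) break;

        // Guard: the queue may still hold entries for edges that were merged
        // out. Skip them instead of using the removed node's data.
        if (!alive[e.node1] || !alive[e.node2]) continue;
        auto it = graph.find(make_key(e.node1, e.node2));
        if (it == graph.end() || it->second != e.weight) continue;

        const int node_keep = e.node1;
        const int node_remove = e.node2;
        graph.erase(it);
        alive[node_remove] = false;
        --survivors;

        // Collect and erase the remaining edges of node_remove.
        std::vector<std::pair<int, double>> moved;
        for (auto g = graph.begin(); g != graph.end();) {
            const int a = g->first.first;
            const int b = g->first.second;
            if (a == node_remove || b == node_remove) {
                moved.emplace_back(a == node_remove ? b : a, g->second);
                g = graph.erase(g);
            } else {
                ++g;
            }
        }

        // Rewire them onto node_keep, keeping the lower weight on collisions,
        // and queue the rewired edges.
        for (const auto& m : moved) {
            const EdgeKey key = make_key(node_keep, m.first);
            double w = m.second;
            auto existing = graph.find(key);
            if (existing != graph.end() && existing->second < w) w = existing->second;
            graph[key] = w;
            queue.push(Edge{node_keep, m.first, w});
        }
    }
    return survivors;
}
```

For example, agglomerate(4, {{0, 1, 0.1}, {1, 2, 0.2}, {2, 3, 0.9}}, 0.5) merges nodes 1 and 2 into node 0 and leaves two nodes. The stale queue entry for the original edge between 1 and 2 is popped after node 1 is gone and is skipped by the guard.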

@stephenplaza
Contributor

Thanks for your contribution, Lee! If you want, please create a pull request; assuming the integration tests pass, I will accept it.
