segfault on agglomeration classifier training with multiple iterations #4

michielkleinnijenhuis · 2016-03-01T14:08:24Z

Hi there,

I'm running into a segfault when training the agglomeration classifier with multiple iterations (see output below). It occurs with the example dataset as well as with my own data. Both <--strategy-type 1> and <--strategy-type 2 --num-iterations 1> run fine. I'm on OSX 10.10.5, but the behaviour on Linux 2.6.32-279.5.1.el6.x86_64 is identical. Could you please have a look at why this occurs?

Thanks,
Michiel

neuroproof_graph_learn \
training_sample2/oversegmented_stack_labels.h5 \
training_sample2/boundary_prediction.h5 \
training_sample2/groundtruth.h5 \
--classifier-name training_sample2/classifier_str2.xml \
--strategy-type 2 --num-iterations 2

ignore features: 

 ** Learning iteration 1  **

Learn edge classifier ...
Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
ignore features:0 39 40 49 55 95 110 140 141 149 150 158 159 165 185 190 
Features generated
Number of samples and dimensions: 13643, 175
Number of merge: 4840
Time required to learn RF: 53.00 sec
with training set accuracy :99.054
Classifier learned
accuracy = 99.0545
done with 3051 nodes

 ** Learning iteration 2  **

Learn edge classifier ...
cumulative learning, all

Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
Segmentation fault: 11

paragt · 2016-03-01T14:47:25Z

Hi Michiel,

Thanks for using Neuroproof. From a quick look, it seems the learning function is trying to determine which features are not very informative using the feature value distribution. We are not using this trick and commented out the line invoking this function in the newer versions. Could you please check whether the following line is commented out or deleted in your BioPriors/StackLearnAlgs.cpp?

// if (prune_feature) feature_mgr->find_useless_features(all_features);

Please delete any txt file that was created by this command as well.

Let me know if the error persists after removing this line. Note also that, if you are not using a mitochondria channel in your pixelwise prediction, the use_mito bool value should be false.

Thanks

--Toufiq

michielkleinnijenhuis · 2016-03-01T17:15:29Z

Hi Toufiq,

Thanks for your fast response. I hope the following gives you a bit more info…

I’m running neuroproof through the conda recipe (‘conda list’ output below), so I’m not immediately able to simply comment out the line and check. However ...
I’ve created a new anaconda env with the latest neuroproof version (1.2.1), but I get the same segfault.
I indeed made the mistake of omitting the --use_mito flag, but adding it still results in the segfault.
I guess it is also relevant to mention that for the conda install, I have to cd to $(conda info --root)/envs/neuroproof to get it to work in the first place. From any other location:
(neuroproof-test)ws133:envs michielk$ neuroproof_graph_learn $datadir/training_sample2/oversegmented_stack_labels.h5 $datadir/training_sample2/boundary_prediction.h5 $datadir/training_sample2/groundtruth.h5 --classifier-name $datadir/training_sample2/classifier_str2.xml --strategy-type 2 --num-iterations 5 --use_mito 0
dyld: Library not loaded: lib/libopencv_core.2.4.dylib
Referenced from: /Users/michielk/anaconda/envs/neuroproof-test/lib/./libopencv_ml.2.4.dylib
Reason: image not found
Trace/BPT trap: 5
So it seems to have trouble finding the correct libraries. However, this applies only to my local install on Mac; the Linux install on our HPC cluster does not have this library problem, yet it throws the same segfault.
I have checked out the full git repo, but haven’t got it to work yet… I’ll keep you posted on that.

Thanks,
Michiel

(neuroproof)ws133:envs michielk$ conda list

packages in environment at /Users/michielk/anaconda/envs/neuroproof:

boost 1.55.0 4 flyem
cloog 0.18.0 0 defaults
curl 7.43.0 1 defaults
fftw 3.3.4 1 flyem
freetype 2.5.2 2 http://repo.continuum.io/pkgs/free/osx-64/freetype-2.5.2-2.tar.bz2
gcc 4.8.2 5 defaults
gmp 5.1.2 6 defaults
hdf5 1.8.14 0 defaults
isl 0.12.2 1 defaults
jpeg 8d 1
jsoncpp 1.6.2 1 flyem
krb5 1.13.2 0 defaults
libdvid-cpp 0.1 np19py27_5 flyem
libgcc 4.8.4 1 flyem
libpng 1.6.17 0 http://repo.continuum.io/pkgs/free/osx-64/libpng-1.6.17-0.tar.bz2
libtiff 4.0.2 1
libxml2 2.9.2 0 http://repo.continuum.io/pkgs/free/osx-64/libxml2-2.9.2-0.tar.bz2
lz4 128 1 flyem
mpc 1.0.1 0 defaults
mpfr 3.1.2 0 defaults
neuroproof 1.1 py27_9 flyem
nose 1.3.7 py27_0 http://repo.continuum.io/pkgs/free/osx-64/nose-1.3.7-py27_0.tar.bz2
numpy 1.9.2 py27_0 http://repo.continuum.io/pkgs/free/osx-64/numpy-1.9.2-py27_0.tar.bz2
opencv 2.4.10.1 1 flyem
openssl 1.0.1k 1 http://repo.continuum.io/pkgs/free/osx-64/openssl-1.0.1k-1.tar.bz2
pip 7.1.0 py27_0 defaults
python 2.7.10 0 http://repo.continuum.io/pkgs/free/osx-64/python-2.7.10-0.tar.bz2
qt 4.8.6.99 1 flyem
readline 6.2 2
setuptools 18.0.1 py27_0 defaults
sqlite 3.8.4.1 1 http://repo.continuum.io/pkgs/free/osx-64/sqlite-3.8.4.1-1.tar.bz2
tk 8.5.18 0 http://repo.continuum.io/pkgs/free/osx-64/tk-8.5.18-0.tar.bz2
vigra 1.10 8_5dde887 flyem
vtk 5.10.1.99 with_pyqt_5 flyem
zlib 1.2.8 0 http://repo.continuum.io/pkgs/free/osx-64/zlib-1.2.8-0.tar.bz2

paragt · 2016-03-01T17:42:13Z

Hi Michiel,

Glad to help with your project, and thanks for the info you provided. I forgot about the new conda installation. I made some changes to neuroproof after I received your email and ran into some problems in make test; we are working to fix those now. Please bug me in a couple of days if I haven't gotten back to you by then.

Thanks

--Toufiq

michielkleinnijenhuis · 2016-03-01T17:49:51Z

Okay, thanks. Your help is much appreciated.

Michiel

paragt · 2016-03-01T18:38:06Z

Hi Michiel,

I think the problem is fixed now. You can follow the instructions below to check out the git repo and build it yourself with conda. Please let me know if you have problems with this.

Thanks

--Toufiq
PS: conda info --root prints the root folder of your conda installation; the environments are saved in its envs subfolder.

# Set up a conda environment with all dependencies
conda create -n myenv -c flyem neuroproof
source activate myenv
PREFIX=$(conda info --root)/envs/myenv
export LD_LIBRARY_PATH=${PREFIX}/lib # Linux
export DYLD_FALLBACK_LIBRARY_PATH=${PREFIX}/lib # Mac

# Discard the downloaded binary; we'll build our own.
conda remove neuroproof

# Clone and build
git clone https://github.com/janelia-flyem/neuroproof
cd neuroproof
./configure-for-conda.sh ${PREFIX}
cd build
make -j4
make install
make test

(Edited to correct PREFIX as mentioned below.)

michielkleinnijenhuis · 2016-03-02T10:36:40Z

Hi Toufiq,

Regarding the install:
All tests passed. However, to get it going:

The PREFIX line in your instructions should probably read (this is also in the README)
PREFIX=$(conda info --root)/envs/myenv
instead of
PREFIX=$(conda info --root)/myenv

I had to use an adapted build.sh, as I’m on Xcode7, which does not include the requested MacOSX10.10.sdk (see output NP-bug_make_before-adapted-buildscript.output attached). May I suggest that you add the following two lines to your build.sh (as found in https://forums.developer.apple.com/thread/17334)? That worked for me.
-DCMAKE_OSX_DEPLOYMENT_TARGET:STRING="" \
-DCMAKE_OSX_SYSROOT:STRING=/ \

Regarding the segfault:
Unfortunately, it’s still there! This is on Mac; I haven’t tested on the cluster yet.

best wishes,
Michiel

thouis · 2016-03-16T16:48:12Z

We're seeing the same thing on Linux, at commit 23992fa.

LeeKamentsky · 2016-12-06T14:13:51Z

I think the problem occurs because a deleted edge is being revisited here:
https://github.com/janelia-flyem/NeuroProof/blob/master/src/Algorithms/FeatureJoinAlgs.h#L127

I patched the code with the following test at that line and it ran to completion past the segfault:

        if (((*iter)->get_node1() == node_remove) ||
            ((*iter)->get_node2() == node_remove))
            continue;

That should test for the edge that's been merged out and should avoid reinserting it into the queue and trying to use the deleted node_cache for node_remove.
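
For illustration, here is a minimal, self-contained sketch of the same guard outside NeuroProof. It assumes simplified stand-in types: Node, Edge and edges_to_requeue are hypothetical names made up for this example and are not NeuroProof's actual classes. Only the get_node1()/get_node2() accessors and the node_remove name are taken from the patch above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the RAG types; illustrative only.
struct Node {
    int id;
};

struct Edge {
    Node* node1;
    Node* node2;
    Node* get_node1() const { return node1; }
    Node* get_node2() const { return node2; }
};

// Hypothetical helper: given the candidate edges after a merge, return only
// those that are safe to put back on the queue. Any edge that still has
// node_remove as an endpoint is skipped, mirroring the guard in the patch.
std::vector<Edge*> edges_to_requeue(const std::vector<Edge*>& candidates,
                                    const Node* node_remove) {
    assert(node_remove != nullptr);
    std::vector<Edge*> keep;
    for (auto iter = candidates.begin(); iter != candidates.end(); ++iter) {
        if (((*iter)->get_node1() == node_remove) ||
            ((*iter)->get_node2() == node_remove))
            continue;  // stale edge: one endpoint has been merged out
        keep.push_back(*iter);
    }
    return keep;
}
```

With three nodes a, b, c and edges a-b, b-c, a-c, calling edges_to_requeue with b as node_remove would keep only a-c, so nothing later dereferences data belonging to the removed node.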

stephenplaza · 2016-12-07T15:30:13Z

Thanks for your contribution, Lee! If you want, please create a pull request; assuming the integration tests pass, I will accept it.

LeeKamentsky pushed a commit to LeeKamentsky/NeuroProof that referenced this issue Dec 7, 2016

Fixes janelia-flyem#4 - make sure not to revisit the deleted edge

53a5055

weihuang527 mentioned this issue Mar 5, 2019

60% tests passed, 4 tests failed out of 10 #11