Build Error with HAL Using HDF5 #309

Open
liangminliu opened this issue Oct 30, 2024 · 5 comments

@liangminliu

Hi,

I am experiencing an issue while building HAL according to the instructions provided in the README. Initially, I used HDF5 version 1.14.5, but I encountered errors. Therefore, I switched to HDF5 version 1.10.1, as specified in the README. I have also successfully installed and configured the necessary dependencies, including SonLib, CLAPACK, and PhyloP. However, I am encountering an error during the make step after setting export ENABLE_PHYLOP=1.

Error Log:

h5c++ -prefix=~/biosoft/hdf5-hdf5_1.10.1 -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -g -Wall -funroll-loops -DNDEBUG -I/ds3200_1/users_root/liuliangmin/biosoft/sonLib/lib -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11 -Wno-sign-compare -I../api/inc -Iimpl -Iinc -I../liftover/inc -Ihdf5_impl -Immap_impl -c hdf5_impl/hdf5Genome.cpp -o ../objs/api/hdf5_impl/hdf5Genome.o
hdf5_impl/hdf5Genome.cpp: In constructor ‘hal::Hdf5Genome::Hdf5Genome(const string&, hal::Hdf5Alignment*, H5::PortableH5Location*, const H5::DSetCreatPropList&, bool)’:
hdf5_impl/hdf5Genome.cpp:49:28: error: ‘H5::PortableH5Location’ {aka ‘class H5::H5Location’} has no member named ‘openGroup’
         _group = h5Parent->openGroup(name);
                            ^~~~~~~~~
hdf5_impl/hdf5Genome.cpp:51:28: error: ‘H5::PortableH5Location’ {aka ‘class H5::H5Location’} has no member named ‘createGroup’; did you mean ‘createAttribute’?
         _group = h5Parent->createGroup(name);
                            ^~~~~~~~~~~
                            createAttribute
make[1]: *** [../rules.mk:19: ../objs/api/hdf5_impl/hdf5Genome.o] Error 1
make[1]: Leaving directory '/ds3200_1/users_root/liuliangmin/biosoft/hal/api'
make: *** [Makefile:13: api.libs] Error 2

Environment:

  • HDF5 version: 1.10.1 (switched from 1.14.5 due to initial errors)
  • Dependencies installed: SonLib, CLAPACK, PhyloP
  • Linux (LSF)

Steps to Reproduce:

  1. Install the required dependencies (HDF5 1.10.1, SonLib, CLAPACK, PhyloP).
  2. Set up the environment according to the README.
  3. Set export ENABLE_PHYLOP=1.
  4. Run make.

The build fails with errors about missing methods (openGroup and createGroup) in H5::PortableH5Location.

It appears that the methods openGroup and createGroup are not recognized as members of H5::H5Location. As specified in the environment setup, I am using HDF5 version 1.10.1.
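One way to double-check which HDF5 release a given install prefix really provides is to read the version macros from its H5public.h header. The following is a minimal sketch, not part of HAL: the function name hdf5_header_version is hypothetical, and the location of the header under include/ in the prefix assumes a standard HDF5 install layout.

```shell
# Hypothetical helper: print the HDF5 version recorded in an H5public.h header.
# It reads the H5_VERS_MAJOR / H5_VERS_MINOR / H5_VERS_RELEASE macros and
# fails (non-zero exit status) if any of the three is missing.
hdf5_header_version() {
    awk '
        $1 == "#define" && $2 == "H5_VERS_MAJOR"   { major = $3 }
        $1 == "#define" && $2 == "H5_VERS_MINOR"   { minor = $3 }
        $1 == "#define" && $2 == "H5_VERS_RELEASE" { release = $3 }
        END {
            if (major == "" || minor == "" || release == "") exit 1
            print major "." minor "." release
        }
    ' "$1"
}

# Example call (the path is an assumption based on the -prefix in the log):
# hdf5_header_version "$HOME/biosoft/hdf5-hdf5_1.10.1/include/H5public.h"
```

If the printed version differs from the one expected, the h5c++ wrapper on PATH may belong to a different HDF5 install than the one named in -prefix. Running h5c++ -showconfig should also print the build settings of the library the wrapper points to.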

Question:

  • Should I be using a different version of HDF5? I have already tried the latest version, 1.14.5.
  • Are there any patches or modifications required in the HAL code to address this problem?

Any help would be greatly appreciated.

@glennhickey
Collaborator

You can use cactus, which includes hdf5, as a model for making hdf5.

See the dockerfile for an example installing hdf5 from apt on ubuntu:

https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/Dockerfile#L3-L4

See this script for installing everything from source :

https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66

@liangminliu
Author

Thank you for your suggestions and support. I encountered errors while using Cactus. Initially, I installed Cactus and successfully ran the tests to verify its functionality. Here is the test command I used:

cactus ./js ./examples/evolverMammals.txt ./evolverMammals.hal

However, when I attempted to run my own dataset on the LSF system with the following command:

bsub -n 40 \
     -R "span[hosts=1]" \
     -M 700G \
     -o cactus_%J_output.log \
     -e cactus_%J_error.log \
     cactus ${JOB_STORE} ${INPUT_FILE} ${OUTPUT_FILE} \
     --root "Spic" \
     --logLevel INFO \
     --workDir ${TEMP_DIR} \
     --batchSystem lsf \
     --maxCores 40 \
     --defaultMemory 500G \
     --defaultDisk 300G \
     --retryCount 5 \
     --statePollingWait 60 \
     --statePollingTimeout 300 \
     --clean onSuccess

I encountered an error related to the Toil batch system. The error log showed the following traceback:

Traceback (most recent call last):
  ...
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.deferred] Removing own state file /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656/deferred/funckd2hdn12
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Deleting workflow directory /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.common] ... finished shutting down the batch system in 0.2768988609313965 seconds.
Traceback (most recent call last):
  File "/ds3200_1/users_root/liuliangmin/biosoft/cactus/venv-cactus-v2.9.2/lib/python3.11/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 279, in run
    while self._runStep():
          ^^^^^^^^^^^^^^^
  ...
toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineThreadException: Unexpected GridEngineThread failure
  ...
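The bsub submission above depends on four shell variables (JOB_STORE, INPUT_FILE, OUTPUT_FILE, TEMP_DIR) that are set outside the snippet. Because they are expanded unquoted, an empty one would silently shift the positional arguments passed to cactus. This is not claimed to be the cause of the failure shown here; the sketch below is only a generic pre-flight check, and the function name require_vars is hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if every named variable is
# set and non-empty; report each missing name on stderr.
require_vars() {
    rv_missing=0
    for rv_name in "$@"; do
        # eval gives a POSIX-portable indirect lookup; pass trusted names only.
        eval "rv_value=\${$rv_name:-}"
        if [ -z "$rv_value" ]; then
            echo "missing variable: $rv_name" >&2
            rv_missing=1
        fi
    done
    return "$rv_missing"
}

# Example: only submit when all four variables are defined.
# require_vars JOB_STORE INPUT_FILE OUTPUT_FILE TEMP_DIR && bsub ...
```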

This error appears to be related to Toil. Given that I am using the LSF system, I ultimately decided to split my data into three groups and use LAST to align them into roast.maf files due to the large number of genomic sequences. My plan is to convert these roast.maf files to HAL format using maf2hal, and then merge them with halAppendSubtree.

If you have any alternative methods or suggestions for resolving the Cactus runtime issues, or perhaps a more efficient way to merge roast.maf files, I would greatly appreciate your insights.

Since I already have Cactus installed in the venv-cactus-v2.9.2 environment, should I proceed to install HAL directly in this environment as well?

I appreciate your assistance in resolving this matter!

@diekhans
Collaborator

diekhans commented Oct 31, 2024 via email

@glennhickey
Collaborator

This issue seems to be going in all sorts of directions. To summarize:

  • If you want to build locally, you can use the Cactus scripts above as a guide
  • If you just want to use hal, then its binaries are included in the Cactus releases
  • If you are having problems with LSF, you can try following up on the Toil page like Mark says. But we only officially support SLURM in Cactus
  • I don't think last->maf->maf2hal->halAppendSubtree is a viable pipeline and advise against spending time trying to get it to work.

@liangminliu
Author

Thank you all for your suggestions and support!

  1. HAL Installation: I hit an error with hdf5 during the make install step (https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66), but I worked around it by running singularity exec cactus_v2.9.2.sif maf2hal, which worked well.

  2. Toil Issues with LSF: After some research, I found the LSF issues quite complex (https://github.com/DataBiosphere/toil/issues). So I switched to using the SLURM system, which has allowed me to run Cactus successfully. However, I'm looking for ways to improve the speed of my runs.

  3. Merging Roast.MAF Files: I understand there may be challenges regarding the LAST to MAF to HAL pipeline. Do you have alternative methods or suggestions for efficiently merging roast.maf files? I would greatly appreciate your insights.
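The container workaround from item 1 can be wrapped in a small shell function so that any HAL tool in the image is called the same way. This is a sketch under assumptions: the function name hal_tool and the CACTUS_SIF and DRY_RUN variables are hypothetical, and the image path defaults to the cactus_v2.9.2.sif file mentioned above.

```shell
# Hypothetical wrapper: run a HAL tool from the Cactus Singularity image.
# CACTUS_SIF (assumed name) overrides the image path; DRY_RUN=1 prints the
# command instead of executing it.
hal_tool() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo singularity exec "${CACTUS_SIF:-cactus_v2.9.2.sif}" "$@"
    else
        singularity exec "${CACTUS_SIF:-cactus_v2.9.2.sif}" "$@"
    fi
}

# Example (argument order for maf2hal is an assumption; check its --help):
# hal_tool maf2hal roast.maf roast.hal
```

Files outside the default bind paths may need to be made visible to the container, for example with the --bind option of singularity exec.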

Thank you again for your assistance!
