cBioLab/hash_cdbg

hash_cdbg

A C++ library for indexing genome sequencing datasets using a colored de Bruijn graph, hash functions, and a Bloom filter. The implementation is based on this library by Diego Diaz Dominguez et al.

Requirements

This tool requires:

Support

  • Ubuntu 18.04

Installation

First, download the library and move to the library's root directory.

git clone git@github.com:cBioLab/hash_cdbg.git
cd hash_cdbg

Then, prepare for compilation.

mkdir build && cd build
cmake ..

If you want to specify the directory in which to install this library, you can use:

cmake .. -DCMAKE_INSTALL_PREFIX={your_install_path}/hash_cdbg

Finally, compile and install the library.

make && make install

Getting Started

To get started quickly, look in the util directory. build_cdbg.cpp is a program that builds an index; its contents are as follows:

#include <iostream>
#include <hash_cdbg/boss.hpp>

int main(int argc, char* argv[]) {
  std::string input_file = "data/example.fastq";
  size_t kmer_size = 30;
  size_t n_threads = 1;

  dbg_boss dbg_index(input_file, kmer_size, n_threads);
  store_to_file(dbg_index, "example.cdbg");

  return 0;
}

To compile and execute this code, do the following:

cd hash_cdbg
g++ -o build_cdbg.out ./util/build_cdbg.cpp -I {your_install_path}/include -L {your_install_path}/lib -lhash_cdbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
./build_cdbg.out

The resulting example.cdbg is the index file. To rebuild the original sequences from this index, do the following using build_fm_index.cpp and rebuild_seqs.cpp:

g++ -o build_fm_index.out ./util/build_fm_index.cpp -I {your_install_path}/include -L {your_install_path}/lib -lhash_cdbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz
./build_fm_index.out data/example.fastq example
g++ -o rebuild_seqs.out ./util/rebuild_seqs.cpp -I {your_install_path}/include -L {your_install_path}/lib -lhash_cdbg -lsdsl -ldivsufsort -ldivsufsort64 -lpthread -lz -std=c++17 -O3
./rebuild_seqs.out example.cdbg example.fm_index 1 example.re

The resulting example.re.fasta is a FASTA file that contains the rebuilt sequences of example.fastq and their reverse complements.
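
For readers unfamiliar with the term, the sketch below illustrates what the reverse complement of a read is. The helper name reverse_complement is hypothetical and is not part of the hash_cdbg API; it only assumes reads over the alphabet A, C, G, T, consistent with the restriction on N bases noted later in this README.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of hash_cdbg.
// Returns the reverse complement of a DNA string over {A, C, G, T}.
std::string reverse_complement(const std::string& seq) {
  // Reverse the read first, then complement each base in place.
  std::string rc(seq.rbegin(), seq.rend());
  for (char& c : rc) {
    switch (c) {
      case 'A': c = 'T'; break;
      case 'C': c = 'G'; break;
      case 'G': c = 'C'; break;
      case 'T': c = 'A'; break;
      default:
        // Any other symbol (including N) is rejected in this sketch.
        throw std::invalid_argument("unexpected base in sequence");
    }
  }
  return rc;
}
```

For example, reverse_complement("AACG") yields "CGTT".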

Reproduction of Our Experiments

If you want to reproduce our experiments, see the experiments README.

Try with Your Data

This tool does not support reads containing N bases. Run remove_n_read.cpp to remove reads containing N bases as a preprocessing step.

g++ -o remove_n_read.out ./util/remove_n_read.cpp -lpthread -std=c++17 -O3
./remove_n_read.out {your_fastq_file} {output_fastq_file}
