Merge pull request #141 from bcgsc/gp_target_integration
Gp target integration
emilyyzhangg authored Sep 27, 2024
2 parents 982ff47 + a1f9ad6 commit b1ecd39
Showing 6 changed files with 65 additions and 52 deletions.
69 changes: 35 additions & 34 deletions README.md
@@ -15,7 +15,7 @@ GoldRush iterates through the input long reads to produce a "golden path" of rea
2. **[GoldPolish](https://github.com/bcgsc/goldpolish)** (aka GoldRush-Edit): polishing the genome
3. **[Tigmint-long](https://github.com/bcgsc/tigmint)**: correcting the genome
4. **[GoldChain](https://github.com/bcgsc/ntlink)** (aka GoldRush-Link): scaffolding the genome

5. **[GoldPolish-Target](https://github.com/bcgsc/goldpolish)**: targeted polishing of the genome


## Credits
@@ -37,51 +37,50 @@ goldrush run reads=reads G=gsize
Commands:
run run default GoldRush pipeline: GoldRush-Path + Polisher (GoldPolish by default) + Tigmint-long + ntLink (default 5 rounds)
goldrush-path run GoldRush-Path
path-polish run GoldRush-Path, then GoldPolish
path-tigmint run GoldRush-Path, then GoldPolish, then Tigmint-long
path-tigmint-ntLink run GoldRush-Path, then GoldPolish, then Tigmint-long, then ntLink (default 5 rounds)
run run default GoldRush pipeline: GoldRush-Path + Polisher (GoldPolish by default) + Tigmint-long + ntLink (default 5 rounds) + GoldPolish-Target
goldrush-path run GoldRush-Path
path-polish run GoldRush-Path, then GoldPolish
path-tigmint run GoldRush-Path, then GoldPolish, then Tigmint-long
path-tigmint-ntLink run GoldRush-Path, then GoldPolish, then Tigmint-long, then ntLink (default 5 rounds)
path-tigmint-ntLink-target run GoldRush-Path, then GoldPolish, then Tigmint-long, then ntLink (default 5 rounds), then GoldPolish-Target
General options (required):
reads read name [reads]. File must have .fq or .fastq extension, but do not include the suffix in the supplied read name
G haploid genome size (bp) (e.g. '3e9' for human genome)
reads read name [reads]. File must have .fq or .fastq extension, but do not include the suffix in the supplied read name
G haploid genome size (bp) (e.g. '3e9' for human genome)
General options (optional):
t number of threads [48]
z minimum size of contig (bp) to scaffold [1000]
track_time If 1 then track the run time and memory usage, if 0 then don't [0]
t number of threads [48]
z minimum size of contig (bp) to scaffold [1000]
track_time If 1 then track the run time and memory usage, if 0 then don't [0]
GoldRush-Path options:
k base k value to generate hash [22]
w weight of spaced seed (number of 1's) [16]
tile tile size [1000]
b during insertion, number of consecutive tiles to be inserted with the same ID [10]
u minimum number of unassigned tiles for the read to be considered unassigned [5]
a maximum number of tiles that can be assigned, minimum number of overlapping tiles kept after trimming [1]
o occupancy of the miBF [0.1]
x threshold for number of hits in miBF for a given frame to be considered assigned [10]
h number of seed patterns to use [3]
m minimum read length [20000]
M maximum number of silver paths to generate [5]
r ratio of full genome in golden path [0.9]
P minimum average phred score for each read [15]
d remove reads with a difference greater than or equal to d between the average phred quality of the first half and second half of the read [5]
p prefix to use for the output paths [goldrush_asm]
k base k value to generate hash [22]
w weight of spaced seed (number of 1's) [16]
tile tile size [1000]
b during insertion, number of consecutive tiles to be inserted with the same ID [10]
u minimum number of unassigned tiles for the read to be considered unassigned [5]
a maximum number of tiles that can be assigned, minimum number of overlapping tiles kept after trimming [1]
o occupancy of the miBF [0.1]
x threshold for number of hits in miBF for a given frame to be considered assigned [10]
h number of seed patterns to use [3]
m minimum read length [20000]
M maximum number of silver paths to generate [5]
r ratio of full genome in golden path [0.9]
P minimum average phred score for each read [15]
d remove reads with a difference greater than or equal to d between the average phred quality of the first half and second half of the read [5]
p prefix to use for the output paths [goldrush_asm]
Tigmint-long options:
span min number of spanning molecules [2]
dist maximum distance between alignments to be considered the same molecule [500]
span min number of spanning molecules [2]
dist maximum distance between alignments to be considered the same molecule [500]
ntLink options:
k_ntLink k-mer size for minimizers [40]
w_ntLink window size for minimizers [250]
rounds number of rounds of ntLink [5]
k_ntLink k-mer size for minimizers [40]
w_ntLink window size for minimizers [250]
rounds number of rounds of ntLink [5]
GoldPolish options:
polisher_mapper Whether to use ntlink or minimap2 for mappings [minimap2]
shared_mem Shared memory path where polishing occurs [/dev/shm]
shared_mem Shared memory path where polishing occurs [/dev/shm]
Notes:
- GoldRush-Path generates silver paths before generating the golden path
Expand Down Expand Up @@ -143,6 +142,8 @@ GoldRush has been tested on *Linux* operating systems (centOS7, ubuntu-20.04)
* [Tigmint 1.2.6+](https://github.com/bcgsc/tigmint)
* [ntLink 1.3.3+](https://github.com/bcgsc/ntlink)
* [minimap2](https://github.com/lh3/minimap2)
* [snakemake](https://github.com/snakemake/snakemake)
* [intervaltree](https://github.com/chaimleib/intervaltree)
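
The two dependencies added to this list can be checked before a run. The following is a minimal sketch, assuming a POSIX shell; the helper name `missing_tools` is hypothetical and not part of GoldRush. It only reports which command-line tools are absent from `PATH`. `intervaltree` is a Python package rather than an executable, so it is checked by importing it (shown as a comment, assuming `python3` is the interpreter of the active environment).

```shell
# Hypothetical helper (not part of GoldRush): print the name of every
# given tool that is not found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage (commented out so the block has no side effects):
#   missing_tools snakemake minimap2
#   python3 -c 'import intervaltree'   # Python package, checked by import
```

An empty output with a zero exit status means every listed executable was found.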

## Installation
### Installing using conda:
6 changes: 3 additions & 3 deletions azure-pipelines.yml
@@ -15,7 +15,7 @@ jobs:
- script: |
source activate goldrush_CI
conda install --yes -c conda-forge mamba python=3.10
mamba install --yes -c conda-forge -c bioconda compilers meson gperftools sdsl-lite boost-cpp sparsehash btllib libdivsufsort minimap2 tigmint ntlink miller
mamba install --yes -c conda-forge -c bioconda compilers meson gperftools sdsl-lite boost-cpp sparsehash btllib libdivsufsort minimap2 tigmint ntlink miller snakemake intervaltree
displayName: Install dependencies
- script: |
source activate goldrush_CI
@@ -47,8 +47,8 @@ jobs:
displayName: Create Anaconda environment
- script: |
source activate goldrush_CI
conda install --yes -c conda-forge mamba python=3.10
mamba install --yes -c conda-forge -c bioconda compilers meson gperftools sdsl-lite boost-cpp sparsehash btllib libdivsufsort minimap2 tigmint ntlink miller
conda install --yes -c conda-forge mamba=1.5.10 python=3.10
mamba install --yes -c conda-forge -c bioconda compilers meson gperftools sdsl-lite boost-cpp sparsehash btllib libdivsufsort minimap2 tigmint ntlink miller snakemake intervaltree
displayName: Install dependencies
- script: |
source activate goldrush_CI
38 changes: 25 additions & 13 deletions bin/goldrush
@@ -87,6 +87,12 @@ cut=250
k_ntLink=40
w_ntLink=250
rounds=5
soft_mask=True

# Default GoldPolish-Target parameters
target_flank_length=64
target_k_ntlink=88
target_w_ntlink=1000

# Development mode - retains intermediate files. Specify dev=True to enable.
dev=False
@@ -139,12 +145,13 @@ help:
@echo ""
@echo " Commands:"
@echo ""
@echo " run run default GoldRush pipeline: GoldRush-Path + Polisher (GoldPolish by default) + Tigmint-long + ntLink (default 5 rounds)"
@echo " run run default GoldRush pipeline: GoldRush-Path + Polisher (GoldPolish by default) + Tigmint-long + ntLink (default 5 rounds) + GoldPolish-Target"
@echo ""
@echo " goldrush-path run GoldRush-Path"
@echo " path-polish run GoldRush-Path, then $(polisher_logs)"
@echo " path-tigmint run GoldRush-Path, then $(polisher_logs), then Tigmint-long"
@echo " path-tigmint-ntLink run GoldRush-Path, then $(polisher_logs), then Tigmint-long, then ntLink (default 5 rounds)"
@echo " goldrush-path run GoldRush-Path"
@echo " path-polish run GoldRush-Path, then $(polisher_logs)"
@echo " path-tigmint run GoldRush-Path, then $(polisher_logs), then Tigmint-long"
@echo " path-tigmint-ntLink run GoldRush-Path, then $(polisher_logs), then Tigmint-long, then ntLink (default 5 rounds)"
@echo " path-tigmint-ntLink-target run GoldRush-Path, then $(polisher_logs), then Tigmint-long, then ntLink (default 5 rounds), then GoldPolish-Target"
@echo ""
@echo " General options (required):"
@echo " reads read name [reads]. File must have .fq or .fastq extension, but do not include the suffix in the supplied read name"
@@ -182,7 +189,6 @@ help:
@echo " rounds number of rounds of ntLink [$(rounds)]"
@echo ""
@echo " GoldPolish options:"
@echo " polisher_mapper Whether to use ntlink or minimap2 for mappings [$(polisher_mapper)]"
@echo " shared_mem Shared memory path where polishing occurs [/dev/shm] "
@echo ""
@echo "Notes:"
@@ -205,14 +211,15 @@ run:
ln -sf $(prefix)/$(p2).$(polished_infix).fa
ln -sf $(prefix)/$(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa
ln -sf $(prefix)/$(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.fa
ln -sf $(prefix)/$(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.polished.fa
echo "You can find intermediate files and the outputs for each GoldRush stage within the $(prefix) subdirectory."
echo "A soft link to your final assembly is available at: $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.fa"
echo "A soft link to your final assembly is available at: $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.polished.fa"

run-in-dir: path-tigmint-ntLink check-G check-reads clean
run-in-dir: path-tigmint-ntLink-target check-G check-reads clean
path-polish: $(polisher) check-G check-reads clean
path-tigmint: tigmint check-G check-reads clean

path-tigmint-ntLink: ntLink_all_rounds ntLink_softlink clean
path-tigmint-ntLink-target: goldpolish_target clean

check-G:
ifndef G
@@ -225,7 +232,6 @@ ifeq ($(long_reads),)
$(error $(ERROR_MESSAGE))
endif


# Run GoldRush-Path
goldrush-path: $(p2).fa check-G check-reads clean

@@ -282,13 +288,19 @@ $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa: $(p2).$(polished_inf
ntLink_all_rounds: $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).z$z.ntLink.gap_fill.$(rounds)rounds.fa check-G check-reads

%.fa.k$(k_ntLink).w$(w_ntLink).z$z.ntLink.gap_fill.$(rounds)rounds.fa: %.fa $(long_reads)
$(time) ntLink_rounds run_rounds_gaps target=$< t=$t k=$(k_ntLink) w=$(w_ntLink) z=$z rounds=$(rounds) reads=$(long_reads)
$(time) ntLink_rounds run_rounds_gaps target=$< t=$t k=$(k_ntLink) w=$(w_ntLink) z=$z soft_mask=$(soft_mask) rounds=$(rounds) reads=$(long_reads)
ifneq ($(dev), True)
ntLink_rounds clean target=$< t=$t k=$(k_ntLink) w=$(w_ntLink) z=$z rounds=$(rounds) reads=$(long_reads)
ntLink_rounds clean target=$< t=$t k=$(k_ntLink) w=$(w_ntLink) z=$z soft_mask=$(soft_mask) rounds=$(rounds) reads=$(long_reads)
endif

ntLink_softlink: $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.fa check-G check-reads

%.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.fa: %.k$(k_ntLink).w$(w_ntLink).z$z.ntLink.gap_fill.$(rounds)rounds.fa
ln -sf $(lastword $^) $@
echo "Done GoldRush-Path + $(polisher_logs) + Tigmint-long + $(rounds) ntLink rounds! Your final assembly can be found in: $@"
echo "Done GoldRush-Path + $(polisher_logs) + Tigmint-long + $(rounds) ntLink rounds! Your post-ntLink assembly can be found in: $@"

# Run GoldPolish-Target after ntLink rounds
goldpolish_target: $(p2).$(polished_infix).span$(span).dist$(dist).tigmint.fa.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.polished.fa check-G check-reads
%.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.polished.fa: %.k$(k_ntLink).w$(w_ntLink).ntLink-$(rounds)rounds.fa
$(time) goldpolish --target --k-ntlink $(target_k_ntlink) --w-ntlink $(target_w_ntlink) -l $(target_flank_length) $< $(long_reads) $@
echo "Done GoldRush-Path + $(polisher_logs) + Tigmint-long + $(rounds) ntLink rounds + GoldPolish-Target! Your final assembly can be found in: $@"
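
The rule chain in this hunk implies a predictable name for the final assembly: the post-ntLink soft link with `.polished.fa` in place of `.fa`. The sketch below reconstructs that name from the run parameters. It assumes the `_golden_path` suffix and the `goldpolish-polished` infix that appear in the file names in tests/goldrush_test_demo.sh, and it uses the defaults from the help text (`p=goldrush_asm`, `span=2`, `dist=500`, `k_ntLink=40`, `w_ntLink=250`, `rounds=5`). The function name `final_assembly_name` is hypothetical and not part of GoldRush.

```shell
# Hypothetical helper (not part of GoldRush): print the expected name of the
# final GoldPolish-Target output for a default-polisher run.
# Arguments (all optional): prefix span dist k_ntLink w_ntLink rounds
# Assumption: the "_golden_path" suffix and "goldpolish-polished" infix are
# taken from the file names used in tests/goldrush_test_demo.sh.
final_assembly_name() {
    p=${1:-goldrush_asm}
    span=${2:-2}
    dist=${3:-500}
    k=${4:-40}
    w=${5:-250}
    rounds=${6:-5}
    printf '%s_golden_path.goldpolish-polished.span%s.dist%s.tigmint.fa.k%s.w%s.ntLink-%srounds.polished.fa\n' \
        "$p" "$span" "$dist" "$k" "$w" "$rounds"
}

# Example usage:
#   final_assembly_name goldrush_test
```

Runs that use a different polisher or a non-default infix would need the pattern adjusted accordingly.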
Binary file added subprojects/.DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/goldrush_test_demo.sh
@@ -11,7 +11,7 @@ goldrush run reads=test_reads G=1e6 t=4 p=goldrush_test -B

l50=$(abyss-fac goldrush_test_golden_path.goldpolish-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa |awk '{print $3}' |tail -n1)

if [ -e goldrush_test_golden_path.goldpolish-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa ] && [ ${l50} -eq 1 ]; then
if [ -e goldrush_test_golden_path.goldpolish-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.polished.fa ] && [ ${l50} -eq 1 ]; then
echo -e "\nTest successful!"
else
echo -e "\nTest failed - please check your installation"
Expand Down
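
In the hunk above, the existence check targets the `.polished.fa` output while `l50` is still computed from the post-ntLink file. That may be intentional. If both checks should refer to the same file, the L50 extraction can be factored into a small filter. This is a sketch only: `l50_from_fac` is a hypothetical name, and it assumes, as the demo script does, that L50 is the third column of the last line of `abyss-fac` output.

```shell
# Hypothetical helper (not part of GoldRush): read abyss-fac style output on
# stdin and print the third whitespace-separated field of the last line,
# which the demo script treats as L50.
l50_from_fac() {
    awk '{print $3}' | tail -n1
}

# Example usage (requires abyss-fac; commented out to avoid side effects):
#   final=goldrush_test_golden_path.goldpolish-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.polished.fa
#   l50=$(abyss-fac "$final" | l50_from_fac)
#   if [ -e "$final" ] && [ "$l50" -eq 1 ]; then echo "Test successful!"; fi
```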
