diff --git a/CHANGELOG.md b/CHANGELOG.md index 723b276..105281d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -20,6 +20,9 @@ Contact: campanam@si.edu [Deprecated](#deprecated) ## aln2baits +### Version 1.8.0 +Added --maxvars option so that the variant haplotype definition runs more efficiently and functions properly + ### Version 1.7.8 Fixed bug in --shuffle option that caused infinite loop Fixed hanging bug when running variant options @@ -142,6 +145,9 @@ Version constant added to header Preliminary script to generate baits from an annotation file and a reference sequence ## baitslib +### Version 1.8.0 +Handling for --maxvars option + ### Version 1.7.4 mean method uses .sum rather than .reduce(:+) @@ -295,6 +301,9 @@ New method write_probes handles basic output filter_probes definition removed into separate script for access by other scripts ## baitstools +### Version 1.8.0 +Handling for --maxvars option + ### Version 1.7.4 Conversion to a RubyGem @@ -459,6 +468,9 @@ The word 'probe' changed to 'baits' in all instances for clarity Set default for tiling offset as 20 bp (from 60 for select_snps and 25 for tile_probes) ## baitstoolsgui +### Version 1.8.0 +Handling for --maxvars option + ### Version 1.7.5 Fixed bug calling baitstools.rb rather than updated baitstools executable @@ -655,6 +667,9 @@ Version constant added to header Preliminary script to filter predefined baits through quality filters ## osx_install +### Version 1.8.0 +Installs latest baitstools gem (1.8.0) + ### Version 1.7.8 Installs latest baitstools gem (1.7.8) diff --git a/README.md b/README.md index 0100eb7..4269207 100644 --- a/README.md +++ b/README.md @@ -43,14 +43,18 @@ The software is made available under the Smithsonian Institution [terms of use]( General instructions for installation using RubyGems/Bundler and specific instructions for macOS are provided below. You can test your BaitsTools installation by running the tutorials included in the example_data directory. 
The archive "tutorial.tgz" includes the expected output of each tutorial. Note that vcf2baits and stacks2baits output will vary slightly due to the random number generator. ### Installation using RubyGems and Bundler -The BaitsTools executables can be installed using [RubyGems](https://www.rubygems.org) and [Bundler](https://bundler.io/) (available on most UNIX-like operating systems with [Ruby](https://www.ruby-lang.org) and RubyGems installed). See instructions for macOS below as macOS requires the [Ruby Version Manager](https://rvm.io) to manually install Ruby gems. See the Ruby and RubyGems documentation for installation on other operating systems. +The BaitsTools executables can be installed using [RubyGems](https://www.rubygems.org) and [Bundler](https://bundler.io/) (available on most UNIX-like operating systems with [Ruby](https://www.ruby-lang.org) and RubyGems installed). See instructions for macOS below as macOS requires the [Ruby Version Manager](https://rvm.io) to manually install Ruby gems. See the Ruby and RubyGems documentation for installation on other operating systems. Precompiled gems are available [here](https://github.com/campanam/BaitsTools/pkgs/rubygems/baitstools). -In a terminal window, execute the following commands: +After downloading the latest precompiled gem, execute the following command in a terminal window: + +`gem install baitstools-1.8.0.gem` + +To manually build and install the gem, execute the following commands in a terminal window: `git clone https://github.com/campanam/baitstools` `cd baitstools` `gem build baitstools.gemspec` -`gem install baitstools-1.7.8.gem` +`gem install baitstools-1.8.0.gem` ### macOS Installation macOS uses a deprecated version of Tcl-Tk as its default Tk framework. For best results, install [ActiveTcl 8.6](https://www.activestate.com/products/activetcl/downloads/) and then reinstall the tk gem (`gem install tk`). 
Tcl-Tk can also be installed using [Homebrew](https://brew.sh) or [Anaconda](https://anaconda.org/), but the windows are not optimized for these methods. @@ -74,7 +78,7 @@ Enter the following commands (step annotations are provided after the highlighte `git clone https://github.com/campanam/baitstools`: Download the BaitsTools repository. `cd baitstools`: Enter the baitstools directory. `gem build baitstools.gemspec`: Build the BaitsTools gem. -`gem install baitstools-1.7.5.gem`: Install the BaitsTools gem. +`gem install baitstools-1.8.0.gem`: Install the BaitsTools gem. _macOS Installation Notes:_ 1. The Ruby Version Manager uses [Homebrew](https://brew.sh). During installation you may need to give an administrator password and authorization to install/update Homebrew. @@ -183,7 +187,8 @@ aln2baits generates baits from a DNA alignment in FASTA or FASTQ format. Bait se `-i, --input [FILE]`: Input alignment file name. Include the path to the file if not in the current directory. `-L, --length [VALUE]`: Requested bait length. Default is 120 bp. `-O, --offset [VALUE]`: Offset (in bp) between tiled baits. Default is 60 bp. -`-H, --haplo [VALUE]`: Alignment window haplotype definition (`haplotype` or `variant`). `haplotype` will cause the program to identify all unique haplotypes within each bait tiling window observed in the data. `variant` will cause the program to generate all possible permutations of single nucleotide variants observed within the window. Default is `haplotype`. +`-H, --haplo [VALUE]`: Alignment window haplotype definition (`haplotype` or `variant`). `haplotype` will cause the program to identify all unique haplotypes within each bait tiling window observed in the data. `variant` will cause the program to generate random permutations of single nucleotide variants observed within the window. Default is `haplotype`. 
+`--maxvars [VALUE]`: Maximum number of variant permutations to retain within each alignment window when using the `variant` haplotype definition. Default is 24. ### annot2baits annot2baits generates baits from an annotation file in GTF or GFF and a corresponding DNA sequence in FASTA or FASTQ format. @@ -232,7 +237,8 @@ pyrad2baits selects variants and generates baits from a PyRAD/ipyrad loci file. `-O, --offset [VALUE]`: Base pair offset between tiled baits. Default is 60 bp. `-I, --minind [VALUE]`: Minimum number of individuals to include locus. Default is 1. `-W, --strategy [VALUE]`: Strategy to generate baits from loci (`alignment`, `SNPs`, or `informative`). `alignment` treats the individual loci as FASTA alignments and passes the alignments to [aln2baits](#aln2baits) to generate weighted alignments. `SNPs` and `informative` select and generate baits for identified variable sites. `SNPs` includes all identified sites, whereas `informative` includes only phylogenetically informative sites. Default is `alignment`. -`-H, --haplo [VALUE]`: If using `alignment` strategy, alignment window haplotype definition (`haplotype` or `variant`). `haplotype` will cause the program to identify all unique haplotypes within each bait tiling window observed in the data. `variant` will cause the program to generate all possible permutations of single nucleotide variants observed within the window. Default is `haplotype`. +`-H, --haplo [VALUE]`: If using `alignment` strategy, alignment window haplotype definition (`haplotype` or `variant`). `haplotype` will cause the program to identify all unique haplotypes within each bait tiling window observed in the data. `variant` will cause the program to generate random permutations of single nucleotide variants observed within the window. Default is `haplotype`. +`--maxvars [VALUE]`: Maximum number of variant permutations to retain within each alignment window when using the `variant` haplotype definition. Default is 24. 
`--uncollapsedref`: If using `SNPs` or `informative` strategies, choose a random reference sequence and keep ambiguities for each locus. `-a, --alt`: If using `SNPs` or `informative` strategies, generate baits for alternate alleles. `-t, --totalvars [VALUE]`: If using `SNPs` or `informative` strategies, total requested variants. Default is 30,000. diff --git a/baitstools.gemspec b/baitstools.gemspec index 8eb6c95..eac777d 100644 --- a/baitstools.gemspec +++ b/baitstools.gemspec @@ -1,6 +1,6 @@ Gem::Specification.new do |s| s.name = 'baitstools' - s.version = '1.7.8' + s.version = '1.8.0' s.required_ruby_version = '>= 2.4.1' s.date = '2023-07-31' s.summary = 'BaitsTools: Software for hybridization capture bait design' diff --git a/bin/baitstools b/bin/baitstools index c000697..3909d97 100644 --- a/bin/baitstools +++ b/bin/baitstools @@ -1,7 +1,7 @@ #!/usr/bin/env ruby #----------------------------------------------------------------------------------------------- # baitstools -BAITSTOOLSVER = "1.7.8" +BAITSTOOLSVER = "1.8.0" # Michael G. 
Campana, 2017-2023 # Smithsonian's National Zoo and Conservation Biology Institute #----------------------------------------------------------------------------------------------- @@ -74,6 +74,7 @@ class Parser args.no_Ns = false # Flag to omit bait sequences with Ns args.collapse_ambiguities = false # Flag to collapse ambiguities to a single nucleotide args.haplodef = "haplotype" # Haplotype definition for aln2baits + args.maxvars = 24 # Maximum number of retained variant permutations per window for aln2baits args.uncollapsed_ref = false # Flag to keep ambiguities in pyrad reference sequence args.sort = false # Flag to sort stack2baits SNPs by between/within population variation args.hwe = false # Flag to sort stacks2baits SNPs by Hardy-Weinberg Equilibrium @@ -186,6 +187,9 @@ class Parser opts.on("-H","--haplo [VALUE]", String, "If using alignment strategy, window haplotype definition (haplotype or variant) (Default = haplotype)") do |fa| args.haplodef = fa.downcase if fa != nil end + opts.on("--maxvars [VALUE]",Integer, "Maximum number of variant permutations per alignment window (Default = 24)") do |maxvars| + args.maxvars = maxvars if maxvars != nil + end opts.on("--uncollapsedref","Keep ambiguities in pyrad2baits reference sequence") do args.uncollapsed_ref = true end @@ -252,6 +256,9 @@ class Parser opts.on("-H","--haplo [VALUE]", String, "Window haplotype definition (haplotype or variant) (Default = haplotype)") do |fa| args.haplodef = fa if fa.downcase != nil end + opts.on("--maxvars [VALUE]",Integer, "Maximum number of variant permutations per alignment window (Default = 24)") do |maxvars| + args.maxvars = maxvars if maxvars != nil + end end if args.algorithm == "blast2baits" opts.on("--percid [VALUE]", Float, "Minimum percent identity to include BLAST hit (Default = 0.0)") do |percid| @@ -837,6 +844,14 @@ begin print "Please choose a haplotype definition (haplotype or variant)\n" $options.haplodef = gets.chomp.downcase end + if $options.interact and 
$options.haplodef == "variant" + print "Enter maximum number of variant permutations per alignment window to retain.\n" + $options.maxvars = gets.chomp.to_i + end + while $options.maxvars <= 0 and $options.haplodef == "variant" + print "Maximum number of retained variant permutations must be greater than 0. Re-enter.\n" + $options.maxvars = gets.chomp.to_i + end end if $options.algorithm == "blast2baits" if $options.interact diff --git a/bin/baitstoolsgui b/bin/baitstoolsgui index aaa9f7f..915a04d 100644 --- a/bin/baitstoolsgui +++ b/bin/baitstoolsgui @@ -1,7 +1,7 @@ #!/usr/bin/env ruby #----------------------------------------------------------------------------------------------- # baitstoolsgui -BAITSTOOLSGUI = "1.7.8" +BAITSTOOLSGUI = "1.8.0" # Michael G. Campana, 2017-2023 # Smithsonian's National Zoo and Conservation Biology Institute #----------------------------------------------------------------------------------------------- @@ -58,6 +58,7 @@ def start_baitstools end if $options.algorithm == "aln2baits" or ($options.algorithm == "pyrad2baits" && $options.strategy == "alignment") cmdline << " -H " + $options.haplodef + cmdline << " --maxvars " + $options.maxvars if $options.haplodef == "variant" elsif $options.algorithm == "annot2baits" cmdline << " -U " + $options.features.value.upcase elsif $options.algorithm == "blast2baits" @@ -235,6 +236,11 @@ def update_strategy $alts.state = "disabled" $uncollapsedref.state = "disabled" $haplo.state = $haploselect.state = "normal" + if $options.haplodef == "variant" + $maxvars.state = $maxvarsentry.state = "normal" + else + $maxvars.state = $maxvarsentry.state = "disabled" + end else $maxsnpentry.state = $maxsnps.state = "normal" $distanceentry.state = $distance.state = "normal" @@ -244,6 +250,7 @@ def update_strategy $alts.state = "normal" $uncollapsedref.state = "normal" $haplo.state = $haploselect.state = "disabled" + $maxvars.state = $maxvarsentry.state = "disabled" end end 
#----------------------------------------------------------------------------------------------- @@ -421,7 +428,31 @@ def haplodef_window height 2 place('x' => 240, 'y' => 260) end - $widgets.push($haplo, $haploselect) + $haploselect.bind("") do + update_haplodef + end + $maxvars = TkLabel.new($root) do + text 'Maximum variants per window' + font TkFont.new('times 20') + place('x' => 400, 'y' => 250) + pady 10 + end + $maxvarsentry = TkEntry.new($root) do + textvariable $options.maxvars + borderwidth 5 + font TkFont.new('times 12') + place('x' => 660, 'y' => 260) + width 10 + end + $widgets.push($haplo, $haploselect,$maxvars,$maxvarsentry) +end +#----------------------------------------------------------------------------------------------- +def update_haplodef + if $options.haplodef == "variant" + $maxvars.state = $maxvarsentry.state = "normal" + else + $maxvars.state = $maxvarsentry.state = "disabled" + end end #----------------------------------------------------------------------------------------------- def reference_window(winy = 150) @@ -828,6 +859,7 @@ def subcommand_window(subcommand) offset_window haplodef_window inputlabel = "Input FASTA/FASTQ" + update_haplodef when "annot2baits" reference_window pad_window @@ -1301,6 +1333,8 @@ def go_forward Tk::messageBox :message => 'Please specify an input file.' elsif ($options.algorithm == "annot2baits" or $options.algorithm == "bed2baits" or $options.algorithm == "blast2baits") && $options.refseq == "" Tk::messageBox :message => 'Please specify a reference sequence.' + elsif $options.algorithm == "aln2baits" && $options.strategy == "alignment" && $options.haplodef == "variant" && $options.maxvars < 1 + Tk::messageBox :message => 'Maximum number of variant permutations per window must be greater than 0.' elsif ($options.algorithm == "annot2baits" or $options.algorithm == "bed2baits" or $options.algorithm == "blast2baits") && $options.pad < 0 Tk::messageBox :message => 'Pad length cannot be less than 0.' 
elsif $options.baitlength < 1 @@ -1325,6 +1359,8 @@ def go_forward Tk::messageBox :message => 'Tiling offset must be greater than 0.' elsif $options.minind < 1 Tk::messageBox :message => 'Minimum individuals must be greater than 0.' + elsif $options.strategy == "alignment" && $options.haplodef == "variant" && $options.maxvars < 1 + Tk::messageBox :message => 'Maximum number of variant permutations per window must be greater than 0.' elsif $options.strategy != "alignment" if $options.totalsnps < 1 Tk::messageBox :message => 'The total number of variants must be greater than 0.' @@ -1532,6 +1568,7 @@ def set_defaults $options.tileoffset = TkVariable.new(60) # Offset between tiled baits $options.bait_type = TkVariable.new("RNA-DNA") # Hybridization type $options.haplodef = TkVariable.new("haplotype") # Haplotype definition for aln2baits + $options.maxvars = TkVariable.new(24) # Maximum number of variant permutations per alignment window $options.list_format = TkVariable.new("BED") # Interval list file format $options.features = TkVariable.new("") # Desired features in comma-separated list $options.pad = TkVariable.new(0) # BP to pad ends of extracted regions @@ -1635,7 +1672,7 @@ $next_btn = TkButton.new($root) do place('x' => 660, 'y' => 520) end credit = TkLabel.new($root) do - text "Michael G. Campana, 2017-2022\nSmithsonian's National Zoo and Conservation Biology Institute" + text "Michael G. Campana, 2017-2023\nSmithsonian's National Zoo and Conservation Biology Institute" borderwidth 5 font TkFont.new('times 12') pack("side" => "bottom", "padx"=> "50", "pady"=> "10") diff --git a/lib/aln2baits.rb b/lib/aln2baits.rb index a64d1f1..a5697ca 100644 --- a/lib/aln2baits.rb +++ b/lib/aln2baits.rb @@ -1,7 +1,7 @@ #!/usr/bin/env ruby #----------------------------------------------------------------------------------------------- # aln2baits -ALN2BAITSVER = "1.7.8" +ALN2BAITSVER = "1.8.0" # Michael G. 
Campana, 2017-2023 # Smithsonian Conservation Biology Institute #----------------------------------------------------------------------------------------------- @@ -40,7 +40,9 @@ def var_permutations(aln) # Get possible variant permutations varindex = 1 # Index for sequence for var in variants varindex *= var.size # Must be complete down here or interferes with multithreading + break if varindex > $options.maxvars # Control unnecessary extra processing if too many variants requested end + varindex = $options.maxvars if varindex > $options.maxvars revised_haplos = [] bedstarts = self.bedstarts[0] # Reset bedstart array and assume coordinates of first array member self.bedstarts = [] @@ -54,7 +56,7 @@ def var_permutations(aln) # Get possible variant permutations for Thread.current[:k] in 0...varindex if Thread.current[:k] % $options.threads == j for Thread.current[:i] in 0...self.haplotypes[0].length - Thread.current[:var] = Thread.current[:k] % variants[Thread.current[:i]].size + Thread.current[:var] = rand(variants[Thread.current[:i]].size) revised_haplos[Thread.current[:k]] << variants[Thread.current[:i]][Thread.current[:var]] # Minimize lock time end if $options.gaps == "extend" diff --git a/lib/baitslib.rb b/lib/baitslib.rb index f5257ad..1d6b820 100644 --- a/lib/baitslib.rb +++ b/lib/baitslib.rb @@ -1,8 +1,8 @@ #!/usr/bin/env ruby #----------------------------------------------------------------------------------------------- # baitslib -BAITSLIBVER = "1.7.5" -# Michael G. Campana, 2017-2022 +BAITSLIBVER = "1.8.0" +# Michael G. 
Campana, 2017-2023 # Smithsonian's National Zoo and Conservation Biology Institute #----------------------------------------------------------------------------------------------- @@ -1010,6 +1010,7 @@ def get_command_line # Get command line for summary output end if $options.algorithm == "aln2baits" or ($options.algorithm == "pyrad2baits" && $options.strategy == "alignment") cmdline << " -H " + $options.haplodef + cmdline << " --maxvars " + $options.maxvars.to_s if $options.haplodef == "variant" elsif $options.algorithm == "annot2baits" cmdline << " -U " for feature in $options.features diff --git a/osx_install.sh b/osx_install.sh index 15576d0..1b873b8 100644 --- a/osx_install.sh +++ b/osx_install.sh @@ -1,6 +1,6 @@ #!/bin/bash #----------------------------------------------------------------------------------------------- -# osx_install v 1.7.8 +# osx_install v 1.8.0 # Michael G. Campana, 2017-2023 # Smithsonian's National Zoo and Conservation Biology Institute #----------------------------------------------------------------------------------------------- @@ -10,4 +10,4 @@ source ~/.rvm/scripts/rvm rvm install 3.1.2 rvm --default use 3.1.2 gem build baitstools.gemspec -gem install ./baitstools-1.7.8.gem +gem install ./baitstools-1.8.0.gem
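
Note on the `lib/aln2baits.rb` hunk: the change appears to cap the number of generated variant haplotypes per window at `--maxvars` and to draw each column's nucleotide at random rather than by index arithmetic. The following is a minimal standalone sketch of that idea, not the code in the patch. The helper names `capped_permutation_count` and `sample_variant_haplotypes`, the `rng` parameter, and the example window are all hypothetical, and the sketch assumes every column lists at least one observed nucleotide.

```ruby
# Hypothetical helpers for illustration only; they are not part of BaitsTools.
# variants: one entry per alignment column in the window, each entry listing
# the nucleotides observed at that column (assumed non-empty).

# Number of haplotypes to generate: the product of the per-column variant
# counts, capped at maxvars.
def capped_permutation_count(variants, maxvars)
  count = 1
  variants.each do |column|
    count *= column.size
    break if count > maxvars # Stop multiplying once the cap is exceeded
  end
  [count, maxvars].min
end

# Build the capped number of haplotypes, drawing one nucleotide per column
# at random for each haplotype.
def sample_variant_haplotypes(variants, maxvars, rng = Random.new)
  Array.new(capped_permutation_count(variants, maxvars)) do
    variants.map { |column| column[rng.rand(column.size)] }.join
  end
end

# Example window with two variable columns (2 x 3 = 6 possible permutations).
window = [%w[A], %w[C T], %w[G], %w[A G T], %w[C]]
haplos = sample_variant_haplotypes(window, 4, Random.new(42))
puts haplos.size
```

In this sketch each column is drawn independently, so the sampled sequences are not guaranteed to be distinct; whether duplicates are removed downstream in BaitsTools is not shown in this diff.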