Update STAR version + new options for STARsolo #5060

Merged
merged 36 commits into from
Feb 17, 2023
Changes from 12 commits
b52f63b
update STAR version
lldelisle Jan 16, 2023
1006e84
add GeneQuant + outSAMattributes
lldelisle Jan 16, 2023
f450b3e
remove dup test
lldelisle Jan 17, 2023
a7cb191
update tool version [no ci]
lldelisle Jan 17, 2023
2b72bb8
lint
lldelisle Jan 17, 2023
8f42952
add SAMattributes in macro as suggested by @wm75
lldelisle Jan 18, 2023
c3b1dc3
change Cell Ranger to Chromium chemistry
lldelisle Jan 18, 2023
45afe99
Update rg_rnaStarSolo.xml
pavanvidem Jan 20, 2023
e91e9f3
Update macros.xml
pavanvidem Jan 20, 2023
662ba6d
Update rg_rnaStar.xml
pavanvidem Jan 20, 2023
033cd4a
STAR: allow fasta.gz for reference
bernt-matthias Jan 26, 2023
3ca8610
Merge pull request #2 from pavanvidem/patch-4
lldelisle Jan 26, 2023
d374110
use up to date profile
lldelisle Jan 26, 2023
5fe1484
fix back to line
lldelisle Jan 26, 2023
ca87f87
increase GenomeGenerateRAM by @nagoue @bgruening @wm75
lldelisle Jan 27, 2023
b69a4b2
fix macro limits + extend to all starSOLO
lldelisle Jan 27, 2023
a34c128
use double getVar
lldelisle Jan 27, 2023
0103d32
compare params values with command line
lldelisle Jan 27, 2023
0d80c76
put default value in second getVar
lldelisle Jan 27, 2023
dba503d
add colnames to count file
lldelisle Feb 1, 2023
3ed614c
add outWig to STAR
lldelisle Feb 1, 2023
9c5210c
add outWig in STARsolo + compress bam
lldelisle Feb 1, 2023
443ebd2
remove section coverage
lldelisle Feb 1, 2023
ae0f3c7
add ftype in test
lldelisle Feb 1, 2023
065d90e
fix output matrix for new soloFeatures + add test
lldelisle Feb 1, 2023
b5142ff
Merge pull request #4 from lldelisle/outWig
lldelisle Feb 2, 2023
117ce39
solve #1777
lldelisle Jan 27, 2023
95831a6
put quantmode_output in GTFconditional thanks @bernt-matthias
lldelisle Feb 8, 2023
1ff30c2
Merge pull request #3 from lldelisle/solve1777
lldelisle Feb 8, 2023
19b6882
change default outSAMmapqUnique to 255 like in STAR
lldelisle Feb 8, 2023
20a2126
only use soloUMIfiltering when soloUMIdedup is 1MM_CR
lldelisle Feb 8, 2023
dfe88ea
enable to output filtered and raw matrices
lldelisle Feb 8, 2023
c6c4822
add forgotten requirement
lldelisle Feb 16, 2023
296660c
use @TOOL_VERSION@+galaxy@VERSION_SUFFIX@
lldelisle Feb 16, 2023
53faaa4
bump version of data_manager
lldelisle Feb 16, 2023
e3966b9
put back to MAPQ60 with remark
lldelisle Feb 16, 2023
37 changes: 32 additions & 5 deletions tools/rgrnastar/macros.xml
@@ -5,7 +5,7 @@
the index versions in sync, but you should manually adjust the +galaxy
version number. -->
Contributor

<tool id="rna_star_index_builder_data_manager" name="rnastar index versioned" tool_type="manage_data" version="@IDX_VERSION@" profile="19.05">

is what this comment refers to.
Because of the linked macros file, this PR should also trigger deployment of a new version of the DM, so you need to bump the DM version to version="@IDX_VERSION@+galaxy1".

Contributor Author

You mean we should change the version because we changed the STAR version, or because of something else?

Contributor

I guess bumping the IDX_VERSION_SUFFIX does not hurt, but I do not understand why we should do it.

Contributor Author

Maybe to indicate that we use the STAR version @TOOL_VERSION@ instead of the STAR version @IDX_VERSION@ to build the index...

Contributor

@bernt-matthias @lldelisle Just to explain things: the reason for symlinking the macros file is to keep the IDX_VERSION in one place only so that when you update the tool wrapper to a STAR version that requires a newer index format, you'd automatically deploy also a DM that can create these indexes.

The "downside" is that any changes to the tool wrapper macros file will silently affect the DM. So in this case the next version of the DM will use the 2.7.10b version of star for building indexes. These should be identical to ones built with older versions, but it's good to bump the DM wrapper version to be able to trace things back.
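
To make the coupling described above concrete, here is a minimal Python sketch. It is not Galaxy's actual macro expansion code: the helper names `expand_tokens` and `dm_view` are hypothetical, the `+galaxyN` suffix handling is simplified, and `2.7.4a` is only a placeholder value for `@IDX_VERSION@`.

```python
import re

def expand_tokens(text, tokens):
    """Replace each @NAME@ token that appears in `tokens`; leave others untouched."""
    return re.sub(r"@[A-Z_]+@", lambda m: tokens.get(m.group(0), m.group(0)), text)

def dm_view(tokens, galaxy_suffix):
    """Return (DM wrapper version, STAR package version) as the DM would see them."""
    wrapper_version = expand_tokens(f"@IDX_VERSION@+galaxy{galaxy_suffix}", tokens)
    star_package = expand_tokens("@VERSION@", tokens)
    return wrapper_version, star_package

# Placeholder token tables: "2.7.4a" for @IDX_VERSION@ is an assumption.
old_tokens = {"@VERSION@": "2.7.8a", "@IDX_VERSION@": "2.7.4a"}
new_tokens = {"@VERSION@": "2.7.10b", "@IDX_VERSION@": "2.7.4a"}

before = dm_view(old_tokens, 0)         # shared macros before the STAR update
after_no_bump = dm_view(new_tokens, 0)  # STAR updated, DM suffix left alone
after_bumped = dm_view(new_tokens, 1)   # STAR updated, DM suffix bumped
```

In this sketch `before` and `after_no_bump` carry the same wrapper version but a different STAR package, which is the silent change the comment warns about. Bumping the suffix, as in `after_bumped`, makes the change traceable.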

Contributor

Seems legit.

Could a more expressive filter help here? For instance, we could store the STAR version that was used to create an entry in the data table and then filter data table entries for a min (or max) required STAR version.
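
A rough Python sketch of what such a filter would have to do, under stated assumptions: the `star_version` column does not exist in the current data table, the entries are invented for illustration, and `star_version_key` and `filter_entries` are hypothetical helpers, not Galaxy APIs. The main point is that STAR version strings such as `2.7.10b` need numeric comparison, because plain string comparison would sort `2.7.10b` before `2.7.8a`.

```python
import re

def star_version_key(version):
    """Turn a STAR version like '2.7.10b' into a sortable tuple (2, 7, 10, 'b')."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version)
    if match is None:
        raise ValueError(f"unrecognised STAR version: {version!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)

def filter_entries(entries, min_version=None, max_version=None):
    """Keep entries whose (hypothetical) star_version lies within the given bounds."""
    kept = []
    for entry in entries:
        key = star_version_key(entry["star_version"])
        if min_version is not None and key < star_version_key(min_version):
            continue
        if max_version is not None and key > star_version_key(max_version):
            continue
        kept.append(entry)
    return kept

# Invented data table rows, for illustration only.
entries = [
    {"value": "index_a", "star_version": "2.6.0c"},
    {"value": "index_b", "star_version": "2.7.4a"},
    {"value": "index_c", "star_version": "2.7.10b"},
]

usable = filter_entries(entries, min_version="2.7.4a")
bounded = filter_entries(entries, min_version="2.7.4a", max_version="2.7.8a")
```

Here `usable` keeps `index_b` and `index_c`, while `bounded` keeps only `index_b`. A wrapper written before `2.7.10b` existed could not have known which `max_version` to use.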

Contributor Author

I don't think it is easy to add a new column to a table; that would require changing the table (which is what happened when we added the new 'genomeVersion' column)...

Contributor

Yes, that would require a new table. I'm also not so sure whether that would improve the situation much.
Min and max version checks in tool wrappers also need quite some discipline to maintain, and the max check in particular doesn't work backwards, i.e., at the time of writing a tool wrapper version the max value is typically still unknown, so there's always at least one wrapper version that will display all newer index versions.
What would be comparably easy to do is to remove the symlink and have the DM use its own macro, which then needs to be maintained separately, but would maybe come with fewer surprises.

Anyway, I don't think this should hold back this PR any longer. If we want to decouple the DM from the tool wrapper, we should do it in its own PR where the decision will be more discoverable than as part of a giant PR like this one.

Contributor Author

I agree.

<!-- STAR version to be used -->
<token name="@VERSION@">2.7.8a</token>
<token name="@VERSION@">2.7.10b</token>
<!-- STAR index version compatible with this version of STAR
This is the STAR version that introduced the index structure expected
by the current version.
@@ -19,7 +19,7 @@
<xml name="requirements">
<requirements>
<requirement type="package" version="@VERSION@">star</requirement>
<requirement type="package" version="1.9">samtools</requirement>
<requirement type="package" version="1.16.1">samtools</requirement>
<yield />
</requirements>
</xml>
@@ -35,7 +35,7 @@
</xml>

<xml name="index_selection" token_with_gene_model="0">
<param argument="--genomeDir" name="genomeDir" type="select"
<param argument="--genomeDir" type="select"
label="Select reference genome"
help="If your genome of interest is not listed, contact the Galaxy team">
<options from_data_table="@IDX_DATA_TABLE@">
@@ -81,11 +81,16 @@
<token name="@TEMPINDEX@"><![CDATA[
## Create temporary index for custom reference
#if str($refGenomeSource.geneSource) == 'history':
#if $refGenomeSource.genomeFastaFiles.ext == "fasta"
ln -s '$refGenomeSource.genomeFastaFiles' refgenome.fa &&
#else
gunzip -c '$refGenomeSource.genomeFastaFiles' > refgenome.fa &&
#end if
mkdir -p tempstargenomedir &&
STAR
--runMode genomeGenerate
--genomeDir 'tempstargenomedir'
--genomeFastaFiles '${refGenomeSource.genomeFastaFiles}'
--genomeFastaFiles refgenome.fa
## Handle difference between indices with/without annotations
#if 'GTFconditional' in $refGenomeSource:
## GTFconditional exists only in STAR, but not STARsolo
@@ -161,8 +166,13 @@
@FASTQ_GZ_OPTION@
#end if
]]></token>
<token name="@LIMITS@" ><![CDATA[
--limitOutSJoneRead $algo.params.limits.limitOutSJoneRead
--limitOutSJcollapsed $algo.params.limits.limitOutSJcollapsed
--limitSjdbInsertNsj $algo.params.limits.limitSjdbInsertNsj
]]></token>
<xml name="ref_selection">
<param argument="--genomeFastaFiles" type="data" format="fasta" label="Select a reference genome" />
<param argument="--genomeFastaFiles" type="data" format="fasta,fasta.gz" label="Select a reference genome" />
<param argument="--genomeSAindexNbases" type="integer" min="2" max="16" value="14" label="Length of the SA pre-indexing string" help="Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1)"/>
</xml>
<xml name="stdio" >
@@ -245,4 +255,21 @@
<option value="None" >No adapter clipping</option>
</param>
</xml>
<xml name="common_SAM_attributes">
<option value="NH" selected="true">NH (number of reported alignments/hits for the read)</option>
<option value="HI" selected="true">HI (query hit index)</option>
<option value="AS" selected="true">AS (local alignment score)</option>
<option value="nM" selected="true">nM (number of mismatches per (paired) alignment)</option>
<option value="NM">NM (edit distance of the aligned read to the reference)</option>
<option value="MD">MD (string for mismatching positions)</option>
<option value="jM">jM (intron motifs for all junctions)</option>
<option value="jI">jI (1-based start and end of introns for all junctions)</option>
</xml>
<xml name="limits">
<section name="junction_limits" title="Limits" expanded="false">
<param argument="--limitOutSJoneRead" type="integer" min="1" value="1000" label="Maximum number of junctions for one read (including all multimappers)" />
<param argument="--limitOutSJcollapsed" type="integer" min="1" value="1000000" label="Maximum number of collapsed junctions" />
<param argument="--limitSjdbInsertNsj" type="integer" min="0" value="1000000" label="Maximum number of inserts to be inserted into the genome on the fly." />
</section>
</xml>
</macros>
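
The help text for `--genomeSAindexNbases` in `ref_selection` above gives the scaling rule min(14, log2(GenomeLength)/2 - 1). A small Python sketch of that arithmetic follows. The function names are hypothetical, and rounding down and clamping to the parameter's `min="2"` are assumptions of this sketch, not something the help text specifies.

```python
import math

def sa_index_nbases_raw(genome_length):
    """Evaluate min(14, log2(GenomeLength)/2 - 1) without rounding."""
    if genome_length < 1:
        raise ValueError("genome_length must be a positive number of bases")
    return min(14.0, math.log2(genome_length) / 2 - 1)

def sa_index_nbases_suggestion(genome_length):
    """Integer suggestion: round down (an assumption) and respect the form's min of 2."""
    return max(2, math.floor(sa_index_nbases_raw(genome_length)))

human_sized = sa_index_nbases_suggestion(3_000_000_000)  # capped at 14
small_genome = sa_index_nbases_suggestion(100_000)       # log2(1e5)/2 - 1 is about 7.3
```

For a genome of roughly 3 Gb the cap of 14 applies, while a 100 kb reference comes out near 7, which is why small test genomes need a much lower value than the default.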
34 changes: 12 additions & 22 deletions tools/rgrnastar/rg_rnaStar.xml
@@ -1,4 +1,8 @@
<tool id="rna_star" name="RNA STAR" version="@VERSION@+galaxy1" profile="20.01" license="MIT">
<<<<<<< HEAD
<tool id="rna_star" name="RNA STAR" version="@VERSION@+galaxy0" profile="20.01" license="MIT">
=======
<tool id="rna_star" name="RNA STAR" version="@VERSION@+galaxy2" profile="21.01" license="MIT">
>>>>>>> 3ce631f61 (STAR: allow fasta.gz for reference)
<description>Gapped-read mapper for RNA-seq data</description>
<macros>
<import>macros.xml</import>
@@ -206,9 +210,7 @@
#end if

## Limits
--limitOutSJoneRead $algo.params.limits.limitOutSJoneRead
--limitOutSJcollapsed $algo.params.limits.limitOutSJcollapsed
--limitSjdbInsertNsj $algo.params.limits.limitSjdbInsertNsj
@LIMITS@
#else:
## Go with STAR's default algorithmic settings,
## but we need to provide a reasonable default
@@ -373,16 +375,9 @@
label="Read alignment tags to include in the BAM output"
help="Note on using the XS tag: If the XS tag is used, STAR will filter out alignments with undefined strand (i.e., those containing only non-canonical unannotated junctions). Using this tag is recommended if you plan to use the STAR results with STAR-Fusion. In addition, it is required for compatibility
with Cufflinks if your sequences come from an unstranded library preparation.">
<option value="NH" selected="true">NH (number of reported alignments/hits for the read)</option>
<option value="HI" selected="true">HI (query hit index)</option>
<option value="AS" selected="true">AS (local alignment score)</option>
<option value="nM" selected="true">nM (number of mismatches per (paired) alignment)</option>
<option value="XS">XS (strand flag, see parameter help below) </option>
<option value="NM">NM (edit distance of the aligned read to the reference)</option>
<option value="MD">MD (string for mismatching positions)</option>
<expand macro="common_SAM_attributes"/>
<option value="MC">MC (CIGAR string for mate/next segment)</option>
<option value="jM">jM (intron motifs for all junctions)</option>
<option value="jI">jI (1-based start and end of introns for all junctions)</option>
<option value="XS">XS (strand flag, see parameter help below) </option>
<option value="ch" selected="true">ch (used to indicate chimeric alignments)</option>
</param>
<param argument="--outSAMattrIHstart" name="HI_offset" type="select" display="radio"
@@ -469,7 +464,7 @@ used: >=5 mappings => MAPQ=0; 3-4 mappings => MAPQ=1; 2 mappings => MAPQ=3. This
</section>

<section name="align" title="Alignment parameters" expanded="false">
<param argument="--alignIntronMin" name="alignIntronMin" type="integer" min="0" value="21" label="Minimum intron size"/>
<param argument="--alignIntronMin" type="integer" min="0" value="21" label="Minimum intron size"/>
<param argument="--alignIntronMax" type="integer" min="0" value="0" label="Maximum intron size"/>
<param argument="--alignMatesGapMax" type="integer" min="0" value="0" label="Maximum gap between two mates"/>
<param argument="--alignSJoverhangMin" type="integer" min="1" value="5" label="Minimum overhang for spliced alignments"/>
@@ -518,12 +513,7 @@ used: >=5 mappings => MAPQ=0; 3-4 mappings => MAPQ=1; 2 mappings => MAPQ=3. This
<param argument="--chimMultimapScoreRange" type="integer" min="0" value="1"
label="Score range for multi-mapping chimeras"
help="The threshold below the best chimeric score that a multimapping chimera must have to be output. This is ignored unless --chimMultimapNmax is above 1" />
</section>
<section name="limits" title="Limits" expanded="false">
<param argument="--limitOutSJoneRead" type="integer" min="1" value="1000" label="Maximum number of junctions for one read (including all multimappers)" />
<param argument="--limitOutSJcollapsed" type="integer" min="1" value="1000000" label="Maximum number of collapsed junctions" />
<param argument="--limitSjdbInsertNsj" type="integer" min="0" value="1000000" label="Maximum number of inserts to be inserted into the genome on the fly." />
</section>
<expand macro="limits" />
</when>
</conditional>
</section>
@@ -570,7 +560,7 @@ used: >=5 mappings => MAPQ=0; 3-4 mappings => MAPQ=1; 2 mappings => MAPQ=3. This
</conditional>
<conditional name="refGenomeSource">
<param name="geneSource" value="history" />
<param name="genomeFastaFiles" value="tophat_test.fa" />
<param name="genomeFastaFiles" value="tophat_test.fa.gz" />
<param name="genomeSAindexNbases" value="5" />
</conditional>
<section name="oformat">
@@ -1024,7 +1014,7 @@ generated. Hence, be sure to select either:
In addition, the following parameters_ related to chimeric alignment are recommended for improved sensitivity

- under *Output filter criteria*,
**Would you like to set additional output filters?**: select `Yes' to set
**Would you like to set additional output filters?**: select `Yes` to set
**Maximum number of alignments to output a read's alignment results, plus 1** to 50

- under *Algorithmic settings*, **Configure seed, alignment and limits options**:
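
The `@TEMPINDEX@` change in macros.xml, exercised by the updated test with `tophat_test.fa.gz`, stages the reference as `refgenome.fa` before calling STAR: a plain FASTA is symlinked and anything else is decompressed. The Python sketch below mirrors that branch for illustration only. The wrapper itself does this in Cheetah and shell, and `stage_reference` is a hypothetical helper.

```python
import gzip
import os
import shutil

def stage_reference(src_path, ext, dest_path="refgenome.fa"):
    """Make an uncompressed FASTA available at dest_path.

    ext plays the role of the Galaxy dataset extension: "fasta" is linked
    (like `ln -s`), any other value is treated as gzip-compressed and
    decompressed (like `gunzip -c src > dest`).
    """
    if ext == "fasta":
        os.symlink(os.path.abspath(src_path), dest_path)
    else:
        with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dest_path
```

Either way, STAR is always handed the same uncompressed file name, which is why the diff replaces the dataset path with `refgenome.fa` in `--genomeFastaFiles`.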