Merge branch 'devel' of https://github.com/gis-rpd/pipelines into devel
Andreas WILM committed Oct 4, 2017
2 parents 45ef076 + f80d442 commit 765a705
Showing 39 changed files with 87 additions and 57 deletions.
45 changes: 23 additions & 22 deletions README.md
@@ -39,12 +39,12 @@ The following installations are available at different sites (referred to as `RP
- NSCC: `/home/users/astar/gis/gisshared/rpd/pipelines/`

Each of these contains one subfolder per pipeline version,
e.g. `$RPD_PIPELINES/pipelines.2017-01` (referred to as
e.g. `$RPD_PIPELINES/pipelines.2017-06` (referred to as
`PIPELINE_ROOTDIR` below).

Much of this framework assumes a certain setup and services to be
present, as is the case in GIS / the NSCC. This repository is
therefore of limited use to the general public. See INSTALL.md for
therefore of limited use to the general public. See `INSTALL.md` for
simplistic installation instructions.

Some pipelines only work at a certain site (due to system or software
@@ -112,23 +112,25 @@ In either case, you must not prefix the script with `python`.

| Name | Category | Notes | @GIS | @NSCC |
| --- | --- | --- | --- | --- |
| [bcl2fastq](bcl2fastq/README.md) | Production | Not for end-users | Y | Y |
| [ChIP-seq](chromatin-profiling/chipseq/README.md) | Chromatin Profiling | | Y | Y |
| [SG10K](custom/SG10K/README.md) | Custom | Not for end-users | Y | Y |
| [ViPR](germs/vipr/README.md) | GERMS | | Y | Y |
| [BWA-MEM](mapping/BWA-MEM/README.md) | Mapping | | Y | Y |
| [bcl2fastq](bcl2fastq/README.md) | Production | Not for end-users | Y | Y |
| [ATAC-seq](chromatin-profiling/atacseq/README.md) | Chromatin Profiling | | Y | Y |
| [ChIP-seq](chromatin-profiling/chipseq/README.md) | Chromatin Profiling | | Y | Y |
| [SG10K](custom/SG10K/README.md) | Custom | Not for end-users | Y | Y |
| [ViPR](germs/vipr/README.md) | GERMS | | Y | Y |
| [BWA-MEM](mapping/BWA-MEM/README.md) | Mapping | | Y | Y |
| [Shotgun Metagenomics](metagenomics/shotgun-metagenomics/README.md) | Metagenomics | | Y | Y |
| [Essential-Genes](metagenomics/essential-genes/README.md) | Metagenomics | Requires ref download | Y | Y |
| [STAR-RSEM](rnaseq/star-rsem/README.md) | RNA-Seq | | Y | Y |
| [Fluidigm-HT-C1-RNASeq](rnaseq/fluidigm-ht-c1-rnaseq/README.md) | RNA-Seq | | Y | N |
| [LoFreq-Somatic](somatic/lofreq-somatic/README.md) | Somatic | | Y | N |
| [Mutect](somatic/mutect/README.md) | Somatic | | Y | Y |
| [GATK](variant-calling/gatk/README.md) | Variant-calling | | Y | Y |
| [Lacer-LoFreq](variant-calling/lacer-lofreq/README.md) | Variant-calling | | Y | N |
| [Essential-Genes](metagenomics/essential-genes/README.md) | Metagenomics | Requires ref download | Y | Y |
| [STAR-RSEM](rnaseq/star-rsem/README.md) | RNA-Seq | | Y | Y |
| [Fluidigm-HT-C1-RNASeq](rnaseq/fluidigm-ht-c1-rnaseq/README.md)| RNA-Seq | | Y | N |
| [Wafergen](rnaseq/wafergen/README.md) | RNA-Seq | Requires cellular barcodes | Y | Y |
| [LoFreq-Somatic](somatic/lofreq-somatic/README.md) | Somatic | | Y | N |
| [Mutect](somatic/mutect/README.md) | Somatic | | Y | Y |
| [GATK](variant-calling/gatk/README.md) | Variant-calling | | Y | Y |
| [Lacer-LoFreq](variant-calling/lacer-lofreq/README.md) | Variant-calling | | Y | N |

See `example-dag.pdf` in each pipeline's folder for a visual overview of the workflow.

Note, pipelines start with fastq files as input (a few allow injection of BAM files).
Note, most pipelines start with FastQ files as input, a few allow injection of BAM files.

## How it Works

@@ -138,7 +140,7 @@ Note, pipelines start with fastq files as input (a few allow injection of BAM fi
`conf.yaml` file) and gets its own readgroup assigned where
appropriate.
- Software versions are defined in each pipelines' `cfg/modules.yaml`
and loaded via [dotkit](https://computing.llnl.gov/?set=jobs&page=dotkit)
and loaded via [Lmod](http://lmod.readthedocs.io/en/latest/)
- Pipeline wrappers create an output directory containing all
necessary configuration files, run scripts etc.
- After creation of this folder, the analysis run is automatically submitted to the cluster
@@ -154,7 +156,6 @@ Note, pipelines start with fastq files as input (a few allow injection of BAM fi

First call the wrapper in question with `--no-run`. cd into the given outdir and then
- Check the created `conf.yaml`
- Print the DAG: `rm -f logs/snakemake.log; type=pdf; EXTRA_SNAKEMAKE_ARGS="--dag" bash run.sh; cat logs/snakemake.log | dot -T$type > dag.$type`
- Execute a dryrun: `rm -f logs/snakemake.log; EXTRA_SNAKEMAKE_ARGS="--dryrun" bash run.sh; cat logs/snakemake.log`
- Run locally: `nohup bash run.sh; tail -f logs/snakemake.log`

@@ -189,11 +190,11 @@ described in the following:
want to set the CSV delimiter with `-d`, e.g. `-d ,`
- Use the created yaml file as input for the pipeline wrapper (option `--sample-cfg your.yaml`)

Please note, not all pipelines support this feature (for example the
somatic pipelines don't), but most do, e.g. GATK, Lacer-LoFreq. In
some cases multisample processing can lead to very high memory
consumption by the snakemake master process itself, a side-effect
which is hard to predict.
Please note, not all pipelines support this feature, for example the
Chipseq and all somatic pipelines. In some cases multisample
processing can lead to very high memory consumption by the snakemake
master process itself, a side-effect which is hard to predict (the
master process will be killed).

The above configuration can be used for single sample processing as
well, however, for single samples the corresponding use of options
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
2017-06.0
2017-10.0
3 changes: 3 additions & 0 deletions bcl2fastq/Snakefile
@@ -23,6 +23,7 @@ from pipelines import send_status_mail
from pipelines import path_to_url
from pipelines import RPD_SIGNATURE
from utils import generate_timestamp
from pipelines import mark_as_completed
from elmlogger import ElmLogging, ElmUnit
from bcl2fastq_dbupdate import DBUPDATE_TRIGGER_FILE_FMT, DBUPDATE_TRIGGER_FILE_MAXNUM
from readunits import sampledir_to_cfg
@@ -149,6 +150,8 @@ onsuccess:
os.path.abspath(RESULT_OUTDIR), extra_text=extra_text)
# cannot talk to mongodb from compute. use trigger file
write_db_update_trigger(True)

mark_as_completed()


onerror:
20 changes: 20 additions & 0 deletions changelog.md
@@ -3,6 +3,26 @@
This change log only lists the major changes between releases. For a
full list of changes refer to the commit log.

## 2017-10

New pipelines:
- chromatin-profiling/atacseq: runs Bowtie2 and MACS2 to call ATAC-Seq peaks
- rnaseq/wafergen-scrna: analyses WaferGen's single cell sequencing
data, using umis and scRNApipe for the core analysis.


Changes to pipelines and framework:
- Major pipelines now produce benchmarking logs
- STAR-RSEM now supports --estimate-rspd and new strandedness option
- Shotgun-metagenomics: Added SRST2, read counting and coverage
threshold for decontamination
- GATK: support for joint variant calling (--joint-calls) and added a
new option (--gvcf-only). Also added a new optional references
  config for aggressive splitting
- Added --restarts option to control number of automatic restarts in
case of failure
- Minor additions to SG10K and many under the hood changes to bcl2fastq

## 2017-06

New pipelines:
Binary file modified chromatin-profiling/atacseq/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion chromatin-profiling/atacseq/tests.sh
@@ -68,7 +68,7 @@ echo 'FIXME test quality of results' 1>&2


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: PE two pairs" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified chromatin-profiling/chipseq/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion chromatin-profiling/chipseq/tests.sh
@@ -78,7 +78,7 @@ echo 'FIXME bam injection' 1>&2


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: WES" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
6 changes: 4 additions & 2 deletions custom/SG10K/Snakefile
@@ -134,9 +134,11 @@ rule bam_to_cram:
#conda:
# "env.yaml"
shell:
"module load bamutil/1.0.14-33-nonprimdup;"
"{{ "
" module load bamutil/1.0.14-33-nonprimdup;"
" bam squeeze --in {input.bam} --out -.ubam {params.bin_arg} |"
"samtools view -C -T {input.reffa} -@ {threads} -o {output.cram} - >& {log}"
" samtools view -C -T {input.reffa} -@ {threads} -o {output.cram} -;"
" }} >& {log}"


localrules: cram_index
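
The rewritten `shell:` string wraps the whole command sequence in a brace group so that a single `>& {log}` captures the output of every command, not only that of the final `samtools` call. Snakemake expands `{...}` placeholders in shell strings, so the literal group braces have to be doubled. A minimal sketch of that escaping, using plain `str.format` as a stand-in for Snakemake's own formatting (an assumption for illustration; the shortened commands and the file names are made up):

```python
# Adjacent string literals are concatenated, just like in the rule's shell: block.
template = (
    "{{ "
    " module load bamutil;"
    " bam squeeze --in {bam} --out -.ubam |"
    " samtools view -C -o {cram} -;"
    " }} >& {log}"
)

# Doubled braces become literal braces; single-brace names are substituted.
cmd = template.format(bam="in.bam", cram="out.cram", log="run.log")
print(cmd)
```

The formatted string starts with `{ ` and ends with `} >& run.log`, which is the grouped command the shell finally sees.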
1 change: 0 additions & 1 deletion custom/SG10K/cfg/modules.yaml
@@ -7,4 +7,3 @@ datamash: 1.1.0
sg10k-cov: '062017'
fastmitocalc: 'default'


Binary file modified custom/SG10K/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion custom/SG10K/tests.sh
@@ -61,7 +61,7 @@ echo "Check log if the following final message is not printed: \"$COMPLETE_MSG\"


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: $SAMPLE" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
2 changes: 1 addition & 1 deletion germs/vipr/aux/plot.py
@@ -36,7 +36,7 @@ def parse_genomecov(genomecov_gzfile):
for line in fh:
sq, pos, cov = line.decode().rstrip().split("\t")
pos = int(pos)-1
cov = int(cov)
cov = int(float(cov))# float for support of scientific notation
if sq not in genomecov:
genomecov[sq] = OrderedDict()
genomecov[sq][pos] = cov
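
The detour through `float` matters because `int()` rejects strings written in scientific notation. A minimal sketch of the difference; the helper name `parse_cov` is hypothetical and not part of `plot.py`:

```python
def parse_cov(field):
    """Parse a coverage field that may be written in scientific notation."""
    # int("1e+06") raises ValueError, so convert via float first.
    return int(float(field))


# Check whether int() alone accepts scientific notation.
try:
    int("1e+06")
    int_accepts_sci = True
except ValueError:
    int_accepts_sci = False

print(parse_cov("42"), parse_cov("1e+06"), int_accepts_sci)
```

Note that `int(float(...))` truncates any fractional part and can lose precision for very large values, which is presumably acceptable for coverage counts.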
Binary file modified germs/vipr/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion germs/vipr/tests.sh
@@ -69,7 +69,7 @@ base_cmd="$WRAPPER -1 $FQ1 -2 $FQ2 -s test:DENV2-TSV01-PDH203 -r $REF";
# FIXME fix here and elsewhere

# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "Creating DAG" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified mapping/BWA-MEM/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion mapping/BWA-MEM/tests.sh
@@ -75,7 +75,7 @@ echo "Check log if the following final message is not printed: \"$COMPLETE_MSG\"


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: PE through config" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified metagenomics/essential-genes/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion metagenomics/essential-genes/tests.sh
@@ -73,7 +73,7 @@ echo "Check log if the following final message is not printed: \"$COMPLETE_MSG\"


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file not shown.
2 changes: 1 addition & 1 deletion metagenomics/shotgun-metagenomics/tests.sh
@@ -64,7 +64,7 @@ base_cmd="$WRAPPER --sample-cfg $SAMPLECFG --name test:shotgun-metagenomics";
# FIXME fix here and elsewhere

# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "Creating DAG" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified rnaseq/fluidigm-ht-c1-rnaseq/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion rnaseq/fluidigm-ht-c1-rnaseq/tests.sh
@@ -65,7 +65,7 @@ cmd_base="$WRAPPER -1 $COL1_R1 -2 $COL1_R2 -s COL01 --name 'test:COL01'"


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified rnaseq/star-rsem/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion rnaseq/star-rsem/tests.sh
@@ -84,7 +84,7 @@ CMD_FULL="$WRAPPER -1 $R1_FULL -2 $R2_FULL -s $SAMPLE --name 'test:FULL'"
SKIP_REAL_FULL=0

# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: Full" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
3 changes: 1 addition & 2 deletions rnaseq/wafergen-scrna/README.md
@@ -54,8 +54,7 @@ Reports:

## Notes

Deduplication can be very slow for large data-sets. If deduplication
is not a must, we recommend to switch it off (`--no-dedup`).
Deduplication can be very slow for large data-sets. We recommend to not use it unless necessary (`--dedup`).

## References

8 changes: 6 additions & 2 deletions rnaseq/wafergen-scrna/Snakefile
@@ -192,6 +192,8 @@ rule bamtag_split:
output:
taggedbam = '{prefix}/{sample}/bamtag/{sample}_R2.tagged.bam',
splitflag = touch('{prefix}/{sample}/bamtag/{sample}_R2.tagsplit.COMPLETE')
log:
'{prefix}/{sample}/bamtag/{sample}_R2.tagged.log'
benchmark:
'{prefix}/{sample}/bamtag/{sample}_R2.tag.bamtag_split.benchmark.log'
params:
@@ -201,10 +203,12 @@ rule bamtag_split:
threads:
1
shell:
'umis bamtag {input.bam} | samtools addreplacerg'
'{{'
' umis bamtag {input.bam} | samtools addreplacerg'
' -r ID:{params.sample} -r LB:{params.sample} -r SM:{params.sample} -r PL:{params.platform} -r PU:1 -r CN:{params.center}'
' -o - - | samtools sort -o {output.taggedbam} -T {output.taggedbam}.tmp -;'
' bamtools split -tag XC -in {output.taggedbam}'
' bamtools split -tag XC -in {output.taggedbam};'
' }} >& {log}'
# not guaranteed to create one file per barcode


Binary file modified rnaseq/wafergen-scrna/example-dag.pdf
Binary file not shown.
6 changes: 3 additions & 3 deletions rnaseq/wafergen-scrna/tests.sh
@@ -63,7 +63,7 @@ cmd_base="$WRAPPER -c $TEST_CBINDEX -S $TEST_SAMPLECFG"


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
@@ -103,9 +103,9 @@ if [ $skip_real_runs -ne 1 ]; then
jid=$(tail -n 1 $odir/logs/submission.log | cut -f 3 -d ' ')
echo "Started job $jid writing to $odir. You will receive an email"

echo "Realrun no_dedup" | tee -a $log
echo "Realrun dedup" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
eval $cmd_base --no-dedup -o $odir -v >> $log 2>&1
eval $cmd_base --dedup -o $odir -v >> $log 2>&1
# magically works even if line just contains id as in the case of pbspro
jid=$(tail -n 1 $odir/logs/submission.log | cut -f 3 -d ' ')
echo "Started job $jid writing to $odir. You will receive an email"
6 changes: 3 additions & 3 deletions rnaseq/wafergen-scrna/wafergen-scrna.py
@@ -78,8 +78,8 @@ def main():
d = 20.0
parser.add_argument('--frag-len-sd', default=d, type=float,
help="Estimated fragment length standard deviation (default={})".format(d))
parser.add_argument('--no-dedup', action="store_true",
help="Skip UMI-based deduplication (can be slow)")
parser.add_argument('--dedup', action="store_true",
help="Run UMI-based deduplication (slow for large data-sets!)")
args = parser.parse_args()

# Repeateable -v and -q for setting logging level.
@@ -139,7 +139,7 @@ def main():
cfg_dict['cell_barcodes'] = os.path.abspath(args.cell_barcodes)
cfg_dict['frag_len'] = args.frag_len
cfg_dict['frag_len_sd'] = args.frag_len_sd
cfg_dict['no_dedup'] = args.no_dedup
cfg_dict['no_dedup'] = not args.dedup
cfg_dict['scrnapipe_transform'] = os.path.abspath(os.path.join(
PIPELINE_BASEDIR, 'aux/transform.json'))
cfg_dict['scrna_conf_template'] = os.path.abspath(os.path.join(
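
The command-line flag is inverted from opt-out (`--no-dedup`) to opt-in (`--dedup`), while the config key stays `no_dedup`, so the value is negated when it is written. A minimal sketch of the resulting behaviour; `build_cfg` is a hypothetical helper that stands in for the relevant part of `main()`:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--dedup', action="store_true",
                    help="Run UMI-based deduplication (slow for large data-sets!)")


def build_cfg(argv):
    """Return only the dedup-related part of the config (illustrative)."""
    args = parser.parse_args(argv)
    # The config key keeps its old, negated name, hence the `not`.
    return {'no_dedup': not args.dedup}


print(build_cfg([]))
print(build_cfg(['--dedup']))
```

Without the flag, `no_dedup` is `True`, so deduplication is now skipped by default; passing `--dedup` sets it to `False`.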
Binary file modified somatic/lofreq-somatic/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion somatic/lofreq-somatic/tests.sh
@@ -76,7 +76,7 @@ wgs_cmd_base="$WRAPPER --normal-bam $DREAM_WGS_NORMAL_BAM --tumor-bam $DREAM_WGS


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: WES" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified somatic/mutect/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion somatic/mutect/tests.sh
@@ -78,7 +78,7 @@ wgs_cmd_base="$WRAPPER --normal-bam $DREAM_WGS_NORMAL_BAM --tumor-bam $DREAM_WGS


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: WES" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
16 changes: 9 additions & 7 deletions tools/pipelint.py
@@ -111,6 +111,7 @@ def check_modules(pipeline_dir):

is_ok = True
module_cfgs = glob.glob(os.path.join(pipeline_dir, "cfg/modules.yaml"))
assert len(module_cfgs) > 0
modules = dict()
for cfg in module_cfgs:
with open(cfg) as fh:
@@ -139,7 +140,14 @@ def main(pipelinedirs,
logger.warning("include other existing tools here: check_cluster_conf.py...")

snakefiles = [os.path.join(d, "Snakefile") for d in pipelinedirs]


if not no_modules_check:
for d in pipelinedirs:
if not check_modules(d):
print("FAILED: Modules check for {}".format(d))
else:
print("OK: Modules check for {}".format(d))

includes = []
for f in snakefiles:
assert os.path.exists(f)
@@ -148,12 +156,6 @@ def main(pipelinedirs,
else:
print("OK: Expected files for {}".format(f))

if not no_modules_check:
if not check_modules(f):
print("FAILED: Modules check for {}".format(f))
else:
print("OK: Modules check for {}".format(f))

includes.extend(get_includes_from_snakefile(f))
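
The modules check moves out of the Snakefile loop because `check_modules()` expects a pipeline directory. Called with a Snakefile path, its glob pattern presumably never matched, which the new `assert` would now catch. A minimal sketch of that effect in a throwaway directory; `find_module_cfgs` is a hypothetical helper that mirrors only the glob call:

```python
import glob
import os
import tempfile


def find_module_cfgs(pipeline_dir):
    """Mirror the glob used in check_modules() (illustrative only)."""
    return glob.glob(os.path.join(pipeline_dir, "cfg/modules.yaml"))


with tempfile.TemporaryDirectory() as d:
    # Build a fake pipeline folder with a Snakefile and cfg/modules.yaml.
    os.makedirs(os.path.join(d, "cfg"))
    open(os.path.join(d, "cfg", "modules.yaml"), "w").close()
    open(os.path.join(d, "Snakefile"), "w").close()

    from_dir = len(find_module_cfgs(d))
    from_snakefile = len(find_module_cfgs(os.path.join(d, "Snakefile")))

print(from_dir, from_snakefile)
```

Passing the directory finds the config file, whereas passing the Snakefile path finds nothing.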


Binary file modified variant-calling/gatk/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion variant-calling/gatk/tests.sh
@@ -80,7 +80,7 @@ wgs_cmd_base="$WRAPPER -1 $WGS_FQ1 -2 $WGS_FQ2 -s NA12878-WGS -t WGS --name 'tes


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: WES" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)
Binary file modified variant-calling/lacer-lofreq/example-dag.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion variant-calling/lacer-lofreq/tests.sh
@@ -76,7 +76,7 @@ wgs_cmd_base="$WRAPPER -1 $WGS_FQ1 -2 $WGS_FQ2 -s NA12878-WGS -t WGS --name 'tes


# DAG
SKIP_DAG=0
SKIP_DAG=1
if [ $SKIP_DAG -eq 0 ]; then
echo "DAG: WES" | tee -a $log
odir=$($DOWNSTREAM_OUTDIR_PY -r $(whoami) -p $PIPELINE)