Merge pull request #46 from nf-core/dev

First release
nf-core · Apr 27, 2021 · 45ae3c0
2 parents 8fe0321 + 1c60c19

Showing 50 changed files with 1,811 additions and 1,552 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -1,4 +1 @@
 *.config linguist-language=nextflow
-*.fa filter=lfs diff=lfs merge=lfs -text
-SA filter=lfs diff=lfs merge=lfs -text
-Genome filter=lfs diff=lfs merge=lfs -text
diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml
@@ -3,3 +3,4 @@ version: 1.2
 workflows:
   - subclass: nfl
     primaryDescriptorPath: /nextflow.config
+    publish: True
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -69,7 +69,7 @@ If you wish to contribute a new step, please use the following coding standards:
 2. Write the process block (see below).
 3. Define the output channel if needed (see below).
 4. Add any new flags/options to `nextflow.config` with a default (see below).
-5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`)
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
 6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
 7. Add sanity checks for all relevant parameters.
 8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
@@ -87,7 +87,7 @@ Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.
 
 ### Default processes resource requirements
 
-Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
 
 The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block.
 

diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -55,7 +55,7 @@ Have you provided the following extra information/files:
 
 ## Container engine
 
-- Engine: <!-- [e.g. Conda, Docker, Singularity or Podman] -->
+- Engine: <!-- [e.g. Conda, Docker, Singularity, Podman, Shifter or Charliecloud] -->
 - version: <!-- [e.g. 1.0.0] -->
 - Image tag: <!-- [e.g. nfcore/clipseq:1.0.0] -->
 

diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -1,6 +1,6 @@
 ---
 name: Feature request
-about: Suggest an idea for the nf-core website
+about: Suggest an idea for the nf-core/clipseq pipeline
 labels: enhancement
 ---
 

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -15,9 +15,9 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/clip
 
 - [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
- - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
- - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/clipseq/tree/master/.github/CONTRIBUTING.md)
- - [ ] If necessary, also make a PR on the nf-core/clipseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
+  - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
+  - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/clipseq/tree/master/.github/CONTRIBUTING.md)
+  - [ ] If necessary, also make a PR on the nf-core/clipseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
 - [ ] Make sure your code lints (`nf-core lint .`).
 - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.

diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
@@ -9,6 +9,16 @@ on:
     types: [completed]
   workflow_dispatch:
 
+
+env:
+  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
+  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
+  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
+  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+
+
 jobs:
   run-awstest:
     name: Run AWS full tests
@@ -23,21 +33,10 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise AWS full pipeline tests as required
-        # Add full size test data (but still relatively small datasets for few samples)
-        # on the `test_full.config` test runs with only one set of parameters
-        # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command
-        env:
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-          TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
-          AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
-          AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
-          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
         run: |
           aws batch submit-job \
             --region eu-west-1 \
             --job-name nf-core-clipseq \
             --job-queue $AWS_JOB_QUEUE \
             --job-definition $AWS_JOB_DEFINITION \
-            --container-overrides '{"command": ["nf-core/clipseq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/clipseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/clipseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'
+            --container-overrides '{"command": ["nf-core/clipseq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/clipseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/clipseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'
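Note on the `--container-overrides` argument above: it builds a JSON string by alternating single-quoted literal text with double-quoted variable expansions (`'...'"${VAR}"'...'`). The following is a minimal sketch of that quoting pattern only. The values of `GITHUB_SHA` and `AWS_S3_BUCKET` are hypothetical placeholders, and the command is shortened for readability; it does not call `aws`.

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only (assumptions, not real settings).
GITHUB_SHA="abc123"
AWS_S3_BUCKET="example-bucket"

# Single quotes keep the JSON punctuation literal. Each '"${VAR}"' closes the
# single-quoted string, expands the variable inside double quotes, and then
# reopens the single-quoted string.
overrides='{"command": ["nf-core/clipseq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/clipseq/results-'"${GITHUB_SHA}"'"]}'

printf '%s\n' "$overrides"
```

Printing the assembled string before submitting a job is a cheap way to confirm that the JSON is well formed and that no variable expanded to an empty value.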
diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml
@@ -6,6 +6,16 @@ name: nf-core AWS test
 on:
   workflow_dispatch:
 
+
+env:
+  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
+  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
+  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
+  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+
+
 jobs:
   run-awstest:
     name: Run AWS tests
@@ -20,16 +30,7 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise CI pipeline run tests as required
-        # For example: adding multiple test runs with different parameters
         # Remember that you can parallelise this by using strategy.matrix
-        env:
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-          TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
-          AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
-          AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
-          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
         run: |
           aws batch submit-job \
           --region eu-west-1 \

diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
@@ -13,7 +13,7 @@ jobs:
       - name: Check PRs
         if: github.repository == 'nf-core/clipseq'
         run: |
-          { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/clipseq ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
+          { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/clipseq ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
 
 
       # If the above check failed, post a comment on the PR explaining the failure
@@ -23,13 +23,22 @@ jobs:
         uses: mshick/add-pr-comment@v1
         with:
           message: |
+            ## This PR is against the `master` branch :x:
+
+            * Do not close this PR
+            * Click _Edit_ and change the `base` to `dev`
+            * This CI test will remain failed until you push a new commit
+
+            ---
+
             Hi @${{ github.event.pull_request.user.login }},
 
-            It looks like this pull-request is has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch.
+            It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
             The `master` branch on nf-core repositories should always contain code from the latest release.
-            Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch.
+            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
 
             You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
+            Note that even after this, the test will continue to show as failing until you push a new commit.
 
             Thanks again for your contribution!
           repo-token: ${{ secrets.GITHUB_TOKEN }}
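
Note on the `Check PRs` condition above: a PR to `master` passes only if it comes from the `dev` branch of `nf-core/clipseq` itself, or from a branch named `patch` in any repository. A minimal sketch of that logic follows. The function name `check_pr_source` and its two positional arguments are hypothetical stand-ins for the values GitHub Actions supplies via `github.event.pull_request.head.repo.full_name` and `$GITHUB_HEAD_REF`.

```shell
#!/usr/bin/env bash
# check_pr_source is a hypothetical helper mirroring the workflow's condition:
# succeed for nf-core/clipseq:dev, or for any branch named "patch".
check_pr_source() {
  local head_repo="$1" head_ref="$2"
  { [[ "$head_repo" == "nf-core/clipseq" ]] && [[ "$head_ref" == "dev" ]]; } || [[ "$head_ref" == "patch" ]]
}

check_pr_source "nf-core/clipseq" "dev"    && echo "dev from nf-core: allowed"
check_pr_source "someuser/clipseq" "patch" && echo "patch from fork: allowed"
check_pr_source "someuser/clipseq" "dev"   || echo "dev from fork: rejected"
```

The braces group the two `dev` tests so that the `patch` alternative is evaluated independently of the repository name.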

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -20,7 +20,7 @@ jobs:
     strategy:
       matrix:
         # Nextflow versions: check pipeline minimum and current latest
-        nxf_ver: ['20.04.0', '']
+        nxf_ver: ['20.04.0', '21.03.0-edge']
     steps:
       - name: Check out pipeline code
         uses: actions/checkout@v2
@@ -50,8 +50,5 @@ jobs:
           sudo mv nextflow /usr/local/bin/
 
       - name: Run pipeline with test data
-        # TODO nf-core: You can customise CI pipeline run tests as required
-        # For example: adding multiple test runs with different parameters
-        # Remember that you can parallelise this by using strategy.matrix
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test,docker
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
@@ -19,6 +19,34 @@ jobs:
         run: npm install -g markdownlint-cli
       - name: Run Markdownlint
         run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@v1
+        with:
+          message: |
+            ## Markdown linting is failing
+
+            To keep the code consistent with lots of contributors, we run automated code consistency checks.
+            To fix this CI test, please run:
+
+            * Install `markdownlint-cli`
+                * On Mac: `brew install markdownlint-cli`
+                * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`)
+            * Fix the markdown errors
+                * Automatically: `markdownlint . --config .github/markdownlint.yml --fix`
+                * Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml`
+
+            Once you push these changes the test should pass, and you can hide this comment :+1:
+
+            We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
+
+            Thanks again for your contribution!
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          allow-repeats: false
+
+
   YAML:
     runs-on: ubuntu-latest
     steps:
@@ -29,7 +57,34 @@ jobs:
       - name: Install yaml-lint
         run: npm install -g yaml-lint
       - name: Run yaml-lint
-        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml")
+        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml")
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@v1
+        with:
+          message: |
+            ## YAML linting is failing
+
+            To keep the code consistent with lots of contributors, we run automated code consistency checks.
+            To fix this CI test, please run:
+
+            * Install `yaml-lint`
+                * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`)
+            * Fix the markdown errors
+                * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")`
+                * Fix any reported errors in your YAML files
+
+            Once you push these changes the test should pass, and you can hide this comment :+1:
+
+            We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
+
+            Thanks again for your contribution!
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          allow-repeats: false
+
+
   nf-core:
     runs-on: ubuntu-latest
     steps:
@@ -69,7 +124,7 @@ jobs:
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: linting-log-file
+          name: linting-logs
           path: |
             lint_log.txt
             lint_results.md
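
Note on the `find ... -name "*.yml" -o -name "*.yaml"` invocation in the YAML lint step above: in `find`, the implicit "and" binds more tightly than `-o`, so `-type f` applies only to the `*.yml` test. The sketch below illustrates the difference on a throwaway directory. The file and directory names are made up for the demonstration, and this is an observation about `find` precedence, not a claim that the workflow misbehaves on this repository.

```shell
#!/usr/bin/env bash
# Build a scratch tree: two YAML files plus a directory whose name ends in .yaml.
tmp="$(mktemp -d)"
touch "$tmp/a.yml" "$tmp/b.yaml"
mkdir "$tmp/dir.yaml"

# Ungrouped: parsed as (-type f AND -name "*.yml") OR (-name "*.yaml"),
# so the directory dir.yaml is matched too.
ungrouped="$(cd "$tmp" && find . -type f -name "*.yml" -o -name "*.yaml" | LC_ALL=C sort)"

# Grouped: -type f applies to both name patterns, so only regular files match.
grouped="$(cd "$tmp" && find . -type f \( -name "*.yml" -o -name "*.yaml" \) | LC_ALL=C sort)"

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$tmp"
```

Wrapping the alternatives in escaped parentheses makes the intent explicit if only regular files should be passed to the linter.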

diff --git a/.nf-core-lint.yaml b/.nf-core-lint.yaml
@@ -0,0 +1,3 @@
+files_unchanged:
+  - assets/multiqc_config.yaml
+  - lib/NfcoreSchema.groovy
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,16 +3,26 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v1.0 - [date]
+## [1.0.0] - 2021-04-27
 
 Initial release of nf-core/clipseq, created with the [nf-core](https://nf-co.re/) template.
 
-### `Added`
-
-smrna_species parameter
-
-### `Fixed`
-
-### `Dependencies`
-
-### `Deprecated`
+### Pipeline summary
+
+1. Adapter and quality trimming (`Cutadapt`)
+2. Pre-mapping to e.g. rRNA and tRNA sequences (`Bowtie 2`)
+3. Genome mapping (`STAR`)
+4. UMI-based deduplication (`UMI-tools`)
+5. Crosslink identification (`BEDTools`)
+6. Bedgraph coverage track generation (`BEDTools`)
+7. Peak calling (multiple options):
+    - `iCount`
+    - `Paraclu`
+    - `PureCLIP`
+    - `Piranha`
+8. Motif detection (`DREME`)
+9. Quality control:
+    - Sequencing quality control (`FastQC`)
+    - Library complexity (`Preseq`)
+    - Regional distribution (`RSeQC`)
+10. Overall pipeline run and QC summaries and peak calling comparisons (`MultiQC`)
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -33,6 +33,35 @@
 * [UMI-tools](https://pubmed.ncbi.nlm.nih.gov/28100584/)
   > Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy Genome Res. 2017 Mar;27(3):491-499. doi: 10.1101/gr.209601.116. Epub 2017 Jan 18. PubMed PMID: 28100584; PubMed Central PMCID: PMC5340976.
 
+* [Cutadapt](http://journal.embnet.org/index.php/embnetjournal/article/view/200)
+  > Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), p.10.
+
+* [Bowtie2](https://pubmed.ncbi.nlm.nih.gov/22388286/)
+  > Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923. PMID: 22388286; PMCID: PMC3322381.
+
+* [Subread](https://pubmed.ncbi.nlm.nih.gov/23558742/)
+  > Liao Y, Smyth GK, Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 2013 May 1;41(10):e108. doi: 10.1093/nar/gkt214. Epub 2013 Apr 4. PMID: 23558742; PMCID: PMC3664803.
+
+* [iCount](https://icount.readthedocs.io/en/latest/#)
+  > Curk et al. (2019) iCount: protein-RNA interaction iCLIP data analysis (in preparation).
+
+* [PureCLIP](https://pubmed.ncbi.nlm.nih.gov/29284540/)
+  > Krakau S, Richard H, Marsico A. PureCLIP: capturing target-specific protein-RNA interaction footprints from single-nucleotide CLIP-seq data. Genome Biol. 2017 Dec 28;18(1):240. doi: 10.1186/s13059-017-1364-2. PMID: 29284540; PMCID: PMC5746957.
+
+* [Piranha](https://pubmed.ncbi.nlm.nih.gov/23024010/)
+  > Uren PJ, Bahrami-Samani E, Burns SC, Qiao M, Karginov FV, Hodges E, Hannon GJ, Sanford JR, Penalva LO, Smith AD. Site identification in high-throughput RNA-protein interaction data. Bioinformatics. 2012 Dec 1;28(23):3013-20. doi: 10.1093/bioinformatics/bts569. Epub 2012 Sep 28. PMID: 23024010; PMCID: PMC3509493.
+
+* [Paraclu](https://pubmed.ncbi.nlm.nih.gov/18032727/)
+  > Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin A. A code for transcription initiation in mammalian genomes. Genome Res. 2008 Jan;18(1):1-12. doi: 10.1101/gr.6831208. Epub 2007 Nov 21. PMID: 18032727; PMCID: PMC2134772.
+
+* [Meme](https://pubmed.ncbi.nlm.nih.gov/19458158/)
+  > Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009 Jul;37(Web Server issue):W202-8. doi: 10.1093/nar/gkp335. Epub 2009 May 20. PMID: 19458158; PMCID: PMC2703892.
+
+* [R](https://www.r-project.org/)
+  > R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
+
+* [Pigz](https://zlib.net/pigz/)
+
 ## Software packaging/containerisation tools
 
 * [Anaconda](https://anaconda.com)