Merge pull request #582 from pavanvidem/rnaseq-de
first release of RNAseq DE analysis, filtering and plotting workflow
lldelisle authored Nov 12, 2024
2 parents c4ded97 + 0c49c70 commit a949697
Showing 5 changed files with 1,225 additions and 0 deletions.
11 changes: 11 additions & 0 deletions workflows/transcriptomics/rnaseq-de/.dockstore.yml
@@ -0,0 +1,11 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /rnaseq-de-filtering-plotting.ga
  testParameterFiles:
  - /rnaseq-de-filtering-plotting-tests.yml
  authors:
  - name: Pavankumar Videm
    orcid: 0000-0002-5192-126X
3 changes: 3 additions & 0 deletions workflows/transcriptomics/rnaseq-de/CHANGELOG.md
@@ -0,0 +1,3 @@
## [0.1] 2024-10-25

First release.
23 changes: 23 additions & 0 deletions workflows/transcriptomics/rnaseq-de/README.md
@@ -0,0 +1,23 @@
# RNA-seq Differential expression and filtering workflow

This workflow works only with an experimental setup containing exactly 2 conditions with at least 2 replicates per condition.

## Input datasets

- Counts from changed condition: Counts from the experimental (changed) condition, e.g. counts from treatment or knockdown samples.
- Counts from reference condition: Counts from the reference (base) condition, e.g. counts from untreated or wild-type samples.
- Gene Annotaton: The same GTF file that was used for mapping and quantification. It is used to annotate the DESeq2 results table. Ideally, the GTF file should contain the `gene_id`, `gene_biotype` and `gene_name` attributes.
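
As a rough pre-flight check, the attributes of a GTF record can be inspected before running the workflow. The sketch below is not part of the workflow; the helper names and the example record are hypothetical, and it assumes the usual `key "value";` layout of the ninth GTF column.

```python
import re

# Attributes the annotation step ideally finds in the GTF file.
WANTED = ("gene_id", "gene_biotype", "gene_name")

# Matches one key "value" pair in the ninth (attributes) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(line):
    """Return the attributes of one tab-separated GTF line as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF line must have exactly 9 tab-separated columns")
    return dict(ATTR_RE.findall(fields[8]))


def missing_attributes(line):
    """List the wanted attributes that are absent from a GTF line."""
    attrs = parse_gtf_attributes(line)
    return [name for name in WANTED if name not in attrs]


# Hypothetical gene record, written for this sketch only.
example = (
    "XIII\tensembl\tgene\t24036\t25800\t.\t-\t.\t"
    'gene_id "YML123C"; gene_name "PHO84"; gene_biotype "protein_coding";'
)

if __name__ == "__main__":
    print(parse_gtf_attributes(example))
    print(missing_attributes(example))
```

A record that lacks one of the three attributes can still be parsed; the check only reports which annotation columns would likely end up empty.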

## Input values

- Count files have header: Indicate whether your input count files have a header line. Count files generated by the featureCounts tool usually have a header line, whereas count files from RNA-STAR do not.
- Adjusted p-value threshold: Genes with an adjusted p-value below this value are considered differentially expressed. With a value of 0.05, about 5% of the genes in the differentially expressed list are expected to be false positives. If empty, the default value of 0.05 is used.
- log2 fold change threshold: Genes with an absolute log2 fold change (regardless of up- or down-regulation) above this value are selected. A log2 FC of 3 corresponds to an absolute fold change of 8 (2³). If empty, the default value of 1.0 is used.
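
The two thresholds can be read as a simple per-gene test. The following is a minimal sketch of that logic, not the workflow's actual filtering step; the function names and the example values are hypothetical, and treating a missing adjusted p-value as "not significant" is an assumption of this sketch.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into an absolute fold change."""
    return 2.0 ** abs(log2_fc)


def is_differentially_expressed(log2_fc, padj,
                                padj_threshold=0.05,
                                log2_fc_threshold=1.0):
    """Apply both cut-offs to one gene.

    The defaults mirror the values used when the inputs are left empty.
    """
    if padj is None:
        # Assumption: genes without an adjusted p-value are not selected.
        return False
    return padj < padj_threshold and abs(log2_fc) > log2_fc_threshold


if __name__ == "__main__":
    # Hypothetical (log2 FC, adjusted p-value) pairs.
    genes = {"geneA": (-1.67, 1e-10), "geneB": (-0.54, 0.09)}
    for name, (lfc, padj) in genes.items():
        print(name, fold_change(lfc), is_differentially_expressed(lfc, padj))
```

With the default thresholds only `geneA` passes both tests, while relaxing them to 0.1 and 0.5 would also select `geneB`.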

## Processing

- The workflow uses DESeq2 to perform the differential expression analysis. In addition to the results table, it produces a table of normalized counts.
- The results table is annotated with gene positions, biotypes and gene symbols.
- The annotated results table is then filtered using the input adjusted p-value and log2 fold change thresholds.
- A volcano plot is generated that labels the top 10 most significantly differentially expressed genes.
- Two heatmaps are generated: one of log-transformed normalized counts and one of Z-scores.
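
To illustrate what the two heatmaps display, the sketch below transforms one gene's normalized counts. The exact transformation used by the workflow is not stated here, so a `log2(count + 1)` transform and a per-gene Z-score based on the sample standard deviation are assumptions; the function names and the example counts are hypothetical.

```python
import math
import statistics


def log_transform(counts, pseudocount=1.0):
    """log2-transform counts; the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def z_scores(values):
    """Centre and scale one gene's values across samples."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        # A gene with identical values in all samples has no variation.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical normalized counts for one gene in four samples.
    counts = [210.5, 180.4, 48.6, 52.4]
    logged = log_transform(counts)
    print(logged)
    print(z_scores(logged))
```

Z-scores put every gene on the same scale, which makes expression patterns comparable across genes with very different absolute counts.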
54 changes: 54 additions & 0 deletions workflows/transcriptomics/rnaseq-de/rnaseq-de-filtering-plotting-tests.yml
@@ -0,0 +1,54 @@
- doc: Test outline for RNAseq_DE_filtering_plotting
  job:
    Gene Annotaton:
      class: File
      location: https://zenodo.org/records/14056162/files/Saccharomyces_cerevisiae.R64-1-1.113.gtf
      filetype: gtf
    Counts from changed condition:
      class: Collection
      collection_type: list
      elements:
      - class: File
        identifier: SRR5085169 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085169.tabular
      - class: File
        identifier: SRR5085170 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085170.tabular
    Counts from reference condition:
      class: Collection
      collection_type: list
      elements:
      - class: File
        identifier: SRR5085167 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085167.tabular
      - class: File
        identifier: SRR5085168 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085168.tabular
    Count files have header: true
    Adjusted p-value threshold: 0.1
    log2 fold change threshold: 0.5
  outputs:
    Annotated DESeq2 results table:
      has_text_matching:
        expression: "YML123C\t122.984408142053\t-1.67[0-9]*\t0.21[0-9]*\t-7.66[0-9]*\t1.81[0-9]*e-14\t5.04[0-9]*e-[0-9]*\tchrXIII\t24036\t25800\t-\tprotein_coding\tPHO84"
        expression: "YKL081W\t264.71[0-9]*\t-0.54[0-9]*\t0.15[0-9]*\t-3.46[0-9]*\t0.00[0-9]*\t0.09[0-9]*\tchrXI\t282890\t284455\t+\tprotein_coding\tTEF4"
    Heatmap of Z-scores:
      has_size:
        value: 19510
        delta: 1000
    DESeq2 Normalized Counts:
      has_text_matching:
        expression: "YML123C\t210.50[0-9]*\t180.36[0-9]*\t48.64[0-9]*\t52.43[0-9]*"
        expression: "YKL081W\t313.76[0-9]*\t322.37[0-9]*\t223.48[0-9]*\t199.24[0-9]*"
    DESeq2 Plots:
      has_size:
        value: 1193021
        delta: 60000
    Volcano Plot of DE genes:
      has_size:
        value: 301346
        delta: 15000
    Heatmap of log transformed normalized counts:
      has_size:
        value: 19501
        delta: 1000