ribosomeprofiling/rf_sample_data

Sample Data for RiboFlow

A complete set of sample files for testing and trying out the RiboFlow pipeline.

All files and references herein are derived from the human genome.

The files in this repo are random subsets of the originally published data. Each sample contains 1 million raw reads, which is a fraction of the original data.

Required and Optional Files

Not all files are required by RiboFlow.

Required File Types:

  • Fastq files from ribosome profiling experiments
  • Annotation
  • Transcriptome Reference
  • Filter

Optional File Types:

  • Fastq files from RNA-Seq experiments
  • Metadata
  • Genome Reference
  • Post-Genome Reference
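
The required/optional split above can be sketched as a small pre-flight check. This is only an illustration: the keys below (`ribo_fastq`, `annotation`, and so on) are hypothetical labels for the file types listed here, not RiboFlow configuration keys.

```python
# Hypothetical labels for the file types listed above; these are NOT
# RiboFlow configuration keys.
REQUIRED = ("ribo_fastq", "annotation", "transcriptome_reference", "filter")
OPTIONAL = ("rnaseq_fastq", "metadata", "genome_reference", "post_genome_reference")


def check_inputs(provided):
    """Return (missing_required, absent_optional) for the provided file types."""
    provided = set(provided)
    unknown = provided - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"Unknown file types: {sorted(unknown)}")
    missing = [key for key in REQUIRED if key not in provided]
    absent = [key for key in OPTIONAL if key not in provided]
    return missing, absent


if __name__ == "__main__":
    missing, absent = check_inputs(["ribo_fastq", "annotation", "filter"])
    print("missing required:", missing)
    print("optional not provided:", absent)
```

A run can proceed when the first list is empty; entries in the second list are informational only.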

Fastq

Contains raw reads from ribosome profiling and RNA-Seq experiments. Each sample has two fastq files. All fastq files were obtained by taking a subset of reads from the publicly available data.

  1. Single-cell ribosome profiling data with UMIs (1cell-2, 1cell-4):
    NCBI GEO accession number GSE185732, published in Ozadam, Tonn, Han, et al.
    This dataset contains UMIs, which need to be removed prior to alignment.
  2. Bulk ribosome profiling and RNA-Seq data (GSM1606107 and GSM1606108):
    NCBI GEO accession number GSE65778, published in Sidrauski et al.

Note that RNA-Seq data is optional for RiboFlow and .ribo files.
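
As a rough illustration of what removing UMIs prior to alignment involves for the single-cell samples, here is a minimal sketch. It assumes the UMI occupies the first `umi_len` bases of each read and that it should be kept by appending it to the read name; both the position and the default length of 12 are assumptions made for this example, not values taken from this repo or from RiboFlow.

```python
def extract_umi(header, seq, qual, umi_len=12):
    """Move a 5' UMI from the read sequence into the read name.

    Assumption: the UMI is the first `umi_len` bases of the read. The default
    of 12 is illustrative only; check the library design of the actual data.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= umi_len:
        raise ValueError("read is not longer than the UMI")
    umi = seq[:umi_len]
    # Keep any description after the first space in the header intact.
    name, _, rest = header.partition(" ")
    new_header = f"{name}_{umi}" + (f" {rest}" if rest else "")
    return new_header, seq[umi_len:], qual[umi_len:]


if __name__ == "__main__":
    record = ("@read1 sample=1cell-2", "ACGTACGTACGTTTTTGGGG", "I" * 20)
    print(extract_umi(*record))
```

Keeping the UMI in the read name lets a later step collapse duplicate reads after alignment, while the aligner sees only the remaining sequence.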

Annotation

The tsv file contains transcript lengths. The bed file contains the region boundaries: CDS, 5' UTR and 3' UTR.
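
The two annotation files can be cross-checked against each other. The sketch below assumes a header-less tsv with two columns (transcript name, length) and a bed file with four columns (transcript, start, end, region name) using the usual 0-based, half-open BED coordinates. The column layout and the region names used in the example are assumptions; inspect the actual files before relying on them.

```python
import csv
import io


def read_lengths(handle):
    """Parse 'transcript<TAB>length' lines (assumed layout, no header row)."""
    return {row[0]: int(row[1]) for row in csv.reader(handle, delimiter="\t") if row}


def read_regions(handle):
    """Parse 'transcript<TAB>start<TAB>end<TAB>name' lines (assumed layout)."""
    regions = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        transcript, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        regions.setdefault(transcript, []).append((name, start, end))
    return regions


def out_of_bounds(lengths, regions):
    """List (transcript, region) pairs that do not fit inside the transcript."""
    bad = []
    for transcript, entries in regions.items():
        for name, start, end in entries:
            length = lengths.get(transcript)
            if length is None or not (0 <= start < end <= length):
                bad.append((transcript, name))
    return bad


if __name__ == "__main__":
    # Toy data with made-up transcript and region names.
    lengths = read_lengths(io.StringIO("tx1\t100\n"))
    regions = read_regions(io.StringIO("tx1\t0\t10\tUTR5\ntx1\t10\t90\tCDS\ntx1\t90\t100\tUTR3\n"))
    print(out_of_bounds(lengths, regions))
```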

Metadata

Contains metadata for the ribo files in yaml format.

Metadata is optional for RiboFlow and ribo files.

Transcriptome Reference

Bowtie2 index files for the transcriptome. The actual output of the RiboFlow pipeline, i.e., ribo files, is obtained using the reads that are mapped to the transcriptome reference.

Filter

Bowtie2 index built from the filter sequences, which are mainly rRNAs and tRNAs.

Genome

A mock Hisat2 reference in place of the entire genome. For actual data analysis, users should download a complete human genome reference such as hg38. Links are available at the Hisat2 website.

Note that this reference has no effect on the output ribo files, since the ribo files are generated from the reads mapped to the transcriptome. Reads that do not map to the transcriptome are mapped to the genome.

The genome reference is an optional parameter for RiboFlow.

Post-Genome

A sample Bowtie2 reference that serves as the post-genome reference.

Reads that are not mapped to the genome are mapped to the post-genome reference.

As with the genome reference, the post-genome parameter is optional and has no effect on the output ribo files.
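
The order in which reads fall through the references can be pictured with a toy model. The sketch below is not RiboFlow code: `route_reads` and the `aligns_to` predicate are hypothetical stand-ins for real Bowtie2/Hisat2 alignments, and placing the filter step first is an assumption based on its role as a filter. It only shows why dropping the optional genome and post-genome stages leaves the transcriptome bin, and hence the ribo files, unchanged.

```python
def route_reads(reads, aligns_to,
                stages=("filter", "transcriptome", "genome", "post_genome")):
    """Assign each read to the first stage it aligns to, in order.

    `aligns_to(read, stage)` is a hypothetical predicate standing in for an
    actual aligner run against that stage's reference.
    """
    bins = {stage: [] for stage in stages}
    bins["unaligned"] = []
    for read in reads:
        for stage in stages:
            if aligns_to(read, stage):
                bins[stage].append(read)
                break
        else:
            # No stage accepted the read.
            bins["unaligned"].append(read)
    return bins


if __name__ == "__main__":
    # Toy alignment table: which references each read would align to.
    hits = {
        "r1": {"filter"},
        "r2": {"transcriptome", "genome"},
        "r3": {"genome"},
        "r4": {"post_genome"},
        "r5": set(),
    }
    aligns = lambda read, stage: stage in hits[read]
    print(route_reads(hits, aligns))
    # Without the optional stages, the transcriptome bin is the same.
    print(route_reads(hits, aligns, stages=("filter", "transcriptome")))
```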
