This repository contains the pipelines and their parameters, the downstream processing and analysis scripts, and the transcriptome reference and annotation files used for the manuscript (link needed).
- RiboFlow pipeline and parameter files
- RNA-Seq Pipeline
- SNP calling
- TE of allelic-biased transcripts
- Ribosome Binding Proteins
- RPF Frames
- Analyses and figures
- References (in another repository)
- .ribo files: To run the analysis scripts, users need to download the .ribo files for the human and mouse samples (link to GEO).
- Sequencing files: To run the ribosome profiling pipeline (RiboFlow) and the RNA-Seq pipeline, the fastq files available in GEO (link needed) are required.
Ribosome profiling data were processed using a modified version of RiboFlow, which is available here. This modified version of RiboFlow can handle ribosome profiling libraries with unique molecular identifiers (UMIs). We also provide parameter files for each RiboFlow run.
RNA-Seq data were NOT processed via RiboFlow. We used a custom pipeline to process the RNA-Seq files, which is available in this repository (see below).
We recommend running RiboFlow in a conda environment. UMI-tools is needed to process the ribosome profiling data.
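To illustrate what UMI-aware deduplication means in principle, here is a minimal Python sketch. It is not the RiboFlow or UMI-tools implementation: the `Alignment` record and the `deduplicate` function are hypothetical names, and the rule used (exact match on transcript, position and UMI) is a simplifying assumption that ignores sequencing errors in the UMI.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned read."""
    transcript: str
    position: int  # assumed 0-based position of the read's 5' end
    umi: str       # UMI sequence extracted from the read


def deduplicate(alignments: Iterable[Alignment]) -> list[Alignment]:
    """Keep the first read seen for each (transcript, position, UMI) key.

    Reads sharing all three values are treated as PCR duplicates of the
    same original molecule.
    """
    seen = set()
    unique = []
    for aln in alignments:
        key = (aln.transcript, aln.position, aln.umi)
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique
```

Without UMIs, two reads mapping to the same position cannot be told apart from PCR duplicates. With UMIs, reads at the same position but with different UMIs are kept as independent molecules.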
We processed the RNA-Seq data using our custom Snakemake pipeline, which maps, deduplicates and quantifies the reads.
Scripts and Python notebooks for calling single nucleotide polymorphisms (SNPs) are provided in this folder. The list of SNPs on the coding sequences of the transcripts is also provided in VCF format.
We also provide a detailed list of counts for each SNP in this file for ribosome profiling experiments. For RNA-Seq experiments, a similar list is provided.
Scripts used for identifying genes with allelic-biased expression and comparing their translation efficiency to that of bi-allelic genes are provided.
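For orientation, translation efficiency (TE) is commonly defined as the ratio of ribosome footprint density to mRNA abundance over the CDS. The sketch below shows one simple way to compute it in Python. It is an illustrative assumption, not the method used in the scripts: the function names, the counts-per-million normalization and the pseudocount are all hypothetical choices.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw CDS counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an all-zero count table")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_te(
    ribo_counts: dict[str, int],
    rna_counts: dict[str, int],
    pseudocount: float = 1.0,  # assumed value, avoids division by zero
) -> dict[str, float]:
    """Return log2(ribosome profiling CPM / RNA-Seq CPM) for shared genes."""
    ribo_cpm = counts_per_million(ribo_counts)
    rna_cpm = counts_per_million(rna_counts)
    shared = sorted(ribo_cpm.keys() & rna_cpm.keys())
    return {
        gene: math.log2(
            (ribo_cpm[gene] + pseudocount) / (rna_cpm[gene] + pseudocount)
        )
        for gene in shared
    }
```

With values like these, the TE of allelic-biased genes could then be compared against that of bi-allelic genes.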
This folder contains scripts for the ribosome binding protein analysis performed with transite.
We assigned reading frame 0, 1, or 2 to each ribosome protected footprint (RPF). We also corrected these assignments using the nucleotide content at the 3' end of the RPFs. This folder contains the tables of these frames and the Python notebook used to carry out the analysis.
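The basic idea of frame assignment can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebook: `reading_frame` and `frame_distribution` are hypothetical names, coordinates are assumed to be 0-based transcript positions, and the correction based on 3' nucleotide content is not reproduced here.

```python
from collections import Counter
from typing import Iterable


def reading_frame(read_start: int, cds_start: int, offset: int = 0) -> int:
    """Return the frame (0, 1 or 2) of a footprint relative to the CDS start.

    `offset` is an optional, assumed shift from the read's 5' end to the
    position of interest (for example a P-site offset).
    """
    # Python's % returns a non-negative result for a positive modulus,
    # so positions upstream of the CDS start still map to 0, 1 or 2.
    return (read_start + offset - cds_start) % 3


def frame_distribution(
    read_starts: Iterable[int], cds_start: int, offset: int = 0
) -> Counter:
    """Count how many footprints fall into each reading frame."""
    return Counter(reading_frame(s, cds_start, offset) for s in read_starts)
```

For example, `frame_distribution([100, 103, 104, 108], cds_start=100)` counts two footprints in frame 0 and one each in frames 1 and 2.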
R scripts used for the differential ribosome occupancy, reproducibility, and other analyses and figures are provided in this folder. Note that additional packages (for example, EnhancedVolcano and Seurat) are needed to run these R scripts. The paths to the .ribo files and count files are given relative to the scripts, so users might need to adjust them to match their working directory or file organization.
These scripts rely heavily on the number of reads on the CDS of the transcripts. For RNA-Seq samples, these counts are given in a CSV file. For ribosome profiling samples, the counts are extracted from the .ribo files. For convenience, we also combined the ribosome profiling and RNA-Seq counts into one file.
In a separate repository, we provide the reference files for the mouse and human transcriptomes used in this study. These files are used for running RiboFlow (for ribosome profiling experiments) and our RNA-Seq pipelines.
In the mouse transcriptome, we masked the nucleotides overlapping the SNPs with Ns.
In the human and mouse rtRNA filters, we also added some extra sequences for non-coding RNAs.
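The SNP masking described above can be illustrated with a short Python sketch. This is not the script used to build the reference: `mask_snps` is a hypothetical helper, and positions are assumed to be 0-based (VCF positions are 1-based, so they would need to be shifted by one first).

```python
from typing import Iterable


def mask_snps(sequence: str, snp_positions: Iterable[int]) -> str:
    """Replace the nucleotide at each SNP position with 'N'.

    Positions are assumed to be 0-based indices into `sequence`.
    """
    masked = list(sequence)
    for pos in snp_positions:
        if not 0 <= pos < len(masked):
            raise IndexError(f"SNP position {pos} is outside the sequence")
        masked[pos] = "N"
    return "".join(masked)
```

Masking the SNP positions means that reads from either allele see the same mismatch at those sites, which reduces mapping bias toward the reference allele.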