Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression
*Parker C. Wilson, *Yoshiharu Muto, Haojia Wu, Anil Karihaloo, Sushrut S. Waikar, Benjamin D. Humphreys
*These authors contributed equally
If you use any of the code or workflows in this repository please cite our manuscript in Nature Communications link
Wilson PC, Muto Y, Wu H, Karihaloo A, Waikar SS, Humphreys BD.
Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression
Nat Commun 13, 5253 (2022).
https://doi.org/10.1038/s41467-022-32972-z
PMID: 36068241
The code associated with this publication has been deposited in Zenodo
The bioRxiv preprint can be found here
Single cell sequencing data generated for this manuscript (snRNA-seq: 1 control, 2 DKD; snATAC-seq: 1 control, 7 DKD) and cellranger v4.0 / cellranger-atac v2.0 gene and peak count matrices for all analyzed samples (snRNA-seq: 6 control, 5 DKD; snATAC-seq: 6 control, 7 DKD) can be found in GEO.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE195460
CUT&RUN and bulk ATAC data generated for this manuscript can be found in a second repository:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE195443
Sequencing data generated for previously-published kidney snATAC (5 control) and snRNA libraries (5 control, 3 DKD) can be found at the links below:
Note that there are two different sequencing configurations for the snATAC libraries. Samples 1-3 have an I1 index file, whereas samples 4-5 do not. The I1 index files are not needed to run cellranger-atac count on samples 4-5. Index sequences used for demultplexing are printed in the header lines of R1 and R2. R1 and R3 contain the paired-end reads and R2 contains the cell barcode information.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE151302
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131882
https://data.humancellatlas.org/explore/projects/2af52a13-65cb-4973-b513-39be38f2df3f
snATAC barcodes and snRNA barcodes used for the analysis can be found in this github repository
Finalized R objects can be visualized interactively or downloaded from the cellxgene website. The snATAC object only includes the gene activity "RNA" assay and does not have a raw or normalized peak count matrix.
Welcome to our GitHub repository!
Here you will find analysis scripts for our manuscript where we compare control kidney to diabetic kidney disease (DKD) samples using paired snRNAseq and snATACseq. Please contact the corresponding author, Dr. Benjamin Humphreys, with questions or comments.
Thanks,
Parker and Yoshi
Visit the Wilson lab website:
www.parkerwilsonlab.com
Visit the Humphreys lab website:
www.humphreyslab.com
Check out our interactive datasets with Kidney Interactive mulTiomics (KIT):
http://humphreyslab.com/SingleCell/
Find us on Twitter:
@parkercwilson
@YoshiharuMuto
@HumphreysLab
Find us on Docker Hub:
p4rkerw@dockerhub
snRNA preprocessing and quality control workflow
-
Align and count each snRNA library (cellranger/cellranger_rna_count.sh) Libraries were generated from a nuclear dissociation and aligned to refdata-gex-GRCh38-2020-A, which can be downloaded from the 10X genomics website: https://support.10xgenomics.com/.
-
Aggregate the snRNA libraries using the cellranger/rna_aggr.csv file (cellranger/cellranger_rna_aggr.sh)
-
Run standard Seurat QC on the aggregated snRNA data and eliminate doublet barcodes identified by DoubletFinder (snRNA_prep/step1_prep.R)
-
Make final cell annotations and generate the final snRNA object (snRNA_prep/step2_anno.R) - The snRNA_prep/step2_anno.rds object is considered the final snRNA object for downstream analysis.
-
(optional) Correct for ambient RNA contamination (snRNA_prep/step3_soupx.R) - This step was not included in the final workflow because we had a very low ambient RNA contamination.
snATAC preprocessing and quality control workflow
-
Align and count each ATAC library (cellranger/cellranger_atac_count.sh)
Libraries were generated from a nuclear dissociation and aligned to refdata-cellranger-arc-A-2.0.0 which can be downloaded from the 10X genomics website: https://support.10xgenomics.com/. -
Aggregate the snATAC libraries using the cellranger/atac_aggr.csv file (cellranger/cellranger_atac_aggr.sh)
-
Run AMULET on individual snATAC libraries to identify doublet barcodes (snATAC_prep/step0_amulet.sh)
-
Run standard Signac QC on the aggregated snATAC data, eliminate doublet barcodes identified by AMULET, and transfer labels from the final snRNA object to predict cell types. (snATAC_prep/step1_prep.R)
-
Filter barcodes using the predicted.id thresholds from Seurat label transfer and call cell-specific peaks using the Signac MACS2 wrapper (snATAC_prep/step2_call_m2peaks.R)
-
Create a new peak-cell matrix with the MACS2 peaks using the Signac FeatureMatrix function (snATAC_prep/step3_count_m2peaks.R)
-
Perform additional filtering based on the fragments of reads in MACS2 peaks and clustering of snATAC cell types (snATAC_prep/step4_anno_m2peaks.R). The annotations in this step are considered the final barcode annotations for the downstream analysis.
-
Run chromVAR on the final snATAC object (snATAC_prep/step5_chromVAR.R)
-
Identify cis-coaccessibility networks in the final snATAC object (snATAC_prep/step6_ccan.R)
snRNA and snATAC standard analysis workflow
-
Find cell-specific genes and differentially expressed genes (DEG) in DKD in the snRNA dataset (analysis/find_deg.R)
-
Find cell-specific peaks and differentially accessible chromatin (DAR) in DKD in the snATAC dataset (analysis/find_dar.R)
find_atac_dar.R -
Find cell-specific motifs and differential chromVAR activities in DKD in the snATAC dataset (analysis/find_motifs.R)
-
Identify peak-gene links using the Signac LinkPeaks function (analysis/find_gene_enh.R)
-
Perform footprinting analysis for NR3C1, NR3C2, REL, and HNF4A (analysis/find_footprints.R)
-
Annotate DAR using ChIPseeker and the Fantom promoter / enhancer database (analysis/dar_enh_anno.R)
-
Visualize GR interactions with cell-specific DAR using circlize (analysis/circlize_ccans_gre.R)
Allele-specific workflow
-
Use 🌶️SALSA to genotype, phase, correct for mapping bias, and generate single cell allele-specific counts for all libraries with both snRNA and snATAC (6 control, 5 DKD) - (allele_specific/run_salsa.sh)
-
Generate an allele-specific count matrix for each predicted peak-gene combination (allele_specific/analyze_allele_counts.R)
-
Model the allele-specific count matrix using a glmer with lme4 (allele_specific/model_allele_counts.R)
Bulk ATAC workflow
0-7. Align and deduplicate bulk ATAC before calling MACS2 bulk ATAC peaks (bulk_atac/*)
CUT&RUN workflow
- Align fastq with bowtie2 and call peaks with MACS2 (cut_and_run/*)
Bulk RNA-seq workflow
-
Download publicly-available bulk RNA-seq for human DKD (bulk_rnaseq/sra_download_fastq.sh)
-
Count the fastq with salmon (bulk_rnaseq/salmon_count.sh)
-
Analyze the counts with DESeq2 (bulk_rnaseq/find_bulk_degs.R)
Partitioned heritability workflow
-
Generate bed files using the cell-specific peaks, cell-specific DAR that change in DKD, and allele-specific effect peaks (ldsc/step0_prep_bed.R)
-
Munge publicly-available GWAS statistics into ldsc format (ldsc/step1_munge.sh)
-
Generate custom annotations and ld scores for each of the bed files (ldsc/step2a_annoscore.sh, ldsc/step2b_annoscore.sh)
-
Partition heritability using the cell-type-specific workflow and estimate proportion of heritability for each annotation (ldsc/step3_partition.sh)
Transcribed cis-regulatory element workflow with SCAFE
-
Generate tCRE counts for each snRNA library (scafe/step1_scafe.sh)
-
Pool the tCRE into consensus tCRE (scafe/step2_scafe_pool.sh)
-
Process the aggregated tCRE and identify differential tCRE between conditions (scafe/step3_scafe_prep4.R)
Methylation workflow
- Download publicly-available differentially methylated regions and intersect with DAR (methylation/methylation_comparison.R)
Figures
Scripts for generating figures in the manuscript.