-
Notifications
You must be signed in to change notification settings - Fork 0
Home
Thank you for your interest!! Here, I will briefly describe my analysis in the paper "Contribution of CRISPRable DNA on human complex traits" where we investigated the 21 Cas enriched genomic regions' contribution to human complex traits and diseases.
Analysis of the human genome was based on the GRCh37 assembly, which can be found here. To simplify the process, we considered 22 autosomal chromosomes end to end as one single piece with a total of 2,881,033,286 bases. We cut the entire piece into 20,000 segments, resulting in about 144,052 bases per segment. We counted the number of PAM sequences in each segment, where the reverse strand was also considered. For example, when counting the number of the NGG sequence, we counted the number of both 5’-NGG-3’ and 5’-CCN-3’.
Briefly, running fecth.py
can get the position of the PAM sequence.
Taking GG for example,
python fecth.py human_g1k_v37_bk.fasta GG > GG_hg19_pos.tsv
can get GG positions into GG_hg19_pos.tsv file.
Considering the reverse strand, we also run
python fecth.py human_g1k_v37_bk.fasta CC > CC_hg19_pos.tsv
to get reversing GG positions.
Annotation of Cas enriched regions is based on the number of individual PAM within each segment. For Cas with more than one PAM sequence, we selected the top 2,000 segments that have the highest sum of all its PAMs, denoting Cas enriched regions. These regions were saved into the 'Top10.bed' file.
To investigate the magnitude of these Cas-enriched regions' contribution to human complex traits, we applied stratified linkage disequilibrium (LD) score regression (S-LDSC)\cite{Brendan2015LDSC, Finucane2015} to partition the heritability of each human complex trait. Run the following codes in the LDSC environment to do the heritability enrichment analysis, you can modify it to analyze other Cas/PAM easily.
bash make_annot_gzip_Cas.sh
bash compute_l2_Cas.sh
bash Partition_heritability_Cas.sh
The functional annotations for the autosomes are available here, thanks to the efforts of the Alkes group from the Broad Institute. In our analysis, we test the enrichment of SNPs annotated for our Cas on the regulatory elements using Fisher's exact test. See Cas_SNP_baseline_OR.R
.
For the X chromosome, we kept the segments in the same length (144 kb) as we did for the autosomes, resulting in 1070 segments in total. Top10% of the 1070 segments were selected as the Cas enriched genomic regions on the X chromosome. Same as the autosomes, SNPs present in the top10% regions were annotated as 1 and other SNPs were annotated as 0 for each Cas. We then queried 125,497 Hapmap3 SNPs on https://www.snp-nexus.org/v4/ for four gene annotations from UCSC, eight epigenetic markers from Roadmap, and five regulator elements from Ensembl. The odds ratio was also obtained from Fisher's exact test.