Heritability enrichment analysis
The analysis of the human genome was based on the GRCh37 assembly, which can be found here. To simplify the process, we treated the 22 autosomes, joined end to end, as a single sequence of 2,881,033,286 bases. We cut this sequence into 20,000 segments of about 144,052 bases each. We then counted the number of PAM sequences in each segment, considering both strands. For example, to count the NGG sequence, we counted both 5’-NGG-3’ and 5’-CCN-3’.
Briefly, running fecth.py reports the positions of a given PAM sequence in the genome.
Taking GG as an example,
python fecth.py human_g1k_v37_bk.fasta GG > GG_hg19_pos.tsv
writes the GG positions to the GG_hg19_pos.tsv file.
To account for the reverse strand, we also run
python fecth.py human_g1k_v37_bk.fasta CC > CC_hg19_pos.tsv
to get the positions of GG on the reverse strand.
Cas-enriched regions are annotated from the number of PAMs within each segment. For a Cas with more than one PAM sequence, we summed the counts of all its PAMs in each segment and selected the 2,000 segments with the highest sums (the top 10% of the 20,000 segments) as the Cas-enriched regions. These regions were saved in the 'Top10.bed' file.
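The selection step can be sketched as below. This is an illustration under assumptions: `top_segments` and the example PAM names are hypothetical, per-segment counts are assumed to be held in equal-length lists, and ties are broken by segment order. Converting the selected segment indices back to chromosome coordinates for a BED file is not shown.

```python
def top_segments(counts_by_pam, top_k):
    """Return the indices of the top_k segments ranked by summed PAM count."""
    n_segments = len(next(iter(counts_by_pam.values())))
    # Sum the counts of every PAM of this Cas within each segment.
    totals = [
        sum(counts[i] for counts in counts_by_pam.values())
        for i in range(n_segments)
    ]
    # Stable sort: segments with equal totals keep their genomic order.
    ranked = sorted(range(n_segments), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy example with two PAMs over four segments (made-up counts).
example = {"NGG": [1, 5, 3, 0], "NGA": [2, 0, 4, 1]}
print(top_segments(example, top_k=2))
```

For the analysis described here, `top_k` would be 2,000 over 20,000 segments.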
To investigate how much these Cas-enriched regions contribute to human complex traits, we applied stratified linkage disequilibrium (LD) score regression (S-LDSC)\cite{Brendan2015LDSC, Finucane2015} to partition the heritability of each human complex trait. Run the following commands in the LDSC environment to perform the heritability enrichment analysis; they can easily be modified to analyze other Cas proteins or PAMs.
bash make_annot_gzip_Cas.sh
bash compute_l2_Cas.sh
bash Partition_heritability_Cas.sh
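The contents of these three scripts are not shown here. As a rough guide to what the first two steps typically involve in S-LDSC, the sketch below builds (but does not run) per-chromosome command lines using the `make_annot.py` and `ldsc.py --l2` options from the public LDSC tutorials. The helper name `ldsc_commands` and all file prefixes are hypothetical, and the actual scripts may use different options, for example an additional `--print-snps` list.

```python
def ldsc_commands(bed_file, plink_prefix, annot_prefix, chromosomes=range(1, 23)):
    """Build per-chromosome annotation and LD-score command lines (not executed)."""
    commands = []
    for chrom in chromosomes:
        bfile = f"{plink_prefix}.{chrom}"            # hypothetical PLINK prefix
        annot = f"{annot_prefix}.{chrom}.annot.gz"   # hypothetical output name
        # Step 1: annotate SNPs by whether they fall inside the BED regions.
        commands.append([
            "python", "make_annot.py",
            "--bed-file", bed_file,
            "--bimfile", f"{bfile}.bim",
            "--annot-file", annot,
        ])
        # Step 2: compute LD scores for that annotation.
        commands.append([
            "python", "ldsc.py", "--l2",
            "--bfile", bfile,
            "--ld-wind-cm", "1",
            "--annot", annot,
            "--thin-annot",
            "--out", f"{annot_prefix}.{chrom}",
        ])
    return commands


for command in ldsc_commands("Top10.bed", "reference_panel", "Cas", chromosomes=[22]):
    print(" ".join(command))
```

Each returned list could be passed to `subprocess.run` once the reference panel paths are set to real files.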