Updating gene and gene_alias tables
The cBioPortal scripts package provides a simple script to update your local gene and gene_alias tables based on a new version of the NCBI genes file.
Follow these steps if you want to reset your database to the most recent gene list from NCBI.
Steps:
1- Remove all studies from your installation. You can use the study removal tool for this.
2- (if the DB engine supports FK constraints, e.g. InnoDB) Drop constraints:
ALTER TABLE cosmic_mutation
DROP FOREIGN KEY cosmic_mutation_ibfk_1;
ALTER TABLE sanger_cancer_census
DROP FOREIGN KEY sanger_cancer_census_ibfk_1;
ALTER TABLE uniprot_id_mapping
DROP FOREIGN KEY uniprot_id_mapping_ibfk_1;
3- Empty the gene and gene_alias tables:
TRUNCATE TABLE gene_alias;
TRUNCATE TABLE gene;
4- Restart cBioPortal (restart the webserver) to clear any cached gene lists.
5- Import gene data again (see section below)
6- Check the gene and gene_alias tables to verify that they are filled correctly.
7- Clean up old data:
DELETE FROM cosmic_mutation where ENTREZ_GENE_ID not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM sanger_cancer_census where ENTREZ_GENE_ID not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM uniprot_id_mapping where ENTREZ_GENE_ID not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM interaction where GENE_A not in (SELECT ENTREZ_GENE_ID from gene) or GENE_B not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM drug_interaction where target not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM mutation_event where ENTREZ_GENE_ID not in (SELECT ENTREZ_GENE_ID from gene);
DELETE FROM cna_event where ENTREZ_GENE_ID not in (SELECT ENTREZ_GENE_ID from gene);
8- (if the DB engine supports FK constraints, e.g. InnoDB) Restore constraints:
ALTER TABLE cosmic_mutation
ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);
ALTER TABLE sanger_cancer_census
ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);
ALTER TABLE uniprot_id_mapping
ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);
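The clean-up in step 7 relies on a single pattern: delete every row whose ENTREZ_GENE_ID no longer exists in the refreshed gene table, so that the foreign keys in step 8 can be restored without errors. The pattern can be tried out on a throwaway database before running it against a real installation. The following is a minimal sketch under stated assumptions: it uses Python's built-in SQLite in-memory database as a stand-in for MySQL, the tables are reduced to a single column, the data is made up, and the helper name delete_orphans is hypothetical (it is not part of cBioPortal).

```python
import sqlite3

# Tables from step 7 that reference gene through one ENTREZ_GENE_ID column.
CHILD_TABLES = ["cosmic_mutation", "sanger_cancer_census", "uniprot_id_mapping"]


def delete_orphans(conn, table):
    """Delete rows of `table` whose ENTREZ_GENE_ID is missing from gene.

    Returns the number of deleted rows. `table` must come from a trusted
    list such as CHILD_TABLES, because it is interpolated into the SQL.
    """
    cur = conn.execute(
        f"DELETE FROM {table} "
        "WHERE ENTREZ_GENE_ID NOT IN (SELECT ENTREZ_GENE_ID FROM gene)"
    )
    return cur.rowcount


# Assumption: an in-memory SQLite database replaces the real MySQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (ENTREZ_GENE_ID INTEGER PRIMARY KEY)")
for table in CHILD_TABLES:
    conn.execute(f"CREATE TABLE {table} (ENTREZ_GENE_ID INTEGER)")

# Toy data: the refreshed gene table keeps genes 1 and 2; gene 999 is gone.
conn.executemany("INSERT INTO gene VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO cosmic_mutation VALUES (?)", [(1,), (999,)])
conn.executemany("INSERT INTO uniprot_id_mapping VALUES (?)", [(2,), (999,), (999,)])

# Number of orphaned rows removed per table.
removed = {table: delete_orphans(conn, table) for table in CHILD_TABLES}
print(removed)
```

Running the same SELECT-based condition as a COUNT(*) first, instead of a DELETE, is a cautious way to see how many rows would be removed from each table.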
TODO - harder process (will probably not be needed once re-importing existing studies is made easy - which should be the case soon)
To run the script, type the following commands from the folder <your_cbioportal_dir>/core/src/main/scripts:
export PORTAL_HOME=<your_cbioportal_dir>
and then
./importGenes.pl <ncbi_genes.txt>
If you also wish to add gene lengths to your gene table, download the annotation file gencode.v24.annotation.gtf for GRCh38. After downloading, go to your downloads directory and run the following command:
grep -v ^# gencode.v24.annotation.gtf | perl -ne 'chomp; @c=split(/\t/); $c[0]=~s/^chr//; $c[3]--; $c[8]=~s/.*gene_name\s\"([^"]+)\".*/$1/; print join("\t",@c[0,3,4,8,5,6])."\n" if($c[2] eq "CDS" or $c[2] eq "exon")' > all_exon_loci.bed
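For readers less familiar with Perl, the one-liner above can be read as the following Python sketch. It skips comment lines, keeps only CDS and exon features, strips the chr prefix from the chromosome name, shifts the 1-based GTF start to the 0-based start that BED uses, and replaces the attribute column with the gene name. The function name gtf_line_to_bed and the sample line are illustrative assumptions, not part of cBioPortal, and the sketch is meant as an explanation of the command rather than a replacement for it.

```python
import re

# Extracts the value of the gene_name attribute from GTF column 9.
GENE_NAME_RE = re.compile(r'gene_name\s"([^"]+)"')


def gtf_line_to_bed(line):
    """Convert one GTF line to a BED-style row, or return None if it is skipped."""
    if line.startswith("#"):
        return None  # same effect as `grep -v ^#`
    cols = line.rstrip("\n").split("\t")
    # Assumption: malformed lines with fewer than 9 columns are skipped here.
    if len(cols) < 9 or cols[2] not in ("CDS", "exon"):
        return None
    chrom = cols[0][3:] if cols[0].startswith("chr") else cols[0]
    start = int(cols[3]) - 1  # GTF is 1-based, BED is 0-based
    match = GENE_NAME_RE.search(cols[8])
    # Like the Perl version, leave the column unchanged if gene_name is absent.
    name = match.group(1) if match else cols[8]
    # Output order: chromosome, start, end, gene name, score, strand.
    return "\t".join([chrom, str(start), cols[4], name, cols[5], cols[6]])


# Illustrative line in the shape of a GENCODE GTF row (values are examples).
SAMPLE = "\t".join([
    "chr1", "HAVANA", "exon", "11869", "12227", ".", "+", ".",
    'gene_id "ENSG00000223972.5"; gene_type "pseudogene"; gene_name "DDX11L1";',
])

print(gtf_line_to_bed(SAMPLE))
```

Each output row therefore holds six tab-separated fields, which is the layout written to all_exon_loci.bed.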
TODO: add documentation about running importGenes.pl with ncbi_genes.txt and all_exon_loci.bed
./importGenes.pl Homo_sapiens_gene_info.txt