Updating gene and gene_alias tables

pieterlukasse edited this page Mar 24, 2016 · 20 revisions

The cBioPortal scripts package provides a simple script to update your local gene and gene_alias tables based on a new version of the NCBI genes file.

Cleaning up DB (in case of new installation)

Follow these steps if you want to reset your DB to the most recent gene list from NCBI.

Steps:

1- Remove all studies from your installation. You can use the study removal tool.

2- (if the DB engine supports FK constraints, e.g. InnoDB) Drop constraints:

ALTER TABLE cosmic_mutation
  DROP FOREIGN KEY cosmic_mutation_ibfk_1;
  
ALTER TABLE sanger_cancer_census
  DROP FOREIGN KEY sanger_cancer_census_ibfk_1;
    
ALTER TABLE uniprot_id_mapping
  DROP FOREIGN KEY uniprot_id_mapping_ibfk_1;

3- Empty the gene and gene_alias tables:

TRUNCATE TABLE gene_alias;
TRUNCATE TABLE gene;

4- Import the gene data again (see "Running the script" below).

5- Clean up old data:

DELETE FROM cosmic_mutation WHERE ENTREZ_GENE_ID NOT IN (SELECT ENTREZ_GENE_ID FROM gene);
DELETE FROM sanger_cancer_census WHERE ENTREZ_GENE_ID NOT IN (SELECT ENTREZ_GENE_ID FROM gene);
DELETE FROM uniprot_id_mapping WHERE ENTREZ_GENE_ID NOT IN (SELECT ENTREZ_GENE_ID FROM gene);

6- (if the DB engine supports FK constraints, e.g. InnoDB) Restore constraints:

ALTER TABLE cosmic_mutation
  ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);
  
ALTER TABLE sanger_cancer_census
  ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);

ALTER TABLE uniprot_id_mapping
  ADD FOREIGN KEY (`ENTREZ_GENE_ID`) REFERENCES `gene` (`ENTREZ_GENE_ID`);
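
The steps above assume specific constraint names such as cosmic_mutation_ibfk_1. Names of this form are typically auto-generated by MySQL/InnoDB and may differ in your schema, so it can help to look them up before step 2, and to count the rows that step 5 would delete before running it. The following is a minimal sketch, not part of the cBioPortal scripts package: the function names `fk_lookup_sql` and `orphan_count_sql`, the placeholder schema name `cbioportal`, and the `mysql` client invocation in the usage comment are all assumptions. The functions only print SQL; nothing is executed against the database until you pipe the output to a client yourself.

```shell
#!/bin/sh
# Tables that reference gene.ENTREZ_GENE_ID in the steps above.
TABLES="cosmic_mutation sanger_cancer_census uniprot_id_mapping"

# Print a query listing the actual foreign key names that point at `gene`.
# $1: database (schema) name; "cbioportal" is only a placeholder default.
fk_lookup_sql() {
  schema="${1:-cbioportal}"
  printf "SELECT TABLE_NAME, CONSTRAINT_NAME FROM information_schema.KEY_COLUMN_USAGE\n"
  printf "  WHERE TABLE_SCHEMA = '%s' AND REFERENCED_TABLE_NAME = 'gene';\n" "$schema"
}

# Print one COUNT query per table showing how many rows step 5 would delete.
orphan_count_sql() {
  for t in $TABLES; do
    printf "SELECT '%s' AS table_name, COUNT(*) AS orphan_rows FROM %s\n" "$t" "$t"
    printf "  WHERE ENTREZ_GENE_ID NOT IN (SELECT ENTREZ_GENE_ID FROM gene);\n"
  done
}

# Usage (assumed client invocation; adjust user and database names):
#   fk_lookup_sql my_db | mysql -u my_user -p
#   orphan_count_sql    | mysql -u my_user -p my_db
```

If the lookup returns constraint names other than the ones used in step 2, substitute the returned names in the DROP FOREIGN KEY statements.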

Updating gene table without removing the existing studies

TODO - harder process

Running the script

To run the script, type the following commands from the folder <your_cbioportal_dir>/core/src/main/scripts:

export PORTAL_HOME=<your_cbioportal_dir>

and then

./importGenes.pl <ncbi_genes.txt>

Example:

./importGenes.pl Homo_sapiens_gene_info.txt
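
Because the import depends on PORTAL_HOME and on the genes file being present, a small pre-flight check can catch setup mistakes before the script touches the database. This is a sketch under stated assumptions: `check_import_inputs` is a hypothetical helper, not something shipped with cBioPortal, and it only verifies that the variable is set and that the file is readable and non-empty. It does not validate the file's contents.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the environment looks ready for
# importGenes.pl, 1 otherwise.
# $1: path to the NCBI genes file.
check_import_inputs() {
  genes_file="$1"
  if [ -z "${PORTAL_HOME:-}" ]; then
    echo "PORTAL_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$PORTAL_HOME" ]; then
    echo "PORTAL_HOME is not a directory: $PORTAL_HOME" >&2
    return 1
  fi
  if [ ! -r "$genes_file" ] || [ ! -s "$genes_file" ]; then
    echo "genes file missing, unreadable, or empty: $genes_file" >&2
    return 1
  fi
  return 0
}

# Usage, from <your_cbioportal_dir>/core/src/main/scripts:
#   check_import_inputs Homo_sapiens_gene_info.txt && ./importGenes.pl Homo_sapiens_gene_info.txt
```

Chaining with `&&` means the import only starts when every check passes.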