Importer Tool
This page describes how to use the cBioPortal importer tool. It assumes that the cBioPortal software has been properly configured and that a cBioPortal database exists.
The importer tool is a command-line Java application. The examples below assume that cbioportal-importer.jar and the JDBC driver jar are present in the current directory.
A list of commands that are recognized by the importer tool can be found by running the following command:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool
The following commands are recognized by the portal importer tool:
The validate command (-v) validates cancer study data. Cancer study data should conform to the file formats described on the File Formats wiki page. Validation is restricted to the metadata files. This command takes a directory argument that specifies where the cancer study data resides. The directory may contain subdirectories of cancer study data. The following arguments are supported:
- directory: A directory which contains cancer study data or subdirectories of cancer study data.
Example usage:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool -v /path/to/cancerstudy
The -i command imports cancer study data. Cancer study data should conform to the file formats described on the File Formats wiki page. Like the validate command, this command takes a directory argument that specifies where the cancer study data resides. It traverses all subdirectories of the given directory looking for cancer studies to import. Each cancer study it finds is validated and then imported. Depending on the settings of the skip and force options, the tool may prompt the user before replacing an existing cancer study. The following arguments are supported:
- directory: A directory which contains cancer study data or subdirectories of cancer study data.
- skip (optional): If 't', cancer studies traversed that already reside in the database will be skipped, i.e., not replaced.
- force (optional): If 't', cancer studies traversed that already reside in the database will be replaced without user intervention.
Example usage:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool -i /path/to/cancerstudy:t:f
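The colon-separated argument in the example above can be assembled from shell variables. The following is a minimal sketch, assuming the directory:skip:force order shown in that example. The function name build_import_arg is hypothetical, and treating 'f' as the value for a disabled option is an assumption taken from the example rather than from a documented default.

```shell
# Hypothetical helper (not part of the importer tool): joins the directory,
# skip, and force values into the colon-separated argument used with -i.
build_import_arg() {
    dir=$1
    skip=${2:-f}     # assumption: 'f' leaves the option disabled, as in the example
    force=${3:-f}
    printf '%s:%s:%s\n' "$dir" "$skip" "$force"
}

# Print the full command instead of running it, so it can be reviewed first.
arg=$(build_import_arg /path/to/cancerstudy t f)
printf '%s -i %s\n' '$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool' "$arg"
```

Printing the command first is a deliberate choice here: with force set to 't' an existing study is replaced without a prompt, so it can be worth checking the argument before running it.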
The -a command enriches a mutation data file (MAF) with annotations from the mutationassessor.org and Oncotator services. This command takes an input filename and an optional output filename as arguments. If the output filename is not given, the output is written to a file named after the input file with an ".annotated" suffix. The following arguments are supported:
- maf: The input MAF filename to be annotated.
- output (optional): The output MAF filename.
Example usage:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool -a ./MAF_example.txt:annotated-MAF.txt
*** Note: this command will not work properly unless canonical paths are used. For example, a tilde '~' within a file path will cause problems. Additionally, this command only works if you have downloaded and properly installed our annotation database. See the Database Setup wiki page for more information. ***
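One way to avoid tilde and relative-path problems is to resolve each filename to an absolute path before building the argument. The sketch below is an illustration only: canonical_path is a hypothetical helper, it resolves symbolic links in the directory part but not in the filename itself, and it requires the directory to exist.

```shell
# Hypothetical helper: print an absolute path for a file, with symbolic
# links in the directory part resolved. The directory must already exist.
canonical_path() {
    dir=$(cd "$(dirname "$1")" >/dev/null && pwd -P) || return 1
    case $dir in /) dir= ;; esac    # avoid a double slash for files directly under /
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

maf=$(canonical_path ./MAF_example.txt)
out=$(canonical_path ./annotated-MAF.txt)
printf '%s\n' "-a $maf:$out"    # the option and argument to pass to the importer tool
```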
Given copy number data and expression data that conform to the file formats described on the File Formats wiki page, the -n command generates a file of normalized (z-score) expression values.
Each gene is normalized separately. First, the expression distribution for unaltered copies of the gene is estimated by calculating the mean and variance of the expression values for samples in which the gene is diploid (as reported by the copy number data). We call this the unaltered distribution.
If the gene has no diploid samples, then its normalized expression is reported as NA. Otherwise, for every sample, the gene's normalized expression is reported as
(r - mu)/sigma
where r is the raw expression value, and mu and sigma are the mean and standard deviation of the unaltered distribution, respectively. The following arguments are supported:
- cna-file: The copy number input filename.
- expression-file: The expression input filename.
- output-file: The output filename.
- normal-sample-suffix: A suffix that identifies normal sample IDs.
Example usage:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool -n ./data_CNA.txt:./data_RNA_Seq_v2_expression_median.txt:./z-score.txt:"-11"
where "-11" is the suffix used by normal sample TCGA barcodes.
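The normalization described above can be illustrated for a single gene with a short awk sketch. This is not the importer tool's implementation. The helper name zscore_gene is hypothetical, and the sketch makes three assumptions: a copy number call of 0 marks a diploid sample, sigma is the population standard deviation (dividing by the number of diploid samples), and a gene whose diploid samples all share one value is reported as NA to avoid dividing by zero. It ignores the normal-sample-suffix argument.

```shell
# Hypothetical helper (not part of the importer tool): z-scores for one gene.
#   $1 = space-separated raw expression values, one per sample
#   $2 = space-separated copy number calls for the same samples
# Assumptions: a call of 0 marks a diploid sample, and sigma is the
# population standard deviation of the diploid samples.
zscore_gene() {
    awk -v vals="$1" -v calls="$2" 'BEGIN {
        n = split(vals, r, " ")
        split(calls, c, " ")
        k = 0; sum = 0
        for (i = 1; i <= n; i++) if (c[i] == "0") { k++; sum += r[i] }
        if (k == 0) { print "NA"; exit }        # no diploid samples
        mu = sum / k
        ss = 0
        for (i = 1; i <= n; i++) if (c[i] == "0") ss += (r[i] - mu) ^ 2
        sigma = sqrt(ss / k)
        if (sigma == 0) { print "NA"; exit }    # sketch only: avoid dividing by zero
        for (i = 1; i <= n; i++)
            printf "%s%.2f", (i > 1 ? " " : ""), (r[i] - mu) / sigma
        print ""
    }'
}

# Samples 1 and 2 are diploid, so mu = 2 and sigma = 1.
zscore_gene "1 3 5" "0 0 1"    # prints: -1.00 1.00 3.00
```

Note that the unaltered distribution is estimated from the diploid samples only, while the z-score is reported for every sample, including the non-diploid third sample in this example.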
The -d command removes the given cancer study from the database. The following arguments are supported:
- cancer_study_id: The cancer_study_identifier assigned to the given cancer study.
Example usage:
#!shell
$JAVA_HOME/bin/java -cp "*" org.mskcc.cbio.importer.PortalImporterTool -d brca_joneslab_2013