Using the dataset validator

pieterlukasse edited this page Jan 18, 2016 · 21 revisions

To facilitate the loading of new studies into its database, cBioPortal provides a set of staging file formats for the various data types. To validate your files, you can use the dataset validator script. This document describes how to install and use the validator script.

Installation

If you have a git clone of cBioPortal, the validation script validateData.py can be found in the folder: <your_cbioportal_dir>/core/src/main/scripts/import_data_validator

To update your scripts, run:

git pull

Dependencies

The script runs in Python 2. If you want the script to generate HTML reports (the recommended way to read the validation errors, if any), you will also need to install jinja2 and markupsafe. You can use these commands:

sudo pip2 install jinja2

and (does not always seem necessary, since pip normally installs markupsafe as a dependency of jinja2):

sudo pip2 install markupsafe
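Before running the validator with the -html option, a quick check along these lines can confirm that the report dependencies are importable. This is only a sketch and not part of cBioPortal: the helper name missing_modules is hypothetical, and it assumes the packages are importable under the module names jinja2 and markupsafe. Run it with the same Python interpreter that runs validateData.py.

```python
from __future__ import print_function


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the HTML report dependencies.
    absent = missing_modules(["jinja2", "markupsafe"])
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("HTML report dependencies are installed.")
```

If a module is reported as missing, install it with the pip2 commands above and run the check again.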

Running the validation

To run the validator, type the following from the folder <your_cbioportal_dir>/core/src/main/scripts/import_data_validator:

./validateData.py

This will tell you the parameters you can use:

./validateData.py
usage: validateData.py [-h] -s STUDY_DIRECTORY [-u URL_SERVER]
                       [-html HTML_TABLE] [-v]
validateData.py: error: argument -s/--study_directory is required

-s : point this to the folder where your data files are found
-u : (optional) URL to your local cBioPortal server. When not provided, the default is: http://localhost/cbioportal
-html : (optional) name of the HTML report file to be generated. Needs the dependencies mentioned above to be installed first
-v : (optional) verbose, print out all messages. By default, this option is not set.
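If you call the validator from another script, the options above can be assembled programmatically. The following is a minimal sketch, not part of cBioPortal: the function names build_validator_command and run_validator and their parameter names are hypothetical, and only the flags shown in the usage message (-s, -u, -html, -v) are used. The meaning of the validator's exit status is not described on this page, so the sketch simply returns it unchanged.

```python
import subprocess


def build_validator_command(study_dir, server_url=None, html_report=None,
                            verbose=False):
    """Build the argument list for validateData.py from the documented options."""
    command = ["./validateData.py", "-s", study_dir]
    if server_url is not None:
        # Optional: URL of the local cBioPortal server.
        command += ["-u", server_url]
    if html_report is not None:
        # Optional: name of the HTML report file to generate.
        command += ["-html", html_report]
    if verbose:
        command.append("-v")
    return command


def run_validator(script_dir, **options):
    """Run the validator from its own folder and return its exit status."""
    command = build_validator_command(**options)
    return subprocess.call(command, cwd=script_dir)
```

For instance, build_validator_command("my_study/", verbose=True) yields the list ["./validateData.py", "-s", "my_study/", "-v"]; here my_study/ is a placeholder for your own study folder.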

Example 1

As an example, you can try it out with one of the test studies found in <your_cbioportal_dir>/core/src/test/scripts/test_data. For example, using the -v option to also see the progress:

./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -u http://localhost:8080/cbioportal -v

Results in:

INFO: -: Requesting genes from portal at 'http://localhost:8080/cbioportal'
INFO: -: Requesting cancertypes from portal at 'http://localhost:8080/cbioportal'
INFO: -: Requesting clinicalattributes/patients from portal at 'http://localhost:8080/cbioportal'
INFO: -: Requesting clinicalattributes/samples from portal at 'http://localhost:8080/cbioportal'
INFO: data_clinical2.txt: Starting validation of file
INFO: data_clinical2.txt: Validation of file complete
INFO: data_methylation_hm27.txt: Starting validation of file
INFO: data_methylation_hm27.txt: Validation of file complete
INFO: data_expression_median.txt: Starting validation of file
INFO: data_expression_median.txt: Validation of file complete
INFO: brca_tcga_pub.maf: Starting validation of file
INFO: brca_tcga_pub.maf: Validation of file complete
INFO: data_CNA.txt: Starting validation of file
INFO: data_CNA.txt: Validation of file complete
INFO: data_log2CNA.txt: Starting validation of file
INFO: data_log2CNA.txt: Validation of file complete
INFO: -: Validating case lists
INFO: -: Validation of case lists complete
INFO: -: Validation complete
Validation of study succeeded.
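The verbose output above can also be summarized by a small script, for example to list which files finished validation. The sketch below is an illustration only: it assumes every message line has the form "LEVEL: source: message" as in the sample output, with "-" as the source for messages that do not belong to a file. The function names parse_log_line and completed_files are hypothetical, and the message strings are copied from the example rather than from any documented format.

```python
from collections import OrderedDict


def parse_log_line(line):
    """Split 'LEVEL: source: message' into its three parts, or return None."""
    parts = line.strip().split(": ", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def completed_files(lines):
    """Map each file name to True once its 'complete' message has been seen."""
    status = OrderedDict()
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue  # e.g. the final summary line
        level, source, message = parsed
        if source == "-":
            continue  # message is not tied to a specific file
        if message == "Starting validation of file":
            status[source] = False
        elif message == "Validation of file complete":
            status[source] = True
    return status
```

Files that map to False started validation but never logged a completion message in the captured output.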

When using the -html option, you will get a report that looks like this in the case above:

Example 2

Also try the other test studies (study_es_1 and study_es_3) in <your_cbioportal_dir>/core/src/test/scripts/test_data. For example, using the -v option:

./validateData.py -s ../../../test/scripts/test_data/study_es_1/ -u http://localhost:8080/cbioportal -v

Result: