Docker container for retrieving phenotypic data from the Brassica Information Portal (BIP). It runs a Ruby script that uses the BIP-API to retrieve
the accession name, Sequence ID, and trait measurements, and stores the output in .csv format.
The resulting header is as follows:
< <Trait>,Seq_id,<Trait_name1>,<Trait_name2>,<Trait_nameN> >
It is based on the TASSEL-5 Phenotype Format (version 4).
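As an illustration of how a file with this header could be read downstream, here is a minimal Python sketch. It assumes that the first column holds the accession name, the second the Sequence ID, and every remaining column one trait. The sample rows, trait names, and the `read_phenotypes` helper are invented for the example and are not part of the container.

```python
import csv
import io

# Hypothetical sample in the layout described above; accession names,
# Sequence IDs, trait names, and values are invented for illustration.
SAMPLE = """<Trait>,Seq_id,plant_height,seed_weight
Acc_A,SEQ001,102.5,4.1
Acc_B,SEQ002,98.0,3.7
"""


def read_phenotypes(handle):
    """Return (trait_names, rows) from a CSV laid out as described above."""
    reader = csv.reader(handle)
    header = next(reader)
    # Assumption: column 0 is the accession, column 1 the Sequence ID,
    # and all remaining columns are traits.
    trait_names = header[2:]
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        accession, seq_id, *values = record
        rows.append({
            "accession": accession,
            "seq_id": seq_id,
            "traits": dict(zip(trait_names, values)),  # values stay strings
        })
    return trait_names, rows


trait_names, rows = read_phenotypes(io.StringIO(SAMPLE))
print(trait_names)
print(rows[0])
```

To read a real output file, pass an open file handle instead of the `io.StringIO` wrapper.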
Sequence IDs are written to a Seq_names.txt file for use in further queries. The script also creates a Sequence_IDs_log.txt, in which the user can see which accessions had no Sequence ID or more than one.
If an accession has no Sequence ID, it is skipped but recorded in the log. If an accession has multiple Sequence IDs, the first one is added to the list in Seq_names.txt and is used in subsequent downloads.
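The selection rule above can be sketched in a few lines. This is a Python illustration of the described behaviour, not the Ruby script shipped in the container. The function name, the input mapping, and the wording of the log lines are assumptions made for the example.

```python
def select_sequence_ids(accessions):
    """Apply the rule described above.

    `accessions` maps an accession name to its list of Sequence IDs.
    Returns (selected, log_lines).
    """
    selected = {}
    log_lines = []
    for accession, seq_ids in accessions.items():
        if not seq_ids:
            # No Sequence ID: skip the accession, but record it in the log.
            log_lines.append(f"{accession}: no Sequence ID, accession skipped")
            continue
        if len(seq_ids) > 1:
            # Several Sequence IDs: keep the first one and note the choice.
            log_lines.append(
                f"{accession}: {len(seq_ids)} Sequence IDs, using {seq_ids[0]}"
            )
        selected[accession] = seq_ids[0]
    return selected, log_lines


# Invented example data.
example = {
    "Acc_A": ["SEQ001"],
    "Acc_B": [],
    "Acc_C": ["SEQ010", "SEQ011"],
}
selected, log_lines = select_sequence_ids(example)
print(selected)
print("\n".join(log_lines))
```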
Please renew your BIP-API-key after your download for security reasons.
Example command
(Note that you need a BIP user account to obtain the API key.)
docker run -v '/absolute/path/to/where/output/files/should/be/stored/':/tmp CyVerseUK/retrieve_bip_phenotypes <BIP_trial_name> <your_BIP_API_key>
This Docker image is used together with the AGAVE API and CyVerseUK, so that the output can be integrated into further CyVerse workflows.
In the Discovery Environment, (1) select “Apps”. (2) Search for “Retrieve-Brassica-Phenotypes” in the search bar and click on the app. (3) If you want a different output folder, change it now. Then (4) click on Parameters and (5) enter the BIP trial name, exactly as it is registered in BIP, and your BIP-API-key. (6) Click Launch Analysis. The steps are also shown in Fig. 1 below.
You don't need to pull this image; Condor will do this in the background for you. You need a CyVerse account and a BIP account. Installing the cyverse-sdk client is optional, but makes querying easier. You must also have created a RunApp.json containing:
{
"name" : "Retrieve_BIP_Phenotypes",
"appId" : "Retrieve_BIP_Phenotypes-0.0.0",
"archive" : "true",
"parameter": {
"param_1" : "<the_BIP_trial_name_to_be_queried>",
"param_2" : "<your_BIP_API_key>"
}
}
Then, after creating an up-to-date AGAVE API token, run
jobs-submit -W -F RunApp.json
Optional: you can specify an output location other than the default CyVerseUK-Storage system by adding the following line to the RunApp.json:
"archiveSystem": "data.iplantcollaborative.org",
Switching to this system makes the output available to further tools and workflows in CyVerse US and the Discovery Environment, which is currently not directly connected to the CyVerseUK system. This is likely to change in the future, after which no DE-specific archiveSystem setting will be needed in the RunApp.json.
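The job file described above, including the optional archiveSystem entry, can also be generated programmatically, which avoids hand-editing mistakes such as unbalanced quotes. The following is a minimal Python sketch; the `build_job` helper is hypothetical, and the field names and values are copied unchanged from the RunApp.json shown in this README.

```python
import json


def build_job(trial_name, api_key, archive_system=None):
    """Build the job description as a dict (hypothetical helper).

    Field names and values mirror the RunApp.json shown in this README.
    """
    job = {
        "name": "Retrieve_BIP_Phenotypes",
        "appId": "Retrieve_BIP_Phenotypes-0.0.0",
        "archive": "true",
        "parameter": {
            "param_1": trial_name,
            "param_2": api_key,
        },
    }
    if archive_system is not None:
        # Optional: archive to a system other than the default one.
        job["archiveSystem"] = archive_system
    return job


job = build_job(
    "<the_BIP_trial_name_to_be_queried>",
    "<your_BIP_API_key>",
    archive_system="data.iplantcollaborative.org",
)
text = json.dumps(job, indent=2)
json.loads(text)  # round-trip check: raises if the text is not valid JSON
print(text)
```

Redirecting the printed output to a file named RunApp.json gives a file that can be passed to the job submission command.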
Note: This app currently runs with default parameters. For big jobs, you need to allocate more memory, which you do by adding further attributes to the job submission JSON. All attributes are listed in Table 1, which is taken from the AGAVE API developer website ([job-submissions](http://developer.agaveapi.co/#job-submission)).
Name | Value(s) | Description |
---|---|---|
name | string | Descriptive name of the job. This will be slugified and used as one component of directory names in certain situations. |
appId | string | The unique name of the application being run by this job. This must be a valid application that the calling user has permission to run. |
batchQueue | string | The batch queue on the execution system to which this job is submitted. Defaults to the app’s defaultQueue property if specified. Otherwise a best-fit algorithm is used to match the job parameters to a queue on the execution system with sufficient capabilities to run the job. |
nodeCount | integer | The number of nodes to use when running this job. Defaults to the app’s defaultNodes property or 1 if no default is specified. |
processorsPerNode | integer | The number of processors this application should utilize while running. Defaults to the app’s defaultProcessorsPerNode property or 1 if no default is specified. If the application is not of executionType PARALLEL, this should be 1. |
memoryPerNode | string | The maximum amount of memory needed per node for this application to run given in ####.#[E\|P\|T\|G]B format. Defaults to the app’s defaultMemoryPerNode property if it exists. GB are assumed if no magnitude is specified. |
maxRunTime | string | The estimated compute time needed for this application to complete given in hh:mm:ss format. This value must be less than or equal to the max run time of the queue to which this job is assigned. |
notifications* | JSON array | An array of one or more JSON objects describing an event and url which the service will POST to when the given event occurs. For more on notifications, see the webhooks section of the AGAVE API documentation. |
archive* | boolean | Whether the output from this job should be archived. If true, all new files created by this application’s execution will be archived to the archivePath in the user’s default storage system. |
archiveSystem* | string | System to which the job output should be archived. Defaults to the user’s default storage system if not specified. |
archivePath* | string | Location where the job output should be archived. A relative path or absolute path may be specified. If not specified, a unique folder will be created in the user’s home directory of the archiveSystem at ‘archive/jobs/job-$JOB_ID’ |
Table 1. The optional and required attributes common to all job submissions. Optional fields are marked with an asterisk.
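As a sketch of how resource attributes from Table 1 might be added to a job description before submission, the Python snippet below checks memoryPerNode and maxRunTime against the formats quoted in the table. The `add_resources` helper and the example values (8GB, 04:00:00) are illustrative assumptions, and the regular expressions reflect one reading of the table; the service itself may accept or reject other spellings.

```python
import re

# Patterns follow the formats quoted in Table 1. How strictly the service
# parses these values is an assumption.
MEMORY_RE = re.compile(r"\d+(\.\d+)?([EPTG]B)?")   # e.g. "8", "2.5GB", "1TB"
RUNTIME_RE = re.compile(r"\d{2,}:[0-5]\d:[0-5]\d")  # hh:mm:ss


def add_resources(job, memory_per_node, max_run_time):
    """Return a copy of `job` with memoryPerNode and maxRunTime set."""
    if not MEMORY_RE.fullmatch(memory_per_node):
        raise ValueError(f"memoryPerNode not in ####.#[E|P|T|G]B format: {memory_per_node!r}")
    if not RUNTIME_RE.fullmatch(max_run_time):
        raise ValueError(f"maxRunTime not in hh:mm:ss format: {max_run_time!r}")
    updated = dict(job)  # leave the caller's dict untouched
    updated["memoryPerNode"] = memory_per_node
    updated["maxRunTime"] = max_run_time
    return updated


# Illustrative values only; choose limits that suit the queue being used.
base_job = {"name": "Retrieve_BIP_Phenotypes", "appId": "Retrieve_BIP_Phenotypes-0.0.0"}
big_job = add_resources(base_job, "8GB", "04:00:00")
print(big_job)
```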