Atlas metadata handling

This repository was factored out of code previously present in the internal atlas-prod repository. It provides functionality for handling Atlas metadata. Some of the scripts are unused legacy code and will be pruned in time.

Install

This software has some complex Perl dependencies, which are most easily managed using Conda. Miniconda is a good way of getting set up with a basic Conda installation. We recommend you use a fresh environment:

conda create --name atlas-metadata

Activate the environment to use it:

source activate atlas-metadata

It will help if you have your Conda set up to use channels as per Bioconda:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Installation should then be straightforward:

conda install -c ebi-gene-expression-group atlas-experiment-metadata

Commands

condense_sdrf.pl

A 'condensed' SDRF is a 'melted' version of the starting SDRF file, with one row for each combination of assay, variable type (factor, characteristic) and variable. It is produced from an SDRF like this:

condense_sdrf.pl -e <experiment accession> -fi -o <output directory>
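To make the 'melted' layout concrete, here is a minimal sketch of the idea using awk on a tiny tab-separated table. It is an illustration only, not how condense_sdrf.pl is implemented: the function name melt_sdrf, the sample values and the four-column output (assay, type, name, value) are assumptions made for this example, and real condensed SDRF files may carry additional columns.

```shell
# Hypothetical helper: melt a simplified wide table into one row per
# assay/variable combination. Column 1 is the assay name; the other
# columns use SDRF-style headers such as "Factor Value[genotype]".
melt_sdrf() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Split each header into a variable type and a variable name.
      for (i = 2; i <= NF; i++) {
        lb = index($i, "[")
        if (substr($i, 1, lb - 1) == "Factor Value") {
          vtype[i] = "factor"
        } else {
          vtype[i] = "characteristic"
        }
        vname[i] = substr($i, lb + 1, length($i) - lb - 1)
      }
      next
    }
    {
      # Emit one output row per variable for this assay.
      for (i = 2; i <= NF; i++) print $1, vtype[i], vname[i], $i
    }
  '
}

# Made-up input: two assays, one characteristic and one factor.
printf 'Assay Name\tCharacteristics[organism]\tFactor Value[genotype]\nassay1\tMus musculus\twild type\nassay2\tMus musculus\tknockout\n' | melt_sdrf
```

Each input row with two variables becomes two output rows, which is the sense in which the condensed file is 'long' rather than 'wide'.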

The condense_sdrf.pl script will also use Zooma to add ontology terms.

By default, this looks for SDRF files under a path defined by the ATLAS_PROD environment variable, but you can also specify an IDF file, from which the SDRF location will be determined:

condense_sdrf.pl -e <experiment accession> -fi <path to IDF file> -o <output directory>
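Because the default mode relies on ATLAS_PROD being set, it can help to check the variable before running the script. The following is a small sketch of such a pre-flight check. The function name check_env_dir is hypothetical, and it assumes the variable should point at an existing directory, which is not confirmed here.

```shell
# Hypothetical pre-flight check: succeed only if the named environment
# variable is set and points at an existing directory.
check_env_dir() {
  var_name=$1
  # Look up the value of the variable whose name was passed in.
  eval "var_value=\${$var_name:-}"
  if [ -z "$var_value" ]; then
    echo "$var_name is not set" >&2
    return 1
  fi
  if [ ! -d "$var_value" ]; then
    echo "$var_name does not point to a directory: $var_value" >&2
    return 1
  fi
}

# Report whether the default mode is likely to work in this shell.
if check_env_dir ATLAS_PROD; then
  echo "ATLAS_PROD looks usable"
else
  echo "Consider passing an IDF file explicitly instead"
fi
```

The same check could be applied to any other variable that a script reads its default location from.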

If you wish to use the Zooma mapping functionality, you will also need to supply a Zooma exclusions file:

condense_sdrf.pl -e <experiment accession> -fi <path to IDF file> -o <output directory> -z -x <zooma exclusions file>

single_cell_condensed_sdrf.sh

This script is a wrapper for condense_sdrf.pl which deals with some single-cell-specific issues around technical replication and the handling of droplet experiments (where cell != library).

Again, this script can be run in two modes. Default behaviour is to pull the SDRF location from a directory defined by ATLAS_SC_EXPERIMENTS:

bash single_cell_condensed_sdrf.sh -e <experiment ID> -o <output dir> -z <zooma exclusions file>

... but you can also pass an IDF file directly:

single_cell_condensed_sdrf.sh -e <experiment accession> -f <path to IDF file> -o <output dir> -z <zooma exclusions file>

Note that this wrapper requests Zooma mappings by default (for which you will have to supply the exclusions), but you can disable the behaviour with the '-s' argument.

See inline help for information on available options:

single_cell_condensed_sdrf.sh -h

unmelt_condensed.R

Sometimes we want to 'unmelt' the condensed SDRF, returning it to a wide format, for example for use in downstream analysis. This is what unmelt_condensed.R does:

unmelt_condensed.R -i <condensed SDRF file> -o <output file path> --retain-types --has-ontology

It is important that the options you provide match the way in which the condensed SDRF was generated in the first place:

--retain-types: SDRF files have different types of field, for example factors and characteristics. We can retain these annotations in the wide format, and we probably should (because it's possible to have factors and characteristics with the same name!).
--has-ontology: if you ran condense_sdrf.pl or single_cell_condensed_sdrf.sh with Zooma mapping enabled, you need to use this option (off by default).
--has-biotypes: if you ran condense_sdrf.pl with the -b option, you need to set this flag (off by default)
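The name clash that --retain-types guards against can be shown with a small awk sketch. This is not the implementation of unmelt_condensed.R: the function unmelt, the four-column input (assay, type, name, value), the "type:name" column labels and the sample values are all assumptions made for illustration.

```shell
# Hypothetical helper: pivot long rows (assay, type, name, value) to wide.
# Pass "yes" to include the variable type in each column label.
unmelt() {
  awk -F'\t' -v OFS='\t' -v keep="$1" '
    {
      if (keep == "yes") key = $2 ":" $3
      else key = $3
      # Remember columns and rows in order of first appearance.
      if (!(key in seen_col)) { seen_col[key] = 1; cols[++ncol] = key }
      if (!($1 in seen_row)) { seen_row[$1] = 1; rows[++nrow] = $1 }
      # A repeated key for the same assay overwrites the earlier value.
      cell[$1, key] = $4
    }
    END {
      line = "assay"
      for (c = 1; c <= ncol; c++) line = line OFS cols[c]
      print line
      for (r = 1; r <= nrow; r++) {
        line = rows[r]
        for (c = 1; c <= ncol; c++) line = line OFS cell[rows[r], cols[c]]
        print line
      }
    }
  '
}

# Made-up input: a characteristic and a factor that share the name "age".
long_rows=$(printf 'a1\tcharacteristic\tage\t8 weeks\na1\tfactor\tage\tadult')

echo "With types retained:"
printf '%s\n' "$long_rows" | unmelt yes
echo "Without types:"
printf '%s\n' "$long_rows" | unmelt no
```

With types retained, the two "age" variables land in separate columns. Without them they collapse into a single column and one value is silently lost, which is why retaining types is the safer choice.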

See inline help for information on available options:

unmelt_condensed.R --help
