Genomic linked data converter. Interchange GFF, GVF, VCF files via JSON.

BioInterchange/BioInterchangeC

Table of Contents

  1. Usage
    1. Installation
    2. Abbreviations
    3. GFF3, GVF, VCF to JSON (Converting to a unified data representation.)
    4. JSON to GFF3, GVF, VCF (Converting to tool-specific data representations.)
    5. Python API
    6. MongoDB
    7. RethinkDB (legacy support)
  2. Data Model
    1. Overview
      1. Context Objects
      2. Meta Objects
      3. Feature Objects
      4. Summary Objects
  3. Python API Reference Cards
  4. JSON Reference Cards

Usage

BioInterchange is a command line tool and you will need to run it in a "terminal" or "shell". On OS X, it can be run via the default Terminal application. On Linux, well, you use Linux, so you know what a terminal is.

Running BioInterchange will cause the software to perform a quick system check before anything else happens. If there are incompatibility problems, then these will be shown to you and the software will exit. If you are making use of the Python interface and your Python settings are wrong, then a cryptic Python error might occur. In that case, check your PYTHONHOME and PYTHONPATH environment variables and have a look at examples in the Python API section.

Note: The current version of the tool and its source code are licensed under the short, simple, and permissive MIT License. If your version of BioInterchange complains about a missing license file, then please update to version 2.0.5 or later.

Installation

Work in progress. BioInterchange is currently submitted to Homebrew-bio; Docker images and apt packages are being rebuilt.

Homebrew Bio (Linux & OS X):

Note: This assumes that you have Homebrew installed. Please consult https://brew.sh/ for their installation instructions.

brew install brewsci/bio/biointerchange

If the brew installation fails, but you are set on installing via brew, then please file an issue with the Homebrew Bio team.

OS X Installer:

Download this disk-image file and double-click it when the download is finished. You can then drag-and-drop the executable biointerchange to your system's folder for executables, or anywhere else your command line can find it. See the PATH variable in the documentation for your command line (a.k.a. terminal or shell).

Debian/Linux from source:

apt update
apt -y install build-essential git cmake libffi-dev libssl-dev libpython3.9-dev libcurl4-openssl-dev curl file
git clone https://github.com/BioInterchange/BioInterchangeC.git
cd BioInterchangeC
./clean.sh
./build-release.sh

OS X from source:

Note: Requires command-line tools (Xcode) installed. CMake paths currently rely on brew package installations, but you could also compile the dependencies yourself (as it was done for previous BioInterchange versions).

git clone https://github.com/BioInterchange/BioInterchangeC.git
cd BioInterchangeC
brew install cmake
brew install googletest
brew install curl
brew install gettext
brew install libiconv
brew install openssl@1.1
brew install python@3.9
./clean.sh
./build-release.sh

Abbreviations

  • GFF3: Generic Feature Format Version 3
  • GVF: Genome Variation Format
  • JSON: JavaScript Object Notation
  • JSON-LD: JSON Linked Data
  • LDJ/LDJSON: Line Delimited JSON
  • VCF: Variant Call Format

GFF3, GVF, VCF to JSON (Converting to a unified data representation.)

Converting genomic file formats to a series of JSON objects:

biointerchange -o example.ldj example.vcf

Here, the genomics file "example.vcf" is the input. The output of BioInterchange will be written to the file "example.ldj". The "-o" parameter can also be omitted, in which case the output is written directly to the console:

biointerchange example.vcf

Note: It is not possible to "pipe" data to BioInterchange, for performance reasons. BioInterchange always expects to read from a file. The file can be a symlink though.
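
Because every line of the output is one complete JSON object, the result can be read with any JSON library. The following is a minimal Python sketch for doing so; it is not part of BioInterchange, and the helper name `read_ldj` is made up for this illustration.

```python
import json

def read_ldj(lines):
    """Yield one Python dict per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Usage with a file written via "-o example.ldj":
#
#     with open("example.ldj") as ldj:
#         for obj in read_ldj(ldj):
#             print(obj.get("@context"))
```

Reading line by line keeps memory use flat, which matters for large genomics files.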

The optional "-u" parameter can be used to add a custom user annotation to the context object. This can be useful when including BioInterchange in a genomics analysis pipeline:

biointerchange -u "CNV Analysis; Smith Lab" -o cnv.ldj cnv.gvf
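
When BioInterchange is one step of a scripted pipeline, the command line above can be assembled programmatically. The sketch below uses only the "-u" and "-o" parameters described in this section; the function names `biointerchange_command` and `convert` are hypothetical helpers, and it is an assumption that the executable is named "biointerchange" and is on your PATH.

```python
import subprocess

def biointerchange_command(input_file, output_file=None, annotation=None,
                           executable="biointerchange"):
    """Build the argument list for one BioInterchange invocation."""
    command = [executable]
    if annotation is not None:
        command += ["-u", annotation]
    if output_file is not None:
        command += ["-o", output_file]
    # The input is always a file name; piping data in is not supported.
    command.append(input_file)
    return command

def convert(input_file, output_file, annotation=None):
    """Run the conversion; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(biointerchange_command(input_file, output_file, annotation),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems with annotations that contain spaces or semicolons, such as "CNV Analysis; Smith Lab".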

BioInterchange prints its semantic version number with the "-v" parameter:

biointerchange -v

To show the EULA that came with the software, the "-e" parameter comes in handy:

biointerchange -e

A brief help text is shown with the "-h" parameter:

biointerchange -h

JSON to GFF3, GVF, VCF (Converting to tool-specific data representations.)

Converting JSON objects back to their original genomic file format:

biointerchange -o example.vcf example.ldj

Note: Modified JSON objects (for example, when using the Python API) will only translate back correctly when non-standard key/value pairs are put under the “user-defined” key.
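
As an illustration of the note above, a Python callback can keep its additions under the "user-defined" key instead of adding new top-level keys. This is a sketch only: it assumes that feature objects accept a top-level "user-defined" key in the same way the context, meta, and sample objects in this document do, and the key name "my-calculated-length" is an arbitrary example.

```python
def process_feature(feature):
    # "user-defined" may be absent or null, so fall back to an empty dict.
    user_defined = feature.get('user-defined') or {}

    # Store the custom value under "user-defined" rather than at the top level.
    locus = feature['locus']
    user_defined['my-calculated-length'] = abs(locus['end'] - locus['start']) + 1

    feature['user-defined'] = user_defined
    return feature
```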

Python API

Genomics data can be accessed and processed via the Python API directly through BioInterchange. Each JSON object will be passed on to a Python function and changes made by the Python code will be preserved.

For example:

One of BioInterchange's unit-tests calculates the accumulated lengths of all genomic features in a given file. The Python file simple.py (see below) in the module simplepy can be called like this:

PYTHONPATH="`pwd`/test-data" ./biointerchange -p simplepy.simple examples/chromosome_BF.gff

This assumes that you have a correct PYTHONHOME set as well. For using BioInterchange as part of a bigger project, it is recommended to add the necessary paths in PYTHONPATH to your shell initialization (e.g., .bashrc, .zshrc).

Just to look at the last LD-JSON line, use this (your JSON pretty-printer might be called something other than json_pp):

PYTHONPATH="`pwd`/test-data" ./biointerchange -p simplepy.simple examples/chromosome_BF.gff | tail -n 1 | json_pp

Accumulated output (see accumulated-length):

{
   "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
   "statistics" : {
      "features" : 432,
      "features-filtered" : 430,
      "meta-lines" : 3,
      "meta-lines-filtered" : false,
      "comment-lines" : 1
   },
   "@type" : "https://www.codamono.com/gfvo-squared#Summary",
   "runtime" : {
      "invocation" : "Wed Mar 10 15:01:28 2021",
      "finish" : "Wed Mar 10 15:01:28 2021",
      "lapsed-seconds" : "0"
   },
   "my-user-data" : {
      "accumulated-length" : 115528
   }
}

Compatibility: Requires Python 3.9.1 or newer. For older versions of Python you might have to recompile BioInterchange from source.

Python simple.py

Directory structure:

  • simplepy/__init__.py
  • simplepy/simple.py

Environment variables:

  • PYTHONHOME needs to be set to where Python is installed on your system
  • PYTHONPATH needs to include "./test-data" for this example, so that "simple.py" can be found; the code is in test-data because the code is part of the Google Test unit testing for BioInterchange.

“simple.py” source code:

# This variable will be used to accumulate the
# length (in basepairs) of all features that are
# not filtered out in `process_feature` below.
#
# Use this or a similar approach to share data
# across features.
__accumulated_length__ = 0

def setup(context, meta):
    # The context is always being output, but meta
    # could be omitted. How? It is quite simple:
    # returning (a possibly modified) meta Dict
    # will be output, but returning None will
    # output no meta information.
    return meta

def cleanup(summary):
    # Add the accumulated length of all features
    # to the summary:
    summary['my-user-data'] = { 'accumulated-length' : __accumulated_length__ }

    # The Dict `summary` needs to be returned in order
    # to appear in the output. If no summary should appear
    # in the output, then None needs to be returned.
    return summary

def process_feature(feature):
    global __accumulated_length__

    locus = feature['locus']

    length = abs(locus['end'] - locus['start']) + 1

    # If we want to filter out some features (remove them
    # from the output completely), then we can do so by
    # returning None.
    if length < 10000:
        return None

    __accumulated_length__ += length

    feature['my-calculated-length'] = length

    if 'comment' in feature:
        del feature['comment']

    return feature

MongoDB

Download and convert cat features:

wget ftp://ftp.ensembl.org/pub/release-81/variation/vcf/felis_catus/Felis_catus_incl_consequences.vcf.gz
gunzip Felis_catus_incl_consequences.vcf.gz
biointerchange -o Felis_catus_incl_consequences.ldj Felis_catus_incl_consequences.vcf

Import line-delimited JSON-LD documents into MongoDB:

mongoimport --db genomics --collection felis_catus --type json --file Felis_catus_incl_consequences.ldj

Check that the data is actually in the database:

mongo genomics
> db.felis_catus.find()
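
Since mongoimport keeps the nested structure of each JSON object, nested keys can be queried with MongoDB's dot notation. The sketch below builds a filter for features that overlap a genomic region. It assumes that the imported feature objects carry a "locus" object with "landmark", "start", and "end" keys, as in the feature examples of this document; the helper name `region_filter` and the landmark "A1" are illustrative only.

```python
def region_filter(landmark, start, end):
    """Build a MongoDB filter for features overlapping [start, end] on a landmark."""
    return {
        "locus.landmark": landmark,
        # Two intervals overlap when each one starts before the other ends.
        "locus.start": {"$lte": end},
        "locus.end": {"$gte": start},
    }

# With the third-party pymongo package and a local MongoDB server, the filter
# could be used like this:
#
#     from pymongo import MongoClient
#     features = MongoClient()["genomics"]["felis_catus"]
#     for feature in features.find(region_filter("A1", 1, 100000)):
#         print(feature.get("id"))
```

Context, meta, and summary objects are imported into the same collection, but they have no "locus" key and are therefore not matched by this filter.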

RethinkDB (legacy support)

Note: RethinkDB did not make it as a company, but their database is still available as an open-source project. Adopting it now would probably be a bad choice, but if you work with legacy code, then this is how you load/query features in RethinkDB!

Download and convert cat features:

wget ftp://ftp.ensembl.org/pub/release-81/variation/vcf/felis_catus/Felis_catus_incl_consequences.vcf.gz
gunzip Felis_catus_incl_consequences.vcf.gz
biointerchange -o Felis_catus_incl_consequences.ldj Felis_catus_incl_consequences.vcf

Import line-delimited JSON-LD documents into RethinkDB:

rethinkdb import -f Felis_catus_incl_consequences.ldj --table genomics.felis_catus

Check that the data is actually in the database using RethinkDB's "Data Explorer" (for local installation at http://localhost:8080/#dataexplorer):

r.db('genomics').table('felis_catus')

Data Model

Overview

Four kinds of JSON objects are created with each execution of BioInterchange -- in this particular order:

  1. a "context object"
  2. zero, one, or more "meta objects" (pragma and information lines)
  3. a series of "feature objects" (the actual genomics feature-data)
  4. a "summary object"

Relationships between objects:

  • context and summary objects stand on their own; they describe how the data was created, how much data was processed, and how long it took
  • meta objects themselves are also independent, but they can be referenced (linked to) by feature objects
  • feature objects can reference meta objects as well as other feature objects

In a nutshell: meta objects and feature objects contain the genomics data from GFF3, GVF, and VCF files; these objects are referencing each other (they are linked). Context and summary objects stand on their own.

All objects contain a "@context" key and a "@type" key, which are called context key and type key in the following.

The context key turns the JSON objects into JSON-LD objects. Never heard of JSON-LD? Just skip to the next part and ignore the "@context" key. This is what makes JSON-LD so great: you can handle JSON-LD objects just like JSON objects.

If you do want to make use of the context key, then feed the JSON-LD objects to a Linked Data tool and it will annotate key/value pairs with type information. JSON-LD objects can also be turned into Triple Store compatible data formats, such as RDF N-Triples, RDF N-Quads, and RDF Turtle. Want to see a real-life example of this magic? Head over to the JSON-LD Playground and copy/paste any JSON-LD object into the "JSON-LD Input" text field: the "N-Quads" tab will instantly show a triples representation of the JSON-LD object that can be loaded into a triple store.

Note: All JSON-LD annotations are making use of GFVO2. Is it legit? Well, it is based on all the lessons learned when designing the Genomic Feature and Variation Ontology (GFVO). So, there is a team behind GFVO2 too? Yes, stay tuned…
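
When reading BioInterchange output programmatically, it is often useful to tell the four kinds of objects apart. The sketch below does this by looking at the "@context" URL. It relies on a pattern that is only inferred from the examples in this document (URLs ending in "-c1.json" for context, "-x1.json" for meta, "-f1.json" for feature, and "-s1.json" for summary objects), so treat it as an assumption rather than a documented guarantee; the function name `object_kind` is hypothetical.

```python
import re

# Suffix letters as they appear in the "@context" URLs of this document's examples.
KINDS = {"c": "context", "x": "meta", "f": "feature", "s": "summary"}

def object_kind(obj):
    """Return "context", "meta", "feature", "summary", or None if unrecognized."""
    context = obj.get("@context")
    if not isinstance(context, str):
        return None
    match = re.search(r"-([cxfs])\d+\.json$", context)
    return KINDS[match.group(1)] if match else None
```

Alternatively, the output order listed above can be used: the first object is the context object and the last one is the summary object.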

Context Objects

Context objects tell you something about the environment in which the following JSON objects ("meta objects", "feature objects" and "summary objects") were created.

In essence, a context object contains information about:

  • which version of BioInterchange was used
  • the input filename
  • the filetype of the input (GFF3, GVF, VCF)
  • what the output filename was (unless output went to the console)
  • whether the Python API was utilized (if so, name of the Python module)
  • additional user-defined parameters

Example:

{
    "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
    "biointerchange-version" : "2.0.0+36",
    "input-filetype" : "VCF",
    "output-file" : null,
    "input-file" : "test-data/playground-vcf.vcf",
    "python-callback" : null,
    "user-defined" : null
}

Detailed information about each key/value-pair can be found in the JSON Reference Card section.

Meta Objects

One meta object is created for each genomic data file. The meta object contains the data of information/pragma lines.

Meta objects are helpful for determining data provenance, establishing links to ontologies that were used, and providing extra annotations that were not attached to features to reduce data redundancy.

The following example provides textual descriptions for analytic filters that have been applied in a VCF file. Instead of adding this description to every genomic feature for which the filter applies, the textual description is only given once in this meta object, where it can be looked up via the filter keys ("MinAB", "MinDP", "MinMQ", and "Qual").

Example:

{
    "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
    "vcf-version" : "4.2",
    "reference" : "ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa",
    "contig" : {
        "1" : {
            "length" : 195471971
        },
        "3" : {
            "length" : 918278173
        }
    },
    "filter" : {
        "MinAB" : {
            "description" : "Minimum number of alternate bases (INFO/DP4) [5]"
        },
        "MinDP" : {
            "description" : "Minimum read depth (INFO/DP or INFO/DP4) [5]"
        },
        "MinMQ" : {
            "description" : "Minimum RMS mapping quality for SNPs (INFO/MQ) [20]"
        },
        "Qual" : {
            "description" : "Minimum value of the QUAL field [10]"
        }
    },
    "user-defined" : {
        "samtoolsVersion" : [
            "0.1.18-r572"
        ]
    }
}
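
The lookup described above can be sketched in a few lines of Python. The sketch assumes that the filter keys attached to a feature are available as a list of strings (the reference cards describe an "annotations" key that carries "FILTER" information for VCF) and that the descriptions sit under the "filter" key of the meta object, as in the example. The function name `filter_descriptions` is hypothetical.

```python
def filter_descriptions(meta, annotations, section="filter"):
    """Map each filter key to its description from the meta object (None if unknown)."""
    definitions = meta.get(section) or {}
    return {
        key: definitions.get(key, {}).get("description")
        for key in annotations
    }
```

For instance, calling `filter_descriptions(meta, ["MinDP"])` with the meta object above yields the "Minimum read depth" text for "MinDP".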

Feature Objects

Sequences, sequence variations, sequence annotations, genotyping samples, etc., are represented by feature objects.

Basic Feature Information

The most basic information about features includes an identifier, a genomic locus, provenance ("SGRP" stands for the Saccharomyces Genome Resequencing Project), feature type information ("SNV" -- single nucleotide variant; a Sequence Ontology term), and references to external databases (SGRP, again, and European Molecular Biology Laboratory accession).

Example:

{
    "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
    "id" : "76",
    "locus" : {
        "landmark" : "Chr1",
        "start" : 675,
        "end" : 675,
        "strand" : "+"
    },
    "source" : "SGRP",
    "type" : "SNV",
    "dbxref" : [
        "SGRP:s01-675",
        "EMBL:AA816246"
    ]
}

Variant and Reference Sequences

Information about reference sequences is stored under the "reference" key. Variations are stored under the "variants" key, and allele-specific information is labeled by "B", "C", etc. ("A" denotes the reference, which is not explicitly labeled).

Example: GVF (basic information omitted)

{
    "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
    "reference" : {
        "sequence" : "A",
        "codon" : "GAG"
    },
    "variants" : {
        "B" : {
            "sequence" : "G",
            "codon" : "GAG"
        },
        "C" : {
            "codon" : "GGG",
            "sequence" : "T"
        }
    }
}
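
The allele labeling scheme can be made explicit with a small helper that collects the sequence for each label, using "A" for the reference. This is a sketch based only on the keys shown in this section; the function name `allele_sequences` is hypothetical.

```python
def allele_sequences(feature):
    """Return a dict from allele label ("A", "B", ...) to its sequence."""
    sequences = {}

    # "A" is implicit: it always refers to the reference sequence.
    reference = feature.get("reference") or {}
    if "sequence" in reference:
        sequences["A"] = reference["sequence"]

    # "B", "C", etc. are the explicitly labeled variants.
    for label, variant in (feature.get("variants") or {}).items():
        if "sequence" in variant:
            sequences[label] = variant["sequence"]

    return sequences
```

For the GVF example above, the helper returns "A" for allele A, "G" for allele B, and "T" for allele C.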

Example: VCF (basic information omitted)

{
    "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
    "reference" : {
        "sequence" : "G"
    },
    "variants" : {
        "B" : {
            "sequence" : "GA",
            "allele-count" : 36
        }
    }
}

Genomic/Genotyping Samples

VCF genomic files contain information about samples. Reference, variant, and other information is repeated for each sample.

Example: VCF (basic and non-sample specific information omitted)

{
    "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
    "samples" : [
        {
            "id" : "129P2",
            "variants" : {
                "B" : {
                    "allele-count-expected" : 4
                },
                "C" : {
                    "allele-count-expected" : 5
                }
            },
            "genotype-quality" : 68,
            "genotype" : {
                "phased" : false,
                "alleles" : "BB",
                "sequences" : [
                    "G",
                    "G"
                ]
            },
            "AA" : {
                "genotype-likelihood-phred-scaled" : 53
            },
            "AB" : {
                "genotype-likelihood-phred-scaled" : 6
            },
            "BB" : {
                "genotype-likelihood-phred-scaled" : 0
            },
            "user-defined" : {
                "SP" : 0,
                "FI" : 0
            }
        }
    ]
}
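
The genotype keys ("AA", "AB", "BB", ...) of a sample can be compared to find the most likely genotype. The sketch below assumes that "genotype-likelihood-phred-scaled" behaves like the phred-scaled genotype likelihoods of VCF, where a lower value means a more likely genotype, and that genotype labels consist of exactly two uppercase letters (diploid samples). The function name `most_likely_genotype` is hypothetical.

```python
import re

def most_likely_genotype(sample):
    """Return the genotype label with the lowest phred-scaled likelihood, or None."""
    key = "genotype-likelihood-phred-scaled"
    candidates = {
        label: value[key]
        for label, value in sample.items()
        # Only two-letter labels such as "AA" or "AB" are genotype entries.
        if re.fullmatch(r"[A-Z]{2}", label) and isinstance(value, dict) and key in value
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

For sample "129P2" above, this picks "BB", which agrees with the "alleles" value under its "genotype" key.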

Summary Objects

Summary objects wrap up everything that came beforehand. Namely:

  • statistics about the genomics data
  • runtime information

Genomics data statistics capture how many comments were seen, how much metadata was read, and how many actual genomic features were processed.

Runtime information tells you when BioInterchange was invoked, when it finished processing the data, and how many seconds it took to process the data.

Example:

{
    "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
    "runtime" : {
        "invocation" : "Fri Jul 17 22:34:12 2015",
        "finish" : "Fri Jul 17 22:34:12 2015",
        "lapsed-seconds" : "0"
    },
    "statistics" : {
        "meta-lines" : 19,
        "meta-lines-filtered" : false,
        "features" : 12,
        "features-filtered" : 0,
        "comment-lines" : 0
    }
}
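
Note that "lapsed-seconds" is a JSON string and that the timestamps are plain text. The following sketch converts them into Python values. The timestamp layout is an assumption taken from the examples in this document, and parsing the English day and month names requires an English or "C" locale; the function name `runtime_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout as seen in the examples, e.g. "Fri Jul 17 22:34:12 2015".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

def runtime_seconds(summary):
    """Return (seconds computed from the timestamps, reported "lapsed-seconds" as int)."""
    runtime = summary["runtime"]
    started = datetime.strptime(runtime["invocation"], TIMESTAMP_FORMAT)
    finished = datetime.strptime(runtime["finish"], TIMESTAMP_FORMAT)
    return (finished - started).total_seconds(), int(runtime["lapsed-seconds"])
```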

Python API Reference Cards

Function: setup(context, meta)

  • Description: Initialization function.
  • Parameter “context”: Context object as a Python Dict-instance. Modifications to this dict will have no effect on the output of BioInterchange.
  • Parameter “meta”: Meta object as a Python Dict-instance.
  • Returns: The original meta object or a modified version of it. If None is returned, then the output of the meta object will be suppressed by BioInterchange.
  • Example:
~~~ python
def setup(context, meta):
    return meta
~~~

Function: cleanup(summary)

  • Description: Cleanup and finalization function.
  • Parameter “summary”: Summary object as a Python Dict-instance.
  • Returns: The original summary object or a modified version of it. If None is returned, then the output of the summary object will be suppressed by BioInterchange.
  • Example:
~~~ python
def cleanup(summary):
    return summary
~~~

Function: process_feature(feature)

  • Description: Genomic data processing function.
  • Parameter “feature”: Feature object as a Python Dict-instance.
  • Returns: The original feature object or a modified version of it. If None is returned, then the output of the feature object will be suppressed by BioInterchange.
  • Example:
~~~ python
def process_feature(feature):
    return feature
~~~

JSON Reference Cards

Key: B, C, etc.

  • Description: Alternative allele information (reference is implicitly labeled "A").
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:alleleB, gfvo-squared:alleleC, etc.
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example: Variant information on a feature level.
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "variants" : {
            "B" : {
                "sequence" : "G"
            }
        }
    }
  • Example: Variant information on a sample level.
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "variants" : {
                    "B" : {
                        "allele-count-expected" : 4
                    },
                    "C" : {
                        "allele-count-expected" : 5
                    }
                }
            }
        ]
    }

Key: AA, AB, BB, AC, etc.

  • Description: Generic genotype information.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:genotypeAA, gfvo-squared:genotypeAB, etc.
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • See Also: genotype
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "AA" : {
                    "genotype-likelihood-phred-scaled" : 53
                },
                "AB" : {
                    "genotype-likelihood-phred-scaled" : 6
                },
                "BB" : {
                    "genotype-likelihood-phred-scaled" : 0
                },
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "phased" : false,
                    "alleles" : "AB"
                }
            }
        ]
    }

Key: affected-features

  • Description: Identifiers of features that are affected by a variant effect.
  • Appears in: feature object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:affectedFeatures
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "variants" : {
            "B" : {
                "effects" : [
                    {
                        "affected-features" : [
                            "YAL067W-A"
                        ]
                    }
                ]
            }
        }
    }

Key: affected-feature-type

  • Description: Type of the features that are affected by a variant effect.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:affectedFeatureType
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "variants" : {
            "B" : {
                "effects" : [
                    {
                        "affected-feature-type" : "transcript"
                    }
                ]
            }
        }
    }

Key: alias

  • Description: Aliases of a feature.
  • Appears in: feature object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:alias
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • Note: In VCF, only the first entry of the "ID" column becomes an "id" in JSON-LD, whereas second, third, etc., entries are interpreted as "alias" in JSON-LD.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "id" : "rs123",
        "alias" : [
            "feat12",
            "feat12-1"
        ]
    }
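
The rule from the note above can be sketched as follows. In VCF, the "ID" column holds a semicolon-separated list of identifiers, and "." marks a missing value. How BioInterchange itself treats a missing ID is not documented here, so returning None in that case is an assumption; the function name `split_vcf_id` is hypothetical.

```python
def split_vcf_id(id_column):
    """Split a VCF "ID" column value into (id, aliases)."""
    # "." is the VCF placeholder for a missing value.
    if id_column in ("", "."):
        return None, []
    entries = id_column.split(";")
    # The first entry becomes "id"; all remaining entries become "alias".
    return entries[0], entries[1:]
```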

Key: alignment

  • Description: A sequence alignment.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:alignment
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff-f1.json",
        "alignment" : {
            "id" : "EST23",
            "start" : 1,
            "end" : 21,
            "strand" : null,
            "cigar-string" : "8M3D6M1I6M"
        }
    }

Key: allele-count

  • Description: Number of alleles in a genotype.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:alleleCount
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "variants" : {
            "B" : {
                "sequence" : "GA",
                "allele-count" : 36
            }
        }
    }

Key: allele-count-expected

  • Description: Expected alternate allele count.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:alleleCountExpected
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "variants" : {
            "B" : {
                "sequence" : "GA",
                "allele-count-expected" : 4
            }
        }
    }

Key: allele-format

  • Description: Information about key/values used to describe alleles in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:alleleFormat
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "allele-format" : {
            "DEL" : {
                "description" : "Deletion"
            }
        }
    }

Key: allele-frequency

  • Description: Frequency of an allele in a genotype.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:alleleFrequency
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "variants" : {
            "B" : {
                "sequence" : "GA",
                "allele-frequency" : 1
            }
        }
    }

Key: allele-total-number

  • Description: Total number of alleles.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:alleleTotalNumber
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "allele-total-number" : 36
    }

Key: ancestral-allele

  • Description: Sequence of an ancestral allele.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:ancestralAllele
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "ancestral-allele" : "C"
    }

Key: annotation-format

  • Description: Information about keys used to annotate (filter) features in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:annotationFormat
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "annotation-format" : {
            "MinDP" : {
                "description" : "Minimum read depth (INFO/DP or INFO/DP4) [5]"
            },
            "Qual" : {
                "description" : "Minimum value of the QUAL field [10]"
            }
        }
    }

Key: annotations

  • Description: Tags (or labels) assigned to a feature.
  • Appears in: feature object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:annotations
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • Note: Annotations are represented by structured pragma statements in GVF and "FILTER" information in VCF.
  • In Model Version: 1
  • Example: GVF
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "annotations" : [
            "SP1"
        ]
    }
  • Annotation Source Example (Meta Document):
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "SP1" : {
                "types" : [
                    "SNV"
                ],
                "comment" : "SNV information documented in Wiki."
            }
        }
    }

Key: attribute-method

  • Description: Information about attributes in the data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:attributeMethod
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "attribute-method" : {
            "SP7" : {
                "attribute" : "Zygosity",
                "comment" : "Zygosity is reported here as determined in the original study.",
                "sources" : [
                    "SOLiD"
                ],
                "types" : [
                    "SNV"
                ]
            }
        }
    }

Key: average-coverage

  • Description: Average read coverage.
  • Appears in: meta object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:averageCoverage
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "SP1" : {
                "average-coverage" : 36
            }
        }
    }

Key: base-quality-rms

  • Description: Root-mean-square base quality of a genomic position.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:baseQualityRMS
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "base-quality-rms" : 3
    }

Key: biointerchange-version

  • Description: Semantic Version number of the BioInterchange software that was used to create the data set.
  • Appears in: context object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:biointerchangeVersion
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
        "biointerchange-version" : "2.0.0+36"
    }

Key: breakpoint-fasta

  • Description: A link to a FASTA file that contains sequences specific to breakpoints.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:breakpointFASTA
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "breakpoint-fasta" : "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/sv/breakpoint_assemblies.fasta"
    }

Key: build

  • Description: Name of a genomic build.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:build
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "genome-build" : [
            {
                "source" : "NCBI",
                "build" : "B36"
            }
        ]
    }

Key: cigar-string

  • Description: A CIGAR formatted sequence alignment string.
  • Appears in: feature object
  • JSON Type: string in CIGAR format
  • JSON-LD Type: gfvo-squared:cigarString
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • Note: CIGAR strings from GFF3/GVF are reformatted to match the CIGAR standard: integer followed by a character. When translating JSON-LD back to GFF3/GVF, the alternative GFF3/GVF-specific format is substituted again.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff-f1.json",
        "alignment" : {
            "id" : "EST23",
            "start" : 1,
            "end" : 21,
            "strand" : null,
            "cigar-string" : "8M3D6M1I6M"
        }
    }
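
To illustrate the reformatting mentioned in the note above: the GFF3 specification writes alignments in its "Gap" attribute as a letter followed by an integer, separated by spaces (for example "M8 D3 M6 I1 M6"). The sketch below converts between that notation and the integer-then-character CIGAR form. It assumes that the "Gap" notation is the GFF3/GVF-specific format the note refers to, and it is not BioInterchange's actual implementation; both function names are hypothetical.

```python
import re

def gap_to_cigar(gap):
    """Convert GFF3 "Gap" notation, e.g. "M8 D3 M6 I1 M6", to "8M3D6M1I6M"."""
    operations = re.findall(r"([A-Za-z])(\d+)", gap)
    return "".join(f"{count}{op}" for op, count in operations)

def cigar_to_gap(cigar):
    """Convert a CIGAR string, e.g. "8M3D6M1I6M", to "M8 D3 M6 I1 M6"."""
    operations = re.findall(r"(\d+)([A-Za-z=])", cigar)
    return " ".join(f"{op}{count}" for count, op in operations)
```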

Key: codon

  • Description: A codon sequence.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:codon
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "reference" : {
            "sequence" : "A",
            "codon" : "GAG"
        },
        "variants" : {
            "B" : {
                "sequence" : "G",
                "codon" : "GAG"
            }
        }
    }

Key: codon-phase

  • Description: Phase of a coding sequence.
  • Appears in: feature object
  • JSON Type: number (values: 0, 1, 2)
  • JSON-LD Type: gfvo-squared:codon-phase
  • Genomic Data Source: GFF3 (version 1.21)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "codon-phase" : 0
    }
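
In GFF3, the phase is the number of bases to skip from the 5' end of a CDS feature to reach the first base of the next complete codon. The sketch below shows one way to use the value, assuming 1-based inclusive coordinates and a "+" or "-" strand; `first_codon_start` is a hypothetical helper, not a BioInterchange function.

```python
def first_codon_start(start, end, strand, phase):
    """Return the coordinate of the first base of the next complete codon."""
    if phase not in (0, 1, 2):
        raise ValueError("codon-phase must be 0, 1, or 2")
    # On the reverse strand the 5' end of the feature is its 'end' coordinate.
    if strand == "-":
        return end - phase
    return start + phase

print(first_codon_start(100, 200, "+", 2))
print(first_codon_start(100, 200, "-", 2))
```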

Key: comment

  • Description: A free-text comment.
  • Appears in: feature object, meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:comment
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "SP1" : {
                "id" : "SP1",
                "types" : [
                    "SNV"
                ],
                "comment" : "See notes on wiki."
            }
        }
    }

Key: comment-lines

  • Description: Number of comment lines that were read from a genomics file.
  • Appears in: summary object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:commentLines
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: contig

  • Description: Information about a continuous sequence region.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:contig
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-x1.json",
        "contig" : {
            "chr3" : {
                "start" : 1,
                "end" : 99,
                "length" : 99
            }
        }
    }

Key: data-source

  • Description: Information about the source of genomic/proteomic features.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:dataSource
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "data-source" : {
            "SP3" : {
                "data-type" : "DNA sequence",
                "comment" : "NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra)",
                "types" : [
                    "SNV"
                ],
                "dbxref" : [
                    "SRA:SRA008175"
                ],
                "sources" : [
                    "MAQ"
                ]
            }
        }
    }

Key: data-type

  • Description: Type of data that is presented by a data source.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:dataType
  • Genomic Data Source: GVF (version 1.07)
  • See Also: Key: data-source
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "data-source" : {
            "SP3" : {
                "data-type" : "DNA sequence",
                "comment" : "NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra)",
                "types" : [
                    "SNV"
                ],
                "dbxref" : [
                    "SRA:SRA008175"
                ],
                "sources" : [
                    "MAQ"
                ]
            }
        }
    }

Key: dbxref

  • Description: External database references.
  • Appears in: feature object, meta object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:dbxref
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "dbxref" : [
            "SGRP:s01-675",
            "EMBL:AA816246"
        ]
    }
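
Each reference in the example follows the GFF3 `DBTAG:ID` convention. A minimal sketch for splitting the entries into database and accession, assuming every entry contains a colon; `split_dbxref` is a hypothetical helper name.

```python
def split_dbxref(reference):
    """Split 'DBTAG:ID' into (database, accession) at the first colon."""
    database, separator, accession = reference.partition(":")
    if not separator:
        raise ValueError(f"not a DBTAG:ID reference: {reference!r}")
    return database, accession

feature = {"dbxref": ["SGRP:s01-675", "EMBL:AA816246"]}
pairs = [split_dbxref(reference) for reference in feature["dbxref"]]
print(pairs)
```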

Key: depth

  • Description: Read depth.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:depth
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "depth" : 75,
        "samples" : [
            {
                "id" : "129P2",
                "depth" : 2
            }
        ]
    }

Key: effect

  • Description: Name of the effect a variant has on another feature.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:effect
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "variants" : {
            "B" : {
                "effects" : [
                    {
                        "effect" : "upstream_gene_variant"
                    }
                ]
            }
        }
    }

Key: effects

  • Description: Container for variant effects.
  • Appears in: feature object
  • JSON Type: array of objects
  • JSON-LD Type: gfvo-squared:effects
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "variants" : {
            "B" : {
                "effects" : [
                    {
                        "effect" : "upstream_gene_variant"
                    }
                ]
            }
        }
    }

Key: end

  • Description: End coordinate of a feature or alignment.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:end
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "locus" : {
            "landmark" : "Chr1",
            "start" : 290207,
            "end" : 291002,
            "strand" : "+"
        },
        "alignment" : {
            "id" : "EST23",
            "start" : 1,
            "end" : 21,
            "strand" : null,
            "cigar-string" : "8M3D6M1I6M"
        }
    }

Key: experimentally-validated

  • Description: Indicates whether a sequence variant has been experimentally validated.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:experimentallyValidated
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "experimentally-validated" : false
    }

Key: feature-format

  • Description: Information about key/values used to describe feature-centric data in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:featureFormat
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "feature-format" : {
            "membership-hapmap-2" : {
                "type" : "String",
                "number" : 1,
                "description" : "HapMap 2 membership"
            },
            "AF1" : {
                "type" : "Float",
                "number" : 1,
                "description" : "Max-likelihood estimate of the first ALT allele frequency (assuming HWE)"
            }
        }
    }

Key: features

  • Description: Number of genomic-features lines that were read from a genomics file.
  • Appears in: summary object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:features
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: features-filtered

  • Description: Number of genomic-features lines that were filtered through the Python API.
  • Appears in: summary object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:featuresFiltered
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: file-date

  • Description: Creation date of the file that contains the represented genomics data.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:fileDate
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "file-date" : "2015-03-08"
    }

Key: finish

  • Description: Finish time of the BioInterchange software.
  • Appears in: summary object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:finish
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "runtime" : {
            "invocation" : "Fri Jul 17 17:00:54 2015",
            "finish" : "Fri Jul 17 17:00:56 2015",
            "lapsed-seconds" : "2"
        }
    }

Key: genome-build

  • Description: Information about the underlying genome builds of a data set.
  • Appears in: meta object
  • JSON Type: array of objects
  • JSON-LD Type: gfvo-squared:genomeBuild
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "genome-build" : [
            {
                "source" : "NCBI",
                "build" : "B36"
            }
        ]
    }

Key: genomic-source

  • Description: Information about the genomic origin source (LOINC code) of feature data.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:genomicSource
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "genomic-source" : "somatic"
    }

Key: genotype

  • Description: Feature or sample specific genotype information.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:genotype
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • See Also: AA, AB, etc.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "AA" : {
                    "genotype-likelihood-phred-scaled" : 53
                },
                "AB" : {
                    "genotype-likelihood-phred-scaled" : 6
                },
                "BB" : {
                    "genotype-likelihood-phred-scaled" : 0
                },
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "phased" : false,
                    "alleles" : "AB"
                }
            }
        ]
    }

Key: genotype-format

  • Description: Information about keys/values used with genotypes in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:genotypeFormat
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "genotype-format" : {
            "depth" : {
                "type" : "Integer",
                "number" : 1,
                "description" : "# high-quality bases"
            },
            "genotype-likelihood" : {
                "type" : "Float",
                "number" : 3,
                "description" : "Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)"
            },
            "SP" : {
                "type" : "Integer",
                "number" : 1,
                "description" : "Phred-scaled strand bias P-value"
            }
        }
    }

Key: genotype-likelihood

  • Description: Log-10 scaled genotype likelihood.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:genotypeLikelihood
  • Genomic Data Source: VCF (version 4.2)
  • See Also: AA, AB, etc.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "AA" : {
                    "genotype-likelihood" : 12
                },
                "AB" : {
                    "genotype-likelihood" : 3
                },
                "BB" : {
                    "genotype-likelihood" : 0
                },
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "phased" : false,
                    "alleles" : "AB"
                }
            }
        ]
    }

Key: genotype-likelihood-phred-scaled

  • Description: Phred-scaled genotype likelihood.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:genotypeLikelihoodPhredScaled
  • Genomic Data Source: VCF (version 4.2)
  • See Also: AA, AB, etc.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "AA" : {
                    "genotype-likelihood-phred-scaled" : 53
                },
                "AB" : {
                    "genotype-likelihood-phred-scaled" : 12
                },
                "BB" : {
                    "genotype-likelihood-phred-scaled" : 0
                },
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "phased" : false,
                    "alleles" : "AB"
                }
            }
        ]
    }
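
Phred scaling follows Q = -10 * log10(P), so the values in the example can be turned back into relative likelihoods. The sketch below assumes a sample object shaped like the example above; the helper name `phred_to_normalized_likelihoods` is hypothetical.

```python
def phred_to_normalized_likelihoods(phred_by_genotype):
    """Convert Phred-scaled values to likelihoods that sum to one."""
    raw = {genotype: 10 ** (-q / 10) for genotype, q in phred_by_genotype.items()}
    total = sum(raw.values())
    return {genotype: value / total for genotype, value in raw.items()}

sample = {
    "id": "129P2",
    "AA": {"genotype-likelihood-phred-scaled": 53},
    "AB": {"genotype-likelihood-phred-scaled": 12},
    "BB": {"genotype-likelihood-phred-scaled": 0},
}
phred = {
    genotype: sample[genotype]["genotype-likelihood-phred-scaled"]
    for genotype in ("AA", "AB", "BB")
}
likelihoods = phred_to_normalized_likelihoods(phred)
print(likelihoods)
```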

Key: gff-version

  • Description: Genomics file-format versioning of GFF3 data sources.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:gffVersion
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-x1.json",
        "gff-version" : "1.21"
    }

Key: global

  • Description: Global assignment that applies to all features in the data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:global
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "global" : {
                "comment" : "Preliminary data."
            }
        }
    }

Key: gvf-version

  • Description: Genomics file-format versioning of GVF data sources.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:gvfVersion
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "gvf-version" : "1.07"
    }

Key: id

  • Description: An identifier.
  • Appears in: feature object, meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:id
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • Note: Multiple "ID" entries in VCF: only the first entry becomes an "id" in JSON-LD, whereas the remaining identifiers become "alias" entries; this behaviour is in concordance with GFF3/GVF representations of identifiers.
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "id" : "ENSG00000139618"
    }
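
The rule in the note above can be sketched as follows. VCF separates multiple identifiers in the ID column with semicolons and uses "." for a missing value. The function name `vcf_ids_to_json` is hypothetical, and representing "alias" as an array of strings is an assumption made for this illustration.

```python
def vcf_ids_to_json(id_column):
    """Map a VCF ID column to an 'id' plus optional 'alias' entries."""
    if id_column == ".":  # VCF marker for a missing value
        return {}
    identifiers = id_column.split(";")
    result = {"id": identifiers[0]}
    if len(identifiers) > 1:
        # Assumption: remaining identifiers are kept as a list under "alias".
        result["alias"] = identifiers[1:]
    return result

print(vcf_ids_to_json("rs6054257;rs6040355"))
```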

Key: individuals

  • Description: Identifiers of sequenced individuals.
  • Appears in: meta object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:individuals
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "individuals" : [
            "NA18507",
            "NA12878",
            "NA19240"
        ]
    }

Key: input-file

  • Description: Name and relative path of the input file from which BioInterchange read data.
  • Appears in: context object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:inputFile
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
        "input-file" : "test-data/playground-vcf.vcf"
    }

Key: input-filetype

  • Description: File type of the input.
  • Appears in: context object
  • JSON Type: string (either “GFF3”, “GVF”, or “VCF”)
  • JSON-LD Type: gfvo-squared:inputFiletype
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
        "input-filetype" : "VCF"
    }

Key: invocation

  • Description: Invocation time of the BioInterchange software.
  • Appears in: summary object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:invocation
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "runtime" : {
            "invocation" : "Fri Jul 17 17:00:54 2015",
            "finish" : "Fri Jul 17 17:00:56 2015",
            "lapsed-seconds" : "2"
        }
    }

Key: landmark

  • Description: A named genomic or proteomic landmark.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:landmark
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "locus" : {
            "landmark" : "Chr1",
            "start" : 290207,
            "end" : 291002
        }
    }

Key: landmarks

  • Description: List of landmark identifiers.
  • Appears in: meta object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:landmarks
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "technology-platform" : [
            {
                "id" : "SP1",
                "landmarks" : [
                    "Chr1", "ChrX"
                ]
            }
        ]
    }

Key: lapsed-seconds

  • Description: Number of seconds that the BioInterchange software took for one execution.
  • Appears in: summary object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:lapsed-seconds
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "runtime" : {
            "invocation" : "Fri Jul 17 17:00:54 2015",
            "finish" : "Fri Jul 17 17:00:56 2015",
            "lapsed-seconds" : "2"
        }
    }
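
The relationship between the three runtime values can be checked with a short script. This sketch assumes the timestamp layout seen in the example ("Fri Jul 17 17:00:54 2015") and an English or C locale for parsing day and month names; `lapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumed from the example above

def lapsed_seconds(runtime):
    """Return the whole seconds between 'invocation' and 'finish'."""
    started = datetime.strptime(runtime["invocation"], TIMESTAMP_FORMAT)
    finished = datetime.strptime(runtime["finish"], TIMESTAMP_FORMAT)
    return int((finished - started).total_seconds())

runtime = {
    "invocation": "Fri Jul 17 17:00:54 2015",
    "finish": "Fri Jul 17 17:00:56 2015",
    "lapsed-seconds": "2",
}
# "lapsed-seconds" is a JSON string, so compare against the string form.
print(str(lapsed_seconds(runtime)) == runtime["lapsed-seconds"])
```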

Key: length

  • Description: Length of a genomic feature or continuous sequence region.
  • Appears in: meta object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:length
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-x1.json",
        "contig" : {
            "chr3" : {
                "start" : 1,
                "end" : 99,
                "length" : 99
            }
        }
    }
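
In the example, the length matches the inclusive span of the start and end coordinates (99 - 1 + 1 = 99). A minimal consistency check, assuming 1-based inclusive coordinates as used by GFF3; `contig_length_is_consistent` is a hypothetical helper.

```python
def contig_length_is_consistent(contig):
    """Check that 'length' equals the inclusive span from 'start' to 'end'."""
    return contig["length"] == contig["end"] - contig["start"] + 1

meta = {"contig": {"chr3": {"start": 1, "end": 99, "length": 99}}}
for name, contig in meta["contig"].items():
    print(name, contig_length_is_consistent(contig))
```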

Key: locus

  • Description: A genomic or proteomic locus.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:locus
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "locus" : {
            "landmark" : "Chr1",
            "start" : 290207,
            "end" : 291002
        }
    }

Key: mapping-quality-rms

  • Description: Root-mean-square mapping quality.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:mappingQualityRMS
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "mapping-quality-rms" : 29
    }

Key: membership-1000G

  • Description: Membership in the 1000 Genomes (1000G) project.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:membership1000G
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "membership-1000G" : true
    }

Key: membership-dbsnp

  • Description: Membership in dbSNP.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:membershipDbSNP
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "membership-dbsnp" : true
    }

Key: membership-hapmap-2

  • Description: Membership in HapMap 2.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:membershipHapMap2
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "membership-hapmap-2" : true
    }

Key: membership-hapmap-3

  • Description: Membership in HapMap 3.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:membershipHapMap3
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "membership-hapmap-3" : true
    }

Key: meta-lines

  • Description: Number of genomic-meta/pragma lines that were read from a genomics file.
  • Appears in: summary object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:metaLines
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: meta-lines-filtered

  • Description: Number of genomic-meta/pragma lines that were filtered through the Python API.
  • Appears in: summary object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:metaLinesFiltered
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: ontology

  • Description: A reference to an external ontology.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:ontology
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "phenotype-description" : {
            "global" : {
                "ontology" : "http://www.human-phenotype-ontology.org/human-phenotype-ontology.obo.gz"
            }
        }
    }

Key: ontology-term

  • Description: Ontology terms associated with a genomic feature.
  • Appears in: feature object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:ontology-term
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "ontology-term" : [
            "GO:0046703"
        ]
    }

Key: output-file

  • Description: Filename and relative path to the output of the BioInterchange software.
  • Appears in: context object
  • JSON Type: string (“null”, if output was written to the console)
  • JSON-LD Type: gfvo-squared:outputFile
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
        "output-file" : "example.ldj"
    }
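
An output file can be post-processed with a few lines of Python. The sketch below assumes that the output holds one JSON object per line (suggested by the ".ldj" extension) and that the letter before the model version in "@context" marks the object kind (c: context, x: meta, f: feature, s: summary), as the examples in these reference cards suggest. Both points are assumptions, and the helper names are hypothetical.

```python
import json
import re
from collections import Counter

KIND_BY_LETTER = {"c": "context", "x": "meta", "f": "feature", "s": "summary"}

def object_kind(obj):
    """Guess the object kind from the tail of its '@context' URL."""
    match = re.search(r"-([cxfs])\d+\.json$", obj.get("@context", ""))
    return KIND_BY_LETTER[match.group(1)] if match else "unknown"

def count_kinds(lines):
    """Count object kinds in an iterable of JSON lines, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[object_kind(json.loads(line))] += 1
    return counts

sample_lines = [
    '{"@context": "https://www.codamono.com/jsonld/vcf-c1.json", "output-file": "example.ldj"}',
    '{"@context": "https://www.codamono.com/jsonld/vcf-x1.json"}',
    '{"@context": "https://www.codamono.com/jsonld/vcf-f1.json", "depth": 75}',
    '{"@context": "https://www.codamono.com/jsonld/biointerchange-s1.json"}',
]
counts = count_kinds(sample_lines)
print(counts)
```

To run this against a real file, pass an open file handle instead of the sample list, for example `count_kinds(open("example.ldj"))`.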

Key: phase-set

  • Description: Phase set identifier; indicates to which phase set a phased genotype belongs.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:phaseSet
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "genotype" : {
                    "alleles" : "BB",
                    "phased" : true,
                    "sequences" : [
                        "A",
                        "A"
                    ]
                },
                "phase-set" : 1
            }
        ]
    }

Key: phased

  • Description: Indicates whether a genotype is phased or unphased.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:phased
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "genotype" : {
                    "alleles" : "BB",
                    "phased" : true,
                    "sequences" : [
                        "A",
                        "A"
                    ]
                },
                "phase-set" : 1
            }
        ]
    }

Key: phased-genotypes

  • Description: Marks phased genotypes and provides extra information about them.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:phasedGenotypes
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "phased-genotypes" : {
            "SP11" : {
                "types" : [
                    "SNV"
                ]
            }
        }
    }

Key: phasing-quality

  • Description: Phred-scaled probability that alleles are ordered incorrectly.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:phasingQuality
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P3",
                "genotype" : {
                    "alleles" : "AB",
                    "phased" : true,
                    "sequences" : [
                        "A",
                        "T"
                    ]
                },
                "phasing-quality" : 4
            }
        ]
    }
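
A Phred-scaled value Q corresponds to a probability P = 10^(-Q/10). The sketch below applies that formula to the phasing quality from the example; `phred_to_probability` is a hypothetical helper name.

```python
def phred_to_probability(quality):
    """Convert a Phred-scaled quality into the underlying error probability."""
    return 10 ** (-quality / 10)

sample = {"id": "129P3", "phasing-quality": 4}
probability = phred_to_probability(sample["phasing-quality"])
print(f"Probability of incorrect allele order: {probability:.3f}")
```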

Key: phenotype-description

  • Description: Additional information about phenotypes.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:phenotypeDescription
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "phenotype-description" : {
            "global" : {
                "ontology" : "http://www.human-phenotype-ontology.org/human-phenotype-ontology.obo.gz"
            }
        }
    }

Key: population

  • Description: Population code that is assigned to an individual.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:population
  • Genomic Data Source: GVF (version 1.07)
  • See Also: 1000 Genomes project population codes
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "population" : "YRI"
    }

Key: python-callback

  • Description: Package and module name used for the Python API.
  • Appears in: context object
  • JSON Type: string (“null”, if the Python API was not used)
  • JSON-LD Type: gfvo-squared:pythonCallback
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-c1.json",
        "python-callback" : "simplepy.simple"
    }

Key: read-pair-span

  • Description: Range of the read-pair span, given as a pair of numbers.
  • Appears in: meta object
  • JSON Type: array of numbers
  • JSON-LD Type: gfvo-squared:readPairSpan
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "SP1" : {
                "read-pair-span" : [
                    135,
                    440
                ]
            }
        }
    }

Key: reads-with-zero-mapping-quality

  • Description: Number of reads with mapping quality being equal to zero.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:readsWithZeroMappingQuality
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "reads-with-zero-mapping-quality" : 2
    }

Key: reference

  • Description: Reference sequence information.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:reference
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "reference" : {
            "sequence" : "A"
        }
    }

Key: reference-fasta

  • Description: A link to a FASTA file that contains reference sequences.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:referenceFASTA
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "reference-fasta" : "ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa"
    }

Key: runtime

  • Description: Information about runtime statistics of the BioInterchange software.
  • Appears in: summary object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:runtime
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "runtime" : {
            "invocation" : "Fri Jul 17 17:00:54 2015",
            "finish" : "Fri Jul 17 17:00:56 2015",
            "lapsed-seconds" : "2"
        }
    }

Key: samples

  • Description: Sample information.
  • Appears in: feature object
  • JSON Type: array of objects
  • JSON-LD Type: gfvo-squared:samples
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "phased" : false,
                    "alleles" : "AB"
                }
            }
        ]
    }
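
As an illustration of how the "samples" array might be consumed, the following sketch builds a VCF-style genotype string per sample. The helper name `genotypes_by_sample` is hypothetical; the only outside knowledge used is the VCF convention of "|" for phased and "/" for unphased genotypes.

```python
import json

feature = json.loads("""
{
    "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
    "samples" : [
        {
            "id" : "129P2",
            "genotype" : {
                "sequences" : [ "G", "A" ],
                "phased" : false,
                "alleles" : "AB"
            }
        }
    ]
}
""")


def genotypes_by_sample(feature):
    """Hypothetical helper: map each sample id to its joined genotype sequences."""
    result = {}
    for sample in feature.get("samples", []):
        genotype = sample.get("genotype", {})
        # VCF convention: "|" separates phased alleles, "/" unphased ones.
        separator = "|" if genotype.get("phased") else "/"
        result[sample["id"]] = separator.join(genotype.get("sequences", []))
    return result


genotypes = genotypes_by_sample(feature)
print(genotypes)
```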

Key: samples-with-data

  • Description: Number of samples with data.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:samplesWithData
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples-with-data" : 1
    }

Key: score-method

  • Description: Information about scoring methods that are used in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:scoreMethod
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "score-method" : {
            "global" : {
                "comment" : "Scores are Phred scaled probabilities of an incorrect sequence_alteration call"
            }
        }
    }

Key: sequence

  • Description: Genomic or proteomic sequence.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:sequence
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "id" : "CDS1",
        "locus" : {
            "landmark" : "chr4",
            "start" : 2,
            "end" : 36,
            "strand" : "-"
        },
        "type" : "CDS",
        "sequence" : "gttcattgctgcctgcatgttcattgtctacctcg"
    }
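
A simple consistency check for feature objects that carry a "sequence" is to compare its length with the span of the "locus". The sketch below assumes that "start" and "end" are 1-based and inclusive, as in GFF3; the helper name `locus_length` is hypothetical.

```python
import json

feature = json.loads("""
{
    "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
    "id" : "CDS1",
    "locus" : {
        "landmark" : "chr4",
        "start" : 2,
        "end" : 36,
        "strand" : "-"
    },
    "type" : "CDS",
    "sequence" : "gttcattgctgcctgcatgttcattgtctacctcg"
}
""")


def locus_length(locus):
    """Hypothetical helper: span of a locus, assuming 1-based inclusive coordinates."""
    return locus["end"] - locus["start"] + 1


span = locus_length(feature["locus"])
sequence_length = len(feature["sequence"])
print(span, sequence_length, span == sequence_length)
```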

Key: sequences

  • Description: A list of genomic or proteomic sequences.
  • Appears in: feature object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:sequences
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "samples" : [
            {
                "id" : "129P2",
                "genotype" : {
                    "sequences" : [
                        "G",
                        "A"
                    ],
                    "alleles" : "AB"
                }
            }
        ]
    }

Key: sex

  • Description: Sex of a sequenced individual.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:sex
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "sex" : "male"
    }

Key: somatic-mutation

  • Description: Indicates whether a feature is a somatic mutation.
  • Appears in: feature object
  • JSON Type: boolean
  • JSON-LD Type: gfvo-squared:somaticMutation
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "somatic-mutation" : true
    }

Key: source

  • Description: Database or algorithm name that is the source of a feature or data set.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:source
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "source" : "samtools"
    }

Key: source-method

  • Description: Information about data sources in a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:sourceMethod
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "source-method" : {
            "SP6" : {
                "comment" : "Short Elongated Alignment Program (SOAP)",
                "sources" : [
                    "SOAP"
                ],
                "types" : [
                    "SNV"
                ],
                "dbxref" : [
                    "PMID:18227114",
                    "PMID:18987735"
                ]
            }
        }
    }

Key: sources

  • Description: List of data sources.
  • Appears in: meta object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:sources
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "technology-platform" : {
            "SP1" : {
                "sources" : [
                    "Genbank", "dbSNP"
                ]
            }
        }
    }

Key: start

  • Description: Start coordinate of a feature or alignment.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:start
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "locus" : {
            "landmark" : "Chr1",
            "start" : 290207,
            "end" : 291002,
            "strand" : "+"
        },
        "alignment" : {
            "id" : "EST23",
            "start" : 1,
            "end" : 21,
            "strand" : null,
            "cigar-string" : "8M3D6M1I6M"
        }
    }
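
The alignment coordinates in the example above can be cross-checked against its "cigar-string". This sketch assumes SAM-style semantics for the three operations that appear in the example (M consumes both sequences, I consumes only the aligned sequence, D consumes only the reference); whether BioInterchange applies exactly these semantics is an assumption, and the helper name `cigar_lengths` is hypothetical.

```python
import re

# Only the operations seen in the example are handled: M, I and D.
CIGAR_OP = re.compile(r"(\d+)([MID])")


def cigar_lengths(cigar):
    """Hypothetical helper: return (aligned_length, reference_length) for a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings containing anything besides <count><M|I|D> pairs.
    if "".join(count + op for count, op in ops) != cigar:
        raise ValueError("unsupported CIGAR string: " + cigar)
    aligned = sum(int(count) for count, op in ops if op in "MI")
    reference = sum(int(count) for count, op in ops if op in "MD")
    return aligned, reference


alignment = {"id": "EST23", "start": 1, "end": 21, "cigar-string": "8M3D6M1I6M"}
aligned_length, reference_length = cigar_lengths(alignment["cigar-string"])
alignment_span = alignment["end"] - alignment["start"] + 1
print(aligned_length, reference_length, alignment_span)
```

Under these assumptions, the aligned length derived from the CIGAR string equals the span given by the alignment's "start" and "end".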

Key: statistics

  • Description: Summary statistics about the genomic data that was processed.
  • Appears in: summary object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:statistics
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/biointerchange-s1.json",
        "statistics" : {
            "meta-lines" : 19,
            "meta-lines-filtered" : false,
            "features" : 12,
            "features-filtered" : 0,
            "comment-lines" : 0
        }
    }

Key: strand

  • Description: Strand on which a feature is located.
  • Appears in: feature object
  • JSON Type: string ("+": forward strand, "-": reverse strand)
  • JSON-LD Type: gfvo-squared:strand
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-f1.json",
        "locus" : {
            "landmark" : "Chr1",
            "start" : 290207,
            "end" : 291002,
            "strand" : "+"
        },
        "alignment" : {
            "id" : "EST23",
            "start" : 1,
            "end" : 21,
            "strand" : "+",
            "cigar-string" : "8M3D6M1I6M"
        }
    }

Key: strand-bias

  • Description: Strand bias.
  • Appears in: feature object
  • JSON Type: number
  • JSON-LD Type: gfvo-squared:strandBias
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "strand-bias" : 0.5
    }

Key: technology-platform

  • Description: Information about the technology platform that was used to create a data set.
  • Appears in: meta object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:technologyPlatform
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-x1.json",
        "technology-platform" : {
            "SP1" : {
                "types" : [
                    "SNV"
                ],
                "comment" : "SNV data described in Wiki."
            }
        }
    }

Key: type

  • Description: Feature type.
  • Appears in: feature object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:type
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "type" : "SNV"
    }

Key: types

  • Description: List of feature types.
  • Appears in: meta object
  • JSON Type: array of strings
  • JSON-LD Type: gfvo-squared:types
  • Genomic Data Source: GVF (version 1.07)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "technology-platform" : {
            "SP1" : {
                "types" : [
                    "SNV"
                ],
                "comment" : "SNV data described in Wiki."
            }
        }
    }

Key: user-defined

  • Description: Key/value pairs that are not defined in the GFF3, GVF, or VCF specifications.
  • Appears in: meta object, feature object
  • JSON Type: object containing key/value pairs
  • JSON-LD Type: gfvo-squared:user-defined
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • See Also: Appendix: unknown keys
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "user-defined" : {
            "VDB" : 0.0006,
            "INDEL" : true,
            "DP4" : "0,0,71,0"
        }
    }

Key: user-parameter

  • Description: “-u” parameter value when running the BioInterchange software.
  • Appears in: context object
  • JSON Type: string (“null” when no “-u” was used)
  • JSON-LD Type: gfvo-squared:user-parameter
  • See Also: Appendix: unknown keys
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gff3-c1.json",
        "user-parameter" : "Preliminary data."
    }

Key: variants

  • Description: A collection of sequence variants.
  • Appears in: feature object
  • JSON Type: object
  • JSON-LD Type: gfvo-squared:variants
  • Genomic Data Source: GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/gvf-f1.json",
        "variants" : {
            "B" : {
                "sequence" : "T"
            }
        }
    }

Key: vcf-version

  • Description: File-format version of a VCF data source.
  • Appears in: meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:vcfVersion
  • Genomic Data Source: VCF (version 4.2)
  • In Model Version: 1
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-x1.json",
        "vcf-version" : "4.2"
    }

Appendix: unknown keys

  • Description: Keys that are user defined.
  • Appears in: feature object, meta object
  • JSON Type: string
  • JSON-LD Type: gfvo-squared:unknownProperty-*
  • Genomic Data Source: GFF3 (version 1.21), GVF (version 1.07), VCF (version 4.2)
  • In Model Version: 1
  • Note: The GFVO2 type expands to include the unknown key. For example, "VDB" becomes "gfvo-squared:unknownProperty-VDB".
  • Example:
    {
        "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
        "user-defined" : {
            "VDB" : 0.0006,
            "INDEL" : true,
            "DP4" : "0,0,71,0"
        }
    }
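
The expansion described in the note can be sketched as follows. The helper name `unknown_key_types` is hypothetical; it only applies the naming rule stated above to the keys found under "user-defined".

```python
import json

# Prefix taken from the JSON-LD Type given in this appendix.
UNKNOWN_PROPERTY_PREFIX = "gfvo-squared:unknownProperty-"

feature = json.loads("""
{
    "@context" : "https://www.codamono.com/jsonld/vcf-f1.json",
    "user-defined" : {
        "VDB" : 0.0006,
        "INDEL" : true,
        "DP4" : "0,0,71,0"
    }
}
""")


def unknown_key_types(obj):
    """Hypothetical helper: map each user-defined key to its expanded GFVO2 type."""
    return {key: UNKNOWN_PROPERTY_PREFIX + key for key in obj.get("user-defined", {})}


types = unknown_key_types(feature)
for key, expanded in types.items():
    print(key, "->", expanded)
```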
