layout | id | title | contact | status | fail_mode | implementations
---|---|---|---|---|---|---
rule | GORULE:0000001 | Basic GAF checks | cherry@genome.stanford.edu | Implemented | hard |

The following basic checks ensure that submitted and parsed gene association files conform to GO-specific requirements. They come from the original GAF check script.
- Each line of the GAF file is checked for the correct number of columns, the cardinality of each column, and leading or trailing whitespace
- Col 1 and all DB abbreviations must be in db-xrefs.yaml (see below)
- All GO IDs must be extant in current ontology
- Qualifier, evidence, aspect and DB object columns must be within the list of allowed values
- DB:Reference, Taxon and GO ID columns are checked for minimal form
- All IEA annotations more than a year old are removed
- Taxa with a 'representative' group (e.g. MGI for Mus musculus, FlyBase for Drosophila) must be submitted by that group only
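
Some of the line-level checks listed above can be sketched in Python as follows. This is a minimal illustration, not the original GAF check script: the function name `check_gaf_line`, the regular expressions, and the error messages are assumptions made for this example. It also assumes the 17-column GAF 2.x layout (GO ID in column 5, evidence in column 7, aspect in column 9, taxon in column 13, date in column 14) and only reports old IEA annotations instead of removing them.

```python
import re
from datetime import date, datetime, timedelta

GAF_COLUMNS = 17  # assumption: GAF 2.x layout
GO_ID = re.compile(r"GO:\d{7}")                  # minimal form of a GO ID
TAXON = re.compile(r"taxon:\d+(\|taxon:\d+)?")   # one or two taxon IDs
ASPECTS = {"P", "F", "C"}

def check_gaf_line(line, today):
    """Return a list of problems found in one GAF annotation line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != GAF_COLUMNS:
        # Without the right column count the other checks are meaningless.
        return [f"expected {GAF_COLUMNS} columns, found {len(cols)}"]

    problems = []
    for number, value in enumerate(cols, start=1):
        if value != value.strip():
            problems.append(f"col {number}: leading or trailing whitespace")
    if not GO_ID.fullmatch(cols[4]):
        problems.append("col 5: malformed GO ID")
    if cols[8] not in ASPECTS:
        problems.append("col 9: aspect not in allowed values")
    if not TAXON.fullmatch(cols[12]):
        problems.append("col 13: malformed taxon")
    if cols[6] == "IEA":
        try:
            annotated = datetime.strptime(cols[13], "%Y%m%d").date()
        except ValueError:
            problems.append("col 14: malformed date")
        else:
            if today - annotated > timedelta(days=365):
                problems.append("IEA annotation older than one year")
    return problems
```

Passing `today` explicitly keeps the IEA age check reproducible. Checks that need external data (the current ontology, db-xrefs.yaml, or the list of representative groups) are left out of this sketch.
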
In some contexts an identifier is represented using two fields, for example col1 (prefix) and col2 (local id) of a GAF or GPAD. The global id is formed by concatenating these with `:`. In other contexts, such as the "With/From" field, a global ID is specified, which MUST always be prefixed.
In all cases, the prefix MUST be in db-xrefs.yaml. The prefix SHOULD be identical (case-sensitive match) to the database field. If it does not match, it MUST be identical (case-sensitive) to one of the synonyms.
When consuming association files, programs SHOULD repair by replacing prefix synonyms with the canonical form, in addition to reporting on the mismatch. For example, as part of the association file release, legacy uses of 'UniProt' in submitted files should be replaced with 'UniProtKB'.
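
The prefix rules and the repair step can be sketched as below. This is a hedged illustration under stated assumptions: `DB_XREFS` is a hypothetical in-memory stand-in for entries loaded from db-xrefs.yaml (with `database` and `synonyms` keys assumed), and the function names `build_prefix_index`, `repair_prefix`, and `global_id` are invented for this example.

```python
# Hypothetical stand-in for entries loaded from db-xrefs.yaml.
# The entries are illustrative, not copied from the real file.
DB_XREFS = [
    {"database": "UniProtKB", "synonyms": ["UniProt"]},
    {"database": "MGI", "synonyms": []},
]

def build_prefix_index(entries):
    """Map every canonical prefix and synonym to its canonical prefix."""
    index = {}
    for entry in entries:
        canonical = entry["database"]
        index[canonical] = canonical
        for synonym in entry.get("synonyms", []):
            # A canonical name always wins over a clashing synonym.
            index.setdefault(synonym, canonical)
    return index

def repair_prefix(prefix, index):
    """Return (canonical_prefix, report); matching is case-sensitive.

    canonical_prefix is None when the prefix is unknown, and report is
    None when the prefix was already canonical.
    """
    canonical = index.get(prefix)
    if canonical is None:
        return None, f"unknown prefix {prefix!r}"
    if canonical != prefix:
        return canonical, f"replaced synonym {prefix!r} with {canonical!r}"
    return canonical, None

def global_id(prefix, local_id):
    """Join a prefix (col1) and a local id (col2) into a global id."""
    return f"{prefix}:{local_id}"
```

For example, `repair_prefix("UniProt", build_prefix_index(DB_XREFS))` yields the canonical prefix `UniProtKB` together with a message that a caller can log, so the repair and the mismatch report happen in one step.
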