Getting Started As a Developer
This repository is written in Python 3. It accesses neo4j directly, using Cypher's LOAD CSV and/or UNWIND as needed. It can run against a small "test" data set or the entire data load set, depending on which make command is chosen. Please see the README in this repo for how to remove the database, build the entire data set, or build a test set.
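As an illustration of that pattern, here is a minimal sketch of pushing a batch of rows into neo4j with UNWIND from Python. It assumes the official neo4j Python driver and a local bolt endpoint; the label, properties, and example identifiers are illustrative, not the repo's actual schema.

```python
from neo4j import GraphDatabase

# Connection details and node schema are assumptions for this example.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

gene_query = """
UNWIND $rows AS row
MERGE (g:Gene {primaryKey: row.primaryKey})
SET g.symbol = row.symbol
"""

rows = [
    {"primaryKey": "FB:FBgn0000490", "symbol": "dpp"},
    {"primaryKey": "ZFIN:ZDB-GENE-980526-166", "symbol": "fgf8a"},
]

with driver.session() as session:
    # UNWIND expands the list of maps into one row per element, so the
    # whole batch loads inside a single query.
    session.run(gene_query, rows=rows)

driver.close()
```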
The purpose of the agr_loader repository is to push data from the participating organizations into a data store that is regenerated completely on each build.
The src directory holds all the loader code. The agr_schemas repo is included as a submodule and is used to validate the loading files on load.
aggregate_loader.py has a name left over from the prototype, where it was the control for generating the ES index from the data files; now it controls only the load scripts. It executes three main routines, all found in aggregate_loader.py:

- create_indicies makes the indexes in the neo4j data store.
- load_from_ontologies loads the ontologies used in the rest of the data load.
- load_from_mods loads the MOD data from S3, including the GAF files from GO.
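As a hedged illustration of the first routine, a create_indicies-style method could look like the sketch below. The label/property pairs and connection details are assumptions for the example; the real index list lives in aggregate_loader.py.

```python
from neo4j import GraphDatabase

# Hypothetical label/property pairs; the repo's actual list may differ.
INDEXED_PROPERTIES = [
    ("Gene", "primaryKey"),
    ("GOTerm", "primaryKey"),
    ("Species", "primaryKey"),
]

def create_indicies(uri="bolt://localhost:7687", auth=("neo4j", "neo4j")):
    driver = GraphDatabase.driver(uri, auth=auth)
    with driver.session() as session:
        for label, prop in INDEXED_PROPERTIES:
            # Neo4j 4.x+ syntax; older servers used CREATE INDEX ON :Label(prop).
            session.run(
                f"CREATE INDEX IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"
            )
    driver.close()
```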
The MOD classes contain the control structures for executing the load; they are species-specific classes that inherit from MOD.py, the base MOD class. Start here when you want to add a new data load.
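A minimal sketch of that inheritance pattern, with hypothetical method names and fields (see MOD.py and the species classes in the repo for the real interface):

```python
class MOD:
    """Base class: shared control flow for one organization's data load."""

    species = None

    def load_genes(self):
        # Each species-specific subclass supplies its own implementation.
        raise NotImplementedError

class FlyBase(MOD):
    """Species-specific subclass: points the shared machinery at FB files."""

    species = "Drosophila melanogaster"

    def load_genes(self):
        # The file name below is purely illustrative.
        return {"path": "FB_gene_data.json", "species": self.species}
```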
A shared set of generic methods handles parsing the different kinds of input files.
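As a sketch of what such a generic parsing helper might look like (the function name and behavior are illustrative, not the repo's API):

```python
import gzip

def tsv_rows(path, comment_prefix="!"):
    """Yield each data line of a (possibly gzipped) tab-separated file as a
    list of fields, skipping comment lines such as the '!' headers in GAF
    files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(comment_prefix) or not line.strip():
                continue
            yield line.rstrip("\n").split("\t")
```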
The rest of the pipeline has three layers. The extractors, one per data source, pass their data in maps to the appropriate src/loaders loader. The src/loaders classes are just containers that initiate a transaction (via a Transaction object) and pass the data on to src/loaders/transactions. Each transaction class holds the neo4j query that loads the data directly from its Python map. The sketch below walks this flow end to end.
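A hedged, end-to-end sketch of the extractor -> loader -> transaction flow: class and method names here are illustrative, and the real classes live under src/extractors, src/loaders, and src/loaders/transactions.

```python
from neo4j import GraphDatabase

class GeneTransaction:
    """Holds the neo4j query and runs it against a list of maps."""

    query = """
    UNWIND $data AS row
    MERGE (g:Gene {primaryKey: row.primaryKey})
    SET g.symbol = row.symbol
    """

    def __init__(self, driver):
        self.driver = driver

    def execute(self, data):
        with self.driver.session() as session:
            session.run(self.query, data=data)

class GeneLoader:
    """Thin container: initiates a transaction and hands the data over."""

    def __init__(self, driver):
        self.driver = driver

    def load(self, data):
        GeneTransaction(self.driver).execute(data)

# An extractor would produce a list of maps like this from a source file:
data = [{"primaryKey": "MGI:97490", "symbol": "Pax6"}]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
GeneLoader(driver).load(data)
driver.close()
```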