Bento::Meta::Model - object bindings for Bento Metamodel Database
# empty model with name $handle:
$model = Bento::Meta::Model->new('Test');
# pull model from database - add bolt connection with Neo4j::Bolt
$model = Bento::Meta::Model->new('ICDC', Neo4j::Bolt->connect('bolt://localhost:7687'));
# connect model to db after creating
$model = Bento::Meta::Model->new('CTDC');
$model->set_bolt_cxn( Neo4j::Bolt->connect('bolt://localhost:7687') );
$model->get(); # pulls nodes, properties, relationships with model => 'CTDC'
# read a model from MDF YAML files:
use Bento::Meta::MDF;
$model = Bento::Meta::MDF->create_model(qw/icdc-model.yml icdc-model-props.yml/);
# connect it and push to db
$model->set_bolt_cxn( Neo4j::Bolt->connect('bolt://localhost:7687') );
$model->put(); # writes all to db
# build model from scratch: add, change, and remove entities
$model = Bento::Meta::Model->new('Test');
# create some nodes and add them
($case, $sample, $file) =
  map { Bento::Meta::Model::Node->new({handle => $_}) } qw/case sample file/;
$model->add_node($case);
$model->add_node($sample);
$model->add_node($file);
# create some relationships (edges) between nodes
$of_case = Bento::Meta::Model::Edge->new({
  handle => 'of_case',
  src => $sample,
  dst => $case });
$has_file = Bento::Meta::Model::Edge->new({
  handle => 'has_file',
  src => $sample,
  dst => $file });
$model->add_edge($of_case);
$model->add_edge($has_file);
# create some properties and add to nodes or to edges
$case_name = Bento::Meta::Model::Property->new({
  handle => 'name',
  value_domain => 'string' });
$workflow_type = Bento::Meta::Model::Property->new({
  handle => 'workflow_type',
  value_domain => 'value_set' });
$model->add_prop( $case => $case_name );
$model->add_prop( $has_file => $workflow_type );
# add some terms to a property with a value set (i.e., enum)
$model->add_terms( $workflow_type => qw/wdl cwl snakemake/ );
Bento::Meta::Model provides an object representation of a single property graph-based data model, as embodied in the structure of the Bento Metamodel Database (MDB). The MDB can store multiple such models in terms of their nodes, relationships, and properties. The MDB links these entities according to the structure of the individual models. For example, model nodes are represented as metamodel nodes of type "node", and model relationships as metamodel nodes of type "relationship" that themselves link to the relevant source and destination metamodel nodes, which represent the two end nodes in the model itself. Bento::Meta::Model can create, read, update, and link these entities together according to the MDB structure.
The MDB also provides entities for defining and maintaining terminology associated with the stored models. These include the terms themselves, their origins, and associated concepts. Each of these entities can be created, read, and updated using Bento::Meta::Model and the component objects.
The metamodel is a property graph, designed to store specific property graph models, in a database built for property graphs. The word "node" is therefore used in different contexts and can be confusing, especially since the Cancer Research Data Commons is also set up in terms of "nodes", which are central repositories of cancer data of different kinds. This and related documentation will attempt to distinguish these concepts as follows.
- A "graph node" is an instance of the node concept in the property graph model. It usually represents a category or item of interest in the real world, and has associated properties that distinguish it from other instances.
- A "model node" is a graph node within a specific data model, and represents groups of data items (properties) and can be related to other model nodes via model relationships.
- A "metamodel node" is a graph node that represents a model node, model relationship, or model property, in the metamodel database.
- A "Neo4j node" refers generically to the representation of a node in the Neo4j database engine.
- A "CRDC node" refers to a data commons repository that is part of the CRDC, such as the ICDC.
Bento::Meta creates a mapping between Neo4j nodes and Perl objects. Of course, the objects have data associated with them, accessed via setters and getters. These object-associated data are referred to exclusively as "attributes" in the documentation.
Thus, a Bento::...::Node object has an attribute props (properties), which is an (associative) array of Bento::...::Property objects. The props attribute is a representation of the has_property relationships between the metamodel node-type node and its metamodel property-type nodes. See below for more details.
Each model stored in the MDB has a simple name, or handle. The word "handle" is used throughout the metamodel to distinguish internal names (strings that are used within the system and downstream applications to refer to entities) from the external "terms" that are employed by users and standards. Handles can be understood as "local vocabulary". The handle is usually the name of the CRDC node that the model supports.
A Bento::Meta::Model object is meant to represent only one model. The Bento::Meta object can contain and retrieve a number of models.
Individual entities in the MDB (nodes, relationships, properties, value sets, terms, concepts, and origins) are represented by instances of corresponding Bento::Meta::Model::Entity subclasses:
-
("Edge" is easier to type than "relationship".)
Bento::Meta::Model methods generally accept these objects as arguments and/or return these objects. To obtain specific scalar information (for example, the handle string) of the object, use the relevant getter on the object itself:
# print the 'handle' for every property in the model
for ($model->props) {
say $_->handle;
}
A node in the graph database can possess two kinds of related data. In the (Neo4j) database, node "properties" are named items of scalar data. These belong directly to the individual nodes, and are referenced via the node. These map very naturally to scalar attributes of a model object. For example, "handle" is a metamodel node property, and it is accessed simply by the object attribute of the same name, $node->handle(). In the code, these are referred to as "property attributes" or "scalar-valued attributes".
The other kind of data related to a given node is present in other nodes that are linked to it via graph database relationships. In the MDB, for example, a model edge (e.g., "of_sample") is represented by its own graph node of type "Relationship", and the source and destination nodes for that edge are two graph nodes of type "Node", one of which is linked to the Relationship node with a graph relationship "has_src", and the other with a graph relationship "has_dst". (Refer to this diagram.)
In the object model, the source and destination nodes of an edge are also represented as object attributes: in this case, $edge->src and $edge->dst. This representation encapsulates the "has_src" and "has_dst" graph relationships, so that the programmer can ignore the metamodel structure and concentrate on the model structure. Note that the value of such an attribute is an object (or an array of objects). In the code, such attributes are referred to as "relationship", "object-valued" or "collection-valued" attributes.
Individual objects have their own interfaces, which are partially described in "METHODS" below. Essentially, the name of the attribute is the name of the getter, while "set_<name>" is the setter. Getter return types depend on whether the attribute is scalar, object, or collection-valued. Setter arguments have similar dependencies.
For an attribute "blarg":
                    getter                              setter
scalar-valued       blarg() returns scalar              set_blarg($scalar)
object-valued       blarg() returns object              set_blarg($obj)
collection-valued   blarg() returns array of objects    set_blarg(key => $obj)
                    blarg(key) returns object
A true array is returned by collection-valued getters, not an arrayref.
Collection-valued attributes are generally associative arrays. The key is the handle() of the subordinate object (or value() in the case of term objects).
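The accessor conventions above can be sketched with a small stand-alone class. This is only an illustration under stated assumptions: Toy::Entity and its internals are hypothetical and are not part of Bento::Meta; only the naming pattern (getter named after the attribute, "set_<name>" setter, keyed access for collections) is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical toy class (NOT part of Bento::Meta) that mimics the
# accessor conventions described above.
package Toy::Entity;

sub new {
    my ($class, %init) = @_;
    return bless { handle => $init{handle}, props => {} }, $class;
}

# scalar-valued attribute: getter returns the scalar, setter stores it
sub handle     { $_[0]{handle} }
sub set_handle { $_[0]{handle} = $_[1] }

# collection-valued attribute: with no argument, the getter returns a list
# of objects (sorted by key here, for determinism); with a key, one object
sub props {
    my ($self, $key) = @_;
    return $self->{props}{$key} if defined $key;
    return map { $self->{props}{$_} } sort keys %{ $self->{props} };
}

# collection-valued setter takes key => object
sub set_props {
    my ($self, $key, $obj) = @_;
    $self->{props}{$key} = $obj;
    return $obj;
}

package main;

my $sample = Toy::Entity->new(handle => 'sample');
my $type   = Toy::Entity->new(handle => 'sample_type');

# the key is the handle of the subordinate object
$sample->set_props($type->handle => $type);

my @all = $sample->props;                 # a true list, not an arrayref
my $one = $sample->props('sample_type');  # a single object, by key
print scalar(@all), " prop(s); keyed lookup gives ", $one->handle, "\n";
```

Keeping the collection in a hash keyed by handle is what makes both forms of the getter cheap: the keyed form is a direct lookup, and the list form is just the hash values.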
More details about objects can be found in Bento::Meta::Model::Entity.
The Model object is a direct container of nodes, edges (relationships), and properties. To get a simple list of all relevant entities in a model, use the model getters:
@nodes = $model->nodes();
To retrieve a specific entity, provide a key to the getter as the argument. The keys are laid out as follows:
Entity     Key                                                   Example
------     ---                                                   -------
Node       <node handle>                                         sample
Property   <node handle>:<property handle>                       sample:sample_type
Edge       <edge handle>:<src node handle>:<dst node handle>     of_sample:sample:case
For example:
$of_sample = $model->edges('of_sample:sample:case');
# get source and destination node objects from edge object itself
$sample = $of_sample->src;
$case = $of_sample->dst;
Note that the keys for edges are three strings separated by colons. These are 1) the edge handle ("type"), 2) the source node handle, and 3) the destination node handle. In the example above, this is "of_sample:sample:case". This is called a "triplet" in the code. An edge object can be queried for its triplet.
$edge->triplet
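As a rough illustration of the key layout, the following stand-alone sketch builds and splits such keys with plain string operations. The helper names (prop_key, edge_triplet, parse_triplet) are hypothetical and are not Bento::Meta functions; the sketch also assumes that handles themselves never contain a colon.

```perl
use strict;
use warnings;

# Hypothetical helpers (NOT part of Bento::Meta) illustrating the key layout.

# Property key: <node handle>:<property handle>
sub prop_key {
    my ($node_handle, $prop_handle) = @_;
    return join ':', $node_handle, $prop_handle;
}

# Edge key ("triplet"): <edge handle>:<src node handle>:<dst node handle>
sub edge_triplet {
    my ($edge_handle, $src_handle, $dst_handle) = @_;
    return join ':', $edge_handle, $src_handle, $dst_handle;
}

# Split a triplet back into its three parts.
# Assumption: handles contain no colons.
sub parse_triplet {
    my ($triplet) = @_;
    my @parts = split /:/, $triplet, -1;
    die "not a triplet: $triplet\n" unless @parts == 3;
    return @parts;
}

my $key = edge_triplet('of_sample', 'sample', 'case');
my ($type, $src, $dst) = parse_triplet($key);
print "$key => type=$type src=$src dst=$dst\n";
```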
The component objects are themselves containers of their own attributes, and their getters and setters are structured similarly. (In fact, Bento::Meta::Model is, like the component objects, a subclass of Bento::Meta::Model::Entity.) The difference is that keys for collection-valued attributes at the component object level are simpler. For example:
$prop1 = $model->props('sample:sample_type');
$prop2 = $sample->props('sample_type');
# $prop1 and $prop2 are the same object
The Model object does not provide access to Concept, ValueSet, or Origin objects directly. These are accessible via the linked objects themselves, according to the metamodel structure. For example:
# all terms for all nodes
for ($model->nodes) {
push @node_terms, $_->concept->terms;
}
The Model object has methods that allow the user to add, remove, and modify entities in the model. The Model object is an interface, in that it loosely encapsulates the MDB structure, tries to relieve the user from having to remember that structure, and guards against deviations from it.
The main methods are
- add_node()
- add_edge()
- add_prop()
- add_terms()
- rm_node()
- rm_edge()
- rm_prop()
- rm_terms() (coming soon)
Details are below in "$model object". The main idea is that these methods operate on either the relevant component object or on a hashref that specifies an object by its attributes. In the latter case, a new component object is created.
Here's a pattern for creating two nodes and an edge in a model:
$src_node = $model->add_node({ handle => 'sample' });
$dst_node = $model->add_node({ handle => 'case' });
$edge = $model->add_edge({ handle => 'of_case',
                           src => $src_node,
                           dst => $dst_node });
These new entities are registered in the model, and can be retrieved:
$case = $model->nodes('case'); # same obj as $dst_node
$of_case = $model->edges('of_case:sample:case'); # same obj as $edge
Removing entities from the model "deregisters" them, but does not destroy the object itself.
$case = $model->rm_node($case);
$other_model->add_node($case);
As in Neo4j, attempting to remove a node throws an exception if the node still participates in any relationships/edges. For the example above to work, the edge must be removed first:
$model->rm_edge($of_case);
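The removal rule can be sketched in isolation with a toy registry. This is not the Bento::Meta implementation: the %nodes and %edges hashes and the stand-alone rm_node and rm_edge functions below are hypothetical stand-ins that only mirror the behavior described here (removal deregisters, and a node that still has edges cannot be removed).

```perl
use strict;
use warnings;

# Hypothetical toy registry (NOT the Bento::Meta implementation).
my %nodes = map { $_ => { handle => $_ } } qw/sample case/;
my %edges = (
    'of_case:sample:case' =>
        { handle => 'of_case', src => 'sample', dst => 'case' },
);

# Deregister a node, refusing if any edge still uses it as src or dst
sub rm_node {
    my ($handle) = @_;
    my @blocking =
        grep { $edges{$_}{src} eq $handle || $edges{$_}{dst} eq $handle }
        sort keys %edges;
    die "node '$handle' participates in edge(s): @blocking\n" if @blocking;
    return delete $nodes{$handle};   # the node record itself survives
}

# Deregister an edge by its triplet key
sub rm_edge { return delete $edges{ $_[0] } }

# The first attempt fails because the edge still references 'case'
my $first = eval { rm_node('case') };
my $error = $@;

# After removing the edge, the node can be removed and reused elsewhere
rm_edge('of_case:sample:case');
my $case = rm_node('case');

print "first attempt failed: $error";
print "removed node is still usable: $case->{handle}\n";
```

Wrapping the call in eval, as above, is one way a caller could detect the failure and remove the blocking edges before retrying.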
One of the key uses of the MDB is for storing lists of acceptable values for properties that require them. In the MDB schema, a property is linked to a value set entity, and the value set aggregates the term entities. The model object tries to hide some of this structure. It will also create a set of Term objects from a list of strings as a shortcut.
$prop = $model->add_prop( $sample => { handle => 'sample_type',
                                       value_domain => 'value_set' });
# $prop has domain of 'value_set', so you can add terms to it
$value_set = $model->add_terms( $prop => qw/normal tumor/ );
@terms = $value_set->terms; # set of 2 term objects
@same_terms = $prop->terms; # prop object also has a shortcut
The approach to the back and forth between the object representation and the database attempts to be simple and robust. The pattern is a push/pull cycle to and from the database. The database instrumentation is also encapsulated from the rest of the object functionality, so that even if no database is specified or connected, all the object manipulations are available.
The Model methods are get() and put(). get() pulls the metamodel nodes for the model with handle $model->handle from the connected database. It will not disturb any modifications made to objects in the program, unless called with a true argument. In that case, get(1) (e.g.) will refresh all objects from current metamodel nodes in the database.
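The difference between get() and get(1) can be pictured with a small stand-alone sketch. It is only a model of the policy described above, not the library's code: the pull() function and the plain hashes standing in for program objects and database content are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of the pull policy (NOT the Bento::Meta implementation).
# $local and $db are hashrefs of handle => attribute hashref.
sub pull {
    my ($local, $db, $refresh) = @_;
    for my $handle (keys %$db) {
        # keep an existing local object unless a refresh is requested
        next if exists $local->{$handle} && !$refresh;
        $local->{$handle} = { %{ $db->{$handle} } };   # shallow copy
    }
    return $local;
}

my %db    = ( sample => { handle => 'sample', desc => 'from db' } );
my %local = ( sample => { handle => 'sample', desc => 'edited locally' } );

pull(\%local, \%db);        # like get(): the local edit is preserved
my $after_get = $local{sample}{desc};

pull(\%local, \%db, 1);     # like get(1): the local object is refreshed
my $after_refresh = $local{sample}{desc};

print "after get: $after_get; after get(1): $after_refresh\n";
```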
put() pushes the model objects, with any changes to attributes, to the database. It will build and execute the queries needed to convert, for example, collection attributes to multiple nodes and corresponding relationships. put() adds and removes relationships in the database as necessary, but will not fully delete nodes. To completely remove objects from the database, use rm() on the objects themselves:
$edge = $model->rm_edge($edge); # edge detached from nodes and removed
                                # from model
$model->put(); # metamodel node representing the edge is still present in db
               # but is detached from the source node and destination node
$edge->rm(); # metamodel node representing the edge is deleted from db
- new($handle)
- add_node($node_or_init)
- add_edge($edge_or_init)
- add_prop($node_or_edge, $prop_or_init)
- add_terms($prop, @terms_or_inits)
- rm_node($node)
- rm_edge($edge)
- rm_prop($prop)
- @nodes = $model->nodes()
- $node = $model->node($name)
- @props = $model->props()
- $prop = $model->prop($name)
- $edge = $model->edge($triplet)
- @edges = $model->edges_in($node)
- @edges = $model->edges_out($node)
- @edges = $model->edge_by_src()
- @edges = $model->edge_by_dst()
- @edges = $model->edge_by_type()
- get()
  Pull metamodel nodes from the database for the model (given by $model->handle). Refresh nodes (reset) by issuing $model->get(1).
- put()
  Push model changes back to the database. This operation will disconnect nodes (remove Neo4j relationships), but will not delete the nodes themselves.
- $node->name()
- $node->category()
- @props = $node->props()
- $prop = $node->props($name)
- @tags = $node->tags()
- $edge->type()
- $edge->name()
- $edge->is_required()
- $node = $edge->src()
- $node = $edge->dst()
- @props = $edge->props()
- $prop = $edge->props($name)
- @tags = $edge->tags()
- $prop->name()
- $prop->is_required()
- $value_type = $prop->type()
- @acceptable_values = $prop->values()
- @tags = $prop->tags()