Node Content Repository for PHP. This library assumes that you've already created and compiled your own pbj classes using Pbjc and are using the "gdbots:ncr:mixin:*" mixins from gdbots/schemas.
If your project uses Symfony, use the gdbots/ncr-bundle-php to simplify the integration.
A node or vertex is a noun/entity in your system: an article, tweet, video, person, place, order, product, etc. The edges are the relationships between those things, like "friends", "tagged to", "married to", "published by", etc.
This library doesn't provide you with a graph database implementation. It's concerned with persisting/retrieving nodes and edges. Graph traversal would still need to be provided by another library. It is recommended that data be replicated or projected out of the Ncr (or layered on top like GraphQL) into something suited for that purpose (e.g. Neo4j, Titan, ElasticSearch).
A NodeRef is a qualified identifier to a node/vertex. It is less verbose than a MessageRef as it is implied that node labels must be unique within a given vendor namespace and therefore can be represented in a more compact manner.
NodeRef Format: vendor:label:id
The "vendor:label" portion is a SchemaQName.
Examples:
acme:article:41e4532f-2f58-4b9d-afc8-e9c2cbcb4aba
twitter:tweet:789234931599835136
youtube:video:EG0wQRsXLi4
Nodes do not actually have a node_ref field, they have an _id field. The NodeRef is derived by taking the SchemaQName of the node's schema along with its _id. The NodeRef is an immutable value object which is used in various places without needing to actually have the node instance.
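To make the vendor:label:id format concrete, here is a minimal sketch of an immutable value object that parses and rebuilds such a string. SimpleNodeRef is a hypothetical stand-in written for this example, not the library's NodeRef class, and splitting into at most three parts (so a colon inside the id stays in the id) is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration only; NOT the library's NodeRef class.
final class SimpleNodeRef
{
    // Private properties with no setters keep the object immutable.
    private string $vendor;
    private string $label;
    private string $id;

    public function __construct(string $vendor, string $label, string $id)
    {
        $this->vendor = $vendor;
        $this->label = $label;
        $this->id = $id;
    }

    // Splits "vendor:label:id" into at most three parts so that any
    // colon inside the id stays part of the id (an assumption).
    public static function fromString(string $string): self
    {
        $parts = explode(':', $string, 3);
        if (count($parts) !== 3 || in_array('', $parts, true)) {
            throw new InvalidArgumentException("Invalid node ref [$string].");
        }

        return new self($parts[0], $parts[1], $parts[2]);
    }

    // The "vendor:label" portion.
    public function getQName(): string
    {
        return $this->vendor . ':' . $this->label;
    }

    public function getId(): string
    {
        return $this->id;
    }

    public function toString(): string
    {
        return $this->getQName() . ':' . $this->id;
    }
}

$ref = SimpleNodeRef::fromString('acme:article:41e4532f-2f58-4b9d-afc8-e9c2cbcb4aba');
echo $ref->getQName(), PHP_EOL; // the "vendor:label" portion
echo $ref->getId(), PHP_EOL;    // the node's _id
```

Because the object carries only strings, it can be passed around and compared without loading the node it points to.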
The Ncr is the service responsible for node persistence. It is intentionally limited to basic key/value storage operations (e.g. put/get/find by index) to ensure the underlying implementation can be swapped out with little effort or decorated easily (caching layers for example).
Available repository implementations:
- DynamoDb
- Psr6 (gives you Redis, File, Memcached, Doctrine, etc.)
- InMemory
Review the Gdbots\Ncr\Ncr interface for reference on the available methods.
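The point about keeping the interface small so it can be decorated (with a caching layer, for example) can be sketched as follows. TinyStore, ArrayStore and CachingStore are hypothetical names invented for this example; the real Gdbots\Ncr\Ncr interface has more methods and different signatures.

```php
<?php
declare(strict_types=1);

// Hypothetical, deliberately tiny interface; the real Gdbots\Ncr\Ncr
// interface has more methods and different signatures.
interface TinyStore
{
    public function get(string $nodeRef): ?array;
    public function put(string $nodeRef, array $node): void;
}

// Plain key/value storage that counts how often it is read.
final class ArrayStore implements TinyStore
{
    public int $reads = 0;
    private array $items = [];

    public function get(string $nodeRef): ?array
    {
        $this->reads++;
        return $this->items[$nodeRef] ?? null;
    }

    public function put(string $nodeRef, array $node): void
    {
        $this->items[$nodeRef] = $node;
    }
}

// Decorator: same interface, adds a per-process cache in front of $next.
final class CachingStore implements TinyStore
{
    private TinyStore $next;
    private array $cache = [];

    public function __construct(TinyStore $next)
    {
        $this->next = $next;
    }

    public function get(string $nodeRef): ?array
    {
        // array_key_exists also remembers misses (null results).
        if (!array_key_exists($nodeRef, $this->cache)) {
            $this->cache[$nodeRef] = $this->next->get($nodeRef);
        }

        return $this->cache[$nodeRef];
    }

    public function put(string $nodeRef, array $node): void
    {
        $this->next->put($nodeRef, $node);
        $this->cache[$nodeRef] = $node; // keep the cache in step with writes
    }
}

$inner = new ArrayStore();
$store = new CachingStore($inner);
$store->put('acme:article:123', ['_id' => '123', 'title' => 'Hello']);
$store->get('acme:article:123');
$store->get('acme:article:123');
echo $inner->reads, PHP_EOL; // number of reads that reached the inner store
```

Callers depend only on the interface, so swapping ArrayStore for another backend or stacking another decorator requires no changes on their side.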
The Ncr is a simple key/value store which means querying is limited to the id of the item or a secondary index. An example of a secondary index would be the email or username of a user, the slug of an article or the isbn of a book.
An IndexQuery is used to findNodeRefs that match a query against a secondary index.
An example of using an IndexQuery:
$query = IndexQueryBuilder::create(SchemaQName::fromString('acme:user'), 'email', 'homer@simpson.com')
    ->setCount(1)
    ->build();
$result = $this->ncr->findNodeRefs($query);
if (!$result->count()) {
    throw new NodeNotFound('Unable to find homer.');
}
$node = $this->ncr->getNode($result->getNodeRefs()[0]);
Not all storage engines can enforce uniqueness on a secondary index, so the interface cannot make that assumption either. Because of this, findNodeRefs may return more than one NodeRef. It is up to your application logic to deal with that.
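The non-uniqueness caveat above can be illustrated with a small in-memory index. TinyIndex is a hypothetical class made up for this sketch and is unrelated to the library's IndexQuery classes; the "take the first match" policy is only one possible application-level choice.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory secondary index, for illustration only.
// Maps "qname|index|value" to a list of node ref strings.
final class TinyIndex
{
    private array $entries = [];

    public function add(string $qname, string $index, string $value, string $nodeRef): void
    {
        $this->entries["$qname|$index|$value"][] = $nodeRef;
    }

    // Returns every match up to $count; nothing here enforces uniqueness.
    public function find(string $qname, string $index, string $value, int $count = 25): array
    {
        $key = "$qname|$index|$value";
        return array_slice($this->entries[$key] ?? [], 0, $count);
    }
}

$index = new TinyIndex();
$index->add('acme:user', 'email', 'homer@simpson.com', 'acme:user:1');
$index->add('acme:user', 'email', 'homer@simpson.com', 'acme:user:2'); // duplicate email

$refs = $index->find('acme:user', 'email', 'homer@simpson.com');
if (count($refs) > 1) {
    // Application-level policy: here we just report it and take the first.
    echo 'Duplicate index value, using first match.', PHP_EOL;
}
$nodeRef = $refs[0] ?? null;
```

Other reasonable policies include rejecting the request or picking the most recently updated node; which one fits depends on the application.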
Getting data out of the Ncr should be dead simple; it's just json after all. Use the pipeNodes or pipeNodeRefs methods to export data. The gdbots/ncr-bundle-php provides console commands that make use of this to export and reindex nodes.
Exporting nodes using pipeNodes:
foreach ($ncr->pipeNodes(SchemaQName::fromString('acme:article')) as $node) {
    echo json_encode($node) . PHP_EOL;
}
NcrCache is a first level cache which is ONLY seen and used by the current request. It is used to cache all nodes returned from get node request(s). This cache is used during Pbjx request processing or if the Ncr is running in the current process and is using the MemoizingNcr.
This cache should not be used when asking for a consistent result.
NcrCache is NOT an identity map and the Ncr is NOT an ORM. In some cases you may get the same exact object but it's not a guarantee so don't do something like this:
$nodeRef = NodeRef::fromString('acme:article:123');
$cache->getNode($nodeRef) !== $cache->getNode($nodeRef);
If you need to check equality, use the message interface:
$node1 = $cache->getNode($nodeRef);
$node2 = $cache->getNode($nodeRef);
$node1->equals($node2); // returns true if their data is the same
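The difference between instance identity and data equality can be seen in isolation with a small sketch. TinyNode is a hypothetical class written for this example; the library's messages provide their own equals method.

```php
<?php
declare(strict_types=1);

// Hypothetical value-style object; the library's messages expose their own equals().
final class TinyNode
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Compares the carried data, not the object instance.
    public function equals(TinyNode $other): bool
    {
        return $this->data === $other->data;
    }
}

$a = new TinyNode(['_id' => '123', 'title' => 'Hello']);
$b = new TinyNode(['_id' => '123', 'title' => 'Hello']);

var_dump($a === $b);      // identity: are these the same instance?
var_dump($a->equals($b)); // value: do they carry the same data?
```

Code that relies on === only works when a cache happens to return the same instance, which is exactly the guarantee NcrCache does not make.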
NcrCache and other request interceptors make use of the lazy loader service to batch load nodes only if they are requested. For example, when loading an article you may want to fetch the author or related items, but not always. Rather than force that logic to exist in the loading of an article, something else can manage it.
Lazy loading is generally application specific, so this library provides some tools to make it easier.
Example lazy loading:
public function onSearchNodesResponse(ResponseCreatedEvent $pbjxEvent): void
{
    $response = $pbjxEvent->getResponse();
    if (!$response->has('nodes')) {
        return;
    }

    // for all nodes in this search response, mark the creator
    // and updater for lazy load. if they get requested at some point
    // in the current request, it will be batched for optimal performance
    $this->lazyLoader->addEmbeddedNodeRefs($response->get('nodes'), [
        'creator_ref' => 'acme:user',
        'updater_ref' => 'acme:user',
    ]);
}
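The idea of marking refs now and batching the fetch later can be sketched independently of the library. TinyLazyLoader, its add and get methods, and the fetch callback are all hypothetical names invented for this example; they only illustrate the deferral pattern and do not reflect how the library's lazy loader is implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of deferred batch loading; not the library's lazy loader.
final class TinyLazyLoader
{
    /** @var callable takes a list of node ref strings, returns [ref => node] */
    private $batchFetch;
    private array $pending = [];
    private array $loaded = [];
    public int $batches = 0;

    public function __construct(callable $batchFetch)
    {
        $this->batchFetch = $batchFetch;
    }

    // Marks refs as "might be needed"; performs no I/O.
    public function add(array $nodeRefs): void
    {
        foreach ($nodeRefs as $ref) {
            if (!isset($this->loaded[$ref])) {
                $this->pending[$ref] = true;
            }
        }
    }

    // On the first request for a pending ref, fetches ALL pending refs at once.
    public function get(string $nodeRef): ?array
    {
        if (isset($this->pending[$nodeRef])) {
            $this->batches++;
            $this->loaded += ($this->batchFetch)(array_keys($this->pending));
            $this->pending = [];
        }

        return $this->loaded[$nodeRef] ?? null;
    }
}

$users = [
    'acme:user:1' => ['name' => 'Homer'],
    'acme:user:2' => ['name' => 'Marge'],
];
$loader = new TinyLazyLoader(
    fn(array $refs) => array_intersect_key($users, array_flip($refs))
);

$loader->add(['acme:user:1', 'acme:user:2']); // nothing fetched yet
$homer = $loader->get('acme:user:1');          // one batch fetches both
$marge = $loader->get('acme:user:2');          // served from the batch result
```

If no one ever asks for a marked ref, the fetch callback is never invoked, which is the cost saving lazy loading is after.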
The Ncr provides the reliable storage and retrieval of Nodes. NcrSearch is in most cases a separate storage provider. For example, DynamoDb for Ncr and ElasticSearch for NcrSearch. In fact, the only implementation we have right now is ElasticSearch.
When using the gdbots/ncr-bundle-php you can enable the indexing with a simple configuration option. The bundle also provides a reindex console command.
Searching nodes is generally done in a request handler. Here is an example of searching nodes:
public function handleRequest(Message $request, Pbjx $pbjx): Message
{
    $parsedQuery = ParsedQuery::fromArray(json_decode($request->get('parsed_query_json', '{}'), true));
    $response = SearchUsersResponseV1::create();

    $this->ncrSearch->searchNodes(
        $request,
        $parsedQuery,
        $response,
        [SchemaQName::fromString('acme:user')]
    );

    return $response;
}