Managing Metadata to achieve:
- Versioning - annotate with version information
- Lineage - understanding data dependencies
- Quality - enrich data with dependability information
- Democratization - foster self-service culture
Modeling of the data transformation layer to foster:
- Discovery - of data assets across heterogeneous communities
- Reuse - of domain-specific knowledge and pre-computed features in different use cases
- Enforcement - of established industry or domain-specific transformations and practices
- Interaction - between different roles through teams and projects
Mastro is still under development and largely untested. Please feel free to fork the repo and extend it as you wish.
Terminology:
- Connector - component handling the connection to volumes and databases
- FeatureStore - service to manage features (i.e., featureSets)
- MetricStore - service to manage metrics (i.e., metricSets)
- EmbeddingStore - service to manage vector embeddings
- Catalogue - service to manage data assets (i.e., static data definitions and their relationships)
- Crawler - any agent able to list and walk a file system, filter and parse asset definitions (i.e., manifest files), and push them to the catalogue
- UI - user interface to search assets by name, tags or description
- MVC - data versioning tool for various storage - based on the commons.abstract.sources package
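
The Crawler entry above describes a walk, filter, parse, push loop. The sketch below illustrates that loop only and is not Mastro's implementation: the manifest file name (`manifest.json`), the JSON format, the required `name` field, and the `push` callback standing in for a catalogue client are all assumptions made for this example.

```python
import json
import os
from typing import Callable, Iterator

# Hypothetical file name chosen for this sketch, not Mastro's actual convention.
MANIFEST_NAME = "manifest.json"


def find_manifests(root: str, manifest_name: str = MANIFEST_NAME) -> Iterator[str]:
    """List and walk the file system, keeping only files named like a manifest."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename == manifest_name:
                yield os.path.join(dirpath, filename)


def parse_manifest(path: str) -> dict:
    """Parse one asset definition (JSON and the 'name' field are assumptions)."""
    with open(path, encoding="utf-8") as handle:
        asset = json.load(handle)
    if not isinstance(asset, dict) or "name" not in asset:
        raise ValueError(f"{path}: asset definition needs a 'name' field")
    return asset


def crawl(root: str, push: Callable[[dict], None]) -> int:
    """Walk root, parse each manifest, hand it to push; return how many were pushed."""
    pushed = 0
    for path in find_manifests(root):
        try:
            asset = parse_manifest(path)
        except ValueError as err:
            # json.JSONDecodeError is a ValueError subclass, so malformed
            # files are skipped here as well.
            print(f"skipping {path}: {err}")
            continue
        push(asset)  # stand-in for a call to the catalogue service
        pushed += 1
    return pushed
```

Calling `crawl("/some/root", assets.append)` with an empty list `assets` collects the parsed definitions locally; a real crawler would replace the callback with a request to the catalogue.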
Help: