This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

data-mill-cloud/mastro

Mastro

Metadata management in Go

Data What?

[Figure: ML Process]

Goals

Managing Metadata to achieve:

  • Versioning - annotate with version information
  • Lineage - understanding data dependencies
  • Quality - enrich data with dependability information
  • Democratization - foster self-service culture

Modeling of the data transformation layer to foster:

  • Discovery - of data assets across heterogeneous communities
  • Reuse - of domain specific knowledge and pre-computed features in different use cases
  • Enforcing - of established industry or domain-specific transformations and practices
  • Interaction - between different roles through teams and projects

Disclaimer

Mastro is still under development and largely untested. Feel free to fork the repo and extend it as you wish.

TL;DR

Terminology:

  • Connector - component handling the connection to volumes and databases
  • FeatureStore - service to manage features (i.e., featureSets)
  • MetricStore - service to manage metrics (i.e., metricSets)
  • EmbeddingStore - service to manage vector embeddings
  • Catalogue - service to manage data assets (i.e., static data definitions and their relationships)
  • Crawler - any agent able to list and walk a file system, filter and parse asset definitions (i.e., manifest files), and push them to the catalogue
  • UI - user interface to search assets by name, tags, or description
  • MVC - data versioning tool for various storage backends, based on the commons.abstract.sources package
