An easy way to protect your data sources with Python by searching their metadata. 🛡️

fvaleye/metadata-guardian

📌 Overview

Metadata Guardian is a Python package that provides an easy way to protect your data sources by searching their metadata. You describe what you want to protect as data rules, and the package reports the metadata that matches them. The multi-regex matching is implemented in Rust, which keeps scans fast.

Read more in this article.

📦 Where to get it

# Install all the data sources
pip install 'metadata_guardian[all]'
# Install one or more data sources from the list
pip install 'metadata_guardian[snowflake,avro,aws,gcp,deltalake,kafka_schema_registry,mysql]'

📜 Data Rules

The available data rule categories are PII and INCLUSION. You can also define your own custom data rules to suit your needs.
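To make the idea of a data rule concrete, here is a minimal sketch of a named regex being applied to column names. It uses only Python's standard `re` module; `SketchRule` and `scan_columns` are hypothetical names invented for this illustration and are not part of the package, whose actual matching is done in Rust.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRule:
    """Hypothetical stand-in for a data rule: a named regex plus a description."""

    rule_name: str
    regex_pattern: str
    documentation: str


def scan_columns(column_names, rules):
    """Return {column_name: [names of matching rules]} for columns hit by any rule."""
    compiled = [
        (rule.rule_name, re.compile(rule.regex_pattern, re.IGNORECASE))
        for rule in rules
    ]
    hits = {}
    for column in column_names:
        matched = [name for name, pattern in compiled if pattern.search(column)]
        if matched:
            hits[column] = matched
    return hits


# Illustrative rules and column names, not taken from the PII category
rules = [
    SketchRule("email", r"e[-_]?mail", "Column may hold email addresses."),
    SketchRule("phone", r"phone|mobile", "Column may hold phone numbers."),
]
columns = ["id", "user_email", "mobile_phone", "created_at"]
print(scan_columns(columns, rules))
```

Only metadata (here, column names) is inspected, never the rows themselves, which is why this kind of scan can run against a catalog or schema without reading the data.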

📊 Data Sources

Local

  • Parquet

  • ORC

  • AVRO

  • AVRO Schema

  • Arrow

External

  • AWS: Athena and Glue

  • Deltalake

  • GCP: BigQuery

  • Snowflake

  • MySQL

  • Kafka Schema Registry

🔎 Usage

With available Data Rules:

from metadata_guardian import (
    AvailableCategory,
    ColumnScanner,
    DataRules,
)
from metadata_guardian.source import MySQLSource

source = MySQLSource(
    user="root",
    password="12345678",
    host="localhost",
)

data_rules = DataRules.from_available_category(category=AvailableCategory.PII)
column_scanner = ColumnScanner(data_rules=data_rules)

with source:
    report = column_scanner.scan_external(
        source,
        database_name="sequelmovie",
        include_comment=True,
    )
    report.to_console()

With custom Data Rules:

from metadata_guardian import (
    ColumnScanner,
    DataRule,
    DataRules,
)
from metadata_guardian.source import MySQLSource

source = MySQLSource(
    user="root",
    password="12345678",
    host="localhost",
)

category = "example"
data_rule = DataRule(
    rule_name="example_rule_name",
    # Raw string: \b must reach the regex engine as a word boundary, not a backspace
    regex_pattern=r"\b(test|example)\b",
    documentation="example_test",
)
data_rules = DataRules.from_new_category(category=category, data_rules=[data_rule])
column_scanner = ColumnScanner(
    data_rules=data_rules, progression_bar_disabled=False
)

with source:
    report = column_scanner.scan_external(
        source,
        database_name="sequelmovie",
        include_comment=True,
    )
    report.to_console()
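A note on writing patterns such as `\b(test|example)\b`: Python resolves escape sequences in ordinary string literals before the pattern is handed to any regex engine, so `"\b"` becomes a backspace character while `r"\b"` keeps the backslash. The sketch below shows the difference using the standard `re` module as a stand-in; the package's own engine is Rust-based, but the string is built in Python either way.

```python
import re

plain = "\b(test|example)\b"   # "\b" is a backspace character (0x08) here
raw = r"\b(test|example)\b"    # r"\b" stays a backslash followed by "b"

column_name = "example"

# The plain literal looks for literal backspace characters around the word
print(re.search(plain, column_name))            # None

# The raw literal matches the word between word boundaries
print(re.search(raw, column_name) is not None)  # True
```

Using raw strings for every regex pattern avoids this class of silent mismatch.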

πŸ›‘οΈ Licence

📚 Documentation

The documentation is hosted here: https://fvaleye.github.io/metadata-guardian/python/