Data Quality Profiler and Rules Engine

Provides the following:

Data profilers for large-volume data profiling in Spark
Assertion rule definitions and checking
Reference data loading and joining
Excel and CSV reference data parsing
JSON output enriched with data quality markers/profilers
Metrics and summary dataframe output
Dimensional tagging of profiler outputs (additional identifiers)
JSON flattener
JSON and CSV loader, extensible to other formats
Custom key pre-processor and custom parquet row reader functionality
Comprehensive, extensible built-in assertion rule modules
Built-in set of field-level profile masks
Compound assertion rule definition (i.e. a set of sub-rules must all pass)
Human-readable Data Quality and Assertion Rule Compliance report output
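
To illustrate what a field-level profile mask can look like, here is a minimal sketch in plain Scala. It assumes one common form of mask profiling, in which each character is replaced by a symbol for its character class; the object and method names (`MaskSketch`, `mask`, `profile`) are hypothetical and are not taken from this repository, whose built-in masks may differ.

```scala
// Hypothetical sketch: these names are illustrative, not the library's API.
object MaskSketch {
  // Map a character to its class symbol:
  // 'A' for upper case, 'a' for lower case, '9' for digits.
  // Any other character (spaces, punctuation) is kept as-is.
  private def maskChar(c: Char): Char =
    if (c.isUpper) 'A'
    else if (c.isLower) 'a'
    else if (c.isDigit) '9'
    else c

  // Mask a whole field value character by character.
  def mask(value: String): String = value.map(maskChar)

  // Count how often each mask occurs across a column of values.
  def profile(values: Seq[String]): Map[String, Int] =
    values.groupBy(mask).map { case (m, vs) => m -> vs.size }
}
```

Under these assumptions, `MaskSketch.mask("SW1A 2AA")` yields `"AA9A 9AA"`. Grouping a column by mask makes rare shapes stand out, which is often the first sign of a malformed value.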
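
The compound assertion rule idea (a set of sub-rules that must all pass) can be sketched as follows. This is an illustrative model only: the `Rule` and `CompoundRule` types, the record representation as a `Map[String, String]`, and the example rules are all assumptions made for the sketch, not definitions from this repository.

```scala
// Hypothetical sketch: these types are illustrative, not the library's API.
object CompoundRuleSketch {
  // A record is modelled as field name -> raw string value (an assumption).
  type Record = Map[String, String]

  // A single named check over a record.
  final case class Rule(name: String, check: Record => Boolean)

  // A compound rule passes only when every sub-rule passes.
  final case class CompoundRule(name: String, subRules: Seq[Rule]) {
    // Names of the sub-rules that fail, in declaration order.
    def failures(record: Record): Seq[String] =
      subRules.filterNot(_.check(record)).map(_.name)

    def passes(record: Record): Boolean = failures(record).isEmpty
  }

  // Example sub-rules, invented for illustration.
  val idNotBlank: Rule =
    Rule("id-not-blank", r => r.get("id").exists(_.trim.nonEmpty))
  val ageNumeric: Rule =
    Rule("age-numeric", r => r.get("age").exists(a => a.nonEmpty && a.forall(_.isDigit)))

  val validPerson: CompoundRule =
    CompoundRule("valid-person", Seq(idNotBlank, ageNumeric))
}
```

Returning the failing sub-rule names, rather than a bare boolean, keeps enough detail to feed a compliance report. Note that in this sketch a compound rule with no sub-rules passes vacuously.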

Repository Layout

Data Quality Profiler and Rules Engine code: data-quality-profiler
Examples and Usage: examples

Licence

Licensed under the MIT License. See the LICENCE file.
