The Data Quality Profiler and Rules Engine provides the following:
- Data profilers for large-volume data profiling in Spark
- Assertion rule definitions and checking
- Reference data loading and joining
- Excel and CSV reference data parsing
- JSON output enriched with data quality markers/profilers
- Metrics and summary dataframe output
- Dimensional tagging of profiler outputs (additional identifiers)
- JSON flattener
- JSON and CSV loader, extensible to other formats
- Custom key pre-processor and custom parquet row reader functionality
- Comprehensive, extensible set of built-in assertion rule modules
- Built-in set of field-level profile masks
- Compound assertion rule definition (i.e., a set of sub-rules that must all pass)
- Human-readable Data Quality and Assertion Rule Compliance report output
- Data Quality Profiler and Rules Engine code: data-quality-profiler
- Examples and Usage: examples
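
To make two of the features listed above more concrete (field-level profile masks and compound assertion rules), here is a minimal, library-independent sketch in plain Python. It is not the project's API: the function names (`high_grain_mask`, `low_grain_mask`, `all_of`) and the mask convention (`A` for uppercase, `a` for lowercase, `9` for digits, with runs collapsed for the coarser mask) are assumptions chosen for illustration only.

```python
import re
from typing import Callable

# Hypothetical helpers for illustration; not part of data-quality-profiler.

def high_grain_mask(value: str) -> str:
    """Map uppercase letters to 'A', lowercase letters to 'a', digits to '9'.

    Every other character (spaces, punctuation) is kept unchanged.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


def low_grain_mask(value: str) -> str:
    """Collapse repeated mask characters, e.g. 'AA9A 9AA' becomes 'A9A 9A'."""
    return re.sub(r"([Aa9])\1+", r"\1", high_grain_mask(value))


# An assertion rule here is simply a predicate over a field value.
Rule = Callable[[str], bool]


def all_of(*rules: Rule) -> Rule:
    """Build a compound rule that passes only if every sub-rule passes."""
    return lambda value: all(rule(value) for rule in rules)


def not_empty(value: str) -> bool:
    return len(value.strip()) > 0


def has_postcode_shape(value: str) -> bool:
    # Assumed set of acceptable low-grain masks for this example.
    return low_grain_mask(value) in {"A9 9A", "A9A 9A"}


postcode_rule = all_of(not_empty, has_postcode_shape)

print(high_grain_mask("SW1A 1AA"))  # AA9A 9AA
print(low_grain_mask("SW1A 1AA"))   # A9A 9A
print(postcode_rule("M1 1AE"))      # True
print(postcode_rule("12345"))       # False
```

Masks of this kind reduce each value to its shape, so a frequency count of masks per field quickly surfaces unexpected formats. A compound rule then lets several simple checks be reported as a single pass or fail.
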
Licensed under the MIT License. See LICENSE.