Install Great Expectations and initialize a Data Context.
$ pip install great_expectations
$ great_expectations --version
$ great_expectations init
After running the init command, your great_expectations directory will contain all of the important components of a local Great Expectations deployment. Some explanations for the files that have been created:
- great_expectations.yml: contains the main configuration of your deployment.
- expectations/: will contain all the Expectations as JSON files (the location is configurable).
- plugins/: contains the code for custom plugins you develop as part of your deployment.
- uncommitted/: contains files that shouldn’t be in version control. It has a .gitignore configured to exclude all its contents from version control.
The main contents of uncommitted/ are:
- uncommitted/config_variables.yml: contains sensitive information, such as database credentials and other secrets.
- uncommitted/data_docs: contains Data Docs generated from Expectations, Validation Results, and other metadata.
- uncommitted/validations: contains Validation Results generated by Great Expectations.
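The layout described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library, not part of Great Expectations itself; the helper name `missing_entries` is hypothetical, and the script assumes it is run from the folder in which `great_expectations init` was executed. It deliberately leaves out uncommitted/data_docs and uncommitted/validations, since those may only be populated later.

```python
from pathlib import Path

# Entries named in the explanation above, relative to the great_expectations/ directory.
EXPECTED_ENTRIES = [
    "great_expectations.yml",
    "expectations",
    "plugins",
    "uncommitted",
    "uncommitted/config_variables.yml",
]


def missing_entries(project_dir):
    """Return the expected entries that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Assumption: the current working directory is where `great_expectations init` was run.
    missing = missing_entries("great_expectations")
    print("missing:", missing if missing else "nothing")
```

An empty result means every listed file and folder is in place.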
- Data Context: The folder structure that contains the entirety of your Great Expectations project. It is also the entry point for accessing all the primary methods for creating elements of your project, configuring those elements, and working with the metadata for your project.
- CLI: The Command Line Interface for Great Expectations. The CLI provides helpful utilities for deploying and configuring Data Contexts, as well as a few other convenience methods.
Create and configure a Datasource.
$ great_expectations datasource new
When prompted, choose filesystem and pandas. The root path is data.
- Datasource: An object that brings together a way of interacting with data (an Execution Engine) and a way of accessing that data (a Data Connector). Datasources are used to obtain Batches for Validators, Expectation Suites, and Profilers.
- Jupyter Notebooks: These notebooks are launched by some processes in the CLI. They provide useful boilerplate code for everything from configuring a new Datasource to building an Expectation Suite to running a Checkpoint.
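To make the pairing of Execution Engine and Data Connector concrete, the sketch below writes out, as a plain Python dictionary, roughly the kind of configuration the Datasource notebook produces for a filesystem and pandas setup. The class names are the ones used by the v3 (Batch Request) API; the Datasource name `bike_theft_datasource`, the connector key, and the `../data` base directory are assumptions for illustration and should be checked against the notebook the CLI actually opens.

```python
# Hypothetical Datasource name; the CLI notebook lets you pick your own.
datasource_config = {
    "name": "bike_theft_datasource",
    "class_name": "Datasource",
    # The way of interacting with data: pandas, as chosen in the CLI prompt.
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    # The way of accessing data: files found under the root path.
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetFilesystemDataConnector",
            # Assumption: the data folder sits next to the great_expectations/ directory.
            "base_directory": "../data",
            "default_regex": {"group_names": ["data_asset_name"], "pattern": "(.*)"},
        }
    },
}


def describe(config):
    """Summarize which engine and connectors a Datasource config combines."""
    engine = config["execution_engine"]["class_name"]
    connectors = sorted(config["data_connectors"])
    return f"{config['name']}: engine={engine}, connectors={connectors}"


print(describe(datasource_config))
```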
Use the automatic Profiler to build an Expectation Suite.
$ great_expectations suite new
$ great_expectations suite edit bike_theft_berlin.demo
1. Let Great Expectations create a simple first draft suite by running great_expectations suite new.
2. View the suite in Data Docs.
3. Edit the suite in a Jupyter notebook by running great_expectations suite edit.
4. Repeat Steps 2-3 until you are happy with your suite.
5. Commit this suite to your source control repository.
- Expectation Suite: A collection of Expectations.
- Expectation: A verifiable assertion about data. Great Expectations is a framework for defining Expectations and running them against your data. In the official tutorial's example, we asserted that NYC taxi rides should have a minimum of one passenger. When we ran that Expectation against our second set of data, Great Expectations reported that some records in the new data indicated a ride with zero passengers, which failed to meet this Expectation.
- Profiler: A tool that automatically generates Expectations from a Batch of data.
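The idea of an Expectation as a verifiable assertion can be illustrated without the library. The function below is a toy stand-in written in plain Python, not the Great Expectations API; the function name, the result keys, and the two small batches are invented for illustration and mirror the taxi example from the definition above.

```python
def expect_values_at_least(rows, column, min_value):
    """Toy Expectation: every value in `column` must be >= min_value."""
    unexpected = [row[column] for row in rows if row[column] < min_value]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Made-up batches: the second contains a ride with zero passengers.
first_batch = [{"passenger_count": 1}, {"passenger_count": 3}]
second_batch = [{"passenger_count": 2}, {"passenger_count": 0}]

print(expect_values_at_least(first_batch, "passenger_count", 1)["success"])   # True
print(expect_values_at_least(second_batch, "passenger_count", 1)["success"])  # False
```

In Great Expectations, the same assertion would be stated declaratively in the suite and evaluated by the library rather than by hand-written code.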
Create a Checkpoint, which can be used to validate new data. The Validation Results can be viewed in Data Docs.
$ great_expectations checkpoint new bike_theft_checkpoint
- Checkpoint: An object that uses a Validator to run an Expectation Suite against a batch of data. Running a Checkpoint produces Validation Results for the data it was run on.
- Validation Results: A report generated from an Expectation Suite being run against a batch of data. The Validation Result itself is in JSON and is rendered as Data Docs.
- Data Docs: Human-readable documentation that describes Expectations for data and its Validation Results. Data Docs can be generated both from Expectation Suites (describing our Expectations for the data) and from Validation Results (describing whether the data meets those Expectations).
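Because a Validation Result is stored as JSON, it can also be inspected outside Data Docs. The sketch below summarizes a hand-written sample in a simplified shape; the sample is an assumption for illustration, not real output, and actual files under uncommitted/validations carry more fields. The helper name `summarize` is hypothetical.

```python
import json

# Hand-written sample in a simplified shape (an assumption, not real output).
SAMPLE = """
{
  "success": false,
  "results": [
    {"expectation_config": {"expectation_type": "expect_column_values_to_not_be_null"},
     "success": true},
    {"expectation_config": {"expectation_type": "expect_column_values_to_be_between"},
     "success": false}
  ]
}
"""


def summarize(validation_json):
    """Return overall success, the number of results, and the failed expectation types."""
    result = json.loads(validation_json)
    failed = [
        r["expectation_config"]["expectation_type"]
        for r in result["results"]
        if not r["success"]
    ]
    return {
        "success": result["success"],
        "evaluated": len(result["results"]),
        "failed": failed,
    }


print(summarize(SAMPLE))
```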
$ great_expectations --help
Usage: great_expectations [OPTIONS] COMMAND [ARGS]...

  Welcome to the great_expectations CLI!

  Most commands follow this format: great_expectations <NOUN> <VERB>

  The nouns are: checkpoint, datasource, docs, init, project, store, suite,
  validation-operator. Most nouns accept the following verbs: new, list, edit

Options:
  --version                Show the version and exit.
  --v3-api / --v2-api      Default to v3 (Batch Request) API. Use --v2-api for
                           v2 (Batch Kwargs) API
  -v, --verbose            Set great_expectations to use verbose output.
  -c, --config TEXT        Path to great_expectations configuration file
                           location (great_expectations.yml). Inferred if not
                           provided.
  -y, --assume-yes, --yes  Assume "yes" for all prompts.
  --help                   Show this message and exit.

Commands:
  checkpoint  Checkpoint operations
  datasource  Datasource operations
  docs        Data Docs operations
  init        Initialize a new Great Expectations project.
  project     Project operations
  store       Store operations
  suite       Expectation Suite operations