[DOCS] CLI Clean-up (#7904)
Co-authored-by: Rob Gray <104205257+kwcanuck@users.noreply.github.com>
donaldheppner and kwcanuck authored May 18, 2023
1 parent c3629d7 commit f43c148
Showing 18 changed files with 140 additions and 210 deletions.
@@ -3,9 +3,6 @@ title: How to host and share Data Docs on a filesystem
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

This guide will explain how to host and share <TechnicalTag relative="../../../" tag="data_docs" text="Data Docs" /> on a filesystem.

@@ -36,20 +33,9 @@ data_docs_sites:
### 2. Test that your configuration is correct by building the site
Use the following <TechnicalTag relative="../../../" tag="cli" text="CLI" /> command: ``great_expectations docs build --site-name local_site``. If successful, the CLI will open your newly built Data Docs site and provide the path to the index page.
```bash
> great_expectations docs build --site-name local_site

The following Data Docs sites will be built:

- local_site: file:///great_expectations/uncommitted/data_docs/local_site/index.html

Would you like to proceed? [Y/n]: Y

Building Data Docs...
Run the following Python code to build and open your Data Docs:
Done building Data Docs
``` python name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs"
```

## Additional notes
@@ -9,9 +9,6 @@ import ApplyThePolicy from './components_how_to_host_and_share_data_docs_on_amaz
import AddANewS3SiteToTheDataDocsSitesSectionOfYourGreatExpectationsYml from './components_how_to_host_and_share_data_docs_on_amazon_s3/_add_a_new_s3_site_to_the_data_docs_sites_section_of_your_great_expectationsyml.mdx'
import TestThatYourConfigurationIsCorrectByBuildingTheSite from './components_how_to_host_and_share_data_docs_on_amazon_s3/_test_that_your_configuration_is_correct_by_building_the_site.mdx'
import AdditionalNotes from './components_how_to_host_and_share_data_docs_on_amazon_s3/_additional_notes.mdx'
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

<Preface />

@@ -3,9 +3,6 @@ title: How to host and share Data Docs on Azure Blob Storage
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

This guide will explain how to host and share <TechnicalTag relative="../../../" tag="data_docs" text="Data Docs" /> on Azure Blob Storage.
Data Docs will be served using an Azure Blob Storage static website with restricted access.
@@ -55,7 +52,7 @@ data_docs_sites:
base_directory: uncommitted/data_docs/local_site/
site_index_builder:
class_name: DefaultSiteIndexBuilder
az_site: # this is a user-selected name - you may select your own
new_site_name: # this is a user-selected name - you can select your own
class_name: SiteBuilder
store_backend:
class_name: TupleAzureBlobStoreBackend
@@ -97,26 +94,12 @@ The most common authentication methods are supported:
### 4. Build the Azure Blob Data Docs site

You can create or modify an <TechnicalTag tag="expectation_suite" text="Expectation Suite" /> and this will build the Data Docs website.
Or you can use the following <TechnicalTag relative="../../../" tag="cli" text="CLI" /> command: ``great_expectations docs build --site-name az_site``.

```bash
> great_expectations docs build --site-name az_site
The following Data Docs sites will be built:
- az_site: https://<your-storage-account>.blob.core.windows.net/$web/index.html

Would you like to proceed? [Y/n]: y
Run the following Python code to build and open your Data Docs:

Building Data Docs...
Done building Data Docs
``` python name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs_site"
```

If successful, the CLI will provide the object URL of the index page.
You may secure the access of your website using an IP filtering mechanism.


### 5. Limit the access to your company

- On your Azure Storage Account Settings click on **Networking**
@@ -3,9 +3,6 @@ title: How to host and share Data Docs on GCS
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

This guide will explain how to host and share <TechnicalTag relative="../../../" tag="data_docs" text="Data Docs" /> on Google Cloud Storage. We recommend using IP-based access, which is achieved by deploying a simple Google App Engine app. Data Docs can also be served on Google Cloud Storage if the contents of the bucket are set to be publicly readable, but this is strongly discouraged.

@@ -58,7 +55,7 @@ We recommend placing it in your project directory, for example ``great_expectati

### 4. Deploy your Google App Engine app

Issue the following <TechnicalTag relative="../../../" tag="cli" text="CLI" /> command from within the app directory created above:
Run the following CLI command from within the app directory you created previously:

```bash name="tests/integration/docusaurus/setup/configuring_data_docs/how_to_host_and_share_data_docs_on_gcs.py gcloud app deploy"
```
@@ -76,14 +73,9 @@ You may also replace the default ``local_site`` if you would only like to mainta

### 7. Build the GCS Data Docs site

Use the following CLI command:

```bash name="tests/integration/docusaurus/setup/configuring_data_docs/how_to_host_and_share_data_docs_on_gcs.py build data docs command"
```

If successful, the CLI will provide the object URL of the index page. Since the bucket is not public, this URL will be inaccessible. Rather, you will access the Data Docs site using the App Engine app configured above.
Run the following Python code to build and open your Data Docs:

```bash name="tests/integration/docusaurus/setup/configuring_data_docs/how_to_host_and_share_data_docs_on_gcs.py build data docs output"
``` python name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs_site"
```

### 8. Test that everything was configured properly by launching your App Engine app
3 changes: 0 additions & 3 deletions docs/docusaurus/docs/guides/setup/installation/local.md
@@ -9,9 +9,6 @@ import InstallGreatExpectations from './components_local/_install_great_expectat
import VerifyGeInstallSucceeded from './components_local/_verify_ge_install_succeeded.mdx'
import NextSteps from '/docs/guides/setup/components/install_nextsteps.md'
import InstallCongratulations from '/docs/guides/setup/components/install_congrats.md'
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

<Preface />

@@ -9,9 +9,6 @@ keywords: [Great Expectations, Data Context, Filesystem, Amazon Web Services S3]

import TechnicalTag from '/docs/term_tags/_tag.mdx';
import Prerequisites from '/docs/components/_prerequisites.jsx'
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

<!-- ## Prerequisites -->
import PrereqInstalledAwsCli from '/docs/components/prerequisites/_aws_installed_the_aws_cli.mdx'
@@ -9,9 +9,6 @@ keywords: [Great Expectations, Data Context, Filesystem, GCS, Google Cloud Stora

import TechnicalTag from '/docs/term_tags/_tag.mdx';
import Prerequisites from '/docs/components/_prerequisites.jsx'
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

<!-- ## Prerequisites -->
import PrereqGcpServiceAccount from '/docs/components/prerequisites/_gcp_service_account.md'
@@ -3,9 +3,6 @@ title: How to deploy a scheduled Checkpoint with cron
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

This guide will help you deploy a scheduled <TechnicalTag tag="checkpoint" text="Checkpoint" /> with cron.

@@ -22,10 +19,9 @@ This guide will help you deploy a scheduled <TechnicalTag tag="checkpoint" text=

### 1. Verify Checkpoint suitability

First, verify that your Checkpoint is runnable via shell:
Run the following command to verify that your Checkpoint runs:

```bash
great_expectations checkpoint run my_checkpoint
```python name="tests/integration/docusaurus/reference/glossary/checkpoints.py retrieve_and_run"
```

### 2. Get `great_expectations` full path
@@ -4,9 +4,6 @@ title: How to collect OpenLineage metadata using an Action

import Prerequisites from '../../../guides/connecting_to_your_data/components/prerequisites.jsx';
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

[OpenLineage](https://openlineage.io) is an open framework for collection and analysis of data lineage. It tracks the movement of data over time, tracing relationships between datasets. Data engineers can use data lineage metadata to determine the root cause of failures, identify performance bottlenecks, and simulate the effects of planned changes.

@@ -57,11 +54,10 @@ action_list:

### 3. Test your Action by Validating a Batch of data.

Run your Checkpoint to Validate a <TechnicalTag tag="batch" text="Batch" /> of data and emit lineage events to the OpenLineage backend. This can be done from the command line:
Run the following command to retrieve and run a Checkpoint to Validate a <TechnicalTag tag="batch" text="Batch" /> of data and then emit lineage events to the OpenLineage backend:

```bash
% great_expectations checkpoint run <checkpoint_name>
```
```python name="tests/integration/docusaurus/reference/glossary/checkpoints.py retrieve_and_run"
```

:::note Reminder
Our [guide on how to Validate data by running a Checkpoint](../how_to_validate_data_by_running_a_checkpoint.md) has more detailed instructions for this step, including instructions on how to run a checkpoint from a Python script instead of from the <TechnicalTag tag="cli" text="CLI" />.
@@ -4,9 +4,6 @@ title: How to update Data Docs after Validating a Checkpoint

import Prerequisites from '../../../guides/connecting_to_your_data/components/prerequisites.jsx';
import TechnicalTag from '@site/docs/term_tags/_tag.mdx';
import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

This guide will explain how to use an <TechnicalTag tag="action" text="Action" /> to update <TechnicalTag tag="data_docs" text="Data Docs" /> sites with new <TechnicalTag tag="validation_result" text="Validation Results" /> from running a <TechnicalTag tag="checkpoint" text="Checkpoint" />.

@@ -73,17 +70,14 @@ The ``StoreValidationResultAction`` Action must appear before ``UpdateDataDocsA
Test that your new Action is configured correctly:
Run the Checkpoint from your code or the <TechnicalTag tag="cli" text="CLI" /> and verify that no errors are thrown.
Run the following command to run the Checkpoint and verify that no errors are returned:
```python
import great_expectations as gx
context = gx.get_context()
checkpoint_name = "your checkpoint name here"
context.run_checkpoint(checkpoint_name=checkpoint_name)
```
```bash
$ great_expectations checkpoint run <your checkpoint name>
```

Finally, check your Data Docs sites to confirm that a new Validation Result has been added.

9 changes: 3 additions & 6 deletions docs/docusaurus/docs/integrations/integration_datahub.md
@@ -5,10 +5,6 @@ authors:
url: https://datahubproject.io
---

import CLIRemoval from '/docs/components/warnings/_cli_removal.md'

<CLIRemoval />

:::info
* Maintained By: DataHub
* Status: Beta
@@ -85,8 +81,9 @@ action_list:

#### 3. Run the GX checkpoint

```bash
great_expectations checkpoint run my_checkpoint #replace my_checkpoint with your checkpoint name
Run the following command to retrieve and run a Checkpoint:

```python name="tests/integration/docusaurus/reference/glossary/checkpoints.py retrieve_and_run"
```

#### 4. Hurray!
45 changes: 0 additions & 45 deletions tests/integration/common_workflows/simple_build_data_docs.py

This file was deleted.

44 changes: 44 additions & 0 deletions tests/integration/docusaurus/reference/glossary/checkpoints.py
@@ -0,0 +1,44 @@
from great_expectations.datasource.fluent import Datasource
from great_expectations.datasource.fluent import DataAsset
from great_expectations.checkpoint import SimpleCheckpoint

# <snippet name="tests/integration/docusaurus/reference/glossary/checkpoints.py setup">
import great_expectations as gx

context = gx.get_context()
# </snippet>

# to open Data Docs, we need validation results which we get by creating a suite and running a checkpoint
datasource: Datasource = context.get_datasource("taxi_datasource")
asset: DataAsset = datasource.get_asset("yellow_tripdata")
batch_request = asset.build_batch_request()
validator = context.get_validator(batch_request=batch_request)

validator.expect_column_values_to_not_be_null("pickup_datetime")
validator.expect_column_values_to_be_between("passenger_count", auto=True)

taxi_suite = validator.get_expectation_suite()
taxi_suite.expectation_suite_name = "taxi_suite"

context.add_expectation_suite(expectation_suite=taxi_suite)

# <snippet name="tests/integration/docusaurus/reference/glossary/checkpoints.py create_and_run">
checkpoint = SimpleCheckpoint(
    name="taxi_checkpoint",
    data_context=context,
    batch_request=batch_request,
    expectation_suite_name="taxi_suite",
)
checkpoint.run()
# </snippet>

# <snippet name="tests/integration/docusaurus/reference/glossary/checkpoints.py save">
context.add_checkpoint(checkpoint=checkpoint)
# </snippet>

# <snippet name="tests/integration/docusaurus/reference/glossary/checkpoints.py retrieve_and_run">
checkpoint = context.get_checkpoint("taxi_checkpoint")
checkpoint.run()
# </snippet>

assert True
41 changes: 41 additions & 0 deletions tests/integration/docusaurus/reference/glossary/data_docs.py
@@ -0,0 +1,41 @@
import great_expectations as gx
from great_expectations.datasource.fluent import Datasource
from great_expectations.datasource.fluent import DataAsset
from great_expectations.checkpoint import SimpleCheckpoint

context = gx.get_context()

# to open data docs, we need validation results which we get by creating a suite and running a checkpoint
datasource: Datasource = context.get_datasource("taxi_datasource")
asset: DataAsset = datasource.get_asset("yellow_tripdata")
batch_request = asset.build_batch_request()
validator = context.get_validator(batch_request=batch_request)

validator.expect_column_values_to_not_be_null("pickup_datetime")
validator.expect_column_values_to_be_between("passenger_count", auto=True)

taxi_suite = validator.get_expectation_suite()
taxi_suite.expectation_suite_name = "taxi_suite"

context.add_expectation_suite(expectation_suite=taxi_suite)

checkpoint = SimpleCheckpoint(
    name="taxi_checkpoint",
    data_context=context,
    batch_request=batch_request,
    expectation_suite_name="taxi_suite",
)
checkpoint.run()

# <snippet name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs">
context.build_data_docs()
context.open_data_docs()
# </snippet>

# <snippet name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs_site">
site_name = "new_site_name"
context.build_data_docs(site_names=[site_name])
context.open_data_docs(site_name=site_name)
# </snippet>

assert True
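
The docs fences changed in this commit carry a `name="…"` attribute that matches the `# <snippet name="…">` markers in the two new test files. The diff does not show how the docs build resolves those names, so the following is only a minimal sketch of the idea, assuming flat (non-nested) marker pairs. `extract_snippets` and `EXAMPLE_SOURCE` are hypothetical names for illustration and are not part of Great Expectations or its docs tooling.

```python
import re
from typing import Dict, List, Optional

# Opening marker as used in the test files, e.g.
#   # <snippet name="path/to/file.py data_docs">
SNIPPET_OPEN = re.compile(r'^\s*#\s*<snippet name="(?P<name>[^"]+)">\s*$')
# Closing marker: # </snippet>
SNIPPET_CLOSE = re.compile(r'^\s*#\s*</snippet>\s*$')


def extract_snippets(source: str) -> Dict[str, str]:
    """Map each snippet name in `source` to the lines between its markers."""
    snippets: Dict[str, str] = {}
    current_name: Optional[str] = None
    current_lines: List[str] = []
    for line in source.splitlines():
        opened = SNIPPET_OPEN.match(line)
        if opened:
            if current_name is not None:
                raise ValueError(f"nested snippet inside {current_name!r}")
            current_name = opened.group("name")
            current_lines = []
        elif SNIPPET_CLOSE.match(line):
            if current_name is None:
                raise ValueError("closing marker without an opening marker")
            snippets[current_name] = "\n".join(current_lines)
            current_name = None
        elif current_name is not None:
            # Only lines between an open and close marker are collected.
            current_lines.append(line)
    if current_name is not None:
        raise ValueError(f"snippet {current_name!r} is never closed")
    return snippets


# Abbreviated stand-in for data_docs.py; the source text is only parsed, never executed.
EXAMPLE_SOURCE = '''\
context = gx.get_context()

# <snippet name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs">
context.build_data_docs()
context.open_data_docs()
# </snippet>
'''

snippets = extract_snippets(EXAMPLE_SOURCE)
```

Under this reading, a fence such as `python name="tests/integration/docusaurus/reference/glossary/data_docs.py data_docs"` would be filled with the two `context` calls, and setup code outside the markers stays out of the rendered guide.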