[MAINTENANCE] Adding support for project_root_dir to get_context (#8388)
Co-authored-by: Chetan Kini <chetan@superconductive.com>
Co-authored-by: William Shin <will@greatexpectations.io>
3 people authored Aug 24, 2023
1 parent f2de855 commit aeb3064
Showing 12 changed files with 137 additions and 313 deletions.
Original file line number Diff line number Diff line change
@@ -36,18 +36,35 @@ If you are using GX for multiple projects you may wish to utilize a different Da

Each Filesystem Data Context has a root folder in which it was initialized. This root folder will be used to indicate which specific Filesystem Data Context should be instantiated.

```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py path_to_context_root_folder"
```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py path_to_project_root"
```

### 2. Run GX's `get_context(...)` method

We provide our Filesystem Data Context's root folder path to the GX library's `get_context(...)` method as the `context_root_dir` parameter. Because we are providing a path to an existing Data Context, the `get_context(...)` method will instantiate and return the Data Context at that location.
We provide our Filesystem Data Context's root folder path to the GX library's `get_context(...)` method as the `project_root_dir` parameter. Because we are providing a path to an existing Data Context, the `get_context(...)` method will instantiate and return the Data Context at that location.

```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py get_filesystem_data_context"
```

:::info Project root vs context root
Note that there is a subtle distinction between the `project_root_dir` and `context_root_dir` arguments accepted by `get_context(...)`.

Your context root is the directory that contains all your GX config while your project root refers to your actual working directory (and therefore contains the context root).

```bash
# The overall directory is your project root
data/
great_expectations/ # The GX folder with your config is your context root
  great_expectations.yml
  ...
...
```

Both are functionally equivalent for purposes of working with a file-backed project.
:::

:::info What if the folder does not contain a Data Context?
If the `context_root_dir` provided to the `get_context(...)` method points to a folder that does not already have a Data Context present, the `get_context(...)` method will initialize a new Filesystem Data Context at that location.
If the root directory provided to the `get_context(...)` method points to a folder that does not already have a Data Context present, the `get_context(...)` method will initialize a new Filesystem Data Context at that location.

The `get_context(...)` method will then instantiate and return the newly initialized Data Context.
:::
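
The relationship between the two arguments described in the note above can be sketched with `pathlib` alone. This is a minimal illustration, not GX code: the folder name `great_expectations` is taken from the directory tree in the note, while the helper `context_root_for` and the `my_project` path are hypothetical names introduced only for this example.

```python
import pathlib

# Assumption: the context root folder is named "great_expectations",
# as in the directory tree shown in the note above.
GX_DIR = "great_expectations"


def context_root_for(project_root: str) -> pathlib.Path:
    """Return the context root implied by a project root (illustrative helper)."""
    return pathlib.Path(project_root).absolute() / GX_DIR


project_root = "my_project"  # hypothetical project folder
context_root = context_root_for(project_root)

# The context root is a direct child of the project root.
print(context_root.parent == pathlib.Path(project_root).absolute())
print(context_root.name)
```

Under these assumptions, passing `my_project` as `project_root_dir` or `my_project/great_expectations` as `context_root_dir` would point `get_context(...)` at the same folder, which is what the note means by "functionally equivalent".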

@@ -132,18 +132,35 @@ If you're using GX for multiple projects, you might want to use a different Data

Each Filesystem Data Context has a root folder in which it was initialized. This root folder identifies the specific Filesystem Data Context to instantiate.

```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py path_to_context_root_folder"
```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py path_to_project_root"
```

### Run the `get_context(...)` method

You provide the path for your empty folder to the GX library's `get_context(...)` method as the `context_root_dir` parameter. Because you are providing a path to an empty folder, the `get_context(...)` method instantiates and returns the Data Context at that location.
You provide the path for your empty folder to the GX library's `get_context(...)` method as the `project_root_dir` parameter. Because you are providing a path to an empty folder, the `get_context(...)` method instantiates and returns the Data Context at that location.

```python name="tests/integration/docusaurus/connecting_to_your_data/fluent_datasources/how_to_instantiate_a_specific_filesystem_data_context.py get_filesystem_data_context"
```

:::info Project root vs context root
Note that there is a subtle distinction between the `project_root_dir` and `context_root_dir` arguments accepted by `get_context(...)`.

Your context root is the directory that contains all your GX config while your project root refers to your actual working directory (and therefore contains the context root).

```bash
# The overall directory is your project root
data/
great_expectations/ # The GX folder with your config is your context root
  great_expectations.yml
  ...
...
```

Both are functionally equivalent for purposes of working with a file-backed project.
:::

:::info What if the folder does not contain a Data Context?
If the `context_root_dir` provided to the `get_context(...)` method points to a folder that does not already have a Data Context, the `get_context(...)` method initializes a new Filesystem Data Context in that location.
If the root directory provided to the `get_context(...)` method points to a folder that does not already have a Data Context, the `get_context(...)` method initializes a new Filesystem Data Context in that location.

The `get_context(...)` method instantiates and returns the newly initialized Data Context.
:::

@@ -1172,8 +1172,8 @@ def get_config_with_variables_substituted(
) -> DataContextConfig:
"""
Substitute vars in config of form ${var} or $(var) with values found in the following places,
in order of precedence: ge_cloud_config (for Data Contexts in GX Cloud mode), runtime_environment,
environment variables, config_variables, or ge_cloud_config_variable_defaults (allows certain variables to
in order of precedence: gx_cloud_config (for Data Contexts in GX Cloud mode), runtime_environment,
environment variables, config_variables, or gx_cloud_config_variable_defaults (allows certain variables to
be optional in GX Cloud mode).
"""
if not config:
55 changes: 9 additions & 46 deletions great_expectations/data_context/data_context/cloud_data_context.py
@@ -11,7 +11,6 @@
Mapping,
Optional,
Sequence,
Tuple,
Union,
cast,
overload,
@@ -99,14 +98,11 @@ def __init__( # noqa: PLR0913
self,
project_config: Optional[Union[DataContextConfig, Mapping]] = None,
context_root_dir: Optional[PathStr] = None,
project_root_dir: Optional[PathStr] = None,
runtime_environment: Optional[dict] = None,
cloud_base_url: Optional[str] = None,
cloud_access_token: Optional[str] = None,
cloud_organization_id: Optional[str] = None,
# <GX_RENAME> Deprecated as of 0.15.37
ge_cloud_base_url: Optional[str] = None,
ge_cloud_access_token: Optional[str] = None,
ge_cloud_organization_id: Optional[str] = None,
) -> None:
"""
CloudDataContext constructor
@@ -117,28 +113,15 @@ def __init__( # noqa: PLR0913
config_variables.yml and the environment
cloud_config (GXCloudConfig): GXCloudConfig corresponding to current CloudDataContext
"""
# Chetan - 20221208 - not formally deprecating these values until a future date
(
cloud_base_url,
cloud_access_token,
cloud_organization_id,
) = CloudDataContext._resolve_cloud_args(
cloud_base_url=cloud_base_url,
cloud_access_token=cloud_access_token,
cloud_organization_id=cloud_organization_id,
ge_cloud_base_url=ge_cloud_base_url,
ge_cloud_access_token=ge_cloud_access_token,
ge_cloud_organization_id=ge_cloud_organization_id,
)

self._check_if_latest_version()
self._cloud_config = self.get_cloud_config(
cloud_base_url=cloud_base_url,
cloud_access_token=cloud_access_token,
cloud_organization_id=cloud_organization_id,
)
self._context_root_directory = self.determine_context_root_directory(
context_root_dir
context_root_dir=context_root_dir,
project_root_dir=project_root_dir,
)
self._project_config = self._init_project_config(project_config)

@@ -173,31 +156,6 @@ def _initialize_usage_statistics(
# Usage statistics are always disabled within Cloud-backed environments.
self._usage_statistics_handler = None

@staticmethod
def _resolve_cloud_args( # noqa: PLR0913
cloud_base_url: Optional[str] = None,
cloud_access_token: Optional[str] = None,
cloud_organization_id: Optional[str] = None,
# <GX_RENAME> Deprecated as of 0.15.37
ge_cloud_base_url: Optional[str] = None,
ge_cloud_access_token: Optional[str] = None,
ge_cloud_organization_id: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
cloud_base_url = (
cloud_base_url if cloud_base_url is not None else ge_cloud_base_url
)
cloud_access_token = (
cloud_access_token
if cloud_access_token is not None
else ge_cloud_access_token
)
cloud_organization_id = (
cloud_organization_id
if cloud_organization_id is not None
else ge_cloud_organization_id
)
return cloud_base_url, cloud_access_token, cloud_organization_id

@override
def _register_providers(self, config_provider: _ConfigurationProvider) -> None:
"""
@@ -247,8 +205,13 @@ def is_cloud_config_available(

@classmethod
def determine_context_root_directory(
cls, context_root_dir: Optional[PathStr]
cls,
context_root_dir: Optional[PathStr],
project_root_dir: Optional[PathStr],
) -> str:
context_root_dir = cls._resolve_context_root_dir_and_project_root_dir(
context_root_dir=context_root_dir, project_root_dir=project_root_dir
)
if context_root_dir is None:
context_root_dir = os.getcwd() # noqa: PTH109
logger.info(
5 changes: 3 additions & 2 deletions great_expectations/data_context/data_context/data_context.py
@@ -34,7 +34,7 @@ def DataContext( # noqa: PLR0913
cloud_access_token: None = ...,
cloud_organization_id: None = ...,
) -> FileDataContext:
# If `context_root_dir` is provided and `cloud_mode`/`ge_cloud_mode` are `False` a `FileDataContext` will always be returned.
# If `context_root_dir` is provided and `cloud_mode` is `False` a `FileDataContext` will always be returned.
...


@@ -169,7 +169,8 @@ def _init_context_root_directory(
) -> str:
if cloud_mode and context_root_dir is None:
context_root_dir = CloudDataContext.determine_context_root_directory(
context_root_dir
context_root_dir=context_root_dir,
project_root_dir=None,
)
else:
context_root_dir = (
18 changes: 11 additions & 7 deletions great_expectations/data_context/data_context/file_data_context.py
@@ -45,6 +45,7 @@ def __init__(
self,
project_config: Optional[DataContextConfig] = None,
context_root_dir: Optional[PathStr] = None,
project_root_dir: Optional[PathStr] = None,
runtime_environment: Optional[dict] = None,
) -> None:
"""FileDataContext constructor
@@ -57,7 +58,8 @@ def __init__(
config_variables.yml and the environment
"""
self._context_root_directory = self._init_context_root_directory(
context_root_dir
context_root_dir=context_root_dir,
project_root_dir=project_root_dir,
)
self._scaffold_project()

@@ -67,16 +69,18 @@ def __init__(
runtime_environment=runtime_environment,
)

    def _init_context_root_directory(self, context_root_dir: Optional[PathStr]) -> str:
    def _init_context_root_directory(
        self, context_root_dir: Optional[PathStr], project_root_dir: Optional[PathStr]
    ) -> str:
        context_root_dir = self._resolve_context_root_dir_and_project_root_dir(
            context_root_dir=context_root_dir, project_root_dir=project_root_dir
        )

        if isinstance(context_root_dir, pathlib.Path):
            context_root_dir = str(context_root_dir)

        if not context_root_dir:
            context_root_dir = FileDataContext.find_context_root_dir()
            if not context_root_dir:
                raise ValueError(
                    "A FileDataContext relies on the presence of a local great_expectations.yml project config"
                )
            context_root_dir = self.find_context_root_dir()

        return context_root_dir


@@ -93,6 +93,23 @@ def _save_project_config(self, _fds_datasource=None) -> None:
"""
raise NotImplementedError

    @classmethod
    def _resolve_context_root_dir_and_project_root_dir(
        cls, context_root_dir: PathStr | None, project_root_dir: PathStr | None
    ) -> PathStr | None:
        if project_root_dir and context_root_dir:
            raise TypeError(
                "'project_root_dir' and 'context_root_dir' are conflicting args; please only provide one"
            )

        if project_root_dir:
            project_root_dir = pathlib.Path(project_root_dir).absolute()
            context_root_dir = pathlib.Path(project_root_dir) / cls.GX_DIR
        elif context_root_dir:
            context_root_dir = pathlib.Path(context_root_dir).absolute()

        return context_root_dir

def _check_for_usage_stats_sync( # noqa: PLR0911
self, project_config: DataContextConfig
) -> bool:
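
The behaviour of the `_resolve_context_root_dir_and_project_root_dir` helper added in this hunk can be tried outside the class with a standalone sketch. The function below mirrors the logic shown in the diff, but it is not the GX implementation: `resolve_root_dirs` is a hypothetical name, and the `GX_DIR` constant is assumed to be `"great_expectations"` as a stand-in for `cls.GX_DIR`.

```python
import pathlib
from typing import Optional, Union

PathStr = Union[str, pathlib.Path]

# Assumption: stand-in for cls.GX_DIR in the diff above.
GX_DIR = "great_expectations"


def resolve_root_dirs(
    context_root_dir: Optional[PathStr] = None,
    project_root_dir: Optional[PathStr] = None,
) -> Optional[pathlib.Path]:
    """Resolve either argument to an absolute context root, or None if neither is given."""
    if project_root_dir and context_root_dir:
        # The two arguments are mutually exclusive.
        raise TypeError(
            "'project_root_dir' and 'context_root_dir' are conflicting args; please only provide one"
        )
    if project_root_dir:
        # A project root contains the context root as a child folder.
        return pathlib.Path(project_root_dir).absolute() / GX_DIR
    if context_root_dir:
        return pathlib.Path(context_root_dir).absolute()
    return None


from_project = resolve_root_dirs(project_root_dir="proj")
from_context = resolve_root_dirs(context_root_dir="proj/great_expectations")
print(from_project == from_context)  # both name the same folder

try:
    resolve_root_dirs(context_root_dir="a", project_root_dir="b")
except TypeError as err:
    print(f"rejected: {err}")
```

Returning `None` when neither argument is supplied leaves the fallback to the caller, which matches how the diff's `_init_context_root_directory` goes on to call `find_context_root_dir()`.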