KedroDataCatalog release updates #4214

Merged 5 commits, Oct 8, 2024

8 changes: 5 additions & 3 deletions RELEASE.md
@@ -5,20 +5,22 @@
* Implemented `KedroDataCatalog` repeating `DataCatalog` functionality with a few API enhancements:
* Removed `_FrozenDatasets` and access datasets as properties;
* Added get dataset by name feature;
* `add_feed_dict()` was simplified and renamed to `add_data()`;
* `add_feed_dict()` was simplified to only add raw data;

[vale] RELEASE.md#L8 (line 8, column 41), WARNING: [Kedro.weaselwords] 'only' is a weasel word!

* Datasets' initialisation was moved out from `from_config()` method to the constructor.
* Moved development requirements from `requirements.txt` to the dedicated section in `pyproject.toml` for project template.
* Implemented `Protocol` abstraction for the current `DataCatalog` and adding new catalog implementations.
* Refactored `kedro run` and `kedro catalog` commands.
* Moved pattern resolution logic from `DataCatalog` to a separate component - `CatalogConfigResolver`. Updated `DataCatalog` to use `CatalogConfigResolver` internally.
* Made packaged Kedro projects return `session.run()` output to be used when running it in the interactive environment.
* Enhanced `OmegaConfigLoader` configuration validation to detect duplicate keys at all parameter levels, ensuring comprehensive nested key checking.

**Note:** ``KedroDataCatalog`` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the ``KedroDataCatalog`` or ideas for new features.

[vale] RELEASE.md#L17 (line 17, column 92), WARNING: [Kedro.toowordy] 'Therefore' is too wordy

## Bug fixes and other changes
* Fixed bug where using dataset factories breaks with `ThreadRunner`.
* Fixed a bug where `SharedMemoryDataset.exists` would not call the underlying `MemoryDataset`.
* Fixed template projects example tests.
* Made credentials loading consistent between `KedroContext._get_catalog()` and `resolve_patterns` so that both us
e `_get_config_credentials()`
* Made credentials loading consistent between `KedroContext._get_catalog()` and `resolve_patterns` so that both use `_get_config_credentials()`

## Breaking changes to the API
* Removed `ShelveStore` to address a security vulnerability.

23 changes: 16 additions & 7 deletions kedro/io/kedro_data_catalog.py
@@ -3,6 +3,9 @@
use a ``KedroDataCatalog``, you need to instantiate it with a dictionary of datasets.
Then it will act as a single point of reference for your calls, relaying load and
save functions to the underlying datasets.

``KedroDataCatalog`` is an experimental feature aimed to replace ``DataCatalog`` in the future.
Expect possible breaking changes while using it.
"""

from __future__ import annotations
@@ -44,6 +47,8 @@ def __init__(
single point of reference for your calls, relaying load and save
functions to the underlying datasets.

Note: ``KedroDataCatalog`` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already.

Args:
datasets: A dictionary of dataset names and dataset instances.
raw_data: A dictionary with data to be added in memory as ``MemoryDataset`` instances.
@@ -56,6 +61,13 @@ def __init__(
case-insensitive string that conforms with operating system
filename limitations, b) always return the latest version when
sorted in lexicographical order.

Example:
::
>>> # settings.py
>>> from kedro.io import KedroDataCatalog
>>>
>>> DATA_CATALOG_CLASS = KedroDataCatalog
"""
self._config_resolver = config_resolver or CatalogConfigResolver()
self._datasets = datasets or {}
@@ -68,7 +80,7 @@ def __init__(
self._add_from_config(ds_name, ds_config)

if raw_data:
self.add_data(raw_data)
self.add_feed_dict(raw_data)

@property
def datasets(self) -> dict[str, Any]:
@@ -304,16 +316,13 @@ def confirm(self, name: str) -> None:
else:
raise DatasetError(f"Dataset '{name}' does not have 'confirm' method")

def add_data(self, data: dict[str, Any], replace: bool = False) -> None:
def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
# TODO: remove when removing old catalog
# This method was simplified to add memory datasets only, since
# adding AbstractDataset can be done via add() method
for ds_name, ds_data in data.items():
for ds_name, ds_data in feed_dict.items():
self.add(ds_name, MemoryDataset(data=ds_data), replace) # type: ignore[abstract]

def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
# TODO: remove when removing old catalog
return self.add_data(feed_dict, replace)

def shallow_copy(
self, extra_dataset_patterns: Patterns | None = None
) -> KedroDataCatalog:
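
The simplified `add_feed_dict()` in the last hunk wraps every raw value in a `MemoryDataset` and registers it through `add()`. A minimal, self-contained sketch of that behaviour follows. It does not import Kedro: the `MemoryDataset` and `SketchCatalog` classes below are hypothetical stand-ins written for illustration, and raising `ValueError` on a duplicate name is an assumption, not necessarily what the real `add()` does.

```python
from __future__ import annotations

from typing import Any


class MemoryDataset:
    """Hypothetical stand-in for an in-memory dataset (not Kedro's class)."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class SketchCatalog:
    """Hypothetical stand-in catalog that mirrors the shape of the diff above."""

    def __init__(self, raw_data: dict[str, Any] | None = None) -> None:
        self._datasets: dict[str, MemoryDataset] = {}
        if raw_data:
            # As in the diff: raw data passed to the constructor goes
            # through add_feed_dict().
            self.add_feed_dict(raw_data)

    def add(self, ds_name: str, dataset: MemoryDataset, replace: bool = False) -> None:
        # Assumption: a duplicate name is rejected unless replace=True.
        if ds_name in self._datasets and not replace:
            raise ValueError(f"Dataset '{ds_name}' has already been registered")
        self._datasets[ds_name] = dataset

    def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
        # Every value is treated as raw data and wrapped in a MemoryDataset;
        # dataset instances would be registered via add() instead.
        for ds_name, ds_data in feed_dict.items():
            self.add(ds_name, MemoryDataset(data=ds_data), replace)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()


catalog = SketchCatalog(raw_data={"params": {"learning_rate": 0.1}})
catalog.add_feed_dict({"params": {"learning_rate": 0.2}}, replace=True)
```

In this sketch, calling `add_feed_dict()` again with an existing name and without `replace=True` raises an error, so overwriting a dataset is always an explicit choice by the caller.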