
Replace all instances of "data set" with "dataset" #4211

Merged · 1 commit · Oct 10, 2024
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -200,7 +200,7 @@

In this example, `filepath` is used as the basis of a folder that stores versions of the `cars` dataset. Each time a new version is created by a pipeline run it is stored within `data/01_raw/company/cars.csv/<version>/cars.csv`, where `<version>` corresponds to a version string formatted as `YYYY-MM-DDThh.mm.ss.sssZ`.

-By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned data set with `--load-version` flag as follows:
+By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned dataset with `--load-version` flag as follows:

Check warning on line 203 in docs/source/data/data_catalog.md (GitHub Actions / vale): [Kedro.toowordy] 'However' is too wordy.

```bash
kedro run --load-versions=cars:YYYY-MM-DDThh.mm.ss.sssZ
```
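For orientation, pinning a version programmatically goes through the same machinery as the CLI flag above. A minimal sketch, assuming `kedro-datasets` provides `pandas.CSVDataset`; the filepath and version string are illustrative:

```python
# Hedged sketch: load a pinned version of the `cars` dataset via the
# `AbstractDataset.from_config` factory that this PR also touches.
from kedro.io import AbstractDataset

config = {
    "type": "pandas.CSVDataset",  # shipped in kedro-datasets (assumed installed)
    "filepath": "data/01_raw/company/cars.csv",
    "versioned": True,
}

cars = AbstractDataset.from_config(
    "cars",
    config,
    load_version="2024-10-10T12.00.00.000Z",  # None would load the latest version
)
df = cars.load()  # reads data/01_raw/company/cars.csv/<version>/cars.csv
```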
2 changes: 1 addition & 1 deletion docs/source/integrations/mlflow.md
@@ -134,7 +134,7 @@ and you would be able to preview it in the MLflow web UI:
```

:::{warning}
-If you get a `Failed while saving data to data set MlflowMatplotlibWriter` error,
+If you get a `Failed while saving data to dataset MlflowMatplotlibWriter` error,
it's probably because you had already executed `kedro run` while the dataset was marked as `versioned: true`.
The solution is to cleanup the old `data/08_reporting/dummy_confusion_matrix.png` directory.
:::
6 changes: 3 additions & 3 deletions docs/source/nodes_and_pipelines/run_a_pipeline.md
@@ -70,13 +70,13 @@ class DryRunner(AbstractRunner):
    """

    def create_default_dataset(self, ds_name: str) -> AbstractDataset:
-        """Factory method for creating the default data set for the runner.
+        """Factory method for creating the default dataset for the runner.
        Args:
-            ds_name: Name of the missing data set
+            ds_name: Name of the missing dataset
        Returns:
            An instance of an implementation of AbstractDataset to be used
-            for all unregistered data sets.
+            for all unregistered datasets.
        """
        return MemoryDataset()
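A short usage sketch of the factory above, assuming the docs' `DryRunner` can be constructed with default arguments:

```python
# Hedged sketch: the runner supplies a MemoryDataset for any dataset name
# that is not registered in the catalog.
runner = DryRunner()
default_ds = runner.create_default_dataset("unregistered_output")
default_ds.save({"rows": 3})  # MemoryDataset keeps the object in memory
assert default_ds.load() == {"rows": 3}
```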
6 changes: 3 additions & 3 deletions docs/source/tutorial/spaceflights_tutorial_faqs.md
@@ -7,11 +7,11 @@
## How do I resolve these common errors?

### Dataset errors
-#### DatasetError: Failed while loading data from data set
+#### DatasetError: Failed while loading data from dataset

Check warning on line 10 in docs/source/tutorial/spaceflights_tutorial_faqs.md (GitHub Actions / vale): [Kedro.headings] 'DatasetError: Failed while loading data from dataset' should use sentence-style capitalization.
You're [testing whether Kedro can load the raw test data](./set_up_data.md#test-that-kedro-can-load-the-data) and see the following:

```python
-DatasetError: Failed while loading data from data set
+DatasetError: Failed while loading data from dataset
CSVDataset(filepath=...).
[Errno 2] No such file or directory: '.../companies.csv'
```
@@ -71,6 +71,6 @@
Traceback (most recent call last):
...
raise DatasetError(message) from exc
-kedro.io.core.DatasetError: Failed while loading data from data set CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
+kedro.io.core.DatasetError: Failed while loading data from dataset CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
[Errno 2] File b'data/03_primary/model_input_table.csv' does not exist: b'data/03_primary/model_input_table.csv'
```
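Both tracebacks above are raised as `DatasetError`, which wraps the underlying `OSError`, so they can be handled uniformly. A minimal sketch, assuming a populated `catalog` object:

```python
# Hedged sketch: catching the wrapped error shown in the tracebacks above.
from kedro.io import DatasetError

try:
    companies = catalog.load("companies")  # `catalog` is assumed to exist
except DatasetError as err:
    # The message embeds the underlying cause, e.g. "[Errno 2] No such file..."
    print(f"Could not load dataset: {err}")
```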
Additional file (path not shown):
@@ -1,11 +1,11 @@
-# Here you can define all your data sets by using simple YAML syntax.
+# Here you can define all your datasets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://docs.kedro.org/en/stable/data/data_catalog.html
#
# We support interacting with a variety of data stores including local file systems, cloud, network and HDFS
#
-# An example data set definition can look as follows:
+# An example dataset definition can look as follows:
#
#bikes:
# type: pandas.CSVDataset
@@ -39,7 +39,7 @@
# (transcoding), templating and a way to reuse arguments that are frequently repeated. See more here:
# https://docs.kedro.org/en/stable/data/data_catalog.html
#
-# This is a data set used by the "Hello World" example pipeline provided with the project
+# This is a dataset used by the "Hello World" example pipeline provided with the project
# template. Please feel free to remove it once you remove the example pipeline.

example_iris_data:
Additional file (path not shown):
@@ -1,4 +1,4 @@
-# Here you can define credentials for different data sets and environment.
+# Here you can define credentials for different datasets and environment.
#
#
# Example:
Additional file (path not shown):
@@ -11,7 +11,7 @@


def split_data(data: pd.DataFrame, example_test_data_ratio: float) -> dict[str, Any]:
-    """Node for splitting the classical Iris data set into training and test
+    """Node for splitting the classical Iris dataset into training and test
    sets, each split into features and labels.
    The split ratio parameter is taken from conf/project/parameters.yml.
    The data and the parameters will be loaded and provided to your function
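The function body is collapsed in this view. A hedged sketch of what such a split might look like — the shuffle step and the `species` label column are assumptions, not the template's actual code:

```python
# Illustrative only: the real template implementation is collapsed above.
import pandas as pd


def split_data_sketch(data: pd.DataFrame, example_test_data_ratio: float) -> dict:
    shuffled = data.sample(frac=1, random_state=42)  # assumed shuffling
    n_test = int(len(shuffled) * example_test_data_ratio)
    test, train = shuffled.iloc[:n_test], shuffled.iloc[n_test:]
    return {
        "train_x": train.drop(columns=["species"]),  # assumed label column
        "train_y": train["species"],
        "test_x": test.drop(columns=["species"]),
        "test_y": test["species"],
    }
```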
2 changes: 1 addition & 1 deletion kedro/io/__init__.py
@@ -1,5 +1,5 @@
"""``kedro.io`` provides functionality to read and write to a
-number of data sets. At the core of the library is the ``AbstractDataset`` class.
+number of datasets. At the core of the library is the ``AbstractDataset`` class.
"""

from __future__ import annotations
2 changes: 1 addition & 1 deletion kedro/io/catalog_config_resolver.py
@@ -90,7 +90,7 @@ def _fetch_credentials(credentials_name: str, credentials: dict[str, Any]) -> An
        The set of requested credentials.
    Raises:
-        KeyError: When a data set with the given name has not yet been
+        KeyError: When a dataset with the given name has not yet been
            registered.
    """
54 changes: 27 additions & 27 deletions kedro/io/core.py
@@ -71,23 +71,23 @@ class DatasetError(Exception):

class DatasetNotFoundError(DatasetError):
    """``DatasetNotFoundError`` raised by ``DataCatalog`` class in case of
-    trying to use a non-existing data set.
+    trying to use a non-existing dataset.
    """

    pass


class DatasetAlreadyExistsError(DatasetError):
    """``DatasetAlreadyExistsError`` raised by ``DataCatalog`` class in case
-    of trying to add a data set which already exists in the ``DataCatalog``.
+    of trying to add a dataset which already exists in the ``DataCatalog``.
    """

    pass


class VersionNotFoundError(DatasetError):
    """``VersionNotFoundError`` raised by ``AbstractVersionedDataset`` implementations
-    in case of no load versions available for the data set.
+    in case of no load versions available for the dataset.
    """

    pass
@@ -98,9 +98,9 @@


class AbstractDataset(abc.ABC, Generic[_DI, _DO]):
-    """``AbstractDataset`` is the base class for all data set implementations.
+    """``AbstractDataset`` is the base class for all dataset implementations.

-    All data set implementations should extend this abstract class
+    All dataset implementations should extend this abstract class
    and implement the methods marked as abstract.
    If a specific dataset implementation cannot be used in conjunction with
    the ``ParallelRunner``, such user-defined dataset should have the
@@ -156,23 +156,23 @@ def from_config(
        load_version: str | None = None,
        save_version: str | None = None,
    ) -> AbstractDataset:
-        """Create a data set instance using the configuration provided.
+        """Create a dataset instance using the configuration provided.

        Args:
            name: Data set name.
            config: Data set config dictionary.
            load_version: Version string to be used for ``load`` operation if
-                the data set is versioned. Has no effect on the data set
+                the dataset is versioned. Has no effect on the dataset
                if versioning was not enabled.
            save_version: Version string to be used for ``save`` operation if
-                the data set is versioned. Has no effect on the data set
+                the dataset is versioned. Has no effect on the dataset
                if versioning was not enabled.

        Returns:
            An instance of an ``AbstractDataset`` subclass.

        Raises:
-            DatasetError: When the function fails to create the data set
+            DatasetError: When the function fails to create the dataset
                from its config.

        """
@@ -245,9 +245,9 @@ def load(self: Self) -> _DO:
            except DatasetError:
                raise
            except Exception as exc:
-                # This exception handling is by design as the composed data sets
+                # This exception handling is by design as the composed datasets
                # can throw any type of exception.
-                message = f"Failed while loading data from data set {self!s}.\n{exc!s}"
+                message = f"Failed while loading data from dataset {self!s}.\n{exc!s}"
                raise DatasetError(message) from exc

        load.__annotations__["return"] = load_func.__annotations__.get("return")
@@ -271,7 +271,7 @@ def save(self: Self, data: _DI) -> None:
            except (DatasetError, FileNotFoundError, NotADirectoryError):
                raise
            except Exception as exc:
-                message = f"Failed while saving data to data set {self!s}.\n{exc!s}"
+                message = f"Failed while saving data to dataset {self!s}.\n{exc!s}"
                raise DatasetError(message) from exc

        save.__annotations__["data"] = save_func.__annotations__.get("data", Any)
@@ -377,7 +377,7 @@ def _describe(self) -> dict[str, Any]:
        )

    def exists(self) -> bool:
-        """Checks whether a data set's output already exists by calling
+        """Checks whether a dataset's output already exists by calling
        the provided _exists() method.

        Returns:
@@ -391,7 +391,7 @@ def exists(self) -> bool:
self._logger.debug("Checking whether target of %s exists", str(self))
return self._exists()
except Exception as exc:
message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _exists(self) -> bool:
@@ -412,7 +412,7 @@ def release(self) -> None:
self._logger.debug("Releasing %s", str(self))
self._release()
except Exception as exc:
message = f"Failed during release for data set {self!s}.\n{exc!s}"
message = f"Failed during release for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _release(self) -> None:
@@ -438,7 +438,7 @@ def generate_timestamp() -> str:

class Version(namedtuple("Version", ["load", "save"])):
    """This namedtuple is used to provide load and save versions for versioned
-    data sets. If ``Version.load`` is None, then the latest available version
+    datasets. If ``Version.load`` is None, then the latest available version
    is loaded. If ``Version.save`` is None, then save version is formatted as
    YYYY-MM-DDThh.mm.ss.sssZ of the current timestamp.
    """
@@ -450,7 +450,7 @@ class Version(namedtuple("Version", ["load", "save"])):
"Save version '{}' did not match load version '{}' for {}. This is strongly "
"discouraged due to inconsistencies it may cause between 'save' and "
"'load' operations. Please refrain from setting exact load version for "
"intermediate data sets where possible to avoid this warning."
"intermediate datasets where possible to avoid this warning."
)

_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
@@ -467,10 +467,10 @@ def parse_dataset_definition(
        config: Data set config dictionary. It *must* contain the `type` key
            with fully qualified class name or the class object.
        load_version: Version string to be used for ``load`` operation if
-            the data set is versioned. Has no effect on the data set
+            the dataset is versioned. Has no effect on the dataset
            if versioning was not enabled.
        save_version: Version string to be used for ``save`` operation if
-            the data set is versioned. Has no effect on the data set
+            the dataset is versioned. Has no effect on the dataset
            if versioning was not enabled.

    Raises:
@@ -522,14 +522,14 @@
    if not issubclass(class_obj, AbstractDataset):
        raise DatasetError(
            f"Dataset type '{class_obj.__module__}.{class_obj.__qualname__}' "
-            f"is invalid: all data set types must extend 'AbstractDataset'."
+            f"is invalid: all dataset types must extend 'AbstractDataset'."
        )

    if VERSION_KEY in config:
        # remove "version" key so that it's not passed
-        # to the "unversioned" data set constructor
+        # to the "unversioned" dataset constructor
        message = (
-            "'%s' attribute removed from data set configuration since it is a "
+            "'%s' attribute removed from dataset configuration since it is a "
            "reserved word and cannot be directly specified"
        )
        logging.getLogger(__name__).warning(message, VERSION_KEY)
@@ -579,10 +579,10 @@ def _local_exists(local_filepath: str) -> bool:  # SKIP_IF_NO_SPARK

class AbstractVersionedDataset(AbstractDataset[_DI, _DO], abc.ABC):
    """
-    ``AbstractVersionedDataset`` is the base class for all versioned data set
+    ``AbstractVersionedDataset`` is the base class for all versioned dataset
    implementations.

-    All data sets that implement versioning should extend this
+    All datasets that implement versioning should extend this
    abstract class and implement the methods marked as abstract.

    Example:
@@ -764,7 +764,7 @@ def save(self: Self, data: _DI) -> None:
        return save

    def exists(self) -> bool:
-        """Checks whether a data set's output already exists by calling
+        """Checks whether a dataset's output already exists by calling
        the provided _exists() method.

        Returns:
@@ -780,7 +780,7 @@ def exists(self) -> bool:
        except VersionNotFoundError:
            return False
        except Exception as exc:  # SKIP_IF_NO_SPARK
-            message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
+            message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
            raise DatasetError(message) from exc

    def _release(self) -> None:
@@ -938,7 +938,7 @@ def add_feed_dict(self, datasets: dict[str, Any], replace: bool = False) -> None
        ...

    def exists(self, name: str) -> bool:
-        """Checks whether registered data set exists by calling its `exists()` method."""
+        """Checks whether registered dataset exists by calling its `exists()` method."""
        ...

    def release(self, name: str) -> None: