Replace all instances of "data set" with "dataset" (#4211)
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
deepyaman authored and ankatiyar committed Oct 16, 2024
1 parent e863f16 commit 2ccba38
Showing 24 changed files with 146 additions and 146 deletions.
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -200,7 +200,7 @@ cars:

In this example, `filepath` is used as the basis of a folder that stores versions of the `cars` dataset. Each time a new version is created by a pipeline run it is stored within `data/01_raw/company/cars.csv/<version>/cars.csv`, where `<version>` corresponds to a version string formatted as `YYYY-MM-DDThh.mm.ss.sssZ`.

By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned data set with `--load-version` flag as follows:
By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned dataset with `--load-version` flag as follows:

```bash
kedro run --load-versions=cars:YYYY-MM-DDThh.mm.ss.sssZ
```
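
For context, a pinned version can also be loaded programmatically. A minimal sketch, assuming `kedro-datasets` with pandas support is installed; the version string is illustrative:

```python
from kedro.io import Version
from kedro_datasets.pandas import CSVDataset

# Pin the load version; leave save as None so Kedro generates a new
# timestamped version on the next save. The version string is illustrative.
cars = CSVDataset(
    filepath="data/01_raw/company/cars.csv",
    version=Version(load="2024-10-16T12.00.00.000Z", save=None),
)
df = cars.load()
```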
2 changes: 1 addition & 1 deletion docs/source/integrations/mlflow.md
@@ -134,7 +134,7 @@ and you would be able to preview it in the MLflow web UI:
```

:::{warning}
If you get a `Failed while saving data to data set MlflowMatplotlibWriter` error,
If you get a `Failed while saving data to dataset MlflowMatplotlibWriter` error,
it's probably because you had already executed `kedro run` while the dataset was marked as `versioned: true`.
The solution is to cleanup the old `data/08_reporting/dummy_confusion_matrix.png` directory.
:::
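
A minimal sketch of the cleanup step the warning describes, using the path from the snippet above:

```python
import shutil
from pathlib import Path

# Versioning turns the configured filepath into a directory of versions;
# removing the stale directory lets the unversioned writer save again.
target = Path("data/08_reporting/dummy_confusion_matrix.png")
if target.is_dir():
    shutil.rmtree(target)
```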
6 changes: 3 additions & 3 deletions docs/source/nodes_and_pipelines/run_a_pipeline.md
@@ -70,13 +70,13 @@ class DryRunner(AbstractRunner):
"""

def create_default_dataset(self, ds_name: str) -> AbstractDataset:
"""Factory method for creating the default data set for the runner.
"""Factory method for creating the default dataset for the runner.
Args:
ds_name: Name of the missing data set
ds_name: Name of the missing dataset
Returns:
An instance of an implementation of AbstractDataset to be used
for all unregistered data sets.
for all unregistered datasets.
"""
return MemoryDataset()
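
As a quick illustration of the default dataset returned here, `MemoryDataset` simply holds the object in memory — a minimal sketch:

```python
from kedro.io import MemoryDataset

# MemoryDataset keeps the data in memory; no filepath is involved.
ds = MemoryDataset()
ds.save({"rows": 3})
print(ds.load())    # {'rows': 3}
print(ds.exists())  # True once data has been saved
```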
6 changes: 3 additions & 3 deletions docs/source/tutorial/spaceflights_tutorial_faqs.md
@@ -7,11 +7,11 @@ If you can't find the answer you need here, [ask the Kedro community for help](h
## How do I resolve these common errors?

### Dataset errors
#### DatasetError: Failed while loading data from data set
#### DatasetError: Failed while loading data from dataset
You're [testing whether Kedro can load the raw test data](./set_up_data.md#test-that-kedro-can-load-the-data) and see the following:

```python
DatasetError: Failed while loading data from data set
DatasetError: Failed while loading data from dataset
CSVDataset(filepath=...).
[Errno 2] No such file or directory: '.../companies.csv'
```
@@ -71,6 +71,6 @@ The above exception was the direct cause of the following exception:
Traceback (most recent call last):
...
raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
kedro.io.core.DatasetError: Failed while loading data from dataset CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
[Errno 2] File b'data/03_primary/model_input_table.csv' does not exist: b'data/03_primary/model_input_table.csv'
```
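
A hedged troubleshooting sketch for the error above. It assumes the standard spaceflights layout, where the raw file typically lives at `data/01_raw/companies.csv` (the traceback elides the full path):

```python
from pathlib import Path

import pandas as pd

# If the file is missing, the DatasetError above is expected; move or download
# companies.csv into place, then retry the catalog load.
path = Path("data/01_raw/companies.csv")
print(path.exists())
if path.exists():
    print(pd.read_csv(path).head())
```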
@@ -1,11 +1,11 @@
# Here you can define all your data sets by using simple YAML syntax.
# Here you can define all your datasets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://docs.kedro.org/en/stable/data/data_catalog.html
#
# We support interacting with a variety of data stores including local file systems, cloud, network and HDFS
#
# An example data set definition can look as follows:
# An example dataset definition can look as follows:
#
#bikes:
# type: pandas.CSVDataset
@@ -39,7 +39,7 @@
# (transcoding), templating and a way to reuse arguments that are frequently repeated. See more here:
# https://docs.kedro.org/en/stable/data/data_catalog.html
#
# This is a data set used by the "Hello World" example pipeline provided with the project
# This is a dataset used by the "Hello World" example pipeline provided with the project
# template. Please feel free to remove it once you remove the example pipeline.

example_iris_data:
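
The same kind of entry can be handed to the catalog programmatically. A minimal sketch, assuming `kedro-datasets` with pandas support is installed; the filepath is illustrative:

```python
from kedro.io import DataCatalog

# Build a catalog from a catalog.yml-style dict and load one dataset from it.
catalog = DataCatalog.from_config(
    {
        "example_iris_data": {
            "type": "pandas.CSVDataset",
            "filepath": "data/01_raw/iris.csv",
        }
    }
)
iris = catalog.load("example_iris_data")
```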
@@ -1,4 +1,4 @@
# Here you can define credentials for different data sets and environment.
# Here you can define credentials for different datasets and environment.
#
#
# Example:
Expand Up @@ -11,7 +11,7 @@


def split_data(data: pd.DataFrame, example_test_data_ratio: float) -> dict[str, Any]:
"""Node for splitting the classical Iris data set into training and test
"""Node for splitting the classical Iris dataset into training and test
sets, each split into features and labels.
The split ratio parameter is taken from conf/project/parameters.yml.
The data and the parameters will be loaded and provided to your function
2 changes: 1 addition & 1 deletion kedro/io/__init__.py
@@ -1,5 +1,5 @@
"""``kedro.io`` provides functionality to read and write to a
number of data sets. At the core of the library is the ``AbstractDataset`` class.
number of datasets. At the core of the library is the ``AbstractDataset`` class.
"""

from __future__ import annotations
2 changes: 1 addition & 1 deletion kedro/io/catalog_config_resolver.py
Expand Up @@ -90,7 +90,7 @@ def _fetch_credentials(credentials_name: str, credentials: dict[str, Any]) -> An
The set of requested credentials.
Raises:
KeyError: When a data set with the given name has not yet been
KeyError: When a dataset with the given name has not yet been
registered.
"""
54 changes: 27 additions & 27 deletions kedro/io/core.py
@@ -71,23 +71,23 @@ class DatasetError(Exception):

class DatasetNotFoundError(DatasetError):
"""``DatasetNotFoundError`` raised by ``DataCatalog`` class in case of
trying to use a non-existing data set.
trying to use a non-existing dataset.
"""

pass


class DatasetAlreadyExistsError(DatasetError):
"""``DatasetAlreadyExistsError`` raised by ``DataCatalog`` class in case
of trying to add a data set which already exists in the ``DataCatalog``.
of trying to add a dataset which already exists in the ``DataCatalog``.
"""

pass


class VersionNotFoundError(DatasetError):
"""``VersionNotFoundError`` raised by ``AbstractVersionedDataset`` implementations
in case of no load versions available for the data set.
in case of no load versions available for the dataset.
"""

pass
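
A minimal sketch of when the first of these errors surfaces — requesting a dataset that was never registered in the catalog:

```python
from kedro.io import DataCatalog, DatasetNotFoundError

catalog = DataCatalog()  # empty catalog, nothing registered

try:
    catalog.load("not_registered")
except DatasetNotFoundError as err:
    print(err)
```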
@@ -98,9 +98,9 @@ class VersionNotFoundError(DatasetError):


class AbstractDataset(abc.ABC, Generic[_DI, _DO]):
"""``AbstractDataset`` is the base class for all data set implementations.
"""``AbstractDataset`` is the base class for all dataset implementations.
All data set implementations should extend this abstract class
All dataset implementations should extend this abstract class
and implement the methods marked as abstract.
If a specific dataset implementation cannot be used in conjunction with
the ``ParallelRunner``, such user-defined dataset should have the
@@ -156,23 +156,23 @@ def from_config(
load_version: str | None = None,
save_version: str | None = None,
) -> AbstractDataset:
"""Create a data set instance using the configuration provided.
"""Create a dataset instance using the configuration provided.
Args:
name: Data set name.
config: Data set config dictionary.
load_version: Version string to be used for ``load`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
save_version: Version string to be used for ``save`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
Returns:
An instance of an ``AbstractDataset`` subclass.
Raises:
DatasetError: When the function fails to create the data set
DatasetError: When the function fails to create the dataset
from its config.
"""
@@ -245,9 +245,9 @@ def load(self: Self) -> _DO:
except DatasetError:
raise
except Exception as exc:
# This exception handling is by design as the composed data sets
# This exception handling is by design as the composed datasets
# can throw any type of exception.
message = f"Failed while loading data from data set {self!s}.\n{exc!s}"
message = f"Failed while loading data from dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

load.__annotations__["return"] = load_func.__annotations__.get("return")
@@ -271,7 +271,7 @@ def save(self: Self, data: _DI) -> None:
except (DatasetError, FileNotFoundError, NotADirectoryError):
raise
except Exception as exc:
message = f"Failed while saving data to data set {self!s}.\n{exc!s}"
message = f"Failed while saving data to dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

save.__annotations__["data"] = save_func.__annotations__.get("data", Any)
@@ -377,7 +377,7 @@ def _describe(self) -> dict[str, Any]:
)

def exists(self) -> bool:
"""Checks whether a data set's output already exists by calling
"""Checks whether a dataset's output already exists by calling
the provided _exists() method.
Returns:
@@ -391,7 +391,7 @@ def exists(self) -> bool:
self._logger.debug("Checking whether target of %s exists", str(self))
return self._exists()
except Exception as exc:
message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _exists(self) -> bool:
@@ -412,7 +412,7 @@ def release(self) -> None:
self._logger.debug("Releasing %s", str(self))
self._release()
except Exception as exc:
message = f"Failed during release for data set {self!s}.\n{exc!s}"
message = f"Failed during release for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _release(self) -> None:
Expand All @@ -438,7 +438,7 @@ def generate_timestamp() -> str:

class Version(namedtuple("Version", ["load", "save"])):
"""This namedtuple is used to provide load and save versions for versioned
data sets. If ``Version.load`` is None, then the latest available version
datasets. If ``Version.load`` is None, then the latest available version
is loaded. If ``Version.save`` is None, then save version is formatted as
YYYY-MM-DDThh.mm.ss.sssZ of the current timestamp.
"""
@@ -450,7 +450,7 @@ class Version(namedtuple("Version", ["load", "save"])):
"Save version '{}' did not match load version '{}' for {}. This is strongly "
"discouraged due to inconsistencies it may cause between 'save' and "
"'load' operations. Please refrain from setting exact load version for "
"intermediate data sets where possible to avoid this warning."
"intermediate datasets where possible to avoid this warning."
)

_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
@@ -467,10 +467,10 @@ def parse_dataset_definition(
config: Data set config dictionary. It *must* contain the `type` key
with fully qualified class name or the class object.
load_version: Version string to be used for ``load`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
save_version: Version string to be used for ``save`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
Raises:
@@ -522,14 +522,14 @@ def parse_dataset_definition(
if not issubclass(class_obj, AbstractDataset):
raise DatasetError(
f"Dataset type '{class_obj.__module__}.{class_obj.__qualname__}' "
f"is invalid: all data set types must extend 'AbstractDataset'."
f"is invalid: all dataset types must extend 'AbstractDataset'."
)

if VERSION_KEY in config:
# remove "version" key so that it's not passed
# to the "unversioned" data set constructor
# to the "unversioned" dataset constructor
message = (
"'%s' attribute removed from data set configuration since it is a "
"'%s' attribute removed from dataset configuration since it is a "
"reserved word and cannot be directly specified"
)
logging.getLogger(__name__).warning(message, VERSION_KEY)
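
A hedged sketch of `parse_dataset_definition` resolving a catalog-style config into a dataset class plus constructor kwargs; the type and filepath are illustrative, and `kedro-datasets` must be installed for the lookup to succeed:

```python
from kedro.io.core import parse_dataset_definition

ds_class, ds_config = parse_dataset_definition(
    {"type": "pandas.CSVDataset", "filepath": "data/01_raw/cars.csv"}
)
print(ds_class.__name__)  # CSVDataset
print(ds_config)          # remaining kwargs, e.g. {'filepath': 'data/01_raw/cars.csv'}
```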
@@ -579,10 +579,10 @@ def _local_exists(local_filepath: str) -> bool:  # SKIP_IF_NO_SPARK

class AbstractVersionedDataset(AbstractDataset[_DI, _DO], abc.ABC):
"""
``AbstractVersionedDataset`` is the base class for all versioned data set
``AbstractVersionedDataset`` is the base class for all versioned dataset
implementations.
All data sets that implement versioning should extend this
All datasets that implement versioning should extend this
abstract class and implement the methods marked as abstract.
Example:
@@ -764,7 +764,7 @@ def save(self: Self, data: _DI) -> None:
return save

def exists(self) -> bool:
"""Checks whether a data set's output already exists by calling
"""Checks whether a dataset's output already exists by calling
the provided _exists() method.
Returns:
@@ -780,7 +780,7 @@ def exists(self) -> bool:
except VersionNotFoundError:
return False
except Exception as exc: # SKIP_IF_NO_SPARK
message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _release(self) -> None:
@@ -938,7 +938,7 @@ def add_feed_dict(self, datasets: dict[str, Any], replace: bool = False) -> None
...

def exists(self, name: str) -> bool:
"""Checks whether registered data set exists by calling its `exists()` method."""
"""Checks whether registered dataset exists by calling its `exists()` method."""
...

def release(self, name: str) -> None:
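
A minimal sketch of the `exists` check described above, using an in-memory dataset so it runs without any files:

```python
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog({"example": MemoryDataset(data=[1, 2, 3])})
print(catalog.exists("example"))  # True: the MemoryDataset already holds data
print(catalog.exists("missing"))  # False: nothing registered under that name
```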