Replace all instances of "data set" with "dataset" (#4211)
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
deepyaman authored and ankatiyar committed Oct 16, 2024
1 parent e863f16 commit 2ccba38
Showing 24 changed files with 146 additions and 146 deletions.
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -200,7 +200,7 @@ cars:

In this example, `filepath` is used as the basis of a folder that stores versions of the `cars` dataset. Each time a new version is created by a pipeline run it is stored within `data/01_raw/company/cars.csv/<version>/cars.csv`, where `<version>` corresponds to a version string formatted as `YYYY-MM-DDThh.mm.ss.sssZ`.

By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned data set with `--load-version` flag as follows:
By default, `kedro run` loads the latest version of the dataset. However, you can also specify a particular versioned dataset with `--load-version` flag as follows:

```bash
kedro run --load-versions=cars:YYYY-MM-DDThh.mm.ss.sssZ
```
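
For context, a pinned version can also be loaded programmatically. A minimal sketch, assuming `kedro-datasets` with pandas support is installed; the version string is illustrative:

```python
from kedro.io import Version
from kedro_datasets.pandas import CSVDataset

# Pin the load version; leave save as None so Kedro generates a new
# timestamped version on the next save. The version string is illustrative.
cars = CSVDataset(
    filepath="data/01_raw/company/cars.csv",
    version=Version(load="2024-10-16T12.00.00.000Z", save=None),
)
df = cars.load()
```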
2 changes: 1 addition & 1 deletion docs/source/integrations/mlflow.md
@@ -134,7 +134,7 @@ and you would be able to preview it in the MLflow web UI:
```

:::{warning}
If you get a `Failed while saving data to data set MlflowMatplotlibWriter` error,
If you get a `Failed while saving data to dataset MlflowMatplotlibWriter` error,
it's probably because you had already executed `kedro run` while the dataset was marked as `versioned: true`.
The solution is to cleanup the old `data/08_reporting/dummy_confusion_matrix.png` directory.
:::
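
A minimal sketch of the cleanup step the warning describes, using the path from the snippet above:

```python
import shutil
from pathlib import Path

# Versioning turns the configured filepath into a directory of versions;
# removing the stale directory lets the unversioned writer save again.
target = Path("data/08_reporting/dummy_confusion_matrix.png")
if target.is_dir():
    shutil.rmtree(target)
```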
6 changes: 3 additions & 3 deletions docs/source/nodes_and_pipelines/run_a_pipeline.md
@@ -70,13 +70,13 @@ class DryRunner(AbstractRunner):
"""

def create_default_dataset(self, ds_name: str) -> AbstractDataset:
"""Factory method for creating the default data set for the runner.
"""Factory method for creating the default dataset for the runner.
Args:
ds_name: Name of the missing data set
ds_name: Name of the missing dataset
Returns:
An instance of an implementation of AbstractDataset to be used
for all unregistered data sets.
for all unregistered datasets.
"""
return MemoryDataset()
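
As a quick illustration of the default dataset returned here, `MemoryDataset` simply holds the object in memory — a minimal sketch:

```python
from kedro.io import MemoryDataset

# MemoryDataset keeps the data in memory; no filepath is involved.
ds = MemoryDataset()
ds.save({"rows": 3})
print(ds.load())    # {'rows': 3}
print(ds.exists())  # True once data has been saved
```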
6 changes: 3 additions & 3 deletions docs/source/tutorial/spaceflights_tutorial_faqs.md
@@ -7,11 +7,11 @@ If you can't find the answer you need here, [ask the Kedro community for help](h
## How do I resolve these common errors?

### Dataset errors
#### DatasetError: Failed while loading data from data set
#### DatasetError: Failed while loading data from dataset
You're [testing whether Kedro can load the raw test data](./set_up_data.md#test-that-kedro-can-load-the-data) and see the following:

```python
DatasetError: Failed while loading data from data set
DatasetError: Failed while loading data from dataset
CSVDataset(filepath=...).
[Errno 2] No such file or directory: '.../companies.csv'
```
@@ -71,6 +71,6 @@ The above exception was the direct cause of the following exception:
Traceback (most recent call last):
...
raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed while loading data from data set CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
kedro.io.core.DatasetError: Failed while loading data from dataset CSVDataset(filepath=data/03_primary/model_input_table.csv, save_args={'index': False}).
[Errno 2] File b'data/03_primary/model_input_table.csv' does not exist: b'data/03_primary/model_input_table.csv'
```
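
A hedged troubleshooting sketch for the error above. It assumes the standard spaceflights layout, where the raw file typically lives at `data/01_raw/companies.csv` (the traceback elides the full path):

```python
from pathlib import Path

import pandas as pd

# If the file is missing, the DatasetError above is expected; move or download
# companies.csv into place, then retry the catalog load.
path = Path("data/01_raw/companies.csv")
print(path.exists())
if path.exists():
    print(pd.read_csv(path).head())
```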
@@ -1,11 +1,11 @@
# Here you can define all your data sets by using simple YAML syntax.
# Here you can define all your datasets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://docs.kedro.org/en/stable/data/data_catalog.html
#
# We support interacting with a variety of data stores including local file systems, cloud, network and HDFS
#
# An example data set definition can look as follows:
# An example dataset definition can look as follows:
#
#bikes:
# type: pandas.CSVDataset
@@ -39,7 +39,7 @@
# (transcoding), templating and a way to reuse arguments that are frequently repeated. See more here:
# https://docs.kedro.org/en/stable/data/data_catalog.html
#
# This is a data set used by the "Hello World" example pipeline provided with the project
# This is a dataset used by the "Hello World" example pipeline provided with the project
# template. Please feel free to remove it once you remove the example pipeline.

example_iris_data:
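
The same kind of entry can be handed to the catalog programmatically. A minimal sketch, assuming `kedro-datasets` with pandas support is installed; the filepath is illustrative:

```python
from kedro.io import DataCatalog

# Build a catalog from a catalog.yml-style dict and load one dataset from it.
catalog = DataCatalog.from_config(
    {
        "example_iris_data": {
            "type": "pandas.CSVDataset",
            "filepath": "data/01_raw/iris.csv",
        }
    }
)
iris = catalog.load("example_iris_data")
```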
@@ -1,4 +1,4 @@
# Here you can define credentials for different data sets and environment.
# Here you can define credentials for different datasets and environment.
#
#
# Example:
Expand Up @@ -11,7 +11,7 @@


def split_data(data: pd.DataFrame, example_test_data_ratio: float) -> dict[str, Any]:
"""Node for splitting the classical Iris data set into training and test
"""Node for splitting the classical Iris dataset into training and test
sets, each split into features and labels.
The split ratio parameter is taken from conf/project/parameters.yml.
The data and the parameters will be loaded and provided to your function
2 changes: 1 addition & 1 deletion kedro/io/__init__.py
@@ -1,5 +1,5 @@
"""``kedro.io`` provides functionality to read and write to a
number of data sets. At the core of the library is the ``AbstractDataset`` class.
number of datasets. At the core of the library is the ``AbstractDataset`` class.
"""

from __future__ import annotations
2 changes: 1 addition & 1 deletion kedro/io/catalog_config_resolver.py
Expand Up @@ -90,7 +90,7 @@ def _fetch_credentials(credentials_name: str, credentials: dict[str, Any]) -> An
The set of requested credentials.
Raises:
KeyError: When a data set with the given name has not yet been
KeyError: When a dataset with the given name has not yet been
registered.
"""
54 changes: 27 additions & 27 deletions kedro/io/core.py
@@ -71,23 +71,23 @@ class DatasetError(Exception):

class DatasetNotFoundError(DatasetError):
"""``DatasetNotFoundError`` raised by ``DataCatalog`` class in case of
trying to use a non-existing data set.
trying to use a non-existing dataset.
"""

pass


class DatasetAlreadyExistsError(DatasetError):
"""``DatasetAlreadyExistsError`` raised by ``DataCatalog`` class in case
of trying to add a data set which already exists in the ``DataCatalog``.
of trying to add a dataset which already exists in the ``DataCatalog``.
"""

pass


class VersionNotFoundError(DatasetError):
"""``VersionNotFoundError`` raised by ``AbstractVersionedDataset`` implementations
in case of no load versions available for the data set.
in case of no load versions available for the dataset.
"""

pass
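
A minimal sketch of when the first of these errors surfaces — requesting a dataset that was never registered in the catalog:

```python
from kedro.io import DataCatalog, DatasetNotFoundError

catalog = DataCatalog()  # empty catalog, nothing registered

try:
    catalog.load("not_registered")
except DatasetNotFoundError as err:
    print(err)
```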
@@ -98,9 +98,9 @@ class VersionNotFoundError(DatasetError):


class AbstractDataset(abc.ABC, Generic[_DI, _DO]):
"""``AbstractDataset`` is the base class for all data set implementations.
"""``AbstractDataset`` is the base class for all dataset implementations.
All data set implementations should extend this abstract class
All dataset implementations should extend this abstract class
and implement the methods marked as abstract.
If a specific dataset implementation cannot be used in conjunction with
the ``ParallelRunner``, such user-defined dataset should have the
@@ -156,23 +156,23 @@ def from_config(
load_version: str | None = None,
save_version: str | None = None,
) -> AbstractDataset:
"""Create a data set instance using the configuration provided.
"""Create a dataset instance using the configuration provided.
Args:
name: Data set name.
config: Data set config dictionary.
load_version: Version string to be used for ``load`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
save_version: Version string to be used for ``save`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
Returns:
An instance of an ``AbstractDataset`` subclass.
Raises:
DatasetError: When the function fails to create the data set
DatasetError: When the function fails to create the dataset
from its config.
"""
@@ -245,9 +245,9 @@ def load(self: Self) -> _DO:
except DatasetError:
raise
except Exception as exc:
# This exception handling is by design as the composed data sets
# This exception handling is by design as the composed datasets
# can throw any type of exception.
message = f"Failed while loading data from data set {self!s}.\n{exc!s}"
message = f"Failed while loading data from dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

load.__annotations__["return"] = load_func.__annotations__.get("return")
@@ -271,7 +271,7 @@ def save(self: Self, data: _DI) -> None:
except (DatasetError, FileNotFoundError, NotADirectoryError):
raise
except Exception as exc:
message = f"Failed while saving data to data set {self!s}.\n{exc!s}"
message = f"Failed while saving data to dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

save.__annotations__["data"] = save_func.__annotations__.get("data", Any)
@@ -377,7 +377,7 @@ def _describe(self) -> dict[str, Any]:
)

def exists(self) -> bool:
"""Checks whether a data set's output already exists by calling
"""Checks whether a dataset's output already exists by calling
the provided _exists() method.
Returns:
@@ -391,7 +391,7 @@ def exists(self) -> bool:
self._logger.debug("Checking whether target of %s exists", str(self))
return self._exists()
except Exception as exc:
message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _exists(self) -> bool:
@@ -412,7 +412,7 @@ def release(self) -> None:
self._logger.debug("Releasing %s", str(self))
self._release()
except Exception as exc:
message = f"Failed during release for data set {self!s}.\n{exc!s}"
message = f"Failed during release for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _release(self) -> None:
Expand All @@ -438,7 +438,7 @@ def generate_timestamp() -> str:

class Version(namedtuple("Version", ["load", "save"])):
"""This namedtuple is used to provide load and save versions for versioned
data sets. If ``Version.load`` is None, then the latest available version
datasets. If ``Version.load`` is None, then the latest available version
is loaded. If ``Version.save`` is None, then save version is formatted as
YYYY-MM-DDThh.mm.ss.sssZ of the current timestamp.
"""
@@ -450,7 +450,7 @@ class Version(namedtuple("Version", ["load", "save"])):
"Save version '{}' did not match load version '{}' for {}. This is strongly "
"discouraged due to inconsistencies it may cause between 'save' and "
"'load' operations. Please refrain from setting exact load version for "
"intermediate data sets where possible to avoid this warning."
"intermediate datasets where possible to avoid this warning."
)

_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
@@ -467,10 +467,10 @@ def parse_dataset_definition(
config: Data set config dictionary. It *must* contain the `type` key
with fully qualified class name or the class object.
load_version: Version string to be used for ``load`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
save_version: Version string to be used for ``save`` operation if
the data set is versioned. Has no effect on the data set
the dataset is versioned. Has no effect on the dataset
if versioning was not enabled.
Raises:
@@ -522,14 +522,14 @@ def parse_dataset_definition(
if not issubclass(class_obj, AbstractDataset):
raise DatasetError(
f"Dataset type '{class_obj.__module__}.{class_obj.__qualname__}' "
f"is invalid: all data set types must extend 'AbstractDataset'."
f"is invalid: all dataset types must extend 'AbstractDataset'."
)

if VERSION_KEY in config:
# remove "version" key so that it's not passed
# to the "unversioned" data set constructor
# to the "unversioned" dataset constructor
message = (
"'%s' attribute removed from data set configuration since it is a "
"'%s' attribute removed from dataset configuration since it is a "
"reserved word and cannot be directly specified"
)
logging.getLogger(__name__).warning(message, VERSION_KEY)
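
A hedged sketch of `parse_dataset_definition` resolving a catalog-style config into a dataset class plus constructor kwargs; the type and filepath are illustrative, and `kedro-datasets` must be installed for the lookup to succeed:

```python
from kedro.io.core import parse_dataset_definition

ds_class, ds_config = parse_dataset_definition(
    {"type": "pandas.CSVDataset", "filepath": "data/01_raw/cars.csv"}
)
print(ds_class.__name__)  # CSVDataset
print(ds_config)          # remaining kwargs, e.g. {'filepath': 'data/01_raw/cars.csv'}
```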
@@ -579,10 +579,10 @@ def _local_exists(local_filepath: str) -> bool:  # SKIP_IF_NO_SPARK

class AbstractVersionedDataset(AbstractDataset[_DI, _DO], abc.ABC):
"""
``AbstractVersionedDataset`` is the base class for all versioned data set
``AbstractVersionedDataset`` is the base class for all versioned dataset
implementations.
All data sets that implement versioning should extend this
All datasets that implement versioning should extend this
abstract class and implement the methods marked as abstract.
Example:
@@ -764,7 +764,7 @@ def save(self: Self, data: _DI) -> None:
return save

def exists(self) -> bool:
"""Checks whether a data set's output already exists by calling
"""Checks whether a dataset's output already exists by calling
the provided _exists() method.
Returns:
@@ -780,7 +780,7 @@ def exists(self) -> bool:
except VersionNotFoundError:
return False
except Exception as exc: # SKIP_IF_NO_SPARK
message = f"Failed during exists check for data set {self!s}.\n{exc!s}"
message = f"Failed during exists check for dataset {self!s}.\n{exc!s}"
raise DatasetError(message) from exc

def _release(self) -> None:
@@ -938,7 +938,7 @@ def add_feed_dict(self, datasets: dict[str, Any], replace: bool = False) -> None
...

def exists(self, name: str) -> bool:
"""Checks whether registered data set exists by calling its `exists()` method."""
"""Checks whether registered dataset exists by calling its `exists()` method."""
...

def release(self, name: str) -> None:
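
A minimal sketch of the `exists` check described above, using an in-memory dataset so it runs without any files:

```python
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog({"example": MemoryDataset(data=[1, 2, 3])})
print(catalog.exists("example"))  # True: the MemoryDataset already holds data
print(catalog.exists("missing"))  # False: nothing registered under that name
```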