Replace any remaining instances of DataSet in docs
deepyaman committed Aug 18, 2023
1 parent dba7503 commit 76732ea
Showing 7 changed files with 16 additions and 16 deletions.
10 changes: 5 additions & 5 deletions docs/source/data/advanced_data_catalog_usage.md
@@ -55,7 +55,7 @@ gear = cars["gear"].values
The following steps happened behind the scenes when `load` was called:

- The value `cars` was located in the Data Catalog
-- The corresponding `AbstractDataSet` object was retrieved
+- The corresponding `AbstractDataset` object was retrieved
- The `load` method of this dataset was called
- This `load` method delegated the loading to the underlying pandas `read_csv` function
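
For illustration, a minimal sketch of that flow (assuming a catalog whose `cars` entry is backed by a pandas `CSVDataSet`; the file path is illustrative):

```python
from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataSet

# Register a dataset under the name "cars" (the path is an assumption).
io = DataCatalog({"cars": CSVDataSet(filepath="data/01_raw/company/cars.csv")})

# `load` looks up "cars" in the catalog, retrieves the corresponding
# AbstractDataset object, and calls its `load` method, which delegates
# to pandas `read_csv`.
cars = io.load("cars")
gear = cars["gear"].values
```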

@@ -70,9 +70,9 @@ This pattern is not recommended unless you are using platform notebook environme
To save data using an API similar to that used to load data:

```python
-from kedro.io import MemoryDataSet
+from kedro.io import MemoryDataset

-memory = MemoryDataSet(data=None)
+memory = MemoryDataset(data=None)
io.add("cars_cache", memory)
io.save("cars_cache", "Memory can store anything.")
io.load("cars_cache")
@@ -190,7 +190,7 @@ io.save("test_data_set", data1)
reloaded = io.load("test_data_set")
assert data1.equals(reloaded)

-# raises DataSetError since the path
+# raises DatasetError since the path
# data/01_raw/test.csv/my_exact_version/test.csv already exists
io.save("test_data_set", data2)
```
@@ -219,7 +219,7 @@ io = DataCatalog({"test_data_set": test_data_set})

io.save("test_data_set", data1) # emits a UserWarning due to version inconsistency

-# raises DataSetError since the data/01_raw/test.csv/exact_load_version/test.csv
+# raises DatasetError since the data/01_raw/test.csv/exact_load_version/test.csv
# file does not exist
reloaded = io.load("test_data_set")
```
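
For context, a sketch of a setup that produces both messages above (assuming the pinned `exact_load_version` was never actually written to disk):

```python
from kedro.io import DataCatalog, Version
from kedro_datasets.pandas import CSVDataSet

# Pin an exact load version while letting save versions be auto-generated;
# saving then warns about the inconsistency, and loading fails because no
# data was ever saved under the pinned version.
version = Version(load="exact_load_version", save=None)
test_data_set = CSVDataSet(filepath="data/01_raw/test.csv", version=version)
io = DataCatalog({"test_data_set": test_data_set})
```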
4 changes: 2 additions & 2 deletions docs/source/data/data_catalog.md
@@ -145,9 +145,9 @@ kedro run --load-version=cars:YYYY-MM-DDThh.mm.ss.sssZ
```
where `--load-version` is the dataset name and version timestamp separated by `:`.

-A dataset offers versioning support if it extends the [`AbstractVersionedDataSet`](/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` methods to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.
+A dataset offers versioning support if it extends the [`AbstractVersionedDataset`](/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` methods to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.

-To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataSet`. For instance, if you encounter a class like `CSVDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
+To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataset`. For instance, if you encounter a class like `CSVDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
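
The same check can be done programmatically; a sketch (assuming the deprecated `*DataSet` names remain subclasses or aliases of the renamed classes):

```python
from kedro.io import AbstractVersionedDataset
from kedro_datasets.pandas import CSVDataSet

# True when the dataset class supports versioning.
print(issubclass(CSVDataSet, AbstractVersionedDataset))
```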

```{note}
Note that HTTP(S) is a supported file system in the dataset implementations, but if you use it, you can't also use versioning.
6 changes: 3 additions & 3 deletions docs/source/data/data_catalog_yaml_examples.md
@@ -397,12 +397,12 @@ for loading, so the first node outputs a `pyspark.sql.DataFrame`, while the seco

You can use the [`kedro catalog create` command to create a Data Catalog YAML configuration](../development/commands_reference.md#create-a-data-catalog-yaml-configuration-file).

-This creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataSet` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`.
+This creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataset` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`.

```yaml
# <conf_root>/<env>/catalog/<pipeline_name>.yml
rockets:
-  type: MemoryDataSet
+  type: MemoryDataset
scooters:
-  type: MemoryDataSet
+  type: MemoryDataset
```
6 changes: 3 additions & 3 deletions docs/source/data/how_to_create_a_custom_dataset.md
@@ -2,9 +2,9 @@

[Kedro supports many datasets](/kedro_datasets) out of the box, but you may find that you need to create a custom dataset. For example, you may need to handle a proprietary data format or filesystem in your pipeline, or perhaps you have found a particular use case for a dataset that Kedro does not support. This tutorial explains how to create a custom dataset to read and save image data.

-## AbstractDataSet
+## AbstractDataset

-For contributors, if you would like to submit a new dataset, you must extend the [`AbstractDataSet` interface](/kedro.io.AbstractDataset) or the [`AbstractVersionedDataSet` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` methods and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataSet` implementation.
+For contributors, if you would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/kedro.io.AbstractDataset) or the [`AbstractVersionedDataset` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` methods and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
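
A minimal sketch of that contract (illustrative only; the tutorial below builds a fuller implementation):

```python
from typing import Any, Dict

import numpy as np
from PIL import Image

from kedro.io import AbstractDataset


class ImageDataset(AbstractDataset[np.ndarray, np.ndarray]):
    """Loads and saves image files as NumPy arrays (a sketch)."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> np.ndarray:
        # Delegate the actual reading to Pillow.
        return np.asarray(Image.open(self._filepath))

    def _save(self, data: np.ndarray) -> None:
        Image.fromarray(data).save(self._filepath)

    def _describe(self) -> Dict[str, Any]:
        # Used when logging information about instances of this dataset.
        return {"filepath": self._filepath}
```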


## Scenario
@@ -309,7 +309,7 @@ Versioning doesn't work with `PartitionedDataset`. You can't use both of them at
```

To add versioning support to the new dataset we need to extend the
-[AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataset) to:
+[AbstractVersionedDataset](/kedro.io.AbstractVersionedDataset) to:

* Accept a `version` keyword argument as part of the constructor
* Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
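
A sketch of both steps applied to the image dataset above (assumed names; `_get_load_path` and `_get_save_path` come from the base class):

```python
from pathlib import PurePosixPath
from typing import Any, Dict

import numpy as np
from PIL import Image

from kedro.io import AbstractVersionedDataset, Version


class ImageDataset(AbstractVersionedDataset[np.ndarray, np.ndarray]):
    def __init__(self, filepath: str, version: Version = None):
        # Accept a `version` keyword argument and hand it to the base class.
        super().__init__(filepath=PurePosixPath(filepath), version=version)

    def _load(self) -> np.ndarray:
        # Use the resolved versioned path rather than self._filepath directly.
        return np.asarray(Image.open(self._get_load_path()))

    def _save(self, data: np.ndarray) -> None:
        Image.fromarray(data).save(self._get_save_path())

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": self._filepath, "version": self._version}
```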
2 changes: 1 addition & 1 deletion docs/source/data/kedro_dataset_factories.md
@@ -215,7 +215,7 @@ The matches are ranked according to the following criteria:

## How to override the default dataset creation with dataset factories

-You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataSet`](/kedro.io.MemoryDataset) creation.
+You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataset`](/kedro.io.MemoryDataset) creation.

```yaml
"{default_dataset}":
2 changes: 1 addition & 1 deletion docs/source/data/partitioned_and_incremental_datasets.md
@@ -15,7 +15,7 @@ This is why Kedro provides a built-in [PartitionedDataset](/kedro.io.Partitioned
In this section, each individual file inside a given location is called a partition.
```

-### How to use `PartitionedDataSet`
+### How to use `PartitionedDataset`

You can use a `PartitionedDataset` in the `catalog.yml` file like any other regular dataset definition:
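
For illustration, a sketch of the same dataset constructed directly in Python (the folder path and underlying CSV dataset are assumptions):

```python
from kedro.io import PartitionedDataset
from kedro_datasets.pandas import CSVDataSet

# One file per partition under this folder (the path is an assumption).
shipments = PartitionedDataset(path="data/01_raw/shipments/", dataset=CSVDataSet)

# `load` returns a dict mapping each partition id to a lazy load function.
for partition_id, load_partition in shipments.load().items():
    df = load_partition()
```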

2 changes: 1 addition & 1 deletion docs/source/deployment/argo.md
@@ -24,7 +24,7 @@ To use Argo Workflows, ensure you have the following prerequisites in place:
- [Argo Workflows is installed](https://github.com/argoproj/argo/blob/master/README.md#quickstart) on your Kubernetes cluster
- [Argo CLI is installed](https://github.com/argoproj/argo/releases) on your machine
- A `name` attribute is set for each [Kedro node](/kedro.pipeline.node) since it is used to build a DAG
-- [All node input/output DataSets must be configured in `catalog.yml`](../data/data_catalog_yaml_examples.md) and refer to an external location (e.g. AWS S3); you cannot use the `MemoryDataset` in your workflow
+- [All node input/output datasets must be configured in `catalog.yml`](../data/data_catalog_yaml_examples.md) and refer to an external location (e.g. AWS S3); you cannot use the `MemoryDataset` in your workflow

```{note}
Each node will run in its own container.
