
Commit

Minor changes to create a PR and test Vale styles
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
stichbury committed Aug 24, 2023
1 parent 15bba0e commit 81a69a2
Showing 7 changed files with 7 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docs/source/data/advanced_data_catalog_usage.md
@@ -6,7 +6,7 @@ You can define a Data Catalog in two ways. Most use cases can be through a YAML

To use the `DataCatalog` API, construct a `DataCatalog` object programmatically in a file like `catalog.py`.

-In the following, we are using several pre-built data loaders documented in the [API reference documentation](/kedro_datasets).
+In the following code, we use several pre-built data loaders documented in the [API reference documentation](/kedro_datasets).

```python
from kedro.io import DataCatalog
# ...
```
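
For illustration, a `catalog.py` built this way might look like the following sketch. It assumes the `CSVDataSet` and `ParquetDataSet` loaders from `kedro_datasets.pandas`; the dataset names and file paths are made up.

```python
from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataSet, ParquetDataSet

# Register each dataset under a name; the values are pre-built data loader instances.
catalog = DataCatalog(
    {
        "companies": CSVDataSet(filepath="data/01_raw/companies.csv"),
        "model_input_table": ParquetDataSet(
            filepath="data/03_primary/model_input_table.pq"
        ),
    }
)

# Reading and writing then go through the catalog rather than direct file handling.
companies = catalog.load("companies")
catalog.save("model_input_table", companies)
```
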
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -3,7 +3,7 @@

In a Kedro project, the Data Catalog is a registry of all data sources available for use by the project. It is specified with a YAML catalog file that maps the names of node inputs and outputs as keys in the `DataCatalog` class.

-This page introduces the basic sections of `catalog.yml`, which is the file used to register data sources for a Kedro project.
+This page introduces the basic sections of `catalog.yml`, which is the file Kedro uses to register data sources for a project.

## The basics of `catalog.yml`
A separate page of [Data Catalog YAML examples](./data_catalog_yaml_examples.md) gives further examples of how to work with `catalog.yml`, but here we revisit the [basic `catalog.yml` introduced by the spaceflights tutorial](../tutorial/set_up_data.md).
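
For reference, the basic entries in that tutorial's `catalog.yml` take roughly this shape (the dataset names and file paths are illustrative):

```yaml
companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv

preprocessed_companies:
  type: pandas.ParquetDataSet
  filepath: data/02_intermediate/preprocessed_companies.pq
```

Each top-level key is the dataset name used as a node input or output, and `type` selects one of the pre-built dataset classes.
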
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog_yaml_examples.md
@@ -8,7 +8,7 @@ This page contains a set of examples to help you structure your YAML configurati

## Load data from a local binary file using `utf-8` encoding

-The `open_args_load` and `open_args_save` parameters are passed to the filesystem's `open` method to configure how a dataset file (on a specific filesystem) is opened during a load or save operation, respectively.
+The `open_args_load` and `open_args_save` parameters are passed to the filesystem's `open` method to configure how a dataset file (on a specific filesystem) is opened during a load or save operation respectively.

```yaml
test_dataset:
  # ...
```
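
A sketch of such an entry follows; the dataset name is a placeholder and `type: ...` stands in for any file-based dataset type:

```yaml
test_dataset:
  type: ...
  fs_args:
    open_args_load:
      mode: "rb"
      encoding: "utf-8"
```

Here `open_args_load` opens the underlying file in binary mode (`rb`) and passes the `utf-8` encoding argument through to `open` when the dataset is loaded; an `open_args_save` block configures saving in the same way.
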
2 changes: 1 addition & 1 deletion docs/source/data/how_to_create_a_custom_dataset.md
@@ -4,7 +4,7 @@

## AbstractDataset

-For contributors, if you would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/kedro.io.AbstractDataset) or [`AbstractVersionedDataset` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
+If you are a contributor and would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/kedro.io.AbstractDataset) or [`AbstractVersionedDataset` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.

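As a rough sketch of the shape this takes, here is a minimal custom dataset that loads and saves images with Pillow; the `ImageDataset` name, typing and file handling are illustrative rather than part of Kedro:

```python
from pathlib import PurePosixPath
from typing import Any, Dict

import numpy as np
from PIL import Image

from kedro.io import AbstractDataset


class ImageDataset(AbstractDataset[np.ndarray, np.ndarray]):
    """Loads and saves an image file as a NumPy array."""

    def __init__(self, filepath: str):
        self._filepath = PurePosixPath(filepath)

    def _load(self) -> np.ndarray:
        # Wrapped by the public `load` method, which adds uniform error handling.
        return np.asarray(Image.open(str(self._filepath)))

    def _save(self, data: np.ndarray) -> None:
        # Wrapped by the public `save` method.
        Image.fromarray(data).save(str(self._filepath))

    def _describe(self) -> Dict[str, Any]:
        # Used when Kedro logs information about this dataset instance.
        return {"filepath": str(self._filepath)}
```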

## Scenario
2 changes: 1 addition & 1 deletion docs/source/data/index.md
@@ -3,7 +3,7 @@

In a Kedro project, the Data Catalog is a registry of all data sources available for use by the project. The catalog is stored in a YAML file (`catalog.yml`) that maps the names of node inputs and outputs as keys in the `DataCatalog` class.

-[Kedro provides different built-in datasets in the `kedro-datasets` package](/kedro_datasets) for numerous file types and file systems, so you don’t have to write any of the logic for reading/writing data.
+[Kedro provides different built-in datasets in the `kedro-datasets` package](/kedro_datasets) for numerous file types and file systems so you don’t have to write any of the logic for reading/writing data.


We first introduce the basic sections of `catalog.yml`, which is the file used to register data sources for a Kedro project.
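
For example, a single entry is enough to read a Parquet file from S3 with one of the built-in `pandas` datasets (the dataset name, bucket, path and credentials key below are made up):

```yaml
motorbikes:
  type: pandas.ParquetDataSet
  filepath: s3://my-bucket/data/02_intermediate/motorbikes.pq
  credentials: dev_s3
```
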
2 changes: 1 addition & 1 deletion docs/source/data/kedro_dataset_factories.md
@@ -1,7 +1,7 @@
# Kedro dataset factories
You can load multiple datasets with similar configuration using dataset factories, introduced in Kedro 0.18.12.

-The syntax allows you to generalise the configuration and reduce the number of similar catalog entries by matching datasets used in your project's pipelines to dataset factory patterns.
+The syntax allows you to generalise your configuration and reduce the number of similar catalog entries by matching datasets used in your project's pipelines to dataset factory patterns.

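For example, several per-entity entries that differ only in name can collapse into a single pattern; the `{name}` placeholder and the paths below are illustrative:

```yaml
# Matches dataset names such as "companies_data", "reviews_data" and "shuttles_data".
"{name}_data":
  type: pandas.CSVDataSet
  filepath: data/01_raw/{name}_data.csv
```

At runtime, any dataset a pipeline requests that matches the `{name}_data` pattern resolves to this entry, with `{name}` substituted into the file path.
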
## How to generalise datasets with similar names and types

2 changes: 1 addition & 1 deletion docs/source/data/partitioned_and_incremental_datasets.md
@@ -2,7 +2,7 @@

## Partitioned datasets

-Distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you might encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro_datasets.spark.SparkDataSet) cater for such use cases, but the use of Spark is not always feasible.
+Distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you might encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro_datasets.spark.SparkDataSet) cater for such use cases, but using Spark is not always feasible.

This is why Kedro provides a built-in [PartitionedDataset](/kedro.io.PartitionedDataset), with the following features:

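A minimal `catalog.yml` entry for such a dataset might look like this sketch, which assumes a local folder of CSV partitions (the dataset name, path and suffix are illustrative):

```yaml
weather:
  type: PartitionedDataset
  path: data/01_raw/weather/
  dataset: pandas.CSVDataSet
  filename_suffix: ".csv"
```

When a node receives `weather` as an input, it gets a dictionary that maps each partition id to a function that loads that partition on demand.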

