Fix typos across the documentation (#2956)
* Fix typos across docs

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Capitalisation stuff

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
ankatiyar authored Aug 22, 2023
1 parent 4563a4c commit ce24b3d
Showing 7 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/source/data/index.md
@@ -22,7 +22,7 @@ The following page offers a range of examples of YAML specification for various
data_catalog_yaml_examples
```

Once you are familiar with the format of `catalog.yml`, you may find your catalog gets repetitive if you need to load multiple datasets with similar configuration. From Kedro 0.18.12 you can use dataset factories to generalise the configuration and reduce the number of similar catalog entries. This works by by matching datasets used in your project’s pipelines to dataset factory patterns and is explained in a new page about Kedro dataset factories:
Once you are familiar with the format of `catalog.yml`, you may find your catalog gets repetitive if you need to load multiple datasets with similar configuration. From Kedro 0.18.12 you can use dataset factories to generalise the configuration and reduce the number of similar catalog entries. This works by matching datasets used in your project’s pipelines to dataset factory patterns and is explained in a new page about Kedro dataset factories:
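
For a flavour of what that page explains, here is a minimal, hedged Python sketch. It assumes a Kedro version (0.18.12 or later) that resolves factory patterns when a catalog is built with `DataCatalog.from_config`; the dataset names, file paths and the `pandas.CSVDataSet` type are purely illustrative.

```python
from kedro.io import DataCatalog

# A single factory pattern replaces several near-identical catalog entries.
# "{name}" is a placeholder matched against the dataset names your pipelines use.
catalog = DataCatalog.from_config(
    {
        "{name}_csv": {
            "type": "pandas.CSVDataSet",           # illustrative; exact class name depends on your kedro-datasets version
            "filepath": "data/01_raw/{name}.csv",  # the matched "{name}" is substituted here
        }
    }
)

# Both names below resolve against the "{name}_csv" pattern when they are first used.
companies = catalog.load("companies_csv")  # reads data/01_raw/companies.csv
reviews = catalog.load("reviews_csv")      # reads data/01_raw/reviews.csv
```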


```{toctree}
4 changes: 2 additions & 2 deletions docs/source/data/partitioned_and_incremental_datasets.md
@@ -75,12 +75,12 @@ my_partitioned_dataset:
Here is an exhaustive list of the arguments supported by `PartitionedDataset`:

| Argument | Required | Supported types | Description |
| ----------------- | ------------------------------ | ------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ----------------- | ------------------------------ | ------------------------------------------------ |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `path` | Yes | `str` | Path to the folder containing partitioned data. If path starts with the protocol (e.g., `s3://`) then the corresponding `fsspec` concrete filesystem implementation will be used. If protocol is not specified, local filesystem will be used |
| `dataset` | Yes | `str`, `Type[AbstractDataset]`, `Dict[str, Any]` | Underlying dataset definition, for more details see the section below |
| `credentials` | No | `Dict[str, Any]` | Protocol-specific options that will be passed to the `fsspec.filesystem` call, for more details see the section below |
| `load_args` | No | `Dict[str, Any]` | Keyword arguments to be passed to the `find()` method of the corresponding filesystem implementation |
| `filepath_arg` | No | `str` (defaults to `filepath`) | Argument name of the underlying dataset initializer that will contain a path to an individual partition |
| `filepath_arg` | No | `str` (defaults to `filepath`) | Argument name of the underlying dataset initialiser that will contain a path to an individual partition |
| `filename_suffix` | No | `str` (defaults to an empty string) | If specified, partitions that don't end with this string will be ignored |
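
To show how these arguments fit together, here is a minimal, hedged Python sketch. It assumes `PartitionedDataset` can be imported from `kedro.io` in your version (older releases spell it `PartitionedDataSet`); the S3 bucket, credentials and `pandas.CSVDataSet` type are purely illustrative.

```python
from kedro.io import PartitionedDataset

partitioned = PartitionedDataset(
    path="s3://my-bucket/orders",                 # the protocol prefix selects the fsspec filesystem
    dataset="pandas.CSVDataSet",                  # underlying dataset used for every partition
    filename_suffix=".csv",                       # partitions not ending in ".csv" are ignored
    credentials={"key": "...", "secret": "..."},  # passed to the fsspec.filesystem call
    load_args={"maxdepth": 1},                    # forwarded to the filesystem's find() method
)

# load() returns a dict mapping partition IDs to functions that load each partition lazily.
partitions = partitioned.load()
for partition_id, load_partition in partitions.items():
    dataframe = load_partition()
```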

### Dataset definition
@@ -33,7 +33,7 @@ For those reasons, the packaging approach is unsuitable for development projects
The sequence of steps described in this section is as follows:

1. [Note your Databricks username and host](#note-your-databricks-username-and-host)
2. [Install Kedro and the databricks CLI in a new virtual environment](#install-kedro-and-the-databricks-cli-in-a-new-virtual-environment)
2. [Install Kedro and the Databricks CLI in a new virtual environment](#install-kedro-and-the-databricks-cli-in-a-new-virtual-environment)
3. [Authenticate the Databricks CLI](#authenticate-the-databricks-cli)
4. [Create a new Kedro project](#create-a-new-kedro-project)
5. [Create an entry point for Databricks](#create-an-entry-point-for-databricks)
@@ -49,10 +49,10 @@ Find your Databricks username in the top right of the workspace UI and the host
![Find Databricks host and username](../../meta/images/find_databricks_host_and_username.png)

```{note}
Your databricks host must include the protocol (`https://`).
Your Databricks host must include the protocol (`https://`).
```

### Install Kedro and the databricks CLI in a new virtual environment
### Install Kedro and the Databricks CLI in a new virtual environment

The following commands will create a new `conda` environment, activate it, and then install Kedro and the Databricks CLI.

@@ -4,7 +4,7 @@ This guide demonstrates a workflow for developing Kedro projects on Databricks u

This method of developing a Kedro project for use on Databricks is ideal for developers who prefer developing their projects in notebooks rather than in an IDE. It also avoids the overhead of setting up and syncing a local environment with Databricks. If you want to take advantage of the powerful features of an IDE to develop your project, consider following the [guide for developing a Kedro project for Databricks using your local environment](./databricks_ide_development_workflow.md).

In this guide, you will store your project's code in a repository on [GitHub](https://github.com/). Databricks integrates with many [Git providers](https://docs.databricks.com/repos/index.html#supported-git-providers), including GitLab and Azure Devops. The steps to create a Git repository and sync it with Databricks also generally apply to these Git providers, though the exact details may vary.
In this guide, you will store your project's code in a repository on [GitHub](https://github.com/). Databricks integrates with many [Git providers](https://docs.databricks.com/repos/index.html#supported-git-providers), including GitLab and Azure DevOps. The steps to create a Git repository and sync it with Databricks also generally apply to these Git providers, though the exact details may vary.

## What this page covers

@@ -263,7 +263,7 @@ Now that your project has run successfully once, you can make changes using the

The `databricks-iris` starter uses a default 80-20 ratio of training data to test data when training the classifier. You will edit this ratio to 70-30 and re-run your project to view the different result.

In the Databricks workspace, click on the `Repos` tab in the side bar and navigate to `<databricks_username>/iris-databricks/conf/base/`. Open the the file `parameters.yml` by double-clicking it. This will take you to a built-in file editor. Edit the line `train_fraction: 0.8` to `train_fraction: 0.7`, your changes will automatically be saved.
In the Databricks workspace, click on the `Repos` tab in the side bar and navigate to `<databricks_username>/iris-databricks/conf/base/`. Open the file `parameters.yml` by double-clicking it. This will take you to a built-in file editor. Edit the line `train_fraction: 0.8` to `train_fraction: 0.7`, your changes will automatically be saved.

![Databricks edit file](../../meta/images/databricks_edit_file.png)

2 changes: 1 addition & 1 deletion docs/source/development/commands_reference.md
@@ -446,7 +446,7 @@ kedro micropkg package <package_module_path>
Further information is available in the [micro-packaging documentation](../nodes_and_pipelines/micro_packaging.md).

##### Pull a micro-package in your project
The following command pulls all the files related to a micro-package, e.g. a modular pipeline, from either [Pypi](https://pypi.org/) or a storage location of a [Python source distribution file](https://packaging.python.org/overview/#python-source-distributions).
The following command pulls all the files related to a micro-package, e.g. a modular pipeline, from either [PyPI](https://pypi.org/) or a storage location of a [Python source distribution file](https://packaging.python.org/overview/#python-source-distributions).

```bash
kedro micropkg pull <package_name> (or path to a sdist file)
2 changes: 1 addition & 1 deletion docs/source/experiment_tracking/index.md
@@ -32,7 +32,7 @@ Kedro-Viz version 6.2 includes support for collaborative experiment tracking usi
The choice of experiment tracking tool depends on your use case and choice of complementary tools, such as MLflow and Neptune:

- **Kedro** - If you need experiment tracking, are looking for improved metrics visualisation and want a lightweight tool to work alongside existing functionality in Kedro. Kedro does not support a model registry.
- **MLflow** - You can combine MLFlow with Kedro by using [`kedro-mlflow`](https://kedro-mlflow.readthedocs.io/en/stable/) if you require experiment tracking, model registry and/or model serving capabilities or have access to Managed MLflow within the Databricks ecosystem.
- **MLflow** - You can combine MLflow with Kedro by using [`kedro-mlflow`](https://kedro-mlflow.readthedocs.io/en/stable/) if you require experiment tracking, model registry and/or model serving capabilities or have access to Managed MLflow within the Databricks ecosystem.
- **Neptune** - If you require experiment tracking and model registry functionality, improved visualisation of metrics and support for collaborative data science, you may consider [`kedro-neptune`](https://docs.neptune.ai/integrations/kedro/) for your workflow.

[We support a growing list of integrations](../extend_kedro/plugins.md).
2 changes: 1 addition & 1 deletion docs/source/extend_kedro/architecture_overview.md
@@ -3,7 +3,7 @@
There are different ways to leverage Kedro in your work. You can:

- Commit to using all of Kedro (framework, project, starters and library), which is preferable if you want to take advantage of the full value proposition of Kedro
- You can leverage parts of Kedro, like the DataCatalog (I/O), ConfigLoader, Pipelines and Runner, by using it as a Python libary; this best supports a workflow where you don't want to adopt the Kedro project template
- You can leverage parts of Kedro, like the DataCatalog (I/O), ConfigLoader, Pipelines and Runner, by using it as a Python library; this best supports a workflow where you don't want to adopt the Kedro project template
- Or, you can develop extensions for Kedro e.g. custom starters, plugins, Hooks and more
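
To make the library-only option in the list above concrete, here is a minimal, hedged sketch. It assumes `MemoryDataset` is available under that spelling (older releases use `MemoryDataSet`) and that your Kedro version accepts a runner call without an explicit hook manager; the `double` function is purely illustrative.

```python
from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner

def double(x: int) -> int:
    return x * 2

# DataCatalog (I/O): declares where inputs come from and where outputs go.
catalog = DataCatalog({"x": MemoryDataset(21), "x_doubled": MemoryDataset()})

# Pipeline: a small DAG of nodes wiring named inputs to named outputs.
my_pipeline = pipeline([node(double, inputs="x", outputs="x_doubled", name="double_x")])

# Runner: executes the pipeline against the catalog.
SequentialRunner().run(my_pipeline, catalog)
print(catalog.load("x_doubled"))  # 42
```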

At a high level, Kedro consists of five main parts: