Commit

Merge branch 'main' into feature/external_table_dataset

MinuraPunchihewa authored Oct 18, 2024
2 parents 1577f9a + ab64c20 commit ca63d4c

Showing 14 changed files with 123 additions and 27 deletions.
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -10,3 +10,4 @@
- [ ] Updated the documentation to reflect the code changes
- [ ] Added a description of this change in the relevant `RELEASE.md` file
- [ ] Added tests to cover my changes
- [ ] Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)
36 changes: 36 additions & 0 deletions .github/reviewers.json
@@ -0,0 +1,36 @@
{
"teams": {
"everyone": {
"description": "A team that contains all TSC members.",
"users": [
"ankatiyar",
"astrojuanlu",
"datajoely",
"deepyaman",
"DimedS",
"Galileo-Galilei",
"Huongg",
"idanov",
"jitu5",
"lrcouto",
"marrrcin",
"merelcht",
"noklam",
"rashidakanchwala",
"ravi-kumar-pilla",
"SajidAlamQB",
"sbrugman",
"stephkaiser",
"tynandebold",
"yetudada"
]
}
},
"reviewers": {
"kedro-datasets/kedro_datasets/": {
"description": "Require at least 1/2 TSC approval on new core dataset contributions.",
"teams": ["everyone"],
"requiredApproverCount": 10
}
}
}
14 changes: 14 additions & 0 deletions .github/workflows/check-tsc-vote.yml
@@ -0,0 +1,14 @@
name: Required Reviews
on:
pull_request:
paths:
- 'kedro-datasets/kedro_datasets/*'
jobs:
required-reviews:
name: Required Reviews
runs-on: ubuntu-latest
steps:
- name: required-reviewers
uses: theoremlp/required-reviews@v2
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
1 change: 1 addition & 0 deletions .github/workflows/merge-gatekeeper.yml
@@ -24,3 +24,4 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}
timeout: 2400
interval: 30
ignored: 'Required Reviews'
2 changes: 1 addition & 1 deletion kedro-datasets/CONTRIBUTING.md
@@ -89,7 +89,7 @@ def count_truthy(elements: List[Any]) -> int:
return sum(1 for elem in elements if elem)
```

> *Note:* We only accept contributions under the [Apache 2.0](https://opensource.org/licenses/Apache-2.0) license, and you should have permission to share the submitted code.
> *Note:* We only accept contributions under the [Apache 2.0](https://github.com/kedro-org/kedro-plugins/blob/main/LICENSE.md) license, and you should have permission to share the submitted code.
### Branching conventions

2 changes: 1 addition & 1 deletion kedro-datasets/README.md
@@ -2,7 +2,7 @@

<!-- Note that the contents of this file are also used in the documentation, see docs/source/index.md -->

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/kedro-org/kedro-plugins/blob/main/LICENSE.md)
[![Python Version](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue.svg)](https://pypi.org/project/kedro-datasets/)
[![PyPI Version](https://badge.fury.io/py/kedro-datasets.svg)](https://pypi.org/project/kedro-datasets/)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
12 changes: 9 additions & 3 deletions kedro-datasets/RELEASE.md
@@ -1,5 +1,7 @@
# Upcoming Release

# Release 5.1.0

## Major features and improvements

- Added the following new **experimental** datasets:
@@ -10,9 +12,13 @@

- Added the following new core datasets:

| Type | Description | Location |
| ------------------- | ------------------------------------------------------------- | --------------------- |
| `ibis.TableDataset` | A dataset for loading and saving files using Ibis's backends. | `kedro_datasets.ibis` |
| Type | Description | Location |
| ------------------ | ------------------------------------------------------------- | --------------------- |
| `ibis.FileDataset` | A dataset for loading and saving files using Ibis's backends. | `kedro_datasets.ibis` |

## Bug fixes and other changes

- Changed Ibis datasets to connect to an in-memory DuckDB database if connection configuration is not provided.

## Community contributions

2 changes: 1 addition & 1 deletion kedro-datasets/kedro_datasets/__init__.py
@@ -1,7 +1,7 @@
"""``kedro_datasets`` is where you can find all of Kedro's data connectors."""

__all__ = ["KedroDeprecationWarning"]
__version__ = "5.0.0"
__version__ = "5.1.0"

import sys
import warnings
14 changes: 8 additions & 6 deletions kedro-datasets/kedro_datasets/ibis/file_dataset.py
@@ -66,6 +66,10 @@ class FileDataset(AbstractVersionedDataset[ir.Table, ir.Table]):
"""

DEFAULT_CONNECTION_CONFIG: ClassVar[dict[str, Any]] = {
"backend": "duckdb",
"database": ":memory:",
}
DEFAULT_LOAD_ARGS: ClassVar[dict[str, Any]] = {}
DEFAULT_SAVE_ARGS: ClassVar[dict[str, Any]] = {}

@@ -107,6 +111,7 @@ def __init__( # noqa: PLR0913
Defaults to writing execution results to a Parquet file.
table_name: The name to use for the created table (on load).
connection: Configuration for connecting to an Ibis backend.
If not provided, connect to DuckDB in in-memory mode.
load_args: Additional arguments passed to the Ibis backend's
`read_{file_format}` method.
save_args: Additional arguments passed to the Ibis backend's
@@ -120,7 +125,7 @@
"""
self._file_format = file_format
self._table_name = table_name
self._connection_config = connection
self._connection_config = connection or self.DEFAULT_CONNECTION_CONFIG
self.metadata = metadata

super().__init__(
@@ -156,8 +161,7 @@ def hashable(value):
import ibis

config = deepcopy(self._connection_config)
backend_attr = config.pop("backend") if config else None
backend = getattr(ibis, backend_attr)
backend = getattr(ibis, config.pop("backend"))
cls._connections[key] = backend.connect(**config)

return cls._connections[key]
@@ -178,9 +182,7 @@ def _describe(self) -> dict[str, Any]:
"filepath": self._filepath,
"file_format": self._file_format,
"table_name": self._table_name,
"backend": self._connection_config.get("backend")
if self._connection_config
else None,
"backend": self._connection_config["backend"],
"load_args": self._load_args,
"save_args": self._save_args,
"version": self._version,
14 changes: 8 additions & 6 deletions kedro-datasets/kedro_datasets/ibis/table_dataset.py
@@ -60,6 +60,10 @@ class TableDataset(AbstractDataset[ir.Table, ir.Table]):
"""

DEFAULT_CONNECTION_CONFIG: ClassVar[dict[str, Any]] = {
"backend": "duckdb",
"database": ":memory:",
}
DEFAULT_LOAD_ARGS: ClassVar[dict[str, Any]] = {}
DEFAULT_SAVE_ARGS: ClassVar[dict[str, Any]] = {
"materialized": "view",
@@ -99,6 +103,7 @@ def __init__( # noqa: PLR0913
Args:
table_name: The name of the table or view to read or create.
connection: Configuration for connecting to an Ibis backend.
If not provided, connect to DuckDB in in-memory mode.
load_args: Additional arguments passed to the Ibis backend's
`read_{file_format}` method.
save_args: Additional arguments passed to the Ibis backend's
@@ -126,7 +131,7 @@ def __init__( # noqa: PLR0913
self._filepath = filepath
self._file_format = file_format
self._table_name = table_name
self._connection_config = connection
self._connection_config = connection or self.DEFAULT_CONNECTION_CONFIG
self.metadata = metadata

# Set load and save arguments, overwriting defaults if provided.
@@ -158,8 +163,7 @@ def hashable(value):
import ibis

config = deepcopy(self._connection_config)
backend_attr = config.pop("backend") if config else None
backend = getattr(ibis, backend_attr)
backend = getattr(ibis, config.pop("backend"))
cls._connections[key] = backend.connect(**config)

return cls._connections[key]
@@ -186,9 +190,7 @@ def _describe(self) -> dict[str, Any]:
"filepath": self._filepath,
"file_format": self._file_format,
"table_name": self._table_name,
"backend": self._connection_config.get("backend")
if self._connection_config
else None,
"backend": self._connection_config["backend"],
"load_args": self._load_args,
"save_args": self._save_args,
"materialized": self._materialized,
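The `def hashable(value):` hunks above only show fragments of the class-level connection cache both datasets rely on. A self-contained sketch of that caching pattern follows; the `ConnectionCache` class and `connect` callable are illustrative stand-ins, assuming (not verified from this diff) that the real `hashable` helper normalizes nested configs similarly:

```python
def hashable(value):
    """Recursively convert dicts and lists into hashable, order-stable tuples."""
    if isinstance(value, dict):
        return tuple((k, hashable(v)) for k, v in sorted(value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(hashable(v) for v in value)
    return value


class ConnectionCache:
    """Illustrative class-level cache: equal configs share one connection."""

    _connections: dict = {}

    @classmethod
    def get_connection(cls, config: dict, connect):
        key = hashable(config)
        if key not in cls._connections:
            # Only pay the connection cost once per distinct config.
            cls._connections[key] = connect(**config)
        return cls._connections[key]
```

Sorting the dict items makes the key insensitive to declaration order, so two catalog entries with the same backend settings written in different order reuse one backend connection.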
@@ -136,7 +136,7 @@ def save(self, data: DataFrame) -> None:
(
output_constructor.option("checkpointLocation", checkpoint)
.option("path", save_path)
.outputMode(output_mode)
.outputMode(str(output_mode))
.options(**self._save_args or {})
.start()
)
23 changes: 20 additions & 3 deletions kedro-datasets/tests/ibis/test_file_dataset.py
@@ -10,6 +10,8 @@

from kedro_datasets.ibis import FileDataset

_SENTINEL = object()


@pytest.fixture
def filepath_csv(tmp_path):
@@ -21,9 +23,13 @@ def database(tmp_path):
return (tmp_path / "file.db").as_posix()


@pytest.fixture(params=[None])
@pytest.fixture(params=[_SENTINEL])
def connection_config(request, database):
return request.param or {"backend": "duckdb", "database": database}
return (
{"backend": "duckdb", "database": database}
if request.param is _SENTINEL # `None` is a valid value to test
else request.param
)


@pytest.fixture
@@ -103,12 +109,23 @@ def test_save_extra_params(
("query", (("driver", "ODBC Driver 17 for SQL Server"),)),
),
),
# https://github.com/kedro-org/kedro-plugins/pull/893#discussion_r1804632435
(
None,
(
("backend", "duckdb"),
("database", ":memory:"),
),
),
],
indirect=["connection_config"],
)
def test_connection_config(self, mocker, file_dataset, connection_config, key):
"""Test hashing of more complicated connection configuration."""
mocker.patch(f"ibis.{connection_config['backend']}")
backend = (
connection_config["backend"] if connection_config is not None else "duckdb"
)
mocker.patch(f"ibis.{backend}")
file_dataset.load()
assert key in file_dataset._connections

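The fixture change above (from `params=[None]` to `params=[_SENTINEL]`) exists because `None` is now a meaningful connection value the tests must exercise rather than a "use the default" placeholder. The pattern in isolation, with `make_connection_config` as a hypothetical stand-in for the fixture body:

```python
# A private sentinel distinguishes "nothing passed" from an explicit `None`,
# which now triggers the dataset's own in-memory DuckDB fallback.
_SENTINEL = object()


def make_connection_config(param=_SENTINEL, database="file.db"):
    # Substitute the default only when the caller supplied nothing at all;
    # an explicit `None` is forwarded so it can be tested in its own right.
    return (
        {"backend": "duckdb", "database": database}
        if param is _SENTINEL
        else param
    )
```

The earlier `request.param or {...}` idiom could not make this distinction, since `None` is falsy and would always have been replaced by the default.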
23 changes: 20 additions & 3 deletions kedro-datasets/tests/ibis/test_table_dataset.py
@@ -6,6 +6,8 @@

from kedro_datasets.ibis import TableDataset

_SENTINEL = object()


@pytest.fixture(scope="session")
def filepath_csv(tmp_path_factory):
@@ -19,9 +21,13 @@ def database(tmp_path):
return (tmp_path / "file.db").as_posix()


@pytest.fixture(params=[None])
@pytest.fixture(params=[_SENTINEL])
def connection_config(request, database):
return request.param or {"backend": "duckdb", "database": database}
return (
{"backend": "duckdb", "database": database}
if request.param is _SENTINEL # `None` is a valid value to test
else request.param
)


@pytest.fixture
@@ -122,11 +128,22 @@ def test_save_no_table_name(self, table_dataset_from_csv, dummy_table):
("query", (("driver", "ODBC Driver 17 for SQL Server"),)),
),
),
# https://github.com/kedro-org/kedro-plugins/pull/893#discussion_r1804632435
(
None,
(
("backend", "duckdb"),
("database", ":memory:"),
),
),
],
indirect=["connection_config"],
)
def test_connection_config(self, mocker, table_dataset, connection_config, key):
"""Test hashing of more complicated connection configuration."""
mocker.patch(f"ibis.{connection_config['backend']}")
backend = (
connection_config["backend"] if connection_config is not None else "duckdb"
)
mocker.patch(f"ibis.{backend}")
table_dataset.load()
assert key in table_dataset._connections
4 changes: 2 additions & 2 deletions kedro-docker/features/docker.feature
@@ -100,7 +100,7 @@ Feature: Docker commands in new projects

Scenario: Execute docker run target without building image
When I execute the kedro command "docker run"
Then I should get a successful exit code
Then I should get an error exit code
And Standard output should contain a message including "Error: Unable to find image `project-dummy` locally."

Scenario: Execute docker dive target
@@ -118,5 +118,5 @@

Scenario: Execute docker dive without building image
When I execute the kedro command "docker dive"
Then I should get a successful exit code
Then I should get an error exit code
And Standard output should contain a message including "Error: Unable to find image `project-dummy` locally."
