feat(datasets): Add format for ImageDataSet save (#77)
* Add format for ImageDataSet save

When using the ImageDataSet with S3 storage, you need to manually specify
the format in the save_args dict despite the suffix in the file name. This is
because the save method on PIL images is called with an open file object.
The PIL documentation states the following about the format argument:

"If a file object was used instead of a filename, this parameter should
always be used."

Signed-off-by: Daniel Falk <daniel.falk.1@fixedit.ai>

* Add unit test + lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update Release notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Daniel Falk <daniel.falk.1@fixedit.ai>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
3 people authored Jul 14, 2023
1 parent 884c1fe commit ae9b6e5
Showing 4 changed files with 31 additions and 4 deletions.
4 changes: 4 additions & 0 deletions kedro-datasets/RELEASE.md
@@ -1,10 +1,14 @@
 # Upcoming Release:
 
 ## Major features and improvements
+* Added automatic inference of file format for `pillow.ImageDataSet` to be passed to `save()`
 
 ## Bug fixes and other changes
 
 ## Community contributions
+Many thanks to the following Kedroids for contributing PRs to this release:
+
+* [Daniel-Falk](https://github.com/daniel-falk)
 
 # Release 1.4.2
 ## Bug fixes and other changes
15 changes: 12 additions & 3 deletions kedro-datasets/kedro_datasets/pillow/image_dataset.py
@@ -119,13 +119,22 @@ def _load(self) -> Image.Image:
             return Image.open(fs_file).copy()
 
     def _save(self, data: Image.Image) -> None:
-        save_path = get_filepath_str(self._get_save_path(), self._protocol)
+        save_path = self._get_save_path()
 
-        with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
-            data.save(fs_file, **self._save_args)
+        with self._fs.open(
+            get_filepath_str(save_path, self._protocol), **self._fs_open_args_save
+        ) as fs_file:
+            data.save(fs_file, format=self._get_format(save_path), **self._save_args)
 
         self._invalidate_cache()
 
+    @staticmethod
+    def _get_format(file_path: PurePosixPath):
+        ext = file_path.suffix.lower()
+        if ext not in Image.EXTENSION:
+            Image.init()
+        return Image.EXTENSION.get(ext)
+
     def _exists(self) -> bool:
         try:
             load_path = get_filepath_str(self._get_load_path(), self._protocol)
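
The change above can be tried outside Kedro with a minimal sketch. It assumes Pillow is installed and uses an in-memory `BytesIO` buffer as a stand-in for the open file object that `fsspec` would return. The helper name `infer_format` is hypothetical, and it uses Pillow's public `Image.registered_extensions()` rather than the `Image.EXTENSION` / `Image.init()` pair used in the commit, so it is an illustration of the idea and not the code that was merged.

```python
from io import BytesIO
from pathlib import PurePosixPath
from typing import Optional

from PIL import Image


def infer_format(file_path: PurePosixPath) -> Optional[str]:
    """Hypothetical stand-in for the commit's _get_format helper."""
    # registered_extensions() makes sure the format plugins are loaded and
    # returns a mapping from lower-case suffix to Pillow format name.
    return Image.registered_extensions().get(file_path.suffix.lower())


image = Image.new("RGB", (4, 4))

# An in-memory buffer has no file name, so Pillow has no suffix to inspect
# and refuses to save when no format is given.
try:
    image.save(BytesIO())
    raised = False
except ValueError:
    raised = True

# Passing the format inferred from the intended path makes the save succeed.
buffer = BytesIO()
image.save(buffer, format=infer_format(PurePosixPath("bucket/file.png")))
png_signature_ok = buffer.getvalue().startswith(b"\x89PNG")
```

An unknown suffix maps to `None`, in which case Pillow falls back to its own inference, so behaviour for unrecognised extensions should be the same as before the change.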
1 change: 0 additions & 1 deletion kedro-datasets/kedro_datasets/video/video_dataset.py
@@ -158,7 +158,6 @@ def __getitem__(self, index: Union[int, slice]):
 class GeneratorVideo(AbstractVideo):
     """A video object with frames yielded by a generator"""
 
-    # pylint: disable=too-many-arguments
     def __init__(
         self,
         frames: Generator[PIL.Image.Image, None, None],
15 changes: 15 additions & 0 deletions kedro-datasets/tests/pillow/test_image_dataset.py
@@ -110,6 +110,21 @@ def test_catalog_release(self, mocker):
         data_set.release()
         fs_mock.invalidate_cache.assert_called_once_with(filepath)
 
+    @pytest.mark.parametrize(
+        "image_filepath , expected_extension",
+        [
+            ("s3://bucket/file.png", "PNG"),
+            ("file:///tmp/test.jpg", "JPEG"),
+            ("/tmp/test.pdf", "PDF"),
+            ("https://example.com/file.whatever", None),
+        ],
+    )
+    def test_get_format(self, image_filepath, expected_extension):
+        """Unit test for pillow.ImageDataSet._get_format() fn"""
+        data_set = ImageDataSet(image_filepath)
+        ext = data_set._get_format(Path(image_filepath))
+        assert expected_extension == ext
+
 
 class TestImageDataSetVersioned:
     def test_version_str_repr(self, load_version, save_version):