Tests on more real datasets for some datasets #707

Open · wants to merge 10 commits into develop
Conversation

@yasakova-anastasia commented Apr 1, 2022

Summary

  • Added a dataset mangling function (changes ids, images, labels, bounding boxes, and masks)
  • Added more real datasets for Cityscapes, MNIST, and WiderFace (20 items in each dataset)
  • Added tests for importing a dataset in a specific format, exporting a dataset in a specific format, exporting a dataset to a common format (COCO, ImageNet, VOC), and filtering
  • Extended the compare_datasets function (see the usage sketch below):
    • an option to skip comparing the id and group of annotations
    • an option to skip comparing categories entirely
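
For context, the relaxed comparison appears in this PR's diff as the externally_comparison flag. A minimal usage sketch (the expected and actual dataset variables are placeholders):

# Minimal usage sketch; `expected` and `actual` are placeholder datasets.
# externally_comparison=True appears to enable the relaxed mode that skips
# comparing annotation id and group.
compare_datasets(
    self,
    expected,
    actual,
    require_media=True,
    externally_comparison=True,
)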

Questions:

  • Does the repository need a function to change the dataset?

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@yasakova-anastasia yasakova-anastasia marked this pull request as draft April 1, 2022 07:48
@yasakova-anastasia yasakova-anastasia marked this pull request as ready for review April 1, 2022 09:17
Comment on lines +18 to +20
MANGLING_DATASET_DIR = osp.join(
osp.dirname(__file__), "assets", "widerface_dataset", "mangling_dataset"
)
@zhiltsov-max (Contributor) commented Apr 4, 2022

  • What is the difference between the mangled datasets and the example datasets? Do we need them all?
  • I see that annotations in the mangled datasets are now incorrect because the images were resized. That doesn't seem right. In terms of disk size, large single-color images should take about the same space as small ones (see the quick check below), so keeping the original resolution shouldn't be a problem.
  • Please rename the directory from mangling_... to mangled_...
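
A quick, hedged illustration of the disk-size point (not part of the PR; it only shows that PNG compression keeps single-color images small at any resolution):

# Not from the PR: check that single-color PNGs of very different
# resolutions still compress to small files.
import io

import numpy as np
from PIL import Image

for side in (64, 1024):
    img = Image.fromarray(np.full((side, side, 3), 127, dtype=np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    print(f"{side}x{side}: {buf.tell()} bytes")  # both stay small, in the KB range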

@@ -398,8 +403,50 @@ def test_inplace_save_writes_only_updated_data(self):
ignored_attrs=IGNORE_ALL,
)

@mark_requirement(Requirements.DATUM_GENERAL_REQ)
@zhiltsov-max (Contributor) commented Apr 4, 2022

Please add an extra category for these tests. Maybe they should even be extracted into a separate directory.

Comment on lines +435 to +446
def test_can_filter_by_subsets(self):
    source_dataset = Dataset.import_from(MANGLING_DATASET_DIR, "wider_face")

    first_dataset = source_dataset.filter("/item/annotation[label='face']")
    second_dataset = source_dataset.filter("/item/annotation[label!='face']")

    merger = IntersectMerge()
    merged_dataset = merger([first_dataset, second_dataset])

    compare_datasets(
        self, source_dataset, merged_dataset, require_media=True, ignored_attrs={"score"}
    )
Contributor commented:

This test doesn't seem to match its name. Why is merging being done, and why are the subsets obtained by label filtering?
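
For comparison, a filter that actually selects by subset might look like this (a hedged sketch; the "train"/"val" subset names are assumptions about the test data):

# Hedged sketch, not from the PR: filtering items by subset instead of label.
train_subset = source_dataset.filter('/item[subset="train"]')
val_subset = source_dataset.filter('/item[subset="val"]')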

@@ -398,8 +403,50 @@ def test_inplace_save_writes_only_updated_data(self):
ignored_attrs=IGNORE_ALL,
)

@mark_requirement(Requirements.DATUM_GENERAL_REQ)
def test_can_convert_to_widerface(self):
@zhiltsov-max (Contributor) commented Apr 4, 2022

It's not really a "conversion", so maybe it should be renamed. Also, it's not clear why this test is needed; maybe some comments or a test category description should be added.

    parsed_dataset,
    require_media=True,
    ignored_attrs={"difficult", "truncated", "occluded"},
    externally_comparison=True,
@zhiltsov-max (Contributor) commented Apr 4, 2022

Suggested change:
- externally_comparison=True,
+ external_comparison=True,

Can we remove this parameter completely?

@@ -137,15 +140,49 @@ def _compare_annotations(expected, actual, ignored_attrs=None):
return r


def _compare_annotations_externally(expected, actual, ignored_attrs=None):
Contributor commented:

Can you reuse the regular comparison function?
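
One hedged way to do that (not from the PR; it assumes the annotation classes are attrs-based, as in Datumaro, and that the relaxed mode only needs to skip id and group):

# Hedged sketch, not from the PR: normalize id/group, then delegate to the
# regular comparison function. Assumes attrs-based annotation classes.
import attr

def _compare_annotations_externally(expected, actual, ignored_attrs=None):
    expected = attr.evolve(expected, id=0, group=0)
    actual = attr.evolve(actual, id=0, group=0)
    return _compare_annotations(expected, actual, ignored_attrs=ignored_attrs)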

def compare_datasets(
    test,
    expected: IDataset,
    actual: IDataset,
    ignored_attrs: Union[None, Literal["*"], Collection[str]] = None,
    require_media: bool = False,
    require_images: bool = False,
    externally_comparison=False,
Contributor commented:

It's not clear what external means here.

Comment on lines +227 to +238
if externally_comparison:
    ann_b = find(
        ann_b_matches,
        lambda x: _compare_annotations_externally(
            x, ann_a, ignored_attrs=ignored_attrs
        ),
    )
else:
    ann_b = find(
        ann_b_matches,
        lambda x: _compare_annotations(x, ann_a, ignored_attrs=ignored_attrs),
    )
@zhiltsov-max (Contributor) commented Apr 4, 2022

cmp_anns = ...
...
cmp_anns(a, b, ...)

I suggest using the strategy pattern here.
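
A minimal sketch of that suggestion (not from the PR): select the comparison strategy once, so the if/else branch around find() disappears:

# Hedged sketch, not from the PR: choose the strategy once, then reuse it
# for every candidate match.
cmp_anns = (
    _compare_annotations_externally if externally_comparison else _compare_annotations
)
ann_b = find(
    ann_b_matches,
    lambda x: cmp_anns(x, ann_a, ignored_attrs=ignored_attrs),
)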
