Tests on more real datasets for some datasets #707

Open · wants to merge 10 commits into develop
Conversation

@yasakova-anastasia commented Apr 1, 2022

Summary

  • Added a dataset mangling function (changes ids, images, labels, bounding boxes, and masks)
  • Added more real datasets for Cityscapes, MNIST, and WiderFace (20 items in each dataset)
  • Added tests for importing a dataset in a specific format, exporting a dataset in a specific format, exporting a dataset to a common format (COCO, ImageNet, VOC), and filtering
  • Extended the compare_datasets function (see the usage sketch below):
    • an option to skip comparing the id and group of annotations
    • an option to skip comparing categories entirely
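
For context, the relaxed comparison appears in this PR's diff as the externally_comparison flag. A minimal usage sketch (the expected and actual dataset variables are placeholders):

# Minimal usage sketch; `expected` and `actual` are placeholder datasets.
# externally_comparison=True appears to enable the relaxed mode that skips
# comparing annotation id and group.
compare_datasets(
    self,
    expected,
    actual,
    require_media=True,
    externally_comparison=True,
)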

Questions:

  • Does the repository need a function to change the dataset?

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@yasakova-anastasia yasakova-anastasia marked this pull request as draft April 1, 2022 07:48
@yasakova-anastasia yasakova-anastasia marked this pull request as ready for review April 1, 2022 09:17
Comment on lines +18 to +20
MANGLING_DATASET_DIR = osp.join(
osp.dirname(__file__), "assets", "widerface_dataset", "mangling_dataset"
)
@zhiltsov-max (Contributor) commented Apr 4, 2022

  • What is the difference between the mangled datasets and the example datasets? Do we need them all?
  • I see that annotations in the mangled datasets are now incorrect because the images were resized. That doesn't seem right. In terms of disk size, large single-color images should take about the same space as small ones (see the quick check below), so keeping the original resolution shouldn't be a problem.
  • Please rename the directory from mangling_... to mangled_...
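
A quick, hedged illustration of the disk-size point (not part of the PR; it only shows that PNG compression keeps single-color images small at any resolution):

# Not from the PR: check that single-color PNGs of very different
# resolutions still compress to small files.
import io

import numpy as np
from PIL import Image

for side in (64, 1024):
    img = Image.fromarray(np.full((side, side, 3), 127, dtype=np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    print(f"{side}x{side}: {buf.tell()} bytes")  # both stay small, in the KB range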

@@ -398,8 +403,50 @@ def test_inplace_save_writes_only_updated_data(self):
ignored_attrs=IGNORE_ALL,
)

@mark_requirement(Requirements.DATUM_GENERAL_REQ)
@zhiltsov-max (Contributor) commented Apr 4, 2022

Please add an extra category for these tests. Maybe they should even be extracted into a separate directory.

Comment on lines +435 to +446
def test_can_filter_by_subsets(self):
    source_dataset = Dataset.import_from(MANGLING_DATASET_DIR, "wider_face")

    first_dataset = source_dataset.filter("/item/annotation[label='face']")
    second_dataset = source_dataset.filter("/item/annotation[label!='face']")

    merger = IntersectMerge()
    merged_dataset = merger([first_dataset, second_dataset])

    compare_datasets(
        self, source_dataset, merged_dataset, require_media=True, ignored_attrs={"score"}
    )
Contributor commented:

This test doesn't seem to match its name. Why is merging being done, and why are the subsets obtained by label filtering?
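
For comparison, a filter that actually selects by subset might look like this (a hedged sketch; the "train"/"val" subset names are assumptions about the test data):

# Hedged sketch, not from the PR: filtering items by subset instead of label.
train_subset = source_dataset.filter('/item[subset="train"]')
val_subset = source_dataset.filter('/item[subset="val"]')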

@@ -398,8 +403,50 @@ def test_inplace_save_writes_only_updated_data(self):
ignored_attrs=IGNORE_ALL,
)

@mark_requirement(Requirements.DATUM_GENERAL_REQ)
def test_can_convert_to_widerface(self):
@zhiltsov-max (Contributor) commented Apr 4, 2022

It's not really a "conversion", so maybe it should be renamed. Also, it's not clear why this test is needed; maybe some comments or a test category description should be added.

    parsed_dataset,
    require_media=True,
    ignored_attrs={"difficult", "truncated", "occluded"},
    externally_comparison=True,
@zhiltsov-max (Contributor) commented Apr 4, 2022

Suggested change:
- externally_comparison=True,
+ external_comparison=True,

Can we remove this parameter completely?

@@ -137,15 +140,49 @@ def _compare_annotations(expected, actual, ignored_attrs=None):
return r


def _compare_annotations_externally(expected, actual, ignored_attrs=None):
Contributor commented:

Can you reuse the regular comparison function?
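
One hedged way to do that (not from the PR; it assumes the annotation classes are attrs-based, as in Datumaro, and that the relaxed mode only needs to skip id and group):

# Hedged sketch, not from the PR: normalize id/group, then delegate to the
# regular comparison function. Assumes attrs-based annotation classes.
import attr

def _compare_annotations_externally(expected, actual, ignored_attrs=None):
    expected = attr.evolve(expected, id=0, group=0)
    actual = attr.evolve(actual, id=0, group=0)
    return _compare_annotations(expected, actual, ignored_attrs=ignored_attrs)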

def compare_datasets(
    test,
    expected: IDataset,
    actual: IDataset,
    ignored_attrs: Union[None, Literal["*"], Collection[str]] = None,
    require_media: bool = False,
    require_images: bool = False,
    externally_comparison=False,
Contributor commented:

It's not clear what external means here.

Comment on lines +227 to +238
if externally_comparison:
    ann_b = find(
        ann_b_matches,
        lambda x: _compare_annotations_externally(
            x, ann_a, ignored_attrs=ignored_attrs
        ),
    )
else:
    ann_b = find(
        ann_b_matches,
        lambda x: _compare_annotations(x, ann_a, ignored_attrs=ignored_attrs),
    )
@zhiltsov-max (Contributor) commented Apr 4, 2022

cmp_anns = ...
...
cmp_anns(a, b, ...)

I suggest using the strategy pattern here.
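
A minimal sketch of that suggestion (not from the PR): select the comparison strategy once, so the if/else branch around find() disappears:

# Hedged sketch, not from the PR: choose the strategy once, then reuse it
# for every candidate match.
cmp_anns = (
    _compare_annotations_externally if externally_comparison else _compare_annotations
)
ann_b = find(
    ann_b_matches,
    lambda x: cmp_anns(x, ann_a, ignored_attrs=ignored_attrs),
)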
