Merge branch 'develop' into zm/video-extractor
Maxim Zhiltsov committed Jan 14, 2022
2 parents 5c2b9d4 + 72f3a38 commit b76916a
Showing 66 changed files with 1,134 additions and 339 deletions.
7 changes: 3 additions & 4 deletions .github/workflows/health_check.yml
@@ -8,7 +8,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.6', '3.7', '3.8', '3.9']
python-version: ['3.7', '3.8', '3.9']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
@@ -18,13 +18,12 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Installing dependencies
run: |
pip install tensorflow pytest pytest-cov
pip install -e .[default,tfds]
pip install -e '.[default,tf,tfds]' pytest pytest-cov
- name: Code instrumentation
run: |
pytest -v --cov --cov-report xml:coverage.xml
datum -h
- name: Sending coverage results
if: matrix.python-version == '3.6'
if: matrix.python-version == '3.7'
run: |
bash <(curl -Ls https://coverage.codacy.com/get.sh) report -r coverage.xml -t ${{ secrets.CODACY_PROJECT_TOKEN }}
5 changes: 2 additions & 3 deletions .github/workflows/pr_checks.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
os: ['macos-10.15', 'ubuntu-20.04', 'windows-2016']
python-version: ['3.6', '3.7', '3.8', '3.9']
python-version: ['3.7', '3.8', '3.9']
name: build and test (${{ matrix.os }}, Python ${{ matrix.python-version }})
runs-on: ${{ matrix.os }}
steps:
@@ -29,8 +29,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Installing dependencies
run: |
pip install tensorflow pytest
pip install -e .[default,tfds]
pip install -e '.[default,tf,tfds]' pytest
- name: Unit testing
run: |
pytest -v
23 changes: 19 additions & 4 deletions CHANGELOG.md
@@ -21,23 +21,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
(<https://github.com/openvinotoolkit/datumaro/pull/585>)

### Changed
- `smooth_line` from `datumaro.util.annotation_util` is renamed to
`approximate_line` and has an updated interface
(<https://github.com/openvinotoolkit/datumaro/pull/592>)
- The `pycocotools` dependency lower bound is raised to `2.0.4`.
(<https://github.com/openvinotoolkit/datumaro/pull/449>)
- Allowed direct file paths in `datum import`. Such sources are imported as
if the `rpath` parameter were specified; however, only the selected path
is copied into the project
(<https://github.com/openvinotoolkit/datumaro/pull/555>)
- `smooth_line` from `datumaro.util.annotation_util` is renamed to
`approximate_line` and has an updated interface
(<https://github.com/openvinotoolkit/datumaro/pull/592>)
- OpenVINO telemetry library 2022.1.0 from PyPI.
(<https://github.com/openvinotoolkit/datumaro/pull/625>)

### Deprecated
- TBD

### Removed
- TBD
- Official support of Python 3.6 (due to its EOL)
(<https://github.com/openvinotoolkit/datumaro/pull/617>)

### Fixed
- Fails in multimerge when lines are not approximated and when there are no
label categories (<https://github.com/openvinotoolkit/datumaro/pull/592>)
- Cannot convert a LabelMe dataset that has no subsets
(<https://github.com/openvinotoolkit/datumaro/pull/600>)
- Saving (overwriting) a dataset in a project when rpath is used
(<https://github.com/openvinotoolkit/datumaro/pull/613>)
- Output image extension preserving in the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/606>)
- Memory overuse in the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/607>)
- Invalid image pixels produced by the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/618>)

### Security
- TBD
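For the `approximate_line` rename listed under "Changed" above, a rough usage sketch follows. The commit does not show the new interface, so the flattened point layout and the meaning of the second argument are assumptions, not the confirmed signature.

```python
from datumaro.util.annotation_util import approximate_line

# Hypothetical call to the renamed helper (formerly smooth_line). The flattened
# [x0, y0, x1, y1, ...] point list and the "target number of segments" meaning
# of the second argument are assumptions; the updated interface is not visible
# in this diff.
line = [0, 0, 2, 1, 4, 0, 6, 1, 8, 0]
approx = approximate_line(line, 3)
```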
28 changes: 11 additions & 17 deletions README.md
@@ -26,28 +26,22 @@ CVAT annotations ---> Publication, statistics etc.

[(Back to top)](#dataset-management-framework-datumaro)

- Dataset reading, writing, conversion in any direction. [Supported formats](https://openvinotoolkit.github.io/datumaro/docs/user-manual/supported_formats):
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`, `panoptic`, `stuff`)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [WIDER Face](http://shuoyang1213.me/WIDERFACE/) (`bboxes`)
- [VGGFace2](https://github.com/ox-vgg/vgg_face2) (`landmarks`, `bboxes`)
- [MOT sequences](https://arxiv.org/pdf/1906.04567.pdf)
- [MOTS PNG](https://www.vision.rwth-aachen.de/page/mots)
- [ImageNet](http://image-net.org/)
- Dataset reading, writing, conversion in any direction.
- [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) (`classification`)
- [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
- [MNIST in CSV](https://pjreddie.com/projects/mnist-in-csv/) (`classification`)
- [CamVid](http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/)
- [Cityscapes](https://www.cityscapes-dataset.com/)
- [Kitti](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`, `3D raw` / `velodyne points`)
- [Supervisely](https://docs.supervise.ly/data-organization/00_ann_format_navi) (`point cloud`)
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`, `panoptic`, `stuff`)
- [CVAT](https://openvinotoolkit.github.io/cvat/docs/manual/advanced/xml_format)
- [ImageNet](http://image-net.org/)
- [Kitti](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`, `3D raw` / `velodyne points`)
- [LabelMe](http://labelme.csail.mit.edu/Release3.0)
- [ICDAR13/15](https://rrc.cvc.uab.es/?ch=2) (`word_recognition`, `text_localization`, `text_segmentation`)
- [Market-1501](https://www.aitribune.com/dataset/2018051063) (`person re-identification`)
- [LFW](http://vis-www.cs.umass.edu/lfw/) (`classification`, `person re-identification`, `landmarks`)
- [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
- [Open Images](https://storage.googleapis.com/openimages/web/download.html)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)

Other formats and documentation for them can be found [here](https://openvinotoolkit.github.io/datumaro/docs/user-manual/supported_formats).
- Dataset building
- Merging multiple datasets into one
- Dataset filtering by custom criteria:
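The reading, writing, and conversion capabilities listed in the README above can be sketched with the Python API. This is only an illustration: the paths are placeholders, and the `coco_instances` and `voc` format names are assumptions about which plugins are installed.

```python
from datumaro.components.dataset import Dataset

# Sketch of the read-convert-write flow described in the feature list above.
# 'path/to/coco_dataset' is a placeholder; 'coco_instances' and 'voc' are
# assumed format names - substitute the formats your data actually uses.
dataset = Dataset.import_from('path/to/coco_dataset', 'coco_instances')
dataset.export('path/to/voc_export', 'voc')
```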
16 changes: 8 additions & 8 deletions datumaro/cli/__main__.py
@@ -1,4 +1,4 @@
# Copyright (C) 2019-2021 Intel Corporation
# Copyright (C) 2019-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -56,18 +56,18 @@ def _make_subcommands_help(commands, help_line_start=0):

def _get_known_contexts():
return [
('model', contexts.model, "Actions with models"),
('project', contexts.project, "Actions with projects"),
('source', contexts.source, "Actions with data sources"),
('model', contexts.model, "Actions with models"),
('util', contexts.util, "Auxiliary tools and utilities"),
]

def _get_known_commands():
return [
("Project modification:", None, ''),
('add', commands.add, "Add dataset"),
('create', commands.create, "Create empty project"),
('import', commands.import_, "Import dataset"),
('add', commands.add, "Add dataset"),
('remove', commands.remove, "Remove dataset"),

("", None, ''),
@@ -79,17 +79,17 @@ def _get_known_commands():

("", None, ''),
("Dataset operations:", None, ''),
('convert', commands.convert, "Convert dataset between formats"),
('diff', commands.diff, "Compare datasets"),
('download', commands.download, "Download a publicly available dataset"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('export', commands.export, "Export dataset in some format"),
('filter', commands.filter, "Filter dataset items"),
('transform', commands.transform, "Modify dataset items"),
('info', commands.info, "Print dataset info"),
('merge', commands.merge, "Merge datasets"),
('patch', commands.patch, "Update dataset from another one"),
('convert', commands.convert, "Convert dataset between formats"),
('diff', commands.diff, "Compare datasets"),
('stats', commands.stats, "Compute dataset statistics"),
('info', commands.info, "Print dataset info"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('transform', commands.transform, "Modify dataset items"),
('validate', commands.validate, "Validate dataset")
]

12 changes: 7 additions & 5 deletions datumaro/components/annotation.py
@@ -1,7 +1,9 @@
# Copyright (C) 2021 Intel Corporation
# Copyright (C) 2021-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from enum import Enum, auto
from itertools import zip_longest
from typing import (
@@ -101,7 +103,7 @@ def from_iterable(cls, iterable: Iterable[Union[
Tuple[str],
Tuple[str, str],
Tuple[str, str, List[str]],
]]) -> 'LabelCategories':
]]) -> LabelCategories:
"""
Creates a LabelCategories from iterable.
@@ -180,7 +182,7 @@ class MaskCategories(Categories):

@classmethod
def generate(cls, size: int = 255, include_background: bool = True) \
-> 'MaskCategories':
-> MaskCategories:
"""
Generates MaskCategories with the specified size.
@@ -336,7 +338,7 @@ class CompiledMask:
@staticmethod
def from_instance_masks(instance_masks: Iterable[Mask],
instance_ids: Optional[Iterable[int]] = None,
instance_labels: Optional[Iterable[int]] = None) -> 'CompiledMask':
instance_labels: Optional[Iterable[int]] = None) -> CompiledMask:
"""
Joins instance masks into a single mask. Masks are sorted by
z_order (ascending) prior to merging.
@@ -655,7 +657,7 @@ class Category:
def from_iterable(cls, iterable: Union[
Tuple[int, List[str]],
Tuple[int, List[str], Set[Tuple[int, int]]],
]) -> 'PointsCategories':
]) -> PointsCategories:
"""
Create PointsCategories from an iterable.
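The `from_iterable` and `generate` signatures touched in the hunks above can be exercised as below; this is a minimal sketch based only on the overloads visible in this file, with label names chosen arbitrarily.

```python
from datumaro.components.annotation import LabelCategories, MaskCategories

# LabelCategories.from_iterable accepts plain names or
# (name, parent, attribute names) tuples, per the Union[...] overloads above.
labels = LabelCategories.from_iterable([
    'background',
    ('person', '', ['pose']),   # name, parent (empty), attribute names
    ('hand', 'person'),         # name, parent
])

# MaskCategories.generate(size, include_background) produces categories with
# a generated colormap for the given number of labels.
mask_categories = MaskCategories.generate(size=3, include_background=True)
```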
30 changes: 16 additions & 14 deletions datumaro/components/dataset.py
@@ -1,7 +1,9 @@
# Copyright (C) 2020-2021 Intel Corporation
# Copyright (C) 2020-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from contextlib import contextmanager
from copy import copy
from enum import Enum, auto
@@ -105,7 +107,7 @@ def __copy__(self):

class DatasetItemStorageDatasetView(IDataset):
class Subset(IDataset):
def __init__(self, parent: 'DatasetItemStorageDatasetView', name: str):
def __init__(self, parent: DatasetItemStorageDatasetView, name: str):
super().__init__()
self.parent = parent
self.name = name
@@ -182,7 +184,7 @@ class DatasetPatch:
class DatasetPatchWrapper(DatasetItemStorageDatasetView):
# The purpose of this class is to indicate that the input dataset is
# a patch and autofill patch info in Converter
def __init__(self, patch: 'DatasetPatch', parent: IDataset):
def __init__(self, patch: DatasetPatch, parent: IDataset):
super().__init__(patch.data, parent.categories())
self.patch = patch

@@ -212,7 +214,7 @@ def as_dataset(self, parent: IDataset) -> IDataset:
return __class__.DatasetPatchWrapper(self, parent)

class DatasetSubset(IDataset): # non-owning view
def __init__(self, parent: 'Dataset', name: str):
def __init__(self, parent: Dataset, name: str):
super().__init__()
self.parent = parent
self.name = name
@@ -249,7 +251,7 @@ def subsets(self):
def categories(self):
return self.parent.categories()

def as_dataset(self) -> 'Dataset':
def as_dataset(self) -> Dataset:
return Dataset.from_extractors(self, env=self.parent.env)


@@ -608,7 +610,7 @@ class Dataset(IDataset):
@classmethod
def from_iterable(cls, iterable: Iterable[DatasetItem],
categories: Union[CategoriesInfo, List[str], None] = None,
env: Optional[Environment] = None) -> 'Dataset':
env: Optional[Environment] = None) -> Dataset:
if isinstance(categories, list):
categories = { AnnotationType.label:
LabelCategories.from_iterable(categories)
@@ -632,7 +634,7 @@ def categories(self):

@staticmethod
def from_extractors(*sources: IDataset,
env: Optional[Environment] = None) -> 'Dataset':
env: Optional[Environment] = None) -> Dataset:
if len(sources) == 1:
source = sources[0]
else:
@@ -709,15 +711,15 @@ def remove(self, id: str, subset: Optional[str] = None) -> None:
self._data.remove(id, subset)

def filter(self, expr: str, filter_annotations: bool = False,
remove_empty: bool = False) -> 'Dataset':
remove_empty: bool = False) -> Dataset:
if filter_annotations:
return self.transform(XPathAnnotationsFilter, expr, remove_empty)
else:
return self.transform(XPathDatasetFilter, expr)

def update(self,
source: Union[DatasetPatch, IExtractor, Iterable[DatasetItem]]) \
-> 'Dataset':
-> Dataset:
"""
Updates items of the current dataset from another dataset or an
iterable (the source). Items from the source overwrite matching
@@ -734,7 +736,7 @@ def update(self,
return self

def transform(self, method: Union[str, Type[Transform]],
*args, **kwargs) -> 'Dataset':
*args, **kwargs) -> Dataset:
"""
Applies some function to dataset items.
"""
@@ -754,7 +756,7 @@ def transform(self, method: Union[str, Type[Transform]],

return self

def run_model(self, model, batch_size=1) -> 'Dataset':
def run_model(self, model, batch_size=1) -> Dataset:
from datumaro.components.launcher import Launcher, ModelTransform
if isinstance(model, Launcher):
return self.transform(ModelTransform, launcher=model,
@@ -765,7 +767,7 @@ def run_model(self, model, batch_size=1) -> 'Dataset':
raise TypeError("Unexpected 'model' argument type: %s" % \
type(model))

def select(self, pred: Callable[[DatasetItem], bool]) -> 'Dataset':
def select(self, pred: Callable[[DatasetItem], bool]) -> Dataset:
class _DatasetFilter(ItemTransform):
def transform_item(self, item):
if pred(item):
@@ -863,12 +865,12 @@ def save(self, save_dir: Optional[str] = None, **kwargs) -> None:
format=self._format, **options)

@classmethod
def load(cls, path: str, **kwargs) -> 'Dataset':
def load(cls, path: str, **kwargs) -> Dataset:
return cls.import_from(path, format=DEFAULT_FORMAT, **kwargs)

@classmethod
def import_from(cls, path: str, format: Optional[str] = None,
env: Optional[Environment] = None, **kwargs) -> 'Dataset':
env: Optional[Environment] = None, **kwargs) -> Dataset:
from datumaro.components.config_model import Source

if env is None:
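The `Dataset` methods whose return annotations were unquoted above keep their signatures; a short sketch using only what this file shows, with an illustrative XPath filter expression and placeholder paths:

```python
from datumaro.components.annotation import Bbox, Label
from datumaro.components.dataset import Dataset
from datumaro.components.extractor import DatasetItem

# Build a small in-memory dataset via the from_iterable() classmethod above.
dataset = Dataset.from_iterable([
    DatasetItem(id='frame_001', annotations=[Label(0), Bbox(1, 2, 3, 4, label=1)]),
    DatasetItem(id='frame_002', annotations=[Label(1)]),
], categories=['cat', 'dog'])

# filter(expr, filter_annotations, remove_empty) -> Dataset, as in the hunk
# above; the XPath expression is only an example.
filtered = dataset.filter('/item/annotation[label="dog"]',
    filter_annotations=True, remove_empty=True)

# save(save_dir) writes the result in the default Datumaro format.
filtered.save('output/filtered_dataset')
```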
8 changes: 5 additions & 3 deletions datumaro/components/extractor.py
@@ -1,7 +1,9 @@
# Copyright (C) 2019-2021 Intel Corporation
# Copyright (C) 2019-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from glob import iglob
from typing import Any, Callable, Dict, Iterator, List, Optional
import os
@@ -101,10 +103,10 @@ def __len__(self) -> int:
def __bool__(self): # avoid __len__ use for truth checking
return True

def subsets(self) -> Dict[str, 'IExtractor']:
def subsets(self) -> Dict[str, IExtractor]:
raise NotImplementedError()

def get_subset(self, name) -> 'IExtractor':
def get_subset(self, name) -> IExtractor:
raise NotImplementedError()

def categories(self) -> CategoriesInfo:
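The pattern repeated throughout this commit (adding `from __future__ import annotations` and unquoting forward references such as `'IExtractor'`) relies on PEP 563 postponed evaluation of annotations; a minimal standalone sketch:

```python
from __future__ import annotations

from typing import Dict


class IExtractor:
    # With postponed evaluation (PEP 563), a class can reference itself in
    # type hints without quoting the name, which is what this commit does
    # across datumaro/components.
    def subsets(self) -> Dict[str, IExtractor]:
        raise NotImplementedError()

    def get_subset(self, name: str) -> IExtractor:
        raise NotImplementedError()
```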