Merge branch 'develop' into zm/video-extractor
Maxim Zhiltsov committed Jan 14, 2022
2 parents 5c2b9d4 + 72f3a38 commit b76916a
Showing 66 changed files with 1,134 additions and 339 deletions.
7 changes: 3 additions & 4 deletions .github/workflows/health_check.yml
@@ -8,7 +8,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.6', '3.7', '3.8', '3.9']
python-version: ['3.7', '3.8', '3.9']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
@@ -18,13 +18,12 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Installing dependencies
run: |
pip install tensorflow pytest pytest-cov
pip install -e .[default,tfds]
pip install -e '.[default,tf,tfds]' pytest pytest-cov
- name: Code instrumentation
run: |
pytest -v --cov --cov-report xml:coverage.xml
datum -h
- name: Sending coverage results
if: matrix.python-version == '3.6'
if: matrix.python-version == '3.7'
run: |
bash <(curl -Ls https://coverage.codacy.com/get.sh) report -r coverage.xml -t ${{ secrets.CODACY_PROJECT_TOKEN }}
5 changes: 2 additions & 3 deletions .github/workflows/pr_checks.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
os: ['macos-10.15', 'ubuntu-20.04', 'windows-2016']
python-version: ['3.6', '3.7', '3.8', '3.9']
python-version: ['3.7', '3.8', '3.9']
name: build and test (${{ matrix.os }}, Python ${{ matrix.python-version }})
runs-on: ${{ matrix.os }}
steps:
@@ -29,8 +29,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Installing dependencies
run: |
pip install tensorflow pytest
pip install -e .[default,tfds]
pip install -e '.[default,tf,tfds]' pytest
- name: Unit testing
run: |
pytest -v
23 changes: 19 additions & 4 deletions CHANGELOG.md
@@ -21,23 +21,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
(<https://github.com/openvinotoolkit/datumaro/pull/585>)

### Changed
- `smooth_line` from `datumaro.util.annotation_util` is renamed to
`approximate_line` and has an updated interface
(<https://github.com/openvinotoolkit/datumaro/pull/592>)
- The `pycocotools` dependency lower bound is raised to `2.0.4`.
(<https://github.com/openvinotoolkit/datumaro/pull/449>)
- Allowed direct file paths in `datum import`. Such sources are imported as
if the `rpath` parameter were specified; however, only the selected path
is copied into the project
(<https://github.com/openvinotoolkit/datumaro/pull/555>)
- `smooth_line` from `datumaro.util.annotation_util` is renamed to
`approximate_line` and has an updated interface
(<https://github.com/openvinotoolkit/datumaro/pull/592>)
- OpenVINO telemetry library 2022.1.0 from PyPI.
(<https://github.com/openvinotoolkit/datumaro/pull/625>)

### Deprecated
- TBD

### Removed
- TBD
- Official support of Python 3.6 (due to its EOL)
(<https://github.com/openvinotoolkit/datumaro/pull/617>)

### Fixed
- Fails in multimerge when lines are not approximated and when there are no
label categories (<https://github.com/openvinotoolkit/datumaro/pull/592>)
- Cannot convert a LabelMe dataset that has no subsets
(<https://github.com/openvinotoolkit/datumaro/pull/600>)
- Saving (overwriting) a dataset in a project when rpath is used
(<https://github.com/openvinotoolkit/datumaro/pull/613>)
- Output image extension preserving in the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/606>)
- Memory overuse in the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/607>)
- Invalid image pixels produced by the `Resize` transform
(<https://github.com/openvinotoolkit/datumaro/issues/618>)

### Security
- TBD
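For the `approximate_line` rename listed under "Changed" above, a rough usage sketch follows. The commit does not show the new interface, so the flattened point layout and the meaning of the second argument are assumptions, not the confirmed signature.

```python
from datumaro.util.annotation_util import approximate_line

# Hypothetical call to the renamed helper (formerly smooth_line). The flattened
# [x0, y0, x1, y1, ...] point list and the "target number of segments" meaning
# of the second argument are assumptions; the updated interface is not visible
# in this diff.
line = [0, 0, 2, 1, 4, 0, 6, 1, 8, 0]
approx = approximate_line(line, 3)
```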
28 changes: 11 additions & 17 deletions README.md
@@ -26,28 +26,22 @@ CVAT annotations ---> Publication, statistics etc.

[(Back to top)](#dataset-management-framework-datumaro)

- Dataset reading, writing, conversion in any direction. [Supported formats](https://openvinotoolkit.github.io/datumaro/docs/user-manual/supported_formats):
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`, `panoptic`, `stuff`)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [WIDER Face](http://shuoyang1213.me/WIDERFACE/) (`bboxes`)
- [VGGFace2](https://github.com/ox-vgg/vgg_face2) (`landmarks`, `bboxes`)
- [MOT sequences](https://arxiv.org/pdf/1906.04567.pdf)
- [MOTS PNG](https://www.vision.rwth-aachen.de/page/mots)
- [ImageNet](http://image-net.org/)
- Dataset reading, writing, conversion in any direction.
- [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) (`classification`)
- [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
- [MNIST in CSV](https://pjreddie.com/projects/mnist-in-csv/) (`classification`)
- [CamVid](http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/)
- [Cityscapes](https://www.cityscapes-dataset.com/)
- [Kitti](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`, `3D raw` / `velodyne points`)
- [Supervisely](https://docs.supervise.ly/data-organization/00_ann_format_navi) (`point cloud`)
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`, `panoptic`, `stuff`)
- [CVAT](https://openvinotoolkit.github.io/cvat/docs/manual/advanced/xml_format)
- [ImageNet](http://image-net.org/)
- [Kitti](http://www.cvlibs.net/datasets/kitti/index.php) (`segmentation`, `detection`, `3D raw` / `velodyne points`)
- [LabelMe](http://labelme.csail.mit.edu/Release3.0)
- [ICDAR13/15](https://rrc.cvc.uab.es/?ch=2) (`word_recognition`, `text_localization`, `text_segmentation`)
- [Market-1501](https://www.aitribune.com/dataset/2018051063) (`person re-identification`)
- [LFW](http://vis-www.cs.umass.edu/lfw/) (`classification`, `person re-identification`, `landmarks`)
- [MNIST](http://yann.lecun.com/exdb/mnist/) (`classification`)
- [Open Images](https://storage.googleapis.com/openimages/web/download.html)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)

Other formats and documentation for them can be found [here](https://openvinotoolkit.github.io/datumaro/docs/user-manual/supported_formats).
- Dataset building
- Merging multiple datasets into one
- Dataset filtering by custom criteria:
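The reading, writing, and conversion capabilities listed in the README above can be sketched with the Python API. This is only an illustration: the paths are placeholders, and the `coco_instances` and `voc` format names are assumptions about which plugins are installed.

```python
from datumaro.components.dataset import Dataset

# Sketch of the read-convert-write flow described in the feature list above.
# 'path/to/coco_dataset' is a placeholder; 'coco_instances' and 'voc' are
# assumed format names - substitute the formats your data actually uses.
dataset = Dataset.import_from('path/to/coco_dataset', 'coco_instances')
dataset.export('path/to/voc_export', 'voc')
```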
16 changes: 8 additions & 8 deletions datumaro/cli/__main__.py
@@ -1,4 +1,4 @@
# Copyright (C) 2019-2021 Intel Corporation
# Copyright (C) 2019-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -56,18 +56,18 @@ def _make_subcommands_help(commands, help_line_start=0):

def _get_known_contexts():
return [
('model', contexts.model, "Actions with models"),
('project', contexts.project, "Actions with projects"),
('source', contexts.source, "Actions with data sources"),
('model', contexts.model, "Actions with models"),
('util', contexts.util, "Auxiliary tools and utilities"),
]

def _get_known_commands():
return [
("Project modification:", None, ''),
('add', commands.add, "Add dataset"),
('create', commands.create, "Create empty project"),
('import', commands.import_, "Import dataset"),
('add', commands.add, "Add dataset"),
('remove', commands.remove, "Remove dataset"),

("", None, ''),
@@ -79,17 +79,17 @@ def _get_known_commands():

("", None, ''),
("Dataset operations:", None, ''),
('convert', commands.convert, "Convert dataset between formats"),
('diff', commands.diff, "Compare datasets"),
('download', commands.download, "Download a publicly available dataset"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('export', commands.export, "Export dataset in some format"),
('filter', commands.filter, "Filter dataset items"),
('transform', commands.transform, "Modify dataset items"),
('info', commands.info, "Print dataset info"),
('merge', commands.merge, "Merge datasets"),
('patch', commands.patch, "Update dataset from another one"),
('convert', commands.convert, "Convert dataset between formats"),
('diff', commands.diff, "Compare datasets"),
('stats', commands.stats, "Compute dataset statistics"),
('info', commands.info, "Print dataset info"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('transform', commands.transform, "Modify dataset items"),
('validate', commands.validate, "Validate dataset")
]

12 changes: 7 additions & 5 deletions datumaro/components/annotation.py
@@ -1,7 +1,9 @@
# Copyright (C) 2021 Intel Corporation
# Copyright (C) 2021-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from enum import Enum, auto
from itertools import zip_longest
from typing import (
@@ -101,7 +103,7 @@ def from_iterable(cls, iterable: Iterable[Union[
Tuple[str],
Tuple[str, str],
Tuple[str, str, List[str]],
]]) -> 'LabelCategories':
]]) -> LabelCategories:
"""
Creates a LabelCategories from iterable.
@@ -180,7 +182,7 @@ class MaskCategories(Categories):

@classmethod
def generate(cls, size: int = 255, include_background: bool = True) \
-> 'MaskCategories':
-> MaskCategories:
"""
Generates MaskCategories with the specified size.
@@ -336,7 +338,7 @@ class CompiledMask:
@staticmethod
def from_instance_masks(instance_masks: Iterable[Mask],
instance_ids: Optional[Iterable[int]] = None,
instance_labels: Optional[Iterable[int]] = None) -> 'CompiledMask':
instance_labels: Optional[Iterable[int]] = None) -> CompiledMask:
"""
Joins instance masks into a single mask. Masks are sorted by
z_order (ascending) prior to merging.
@@ -655,7 +657,7 @@ class Category:
def from_iterable(cls, iterable: Union[
Tuple[int, List[str]],
Tuple[int, List[str], Set[Tuple[int, int]]],
]) -> 'PointsCategories':
]) -> PointsCategories:
"""
Create PointsCategories from an iterable.
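The `from_iterable` and `generate` signatures touched in the hunks above can be exercised as below; this is a minimal sketch based only on the overloads visible in this file, with label names chosen arbitrarily.

```python
from datumaro.components.annotation import LabelCategories, MaskCategories

# LabelCategories.from_iterable accepts plain names or
# (name, parent, attribute names) tuples, per the Union[...] overloads above.
labels = LabelCategories.from_iterable([
    'background',
    ('person', '', ['pose']),   # name, parent (empty), attribute names
    ('hand', 'person'),         # name, parent
])

# MaskCategories.generate(size, include_background) produces categories with
# a generated colormap for the given number of labels.
mask_categories = MaskCategories.generate(size=3, include_background=True)
```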
30 changes: 16 additions & 14 deletions datumaro/components/dataset.py
@@ -1,7 +1,9 @@
# Copyright (C) 2020-2021 Intel Corporation
# Copyright (C) 2020-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from contextlib import contextmanager
from copy import copy
from enum import Enum, auto
@@ -105,7 +107,7 @@ def __copy__(self):

class DatasetItemStorageDatasetView(IDataset):
class Subset(IDataset):
def __init__(self, parent: 'DatasetItemStorageDatasetView', name: str):
def __init__(self, parent: DatasetItemStorageDatasetView, name: str):
super().__init__()
self.parent = parent
self.name = name
@@ -182,7 +184,7 @@ class DatasetPatch:
class DatasetPatchWrapper(DatasetItemStorageDatasetView):
# The purpose of this class is to indicate that the input dataset is
# a patch and autofill patch info in Converter
def __init__(self, patch: 'DatasetPatch', parent: IDataset):
def __init__(self, patch: DatasetPatch, parent: IDataset):
super().__init__(patch.data, parent.categories())
self.patch = patch

@@ -212,7 +214,7 @@ def as_dataset(self, parent: IDataset) -> IDataset:
return __class__.DatasetPatchWrapper(self, parent)

class DatasetSubset(IDataset): # non-owning view
def __init__(self, parent: 'Dataset', name: str):
def __init__(self, parent: Dataset, name: str):
super().__init__()
self.parent = parent
self.name = name
@@ -249,7 +251,7 @@ def subsets(self):
def categories(self):
return self.parent.categories()

def as_dataset(self) -> 'Dataset':
def as_dataset(self) -> Dataset:
return Dataset.from_extractors(self, env=self.parent.env)


@@ -608,7 +610,7 @@ class Dataset(IDataset):
@classmethod
def from_iterable(cls, iterable: Iterable[DatasetItem],
categories: Union[CategoriesInfo, List[str], None] = None,
env: Optional[Environment] = None) -> 'Dataset':
env: Optional[Environment] = None) -> Dataset:
if isinstance(categories, list):
categories = { AnnotationType.label:
LabelCategories.from_iterable(categories)
@@ -632,7 +634,7 @@ def categories(self):

@staticmethod
def from_extractors(*sources: IDataset,
env: Optional[Environment] = None) -> 'Dataset':
env: Optional[Environment] = None) -> Dataset:
if len(sources) == 1:
source = sources[0]
else:
@@ -709,15 +711,15 @@ def remove(self, id: str, subset: Optional[str] = None) -> None:
self._data.remove(id, subset)

def filter(self, expr: str, filter_annotations: bool = False,
remove_empty: bool = False) -> 'Dataset':
remove_empty: bool = False) -> Dataset:
if filter_annotations:
return self.transform(XPathAnnotationsFilter, expr, remove_empty)
else:
return self.transform(XPathDatasetFilter, expr)

def update(self,
source: Union[DatasetPatch, IExtractor, Iterable[DatasetItem]]) \
-> 'Dataset':
-> Dataset:
"""
Updates items of the current dataset from another dataset or an
iterable (the source). Items from the source overwrite matching
@@ -734,7 +736,7 @@ def update(self,
return self

def transform(self, method: Union[str, Type[Transform]],
*args, **kwargs) -> 'Dataset':
*args, **kwargs) -> Dataset:
"""
Applies some function to dataset items.
"""
@@ -754,7 +756,7 @@ def transform(self, method: Union[str, Type[Transform]],

return self

def run_model(self, model, batch_size=1) -> 'Dataset':
def run_model(self, model, batch_size=1) -> Dataset:
from datumaro.components.launcher import Launcher, ModelTransform
if isinstance(model, Launcher):
return self.transform(ModelTransform, launcher=model,
@@ -765,7 +767,7 @@ def run_model(self, model, batch_size=1) -> 'Dataset':
raise TypeError("Unexpected 'model' argument type: %s" % \
type(model))

def select(self, pred: Callable[[DatasetItem], bool]) -> 'Dataset':
def select(self, pred: Callable[[DatasetItem], bool]) -> Dataset:
class _DatasetFilter(ItemTransform):
def transform_item(self, item):
if pred(item):
@@ -863,12 +865,12 @@ def save(self, save_dir: Optional[str] = None, **kwargs) -> None:
format=self._format, **options)

@classmethod
def load(cls, path: str, **kwargs) -> 'Dataset':
def load(cls, path: str, **kwargs) -> Dataset:
return cls.import_from(path, format=DEFAULT_FORMAT, **kwargs)

@classmethod
def import_from(cls, path: str, format: Optional[str] = None,
env: Optional[Environment] = None, **kwargs) -> 'Dataset':
env: Optional[Environment] = None, **kwargs) -> Dataset:
from datumaro.components.config_model import Source

if env is None:
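The `Dataset` methods whose return annotations were unquoted above keep their signatures; a short sketch using only what this file shows, with an illustrative XPath filter expression and placeholder paths:

```python
from datumaro.components.annotation import Bbox, Label
from datumaro.components.dataset import Dataset
from datumaro.components.extractor import DatasetItem

# Build a small in-memory dataset via the from_iterable() classmethod above.
dataset = Dataset.from_iterable([
    DatasetItem(id='frame_001', annotations=[Label(0), Bbox(1, 2, 3, 4, label=1)]),
    DatasetItem(id='frame_002', annotations=[Label(1)]),
], categories=['cat', 'dog'])

# filter(expr, filter_annotations, remove_empty) -> Dataset, as in the hunk
# above; the XPath expression is only an example.
filtered = dataset.filter('/item/annotation[label="dog"]',
    filter_annotations=True, remove_empty=True)

# save(save_dir) writes the result in the default Datumaro format.
filtered.save('output/filtered_dataset')
```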
8 changes: 5 additions & 3 deletions datumaro/components/extractor.py
@@ -1,7 +1,9 @@
# Copyright (C) 2019-2021 Intel Corporation
# Copyright (C) 2019-2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

from __future__ import annotations

from glob import iglob
from typing import Any, Callable, Dict, Iterator, List, Optional
import os
@@ -101,10 +103,10 @@ def __len__(self) -> int:
def __bool__(self): # avoid __len__ use for truth checking
return True

def subsets(self) -> Dict[str, 'IExtractor']:
def subsets(self) -> Dict[str, IExtractor]:
raise NotImplementedError()

def get_subset(self, name) -> 'IExtractor':
def get_subset(self, name) -> IExtractor:
raise NotImplementedError()

def categories(self) -> CategoriesInfo:
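The pattern repeated throughout this commit (adding `from __future__ import annotations` and unquoting forward references such as `'IExtractor'`) relies on PEP 563 postponed evaluation of annotations; a minimal standalone sketch:

```python
from __future__ import annotations

from typing import Dict


class IExtractor:
    # With postponed evaluation (PEP 563), a class can reference itself in
    # type hints without quoting the name, which is what this commit does
    # across datumaro/components.
    def subsets(self) -> Dict[str, IExtractor]:
        raise NotImplementedError()

    def get_subset(self, name: str) -> IExtractor:
        raise NotImplementedError()
```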