Skip to content

Commit

Permalink
Mergeback 1.9.0 to develop (openvinotoolkit#1604)
Browse files Browse the repository at this point in the history
<!-- Contributing guide:
https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md
-->

### Summary

<!--
Resolves openvinotoolkit#111 and openvinotoolkit#222.
Depends on openvinotoolkit#1000 (for series of dependent commits).

This PR introduces this capability to make the project better in this
and that.

- Added this feature
- Removed that feature
- Fixed the problem openvinotoolkit#1234
-->

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [ ] I have added unit tests to cover my changes.​
- [ ] I have added integration tests to cover my changes.​
- [ ] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).​
- [ ] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Co-authored-by: Yunchu Lee <yunchu.lee@intel.com>
Co-authored-by: Wonju Lee <wonju.lee@intel.com>
  • Loading branch information
3 people committed Sep 11, 2024
1 parent 97b570f commit 4f3a6e0
Show file tree
Hide file tree
Showing 5 changed files with 291 additions and 11 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### New features
- Add a new CLI command: datum format
(<https://github.com/openvinotoolkit/datumaro/pull/1570>)
- Support language dataset for DmTorchDataset
(<https://github.com/openvinotoolkit/datumaro/pull/1592>)

### Enhancements
- Change _Shape to Shape and add comments for subclasses of Shape
Expand Down
3 changes: 3 additions & 0 deletions requirements-core.txt
Original file line number Diff line number Diff line change
Expand Up @@ -64,3 +64,6 @@ json-stream

# TabularValidator
nltk

# torch converter for language
portalocker
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@ def parse_requirements(filename=CORE_REQUIREMENTS_FILE):
extras_require={
"tf": ["tensorflow"],
"tfds": ["tensorflow-datasets<4.9.3"],
"torch": ["torch", "torchvision"],
"torch": ["torch", "torchvision", "torchtext==0.16.0"],
"default": DEFAULT_REQUIREMENTS,
},
ext_modules=ext_modules,
Expand Down
51 changes: 49 additions & 2 deletions src/datumaro/plugins/framework_converter.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2023 Intel Corporation
# Copyright (C) 2023-2024 Intel Corporation
#
# SPDX-License-Identifier: MIT

Expand All @@ -17,6 +17,7 @@
"detection": AnnotationType.bbox,
"instance_segmentation": AnnotationType.polygon,
"semantic_segmentation": AnnotationType.mask,
"tabular": [AnnotationType.label, AnnotationType.caption],
}


Expand Down Expand Up @@ -88,7 +89,10 @@ def _gen_item(self, idx: int):
if ann.type == TASK_ANN_TYPE[self.task]
]
label = mask_tools.merge_masks((mask, label_id) for mask, label_id in masks)

elif self.task == "tabular":
label = [
ann.as_dict() for ann in item.annotations if ann.type in TASK_ANN_TYPE[self.task]
]
return image, label


Expand All @@ -103,15 +107,58 @@ def __init__(
task: str,
transform: Optional[Callable] = None,
target_transform: Optional[Callable] = None,
target: Optional[str] = None,
tokenizer: Optional[tuple[Callable, Callable]] = None,
vocab: Optional[tuple[Callable, Callable]] = None,
):
super().__init__(dataset=dataset, subset=subset, task=task)

self.transform = transform
self.target_transform = target_transform

if self.task == "tabular":
if not isinstance(target, dict):
raise ValueError(
"Target should be a dictionary with 'input' and 'output' keys."
)
self.input_target = target.get("input")
self.output_target = target.get("output")
if not self.input_target:
raise ValueError(
"Please provide target column for tabular task which is used for input"
)

if not (tokenizer and vocab):
raise ValueError("Both tokenizer and vocab must be provided for tabular task")
self.tokenizer = tokenizer
self.vocab = vocab

def __getitem__(self, idx):
image, label = self._gen_item(idx)

if self.task == "tabular":
text = image()[self.input_target]

if self.output_target:
src_tokenizer, tgt_tokenizer = self.tokenizer
src_vocab, tgt_vocab = self.vocab
src_tokens = src_tokenizer(text)
src_token_ids = src_vocab(src_tokens)

label_text = label[0]["caption"].split(f"{self.output_target}:")[-1]
tgt_tokens = tgt_tokenizer(label_text)
tgt_token_ids = tgt_vocab(tgt_tokens)

return torch.tensor(src_token_ids, dtype=torch.long), torch.tensor(
tgt_token_ids, dtype=torch.long
)
else:
tokens = self.tokenizer(text)
token_ids = self.vocab(tokens)
return torch.tensor(token_ids, dtype=torch.long), torch.tensor(
label[0]["label"], dtype=torch.long
)

if len(image.shape) == 2:
image = np.expand_dims(image, axis=-1)

Expand Down
Loading

0 comments on commit 4f3a6e0

Please sign in to comment.