diff --git a/README.md b/README.md
index 0237062..9df5fa9 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,9 @@
-
-
+# Temporal Logic Video (TLV) Dataset
 [![Contributors][contributors-shield]][contributors-url]
 [![Forks][forks-shield]][forks-url]
 [![Stargazers][stars-shield]][stars-url]
 [![MIT License][license-shield]][license-url]
-[![LinkedIn][linkedin-shield]][linkedin-url]
-
@@ -22,192 +19,162 @@ Explore the docs »

-    View Demo
-    ·
-    Report Bug
+    NSVS-TL Project Webpage
     ·
-    Request Feature
+    NSVS-TL Source Code

+## Overview
-
-Table of Contents
-  1. About The Project
-  2. Getting Started
-  3. Usage
-  4. Roadmap
-  5. Contributing
-  6. License
-  7. Contact
-  8. Acknowledgments
-
-## About The Project
-
-Given the lack of SOTA video datasets for long-horizon, temporally extended
-activity and object detection, we introduce the Temporal Logic Video (TLV)
-datasets. The synthetic TLV datasets are compiled by stitching together static
-images from computer vision datasets like COCO and ImageNet. This enables the
-artificial introduction of a wide range of TL specifications. Additionally, we
-have created two video datasets based on the open-source autonomous vehicle
-(AV) driving datasets NuScenes and Waymo.
-


-
-
-## Getting Started
-
-This is an example of how you may give instructions on setting up your project locally.
-To get a local copy up and running follow these simple example steps.
+The Temporal Logic Video (TLV) Dataset addresses the scarcity of state-of-the-art video datasets for long-horizon, temporally extended activity and object detection. It comprises two main components:
-### Prerequisites
+1. Synthetic datasets: Generated by concatenating static images from established computer vision datasets (COCO and ImageNet), allowing for the introduction of a wide range of Temporal Logic (TL) specifications.
+2. Real-world datasets: Based on open-source autonomous vehicle (AV) driving datasets, specifically NuScenes and Waymo.
-If you want to generate syntetic dataset from COCO and ImageNet, you should download the source data first.
+## Table of Contents
-1. [ImageNet](https://image-net.org/challenges/LSVRC/2017/index.php): The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2017. Recommended file structure as follows:
-```
-|--ILSVRC
-|----Annotations
-|----Data
-|----ImageSets
-|----LOC_synset_mapping.txt
-```
+- [Dataset Composition](#dataset-composition)
+- [Dataset (Release)](#dataset)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Data Generation](#data-generation)
+- [Contribution Guidelines](#contribution-guidelines)
+- [License](#license)
+- [Acknowledgments](#acknowledgments)
-2. [COCO](https://cocodataset.org/#download): Download the source data as follow:
-```
-|--COCO
-|----2017
-|------annotations
-|------train2017
-|------val2017
-```
+## Dataset Composition
-### Installation
-```
-python -m venv .venv
-source .venv/bin/activate
-python -m pip install --upgrade pip build
-python -m pip install --editable ."[dev, test]"
-```
-


+### Synthetic Datasets
+- Source: COCO and ImageNet
+- Purpose: Introduce artificial Temporal Logic specifications
+- Generation Method: Image stitching from static datasets
-
-## Usage
-Please find argument details from run scripts.
-
-### Data Loader Common Argument
-* `data_root_dir`: The root directory where the COCO dataset is stored.
-* `mapping_to`: Map the original label to desired mapper, default is "coco".
-* `save_dir`: Directory where the generated dataset will be saved.
-### Synthetic Generator Common Argument
-* `initial_number_of_frame`: Initial number of frames for each video.
-* `max_number_frame`: Maximum number of frames for each video.
-* `number_video_per_set_of_frame`: Number of videos to generate per set of frames.
-* `increase_rate`: Rate at which the number of frames increases.
-* `ltl_logic`: Temporal logic to apply. Options include various logical expressions like "F prop1", "G prop1", etc.
-* `save_images`: Boolean to decide whether to save individual frame images (True or False).
-
-In each run script, make sure
-1. **coco synthetic data generator**
-COCO synthetic data generator can generate & compositions since it has multiple labels.
-```
-python3 run_scripts/run_synthetic_tlv_coco.py --data_root_dir "../COCO/2017" --save_dir ""
-```
-
-2. **Imagenet synthetic data generator**
-Imagenet synthetic data generator cannot generate & LTL logic formula.
-```
-python3 run_synthetic_tlv_imagenet.py --data_root_dir "../ILSVRC" --save_dir """
-```
-
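The TL specifications referenced above use operators such as "F" (eventually) and "G" (always) over per-frame propositions, as in the `ltl_logic` options "F prop1" and "G prop1". The sketch below is illustrative only and not code from this repository; it shows how those two operators can be read against per-frame label sets of the kind the generators produce:

```python
# Toy illustration only -- not repository code. Per-frame label sets stand in
# for the labels_of_frames attribute of a generated TLV dataset.
labels_of_frames = [{"car"}, {"person"}, {"car", "traffic_light"}, set()]


def eventually(prop, frames):
    """F prop: the proposition holds in at least one frame."""
    return any(prop in labels for labels in frames)


def always(prop, frames):
    """G prop: the proposition holds in every frame."""
    return all(prop in labels for labels in frames)


print(eventually("person", labels_of_frames))  # True: frame 1 contains "person"
print(always("car", labels_of_frames))         # False: frames 1 and 3 lack "car"
```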


+### Real-world Datasets
+- Sources: NuScenes and Waymo
+- Purpose: Provide real-world autonomous vehicle scenarios
+- Annotation: Temporal Logic specifications added to existing data
+
+## Dataset
+
+![Logo](images/teaser.png)
+
+Although we provide source code for generating datasets from several types of data sources, we also release a v1 dataset as a proof of concept.
+
+### Dataset Structure
-
-## Roadmap
-
-- [ ] Publication
-  - [ ] Repository
-  - [ ] Blog
+
+The data is offered as serialized objects, each containing a set of frames with annotations.
+
+#### File Naming Convention
+`\:source:\-number_of_frames:\-\.pkl`
-


+#### Object Attributes
+Each serialized object contains the following attributes:
+- `ground_truth`: Boolean indicating whether the dataset contains ground-truth labels
+- `ltl_formula`: Temporal logic formula applied to the dataset
+- `proposition`: The set of propositions used in `ltl_formula`
+- `number_of_frame`: Total number of frames in the dataset
+- `frames_of_interest`: Frames of interest, i.e., the frames that satisfy `ltl_formula`
+- `labels_of_frames`: Labels for each frame
+- `images_of_frames`: Image data for each frame
-
+
+## Installation
+
+Install the package in editable mode inside a virtual environment (for example, `python -m pip install --editable ."[dev, test]"`). To generate synthetic datasets from COCO and ImageNet, download the source data first:
+
+1. ImageNet (ILSVRC 2017):
+   ```
+   ILSVRC/
+   ├── Annotations/
+   ├── Data/
+   ├── ImageSets/
+   └── LOC_synset_mapping.txt
+   ```
+
+2. COCO (2017):
+   ```
+   COCO/
+   └── 2017/
+       ├── annotations/
+       ├── train2017/
+       └── val2017/
+   ```
+
+## Usage
-
-## License
-
-Distributed under the MIT License. See `LICENSE` for more information.
+
+The following options configure data loading and synthetic data generation.
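To inspect a released file, one option is to deserialize it and read the attributes listed under Object Attributes above. The sketch below assumes the released objects were written with Python's `pickle` module and expose those attribute names directly; the file name is a placeholder, and unpickling a custom class generally requires the `tlv_dataset` package to be importable.

```python
import pickle

# Placeholder file name -- substitute the name of an actual released .pkl file.
dataset_path = "tlv_dataset_sample.pkl"

# Unpickling a custom class requires the defining package (tlv_dataset) to be installed.
with open(dataset_path, "rb") as f:
    tlv_data = pickle.load(f)

# Attribute names follow the "Object Attributes" list above (assumed attribute access).
print("LTL formula:       ", tlv_data.ltl_formula)
print("Propositions:      ", tlv_data.proposition)
print("Number of frames:  ", tlv_data.number_of_frame)
print("Frames of interest:", tlv_data.frames_of_interest)
print("First frame labels:", tlv_data.labels_of_frames[0])
```

If the release stores plain dictionaries rather than objects, the same fields would instead be read with dictionary indexing, e.g. `tlv_data["ltl_formula"]`.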


+### Data Loader Configuration
+
+- `data_root_dir`: Root directory of the dataset
+- `mapping_to`: Label mapping scheme (default: "coco")
+- `save_dir`: Output directory for processed data
-
-## Contact
-
-Minkyu Choi - [@your_twitter](https://twitter.com/MinkyuChoi7) - minkyu.choi@utexas.edu
-
-Project Link: TBD
+
+### Synthetic Data Generator Configuration
+
+- `initial_number_of_frame`: Starting frame count per video
+- `max_number_frame`: Maximum frame count per video
+- `number_video_per_set_of_frame`: Videos to generate per frame set
+- `increase_rate`: Frame count increment rate
+- `ltl_logic`: Temporal Logic specification (e.g., "F prop1", "G prop1")
+- `save_images`: Boolean flag for saving individual frames
+
+## Data Generation
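The run scripts read source images through loader classes under `tlv_dataset/loader/`. A minimal sketch based on the commented usage example at the bottom of `tlv_dataset/loader/coco.py` (updated later in this diff) is shown below; the import path and the handling of the returned values are assumptions, so consult the module itself before relying on them.

```python
# Sketch only: constructor arguments mirror the commented example in
# tlv_dataset/loader/coco.py; the import path is assumed.
from tlv_dataset.loader.coco import COCOImageLoader

coco_loader = COCOImageLoader(
    coco_root_dir_path="../COCO/2017",  # root of the COCO 2017 download
    coco_image_source="val",            # "val" as in the module's commented example
)

# load_data() returns image paths and their COCO annotations.
images, annotations = coco_loader.load_data()
print(f"Loaded {len(images)} images")

# process_data() returns a TLVRawImage holding RGB frames and per-frame labels.
tlv_raw = coco_loader.process_data()
```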


+### COCO Synthetic Data Generation
+
+```bash
+python3 run_scripts/run_synthetic_tlv_coco.py --data_root_dir "../COCO/2017" --save_dir ""
+```
+
+### ImageNet Synthetic Data Generation
-
-## Acknowledgments
-
-* University of Texas at Austin (UT Austin)
-* UT Austin Swarm Lab
+
+```bash
+python3 run_synthetic_tlv_imagenet.py --data_root_dir "../ILSVRC" --save_dir ""
+```
+
+Note: The ImageNet generator does not support '&' (conjunction) in LTL formulae, since each ImageNet image carries only a single label; the COCO generator does support such compositions.
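For reference, a fuller COCO invocation that sets the generator parameters listed under Usage might look like the following. Only `--data_root_dir` and `--save_dir` appear in the commands above; the remaining flag names are assumed to mirror the parameter names, and the values shown are placeholders, so check the run script's argument parser first.

```bash
# Hypothetical full invocation; flag names and values are illustrative only.
python3 run_scripts/run_synthetic_tlv_coco.py \
    --data_root_dir "../COCO/2017" \
    --save_dir "./tlv_output" \
    --initial_number_of_frame 25 \
    --max_number_frame 200 \
    --number_video_per_set_of_frame 3 \
    --increase_rate 25 \
    --ltl_logic "F prop1" \
    --save_images False
```

As noted above, conjunctive specifications using '&' are only supported by the COCO generator, since COCO images carry multiple labels.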


+## License
+
+This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+
+## Citation
+
+If you find this repo useful, please cite our paper:
+```bibtex
+@inproceedings{Choi_2024_ECCV,
+    author={Choi, Minkyu and Goel, Harsh and Omama, Mohammad and Yang, Yunhao and Shah, Sahil and Chinchali, Sandeep},
+    title={Towards Neuro-Symbolic Video Understanding},
+    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
+    month={September},
+    year={2024}
+}
+```
-
-
-[contributors-shield]: https://img.shields.io/github/contributors/othneildrew/Best-README-Template.svg?style=for-the-badge
+[contributors-shield]: https://img.shields.io/github/contributors/UTAustin-SwarmLab/temporal-logic-video-dataset.svg?style=for-the-badge
 [contributors-url]: https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset/graphs/contributors
-[forks-shield]: https://img.shields.io/github/forks/othneildrew/Best-README-Template.svg?style=for-the-badge
+[forks-shield]: https://img.shields.io/github/forks/UTAustin-SwarmLab/temporal-logic-video-dataset.svg?style=for-the-badge
 [forks-url]: https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset/network/members
-[stars-shield]: https://img.shields.io/github/stars/othneildrew/Best-README-Template.svg?style=for-the-badge
+[stars-shield]: https://img.shields.io/github/stars/UTAustin-SwarmLab/temporal-logic-video-dataset.svg?style=for-the-badge
 [stars-url]: https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset/stargazers
-[issues-shield]: https://img.shields.io/github/issues/othneildrew/Best-README-Template.svg?style=for-the-badge
-[issues-url]: https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset/issues
-[license-shield]: https://img.shields.io/github/license/othneildrew/Best-README-Template.svg?style=for-the-badge
+[license-shield]: https://img.shields.io/github/license/UTAustin-SwarmLab/temporal-logic-video-dataset.svg?style=for-the-badge
 [license-url]: https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset/blob/master/LICENSE.txt
-[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
-[linkedin-url]: https://www.linkedin.com/in/mchoi07/
-[product-screenshot]: images/screenshot.png
diff --git a/example.py b/example.py
deleted file mode 100644
index 037703f..0000000
--- a/example.py
+++ /dev/null
@@ -1,6 +0,0 @@
-import os
-
-# Get the name of the current file/script
-script_name = os.path.basename(__file__)
-
-print(script_name)
diff --git a/images/teaser.png b/images/teaser.png
index f5e9716..174fba9 100644
Binary files a/images/teaser.png and b/images/teaser.png differ
diff --git a/tests/__init__.py b/tests/__init__.py
deleted file mode 100644
index b89b4ee..0000000
--- a/tests/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-"""Package containing your_project name."""
diff --git a/tests/subpackage_name/__init__.py b/tests/subpackage_name/__init__.py
deleted file mode 100644
index b89b4ee..0000000
--- a/tests/subpackage_name/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-"""Package containing your_project name."""
diff --git a/tests/subpackage_name/test_subpackage_module_name.py b/tests/subpackage_name/test_subpackage_module_name.py
deleted file mode 100644
index 5ff819f..0000000
--- a/tests/subpackage_name/test_subpackage_module_name.py
+++ /dev/null
@@ -1 +0,0 @@
-"""test script of subpackage module."""
diff --git a/tests/test_module_name.py b/tests/test_module_name.py
deleted file mode 100644
index 5ca1a9f..0000000
--- a/tests/test_module_name.py
+++ /dev/null
@@ -1 +0,0 @@
-"""test script of module."""
diff --git a/tlv_dataset/loader/coco.py b/tlv_dataset/loader/coco.py
index 1f308a7..cca138e 100644
--- a/tlv_dataset/loader/coco.py
+++ b/tlv_dataset/loader/coco.py
@@ -57,10 +57,12 @@ def load_data(self):
         """
         img_ids = self._coco.getImgIds()
         images = [
-            self._image_dir / self._coco.loadImgs(id)[0]["file_name"] for id in img_ids
+            self._image_dir / self._coco.loadImgs(id)[0]["file_name"]
+            for id in img_ids
         ]
         annotations = [
-            self._coco.loadAnns(self._coco.getAnnIds(imgIds=id)) for id in img_ids
+            self._coco.loadAnns(self._coco.getAnnIds(imgIds=id))
+            for id in img_ids
         ]
         return images, annotations
@@ -76,16 +78,19 @@ def process_data(self) -> TLVRawImage:
         for id in img_ids:
             images.append(
                 cv2.imread(
-                    str(self._image_dir / self._coco.loadImgs(id)[0]["file_name"])
+                    str(
+                        self._image_dir
+                        / self._coco.loadImgs(id)[0]["file_name"]
+                    )
                 )[:, :, ::-1]
             )  # Read it as RGB
             annotation = self._coco.loadAnns(self._coco.getAnnIds(imgIds=id))
             labels_per_image = []
             for i in range(len(annotation)):
                 labels_per_image.append(
-                    self._coco.cats[annotation[i]["category_id"]]["name"].replace(
-                        " ", "_"
-                    )
+                    self._coco.cats[annotation[i]["category_id"]][
+                        "name"
+                    ].replace(" ", "_")
                 )
             unique_labels = list(set(labels_per_image))
             if len(unique_labels) == 0:
@@ -133,10 +138,10 @@ def map_data(self, **kwargs) -> any:
 # # Example usage:
 # coco_loader = COCOImageLoader(
-#     coco_dir_path="/opt/Neuro-Symbolic-Video-Frame-Search/artifacts/data/benchmark_image_dataset/coco",
-#     annotation_file="annotations/instances_val2017.json",
-#     image_dir="val2017",
+#     coco_root_dir_path="/store/datasets/COCO/2017",
+#     coco_image_source="val",
 # )
+# breakpoint()
 # # Display a sample image
 # coco_loader.display_sample_image(0)