Merge pull request #53 from orobix/dev
Release v1.2.0
rcmalli authored Sep 11, 2023
2 parents 5ecb71a + 17993c7 commit 9ba903d
Showing 87 changed files with 2,259 additions and 797 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,8 @@ jobs:
- name: Install Package
run: |
python -m pip install -U pip
python -m pip install -e ".[test]" --no-cache-dir
python -m pip install -e ".[test,onnx]" --no-cache-dir
- name: Run Tests
run: |
python -m pytest -v --disable-pytest-warnings --strict-markers --color=yes -m "not slow"
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -68,3 +68,4 @@ docs/javascripts/images/*
test-output.xml
external/
site/
local/
41 changes: 41 additions & 0 deletions CHANGELOG.md
Expand Up @@ -2,6 +2,47 @@
# Changelog
All notable changes to this project will be documented in this file.

### [1.2.0]

#### Added

- Add `plot_raw_outputs` feature to the `VisualizerCallback` class in anomaly detection, to save the raw images of the segmentation and heatmap outputs.
- Add support for exporting trained models to ONNX.
- Add support for importing ONNX models in all evaluation tasks.
- Add `export` configuration group to control export parameters.
- Add `inference` configuration group to control inference parameters.
- Add EfficientAD configuration for anomaly detection.
- Add `acknowledgements` section to `README.md` file.
- Add hashing parameters to datamodule configurations.

#### Updated

- Update anomalib library from version 0.4.0 to 0.7.0
- Update mkdocs library from version 1.4.3 to 1.5.2
- Update mkdocs-material library from version 9.1.18 to 9.2.8
- Update mkdocstrings library by fixing the version to 0.23.0
- Update mkdocs-material-extensions library by fixing the version to 1.1.1
- Update mkdocs-autorefs library by fixing the version to 0.5.0
- Update mkdocs-section-index library from version 0.3.5 to 0.3.6
- Update mkdocstrings-python library from version 1.2.0 to 1.6.2
- Update datamodule documentation for hashing.

#### Changed

- Move `export_types` parameter from `task` configuration group to `export` configuration group under `types` parameter.
- Refactor the model export function to be more generic and available from the base task class.
- Remove `save_backbone` parameter for scikit-learn based tasks.

#### Fixed

- Fix failures when trying to override `hydra` configuration groups due to wrong override order.
- Fix certain anomalib models not loaded on the correct device.
- Fix quadra crash when launching an experiment inside a git repository not fully initialized (e.g. without a single commit).
- Fix documentation build failing due to wrong `mkdocstrings` version.
- Fix SSL docstrings
- Fix reference page URL to segmentation page in module management tutorial.
- Fix `Makefile` command.

### [1.1.4]

#### Fixed
Expand Down
20 changes: 14 additions & 6 deletions Makefile
@@ -1,13 +1,16 @@
# Makefile
SHELL := /bin/bash
DEVICE ?= cpu

.PHONY: help
help:
@echo "Commands:"
@echo "clean : cleans all unnecessary files."
@echo "docs-serve : serves the documentation."
@echo "docs-build : builds the documentation."
@echo "style : runs pre-commit."
@echo "clean : cleans all unnecessary files."
@echo "docs-serve : serves the documentation."
@echo "docs-build : builds the documentation."
@echo "style : runs pre-commit."
@echo "unit-tests : runs unit tests."
@echo "integration-tests : runs integration tests."

# Cleaning
.PHONY: clean
Expand All @@ -27,8 +30,13 @@ style:
pre-commit run --all --verbose
.PHONY: docs-build
docs-build:
mkdocs build -d ./site
mkdocs build -d ./site

.PHONY: docs-serve
docs-serve:
mkdocs serve
mkdocs serve

.PHONY: unit-tests
unit-tests:
@python -m pytest -v --disable-pytest-warnings --strict-markers --color=yes --device $(DEVICE)

15 changes: 15 additions & 0 deletions README.md
Expand Up @@ -246,6 +246,21 @@ We rely on a combination of `Black`, `Pylint`, `Mypy`, `Ruff` and `Isort` to enf
3. To run the webserver for real-time rendering and editing run `mkdocs serve` and visit `http://localhost:8000/`.
4. If you want to export the static website to a specific folder, run `mkdocs build -d <Destination Folder>`.


## Acknowledgements

This project is based on many open-source libraries and frameworks, and we would like to thank all their contributors for their work. Here is a list of the main libraries and frameworks we use:

- [Pytorch](https://pytorch.org/) and [Pytorch Lightning](https://lightning.ai/) for training and deploying deep learning models. These two libraries are a core part of the training and testing tasks and allow us to run experiments on different devices in an agile way.
- Pretrained models are usually loaded from [Pytorch Hub](https://pytorch.org/hub/) or [Pytorch-image-models](https://github.com/huggingface/pytorch-image-models) (also known as `timm`).
- Each specific task may rely on different libraries. For example, the `segmentation` task uses [Segmentation_models.pytorch](https://github.com/qubvel/segmentation_models.pytorch) for loading backbones. The `anomaly detection` task uses a fork of [Anomalib](https://github.com/openvinotoolkit/anomalib) maintained by Orobix on [this repository](https://github.com/orobix/anomalib). We use lightweight ML models from [scikit-learn](https://scikit-learn.org/). We have also implemented some SOTA models inside our library.
- Data processing and augmentation are done using [Albumentations](https://albumentations.ai/docs/) and [OpenCV](https://opencv.org/).
- [Hydra](https://hydra.cc/docs/intro/) for composing configurations and running experiments. Hydra is a powerful framework that allows us to compose configurations from the command line interface and run multiple experiments with different settings and hyperparameters. We have followed the suggestions from the `Configuring Experiments` section of the [Hydra documentation](https://hydra.cc/docs/patterns/configuring_experiments/) and the [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template) repository.
- The documentation website uses [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) and [MkDocs](https://www.mkdocs.org/). For code documentation we use [Mkdocstrings](https://mkdocstrings.github.io/). For releasing software versions we combine [Bumpver](https://github.com/mbarkhau/bumpver) and [Mike](https://github.com/jimporter/mike).
- Models can be exported in different ways (`torchscript` or `torch` file). We have also added [ONNX](https://onnx.ai/) support for some models.
- The testing framework is based on [Pytest](https://docs.pytest.org/en/) and related plug-ins.
- Code quality is ensured by [pre-commit](https://pre-commit.com/) hooks. We use [Black](https://github.com/psf/black) for formatting, [Pylint](https://www.pylint.org/) for linting, [Mypy](https://mypy.readthedocs.io/en/stable/) for type checking, [Isort](https://pycqa.github.io/isort/) for sorting imports, and [Ruff](https://github.com/astral-sh/ruff) for checking further code and documentation quality.

## FAQ

**How can I fix errors related to `GL` when I install full `opencv` package?**
Expand Down
7 changes: 3 additions & 4 deletions docs/tutorials/configurations.md
Expand Up @@ -160,6 +160,9 @@ defaults:
- override /scheduler: rop
- override /transforms: default_resize
export:
types: [torchscript]
datamodule:
num_workers: 8
batch_size: 32
Expand All @@ -178,10 +181,6 @@ task:
report: True
output:
example: True
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
core:
tag: "run"
Expand Down
15 changes: 15 additions & 0 deletions docs/tutorials/datamodules.md
Expand Up @@ -155,3 +155,18 @@ To extend the base datamodule is necessary to implement the `_prepare_data` func
- `split`: split of the image (train, test or val)

These are generally the required fields, different tasks may require additional fields. For example, in the case of segmentation tasks, the `masks` field is required.



## Data hashing

During the `prepare_data` call of each datamodule, we apply a hashing algorithm to each sample of the dataset. This information helps developers track not only the data path used for the experiment but also the data content. This is useful when the data is stored in a remote location and the developer wants to check whether the data is the same as the one used for the experiment. The [BaseDataModule][quadra.datamodules.base.BaseDataModule] class has the following arguments to control the hashing process:

- `enable_hashing`: If `True` the data will be hashed.
- `hash_size`: Size of the hash. Must be one of [32, 64, 128]. Defaults to 64.
- `hash_type`: Type of hash to use. With `hash_type=content` the hash is computed on the file content; with `hash_type=size` it is computed on the file size, which is faster but less safe. Defaults to `content`.

After the training is completed, the hash value of each sample used from the given dataset will be saved under the `hash` column inside the `<experiment_folder>/data/dataset.csv` file.

!!! info
To disable hashing from the command line, pass `datamodule.enable_hashing=False` as an override argument.
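
To make the difference between the two `hash_type` values concrete, here is a minimal sketch. It is not quadra's implementation: the `hash_sample` helper is hypothetical, and the use of `hashlib.blake2b` is an assumption, chosen only because it supports the 32, 64 and 128 bit digest sizes listed above.

```python
import hashlib
import os
import tempfile

ALLOWED_HASH_SIZES = (32, 64, 128)  # bits, mirroring the documented `hash_size` values


def hash_sample(path: str, hash_size: int = 64, hash_type: str = "content") -> str:
    """Return a hex digest for one sample (hypothetical helper, not quadra's API)."""
    if hash_size not in ALLOWED_HASH_SIZES:
        raise ValueError(f"hash_size must be one of {ALLOWED_HASH_SIZES}")
    if hash_type not in ("content", "size"):
        raise ValueError("hash_type must be 'content' or 'size'")

    # Assumption: blake2b stands in for whatever algorithm the library really uses.
    digest = hashlib.blake2b(digest_size=hash_size // 8)
    if hash_type == "content":
        # Read the file in 1 MiB chunks so large samples are not loaded at once.
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    else:
        # Only the file size is hashed: fast, but different files may collide.
        digest.update(str(os.path.getsize(path)).encode())
    return digest.hexdigest()


def _write_temp(data: bytes) -> str:
    """Write bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(data)
    return handle.name


# Two files with the same size but different content.
first = _write_temp(b"abcd")
second = _write_temp(b"wxyz")
same_size_hash = hash_sample(first, hash_type="size") == hash_sample(second, hash_type="size")
same_content_hash = hash_sample(first) == hash_sample(second)
```

In this sketch the two files collide under `hash_type="size"` but not under `hash_type="content"`, which is the sense in which the size hash is "less safe".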
4 changes: 0 additions & 4 deletions docs/tutorials/devices_setup.md
Expand Up @@ -38,13 +38,9 @@ _target_: quadra.tasks.SklearnClassification
device: "cuda:0"
output:
folder: "classification_experiment"
save_backbone: false
report: true
example: true
test_full_data: true
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
```

You can change the device to `cpu` or a different cuda device depending on your needs.
Expand Down
18 changes: 15 additions & 3 deletions docs/tutorials/examples/anomaly_detection.md
Expand Up @@ -90,6 +90,8 @@ experiments have shown that the previous models are better and faster to train.
- [Fastflow](https://github.com/openvinotoolkit/anomalib/tree/main/src/anomalib/models/fastflow): FastFlow is a two-dimensional normalizing flow-based probability distribution estimator. It can be used as a plug-in module with any deep feature extractor, such as ResNet and vision transformer, for unsupervised anomaly detection and localisation. In the training phase, FastFlow learns to transform the input visual feature into a tractable distribution, and in the inference phase, it assesses the likelihood of identifying anomalies.
- [DRAEM](https://github.com/openvinotoolkit/anomalib/tree/main/src/anomalib/models/draem): Is a reconstruction based algorithm that consists of a reconstructive subnetwork and a discriminative subnetwork. DRAEM is trained on simulated anomaly images, generated by augmenting normal input images from the training set with a random Perlin noise mask extracted from an unrelated source of image data. The reconstructive subnetwork is an autoencoder architecture that is trained to reconstruct the original input images from the augmented images. The reconstructive submodel is trained using a combination of L2 loss and Structural Similarity loss. The input of the discriminative subnetwork consists of the channel-wise concatenation of the (augmented) input image and the output of the reconstructive subnetwork. The output of the discriminative subnetwork is an anomaly map that contains the predicted anomaly scores for each pixel location. The discriminative subnetwork is trained using Focal Loss
- [CS-FLOW](https://github.com/openvinotoolkit/anomalib/tree/main/src/anomalib/models/csflow): The central idea of the paper is to handle fine-grained representations by incorporating global and local image context. This is done by taking multiple scales when extracting features and using a fully-convolutional normalizing flow to process the scales jointly.
- [EfficientAd](https://github.com/openvinotoolkit/anomalib/tree/main/src/anomalib/models/efficient_ad): Fast anomaly segmentation algorithm that consists of a distilled pre-trained teacher model, a student model and an autoencoder. It detects local anomalies via the teacher-student discrepancy and global anomalies via the student-autoencoder discrepancy.

For a detailed description of the models and their parameters please refer to the anomalib documentation.

Expand Down Expand Up @@ -122,6 +124,8 @@ callbacks:
output_path: anomaly_output
threshold_type: ${callbacks.min_max_normalization.threshold_type}
disable: true
plot_only_wrong: false
plot_raw_outputs: false
```

The min_max_normalization callback is used to normalize the anomaly maps to the range [0, 1] such that the threshold will become 0.5.
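
The sketch below illustrates why the threshold ends up at 0.5. It assumes the normalization is centred on the threshold and scaled by the observed score range, which is how we understand anomalib's min-max normalization to work; the function name and the example values are hypothetical.

```python
def normalize_score(score: float, threshold: float, min_val: float, max_val: float) -> float:
    """Centre the score on the threshold, scale by the observed range, clip to [0, 1]."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    normalized = (score - threshold) / (max_val - min_val) + 0.5
    return min(max(normalized, 0.0), 1.0)


# Hypothetical statistics collected on the validation set.
stats = {"threshold": 7.0, "min_val": 2.0, "max_val": 12.0}

at_threshold = normalize_score(7.0, **stats)     # a score equal to the threshold
above_threshold = normalize_score(9.5, **stats)  # an anomalous score
clipped_low = normalize_score(-20.0, **stats)    # far below the observed range
```

A raw score equal to the threshold always maps to 0.5, so after normalization any value above 0.5 can be read as anomalous regardless of the model's original score scale.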
Expand All @@ -132,6 +136,11 @@ The post processing configuration allow to specify the method used to compute th

The visualizer callback is used to produce a visualization of the results on the test data, when the min_max_normalization callback is used the input_are_normalized flag must be set to true and the threshold_type should match the one used for normalization. By default it is disabled as it may take a while to compute, to enable just set `disable: false`.

When many images are supplied to the model, it may be preferable to generate output images only for the cases where the prediction is wrong. This option is disabled by default; to enable it, set `plot_only_wrong: true`.

By default, model outputs are rendered in a preset report format, which may not suit every need and may reduce the resolution of the images. To give more flexibility when generating reports, the heatmap and segmentation output files can be saved independently, at the same resolution as the original image. This option is disabled by default; to enable it, set `plot_raw_outputs: true`.


### Anomalib configuration
Anomalib library doesn't use hydra but still uses yaml configurations that are found under `model/anomalib`.
This for example is the configuration used for PADIM.
Expand Down Expand Up @@ -177,14 +186,17 @@ As already mentioned anomaly detection requires just good images for training, t

### Experiment

Suppose that we want to run the experiment on the given dataset using the PADIM technique. We can define take the generic padim config for mnist as an example found under `experiment/generic/mnist/anomaly/padim.yaml`.
Suppose that we want to run the experiment on the given dataset using the PADIM technique. As an example, we can take the generic padim config for mnist, found under `experiment/generic/mnist/anomaly/padim.yaml`.

```yaml
# @package _global_
defaults:
- base/anomaly/padim
- override /datamodule: generic/mnist/anomaly/base
export:
types: [torchscript]
model:
model:
input_size: [224, 224]
Expand Down Expand Up @@ -219,7 +231,7 @@ trainer:
check_val_every_n_epoch: ${trainer.max_epochs}
```

We start from the base configuration for PADIM, then we override the datamodule to use the generic mnist datamodule. Using this configuration we specify that we want to use PADIM, extracting features using the resnet18 backbone with image size 224x224, the dataset is `mnist`, we specify that the task is taken from the anomalib configuration which specify it to be segmentation. One very important thing to watch out is the `check_val_every_n_epoch` parameter. This parameter should match the number of epochs for `PADIM` and `Patchcore`, the reason is that in the validation phase the model will be fitted and we want the fit to be done only once and on all the data, increasing the max_epoch is useful when we apply data augmentation, otherwise it doesn't make a lot of sense as we would fit the model on the same, replicated data.
We start from the base configuration for PADIM, then we override the datamodule to use the generic mnist datamodule. With this configuration we specify that we want to use PADIM, extracting features with the resnet18 backbone at an image size of 224x224, and that the dataset is `mnist`. The task is taken from the anomalib configuration, which specifies it to be segmentation. One very important thing to watch out for is the `check_val_every_n_epoch` parameter. For `PADIM` and `Patchcore` this parameter should match the number of epochs: the model is fitted during the validation phase, and we want the fit to be done only once and on all the data. Increasing `max_epochs` is useful when we apply data augmentation; otherwise it doesn't make much sense, as we would fit the model on the same, replicated data. The model is exported at the end of the training phase; since we have set the `export.types` parameter to `torchscript`, it is exported only in torchscript format.

### Run

Expand Down Expand Up @@ -282,7 +294,7 @@ task:

By default, the inference will recompute the threshold based on test data to maximize the F1-score, if you want to use the threshold from the training phase you can set the `use_training_threshold` parameter to true.

The model path is the path to an exported model, at the moment only `torchscript` models are supported (exported automatically after a training experiment). Right now only the `CFLOW` model is not supported for inference as it's not compatible with torchscript.
The model path is the path to an exported model. At the moment `torchscript` and `onnx` models are supported (exported automatically after a training experiment). Right now only the `CFLOW` model is not supported for inference, as it is compatible with neither torchscript nor onnx.
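
As a rough illustration of how an inference step could tell the two supported formats apart, the sketch below picks the format from the file extension. This is not the library's loader: the `detect_model_format` helper, the extension mapping (`.pt` for torchscript, `.onnx` for ONNX) and the example file names are assumptions made for the example.

```python
from pathlib import Path

# Assumed mapping from file extension to export format.
SUPPORTED_FORMATS = {".pt": "torchscript", ".onnx": "onnx"}


def detect_model_format(model_path: str) -> str:
    """Return the export format implied by the file extension (hypothetical helper)."""
    suffix = Path(model_path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported model file '{model_path}', expected one of: {supported}"
        ) from None


# Hypothetical file names inside a `deployment_model` folder.
onnx_format = detect_model_format("deployment_model/model.onnx")
torchscript_format = detect_model_format("deployment_model/model.pt")
```

Failing early on an unknown extension keeps the error close to the configuration mistake instead of surfacing later inside the runtime that tries to load the file.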

An inference configuration using the mnist dataset is found under `configs/experiment/generic/mnist/anomaly/inference.yaml`.

Expand Down
11 changes: 4 additions & 7 deletions docs/tutorials/examples/classification.md
Expand Up @@ -114,9 +114,6 @@ task:
report: True
output:
example: True
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
core:
Expand Down Expand Up @@ -172,6 +169,9 @@ defaults:
- override /backbone: vit16_tiny
- _self_
export:
types: [onnx, torchscript]
datamodule:
num_workers: 12
batch_size: 32
Expand All @@ -187,9 +187,6 @@ task:
report: True
output:
example: True # Generate an example of concordants and discordants predictions for each class
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
model:
Expand Down Expand Up @@ -236,7 +233,7 @@ checkpoints config_tree.txt deployment_model test
config_resolved.yaml data main.log
```

Where `checkpoints` contains the pytorch lightning checkpoints of the model, `data` contains the joblib dump of the datamodule with its parameters and dataset split, `deployment_model` contains the model in exported format (default is torchscript), `test` contains the test artifacts.
Where `checkpoints` contains the pytorch lightning checkpoints of the model, `data` contains the joblib dump of the datamodule with its parameters and dataset split, `deployment_model` contains the model in exported format (in this case onnx and torchscript; by default only torchscript), and `test` contains the test artifacts.

## Evaluation

Expand Down
3 changes: 0 additions & 3 deletions docs/tutorials/examples/multilabel_classification.md
Expand Up @@ -139,9 +139,6 @@ task:
report: False
output:
example: False
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
logger:
Expand Down
9 changes: 4 additions & 5 deletions docs/tutorials/examples/segmentation.md
Expand Up @@ -162,6 +162,9 @@ defaults:
- base/segmentation/smp_multiclass # use smp file as default
- _self_ # use this file as final config
export:
types: [onnx, torchscript]
backbone:
model:
classes: 4 # The total number of classes (background + foreground)
Expand All @@ -171,10 +174,6 @@ task:
report: false # allows to generate reports
evaluate: # custom evaluation toggles
analysis: false # Perform in depth analysis
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
datamodule:
data_path: /path/to/the/dataset # change the path to the dataset
Expand All @@ -200,7 +199,7 @@ core:
When defining the `idx_to_class` dictionary, the keys should be the class index and the values should be the class name. The class index starts from 1.


In the final configuration experiment we have specified the path to the dataset, batch size, split files, GPU device, experiment name and toggled some evaluation options.
In the final experiment configuration we have specified the path to the dataset, batch size, split files, GPU device and experiment name, and toggled some evaluation options. We have also specified that we want to export the model to the `onnx` and `torchscript` formats.

By default data will be logged to `Mlflow`. If `Mlflow` is not available it's possible to configure a simple csv logger by adding an override to the file above:

Expand Down