GinJinn_development

GinJinn2 provides a collection of command-line tools for bounding-box object detection and instance segmentation based on Detectron2. Besides providing a convenient interface to the latter, GinJinn2 offers several utility functions to facilitate building custom pipelines.

Installation

Requirements

Installation via Conda

It is recommended to install GinJinn via Conda, an open-source package management system for Python and R, which also includes facilities for environment management system. See the official installation guide for further information.

To install Conda, run the following commands in your Linux terminal:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Usage

Make sure to activate your Conda environment via conda activate MY_ENV_NAME prior to running any ginjinn command.

ginjinn and all of its subcommands provide help pages, which can be displayed using the argument -h or --help, e.g.

ginjinn -h (get list of all essential GinJinn commands)
ginjinn utils -h (get list of GinJinn's additional utilities)
ginjinn utils crop -h (get usage information for the cropping utility)

The help pages briefly describe basic functionality and command-specific arguments.

Dataset formats

A (labeled) input dataset should consist of a single image directory containing JPG images at its top level and accompanying annotations. GinJinn2 supports two common annotation formats, COCO's data format (one JSON file per dataset), which is also used as output format, and XML files as used by PASCAL VOC (one file per image). The latter, however, is only supported for bounding-box object detection.

Although not mandatory, it is recommended to place image directory and annotations in a common directory to enable more compact command invocations. If datasets are structured as shown below, the user does not have specify the image directory explicitely. Note that the file names are arbitrarily chosen.

COCO

data
├── annotations.json
└── images
    ├── 1.jpg
    ├── 2.jpg
    └── ...

Pascal VOC

data
├── annotations
│   ├── 1.xml
│   ├── 2.xml
│   └── ...
└── images
    ├── 1.jpg
    ├── 2.jpg
    └── ...

In case of nested image directories, ginjinn utils flatten helps to convert datasets to an accepted format.

Train-val-test split

In addition to the dataset for training the model, it is advisable to provide a validation dataset, which can be used to optimize (hyper)parameters and to detect overfitting. A further test dataset, if available, allows to obtain an unbiased evaluation of the final, trained model.

ginjinn split can be used to partition a single dataset such that each image along with its annotated objects is assigned to only one of two or three sub-datasets ("train", "val", "test"). Aiming at a balanced split across different object categories, a simple heuristic is used to propose dataset partitions. The generated output has the following structure:

COCO

data
├── train
│   ├── annotations.json
│   └── images
├── val
│   ├── annotations.json
│   └── images
└── test
    ├── annotations.json
    └── images

Pascal VOC

data
├── train
│   ├── annotations
│   └── images
├── val
│   ├── annotations
│   └── images
└── test
    ├── annotations
    └── images

Basic workflow

ginjinn new

This command generates a new project directory, which is required for training, evaluation, and prediction. Initially, it only contains an empty output folder and the GinJinn2 configuration file (“ginjinn_config.yaml”), a simple, formatted text file for storing various settings, which can be edited by the user for customization. When executing certain GinJinn2 commands, further data may be written to the project directory. To avoid inconsistencies, it is strongly recommended to keep the configuration file fixed throughout subsequent steps of the analysis.

The -t/--template and -d/--data_dir options allow to automatically specify various settings such that a valid configuration can be created without manually editing the configuration file.

ginjinn train

This command trains the model and simultaneously evaluates it using the validation dataset, if available. Checkpoints are saved periodically at predefined intervalls. While the training is running, its progress can be most easily monitored via the "outputs/metrics.pdf" file in your project directory. If the training has finished or in case of interruption, it can be resumed with -r/--resume. The number of iterations as specified in the configuration file can be overwritten with -n/--n_iter.

ginjinn evaluate

This command calculates COCO evaluation metrics for a trained model using the test dataset.

ginjinn predict

This command uses a trained model to predict objects for new, unlabeled images. It provides several optional types of output: a COCO annotation file, object visualization on the original images, and cropped images (segmentation masks) for each predicted object.

Concrete workflows including more complex examples are described in XYZ.

License

GinJinn2 is released under the Apache 2.0 license.

Citation

XYZ

Name		Name	Last commit message	Last commit date
Latest commit History 307 Commits
.github/workflows		.github/workflows
conda.recipe		conda.recipe
ginjinn		ginjinn
.gitignore		.gitignore
.pylintrc		.pylintrc
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GinJinn_development

Installation

Requirements

Installation via Conda

Usage

Dataset formats

Train-val-test split

Basic workflow

License

Citation

About

Releases

Packages

Contributors 2

Languages

License

AGOberprieler/GinJinn_development

Folders and files

Latest commit

History

Repository files navigation

GinJinn_development

Installation

Requirements

Installation via Conda

Usage

Dataset formats

Train-val-test split

Basic workflow

License

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages