Handwritten digit detection using a YOLOv8 detection model with ONNX pre/post-processing. An example of how the model works in a real-world scenario can be viewed at https://thawro.github.io/web-object-detector/.
The dataset consists of images created from the HWD+ dataset.
The HWD+ dataset consists of grayscale images of single handwritten digits in high resolution (500x500 pixels).
The `yolo_HWD+` dataset is composed of images produced from the `HWD+` dataset. Each `yolo_HWD+` image contains many single digits, and each digit is annotated in YOLO format (`class x_center y_center width height`). The processing of `HWD+` to obtain `yolo_HWD+` is as follows (a rough sketch of the grid-composition step is shown after the list):
- Cut the digit out of each image (`HWD+` images have a lot of white background around the digit)
- Create a background image of size `imgsz` and apply a transform to it (`pre_transform` attribute) - e.g. RGB shift/shuffle
- Take `nrows * ncols` digit images and form an `nrows x ncols` grid
- For each digit:
  - Apply a transform (`obj_transform` attribute) - e.g. invert color, RGB shift/shuffle
  - Randomly place the digit in its `ij` cell and save its label and location as an annotation
- Apply a transform to the fully formed grid (`post_transform` attribute) - e.g. rotation
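A minimal sketch of the grid-composition and annotation step is given below. The helper name, arguments, and the plain white background are illustrative assumptions and not the repository's actual dataset code; the `pre_transform`/`obj_transform`/`post_transform` augmentations are omitted for brevity:

```python
import random
import numpy as np

def compose_yolo_hwd_image(digit_crops, digit_classes, imgsz=640, nrows=4, ncols=4):
    """Place nrows*ncols digit crops on a canvas and return YOLO-format labels.

    digit_crops: list of HxWx3 uint8 arrays (digits already cut out of the white background)
    digit_classes: list of int class ids (0-9), same length as digit_crops
    """
    canvas = np.full((imgsz, imgsz, 3), 255, dtype=np.uint8)  # plain background (transforms omitted)
    cell_h, cell_w = imgsz // nrows, imgsz // ncols
    labels = []  # rows of (class, x_center, y_center, width, height), all normalized to [0, 1]

    for idx, (digit, cls) in enumerate(zip(digit_crops, digit_classes)):
        i, j = divmod(idx, ncols)                       # ij cell of the grid
        h, w = min(digit.shape[0], cell_h), min(digit.shape[1], cell_w)
        # random offset inside the (i, j) cell so digits are not always centered
        y0 = i * cell_h + random.randint(0, cell_h - h)
        x0 = j * cell_w + random.randint(0, cell_w - w)
        canvas[y0:y0 + h, x0:x0 + w] = digit[:h, :w]

        # YOLO annotation: normalized box center and size
        x_center = (x0 + w / 2) / imgsz
        y_center = (y0 + h / 2) / imgsz
        labels.append((cls, x_center, y_center, w / imgsz, h / imgsz))

    return canvas, labels
```

Each label row would then be written as one line of the image's `.txt` annotation file, matching the `class x_center y_center width height` format above.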
Example below:
- PyTorch - neural network architectures and dataset classes
- ONNX - all processing steps used in the pipeline
- ONNX Runtime - pipeline inference
- OpenCV - image processing for server-side model inference (optional)
- React - web application used to test object detection models in real-world examples
Each pipeline step is implemented as an ONNX model. The complete inference pipeline is the following (a rough ONNX Runtime sketch is shown after the list):
- Image preprocessing - resize and pad the image to match the model input size (`preprocessing`)
- Object detection - detect objects with the YOLOv8 model (`yolo`)
- Non-Maximum Suppression - apply NMS to the YOLO output (`nms`)
- Postprocessing - apply postprocessing to the filtered boxes (`postprocessing`)
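A rough sketch of chaining these steps with ONNX Runtime is shown below. The file names, the assumption that each model has a single input and output, and the feed construction are illustrative only; the actual exported models may use different names and take extra inputs (e.g. original image shape, padding info, or confidence/IoU thresholds):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model file names - check the repository's exported ONNX files.
preprocess_sess = ort.InferenceSession("preprocessing.onnx")
yolo_sess = ort.InferenceSession("yolo.onnx")
nms_sess = ort.InferenceSession("nms.onnx")
postprocess_sess = ort.InferenceSession("postprocessing.onnx")

def run_step(sess, tensor):
    """Run a single-input/single-output ONNX model and return its first output."""
    input_name = sess.get_inputs()[0].name
    output_name = sess.get_outputs()[0].name
    return sess.run([output_name], {input_name: tensor})[0]

def detect(image_hwc_uint8: np.ndarray):
    # 1. Preprocessing: resize + pad to the YOLOv8 input size
    preprocessed = run_step(preprocess_sess, image_hwc_uint8)
    # 2. Object detection with the YOLOv8 model
    raw_detections = run_step(yolo_sess, preprocessed)
    # 3. Non-Maximum Suppression on the raw YOLO output
    filtered_boxes = run_step(nms_sess, raw_detections)
    # 4. Postprocessing: map filtered boxes back to original image coordinates
    return run_step(postprocess_sess, filtered_boxes)
```

Keeping every step as an ONNX model means the same pipeline can be executed by any ONNX Runtime backend (server-side or in the browser) without reimplementing the pre/post-processing logic.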