From bb646b428f43571bd02d02c14c551621bc10a66f Mon Sep 17 00:00:00 2001
From: Katsuya Hyodo
Date: Sat, 19 Oct 2024 19:35:56 +0900
Subject: [PATCH] Update README.md

---
 460_RT-DETRv2-Wholebody25/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/460_RT-DETRv2-Wholebody25/README.md b/460_RT-DETRv2-Wholebody25/README.md
index c0e079a5c7..85fb0b6872 100644
--- a/460_RT-DETRv2-Wholebody25/README.md
+++ b/460_RT-DETRv2-Wholebody25/README.md
@@ -2,7 +2,7 @@
 
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10229410.svg)](https://doi.org/10.5281/zenodo.10229410)
 
-This model far surpasses the performance of existing CNNs in both inference speed and accuracy. I'm not particularly interested in comparing performance between architectures, so I don't cherry-pick any of the verification results.
+This model far surpasses the performance of existing CNNs in both inference speed and accuracy. I'm not particularly interested in comparing performance between architectures, so I don't cherry-pick any of the verification results. What is important is a balance between accuracy, speed, the number of output classes, and versatility of output values.
 
 Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 25 classes: `Body`, `Adult`, `Child`, `Male`, `Female`, `Body_with_Wheelchair`, `Body_with_Crutches`, `Head`, `Front`, `Right_Front`, `Right_Side`, `Right_Back`, `Back`, `Left_Back`, `Left_Side`, `Left_Front`, `Face`, `Eye`, `Nose`, `Mouth`, `Ear`, `Hand`, `Hand_Left`, `Hand_Right`, `Foot`. Even the classification problem is being attempted to be solved by object detection. There is no need to perform any complex affine transformations or other processing for pre-processing and post-processing of input images. In addition, the resistance to Motion Blur, Gaussian noise, contrast noise, backlighting, and halation is quite strong because it was trained only on images with added photometric noise for all images in the MS-COCO subset of the image set. In addition, about half of the image set was annotated by me with the aspect ratio of the original image substantially destroyed. I manually annotated all images in the dataset by myself. The model is intended to use real-world video for inference and has enhanced resistance to all kinds of noise. Probably stronger than any known model. However, the quality of the known data set and my data set are so different that an accurate comparison of accuracy is not possible.
 
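The README text touched by this hunk states that no complex affine transformations are needed for pre- or post-processing. As a rough illustration of what that simple flow could look like, here is a minimal ONNX Runtime sketch. The model file name `rtdetrv2_wholebody25.onnx`, the 640x640 input size, the 1/255 normalization, and the flat `[batch, num_boxes, 6]` output layout (x1, y1, x2, y2, score, class_id) are assumptions made for illustration; none of them are confirmed by this patch or the surrounding repository.

```python
# Hypothetical sketch only: model file name, input size, normalization, and
# output layout below are assumptions, not values taken from this patch.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("rtdetrv2_wholebody25.onnx")  # assumed file name
input_name = session.get_inputs()[0].name

image = cv2.imread("input.jpg")
h, w = image.shape[:2]

# Simple pre-processing: plain resize, BGR -> RGB, NCHW float tensor.
# No letterboxing or affine warping, per the README's description.
resized = cv2.resize(image, (640, 640))                       # assumed input size
blob = np.ascontiguousarray(
    resized[:, :, ::-1].transpose(2, 0, 1)[np.newaxis]
).astype(np.float32) / 255.0                                   # assumed scaling

# Assumed output layout: [batch, num_boxes, 6] = x1, y1, x2, y2, score, class_id.
detections = session.run(None, {input_name: blob})[0][0]

# Simple post-processing: threshold, rescale boxes to the original image, draw.
for x1, y1, x2, y2, score, class_id in detections:
    if score < 0.35:                                           # arbitrary threshold
        continue
    cv2.rectangle(
        image,
        (int(x1 / 640 * w), int(y1 / 640 * h)),
        (int(x2 / 640 * w), int(y2 / 640 * h)),
        (0, 255, 0),
        2,
    )

cv2.imwrite("output.jpg", image)
```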