Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
PINTO0309 authored Dec 16, 2023
1 parent 06b9729 commit ce7b260
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion 426_YOLOX-Body-Head-Hand/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Lightweight human detection model generated using a high-quality human dataset.

https://github.com/PINTO0309/PINTO_model_zoo/assets/33194443/ab4c4b1b-6e51-416a-948f-809b3d06eafd

The advantage of being able to detect hands with high accuracy is that it makes it possible to detect key points on the fingers as correctly as possible. The video below is processed by converting the MediaPipe tflite file to ONNX, so the performance of keypoint detection is not very high. It is assumed that information can be acquired quite robustly when combined with a highly accurate keypoint detection model focused on the hand region. It would be realistic to use the distance in the Z direction, which represents depth, in combination with physical information such as ToF, rather than relying on model estimation. To obtain as accurate a three-dimensional value as possible, including depth, sparse positional information on a two-dimensional plane, such as skeletal detection, is likely to break down the algorithm. This has the advantage that unstable depths can be easily corrected by a simple algorithm by capturing each part of the body in planes, as a countermeasure to the phenomenon that when information acquired from a depth camera (ToF or stereo camera parallax measurement) is used at any one point, the values are affected by noise and become unstable due to environmental noise.
The advantage of being able to detect hands with high accuracy is that it makes it possible to detect key points on the fingers as correctly as possible. The video below is processed by converting the MediaPipe tflite file to ONNX, so the performance of keypoint detection is not very high. It is assumed that information can be acquired quite robustly when combined with a highly accurate keypoint detection model focused on the hand region. It would be realistic to use the distance in the Z direction, which represents depth, in combination with physical information such as ToF, rather than relying on model estimation. To obtain as accurate a three-dimensional value as possible, including depth, sparse positional information on a two-dimensional plane, such as skeletal detection, is likely to break down the algorithm. This has the advantage that unstable depths can be easily corrected by a simple algorithm by capturing each part of the body in planes, as a countermeasure to the phenomenon that when information acquired from a depth camera (ToF or stereo camera parallax measurement) is used at any one point, the values are affected by noise and become unstable due to environmental noise. The method of detecting 133 skeletal keypoints at once gives the impression that the process is very heavy because it requires batch or loop processing to calculate heat maps for multiple human bounding boxes detected by the object detection model. I also feel that the computational cost is high because complex affine transformations and other coordinate transformation processes must be performed on large areas of the entire body. However, this is not my negative view of a model that detects 133 keypoints, only that it is computationally expensive to run on an unpowered edge device.

https://github.com/PINTO0309/hand_landmark

Expand Down

0 comments on commit ce7b260

Please sign in to comment.