I find that EgoHOS produces better hand-object segmentation than the 100DoH + Semantic-SAM pipeline. I have also provided a wrapper for EgoHOS in ego_hos_wrapper.py.
First install EgoHOS from the official repository, then run the code below for segmentation:
from ego_hos_wrapper import EgoHOSWrapper
base_fp = "image.jpg"
ego_hos_wrapper = EgoHOSWrapper(cache_path="/home/ycb/dflowmap_project/dflowmap/dfm/hoi/cache", # an absolute file-path for caching
repo_path='../repo')
seg_hands, seg_obj2, seg_cb = ego_hos_wrapper.segment(image_fp, vis=True) # "cb" is contact boundary
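The returned masks can be visualized by overlaying them on the input image. Below is a minimal sketch, assuming each output is an (H×W) numpy array whose non-zero pixels mark the segmented region:

import cv2

image = cv2.imread(image_fp)
overlay = image.copy()
# paint hands red, the interacting object green, and the contact boundary blue (BGR order)
for mask, color in [(seg_hands, (0, 0, 255)), (seg_obj2, (0, 255, 0)), (seg_cb, (255, 0, 0))]:
    overlay[mask > 0] = color
cv2.imwrite("vis_ego_hos.png", cv2.addWeighted(image, 0.5, overlay, 0.5, 0))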
This repository is a pipeline for hand-object interaction (HOI) analysis, which includes:
(1) hand/object state and position detection;
(2) hand and object segmentation;
(3) dynamic-area segmentation.
This repository is based on 100DoH for state and position detection, Semantic-SAM for segmentation, and the General-Flow codebase for post-processing. The whole pipeline is as follows: (1) 100DoH detects the contact/manipulation state and the bbox positions. (2) The center point of each active bbox is used as the point prompt for Semantic-SAM. The mask with the largest area is chosen as a conservative mask estimate. (3) All masks from the previous step are merged into the dynamic-area mask.
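For illustration only, the three stages roughly correspond to the sketch below; the detector and segmenter calls here are hypothetical placeholders, not the actual API (see ego_hoi_detection.py and ego_hoi_segmentation.py for the real entry points):

import numpy as np

def hoi_pipeline(image, detector, semantic_sam):
    # (1) 100DoH: contact/manipulation state and bboxes of active hands/objects
    detections = detector.detect(image)                          # hypothetical call
    dynamic_area = np.zeros(image.shape[:2], dtype=bool)
    masks = {}
    for det in detections:
        # (2) the center of each active bbox prompts Semantic-SAM
        x0, y0, x1, y1 = det["bbox"]
        center = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
        candidates = semantic_sam.segment(image, point=center)   # hypothetical call
        # keep the candidate mask with the largest area (conservative estimate)
        best = max(candidates, key=lambda m: m.sum())
        masks[det["name"]] = best
        # (3) merge all masks into the dynamic-area mask
        dynamic_area |= best.astype(bool)
    return masks, dynamic_area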
I also fixed the bugs in the compilation of 100DoH caused by the upgrade of PyTorch and CUDA, and the bugs in Semantic-SAM caused by missing packages in its detectron2 dependencies.
Hope this repository can help you : )
My environment: Python 3.11, CUDA 11.7, cudatoolkit 11.7, torch=='2.0.1+cu117'.
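A quick sanity check that your PyTorch build matches this setup:

import torch

print(torch.__version__)          # expect 2.0.1+cu117
print(torch.version.cuda)         # expect 11.7
print(torch.cuda.is_available())  # expect True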
First use the 100DoH repository I provide in ego_hand_detector/ to install the 100DoH hand-object detector. You can refer to ego_hand_detector/README.md.
Note that if you use the original 100DoH repository, you may run into problems, since it only supports older CUDA versions.
First, clone the Semantic-SAM repository:
git clone https://github.com/UX-Decoder/Semantic-SAM.git
Then install the repository according to Semantic-SAM/README.md.
If you hit the error MultiScaleDeformableAttention module is not found later in the process, you can solve it as follows:
cd ops
./make.sh
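After the build finishes, you can check the op from Python; this is a minimal check, assuming the extension is installed under the name MultiScaleDeformableAttention, as in Deformable-DETR:

# should import without error once make.sh has completed successfully
import MultiScaleDeformableAttention as MSDA
print(MSDA.__file__)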
Run the following code as a demo:
from ego_hoi_detection import EgoHOIDetection
base_fp = "image.jpg"
ego_hoi_detector = EgoHOIDetection(repo_path='.')
det_result = ego_hoi_detector.detect(base_fp, vis=True)
The structure of det_result is as follows:
{
    'left': {...},
    'right': {
        'offset': array([ 0.02586488, -0.02177986, -0.09759938]),
        'bbox_obj': array([295.7385 ,  84.01411, 340.0573 , 124.14677]),
        'bbox_hand': array([303.76373,  91.89891, 360.7369 , 138.00621]),
        'confidence_hand': 0.99910814,
        'state': 'P',
        'state_explaination': 'Portable Object'
    }
}
The visualization result will be saved in vis_det.png. Check out 100DoH and ego_hoi_detection.py for more details.
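The fields above can be used directly, e.g. to recover the object bbox center that later serves as the Semantic-SAM point prompt. A minimal sketch, assuming det_result has the structure shown above:

right = det_result.get('right')
if right is not None and right['state'] == 'P':      # a portable object in the right hand
    x0, y0, x1, y1 = right['bbox_obj']
    center = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)      # candidate point prompt for Semantic-SAM
    print("object bbox center:", center, "hand confidence:", right['confidence_hand'])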
Run the following code as a demo:
from ego_hoi_segmentation import EgoHOISegmentation
base_fp = "image.jpg"
ego_hoi_segmentation = EgoHOISegmentation(repo_path='../repo')
seg_result, det_result = ego_hoi_segmentation.segment(image_fp, hand_threshold=0.2, vis=True)
The structure of seg_result is a dict (e.g. dict_keys(['left_hand', 'left_obj', 'dynamic_area'])) with an (H×W) numpy array as the value of each key. Note that we pick the Semantic-SAM mask with the largest area for a conservative estimate. Check out the original repository and ego_hoi_segmentation.py for more details.
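The masks can be inspected directly; below is a minimal sketch, assuming each value is an (H×W) numpy mask whose non-zero pixels lie inside the region:

import cv2
import numpy as np

for key, mask in seg_result.items():
    print(key, mask.shape, int(np.count_nonzero(mask)), "pixels")
# save the dynamic-area mask as a binary image for inspection
cv2.imwrite("dynamic_area_mask.png", (seg_result['dynamic_area'] > 0).astype(np.uint8) * 255)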
@article{yuan2024general,
  title={General Flow as Foundation Affordance for Scalable Robot Learning},
  author={Yuan, Chengbo and Wen, Chuan and Zhang, Tong and Gao, Yang},
  journal={arXiv preprint arXiv:2401.11439},
  year={2024}
}
This repository is based on the code from 100DoH, Semantic-SAM, EgoHOS, General-Flow and Deformable-DETR. Thanks a lot : )