This is the repository for the CAST-STEM 2024 Summer Camp project. The project aims to estimate hand and object poses from recordings captured by the Multi-Camera System. The project website can be found here.
- CAST-STEM 2024 Summer Camp Project
Click on the image to watch the project instruction video.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
- Git
  - For Linux:
    ```bash
    sudo apt-get install git
    ```
  - For Windows:
    - Option One: Github Desktop.
    - Option Two: Git for Windows.
  - For MacOS:
    - Option One: Github Desktop.
    - Option Two: Homebrew.
      ```bash
      /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
      ```
- Conda Environment Manager
  - Please follow the official Installing Miniconda instructions to install Miniconda.
- Code Editor (Visual Studio Code, for example)
  - You can install Visual Studio Code (VSCode) from the official website.
  - Once you have installed VSCode, you can install the extensions below:
- Create the Conda Environment
  ```bash
  conda create --name summer_camp python=3.11
  conda activate summer_camp
  ```
- Clone the Repository
  ```bash
  git clone --recursive https://github.com/gobanana520/CAST-STEM-2024.git
  cd CAST-STEM-2024
  ```
- Install Dependencies
  - For Linux & Windows:
    ```bash
    python -m pip install --no-cache-dir -r requirements.txt
    ```
  - For MacOS:
    ```bash
    python -m pip install --no-cache-dir -r requirements_macos.txt
    ```
- ROS Environment Setup [Optional]
  - If you plan to run ROS locally, refer to the ROS Environment Setup document for detailed steps. You can then run `roscore` to start the ROS master and debug your code under the ROS environment.
- Slides
- ✅ Python_Basics.ipynb Introduces Python basics, such as lists, tuples, sets, dictionaries, classes, functions, and loops.
- ✅ Numpy_Basics.ipynb Introduces NumPy basics, such as arrays, matrices, and operations.
- ✅ Pytorch_Basics.ipynb Introduces PyTorch basics, such as tensors and operations.
- ✅ ComputerVisionBasics.pdf
- Practice 1: CV_Transformation.ipynb How to apply transformations to 3D points.
- Practice 2: CV_Deprojection.ipynb How to deproject a 2D depth image to 3D points.
- Practice 3: CV_Triangulation.ipynb How to compute 3D points from 2D landmarks.
- Practice 4: SequenceLoader.ipynb Write a class to load the data from the demo sequence.
- ✅ Introduction_to_ROS.pdf Introduces the basic concepts and useful commands in ROS.
- ✅ Introduction_to_MANO.pdf Introduces the basic concepts of the parametric hand model MANO and its PyTorch implementation (Manopth).
- Practice 5: MANO_Hand.ipynb How to initialize the MANO layer and run the forward process.
- ✅ Introduction_to_Optimization.pdf Introduces the basic concepts of optimization and common optimization algorithms.
- Practice 6: MANO_Pose_Optimization.ipynb How to use the Adam optimizer to fit the MANO hand pose parameters to target 3D joints (a minimal sketch follows this list).
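Below is a minimal sketch of that fitting loop, assuming the Manopth `ManoLayer` introduced above; the model root path, the PCA component count, and the placeholder target joints are all assumptions, not the notebook's exact code:

```python
import torch
from manopth.manolayer import ManoLayer

# Sketch: fit MANO pose parameters to target 3D joints with Adam.
# Assumes manopth is installed and the MANO model files live under mano_root.
mano_layer = ManoLayer(mano_root="./config/mano_models", use_pca=True, ncomps=45, side="right")

betas = torch.zeros(1, 10)                     # hand shape (e.g., from the mano.json calibration)
pose = torch.zeros(1, 48, requires_grad=True)  # 3 global rotation + 45 pose coefficients
trans = torch.zeros(1, 3, requires_grad=True)  # global translation (same units as the joints)

target_joints = torch.rand(1, 21, 3)           # placeholder for the target 3D joints

optimizer = torch.optim.Adam([pose, trans], lr=1e-2)
for step in range(500):
    optimizer.zero_grad()
    _, joints = mano_layer(pose, betas)        # forward pass returns (vertices, 21 joints)
    loss = torch.nn.functional.mse_loss(joints + trans[:, None, :], target_joints)
    loss.backward()
    optimizer.step()
```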
- 📚 Readings
- 👉 Highlights
- Wiki: RANSAC Algorithm
- Practice 7: RANSAC_Algorithm.ipynb A simple implementation of the RANSAC algorithm.
- ROS Message Synchronization & Extraction
  - Export Images from Rosbag: demo code showing how to sync messages from the rosbag file with `message_filters.ApproximateTimeSynchronizer()` (see the sketch after this list).
- Link: MediaPipe Handmarks Detection Understand how to use MediaPipe to detect handmarks from RGB images.
- Methods to be used in the project
- Hand Pose Estimation: HaMeR
- Object Pose Estimation: FoundationPose
- Image Segmentation: Segment Anything
- Video Object Segmentation: XMem
- Related Papers (Optional)
- 👉 Highlights
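As referenced in the ROS Message Synchronization & Extraction item above, here is a minimal sketch of that synchronization pattern, assuming ROS1 (`rospy`) and placeholder color/depth topic names:

```python
import message_filters
import rospy
from sensor_msgs.msg import Image

def callback(color_msg, depth_msg):
    # Both messages have timestamps within the `slop` window (0.05 s here).
    rospy.loginfo("synced pair at %s", color_msg.header.stamp)

rospy.init_node("sync_demo")
# The topic names below are placeholders; use the topics published by your cameras.
color_sub = message_filters.Subscriber("/camera/color/image_raw", Image)
depth_sub = message_filters.Subscriber("/camera/aligned_depth_to_color/image_raw", Image)
sync = message_filters.ApproximateTimeSynchronizer([color_sub, depth_sub], queue_size=10, slop=0.05)
sync.registerCallback(callback)
rospy.spin()
```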
- Tasks
- ✅ Camera Intrinsics Extraction
  - The camera intrinsics are saved in the `./data/calibration/intrinsics/<camera_serial>_640x480.json` file.
- ✅ Camera Extrinsics Calibration
  - We use a large calibration board to calibrate the camera extrinsics in pairs. Below is a usage demo of the Vicalib tool:
  - The camera extrinsics are saved in the `./data/calibration/extrinsics/extrinsics_<date>/extrinsics.json` file.
- ✅ Hand Shape Calibration
  - The MANO hand shapes are saved in the `./data/calibration/mano/<person_id>/mano.json` file.
- ✅ Get familiar with data collection with the Multi-Camera System.
  - Launch all the RealSense cameras with ROS.
  - Use RViz to visualize the camera images.
  - Monitor the camera status.
  - Record a rosbag from specific topics (an example command follows this list).
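A record command could look like the sketch below; the bag name and topic names are placeholders, so list the topics actually published by your camera launch files:

```bash
# Record only the color and depth topics of the selected cameras into one bag.
rosbag record -O recording.bag \
  /037522251142/color/image_raw /037522251142/aligned_depth_to_color/image_raw \
  /043422252387/color/image_raw /043422252387/aligned_depth_to_color/image_raw
```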
- Objects Used in the Dataset
- The dataset contains the following objects:
- The object models are saved in the `./data/models` folder. You can use MeshLab to view them.
- Tasks
- ✅ Collect the data with the Multi-Camera System.
- Each person will pick one object.
- Use one or two hands to manipulate the object.
- The recording is saved to a rosbag file.
- ✅ Extract the images from the rosbag recordings.
- Homeworks
- HW1: Rosbag Extraction
  - Write the class `RosbagExtractor` to extract the images from the rosbag recordings for all the camera image topics (see the sketch after this homework list).
  - The extracted images should be saved in the `./data/recordings/<person_id>_<rosbag_name>` folder, following the structure below:
    ```
    <person_id>_<rosbag_name>    # the recording folder name
    ├── 037522251142             # the camera serial number
    │   ├── color_000000.jpg     # the color images (color_xxxxxx.jpg)
    │   ├── depth_000000.png     # the depth images (depth_xxxxxx.png)
    │   └── ...
    ├── 043422252387
    │   ├── color_000000.jpg
    │   ├── depth_000000.png
    │   └── ...
    ├── ...
    └── 117222250549
        ├── color_000000.jpg
        ├── depth_000000.png
        └── ...
    ```
  - References:
- HW2: Metadata Generation
  - For each extracted recording, the metadata should be generated under the sequence folder with the filename `meta.json` (a generation sketch follows this list).
  - The `object_id` (G01_1, ..., G31_4) can be found in the Week 3 section.
  - Below is an example of the `meta.json` file:
    ```jsonc
    {
      // the camera serial numbers
      "serials": [
        "037522251142",
        "043422252387",
        "046122250168",
        "105322251225",
        "105322251564",
        "108222250342",
        "115422250549",
        "117222250549"
      ],
      // the image width
      "width": 640,
      // the image height
      "height": 480,
      // the extrinsics folder name
      "extrinsics": "extrinsics_20240611",
      // the person name
      "mano_calib": "john",
      // the object id
      "object_ids": "G31_4",
      // the hand sides in the recording
      // (if both hands are used, the order should be right first and then left)
      "mano_sides": ["right", "left"],
      // the number of frames in the recording
      "num_frames": 1024
    }
    ```
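For HW1, here is a minimal sketch of what `RosbagExtractor` could look like. The `/<serial>/...` topic layout and the simple per-topic frame counters are assumptions; proper message synchronization across cameras is left to the homework:

```python
import os

import cv2
import rosbag
from cv_bridge import CvBridge

class RosbagExtractor:
    """Extract color/depth images from a rosbag for every camera image topic."""

    def __init__(self, bag_path, save_root):
        self._bag_path = bag_path
        self._save_root = save_root
        self._bridge = CvBridge()

    def extract(self, topics):
        counters = {topic: 0 for topic in topics}
        with rosbag.Bag(self._bag_path, "r") as bag:
            for topic, msg, _ in bag.read_messages(topics=topics):
                # e.g. "/037522251142/color/image_raw" -> serial "037522251142"
                serial = topic.split("/")[1]
                folder = os.path.join(self._save_root, serial)
                os.makedirs(folder, exist_ok=True)
                idx = counters[topic]
                if "color" in topic:
                    image = self._bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
                    cv2.imwrite(os.path.join(folder, f"color_{idx:06d}.jpg"), image)
                else:
                    # keep the 16-bit depth values by saving to PNG
                    image = self._bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
                    cv2.imwrite(os.path.join(folder, f"depth_{idx:06d}.png"), image)
                counters[topic] = idx + 1
```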
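For HW2, a minimal sketch that writes the `meta.json`; the field values and the sequence folder name are example assumptions taken from the sample above:

```python
import json
import os

sequence_dir = "./data/recordings/john_20240611"  # hypothetical sequence folder
meta = {
    "serials": ["037522251142", "043422252387", "046122250168", "105322251225",
                "105322251564", "108222250342", "115422250549", "117222250549"],
    "width": 640,
    "height": 480,
    "extrinsics": "extrinsics_20240611",
    "mano_calib": "john",
    "object_ids": "G31_4",
    "mano_sides": ["right", "left"],
    "num_frames": 1024,
}

with open(os.path.join(sequence_dir, "meta.json"), "w") as f:
    json.dump(meta, f, indent=2)
```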
- Slides
- Tasks
- ✅ Handmarks Detection by MediaPipe
- ✅ Label the initial Object Mask manually.
- ✅ Use XMem to generate the remaining masks for all the recordings.
- ✅ Generate 3D hand joints by Triangulation and RANSAC.
- ✅ Set up the HaMeR Python environment.
- ✅ Set up the FoundationPose Python environment.
- Homeworks
- HW1: Handmarks Detection
  - Write the class `MPHandDetector` to detect the 2D handmarks from the extracted images using MediaPipe (see the sketch after this homework list).
  - The detected handmarks should be saved in the `./data/recordings/<sequence_name>/processed/hand_detection` folder, following the structure below:
    ```
    <sequence_name>/processed/hand_detection
    ├── mp_handmarks_results.npz   # the detected handmarks results
    └── vis                        # the folder for the visualization results
        ├── mp_handmarks
        │   ├── vis_000000.png     # the visualization image of the handmarks
        │   ├── vis_000001.png
        │   └── ...
        └── mp_handmarks.mp4       # the visualization video of the handmarks
    ```
  - The detected handmarks should be saved as a numpy array of shape `(num_hands, num_joints, 2)`.
  - The handmarks should be stored in the image coordinate system, unnormalized.
  - The handmarks should be stored right hand first, then left hand.
  - References:
- HW2: Label the initial Object Mask manually (a point-prompt sketch follows this homework list).
  - The `mask_id` (1, 2, ..., 10) of each object can be found in the Week 3 section.
  - Download the pretrained models [4.3GB] for the Segment Anything Model (SAM):
    - For Linux-like OSes, run `bash ./config/sam/download_sam_model.sh` in the terminal.
    - Or download the models from the Box and put them under the `./config/sam` folder.
  - Run the mask label toolkit to label the object mask in each camera view:
    ```bash
    python ./tools/04_run_mask_label_toolkit.py
    ```
    - Click `...` to select the image.
    - `Ctrl + Left Click` adds a positive point (green).
    - `Ctrl + Right Click` adds a negative point (red).
    - Press `R` to reset the points.
    - Click `-` and `+` to set the mask id, and click `Add Mask` to add the mask.
    - Click `Save Mask` to save the mask.
  - The mask and visualization images will be saved in the `./data/recordings/<sequence_name>/processed/segmentation/init_segmentation/<camera_serial>` folder.
- HW3: Generate one 3D hand joint by triangulation and RANSAC (see the sketch after this homework list).
  - Create the list of candidate 3D points by triangulating the handmarks of each camera pair.
  - Use RANSAC to find the best 3D hand joint.
  - References:
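For HW1, a minimal sketch of an `MPHandDetector` built on MediaPipe Hands; the handedness mapping and the output packing (right hand first, unnormalized pixels) follow the homework requirements, while everything else is an assumption:

```python
import cv2
import mediapipe as mp
import numpy as np

class MPHandDetector:
    """Detect 2D handmarks (21 joints per hand) with MediaPipe Hands."""

    def __init__(self):
        self._hands = mp.solutions.hands.Hands(
            static_image_mode=True, max_num_hands=2, min_detection_confidence=0.5
        )

    def detect(self, bgr_image):
        """Return unnormalized handmarks of shape (num_hands, 21, 2), right hand first."""
        height, width = bgr_image.shape[:2]
        result = self._hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
        marks = {}
        if result.multi_hand_landmarks:
            for landmarks, handedness in zip(result.multi_hand_landmarks, result.multi_handedness):
                side = handedness.classification[0].label.lower()  # "left" or "right"
                marks[side] = np.array(
                    [[lm.x * width, lm.y * height] for lm in landmarks.landmark], dtype=np.float32
                )
        ordered = [marks[s] for s in ("right", "left") if s in marks]
        return np.stack(ordered) if ordered else np.empty((0, 21, 2), dtype=np.float32)
```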
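For HW2, the toolkit's point prompts map to the Segment Anything predictor roughly as in this sketch; the checkpoint filename, image path, and click coordinates are assumptions:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the vit_h checkpoint downloaded in the step above (filename assumed).
sam = sam_model_registry["vit_h"](checkpoint="./config/sam/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("color_000000.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label 1, green in the toolkit) and one negative click (label 0, red).
points = np.array([[320, 240], [100, 80]])
labels = np.array([1, 0])
masks, scores, _ = predictor.predict(point_coords=points, point_labels=labels, multimask_output=False)
object_mask = masks[0]  # boolean (H, W) mask of the prompted object
```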
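For HW3, a minimal sketch of the triangulate-then-RANSAC procedure; `projections` (the 3x4 camera projection matrices) and `uvs` (the detected 2D handmarks of one joint across views) are assumed precomputed:

```python
from itertools import combinations

import cv2
import numpy as np

def triangulate_pair(P1, P2, uv1, uv2):
    """Triangulate one 3D point from a camera pair (P1, P2: 3x4 projection matrices)."""
    point_h = cv2.triangulatePoints(P1, P2, uv1.reshape(2, 1), uv2.reshape(2, 1))
    return (point_h[:3] / point_h[3]).ravel()

def ransac_joint(projections, uvs, threshold=10.0):
    """Keep the candidate 3D joint whose reprojection agrees with the most views."""
    best_point, best_inliers = None, -1
    for i, j in combinations(range(len(projections)), 2):
        candidate = triangulate_pair(projections[i], projections[j], uvs[i], uvs[j])
        candidate_h = np.append(candidate, 1.0)
        inliers = 0
        for P, uv in zip(projections, uvs):
            reproj = P @ candidate_h
            if np.linalg.norm(reproj[:2] / reproj[2] - uv) < threshold:
                inliers += 1
        if inliers > best_inliers:
            best_point, best_inliers = candidate, inliers
    return best_point
```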
- Tasks
- ✅ Use HaMeR to estimate the 2D handmarks in each camera view.
  - Generate the input bounding boxes for HaMeR.
  - Run the HaMeR model to estimate the 2D handmarks.
- ✅ Use FoundationPose to estimate the object pose in each camera view.
  - Set up the FoundationPose Python environment.
  - Write the `DataReader` class to load the input data of our sequences for FoundationPose.
  - Run the FoundationPose model to estimate the object pose.
- ✅ Optimize the final MANO hand pose.
  - Generate 3D hand joints from the HaMeR handmarks.
  - Optimize the MANO hand pose to fit the 3D hand joints.
- ✅ Optimize the final Object Pose.
  - Generate the best 3D object pose from the FoundationPose results.
  - Optimize the object pose to fit the inlier FoundationPose results.
- ✅ Generate the final 3D hand and object poses.
  - Generate the final hand and object poses from the optimized MANO hand pose and object pose (see the loading sketch after this task list).
  - The final MANO hand poses are saved to the `poses_m.npy` file under each sequence folder.
  - The final 6D object poses are saved to the `poses_o.npy` file under each sequence folder.
- ✅ Visualization of the final poses
  - The rendered images are saved in the `./data/recordings/<sequence_name>/processed/sequence_rendering` folder.
  - The rendered video is saved to the `vis_<sequence_name>.mp4` file under each sequence folder.
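For reference, the final pose files can be inspected with a short loading sketch; the sequence folder name is a placeholder, and the array layouts follow the descriptions above:

```python
import numpy as np

sequence_dir = "./data/recordings/john_20240611"  # hypothetical sequence folder
poses_m = np.load(f"{sequence_dir}/poses_m.npy")  # per-frame MANO hand poses
poses_o = np.load(f"{sequence_dir}/poses_o.npy")  # per-frame 6D object poses
print("hand poses:", poses_m.shape, "object poses:", poses_o.shape)
```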
- Homeworks
- HW1: Run the FoundationPose model on our sequences.
  - Write the code to run FoundationPose on our dataset.
  - References:
- HW2: Run the HaMeR model on our sequences.
  - Write the code to run HaMeR on our dataset.
  - References:
    - Notebook: Run_HaMeR_for_Summer_Camp.ipynb
Videos demonstrating the final processed results of the project can be found below: