Human Action Recognition from Depth Maps

This repository implements a system for Human Action Recognition (HAR) from depth map data. The system is designed to assist individuals with dementia in bathroom settings by recognizing their actions in a privacy-preserving manner. It combines the SPiKE model for 3D Human Pose Estimation (HPE) with a Relational Graph Convolutional Network (RGCN) for action classification.

Overview of the System

The system pipeline consists of the following stages:

Depth Map Acquisition: Collects 3D depth data in bathroom environments using depth sensors.
Point Cloud Generation: Converts each depth map into a 3D point cloud of the scene.
3D Skeleton Estimation (SPiKE Model): Extracts the skeletal structure of humans from the point clouds.
Spatio-Temporal Graph Construction: Builds graphs where nodes represent body joints, and edges encode spatial and temporal relationships.
Action Classification (RGCN Model): Predicts human actions using the relational information in the spatio-temporal graph.

This system aims to provide real-time assistance while preserving user privacy by avoiding the capture of detailed visual information.
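
The point cloud generation stage can be illustrated with a standard pinhole back-projection. This is a minimal sketch, not the repository's code: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and the values in the example are made up.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W) into an (N, 3) point cloud.

    fx, fy, cx, cy are pinhole intrinsics of the depth sensor (assumed known).
    Pixels with depth <= 0 are treated as invalid and dropped.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Hypothetical 4x4 depth map (2 m everywhere) with one invalid pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(cloud.shape)
```

Real sensors usually report depth in millimetres and ship their own calibration, so the unit conversion and intrinsics would come from the device rather than being hard-coded.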

System Architecture

Deep Learning Models

SPiKE Model for 3D Human Pose Estimation

The SPiKE model is a neural network designed to predict 3D human poses from point clouds. It performs the following:

Local Feature Extraction: Analyzes spatial features in local volumes of the point clouds.
Temporal Encoding: Uses a transformer network to model motion dynamics.
Pose Regression: Outputs 3D coordinates for 15 human body joints.

The model was fine-tuned on a custom dataset, BAD, annotated with 2D skeletons, to improve its performance in real-world settings.
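
This README does not describe how SPiKE forms its local volumes. As a generic illustration of the idea only, the sketch below groups a point cloud into local neighbourhoods using farthest point sampling followed by a radius query, a common approach in point cloud networks. The function names, the radius, and the toy points are all assumptions and are not taken from the SPiKE code.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k well-spread anchor indices from an (N, 3) point cloud."""
    n = points.shape[0]
    chosen = [0]  # start from the first point (arbitrary but deterministic)
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen anchor.
        last = points[chosen[-1]]
        dist = np.minimum(dist, np.linalg.norm(points - last, axis=1))
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)

def ball_query(points, anchors, radius):
    """For each anchor, return the indices of points within `radius` of it."""
    d = np.linalg.norm(points[None, :, :] - anchors[:, None, :], axis=-1)
    return [np.flatnonzero(row <= radius) for row in d]

# Toy cloud with two well-separated clusters.
points = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [5.1, 0, 0]])
anchor_idx = farthest_point_sampling(points, k=2)
groups = ball_query(points, points[anchor_idx], radius=0.5)
print(anchor_idx, groups)
```

Each group would then be passed to a small network that summarises the local geometry around its anchor.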

Visualizing a Skeleton

SPiKE Model Workflow

Relational Graph Convolutional Network (RGCN)

The RGCN extends graph convolutional networks to handle several edge types, here the spatial and temporal relationships between skeleton joints. Its main components are:

Graph Construction: Represents human poses as graphs, with joints as nodes and spatial-temporal connections as edges.
Relational Convolutions: Uses distinct convolutional operations for different types of edges, such as spatial or temporal.
Action Classification: Processes graph data to identify one of eight actions, including walking, sitting, and washing hands.
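
The relational convolution described above can be sketched in plain NumPy. This follows the general R-GCN formulation (one weight matrix per edge type plus a self-connection, with mean aggregation over neighbours). It is not the ST-RGCN code from this repository, and the function name `rgcn_layer` and the toy weights are assumptions.

```python
import numpy as np

def rgcn_layer(h, adjacency_by_relation, weights_by_relation, w_self):
    """One relational graph convolution layer with ReLU.

    h: (N, F_in) node features.
    adjacency_by_relation: dict relation -> (N, N) 0/1 matrix,
        where A[i, j] = 1 means node j sends a message to node i.
    weights_by_relation: dict relation -> (F_in, F_out) weight matrix.
    w_self: (F_in, F_out) weight for the node's own features.
    """
    out = h @ w_self
    for rel, adj in adjacency_by_relation.items():
        # Mean over neighbours of this relation; isolated nodes get zero.
        deg = adj.sum(axis=1, keepdims=True)
        norm_adj = adj / np.maximum(deg, 1)
        out = out + norm_adj @ h @ weights_by_relation[rel]
    return np.maximum(out, 0.0)

# Toy graph: nodes 0 and 1 are joints in frame 0, node 2 is joint 0 in frame 1.
h = np.array([[1.0], [2.0], [3.0]])
adjacency = {
    "spatial": np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float),
    "temporal": np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float),
}
weights = {"spatial": np.array([[10.0]]), "temporal": np.array([[100.0]])}
out = rgcn_layer(h, adjacency, weights, w_self=np.array([[1.0]]))
print(out.ravel())
```

Because spatial and temporal edges use different weights, the layer can treat "neighbouring joint in the same frame" and "same joint in the next frame" as distinct kinds of evidence.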

What is a Spatio-Temporal Graph?
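
In this pipeline, nodes represent body joints and edges encode spatial and temporal relationships. A minimal sketch of one way to build such a graph follows. It assumes one node per joint per frame, spatial edges along bones within a frame, and temporal edges linking the same joint in consecutive frames. The function name, the node numbering, and the 3-joint skeleton are hypothetical.

```python
import numpy as np

def build_st_graph(num_frames, num_joints, bones):
    """Return (edge_index, edge_type) for a skeleton sequence.

    The node id for joint j at frame t is t * num_joints + j.
    edge_type 0 = spatial (bone within a frame),
    edge_type 1 = temporal (same joint in consecutive frames).
    Every edge is stored in both directions.
    """
    edges, types = [], []
    for t in range(num_frames):
        offset = t * num_joints
        for a, b in bones:
            edges += [(offset + a, offset + b), (offset + b, offset + a)]
            types += [0, 0]
        if t + 1 < num_frames:
            for j in range(num_joints):
                cur, nxt = offset + j, offset + num_joints + j
                edges += [(cur, nxt), (nxt, cur)]
                types += [1, 1]
    return np.array(edges).T, np.array(types)

# Hypothetical 3-joint chain observed over 2 frames.
bones = [(0, 1), (1, 2)]
edge_index, edge_type = build_st_graph(num_frames=2, num_joints=3, bones=bones)
print(edge_index.shape, np.bincount(edge_type))
```

The `(2, E)` edge list plus a per-edge type vector is the layout that most graph learning libraries expect for relational layers.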

RGCN Architecture

Dataset

The system is trained and tested on the BAD dataset, a custom dataset of depth maps recorded in bathroom settings. This dataset includes:

Depth Maps: Per-pixel distance measurements of the environment captured by depth sensors.
Annotations: 2D skeletons manually labeled for training.
Actions: Eight human actions, such as sitting, standing, and washing hands.

Dataset Example

Results

SPiKE Model Results

Quantitative: High mean Average Precision (mAP) and Percentage of Correct Keypoints (PCK) across key joints.
Qualitative: Strong alignment of predicted skeletons with ground truth.
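
For reference, PCK counts a predicted joint as correct when it lies within a distance threshold of the ground truth. The sketch below shows that computation. The threshold and the toy arrays are assumptions, since this README does not state the evaluation settings.

```python
import numpy as np

def pck(pred, gt, threshold):
    """Percentage of Correct Keypoints.

    pred, gt: arrays of shape (num_samples, num_joints, D).
    A joint is correct when its Euclidean error is <= threshold.
    Returns (overall_pck, per_joint_pck).
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    correct = err <= threshold
    return correct.mean(), correct.mean(axis=0)

# Toy example: 2 samples, 3 joints, errors placed along the x axis.
gt = np.zeros((2, 3, 3))
pred = np.zeros((2, 3, 3))
pred[0, :, 0] = [0.05, 0.20, 0.00]
pred[1, :, 0] = [0.05, 0.05, 0.30]
overall, per_joint = pck(pred, gt, threshold=0.1)
print(overall, per_joint)
```

Reporting the per-joint values alongside the overall score shows which joints are hardest to localize.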

RGCN Model Results

Quantitative: Consistently decreasing training/testing losses and increasing accuracies.
Qualitative: Accurate classification of human actions in testing scenarios.

Key Features

Privacy-Preserving: Relies on depth sensors rather than detailed visual imagery, protecting individuals' dignity and anonymity.
Real-Time Processing: Designed for real-time human action recognition in practical scenarios.
Custom Dataset Support: Fine-tuned and tested on the BAD dataset for accurate performance in bathroom environments.
Extendable Framework: Can incorporate additional actions or adapt to other domains by modifying the graph construction and model training pipelines.

Getting Started

Set Up Environment: Install the dependencies with:
```
pip install -r requirements.txt
```
Prepare Dataset: Organize depth maps and skeleton annotations in the required format.
Train Models: Use the provided training scripts to fine-tune SPiKE and train the RGCN model.
Inference: Run the system on live or pre-recorded depth maps to classify human actions.

Future Improvements

Complete Dataset Annotation: The entire dataset should be comprehensively annotated to provide a richer and more diverse set of training examples, improving the model’s ability to generalize across various actions and scenarios.
Incorporation of Edge Features in Spatio-Temporal Graphs: The spatio-temporal graph can be enriched by adding edge features, such as the lengths of the edges (i.e., distances between joints). This additional information could help improve action classification accuracy.
Potential Integration of Symbolic Reasoning: Incorporate symbolic reasoning to create a Neurosymbolic AI system, enhancing the model’s ability to understand contextual information and make more informed decisions.
Answer Set Programming (ASP) for Safety Rules: Use ASP to model safety rules and reason about action sequences, improving real-time decision-making, safety, and patient autonomy.
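
The edge-feature idea from the list above (using the distance between connected joints as an edge attribute) could look like the following sketch. The function name and the toy skeleton are hypothetical, and attaching these lengths to the graph is left to the chosen graph library.

```python
import numpy as np

def edge_lengths(joints, edges):
    """Euclidean length of each edge.

    joints: (N, 3) array of 3D joint coordinates.
    edges: (E, 2) array of joint index pairs.
    Returns an (E,) array of distances between the connected joints.
    """
    joints = np.asarray(joints, dtype=float)
    edges = np.asarray(edges)
    return np.linalg.norm(joints[edges[:, 0]] - joints[edges[:, 1]], axis=1)

# Toy skeleton with three joints and two bones.
joints = [[0, 0, 0], [0, 3, 4], [0, 3, 5]]
edges = [(0, 1), (1, 2)]
lengths = edge_lengths(joints, edges)
print(lengths)
```

For temporal edges the same function gives the displacement of a joint between two frames, which acts as a simple speed cue.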
