Human Activity Detection with TensorFlow and Python

A simple baseline object detection model (Faster-RCNN with a ResNet101 backbone) that detects basic human activities such as walking, running, and sitting in images and video. The model is pre-trained on the Google AVA Actions dataset, which contains bounding box annotations for 60 basic human actions such as sit, stand, walk, and run. The full list can be found in the label file. Check out the blog post to learn more.

Installation

Clone the repository, download the pre-trained frozen inference graph, and install the dependencies using the commands below.

git clone https://github.com/visiongeeklabs/human-activity-detection.git
cd human-activity-detection
wget https://github.com/visiongeeklabs/human-activity-detection/releases/download/v0.1.0/frozen_inference_graph.pb
pip install -r requirements.txt
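
Before running inference, it can help to confirm that the clone and the model download both succeeded. The following is a minimal sketch of such a check, not part of the repository. The file names come from the commands in this README; that the scripts expect the model in the repository root is an assumption, and `missing_setup_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the commands in this README. That the scripts look
# for the model in the repository root is an assumption.
EXPECTED_FILES = (
    "frozen_inference_graph.pb",
    "requirements.txt",
    "detect_activity_image.py",
    "detect_activity_video.py",
)


def missing_setup_files(repo_dir="."):
    """Return the expected files that are absent or empty in repo_dir."""
    root = Path(repo_dir)
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        # An interrupted download can leave a zero-byte file behind,
        # so an empty file is reported as missing too.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_setup_files()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Setup looks complete.")
```

Run it from inside the `human-activity-detection` directory. An empty result only means the files are present, not that the model file is intact.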

Running inference on image

Run inference on an image using the command below.

python detect_activity_image.py /path/to/input/image

# For example
python detect_activity_image.py sample_inputs/input_image1.webp
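
The script takes one image path per invocation. To process a whole directory, one option is a small wrapper that calls the script once per image. The sketch below is an illustration under stated assumptions: `build_commands` and `run_all` are hypothetical helpers, the set of image extensions is a guess, and nothing is assumed about whether the script displays or saves its output.

```python
import subprocess
import sys
from pathlib import Path

# Extensions treated as images. This set is an assumption, not from the repo.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_commands(image_dir, script="detect_activity_image.py"):
    """Return one argv list per image file in image_dir, sorted by path."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # sys.executable keeps the same interpreter (and virtualenv) as the caller.
    return [[sys.executable, script, str(p)] for p in images]


def run_all(image_dir):
    """Run the script once per image, stopping at the first failure."""
    for cmd in build_commands(image_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_all("sample_inputs")` from the repository root would skip `input_video.mp4` because of the extension filter. Each invocation starts a new Python process and so loads the model again, which makes this approach convenient rather than fast.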

Running inference on video

Run inference on a video using the command below.

python detect_activity_video.py /path/to/input/video

# For example
python detect_activity_video.py sample_inputs/input_video.mp4
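
Video inference can take much longer than the clip itself. As a rough guide, the per-frame timings quoted in the Limitations section can be turned into an estimate. The sketch below assumes 1280x720 frames, assumes every frame is run through the model, and ignores decoding and drawing overhead. The `stride` parameter is hypothetical; this README does not say whether the script supports skipping frames.

```python
import math

# Per-frame latencies quoted in the Limitations section for a 1280x720 frame.
GPU_SECONDS_PER_FRAME = 0.110  # Nvidia T4
CPU_SECONDS_PER_FRAME = 3.5    # Intel Core i5


def estimate_processing_seconds(duration_s, video_fps, seconds_per_frame, stride=1):
    """Estimate inference time when every `stride`-th frame is processed."""
    total_frames = int(duration_s * video_fps)
    processed = -(-total_frames // stride)  # ceiling division
    return processed * seconds_per_frame


def realtime_stride(video_fps, seconds_per_frame):
    """Smallest stride at which inference keeps pace with playback."""
    return max(1, math.ceil(video_fps * seconds_per_frame))


if __name__ == "__main__":
    for name, spf in [("T4 GPU", GPU_SECONDS_PER_FRAME), ("i5 CPU", CPU_SECONDS_PER_FRAME)]:
        seconds = estimate_processing_seconds(60, 30, spf)
        print(f"{name}: ~{seconds:.0f} s for a 60 s clip at 30 fps, "
              f"real-time stride {realtime_stride(30, spf)}")
```

Under these assumptions, a 60-second clip at 30 fps has 1800 frames, which works out to roughly 198 seconds on the T4 and roughly 6300 seconds (1 hour 45 minutes) on the CPU.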

Limitations

Keep the following known limitations in mind when using this model.

  • It is an object detection model that processes one frame at a time and has no memory of previous frames. Recognizing complex actions often requires knowing what happened in earlier frames, which this model cannot use.
  • Faster-RCNN with a ResNet101 backbone is a heavy model, so a reasonably powerful GPU is recommended. For example, a single 1280x720 frame takes around 110 ms on average on an Nvidia T4 GPU (15 GB RAM), versus around 3.5 seconds on an Intel Core i5 CPU (1.8 GHz, 8 GB RAM).
  • A person may perform several activities at once, such as watching another person while standing. The model produces a separate bounding box for each activity of the same person, which can clutter the output image (this is why a few classes are omitted from processing).
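
One possible way to reduce the clutter from multiple boxes per person is to merge detections whose boxes overlap heavily and show their labels together. The sketch below is a post-processing idea, not what the repository does. The `(box, label, score)` tuple format, the helper names, and the 0.7 IoU threshold are all assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_by_person(detections, iou_threshold=0.7):
    """Greedily merge overlapping detections into one entry per person.

    detections: iterable of (box, label, score), box = (x1, y1, x2, y2).
    Returns a list of {"box": box, "labels": [(label, score), ...]}.
    """
    groups = []
    # Highest-scoring detections go first, so each group keeps the box
    # of its most confident activity.
    for box, label, score in sorted(detections, key=lambda d: d[2], reverse=True):
        for group in groups:
            if iou(group["box"], box) >= iou_threshold:
                group["labels"].append((label, score))
                break
        else:
            groups.append({"box": box, "labels": [(label, score)]})
    return groups
```

With this grouping, a person detected as both standing and watching someone would be drawn with a single box and two labels. The threshold would need tuning, since people standing close together can also produce heavily overlapping boxes.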

Support on Patreon

If you are getting value out of this work, please consider supporting it on Patreon to unlock exclusive perks such as:

  • Downloadable PDFs
  • Ready-to-run Google Colab notebooks (with all the dependencies pre-configured)
  • Early access to blog posts and video tutorials
  • Hands-on live coding sessions and Q&A
  • Access to an exclusive Discord server