This project develops a Convolutional Neural Network (CNN) that detects human-object interactions in images. The model is trained on a curated dataset of frames extracted from videos that capture scenes with and without interactions. Its architecture combines convolutional, batch normalization, and dense layers, and is designed with both accuracy and interpretability in mind. Extensive data augmentation is applied during training to help the model generalize across diverse real-world scenarios.
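As an illustration of the kind of data augmentation mentioned above, here is a minimal sketch using NumPy. It is not taken from the notebook: the function name `augment_frame`, the choice of transforms (random horizontal flip and brightness scaling), and the ranges used are all assumptions made for this example. Both transforms leave the interaction/no-interaction label of a frame unchanged.

```python
import numpy as np


def augment_frame(frame, rng):
    """Return a randomly augmented copy of an H x W x C uint8 frame.

    Hypothetical helper for illustration; the transforms and their
    ranges are assumptions, not the project's actual settings.
    """
    out = frame.astype(np.float32)

    # Mirror the frame left-to-right with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Scale brightness by a random factor in [0.8, 1.2).
    out = out * rng.uniform(0.8, 1.2)

    # Clamp back to the valid pixel range and restore the dtype.
    return np.clip(out, 0, 255).astype(np.uint8)


# Usage: a random array stands in for a frame extracted from a video.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = augment_frame(frame, rng)
```

Applying a fresh random augmentation each time a frame is drawn means the network rarely sees exactly the same input twice, which is the usual motivation for augmenting a small, video-derived dataset.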
For detailed implementation and results, please refer to the HOID.ipynb Jupyter notebook.
Demonstration of the HOID model in action.