The problem is to classify the action shown in a trimmed video clip. The Kinetics-700 dataset contains roughly 650,000 clips covering 700 action classes, with at least 600 clips per class. The dataset is distributed as YouTube URLs with timestamps that mark clips of about 10 seconds each; the clips vary in resolution and frame rate.
The repository is organized into notebooks, each covering a specific task. The data is first cleaned, and images (frames) are then extracted from the trimmed video clips. The resulting data is provided as input to the model, which predicts the action.
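
Because the clips differ in frame rate, a fixed "every Nth frame" rule would yield a different number of images per clip. The sketch below shows one possible way to pick a fixed number of evenly spaced frame indices per clip. It is an illustration only, not the method used in the notebooks: the function name `sample_frame_indices`, the default of 16 frames, and the centre-of-segment sampling rule are all assumptions.

```python
from typing import List


def sample_frame_indices(fps: float, duration_s: float = 10.0,
                         num_frames: int = 16) -> List[int]:
    """Return evenly spaced frame indices for one clip.

    Hypothetical helper: the clip is split into `num_frames` equal
    segments and the frame at the centre of each segment is chosen.
    """
    # Total number of frames in the clip at this frame rate.
    total = int(round(fps * duration_s))
    if total <= 0 or num_frames <= 0:
        return []
    # Short clips: return every available frame rather than repeating any.
    if num_frames >= total:
        return list(range(total))
    step = total / num_frames
    # Centre of each segment, truncated to an integer frame index.
    return [int(step * i + step / 2) for i in range(num_frames)]


if __name__ == "__main__":
    # Clips with different frame rates yield the same number of frames.
    print(sample_frame_indices(30.0, 10.0, 4))
    print(sample_frame_indices(25.0, 10.0, 4))
```

The returned indices could then be passed to whatever video reader the extraction notebook uses, so that every clip contributes the same number of images regardless of its frame rate.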