torch-lrcn

torch-lrcn provides a framework in Torch7 for action recognition using Long-term Recurrent Convolutional Networks (LRCN). The LRCN model was proposed by Jeff Donahue et al. in the CVPR 2015 paper "Long-term recurrent convolutional networks for visual recognition and description". Find more information about their Caffe code and experiments here.

Note that currently this library does not support fine-grained action detection (i.e. a specific label for each frame). The detection accuracy it computes is simply the frame accuracy using only a single label for each video.

Installation

System setup

You need ffmpeg accessible via command line. Find installation guides here.
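Before going further, it can help to confirm that ffmpeg really is reachable from the shell. The helper below is a minimal sketch; `require_cmd` is a hypothetical name used only for illustration and is not part of this repository.

```shell
# Hypothetical helper: succeed if a command is on PATH, fail with a hint otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found $1 at $(command -v "$1")"
    else
        echo "missing $1: install it and make sure it is on PATH" >&2
        return 1
    fi
}
```

Running `require_cmd ffmpeg` prints the resolved path when ffmpeg is installed and returns a non-zero status otherwise, so it can guard a longer setup script.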

Lua setup

All code is written in Lua using Torch; you can find installation instructions here. You'll need the following Lua packages:

  • torch
  • nn
  • optim
  • image
  • ffmpeg

After installing Torch, you can install / update these packages by running the following:

# Install using Luarocks
luarocks install torch
luarocks install nn
luarocks install optim
luarocks install image
luarocks install ffmpeg

We also need @jcjohnson's LSTM module, which is already included in this repository.

CUDA support

Because training takes a while, you will want to use CUDA to get results in a reasonable amount of time. To enable GPU acceleration with CUDA, you'll first need to install CUDA 6.5 or higher. Find CUDA installations here.

Then install the following Lua packages for CUDA:

  • cutorch
  • cunn

You can install / update the Lua packages by running:

luarocks install cutorch
luarocks install cunn
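After installing, a quick way to confirm that the packages load is to `require` each one from the `th` interpreter. This is a sketch under two assumptions: that each rock is loaded under the same name it is installed with, and that `th -e` is available to run a one-line statement. `check_lua_packages` is a hypothetical helper, not something shipped with this repository.

```shell
# Hypothetical helper: try to load each named Lua package with th.
# The Lua statement sets the exit status explicitly, so the check does not
# depend on how th itself reports errors.
check_lua_packages() {
    missing=0
    for pkg in "$@"; do
        if th -e "local ok = pcall(require, '$pkg'); os.exit(ok and 0 or 1)" \
                >/dev/null 2>&1; then
            echo "ok       $pkg"
        else
            echo "MISSING  $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

For a CPU-only setup, `check_lua_packages torch nn optim image ffmpeg` covers the packages installed above; append `cutorch cunn` to also check the CUDA packages.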

Usage

Training and testing require text files that list the videos in each split. The scripts assume that these files are valid (the format is detailed below) and that all videos have the same native resolution.
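Since the scripts assume a single native resolution, it may be worth checking that assumption before training. The sketch below reads a list file in the format described in Step 1 and asks `ffprobe` (installed alongside ffmpeg) for each video's width and height. The function names are hypothetical, and the loop assumes paths contain no spaces.

```shell
# Print WIDTHxHEIGHT for the first video stream of one file.
video_resolution() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height -of csv=s=x:p=0 "$1" </dev/null
}

# Hypothetical helper: succeed only if every video in the list file
# reports the same resolution.
check_same_resolution() {
    list_file=$1
    distinct=$(
        while read -r video _label || [ -n "$video" ]; do
            [ -n "$video" ] || continue        # skip blank lines
            video_resolution "$video"
        done < "$list_file" | sort -u
    )
    count=$(printf '%s\n' "$distinct" | grep -c . || true)
    if [ "$count" -eq 1 ]; then
        echo "all videos are $distinct"
    else
        echo "expected one resolution, found $count:" >&2
        printf '%s\n' "$distinct" >&2
        return 1
    fi
}
```

Calling `check_same_resolution train.txt` either prints the shared resolution, which is what `-videoWidth` and `-videoHeight` should match, or lists the distinct resolutions it found.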

Step 1: Ready the data

The training step requires a text file for each of the training, validation, and testing splits. The structure of these text files is identical.

Example line: <path to video> <label>

Example file:

/path/to/video1.avi 1
/path/to/video2.avi 4
...
/path/to/video10.avi 3
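A malformed list file is easier to catch before training starts than partway through. The following sketch checks that every non-blank line has exactly two whitespace-separated fields and that the label is an integer between 1 and the number of classes. The 1-based label range is an assumption based on the example above and on Lua's 1-based indexing; `validate_list` is a hypothetical helper.

```shell
# Hypothetical helper: validate_list <list file> <number of classes>
# Prints one message per bad line and returns non-zero if any were found.
validate_list() {
    list_file=$1
    num_classes=$2
    awk -v n="$num_classes" '
        NF == 0 { next }                       # ignore blank lines
        NF != 2 {
            printf "line %d: expected <path> <label>\n", NR
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ || $2 < 1 || $2 > n {
            printf "line %d: label %s is not an integer in 1..%d\n", NR, $2, n
            bad = 1
        }
        END { exit bad + 0 }
    ' "$list_file"
}
```

For example, `validate_list train.txt 101` is silent and returns 0 when the file is consistent with `-numClasses 101`.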

Step 2: Train the model

With the text files ready, we can begin training using train.lua. This will take quite some time because it trains both a CNN and an LSTM.

You can run the training script, at minimum, like this:

th train.lua -trainList train.txt -valList val.txt -testList test.txt -numClasses 101 -videoHeight 240 -videoWidth 320

By default, this converts each video to 5 FPS and dumps 8 random frames at native resolution, each representing one of 8 semi-equally sized chunks of the video. It then trains for 30 epochs and saves checkpoints with names like checkpoints/checkpoint_3.t7. Training runs with CUDA by default; pass -cuda 0 to run on the CPU. The default values are tuned to fit on an NVIDIA GPU with 4 GB of VRAM.
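The README does not spell out how the chunks are formed, so the following is only one plausible reading of the default sampling: split the frames of the converted video into `seqLength` contiguous chunks whose sizes differ by at most one frame, then draw one random frame from each. Both function names are hypothetical, and the sketch assumes the video has at least as many frames as chunks.

```shell
# chunk_bounds <total frames> <number of chunks>
# Prints "first last" (1-based, inclusive) for each chunk.
chunk_bounds() {
    awk -v total="$1" -v seq="$2" 'BEGIN {
        for (i = 0; i < seq; i++) {
            first = int(i * total / seq) + 1
            last  = int((i + 1) * total / seq)
            print first, last
        }
    }'
}

# sample_frames <total frames> <number of chunks> [seed]
# Prints one randomly chosen frame index per chunk.
sample_frames() {
    chunk_bounds "$1" "$2" | awk -v seed="${3:-0}" '
        BEGIN { srand(seed) }
        { print $1 + int(rand() * ($2 - $1 + 1)) }'
}
```

With a 40-frame video (8 seconds at 5 FPS) and 8 chunks, `chunk_bounds 40 8` yields 1-5, 6-10, and so on up to 36-40; with 42 frames some chunks hold 6 frames instead of 5, which is what "semi-equally sized" would mean under this reading.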

Some important training parameters to tune are:

  • -scaledHeight: optional downscaling
  • -scaledWidth: optional downscaling
  • -desiredFPS: FPS rate to convert videos to
  • -seqLength: number of frames for each video
  • -batchSize: number of videos per batch
  • -numEpochs: number of epochs to train for
  • -learningRate: learning rate
  • -lrDecayFactor: multiplier for the learning rate decay
  • -lrDecayEvery: decay the learning rate after every n epochs
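To see how `-learningRate`, `-lrDecayFactor`, and `-lrDecayEvery` interact, the sketch below computes the learning rate at a given epoch under a simple step-decay rule: multiply by the decay factor once after every `lrDecayEvery` completed epochs. Whether train.lua applies the decay at exactly these epoch boundaries is an assumption, and the numbers in the example are illustrative rather than the script's defaults.

```shell
# lr_at_epoch <base lr> <decay factor> <decay every n epochs> <epoch, 1-based>
# Assumed rule: lr = base * factor ^ floor((epoch - 1) / n)
lr_at_epoch() {
    awk -v lr="$1" -v factor="$2" -v every="$3" -v epoch="$4" 'BEGIN {
        decays = int((epoch - 1) / every)   # number of decays applied so far
        printf "%g\n", lr * factor ^ decays
    }'
}
```

Under this rule, a base rate of 0.001 with factor 0.5 and decay every 5 epochs stays at 0.001 for epochs 1 to 5, drops to 0.0005 for epochs 6 to 10, and reaches 0.00025 at epoch 11.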

An example of a more specific run:

th train.lua -trainList train.txt -valList val.txt -testList test.txt -numClasses 101 -videoHeight 240 -videoWidth 320 -scaledHeight 224 -scaledWidth 224 -seqLength 16 -batchSize 4 -numEpochs 15

Step 3: Test the model

After training, you can compute the model's action recognition and detection accuracies by running test.lua:

th test.lua -checkpoint checkpoints/checkpoint_final.t7

By default, this will load the trained checkpoint checkpoints/checkpoint_final.t7 from the training step and then compute the action detection and recognition accuracies for the test split. This also runs with CUDA by default. Run on CPU with -cuda 0.

The list of parameters is:

  • -checkpoint: path to a checkpoint file (default: '')
  • -split: name of split to test on (default: 'test')
  • -cuda: run with CUDA (default: 1)
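The two accuracies differ only in the unit being scored. As noted at the top of this README, detection accuracy here is frame accuracy against the single label of each video, while recognition accuracy is scored once per video. The sketch below illustrates the distinction on a hypothetical text format, one line per video: `<video id> <label> <prediction for frame 1> ... <prediction for frame T>`. test.lua is not known to emit this format, and the per-video decision uses a majority vote over frames as a stand-in for whatever aggregation the library actually performs.

```shell
# Hypothetical helper: reads per-frame predictions on stdin and prints
# recognition (per-video) and detection (per-frame) accuracy.
accuracies() {
    awk '
    {
        label = $2
        split("", votes)                  # reset vote counts for this video
        best = ""; best_n = 0
        for (i = 3; i <= NF; i++) {
            frames++
            if ($i == label) frame_hits++
            votes[$i]++
            # keep the first class to reach the highest count (ties favor it)
            if (votes[$i] > best_n) { best_n = votes[$i]; best = $i }
        }
        videos++
        if (best == label) video_hits++
    }
    END {
        if (videos == 0 || frames == 0) exit 1
        printf "recognition %.3f\n", video_hits / videos
        printf "detection %.3f\n", frame_hits / frames
    }'
}
```

For instance, a video labeled 3 with frame predictions 2 2 3 3 contributes two correct frames to detection accuracy but counts as a miss for recognition, because class 2 reaches the top vote count first.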

Acknowledgments

  • J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
  • Justin Johnson for his torch-rnn library, which this library was heavily modeled after.
  • Serena Yeung for the project idea, direction, and advice.
  • Stanford University CS 231N course staff for granting funds for AWS EC2 testing.

TODOs

  • Separate data preprocessing into its own step.
  • Parallelize data loading.
  • Write more documentation in a doc folder about training flags.
  • Implement fine-grained action detection.
  • Add unit tests.