
Vehicle Detection and Tracking using traditional computer vision and machine learning techniques such as Histogram of Oriented Gradients (HOG) and Support Vector Machines (SVM).


sagunms/CarND-Vehicle-Detection

 
 


Vehicle Detection and Tracking

Udacity - Self-Driving Car NanoDegree

Overview

This project detects and tracks vehicles using traditional computer vision and machine learning techniques. A Linear Support Vector Machine (SVM) classifier is trained on Histogram of Oriented Gradients (HOG) features extracted from a labelled training set of vehicle/non-vehicle images. The algorithm extracts features from the input video stream by applying a colour transform and then performing HOG, colour histogram and spatial binning. A sliding-window technique then scans the road in each frame, using the trained classifier to decide whether each patch contains a vehicle. False positives are filtered out and vehicle tracking is stabilised by thresholding heat maps of overlapping bounding boxes accumulated over a number of frames.

The following animation demonstrates how the final model, combined with Lane Detection, performs on the given video stream.

alt text

Project goals

  • Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
  • Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
  • Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
  • Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
  • Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
  • Estimate a bounding box for vehicles detected.

Run Instructions

The project is written in Python and utilises NumPy, OpenCV, scikit-learn and MoviePy.

Here are the steps required to generate the model from scratch and run the project for vehicle tracking.

Clone my project

git clone https://github.com/sagunms/CarND-Vehicle-Detection.git
cd CarND-Vehicle-Detection

Activate conda environment

Follow the instructions on the CarND-Term1-Starter-Kit page to set up the conda environment from scratch.

source activate carnd-term1

Download training data of vehicles and non-vehicles

mkdir data
cd data
wget https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip
wget https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/non-vehicles.zip
unzip vehicles.zip
unzip non-vehicles.zip
cd ..

Configure parameters

vim vehicle_lib/config.py

Train model from downloaded training data

python model.py -m model.mdl

Run vehicle detection project (output video)

python main.py -m model.mdl -i project_video.mp4 -o annotated_project_video.mp4

Project structure

Source Code

The code is divided up into several files which are imported by model.py and main.py.

  • main.py - Takes an input video file and a trained model, and outputs an annotated video with bounding boxes highlighting the detected vehicles. It also highlights the lane the vehicle is in, thanks to the integrated lane detection library.
  • model.py - Reads vehicle and non-vehicle labelled images, extracts HOG features for both classes, and splits the features into training and validation datasets. It then trains a pipeline consisting of StandardScaler and a Linear SVM classifier and saves the trained model as output.
  • vehicle_lib/config.py - Consists of configuration parameters such as colour space, HOG parameters, spatial size, histogram bins, sliding window parameters, and region of interest, etc.
  • vehicle_lib/vehicle_detect.py - Main class of the project, which encapsulates sliding windows, feature generation, SVM prediction, removal of duplicates and false positives, etc.
  • vehicle_lib/feature_extract.py - Consists of functions related to feature extraction such as HOG features, spatial binning, colour histogram, etc.
  • vehicle_lib/window.py - Consists of functions related to sliding window traversal and predicting which patch of the video frame contains a vehicle using the trained SVM model.
  • vehicle_lib/heatmap.py - Class for stabilising detected heatmaps. Maintains a history of heat maps over multiple frames and takes the aggregate of all frames.
  • vehicle_lib/utils.py - Consists of utility functions to display images, features, traverse through subdirectories to load training images.
  • vehicle_lib/debug.py - Some plotting functions that helped during debugging.
  • lane_lib/* - Lane Detection library from my Advanced Lane Finding project.

Miscellaneous Files

  • VehicleDetection.ipynb - Jupyter notebook for generating various stages of the project to assist this writeup. Images produced from this notebook can also be found at output_images/*.png.
  • model.mdl - Trained model saved as the outcome of training the Linear SVM classifier from model.py. This file was then used in main.py to produce the annotated videos for demonstrating the working of my vehicle detection project.
  • calib.p - Pickle file containing the intrinsic camera calibration matrix and distortion coefficients, saved as the outcome of the CameraCalibrate class used during the initialisation of the lane detection pipeline.
  • annotated_project_video.mp4 - The output of the vehicle detection project when processing against project_video.mp4 video stream.
  • annotated_project_video_test.mp4 - The output of the vehicle detection project when processing against test_video.mp4 video stream.
  • annotated_project_video_combined.mp4 - The output of the vehicle detection project when combined with lane finding project, and processing against project_video.mp4 video stream.

Algorithm

Histogram of Oriented Gradients (HOG) and other features

The first step is to extract the features that are used to train the classifier and, later, to classify the video frames.

The code for this step is contained in the extract_features function in feature_extract.py. This is invoked by the prepare_train_features function in model.py, which is ultimately invoked when running __main__ to train the model.

I started by reading in all the vehicle and non-vehicle labelled images and calling extract_features function. Here is an example of some of the vehicle and non-vehicle classes:

alt text

alt text

I then explored different colour spaces and different skimage.feature.hog() parameters (orientations, pixels_per_cell, and cells_per_block). I grabbed random images from each of the two classes and displayed them to get a feel for what the HOG output looks like.

I first tried several parameter combinations; however, the values provided in the course material performed better, so I settled on those. I used the YCrCb colour space and HOG parameters of orientations=9, pixels_per_cell=(8, 8) and cells_per_block=(2, 2):

alt text alt text
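To make the idea behind HOG concrete, here is a minimal NumPy sketch of the magnitude-weighted orientation histogram for a single cell. It is a simplified illustration only: the function name cell_orientation_histogram is hypothetical, it uses hard binning, and it omits the block normalisation that a full HOG implementation such as skimage.feature.hog() performs.

```python
import numpy as np

def cell_orientation_histogram(cell, orientations=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Hypothetical helper for illustration; not the project's get_hog_features().
    """
    cell = cell.astype(np.float64)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / orientations
    # Hard-assign each pixel to one bin; clamp guards against rounding at 180.
    bins = np.minimum((angle // bin_width).astype(int), orientations - 1)
    hist = np.zeros(orientations)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# An 8x8 cell containing a vertical edge: intensity changes only along x.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
hist = cell_orientation_histogram(cell)
print(hist)
```

For a vertical edge all gradient energy falls into the first orientation bin, which is why HOG responds strongly to the outline of a car regardless of its colour.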

Classifier

After extracting features, we need to train a classifier to be able to differentiate between a portion of the frame as being a vehicle or non-vehicle.

I used sklearn.pipeline.Pipeline() to encapsulate both StandardScaler and a linear Support Vector Machine (SVM) into one object, trained it, and saved it as a model file using sklearn.externals.joblib. This helps separate training and prediction into model.py and main.py respectively.

The main.py loads the saved model (model.mdl) and passes the loaded classifier pipeline to VehicleDetect class.

The feature extraction code used for training the classifier is contained in the extract_features function in feature_extract.py. This is invoked by the prepare_train_features function in model.py, which prepares vehicle and non-vehicle features from the provided training images and splits them into training and testing datasets at a ratio of 75% to 25%. Initially, I experimented with various values of the C parameter. However, I later found that the default linear SVM parameters achieved a validation accuracy of 0.9903, which was sufficient for detecting vehicles in the video stream.

The classifier pipeline parameters are as follows:

Pipeline(steps=[('scaling', StandardScaler(copy=True, with_mean=0, with_std=1)), ('classifier', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])
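A pipeline of this shape can be sketched as follows. This is not the project's model.py: the synthetic two-cluster data stands in for real feature vectors, and the random seed and cluster parameters are assumptions chosen only to make the example self-contained. The step names and the scaler and SVC settings mirror the parameters listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for real feature vectors (assumption): two separated clusters.
rng = np.random.default_rng(0)
n_per_class, n_features = 200, 10
vehicle = rng.normal(loc=3.0, scale=1.0, size=(n_per_class, n_features))
non_vehicle = rng.normal(loc=-3.0, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([vehicle, non_vehicle])
y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Shuffled 75% / 25% split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Scaling and a linear-kernel SVM wrapped into a single estimator.
clf = Pipeline(steps=[
    ("scaling", StandardScaler(with_mean=False, with_std=True)),
    ("classifier", SVC(kernel="linear", C=1.0)),
])
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"validation accuracy: {accuracy:.4f}")
```

Because the scaler lives inside the pipeline, the same normalisation fitted on the training data is applied automatically at prediction time, so only one object needs to be saved and loaded.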

Sliding Window Search

To locate the cars in each frame, a sliding window approach was used over a region of interest. Initially, I tried different window sizes, regions of interest and overlaps, hoping to get a finer resolution and higher detection accuracy. Through trial and error, I settled on a simpler single-scale window instead of varying the sizes. The parameters I used, which can be found in vehicle_lib/config.py, are as follows.

  • xy_window = (96, 96)
  • xy_overlap = (.75, .75)
  • y_start_stop = [400, 600]
  • x_start_stop = [None, None]

The region of interest for sliding window search includes only the portion of the road, spanning from left to right.

alt text
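As a rough illustration of how these parameters translate into search windows, the sketch below enumerates window corners for a single scale. It is not the code in vehicle_lib/window.py; the function name slide_window and the 1280x720 frame size are assumptions made for the example.

```python
def slide_window(frame_size, x_start_stop=(None, None), y_start_stop=(None, None),
                 xy_window=(96, 96), xy_overlap=(0.75, 0.75)):
    """Return a list of ((x1, y1), (x2, y2)) window corners covering the region."""
    width, height = frame_size
    # None means "use the full extent of the frame" along that axis.
    x_start = 0 if x_start_stop[0] is None else x_start_stop[0]
    x_stop = width if x_start_stop[1] is None else x_start_stop[1]
    y_start = 0 if y_start_stop[0] is None else y_start_stop[0]
    y_stop = height if y_start_stop[1] is None else y_start_stop[1]

    # Step between neighbouring windows in pixels (at least 1 to avoid a zero step).
    x_step = max(1, int(xy_window[0] * (1 - xy_overlap[0])))
    y_step = max(1, int(xy_window[1] * (1 - xy_overlap[1])))

    windows = []
    # Keep only windows that fit entirely inside the region of interest.
    for y in range(y_start, y_stop - xy_window[1] + 1, y_step):
        for x in range(x_start, x_stop - xy_window[0] + 1, x_step):
            windows.append(((x, y), (x + xy_window[0], y + xy_window[1])))
    return windows

# Assumed 1280x720 frame with the region of interest listed above.
windows = slide_window((1280, 720), y_start_stop=(400, 600))
print(len(windows), windows[0], windows[-1])
```

With a 75% overlap the 96-pixel window advances 24 pixels at a time, so every patch of road is covered by several windows. That redundancy is what makes the later heat map thresholding effective.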

For this project, I searched on YCrCb 3-channel HOG features plus spatially binned color and histograms of color in the feature vector, which provided a nice result. This was mostly based on recommended values available in computer vision literature. The parameters can be found in vehicle_lib/config.py.

  1. Spatial Binning
    • spatial_size = (32, 32)
    • Function: bin_spatial() in vehicle_lib/feature_extract.py.
  2. Color Histograms
    • hist_bins = 32
    • Function: color_hist() in vehicle_lib/feature_extract.py.
  3. Histogram of Oriented Gradients (HOG)
    • pix_per_cell = 8
    • cell_per_block = 2
    • orient = 9
    • color_space = YCrCb
    • Function: get_hog_features() in vehicle_lib/feature_extract.py.
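The size of the combined feature vector follows directly from these parameters. The arithmetic below is a sketch that assumes each patch is resized to 64x64 pixels before feature extraction (an assumption, not stated above) and that HOG is computed on all three channels; the helper names are hypothetical.

```python
def hog_length(image_size, orient=9, pix_per_cell=8, cell_per_block=2, channels=3):
    """Number of HOG values for a square image, with blocks sliding one cell at a time."""
    cells = image_size // pix_per_cell
    blocks = cells - cell_per_block + 1
    return channels * blocks * blocks * cell_per_block * cell_per_block * orient

def spatial_length(spatial_size=(32, 32), channels=3):
    """Number of values from spatial binning: every pixel of the resized image."""
    return spatial_size[0] * spatial_size[1] * channels

def hist_length(hist_bins=32, channels=3):
    """Number of values from per-channel colour histograms."""
    return hist_bins * channels

patch_size = 64  # assumed patch size fed to the feature extractor
total = spatial_length() + hist_length() + hog_length(patch_size)
print(spatial_length(), hist_length(), hog_length(patch_size), total)
```

Under these assumptions the HOG part dominates the vector, which is one reason normalising the features before training matters: without scaling, the differently ranged colour and gradient features would not contribute comparably.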

Filtering False positives and Vehicle Tracking

I recorded the positions of positive detections in each frame of the video. Numerous patches in each image are predicted as being a vehicle, and these raw detections therefore contain noisy false positives.

alt text

The above figure illustrates the need to merge overlapping bounding boxes and filter out false positives. For this, I created a heatmap from the positive detections and then thresholded that map to identify vehicle positions. I then used scipy.ndimage.measurements.label() to identify individual blobs in the heatmap, assumed each blob corresponded to a vehicle, and constructed bounding boxes to cover the area of each blob detected.

Here's an example result showing the heatmap from a video frame.

alt text

The bounding boxes are then overlaid on the area of the blobs detected.

alt text
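The heatmap, threshold and labelling steps for a single frame can be sketched as below. The helper names, the example detections and the threshold of 1 are assumptions for illustration, not the project's actual values. The sketch calls scipy.ndimage.label(), which is the same function exposed from the public scipy.ndimage namespace.

```python
import numpy as np
from scipy import ndimage

def add_heat(heatmap, boxes):
    """Add 1 to every pixel inside each positively classified window."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    """Zero out pixels whose heat does not exceed the threshold."""
    out = heatmap.copy()
    out[out <= threshold] = 0
    return out

def labelled_boxes(heatmap):
    """Return one bounding box per connected blob of non-zero heat."""
    labels, _count = ndimage.label(heatmap)
    boxes = []
    for rows, cols in ndimage.find_objects(labels):
        boxes.append(((cols.start, rows.start), (cols.stop, rows.stop)))
    return boxes

# Hypothetical detections: three overlapping hits on a car, one isolated false positive.
detections = [
    ((800, 400), (896, 496)),
    ((824, 400), (920, 496)),
    ((824, 424), (920, 520)),
    ((100, 420), (196, 516)),
]
heat = add_heat(np.zeros((720, 1280), dtype=np.int32), detections)
boxes = labelled_boxes(apply_threshold(heat, threshold=1))
print(boxes)
```

The isolated detection never accumulates more than one unit of heat, so it disappears after thresholding, while the overlapping hits merge into a single box.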

This worked well for single images, but when testing on video the bounding boxes fluctuated between patches from frame to frame. To track already-detected vehicles stably over time, I created a StableHeatMaps class in vehicle_lib/heatmap.py, which maintains a historical sum of heat pixels (of the same size as the input frame) over 20 frames. The class includes the private methods _add_heat(), which adds heat to all pixels that fall within a patch the classifier marked as a positive detection, and _apply_threshold(), which removes false positives by thresholding.

The generate() method produces an aggregate sum of the heatmaps over the 20-frame history, which helps to stabilise the predicted bounding boxes. As the project video shows, this was enough to eliminate all false positives in that footage.
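The temporal aggregation can be sketched with a fixed-length queue of per-frame heatmaps. The class below is a hypothetical stand-in, not the StableHeatMaps implementation; the threshold of 10 and the example detections are assumptions, and only the 20-frame history comes from the description above.

```python
from collections import deque

import numpy as np

class HeatHistory:
    """Hypothetical sketch: sum per-frame heatmaps over a sliding window of frames."""

    def __init__(self, shape, history=20, threshold=10):
        self.shape = shape
        self.threshold = threshold
        # deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=history)

    def update(self, boxes):
        """Build the heatmap for one frame and push it into the history."""
        heat = np.zeros(self.shape, dtype=np.int32)
        for (x1, y1), (x2, y2) in boxes:
            heat[y1:y2, x1:x2] += 1
        self.frames.append(heat)

    def generate(self):
        """Return the thresholded sum of all heatmaps currently in the history."""
        if not self.frames:
            return np.zeros(self.shape, dtype=np.int32)
        total = np.sum(np.stack(list(self.frames)), axis=0)
        total[total <= self.threshold] = 0
        return total

tracker = HeatHistory(shape=(720, 1280), history=20, threshold=10)
car = ((800, 400), (896, 496))      # detected in every frame
glitch = ((100, 420), (196, 516))   # spurious detection in two frames only
for frame_index in range(25):
    boxes = [car] + ([glitch] if frame_index in (5, 6) else [])
    tracker.update(boxes)
stable = tracker.generate()
print(stable[450, 850], stable[450, 150])
```

A detection has to persist across many frames to survive the threshold, so short-lived false positives are suppressed at the cost of a short delay before a new vehicle is reported.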

Video Implementation

The working implementation can be summarised with the following animation.

alt text

My pipeline was able to perform reasonably well on the entire project video. The working implementation after combining it with my previous Advanced Lane Detection project can be summarised with the following animation.

alt text

Here's a link to test video result, final project video result, and result of combined vehicle and lane detection.

Discussion, Limitations and Improvements

This project was really exciting to work on but it's a shame I had very little time to work on it. The implementation is far from perfect, but vehicle detection works quite well for the given project video. However, several things could be improved.

  1. One of the main drawbacks is that my detection pipeline is very slow (~4.5s per frame) and therefore cannot be used for real-time applications. Recent deep learning and CNN techniques like YOLO seem better suited in terms of detection accuracy and real-time performance. It would therefore be worth evaluating these modern methods as an alternative to the traditional computer vision and machine learning techniques used in this project.
  2. My algorithm pipeline would probably fail in real-world scenarios. For example, it would fail for objects that differ from the vehicle/non-vehicle images it was trained on, such as motorbikes, cyclists and pedestrians. Perhaps if training images from the same camera were used, the classifier accuracy would be better.
  3. Similar to the limitations of Advanced Lane Detection, some false positives can be produced in cases such as shadow regions.
  4. This was tested on only one video, so its behaviour on other footage is unverified.
  5. My implementation sums heat maps over several historical frames and then thresholds them, which eliminates false positives and stabilises vehicle tracking quite well. However, I would be inclined to explore more robust approaches such as Kalman filters for vehicle tracking.
  6. A simple way to improve processing speed would be to drop frames, or to scan at a high frequency over high-confidence heatmap regions and at a lower frequency elsewhere. Again, a Kalman filter would give better tracking with less computation.
  7. Given more time, I would integrate Advanced Lane Lines detection into this project. It would also be interesting to integrate an additional pipeline using Traffic Sign classification to detect road signs.
