This project detects and tracks vehicles using traditional computer vision and machine learning techniques. These include performing Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of vehicle/non-vehicle images and training a Linear Support Vector Machine (SVM) classifier. The algorithm extracts features from the input video stream by applying a colour transform and performing HOG, colour histogram and spatial binning. A sliding-window technique is then used to scan the road in each image, using the trained classifier to decide whether a given patch corresponds to a vehicle. False positives are filtered out and vehicle tracking is stabilised by thresholding heat maps of overlapping bounding boxes accumulated over a number of frames.
The following animation demonstrates how the final model, combined with Lane Detection, performs on the given video stream.
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
- Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
- Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
The project is written in Python and utilises NumPy, OpenCV, scikit-learn and MoviePy.
Here are the steps required to generate the model from scratch and run the project for vehicle tracking.
```sh
git clone https://github.com/sagunms/CarND-Vehicle-Detection.git
cd CarND-Vehicle-Detection
```

Follow instructions from the CarND-Term1-Starter-Kit page to set up the conda environment from scratch.

```sh
source activate carnd-term1
mkdir data
cd data
wget https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip
wget https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/non-vehicles.zip
unzip vehicles.zip
unzip non-vehicles.zip
cd ..
vim vehicle_lib/config.py
python model.py -m model.mdl
python main.py -m model.mdl -i project_video.mp4 -o annotated_project_video.mp4
```
The code is divided into several files which are imported by `model.py` and `main.py`:

- `main.py` - Takes an input video file and a trained model, and outputs an annotated video with bounding boxes highlighting the detected vehicles. It also highlights the lane the vehicle is in, due to the integration of the lane detection library.
- `model.py` - Reads vehicle and non-vehicle labelled images, extracts HOG features for both classes and splits the features into training and validation datasets. It then trains a pipeline consisting of a StandardScaler and a Linear SVM classifier and saves the trained model as output.
- `vehicle_lib/config.py` - Configuration parameters such as colour space, HOG parameters, spatial size, histogram bins, sliding window parameters, region of interest, etc.
- `vehicle_lib/vehicle_detect.py` - Main class of the project, which encapsulates sliding windows, feature generation, the SVM, and removal of duplicates and false positives.
- `vehicle_lib/feature_extract.py` - Functions related to feature extraction, such as HOG features, spatial binning and colour histograms.
- `vehicle_lib/window.py` - Functions related to sliding window traversal and predicting which patches of the video frame contain a vehicle using the trained SVM model.
- `vehicle_lib/heatmap.py` - Class for stabilising detected heat maps. Maintains a history of heat maps over multiple frames and takes the aggregate of all frames.
- `vehicle_lib/utils.py` - Utility functions to display images and features, and to traverse subdirectories to load training images.
- `vehicle_lib/debug.py` - Some plotting functions that helped during debugging.
- `lane_lib/*` - Lane Detection library from my Advanced Lane Finding project.
- `VehicleDetection.ipynb` - Jupyter notebook for generating the various stages of the project to assist this writeup. Images produced from this notebook can also be found at `output_images/*.png`.
- `model.mdl` - Trained model saved as the outcome of training the Linear SVM classifier in `model.py`. This file is then used in `main.py` to produce the annotated videos demonstrating the vehicle detection project.
- `calib.p` - Pickle file containing the intrinsic camera calibration matrix and distortion coefficients, saved as the outcome of the `CameraCalibrate` class used during initialisation of the lane detection pipeline.
- `annotated_project_video.mp4` - The output of the vehicle detection project when processing the `project_video.mp4` video stream.
- `annotated_project_video_test.mp4` - The output of the vehicle detection project when processing the `test_video.mp4` video stream.
- `annotated_project_video_combined.mp4` - The output of the vehicle detection project combined with the lane finding project, when processing the `project_video.mp4` video stream.
The first step is to extract the features used to train the classifier and then to classify the video frames. The code for this step is contained in the `extract_features` function in `feature_extract.py`. This is invoked by the `prepare_train_features` function in `model.py`, which is ultimately invoked when running `__main__` to train the model.
I started by reading in all the `vehicle` and `non-vehicle` labelled images and calling the `extract_features` function. Here is an example of some of the `vehicle` and `non-vehicle` classes:
I tried out different color spaces and different `skimage.hog()` parameters (`orientations`, `pixels_per_cell`, and `cells_per_block`). I grabbed random images from each of the two classes and displayed them to get a feel for what the `skimage.hog()` output looks like.
First I tried different parameters; however, the ones provided in the course material worked better, so I settled on those. I used the `YCrCb` color space and HOG parameters of `orientations=9`, `pix_per_cell=(8, 8)` and `cells_per_block=(2, 2)`:
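As a rough illustration, the snippet below shows how HOG features could be computed for one YCrCb-converted training image with these parameters using `skimage.feature.hog`. It is a minimal sketch rather than the project's exact `get_hog_features()` code, and the example file path is only illustrative.

```python
# Minimal HOG extraction sketch (not the project's exact code).
import cv2
import numpy as np
from skimage.feature import hog

def hog_single_channel(channel, orient=9, pix_per_cell=8, cell_per_block=2):
    """Return the flattened HOG feature vector for one image channel."""
    return hog(channel,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               feature_vector=True)

# Illustrative path: a 64x64 training image from the vehicles dataset.
image = cv2.imread('data/vehicles/GTI_Far/image0000.png')        # BGR image
ycrcb = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)                  # colour transform
hog_features = np.hstack([hog_single_channel(ycrcb[:, :, c]) for c in range(3)])
```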
After extracting features, we need to train a classifier to differentiate between portions of the frame that contain a vehicle and those that do not.
I used `sklearn.pipeline.Pipeline()` to encapsulate both the StandardScaler and a linear Support Vector Machine (SVM) into one, trained it and saved it as a model file using `sklearn.externals.joblib`. This helps separate training and prediction into the `model.py` and `main.py` files respectively.
`main.py` loads the saved model (`model.mdl`) and passes the loaded classifier pipeline to the `VehicleDetect` class.
The code for training the classifier is contained in the `extract_features` function in `feature_extract.py`. This is invoked by the `prepare_train_features` function in `model.py`, which prepares vehicle and non-vehicle features from the provided training images and splits them into training and testing datasets at a ratio of 75% to 25% respectively. Initially, I experimented with various values of the C parameter. However, I later found that the default linear SVM parameters achieved a validation accuracy of 0.9903, which was sufficient for detecting vehicles in the video stream.
The classifier pipeline parameters are as follows:
```
Pipeline(steps=[('scaling', StandardScaler(copy=True, with_mean=0, with_std=1)), ('classifier', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False))])
```
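The following is a condensed sketch of the training flow described above, assuming the feature matrix and labels have already been extracted and saved; the `features.npy`/`labels.npy` paths are hypothetical, and `model.py` differs in detail.

```python
# Condensed training sketch: scale features, fit a linear SVM, save the pipeline.
import numpy as np
import joblib  # older scikit-learn exposes this as sklearn.externals.joblib
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.load('features.npy')   # hypothetical: stacked vehicle/non-vehicle feature vectors
y = np.load('labels.npy')     # hypothetical: 1 = vehicle, 0 = non-vehicle

# 75% / 25% train/validation split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

clf = Pipeline(steps=[('scaling', StandardScaler()),
                      ('classifier', SVC(kernel='linear', C=1.0))])
clf.fit(X_train, y_train)
print('Validation accuracy:', clf.score(X_test, y_test))

joblib.dump(clf, 'model.mdl')  # later loaded in main.py with joblib.load()
```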
To locate the cars in each frame, a sliding window approach was used over a region of interest. Initially, I experimented with different window sizes, regions of interest and overlaps, hoping to get a finer resolution and higher detection accuracy. Through trial and error, I settled on a simpler single-scale window instead of varying the sizes. The parameters I used are as follows and can be found in `vehicle_lib/config.py`:
```python
xy_window = (96, 96)
xy_overlap = (.75, .75)
y_start_stop = [400, 600]
x_start_stop = [None, None]
```
The region of interest for sliding window search includes only the portion of the road, spanning from left to right.
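A minimal sketch of such a single-scale sliding-window generator is shown below; the function name and details are illustrative and do not mirror `vehicle_lib/window.py` exactly.

```python
# Illustrative single-scale sliding-window generator over a region of interest.
def slide_window(img_shape, x_start_stop=(None, None), y_start_stop=(400, 600),
                 xy_window=(96, 96), xy_overlap=(0.75, 0.75)):
    """Return a list of ((x1, y1), (x2, y2)) window corners covering the ROI."""
    x_start = 0 if x_start_stop[0] is None else x_start_stop[0]
    x_stop = img_shape[1] if x_start_stop[1] is None else x_start_stop[1]
    y_start, y_stop = y_start_stop
    x_step = int(xy_window[0] * (1 - xy_overlap[0]))   # 24 px for 75% overlap
    y_step = int(xy_window[1] * (1 - xy_overlap[1]))
    windows = []
    for y in range(y_start, y_stop - xy_window[1] + 1, y_step):
        for x in range(x_start, x_stop - xy_window[0] + 1, x_step):
            windows.append(((x, y), (x + xy_window[0], y + xy_window[1])))
    return windows

# For a 1280x720 frame this yields 96x96 patches with 75% overlap inside the ROI.
windows = slide_window((720, 1280))
```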
For this project, I searched on YCrCb 3-channel HOG features plus spatially binned color and histograms of color in the feature vector, which provided a nice result. This was mostly based on recommended values available in computer vision literature. The parameters can be found in `vehicle_lib/config.py` and are listed below; a short sketch of how they combine into one feature vector follows the list.
- Spatial Binning: `spatial_size = (32, 32)` - Function: `bin_spatial()` in `vehicle_lib/feature_extract.py`.
- Color Histograms: `hist_bins = 32` - Function: `color_hist()` in `vehicle_lib/feature_extract.py`.
- Histogram of Oriented Gradients (HOG): `pix_per_cell = 8`, `cell_per_block = 2`, `orient = 9`, `color_space = YCrCb` - Function: `get_hog_features()` in `vehicle_lib/feature_extract.py`.
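The sketch below illustrates, under the parameters above, how the three feature types could be concatenated into a single feature vector per image patch; it approximates the project's `extract_features()` rather than reproducing its exact code.

```python
# Sketch of the combined feature vector: spatial binning + colour histograms + HOG.
import cv2
import numpy as np
from skimage.feature import hog

def bin_spatial(img, size=(32, 32)):
    # Down-sample the patch and flatten it into a coarse colour/shape feature.
    return cv2.resize(img, size).ravel()

def color_hist(img, nbins=32):
    # Histogram each channel and concatenate the bin counts.
    return np.concatenate([np.histogram(img[:, :, c], bins=nbins)[0] for c in range(3)])

def single_patch_features(patch_rgb):
    """Compute the full feature vector for one 64x64 RGB patch."""
    ycrcb = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2YCrCb)
    hog_feats = np.hstack([hog(ycrcb[:, :, c], orientations=9,
                               pixels_per_cell=(8, 8), cells_per_block=(2, 2),
                               feature_vector=True) for c in range(3)])
    return np.concatenate([bin_spatial(ycrcb), color_hist(ycrcb), hog_feats])
```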
I recorded the positions of positive detections in each frame of the video. Numerous patches in the images are predicted as being a vehicle, and the raw detections therefore contain noisy false positives.
The above figure illustrates the need to filter out overlapping bounding boxes. To do this, I created a heat map from the positive detections and then thresholded that map to identify vehicle positions. I then used `scipy.ndimage.measurements.label()` to identify individual blobs in the heat map, assumed each blob corresponded to a vehicle, and constructed bounding boxes to cover the area of each detected blob.
Here's an example result showing the heatmap from a video frame.
The bounding boxes are then overlaid on the area of the detected blobs.
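A simplified sketch of this heat map and labelling step is shown below; the helper name and threshold value are illustrative, not the project's exact code.

```python
# Turn positive detection windows into a thresholded heat map and bounding boxes.
import numpy as np
from scipy.ndimage import label  # older SciPy: scipy.ndimage.measurements.label

def heat_to_boxes(frame_shape, hot_windows, threshold=2):
    heat = np.zeros(frame_shape[:2], dtype=np.float32)
    for (x1, y1), (x2, y2) in hot_windows:      # add heat inside each positive window
        heat[y1:y2, x1:x2] += 1
    heat[heat <= threshold] = 0                 # reject weak, isolated detections
    labels, n_cars = label(heat)                # one label per connected blob
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labels == car)
        boxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
    return boxes
```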
This worked well for images, but when testing with video frames the bounding boxes fluctuated between different patches in the image. In order to achieve stable tracking in the temporal dimension of vehicles that were already detected, I created a `StableHeatMaps` class in `vehicle_lib/heatmap.py` which maintains a historical sum of heat pixels (of the same size as the input frame) over 20 frames. The class includes private methods `_add_heat()`, which adds heat for all pixels that fall within a patch of positive detection by the classifier, and `_apply_threshold()`, which removes false positives by thresholding.
The `generate()` method produces an aggregate sum of the heat map over a history of 20 frames, which helps stabilise the predicted bounding boxes. I was able to eliminate all false positives, as shown in the project video, demonstrating that the method works well.
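The following sketch captures the idea behind `StableHeatMaps`, though the class name, method names and threshold value here are illustrative rather than the project's exact API.

```python
# Illustrative temporal stabilisation: keep the last 20 per-frame heat maps and
# threshold their sum so flickering single-frame detections are suppressed.
from collections import deque
import numpy as np

class HeatHistory:
    def __init__(self, n_frames=20, threshold=15):   # threshold value is illustrative
        self.history = deque(maxlen=n_frames)
        self.threshold = threshold

    def add_frame(self, heat):
        """heat: per-frame heat map (float array of the same size as the video frame)."""
        self.history.append(heat)

    def stable_heat(self):
        """Sum heat over the stored frames and zero out weak, flickering pixels."""
        total = np.sum(self.history, axis=0)
        total[total <= self.threshold] = 0
        return total
```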
The working implementation can be summarised with the following animation.
My pipeline was able to perform reasonably well on the entire project video. The working implementation, after combining it with my previous Advanced Lane Detection project, can be summarised with the following animation.
Here's a link to test video result, final project video result, and result of combined vehicle and lane detection.
This project was really exciting to work on, but it's a shame I had very little time for it. The implementation is far from perfect, but vehicle detection works quite well for the given project video. However, several things could be improved.
- One of the main drawbacks is that my detection pipeline is very slow (~4.5 s per frame) and therefore cannot be used for real-time applications. Recent deep learning and CNN techniques like YOLO seem better suited in terms of detection accuracy and real-time performance, so it would be worth evaluating these modern methods as an alternative to the traditional computer vision and machine learning techniques used in this project.
- My algorithm pipeline would probably fail in real-world scenarios. For example, it would fail on object classes different from the vehicles/non-vehicles it was trained with, such as motorbikes, cyclists and pedestrians. Perhaps if training images from the same camera were used, the classifier accuracy would be better.
- Similar to advanced line detection limitations, there can be some false positives produced in cases such as shadow regions.
- The pipeline was tested on only one video and may therefore not generalise well to other videos or driving conditions.
- My implementation sums the heat map over several historical frames and then thresholds it, which eliminates false positives and stabilises vehicle tracking quite well. However, I would be inclined to explore more robust approaches such as Kalman Filters for vehicle tracking.
- A simple method of improving processing speed would be to drop frames, or to scan at a higher frequency over high-confidence heat map regions and at a lower frequency elsewhere. Again, a Kalman filter would give better tracking at lower computational cost.
- Had there been sufficient time, I would integrate Advanced Lane Lines detection into this project. It would also be interesting to integrate an additional pipeline using Traffic Sign classification to detect road signs.