Semantic Segmentation

Udacity - Self-Driving Car NanoDegree

Operation

main.py supports three modes of operation: 1) run training, 2) process images with an inference model, 3) process a video with an inference model.

To run training for 15 epochs with a batch size of 1 and a learning rate of 0.000015, saving the results in model1.meta:

python main.py -md=0 -ep=15 -bs=1 -lr=0.000015 -mod='model1'

To generate the inference samples from the KITTI Road dataset using model1.meta:

python main.py -md=1 -mod='model1'

To run inference on a video using model1.meta:

python main.py -md=2 -mod='model1'
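A minimal sketch of how these flags could be parsed with argparse is shown below; the defaults and help strings are illustrative assumptions, not the exact code in main.py.

```python
import argparse

# Illustrative sketch of the command-line flags used above; the actual
# parsing code in main.py may differ in defaults and help text.
parser = argparse.ArgumentParser(description='FCN-8 semantic segmentation')
parser.add_argument('-md', '--mode', type=int, default=0,
                    help='0 = train, 1 = image inference, 2 = video inference')
parser.add_argument('-ep', '--epochs', type=int, default=15)
parser.add_argument('-bs', '--batch_size', type=int, default=1)
parser.add_argument('-lr', '--learning_rate', type=float, default=0.000015)
parser.add_argument('-mod', '--model', default='model1',
                    help='base name of the saved/restored checkpoint (model1.meta)')
args = parser.parse_args()
```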

Development

In this application, labelled training images taken from the Cityscapes dataset were used to train a fully convolutional model using the FCN-8 architecture. FCN-8 uses the VGG16 encoder, which has been pretrained on ImageNet for classification. A fully convolutional decoder is added which combines pool layers 3 and 4 and fully connected layer 7 from the encoder to enable pixel-level classification of images.
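The decoder described above can be sketched in TensorFlow 1.x roughly as follows; the kernel sizes, l2 regularizer, and names are assumptions for illustration rather than the exact code in main.py.

```python
import tensorflow as tf

def fcn8_decoder(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes=2):
    """Sketch of an FCN-8 decoder head over the pretrained VGG16 encoder."""
    reg = tf.contrib.layers.l2_regularizer(1e-3)

    # 1x1 convolutions reduce each encoder tap to num_classes channels
    l7 = tf.layers.conv2d(vgg_layer7_out, num_classes, 1, padding='same',
                          kernel_regularizer=reg)
    l4 = tf.layers.conv2d(vgg_layer4_out, num_classes, 1, padding='same',
                          kernel_regularizer=reg)
    l3 = tf.layers.conv2d(vgg_layer3_out, num_classes, 1, padding='same',
                          kernel_regularizer=reg)

    # Upsample layer 7 by 2x and fuse with the pool 4 skip connection
    up1 = tf.layers.conv2d_transpose(l7, num_classes, 4, strides=2,
                                     padding='same', kernel_regularizer=reg)
    fuse1 = tf.add(up1, l4)

    # Upsample by 2x again and fuse with the pool 3 skip connection
    up2 = tf.layers.conv2d_transpose(fuse1, num_classes, 4, strides=2,
                                     padding='same', kernel_regularizer=reg)
    fuse2 = tf.add(up2, l3)

    # Final 8x upsample back to input resolution gives per-pixel logits
    return tf.layers.conv2d_transpose(fuse2, num_classes, 16, strides=8,
                                      padding='same', kernel_regularizer=reg)
```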

In each epoch, 500 images are randomly selected for training from the dataset of 2,876, and a different set of 500 images is used to calculate Intersection over Union. adjust_cityscapes.py was used to downscale and crop the source images to the model size of 576 pixels wide by 160 high. The labelled images were also processed to match those provided with the KITTI dataset in this assignment: all pixels that did not match the purple color (RGB = 128, 64, 128) assigned to roads by Cityscapes were set to red (RGB = 255, 0, 0).
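The relabelling step can be sketched with NumPy and Pillow; the function name is hypothetical, and a plain nearest-neighbour resize stands in for the downscale-and-crop logic in adjust_cityscapes.py.

```python
import numpy as np
from PIL import Image

ROAD = np.array([128, 64, 128], dtype=np.uint8)  # Cityscapes road color
RED = np.array([255, 0, 0], dtype=np.uint8)      # KITTI-style "not road"

def relabel_to_kitti(label_path, out_path, size=(576, 160)):
    # Nearest-neighbour resampling keeps the label colors exact
    img = Image.open(label_path).convert('RGB').resize(size, Image.NEAREST)
    pixels = np.array(img)
    # Everything that is not the Cityscapes road color becomes red
    road_mask = np.all(pixels == ROAD, axis=-1)
    pixels[~road_mask] = RED
    Image.fromarray(pixels).save(out_path)
```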

Example Image from Cityscapes Aachen Dataset
Aachen Cityscapes image 19
Road pixels labelled in purple; everything else red

After training, the inference model was run on the KITTI images, which were not used during training. A gradational mask of green pixels was overlaid onto the original images based on the softmax probability that each pixel belonged to the road class (decreasing transparency as the probability rises above 0.25, 0.5, and 0.75). In most cases the gradation was quite abrupt, with sharp definition of the areas the model assigned as road.
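One way such an overlay could be implemented is sketched below; the three thresholds come from the description above, while the alpha values and function name are illustrative.

```python
import numpy as np

def road_overlay(image, road_prob):
    """Blend green into an RGB image wherever the softmax road probability
    exceeds 0.25, 0.5, or 0.75, with less transparency at each level."""
    alpha = np.zeros(road_prob.shape, dtype=np.float32)
    alpha[road_prob > 0.25] = 0.3   # illustrative alpha levels
    alpha[road_prob > 0.50] = 0.5
    alpha[road_prob > 0.75] = 0.7
    green = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    blended = (1.0 - alpha[..., None]) * image + alpha[..., None] * green
    return blended.astype(np.uint8)
```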

Images from KITTI dataset with road predictions labelled

A video was also processed with acceptable results.

Training

The Adam optimizer was used with a learning rate of 0.000015. The model was run for 15 epochs of 500 steps each, with each step using a batch size of 1; my GPU would not support larger batch sizes.

Intersection over Union (IOU) was computed after each step along with cross entropy loss and regularization losses.
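In TensorFlow 1.x, the loss and metric plumbing could look roughly like this; the function signature and names are illustrative, and labels is assumed to be a one-hot tensor.

```python
import tensorflow as tf

def build_train_ops(logits, labels, num_classes=2, learning_rate=0.000015):
    # Per-pixel cross entropy over the class logits (labels are one-hot)
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=tf.reshape(logits, (-1, num_classes)),
            labels=tf.reshape(labels, (-1, num_classes))))

    # Regularization terms collected from the kernel_regularizer arguments
    loss = cross_entropy + tf.losses.get_regularization_loss()
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    # Running mean IOU between predicted and ground-truth classes
    predictions = tf.argmax(logits, axis=-1)
    ground_truth = tf.argmax(labels, axis=-1)
    iou, iou_update = tf.metrics.mean_iou(ground_truth, predictions, num_classes)
    return train_op, loss, iou, iou_update
```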

A plot of IOU increasing and loss decreasing as the model was trained is shown below.

Model Structure as displayed in TensorBoard

From Udacity

Introduction

In this project, you'll label the pixels of a road in images using a Fully Convolutional Network (FCN).

Setup

Frameworks and Packages

Make sure you have the following installed:

  • Python 3
  • TensorFlow
  • NumPy
  • SciPy

Dataset

Download the KITTI Road dataset from here. Extract the dataset into the data folder. This will create the folder data_road with all the training and test images.

Start

Implement

Implement the code in the main.py module indicated by the "TODO" comments. Comments marked with the "OPTIONAL" tag are not required for completion.

Run

Run the following command to run the project:

python main.py

Note: If running this in a Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook.

Submission

  1. Ensure you've passed all the unit tests.
  2. Ensure you pass all points on the rubric.
  3. Submit the following in a zip file.
  • helper.py
  • main.py
  • project_tests.py
  • Newest inference images from runs folder (all images from the most recent run)

Tips

  • The link for the frozen VGG16 model is hardcoded into helper.py. The model can be found here.
  • The model is not vanilla VGG16, but a fully convolutional version, which already contains the 1x1 convolutions to replace the fully connected layers. Please see this forum post for more information. A summary of additional points follows.
  • The original FCN-8s was trained in stages. The authors later uploaded a version that was trained all at once to their GitHub repo. The version in the GitHub repo has one important difference: the outputs of pooling layers 3 and 4 are scaled before they are fed into the 1x1 convolutions (see the sketch after this list). Some students have found that the model learns much better with the scaling layers included. The model may not converge substantially faster, but may reach a higher IoU and accuracy.
  • When adding l2-regularization, setting a regularizer in the arguments of the tf.layers calls is not enough; the regularization loss terms must be manually added to your loss function, otherwise regularization is not implemented (again, see the sketch below).
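Both tips can be sketched as follows; the 0.0001 and 0.01 scale factors match the authors' all-at-once FCN-8s repo, while the function names here are hypothetical.

```python
import tensorflow as tf

def scaled_skip_taps(vgg_layer3_out, vgg_layer4_out):
    # Scale the pool outputs before their 1x1 convolutions, as in the
    # authors' all-at-once FCN-8s (pool3 x 0.0001, pool4 x 0.01)
    pool3_scaled = tf.multiply(vgg_layer3_out, 0.0001, name='pool3_scaled')
    pool4_scaled = tf.multiply(vgg_layer4_out, 0.01, name='pool4_scaled')
    return pool3_scaled, pool4_scaled

def total_loss(cross_entropy_loss):
    # kernel_regularizer only collects the l2 penalty terms; they still
    # have to be added to the training loss explicitly
    return cross_entropy_loss + tf.losses.get_regularization_loss()
```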

Using GitHub and Creating Effective READMEs

If you are unfamiliar with GitHub, Udacity has a brief GitHub tutorial to get you started. Udacity also provides a more detailed free course on git and GitHub.

To learn about README files and Markdown, Udacity provides a free course on READMEs, as well.

GitHub also provides a tutorial about creating Markdown files.
