This repository contains code for training and evaluating a UNet model for lane detection using the BDD100K dataset. The project leverages PyTorch for model implementation and training, and includes scripts for preprocessing data, running inference, and evaluating model performance.
- Introduction
- Dataset
- Model Architecture
- Installation
- Usage
- Results
- Streamlit App
- Contributing
- License
Lane detection is a crucial component of autonomous driving systems. This project implements a UNet model to accurately segment lane markings from images. The UNet architecture is well-suited for this task due to its encoder-decoder structure that captures contextual information at multiple scales.
Our lane detection model is trained on the BDD100K dataset, which is ideal for this task due to:
- Diversity: It covers a wide range of driving scenarios, weather conditions, and times of day.
- Rich Annotations: It includes detailed annotations for lane markings, drivable areas, and objects.
- Real-world Data: Captured from real-world driving, ensuring the model generalizes well to actual driving conditions.
- High Quality: Provides high-resolution images necessary for accurate detection.
- Community Support: Widely used in the research community, providing benchmarks and continuous improvements.
By leveraging BDD100K, the model learns to detect lanes effectively under varied conditions, supporting robust performance across a wide range of weather and lighting scenarios.
- Download the dataset from the BDD100K website
The UNet model is implemented with the following architecture:
- Encoder: A series of convolutional layers followed by batch normalization and ReLU activation.
- Bottleneck: A set of convolutional layers that capture the deepest features.
- Decoder: A series of transposed convolutional layers that upsample the features back to the original image size.
import torch.nn as nn

class UNet(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(UNet, self).__init__()

        def CBR(in_channels, out_channels):
            # Two 3x3 convolutions, each followed by batch norm and ReLU
            return nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True)
            )

        self.enc1 = CBR(in_channels, 64)
        self.enc2 = CBR(64, 128)
        self.enc3 = CBR(128, 256)
        self.enc4 = CBR(256, 512)
        # Define other layers...

    def forward(self, x):
        # Implement forward pass...
        pass
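The listing above elides the remaining layers. As a minimal sketch of how the bottleneck, decoder, and forward pass could be completed, assuming a standard UNet with max-pooling downsampling and skip connections (the names pool, bottleneck, up4, dec4, out_conv, etc. are illustrative and may differ from the actual train.py; an additional `import torch` is assumed):

        # --- illustrative completion (assumed; not the repository's exact code) ---
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = CBR(512, 1024)
        self.up4 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.dec4 = CBR(1024, 512)
        self.up3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.dec3 = CBR(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = CBR(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = CBR(128, 64)
        self.out_conv = nn.Conv2d(64, out_channels, kernel_size=1)

    def forward(self, x):
        # Encoder path with max-pool downsampling
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        # Decoder path with skip connections (concatenate matching encoder features)
        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out_conv(d1)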
- Clone the repository:
git clone https://github.com/AnshChoudhary/Lane-Detection-UNet.git
cd Lane-Detection-UNet
- Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required dependencies:
pip install -r requirements.txt
The model was trained on an NVIDIA A6000 GPU with 48 GB of VRAM; training takes approximately 10-12 hours on this hardware. To train the model, run:
CUDA_VISIBLE_DEVICES=<YOUR_GPU_ID> nohup python train.py
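As a rough outline of what a training script for binary lane segmentation typically does (a hedged sketch only; the actual loss, optimizer, schedule, and data pipeline live in train.py and may differ, and LaneDataset is a hypothetical Dataset yielding (image, mask) tensor pairs):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(in_channels=3, out_channels=1).to(device)

# Binary cross-entropy with a built-in sigmoid for the single-channel lane mask
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# LaneDataset is a hypothetical Dataset returning (image, mask) pairs
train_loader = DataLoader(LaneDataset("path/to/bdd100k"), batch_size=8, shuffle=True)

for epoch in range(20):
    model.train()
    epoch_loss = 0.0
    for images, masks in train_loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"epoch {epoch + 1}: loss = {epoch_loss / len(train_loader):.4f}")

torch.save(model.state_dict(), "unet_lane.pth")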
Once the model is trained, you can evaluate its performance on the validation set (10,000 images) in terms of metrics such as the Jaccard Score (IoU), Accuracy, and F1-Score. Make any necessary changes to eval_lane.py and then run the following command to evaluate the model; a sketch of how these metrics can be computed follows the command.
CUDA_VISIBLE_DEVICES=<YOUR_GPU_ID> nohup python eval-lane.py
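For reference, the reported metrics can be computed from predicted and ground-truth masks roughly as below (a hedged sketch using scikit-learn; the actual evaluation script may compute them differently):

import numpy as np
from sklearn.metrics import jaccard_score, accuracy_score, f1_score

def evaluate_masks(pred_mask, gt_mask, threshold=0.5):
    # Flatten the binary masks so each pixel is treated as one sample
    pred = (pred_mask > threshold).astype(np.uint8).ravel()
    gt = (gt_mask > threshold).astype(np.uint8).ravel()
    return {
        "jaccard": jaccard_score(gt, pred),
        "accuracy": accuracy_score(gt, pred),
        "f1": f1_score(gt, pred),
    }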
To run inference on a single image and save the predicted mask in the pred folder, use:
CUDA_VISIBLE_DEVICES=<YOUR_GPU_ID> nohup python inference.py
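Single-image inference with a segmentation model of this kind generally follows the pattern below (a hedged sketch; the checkpoint name unet_lane.pth, input file name, and resize dimensions are assumptions, and the actual inference.py may differ):

import os
import cv2
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(in_channels=3, out_channels=1).to(device)
model.load_state_dict(torch.load("unet_lane.pth", map_location=device))
model.eval()

image = cv2.imread("sample.jpg")
resized = cv2.resize(image, (512, 256))
# HWC uint8 image -> NCHW float tensor in [0, 1]
tensor = torch.from_numpy(resized).permute(2, 0, 1).float().unsqueeze(0).to(device) / 255.0

with torch.no_grad():
    prob = torch.sigmoid(model(tensor))[0, 0].cpu().numpy()

# Threshold the probability map and save the binary mask to the pred folder
mask = (prob > 0.5).astype(np.uint8) * 255
os.makedirs("pred", exist_ok=True)
cv2.imwrite(os.path.join("pred", "sample_mask.png"), mask)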
To run inference on a video and overlay the lane detection mask, use:
CUDA_VISIBLE_DEVICES=<YOUR_GPU_ID> nohup python video_infer2.py
To run inference on a video and output both the overlaid lane detection mask and YOLO detections, use:
CUDA_VISIBLE_DEVICES=<YOUR_GPU_ID> nohup python yolo_integrated.py
The model was evaluated on the following metrics over the validation set:
- Validation Jaccard Score (IoU): 0.9934
- Validation Accuracy: 0.9934
- Validation F1 Score: 0.9967
Here's a look at the model's predicted mask being compared to the ground truth mask on a sample image:
Here's a look at a sample output video that overlays the lane detection mask from the trained model and performs YOLO object detections on cars, pedestrians, traffic lights, etc.:
After the model generates masks for a video input, a moving average filter is used to smooth the detected lane mask over successive frames. This reduces flicker and provides a more stable, coherent lane detection result over time.
import numpy as np

def moving_average_2d(data, window_size):
    # Cumulative-sum trick: average each frame with the preceding window_size - 1 frames
    ret = np.cumsum(data, axis=0, dtype=float)
    ret[window_size:] = ret[window_size:] - ret[:-window_size]
    return ret[window_size - 1:] / window_size
This function calculates the moving average along the first axis of the data array (here, the stack of per-frame masks), smoothing transitions and making the lane detection more robust. You can also adjust the blending alpha parameter, which blends the original and smoothed masks, and the moving average window size, which defines how many frames are averaged.
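For illustration, blending the current frame's mask with the smoothed mask could look like the following (a minimal sketch; the function name and default alpha are assumptions, not the exact code in video_infer2.py):

import numpy as np

def blend_masks(current_mask, smoothed_mask, alpha=0.6):
    # Weighted blend: higher alpha favors the current frame, lower alpha favors the smoothed history
    blended = alpha * current_mask.astype(float) + (1.0 - alpha) * smoothed_mask.astype(float)
    return (blended > 0.5).astype(np.uint8)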
A static moving average filter did not perform well on videos with curved paths: it averaged the lane lines toward a different position. To tackle this, a dynamic window size adjustment was implemented, with the window size inversely proportional to the number of lane pixels detected in a frame. This largely solves the averaging problem, since only frames with fewer detected pixels are averaged over larger windows.
def dynamic_window_size_adjustment(mask, base_window_size, min_window_size, max_window_size):
    detected_pixels = np.count_nonzero(mask)
    total_pixels = mask.size
    proportion = detected_pixels / total_pixels
    # Larger window size when fewer lane pixels are detected
    window_size = int(max_window_size * (1 - proportion) + min_window_size * proportion)
    return max(min_window_size, min(max_window_size, window_size))
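Putting the two pieces together, a per-video smoothing pass could look roughly like this (a hedged sketch of the idea rather than the repository's exact pipeline; frame_masks is assumed to be a NumPy stack of per-frame binary masks):

import numpy as np

def smooth_video_masks(frame_masks, base_window_size=5, min_window_size=3, max_window_size=15):
    # frame_masks: array of shape (num_frames, H, W) with one binary mask per frame
    smoothed = []
    for i, mask in enumerate(frame_masks):
        # Pick a window size for this frame based on how many lane pixels it contains
        window = dynamic_window_size_adjustment(mask, base_window_size, min_window_size, max_window_size)
        start = max(0, i - window + 1)
        # Average this frame with the preceding frames inside the window
        window_avg = frame_masks[start:i + 1].mean(axis=0)
        smoothed.append((window_avg > 0.5).astype(np.uint8))
    return np.stack(smoothed)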
This project can be run as a Streamlit web app to generate output videos that overlay the lane detection mask from the trained model and perform YOLO object detections. Users can upload a video in AVI, MOV, or MP4 format and control various parameters such as the YOLO confidence threshold, detection transparency, and interpolation factor.
- Moving average filtering has also been added to the Streamlit app; users can adjust the blending alpha parameter and the base, minimum, and maximum moving average window sizes from the web app controls.
Here's a look at the web app UI:
To run the streamlit app, run the following in terminal:
streamlit run streamlit-dynamic.py
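For orientation, the controls described above can be exposed with standard Streamlit widgets along these lines (a hedged sketch; the widget names, labels, and default values are assumptions, not the exact layout of streamlit-dynamic.py):

import streamlit as st

st.title("Lane Detection + YOLO Overlay")

uploaded = st.file_uploader("Upload a driving video", type=["avi", "mov", "mp4"])
conf_threshold = st.slider("YOLO confidence threshold", 0.1, 0.9, 0.5)
overlay_alpha = st.slider("Detection transparency", 0.0, 1.0, 0.4)
blend_alpha = st.slider("Mask blending alpha", 0.0, 1.0, 0.6)
base_window = st.number_input("Base moving average window", min_value=1, value=5)
min_window = st.number_input("Min moving average window", min_value=1, value=3)
max_window = st.number_input("Max moving average window", min_value=1, value=15)

if uploaded is not None:
    st.video(uploaded)  # preview; the processed overlay video would be generated downstream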
Contributions are welcome! Please fork the repository and submit a pull request with your changes. For major changes, please open an issue first to discuss what you would like to change.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/YourFeature`)
- Commit your Changes (`git commit -m 'Add some YourFeature'`)
- Push to the Branch (`git push origin feature/YourFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.