
🤖 HE-Drive

Human-Like End-to-End Driving with Vision Language Models


We will open-source the complete code after the paper is accepted!

arXiv: https://arxiv.org/abs/2410.05051 | Project Page

📢 News

  • [2024/10/08]: 🔥 We released the HE-Drive paper on arXiv!

📜 Introduction

HE-Drive is an end-to-end autonomous driving system built around human-like driving behavior, ensuring both temporal consistency and comfort in the generated trajectories. It combines three components: sparse perception, which extracts key 3D spatial representations; a DDPM-based motion planner, which generates multi-modal candidate trajectories; and a VLM-guided trajectory scorer, which selects the most comfortable option. Compared to existing solutions, this design significantly reduces collision rates, improves computational speed, and delivers a more comfortable driving experience on real-world data.
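As a rough illustration of how these three stages fit together at inference time, here is a minimal sketch in Python. All names in it (run_he_drive_step, perception, planner, scorer) are hypothetical placeholders, not the released API; the open-sourced code may structure this differently.

# Minimal sketch of one HE-Drive planning step, assuming hypothetical
# perception/planner/scorer callables; the released code may differ.
import numpy as np

def run_he_drive_step(camera_images, perception, planner, scorer, num_candidates=6):
    # 1. Sparse perception: extract key 3D spatial representations
    #    (agents, map elements) from the surround-view images.
    scene = perception(camera_images)
    # 2. DDPM-based planner: denoise Gaussian noise into multi-modal
    #    candidate trajectories conditioned on the scene.
    candidates = [planner.denoise(np.random.randn(6, 2), scene)
                  for _ in range(num_candidates)]
    # 3. VLM-guided scorer: pick the most comfortable, temporally
    #    consistent trajectory among the candidates.
    scores = [scorer(traj, scene) for traj in candidates]
    return candidates[int(np.argmax(scores))]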



🚀 Citing

@article{wang2024he,
  title={HE-Drive: Human-Like End-to-End Driving with Vision Language Models},
  author={Wang, Junming and Zhang, Xingyu and Xing, Zebin and Gu, Songen and Guo, Xiaoyang and Hu, Yang and Song, Ziying and Zhang, Qian and Long, Xiaoxiao and Yin, Wei},
  journal={arXiv preprint arXiv:2410.05051},
  year={2024}
} 

Please kindly star ⭐️ this project if it helps you. We put great effort into developing and maintaining it 😁.

🛠️ Installation

Note

Installation steps follow SparseDrive

Set up a new virtual environment

conda create -n hedrive python=3.8 -y
conda activate hedrive

Install dependency packages

hedrive_path="path/to/hedrive"
cd ${hedrive_path}
pip3 install --upgrade pip
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt

Compile the deformable_aggregation CUDA op

cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../

Prepare the data

Download the nuScenes dataset and the CAN bus expansion, put the CAN bus expansion in /path/to/nuscenes, and create symbolic links.

cd ${hedrive_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes

Pack the meta-information and labels of the dataset, and generate the required pkl files under data/infos. Note that we also generate map_annos in data_converter with a default roi_size of (30, 60); if you want a different range, modify roi_size in tools/data_converter/nuscenes_converter.py (see the sketch after the command below).

sh scripts/create_data.sh
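If you do change the map range, the edit amounts to a single tuple. The snippet below is an assumption about how roi_size appears in tools/data_converter/nuscenes_converter.py; verify against the actual file.

# tools/data_converter/nuscenes_converter.py (assumed location of the setting)
roi_size = (30, 60)     # default map-annotation range
# roi_size = (60, 120)  # e.g., a larger region of interest instead

After changing it, re-run sh scripts/create_data.sh so the regenerated pkl files pick up the new range.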

Prepare the 3D representation

Note

Generate the 3D representation using the SparseDrive second-stage checkpoint!

Commence training

# train
sh scripts/train.sh

Install Ollama and Llama 3.2-Vision 11B

Note

Download Ollama 0.4, then run:

ollama run llama3.2-vision:11b

Important

Llama 3.2 Vision 11B requires at least 8GB of VRAM.

Please prepare at least 10 sets of VQA templates to complete the dialogue, focusing the Llama knowledge domain on driving-style assessment; a sketch of one such template follows below.
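As a starting point, the sketch below sends one driving-style question to the local Ollama server through its REST chat API. The prompt wording, image path, and scoring scale are illustrative assumptions, not the templates used in the paper.

# Sketch: one driving-style VQA query against a local Ollama server.
# The prompt text and image path are placeholders.
import base64, json, urllib.request

def ask_driving_style(image_path, question, model="llama3.2-vision:11b"):
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": question,
                      "images": [image_b64]}],  # base64 images, per Ollama's API
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example template focused on driving-style assessment:
print(ask_driving_style(
    "front_cam.jpg",
    "Rate the comfort of the planned trajectory from 1 to 10 and point out "
    "any abrupt acceleration, braking, or steering."))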

Commence testing

# test
sh scripts/test.sh

💽 Dataset

  • nuScenes
  • Real-World Data
  • OpenScene/NAVSIM

🏆 Acknowledgement

Many thanks to these excellent open-source projects, in particular SparseDrive, whose installation steps and data pipeline we follow.
