
🤖 HE-Drive

Human-Like End-to-End Driving with Vision Language Models


We will open-source the complete code after the paper is accepted!

arXiv: https://arxiv.org/abs/2410.05051 | Project Page

📢 News

  • [2024/10/08]: 🔥 We released the HE-Drive paper on arXiv!

📜 Introduction

HE-Drive is an end-to-end autonomous driving system built around human-like driving behavior, ensuring both temporal consistency and comfort in the generated trajectories. It combines three components: sparse perception, which extracts key 3D spatial representations; a DDPM-based motion planner, which generates multi-modal candidate trajectories; and a VLM-guided trajectory scorer, which selects the most comfortable option. Compared to existing solutions, this design significantly reduces collision rates, improves computational speed, and delivers a more comfortable driving experience on real-world data.
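As a rough illustration of how these three stages fit together at inference time, here is a minimal sketch in Python. All names in it (run_he_drive_step, perception, planner, scorer) are hypothetical placeholders, not the released API; the open-sourced code may structure this differently.

# Minimal sketch of one HE-Drive planning step, assuming hypothetical
# perception/planner/scorer callables; the released code may differ.
import numpy as np

def run_he_drive_step(camera_images, perception, planner, scorer, num_candidates=6):
    # 1. Sparse perception: extract key 3D spatial representations
    #    (agents, map elements) from the surround-view images.
    scene = perception(camera_images)
    # 2. DDPM-based planner: denoise Gaussian noise into multi-modal
    #    candidate trajectories conditioned on the scene.
    candidates = [planner.denoise(np.random.randn(6, 2), scene)
                  for _ in range(num_candidates)]
    # 3. VLM-guided scorer: pick the most comfortable, temporally
    #    consistent trajectory among the candidates.
    scores = [scorer(traj, scene) for traj in candidates]
    return candidates[int(np.argmax(scores))]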



🚀 Citing

@article{wang2024he,
  title={HE-Drive: Human-Like End-to-End Driving with Vision Language Models},
  author={Wang, Junming and Zhang, Xingyu and Xing, Zebin and Gu, Songen and Guo, Xiaoyang and Hu, Yang and Song, Ziying and Zhang, Qian and Long, Xiaoxiao and Yin, Wei},
  journal={arXiv preprint arXiv:2410.05051},
  year={2024}
} 

Please kindly star ⭐️ this project if it helps you. We put great effort into developing and maintaining it 😁.

🛠️ Installation

Note

Installation steps follow SparseDrive

Set up a new virtual environment

conda create -n hedrive python=3.8 -y
conda activate hedrive

Install dependency packages

hedrive_path="path/to/hedrive"
cd ${hedrive_path}
pip3 install --upgrade pip
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt

Compile the deformable_aggregation CUDA op

cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../

Prepare the data

Download the nuScenes dataset and the CAN bus expansion, put the CAN bus expansion in /path/to/nuscenes, and create symbolic links.

cd ${hedrive_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes

Pack the meta-information and labels of the dataset, and generate the required pkl files under data/infos. Note that we also generate map_annos in data_converter with a default roi_size of (30, 60); if you want a different range, modify roi_size in tools/data_converter/nuscenes_converter.py (see the sketch after the command below).

sh scripts/create_data.sh
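If you do change the map range, the edit amounts to a single tuple. The snippet below is an assumption about how roi_size appears in tools/data_converter/nuscenes_converter.py; verify against the actual file.

# tools/data_converter/nuscenes_converter.py (assumed location of the setting)
roi_size = (30, 60)     # default map-annotation range
# roi_size = (60, 120)  # e.g., a larger region of interest instead

After changing it, re-run sh scripts/create_data.sh so the regenerated pkl files pick up the new range.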

Prepare the 3D representation

Note

Generate the 3D representation using the SparseDrive second-stage checkpoint!

Commence training

# train
sh scripts/train.sh

Install Ollama and Llama 3.2-Vision 11B

Note

Download Ollama 0.4, then run:

ollama run llama3.2-vision:11b

Important

Llama 3.2 Vision 11B requires at least 8GB of VRAM.

Please prepare at least 10 sets of VQA templates to complete the dialogue, focusing the Llama knowledge domain on driving-style assessment; a sketch of one such template follows below.
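As a starting point, the sketch below sends one driving-style question to the local Ollama server through its REST chat API. The prompt wording, image path, and scoring scale are illustrative assumptions, not the templates used in the paper.

# Sketch: one driving-style VQA query against a local Ollama server.
# The prompt text and image path are placeholders.
import base64, json, urllib.request

def ask_driving_style(image_path, question, model="llama3.2-vision:11b"):
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": question,
                      "images": [image_b64]}],  # base64 images, per Ollama's API
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example template focused on driving-style assessment:
print(ask_driving_style(
    "front_cam.jpg",
    "Rate the comfort of the planned trajectory from 1 to 10 and point out "
    "any abrupt acceleration, braking, or steering."))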

Commence testing

# test
sh scripts/test.sh

💽 Dataset

  • nuScenes
  • Real-World Data
  • OpenScene/NAVSIM

🏆 Acknowledgement

Many thanks to these excellent open-source projects, in particular SparseDrive, whose installation steps and data pipeline we follow.
