This is the official repository for the technical report:
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
In our report, we explore the potential of GPT-4V as an autonomous driving agent. This repository contains the original test images and the detailed results showing how the model understands complex driving scenes and makes driving decisions.
Dive into our findings by exploring the categorized directories:
- Scenario Understanding: Tests on GPT-4V's perception of its environment and fellow road-goers.
- Reasoning: A peek into the model's advanced reasoning capabilities.
- Serving as a Driving Agent: Scenes where GPT-4V showcases its multi-task driving skills.
Each case within these categories is accompanied by a .json file detailing the prompts and responses from GPT-4V, alongside a .png image that the model assessed.
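As a rough illustration of how the cases could be browsed programmatically, the sketch below walks a local checkout and pairs each .json file with a .png image. It rests on assumptions that this README does not confirm: that a case's two files sit in the same directory and share a file name stem, and that the repository has been cloned into the current directory. The helper names `find_cases` and `load_case` are hypothetical, and because the JSON schema is not documented here, the parsed content is returned as-is rather than interpreted.

```python
import json
from pathlib import Path


def find_cases(root):
    """Return (json_path, png_path) pairs found anywhere under root.

    Assumption (not confirmed by the README): the .json and .png of a
    case live in the same directory and share the same file stem.
    """
    pairs = []
    for json_path in sorted(Path(root).rglob("*.json")):
        png_path = json_path.with_suffix(".png")
        if png_path.exists():
            pairs.append((json_path, png_path))
    return pairs


def load_case(json_path):
    """Parse one case file and return the raw JSON content.

    The schema is not assumed, so no keys are read here.
    """
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # "." is assumed to be the root of a local clone of this repository.
    for json_path, png_path in find_cases("."):
        case = load_case(json_path)
        print(f"{json_path.parent.name}: {json_path.name} + {png_path.name} "
              f"({type(case).__name__})")
```

If the files in a category turn out to follow a different naming scheme, only the pairing rule inside `find_cases` would need to change.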
Here's a glimpse into some of the results from our report:
- Weather Understanding: This image showcases GPT-4V's capability to understand different weather conditions, a critical factor in autonomous driving.
- Corner Cases: An illustration of how GPT-4V handles complex and unusual traffic scenarios, which are often challenging for autonomous systems.
- Serving as a Driving Agent: A demonstration of GPT-4V showcasing its capabilities as a driving agent, making real-world decisions in various driving scenarios.
Your thoughts and contributions are a green signal for us! 🚦
If you have suggestions or additional insights, feel free to open an issue or submit a pull request.
If our repository accelerates your research, please use the following citation:
@article{wen2023road,
title={On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving},
author={Licheng Wen and Xuemeng Yang and Daocheng Fu and Xiaofeng Wang and Pinlong Cai and Xin Li and Tao Ma and Yingxuan Li and Linran Xu and Dengke Shang and Zheng Zhu and Shaoyan Sun and Yeqi Bai and Xinyu Cai and Min Dou and Shuanglu Hu and Botian Shi},
journal={arXiv preprint arXiv:2311.05332},
year={2023}
}
Our team is actively involved in other projects in the field of autonomous driving.
The content of this repository is released under the MIT License.