A premier forum uniting academic, industry, and standards communities to shape the next generation of cooperative, foundation-model-driven autonomous driving and intelligent transportation systems.
The 8th edition of the DriveX Workshop focuses on how foundation models and cooperative systems can redefine perception, prediction, planning, and decision-making for autonomous driving and intelligent transportation infrastructure.
Traditional single-vehicle pipelines have achieved impressive progress in 3D detection and tracking, yet they remain constrained by limited viewpoints, occlusions, and domain shifts. Cooperative driving systems, powered by V2X communication and roadside/edge intelligence, extend sensing range, enrich scene context, and enable shared representations across vehicles and infrastructure.
In parallel, foundation models, including vision, vision-language, and multi-modal large models, unlock powerful generalization capabilities: open-vocabulary understanding, scalable pretraining, zero-shot adaptation, and interpretable reasoning about complex road scenes. Emerging end-to-end and agentic systems, such as large driving models, promise unified perception-to-control frameworks but raise new questions about trustworthiness, reliability, calibration, and evaluation at urban scale.
DriveX@NeurIPS 2026 convenes researchers and practitioners from computer vision, robotics, communications, transportation, AI safety, and policy.
| Time | Session |
|---|---|
| 08:00 – 08:05 | Opening Remarks – Welcome & Workshop Overview |
| 08:05 – 08:20 | Opening Keynote: Dr. Walter Zimmer, University of California, Los Angeles (UCLA) & Technical University of Munich (TUM), USA. Abstract: Autonomous driving in urban environments is fundamentally limited by the range, occlusions, and failure modes of vehicle-only perception. This opening keynote presents research advances in cooperative roadside–vehicle perception that fuse multi-modal data from onboard sensors and intelligent roadside infrastructure via V2X communication to extend situational awareness beyond the line of sight. The proposed methods improve real-time 3D object detection and tracking in dense traffic and are supported by new large-scale, multi-modal datasets for benchmarking cooperative perception in real-world urban settings. By integrating recent advances in foundation models, including vision-language models, the work further enables semantic understanding of complex traffic scenes, laying the groundwork for AI-driven urban digital twins and safer, more efficient intelligent transportation systems. Speaker Bio: Dr. rer. nat. Walter Zimmer is a postdoctoral researcher at the University of California, Los Angeles (UCLA) and a guest researcher at the Technical University of Munich (TUM). He received his Ph.D. from TUM in 2025. His research focuses on cooperative autonomous driving, 3D perception, and 3D foundation models. He has authored over 40 publications at top venues such as CVPR, ICCV, ECCV, ICML, NeurIPS, and T-PAMI. Dr. Zimmer previously worked as an Autonomous Systems Engineer at the startup STTech and as a research assistant at Siemens AG. His work has earned multiple awards, including the IEEE ITSS Best Student Paper Award 2023 and the IEEE ITSS Best Dissertation Award 2025. |
| 08:20 – 08:40 | Keynote 1: Speaker 1 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 08:40 – 09:00 | Keynote 2: Speaker 2 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 09:00 – 09:20 | Keynote 3: Speaker 3 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 09:20 – 09:40 | Keynote 4: Speaker 4 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 09:40 – 10:00 | Keynote 5: Speaker 5 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 10:00 – 10:50 | Poster Session I & Coffee Break |
| 10:00 – 10:30 | Live Demo |
| 10:50 – 11:10 | Keynote 6: Speaker 6 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 11:10 – 11:30 | Keynote 7: Speaker 7 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 11:30 – 12:00 | Panel Discussion I: Industry Track |
| 12:00 – 13:00 | Lunch Break & Networking |
| 13:00 – 13:20 | Keynote 8: Speaker 8 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 13:20 – 13:40 | Keynote 9: Speaker 9 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 13:40 – 14:00 | Keynote 10: Speaker 10 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 14:00 – 14:20 | Keynote 11: Speaker 11 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 14:20 – 14:40 | Keynote 12: Speaker 12 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 14:40 – 15:00 | Keynote 13: Speaker 13 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 15:00 – 16:00 | Poster Session II & Coffee Break |
| 16:00 – 16:30 | Panel Discussion II: Academic Track |
| 16:30 – 16:40 | Oral Paper Presentation 1: Presenter 1 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 16:40 – 16:50 | Oral Paper Presentation 2: Presenter 2 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 16:50 – 17:00 | Oral Paper Presentation 3: Presenter 3 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 17:00 – 17:10 | Oral Paper Presentation 4: Presenter 4 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 17:10 – 17:20 | Oral Paper Presentation 5: Presenter 5 (Affiliation). Abstract and speaker bio will be announced closer to the workshop date. |
| 17:20 – 17:30 | Paper Awards Ceremony – Best Paper, Runner-Up, Best Application Paper, Best Poster & Best Keynote |
| 17:30 – 17:40 | Challenge Winner Presentation |
| 17:40 – 17:50 | Challenge Awards Ceremony |
| 17:50 – 18:00 | Closing Remarks & Group Photo |
| 19:00 – 21:00 | Workshop Reception & Networking |
Final schedule, room allocation, and speaker order will be announced closer to the workshop date.
DriveX 2026 invites high-quality contributions on foundation models, cooperative perception, large driving models, and related topics outlined above.
We welcome novel full papers (max. 9 pages, excluding references). NeurIPS workshop papers are non-archival and are not included in proceedings.
Submissions must use the official NeurIPS 2026 style files (available for LaTeX or Typst).
Challenge winners will receive award certificates and are invited to present their results at the workshop.
🔥 Tracks are opening soon!
Reasoning-Centric Autonomous Driving
Structured reasoning for long-tail autonomous driving using nuReasoning. The track focuses on spatial, decision, and counterfactual reasoning over real-world driving scenes with synchronized multi-camera images, LiDAR, HD maps, and object annotations, enabling evaluation of reasoning-aware perception, planning, and Vision-Language-Action models.
@article{nureasoning2026,
title={nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving},
author={},
journal={arXiv preprint arXiv:},
year={2026}
}
V2I-Based Cooperative Perception
Infrastructure–vehicle fusion using TUMTraf-V2X. The track focuses on cooperative 3D detection and tracking with infrastructure-mounted LiDAR, radar, and cameras, emphasizing occlusion handling, long-range awareness, and reliability under real-world conditions.
@INPROCEEDINGS{10658179,
author={Zimmer, Walter and Wardana, Gerhard Arya and Sritharan, Suren and Zhou, Xingcheng and Song, Rui and Knoll, Alois C.},
booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
title={TUMTraf V2X Cooperative Perception Dataset},
year={2024},
pages={22668-22677},
keywords={Point cloud compression;Solid modeling;Three-dimensional displays;Computational modeling;Object detection;Road safety;Sensors;Autonomous Driving;Deep Learning;Cooperative Perception;3D Object Detection;V2X;Dataset;Camera-LiDAR Fusion;Roadside Infrastructure;ITS;Tracking;Fusion;Annotation Tool;Labeling Tool;Point Cloud;Image;V2I;LiDAR;Camera;GPS;IMU;Transformer;HD Map;Collaborative Perception;Vehicle-Infrastructure Fusion;TUMTraf;OpenLABEL},
doi={10.1109/CVPR52733.2024.02139}
}
Natural Language Instruction for Human Interaction and Vision-Language Navigation
Built upon doScenes, this track asks participants to design models for natural language instructions and vision-language navigation, facilitating research on human-vehicle instruction interactions.
@inproceedings{roy2025doscenes,
title={doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation},
author={Parthib Roy and Srinivasa Perisetla and Shashank Shriram and Harsha Krishnaswamy and Aryan Keskar and Ross Greer},
booktitle={2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC)},
pages={1651--1658},
year={2025},
organization={IEEE},
url={https://arxiv.org/abs/2412.05893},
}
Multi-Agent Reasoning
Using MDrive, teams explore a cooperative driving benchmark for end-to-end closed-loop multi-agent systems.
@misc{MDrive,
author = {Marco Coscoy and Zhiyu Huang and Johnson Liu and Jiaqi Ma and Angela Magtoto and Rui Song and Henry Wei and Zhihao Zhao and Bolei Zhou and Zewei Zhou and Walter Zimmer},
title = {MDrive: A Cooperative Driving Benchmark for End-to-End Closed-loop Multi-Agent System},
year = {2026},
howpublished = {\url{https://github.com/ucla-mobility/MDrive}}
}
Competition Timeline
Top-performing teams will be invited to present at the workshop and will receive cash prizes (from a $14,000 prize pool) and award certificates. Detailed rules, baselines, and submission instructions are available on the official challenge page.
UC Merced
UCLA
UCLA
UC Riverside
Nourah Bint Abdulrahman University
Uni. Tübingen
Cruise
UC Merced
UCLA
Uni. of Illinois at Urbana-Champaign
UC Merced
Purdue Uni.
Cornell Uni.
UC Merced
UC Merced
Cornell Uni.
UCLA
Fraunhofer IVI & TUM
UC Riverside
Stanford
Waymo
UCLA
UC Berkeley
UCLA
UCLA
UCLA
Motional is a leading autonomous driving technology company, established as a joint venture between Hyundai Motor Group and Aptiv, that develops Level 4 driverless vehicles designed for safe and accessible urban transportation. The company currently operates commercial robotaxi and delivery services with its all-electric Hyundai IONIQ 5 platform through high-profile partnerships with networks such as Uber and Lyft.
NVIDIA delivers the full-stack infrastructure for physical AI, combining accelerated computing, simulation, world models, and robotics frameworks such as Isaac and Omniverse to train, evaluate, and deploy autonomous systems at scale across robotics, manufacturing, and autonomous driving.
Nomadic's Visual AI automatically discovers rare edge cases, multi-agent interactions, and safety-critical scenarios from fleet-scale video and V2X sensor streams. Trusted by the world's leading autonomous vehicle and robotics companies, the platform provides the high-signal training data and evaluation benchmarks that foundation models need to bridge simulation and real-world deployment.
Qualcomm is a leading technology company whose Snapdragon Ride platform combines high-performance automotive chips, AI-driven perception, and advanced driver-assistance software to enable scalable ADAS and automated driving capabilities for automakers worldwide. Its Snapdragon Ride Pilot system, co-developed with partners like BMW and integrated with real-world data, supports hands-free highway and urban driving scenarios while helping vehicle makers accelerate safety-focused automated driving solutions.
X Performance Robotics develops hardware-agnostic, AI-powered robotic cores that empower next-generation robots and humanoids to operate autonomously and resiliently in the most demanding, unpredictable environments for government, defense, industrial, and hazardous applications.
HiWiTronics is a cutting-edge technology start-up specializing in advanced electronic components, drones, demining solutions, ICT, and autonomous technologies. Its mission is to develop innovative, high-performance solutions that enhance industrial applications and defense operations in an evolving global landscape.
DriveX 2026 welcomes sponsorship from industry, startups, and institutions interested in foundation models, cooperative perception, simulation, and large-scale autonomous driving systems.
For sponsorship opportunities, please contact: wz@ucla.edu.