A premier forum uniting academic, industry, and standards communities to shape the next generation of cooperative, foundation-model-driven autonomous driving and intelligent transportation systems.
The 4th edition of the DriveX Workshop focuses on how foundation models and cooperative systems can redefine perception, prediction, planning, and decision-making for autonomous driving and intelligent transportation infrastructure.
Traditional single-vehicle pipelines have achieved impressive progress in 3D detection and tracking, yet they remain constrained by limited viewpoints, occlusions, and domain shifts. Cooperative driving systems, powered by V2X communication and roadside/edge intelligence, extend sensing range, enrich scene context, and enable shared representations across vehicles and infrastructure.
In parallel, foundation models, including vision, vision-language, and multi-modal large models, unlock powerful generalization capabilities: open-vocabulary understanding, scalable pretraining, zero-shot adaptation, and interpretable reasoning about complex road scenes. Emerging end-to-end and agentic systems such as large driving models promise unified perception-to-control frameworks but raise new questions in trustworthiness, reliability, calibration, and evaluation at urban scale.
DriveX 2026 convenes researchers and practitioners from computer vision, robotics, communications, transportation, AI safety, and policy to:
| Time | Session |
|---|---|
| 08:00 – 08:10 | Opening Remarks – Welcome & Workshop Overview |
| 08:10 – 08:30 | Opening Keynote – Dr. Walter Zimmer, University of California Los Angeles (UCLA) & Technical University of Munich (TUM), USA. **Abstract:** Autonomous driving in urban environments is fundamentally limited by the range, occlusions, and failure modes of vehicle-only perception. This opening keynote presents research advances in cooperative roadside–vehicle perception, fusing multi-modal data from onboard sensors and intelligent roadside infrastructure via V2X communication to extend situational awareness beyond the line of sight. The proposed methods improve real-time 3D object detection and tracking in dense traffic and are supported by new large-scale, multi-modal datasets for benchmarking cooperative perception in real-world urban settings. By integrating recent advances in foundation models, including vision-language models, the work further enables semantic understanding of complex traffic scenes, laying the groundwork for AI-driven urban digital twins and safer, more efficient intelligent transportation systems. **Speaker Bio:** Dr. rer. nat. Walter Zimmer is a post-doctoral researcher at the University of California Los Angeles (UCLA) and a guest researcher at the Technical University of Munich (TUM). He received his Ph.D. from TUM in 2025. His research focuses on cooperative autonomous driving, 3D perception, and 3D foundation models. He has authored over 40 publications at top venues such as CVPR, ICCV, ECCV, ICML, NeurIPS, and T-PAMI. Dr. Zimmer previously worked as an Autonomous Systems Engineer at the STTech startup and as a research assistant at Siemens AG. His work has earned multiple awards, including the IEEE ITSS Best Student Paper Award 2023 and the IEEE ITSS Best Dissertation Award 2025. |
| 08:30 – 08:50 | Keynote 1 – Balajee Kannan, Motional, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 08:50 – 09:10 | Keynote 2 – Prof. Dr. Jiaqi Ma, University of California Los Angeles (UCLA), USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 09:10 – 09:30 | Keynote 3 – Dr. Mingxing Tan, Waymo, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 09:30 – 09:50 | Sponsorship Advertisement – Mustafa Bal, NomadicML, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 09:30 – 10:00 | Poster Session I & Coffee Break |
| 10:00 – 10:20 | Keynote 4 – Dr. Jamie Shotton, Wayve, Vancouver. Abstract and speaker bio will be announced closer to the workshop date. |
| 10:20 – 10:40 | Keynote 5 – Prof. Dr. Marco Pavone, Stanford University & NVIDIA, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 10:40 – 11:00 | Keynote 6 – Dr. Phil Duan, Tesla, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 11:00 – 11:30 | Panel Discussion I: Industry Track |
| 11:30 – 12:00 | Poster Session II & Coffee Break |
| 12:00 – 13:00 | Lunch Break & Networking |
| 13:00 – 13:20 | Keynote 7 – Prof. Dr. Bolei Zhou, UCLA & Coco Robotics, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 13:20 – 13:40 | Keynote 8 – Prof. Dr. Manmohan Chandraker, UCSD & NEC Labs, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 13:40 – 14:00 | Keynote 9 – Prof. Dr. Sharon Li, University of Wisconsin-Madison, USA. Abstract and speaker bio will be announced closer to the workshop date. |
| 14:00 – 14:30 | Poster Session III & Coffee Break |
| 14:30 – 14:50 | Keynote 10 – Prof. Dr. Holger Caesar, Delft University of Technology, Netherlands. Abstract and speaker bio will be announced closer to the workshop date. |
| 14:50 – 15:10 | Keynote 11 – Prof. Dr. Daniel Cremers, Technical University of Munich, Germany. Abstract and speaker bio will be announced closer to the workshop date. |
| 15:10 – 15:30 | Keynote 12 – Prof. Dr. Angela Dai, Technical University of Munich, Germany. Abstract and speaker bio will be announced closer to the workshop date. |
| 15:30 – 16:00 | Panel Discussion II: Academic Track |
| 16:00 – 16:30 | Poster Session IV & Coffee Break |
| 16:30 – 16:40 | Oral Presentation 1 – Calibration-Free View-Agnostic Monocular 3D Object Detection for Urban Scenes |
| 16:40 – 16:50 | Oral Presentation 2 – 4-D Radar Meets LiDAR and Camera: Cooperative Perception under Adverse Weather |
| 16:50 – 17:00 | Oral Presentation 3 – Rethinking Intermediate Module Utilization in V2X End-to-End Autonomous Driving |
| 17:00 – 17:10 | Oral Presentation 4 – DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather |
| 17:10 – 17:20 | Oral Presentation 5 – GaussianDet3D: Bridging Gaussian Splatting and Sparse LiDAR Detection for Multi-View 3D Object Detection |
| 17:20 – 17:30 | Paper Awards Ceremony – Best Paper, Runner-Up, Best Application Paper, Best Poster & Best Keynote |
| 17:30 – 17:40 | Challenge Winner Presentation |
| 17:40 – 17:50 | Challenge Awards Ceremony |
| 17:50 – 18:00 | Closing Remarks & Group Photo |
| 19:00 – 21:00 | Workshop Reception & Networking |
Final schedule, room allocation, and speaker order will be announced closer to the workshop date.
DriveX 2026 invites high-quality contributions on foundation models, cooperative perception, large driving models, and related topics outlined above.
We welcome:
Submissions must follow the official CVPR 2026 style: LaTeX or Typst.
Second- and third-place finishers in each challenge will receive an award certificate.
🔥 All tracks are now open.
V2I-Based Cooperative Perception
Infrastructure–vehicle fusion using TUMTraf-V2X. This track focuses on cooperative 3D detection and tracking with infrastructure-mounted LiDAR, radar, and cameras, emphasizing occlusion handling, long-range awareness, and reliability under real-world conditions.
@inproceedings{10658179,
  author={Zimmer, Walter and Wardana, Gerhard Arya and Sritharan, Suren and Zhou, Xingcheng and Song, Rui and Knoll, Alois C.},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  title={TUMTraf V2X Cooperative Perception Dataset},
  year={2024},
  pages={22668-22677},
  doi={10.1109/CVPR52733.2024.02139}
}
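The track above centers on bringing infrastructure and vehicle detections into a common frame. As a minimal late-fusion sketch (this is not the TUMTraf-V2X devkit API; the transform, the (x, y, z, score) box format, and the merge radius are all assumptions for illustration), infrastructure detections can be mapped into the vehicle frame and merged, keeping only those the vehicle did not already see:

```python
import numpy as np

def transform_centers(centers, T):
    """Apply a 4x4 homogeneous transform to Nx3 box centers."""
    homo = np.hstack([centers, np.ones((len(centers), 1))])
    return (homo @ T.T)[:, :3]

def late_fuse(veh_dets, infra_dets, T_infra_to_veh, merge_radius=1.0):
    """Merge infrastructure detections into the vehicle frame.

    veh_dets / infra_dets: Nx4 arrays of (x, y, z, score).
    Infrastructure detections closer than merge_radius to an
    existing vehicle detection are treated as duplicates.
    """
    infra_xyz = transform_centers(infra_dets[:, :3], T_infra_to_veh)
    fused = list(veh_dets)
    for center, score in zip(infra_xyz, infra_dets[:, 3]):
        dists = np.linalg.norm(veh_dets[:, :3] - center, axis=1)
        if dists.size == 0 or dists.min() > merge_radius:
            fused.append(np.array([*center, score]))  # object occluded for the vehicle
    return np.vstack(fused)

# Toy example: the roadside unit sees one extra, occluded object.
T = np.eye(4); T[0, 3] = 10.0          # infrastructure origin is 10 m ahead in x
veh = np.array([[5.0, 0.0, 0.0, 0.9]])
infra = np.array([[-5.0, 0.0, 0.0, 0.8],   # maps to (5, 0, 0): duplicate of veh det
                  [20.0, 2.0, 0.0, 0.7]])  # maps to (30, 2, 0): new object
fused = late_fuse(veh, infra, T)
print(fused.shape[0])  # 2 fused detections
```

Late fusion is only the simplest baseline; intermediate (feature-level) fusion, which the cited dataset also supports, typically trades bandwidth for accuracy.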
Natural Language Instruction for Human Interaction and Vision-Language Navigation
Built upon doScenes. Participants design models for natural-language instruction following and vision-language navigation, facilitating research on human–vehicle instruction interaction.
@inproceedings{roy2025doscenes,
  title={doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation},
  author={Parthib Roy and Srinivasa Perisetla and Shashank Shriram and Harsha Krishnaswamy and Aryan Keskar and Ross Greer},
  booktitle={2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC)},
  pages={1651--1658},
  year={2025},
  organization={IEEE},
  url={https://arxiv.org/abs/2412.05893}
}
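As a trivial reference point for this track (not a doScenes baseline; the maneuver vocabulary and keywords below are invented for illustration), free-form instructions can be mapped to coarse maneuver labels by keyword matching, which any serious vision-language model submission would be expected to beat:

```python
import re

# Hypothetical maneuver vocabulary; actual challenge labels may differ.
MANEUVER_KEYWORDS = {
    "stop": ["stop", "halt", "wait"],
    "turn_left": ["turn left", "left turn"],
    "turn_right": ["turn right", "right turn"],
    "yield": ["yield", "give way"],
    "continue": ["continue", "keep going", "go straight"],
}

def classify_instruction(text):
    """Map a free-form instruction to a coarse maneuver label."""
    lowered = text.lower()
    for maneuver, keywords in MANEUVER_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            return maneuver
    return "unknown"

print(classify_instruction("Please stop for the pedestrian ahead"))  # stop
print(classify_instruction("Turn left at the next intersection"))    # turn_left
```

The point of the track is precisely what this baseline cannot do: grounding the instruction in the visual scene (which pedestrian? which intersection?) rather than in surface keywords.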
Multi-Agent Reasoning
Using MDrive, teams explore a cooperative driving benchmark for end-to-end closed-loop multi-agent systems.
@misc{MDrive,
  author = {Marco Coscoy and Zhiyu Huang and Johnson Liu and Jiaqi Ma and Angela Magtoto and Rui Song and Henry Wei and Zhihao Zhao and Bolei Zhou and Zewei Zhou and Walter Zimmer},
  title = {MDrive: A Cooperative Driving Benchmark for End-to-End Closed-loop Multi-Agent System},
  year = {2026},
  howpublished = {\url{https://github.com/ucla-mobility/MDrive}}
}
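"Closed-loop" here means every agent replans against the evolving joint state at each tick, rather than being scored against a fixed log. A toy sketch of such a loop (not the MDrive API; the 1-D car-following policy and dynamics are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    x: float  # longitudinal position (m)
    v: float  # speed (m/s)

def policy(state, states, agent_id, target_gap=10.0):
    """Toy car-following policy: brake when the gap to the agent ahead shrinks."""
    ahead = [s.x for i, s in enumerate(states) if i != agent_id and s.x > state.x]
    if not ahead:
        return 1.0                       # free road: accelerate
    gap = min(ahead) - state.x
    return 1.0 if gap > target_gap else -1.0

def step(states, dt=0.1):
    """One closed-loop tick: every agent acts on the *current* joint state."""
    actions = [policy(s, states, i) for i, s in enumerate(states)]
    return [AgentState(s.x + s.v * dt, max(0.0, s.v + a * dt))
            for s, a in zip(states, actions)]

# Follower starts fast and close behind a slower leader.
states = [AgentState(0.0, 5.0), AgentState(8.0, 3.0)]
for _ in range(50):
    states = step(states)
assert states[0].x < states[1].x  # follower backed off; no overtake in this toy run
```

The feedback between policy and simulator is what distinguishes this evaluation regime from open-loop prediction metrics: a small planning error compounds over ticks instead of being reset by ground truth.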
Competition Timeline
Top-performing teams will be invited to present at the workshop and will receive cash prizes ($300) and award certificates. Detailed rules, baselines, and submission instructions are available on the official challenge page.
UC Merced
UCLA
UCLA
UC Riverside
Nourah Bint Abdulrahman University
University of Tübingen
Cruise
UC Merced
UCLA
University of Illinois Urbana-Champaign
UC Merced
Purdue University
Cornell University
UC Merced
UC Merced
Cornell University
UCLA
Fraunhofer IVI & TUM
UC Riverside
Stanford
Waymo
UCLA
UC Berkeley
UCLA
UCLA
UCLA
Motional is a leading autonomous driving technology company, established as a venture between Hyundai Motor Group and Aptiv, that develops Level 4 driverless vehicles designed for safe and accessible urban transportation. The company currently operates commercial robotaxi and delivery services using its all-electric Hyundai IONIQ 5 platform through high-profile partnerships with networks such as Uber and Lyft.
Nomadic's Visual AI automatically discovers rare edge cases, multi-agent interactions, and safety-critical scenarios from fleet-scale video and V2X sensor streams. The platform provides the high-signal training data and evaluation benchmarks that foundation models need to bridge simulation and real-world deployment, trusted by the world's leading autonomous vehicle and robotics companies.
Qualcomm is a leading technology company whose Snapdragon Ride platform combines high-performance automotive chips, AI-driven perception, and advanced driver-assistance software to enable scalable ADAS and automated driving capabilities for automakers worldwide. Its Snapdragon Ride Pilot system, co-developed with partners like BMW and integrated with real-world data, supports hands-free highway and urban driving scenarios while helping vehicle makers accelerate safety-focused automated driving solutions.
DriveX 2026 welcomes sponsorship from industry, startups, and institutions interested in foundation models, cooperative perception, simulation, and large-scale autonomous driving systems.
For sponsorship opportunities, please contact: wz@ucla.edu.