DriveX 🚗 (2nd Edition)

Workshop on Foundation Models for

V2X-Based Cooperative Autonomous Driving

In conjunction with ICCV 2025, October 19 in Honolulu, Hawai'i, USA

Introduction

The 2nd Edition of our DriveX Workshop explores the integration of foundation models and V2X-based cooperative systems to improve perception, planning, and decision-making in autonomous vehicles. While traditional single-vehicle systems have advanced tasks such as 3D object detection, emerging challenges such as holistic scene understanding and 3D occupancy prediction require more comprehensive solutions. Collaborative driving systems, which utilize V2X communication and roadside infrastructure, extend sensory range, provide hazard warnings, and improve decision-making through shared data. At the same time, foundation models such as Vision-Language Models (VLMs) offer strong generalization, enabling zero-shot learning, open-vocabulary recognition, and scene explanation in novel scenarios. Recent advances in end-to-end driving and foundation models such as DriveLLM further strengthen autonomous systems. The workshop aims to bring together experts to explore these technologies, address open challenges, and advance road safety.

Topics

  • Foundation Models for Cooperative Autonomous Driving and Intelligent Transportation Systems
  • Vision-Language Models (VLMs) for Traffic Scene Understanding
  • Large Language Model (LLM)-assisted Cooperative Systems
  • Cooperative Perception & V2X communication for Autonomous Vehicles
  • Dataset Curation and Data Labeling for Autonomous Driving
  • Datasets and Benchmarks for Foundation Models and Cooperative Perception
  • 3D Occupancy Prediction, 3D Object Detection, 3D Semantic Segmentation, and 3D Scene Understanding
  • End-to-end Perception and Real-time Decision-Making Systems
  • Vehicle-to-Infrastructure (V2I) Interaction

Schedule

08:30 - 08:50 Opening Remarks (Welcome & Introduction)
08:50 - 09:20 Keynote 1: Dr. Mingxing Tan (Waymo)
09:20 - 09:50 Keynote 2: Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
09:50 - 10:20 Keynote 3: Prof. Dr. Sharon Li (University of Wisconsin-Madison)
10:20 - 10:30 Coffee Break
10:30 - 11:00 Keynote 4: Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
11:00 - 11:30 Keynote 5: Prof. Dr. Trevor Darrell (University of California Berkeley, UCB)
11:30 - 12:00 Panel Discussion I
12:00 - 13:00 Lunch Break
13:00 - 13:30 Keynote 6: Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
13:30 - 14:00 Keynote 7: Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
14:00 - 14:30 Keynote 8: Prof. Dr. Marc Pollefeys (ETH Zurich)
14:30 - 15:00 Keynote 9: Prof. Dr. Angela Dai (Technical University of Munich, TUM)
15:00 - 15:10 Coffee Break
15:10 - 15:40 Keynote 10: Prof. Dr. Alina Roitberg (University of Stuttgart)
15:40 - 16:10 Panel Discussion II
16:10 - 16:20 Oral Paper Presentation 1
16:20 - 16:30 Oral Paper Presentation 2
16:30 - 16:40 Oral Paper Presentation 3
16:40 - 16:50 Oral Paper Presentation 4
16:50 - 17:00 Oral Paper Presentation 5
17:00 - 17:10 Best Paper Presentation & Best Paper Award
17:10 - 17:20 Competition Presentation & Competition Winner Award
17:20 - 17:30 Group Picture (with all organizers and speakers)
17:30 - 18:00 Poster Session
18:00 - 20:00 Social Mixer, Networking, Dinner (Location will be announced during the workshop)

Confirmed Keynote Speakers

Paper Track

We accept novel full 8-page papers for publication in the proceedings, as well as shorter 4-page extended abstracts or 8-page papers of previously published work that will not be included in the proceedings. Full papers should use the official LaTeX or Typst ICCV 2025 template. For extended abstracts, we recommend using the same template.

Paper Awards

Challenge

We host multiple challenges based on our TUM Traffic Datasets, e.g., the TUMTraf V2X Cooperative Perception Dataset (CVPR'24), which provides high-quality, real-world V2X perception data for cooperative 3D object detection and tracking in autonomous driving. The datasets are available here, and we provide a dataset development kit for working with them.
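As a rough illustration of how one might iterate over cooperative frames, the sketch below loads a fused point cloud and its OpenLABEL-style 3D box annotations with generic Python tooling. The directory layout, file names, and JSON keys are assumptions made for illustration only and do not reflect the official devkit API; please use the dataset development kit's loaders for actual challenge submissions.

```python
# Hypothetical sketch: reading one frame from a TUMTraf-V2X-style export.
# Paths and JSON keys below are assumptions, not the official devkit API.
import json
from pathlib import Path

import numpy as np
import open3d as o3d  # generic point-cloud reader, not part of the devkit


def load_frame(root: Path, frame_id: str):
    """Load LiDAR points and 3D box labels for one frame (assumed layout)."""
    # Assumed layout: <root>/point_clouds/<frame_id>.pcd and
    #                 <root>/labels/<frame_id>.json (OpenLABEL-style)
    pcd = o3d.io.read_point_cloud(str(root / "point_clouds" / f"{frame_id}.pcd"))
    points = np.asarray(pcd.points)  # (N, 3) xyz coordinates

    with open(root / "labels" / f"{frame_id}.json") as f:
        labels = json.load(f)

    # OpenLABEL nests objects under frames; the keys here are illustrative only.
    boxes = []
    for frame in labels.get("openlabel", {}).get("frames", {}).values():
        for obj in frame.get("objects", {}).values():
            cuboid = obj["object_data"]["cuboid"]["val"]  # assumed box parameters
            boxes.append(cuboid)

    return points, np.asarray(boxes)


if __name__ == "__main__":
    pts, boxes = load_frame(Path("tumtraf_v2x/train"), "000123")
    print(f"{pts.shape[0]} points, {boxes.shape[0]} labelled objects")
```

Such a loop is typically the starting point for feeding frames into a cooperative 3D detection or tracking pipeline; the devkit additionally handles coordinate transformations between vehicle and infrastructure sensors.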

Competition Timeline:
The best-performing teams will be invited to present their solutions during the workshop, and the winners will receive prizes and recognition for their contributions to the field.

Challenge Awards

Accepted Papers

Learning 3D Perception from Others' Predictions

Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao

PDF

Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning

Yue Li, Meng Tian, Dechang Zhu, Jiangtong Zhu, Zhenyu Lin, Zhiwei Xiong, Xinhai Zhao

PDF

Contextual-Personalized Adaptive Cruise Control via Fine-Tuned Large Language Models

Ziye Qin, Xue Yao, Chuheng Wei, Ang Ji, Guoyuan Wu, Zhanbo Sun

PDF

RG-Attn: Radian Glue Attention for Multi-modal Multi-agent Cooperative Perception

Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun

PDF

D3FNet: A Differential Attention Fusion Network for Fine-Grained Road Structure Extraction in Remote Perception Systems

Chang Liu, Yang Xu, Tamas Sziranyi

PDF

Understanding What Vision-Language Models See in Traffic: PixelSHAP for Object-Level Attribution in Autonomous Driving

Roni Goldshmidt

PDF

SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception

Melih Yazgan, Qiyuan Wu, Iramm Hamdard, Shiqi Li, J. Marius Zoellner

PDF

Robust Scenario Mining Assisted by Multimodal Semantics

Yifei Chen, Ross Greer

PDF

MAP: End-to-End Autonomous Driving with Map-Assisted Planning

Huilin Yin, Yiming Kan, Daniel Watzenig

PDF

The Role of Radar in End-to-End Autonomous Driving

Philipp Wolters, Johannes Gilg, Torben Teepe, Gerhard Rigoll

PDF

Improving Event-Phase Captions in Multi-View Urban Traffic Videos via Prompt-Aware LoRA Tuning of Vision Language Models

Ricardo Ornelas, Alton Chao, Shyam Gupta, Edmund Chao, Ross Greer

PDF

Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition

Ruiyang Hao, Haibao Yu, Jiaru Zhong, Chuanye Wang, Jiahao Wang, Yiming Kan, Wenxian Yang, Siqi Fan, Huilin Yin, Jianing Qiu, Yao Mu, Jiankai Sun, Li Chen, Walter Zimmer, Dandan Zhang, Shanghang Zhang, Mac Schwager, Ping Luo, Zaiqing Nie

PDF

MIC-BEV: Infrastructure-Based Multi-Camera Bird's-Eye-View Perception Transformer for 3D Object Detection

Yun Zhang, Zhaoliang Zheng, Johnson Liu, Zhiyu Huang, Zewei Zhou, Zonglin Meng, Tianhui Cai, Jiaqi Ma

PDF

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma

PDF

Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection

Jens Petersen

PDF

Multi-modal Large Language Model for Training-free Vision-based Driver State

Chuanfei Hu, Xinde Li, Hang Shao

PDF

HetroD: A High-Fidelity Drone Dataset and Benchmark for Heterogeneous Traffic in Autonomous Driving

Yu-Hsiang Chen, Wei-Jer Chang, Wei Zhan, Masayoshi Tomizuka, Yi-Ting Chen

PDF

V2X-based Logical Scenario Understanding with Vision-Language Models

Cheng-Liang Chi, Zi-Hui Li, Yu-Hsiang Chen, Yi-Ting Chen

PDF

Cross-camera Monocular 3D Detection for Autonomous Racing

Elena Govi, Davide Malvezzi, Davide Sapienza, Marko Bertogna

PDF

Organizers

Invited Program Committee

Sponsors

We are currently seeking further sponsorship opportunities and would be delighted to discuss potential collaborations. Interested parties are kindly requested to contact us via email at walter.zimmer@cs.tum.edu for further details.