DriveX 🚗 (2nd Edition)

Workshop on Foundation Models for

V2X-Based Cooperative Autonomous Driving

In conjunction with ICCV 2025, October 19 in Honolulu, Hawai'i, USA

Sunday 08:00 - 18:00 - Room: Ballroom C (Level 4)

Introduction

The 2nd edition of our DriveX Workshop explores the integration of foundation models and V2X-based cooperative systems to improve perception, planning, and decision-making in autonomous vehicles. While traditional single-vehicle systems have advanced tasks such as 3D object detection, emerging challenges such as holistic scene understanding and 3D occupancy prediction require more comprehensive solutions. Collaborative driving systems, which use V2X communication and roadside infrastructure, extend the sensory range beyond a single vehicle's field of view, provide early hazard warnings, and improve decision-making through shared data. At the same time, foundation models such as Vision-Language Models (VLMs) offer strong generalization, enabling zero-shot learning, open-vocabulary recognition, and scene explanation in novel scenarios. Recent advances in end-to-end systems and foundation models such as DriveLLM further enhance autonomous driving. The workshop brings together experts to explore these technologies, address open challenges, and advance road safety.
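To make the open-vocabulary idea concrete, the following is a minimal sketch of zero-shot recognition on a traffic scene using an off-the-shelf VLM (CLIP via the Hugging Face transformers library). The image path and label prompts are placeholders for illustration; the point is that rare categories can be queried at inference time simply by adding a text prompt, with no retraining.

```python
# Zero-shot, open-vocabulary recognition with an off-the-shelf VLM (CLIP).
# The image path and label set below are placeholders, not workshop assets.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("traffic_scene.jpg")  # placeholder image
prompts = [
    "a photo of a car",
    "a photo of a cyclist",
    "a photo of an overturned truck",            # rare, long-tail category
    "a photo of construction debris on the road",
]

# Score the image against every text prompt in one forward pass.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape: (1, len(prompts))
probs = logits.softmax(dim=-1).squeeze(0)

for prompt, p in zip(prompts, probs.tolist()):
    print(f"{p:.3f}  {prompt}")
```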

Topics

  • Foundation Models for Cooperative Autonomous Driving and Intelligent Transportation Systems
  • Vision-Language Models (VLMs) for Traffic Scene Understanding
  • Large Language Model (LLM)-assisted Cooperative Systems
  • Cooperative Perception & V2X communication for Autonomous Vehicles
  • Dataset Curation and Data Labeling for Autonomous Driving
  • Datasets and Benchmarks for Foundation Models and Cooperative Perception
  • 3D Occupancy Prediction, 3D Object Detection, 3D Semantic Segmentation, and 3D Scene Understanding
  • End-to-end Perception and Real-time Decision-Making Systems
  • Vehicle-to-Infrastructure (V2I) Interaction

Schedule

08:00 - 08:10 Opening Remarks (Welcome & Introduction): Dr. Walter Zimmer
08:10 - 08:30 Opening Keynote: Dr. Walter Zimmer
Research Highlights: Roadside 3D Perception for Autonomous Driving
08:30 - 08:50 Keynote 1: Prof. Dr. Angela Schoellig (TUM)
08:50 - 09:10 Keynote 2: Dr. Mingxing Tan (Waymo)
Waymo Research: VLMs for E2E Autonomous Driving
09:20 - 09:40 Keynote 3: Prof. Dr. Sharon Li (University of Wisconsin-Madison)
09:40 - 10:00 Keynote 4: Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
Foundational Models for V2X Intersection Safety
10:00 - 11:00 Poster Session I (ExHall II) & Coffee Break
11:00 - 11:20 Keynote 5: Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
A Multi-Agent Future of Mobility with Intelligent Vehicles and Infrastructure
11:20 - 12:00 Panel Discussion I
12:00 - 13:00 Lunch Break (Exhibit Hall II / Overflow Seating Rooftop)
13:00 - 13:20 Keynote 6: Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
Robust Autonomy Emerges from Self-Play
13:20 - 13:40 Keynote 7: Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
Dynamic 3D Scene Understanding for Autonomous Vehicles
13:40 - 14:00 Keynote 8: Prof. Dr. Marc Pollefeys (ETH Zurich)
14:00 - 14:20 Keynote 9: Prof. Dr. Jiachen Li (UC Riverside)
14:20 - 14:30 Oral Paper Presentation 1: Jinsu Yoo
Learning 3D Perception from Others' Predictions
14:30 - 14:40 Oral Paper Presentation 2: Lantao Li
RG-Attn: Radian Glue Attention for Multi-modal Multi-agent Cooperative Perception
14:40 - 14:50 Oral Paper Presentation 3: Ruiyang Hao
Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition
14:50 - 15:00 Oral Paper Presentation 4: Yun Zhang
MIC-BEV: Infrastructure-Based Multi-Camera Bird's-Eye-View Perception Transformer for 3D Object Detection
15:00 - 16:00 Poster Session II (ExHall II) & Coffee Break
16:00 - 16:20 Keynote 10: Prof. Dr. Angela Dai (Technical University of Munich, TUM)
From Quantity to Quality for 3D Perception
16:20 - 16:40 Keynote 11: Dr. Boris Ivanovic (NVIDIA) & Dr. Yan Wang (NVIDIA)
16:40 - 17:20 Panel Discussion II
17:20 - 17:30 Oral Paper Presentation 5: Zewei Zhou
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
17:30 - 17:40 Best Paper Awards Ceremony
17:40 - 17:50 Competition & Challenge Awards Ceremony
17:50 - 18:00 Closing & Group Picture with all organizers and speakers
19:00 - 21:00 Workshop Reception: Social Mixer, Networking, Dinner (Location will be announced during the workshop)

Confirmed Keynote Speakers

  • Prof. Dr. Angela Schoellig (Technical University of Munich, TUM)
  • Dr. Mingxing Tan (Waymo)
  • Prof. Dr. Sharon Li (University of Wisconsin-Madison)
  • Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
  • Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
  • Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
  • Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
  • Prof. Dr. Marc Pollefeys (ETH Zurich)
  • Prof. Dr. Jiachen Li (UC Riverside)
  • Prof. Dr. Angela Dai (Technical University of Munich, TUM)
  • Dr. Boris Ivanovic (NVIDIA)
  • Dr. Yan Wang (NVIDIA)
Paper Track

We accept novel full 8-page papers, which will be published in the proceedings, as well as shorter 4-page extended abstracts or 8-page papers of previously published work, which will not be included in the proceedings. Full papers should use the official ICCV 2025 LaTeX or Typst template; we recommend using the same template for extended abstracts.

Paper Awards

Challenge

We host multiple challenges based on our TUM Traffic Datasets, e.g. the TUMTraf V2X Cooperative Perception Dataset (CVPR'24), which provides high-quality, real-world V2X perception data for cooperative 3D object detection and tracking in autonomous driving. The datasets are available here. We also provide a dataset development kit for working with the data.
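For orientation, here is a minimal sketch of the kind of data a cooperative 3D detection pipeline consumes: a vehicle point cloud, an infrastructure point cloud, the extrinsic transform between the two sensor frames, and 3D box labels. All directory names, file formats, and JSON fields below are illustrative assumptions, not the dev kit's actual API; please refer to the official development kit for the real interfaces.

```python
# Sketch of loading and fusing one cooperative frame from a TUMTraf-style
# V2X dataset. File layout and label schema are assumptions for illustration.
import json
from pathlib import Path

import numpy as np

def load_point_cloud(bin_path: Path) -> np.ndarray:
    """Read an (N, 4) float32 point cloud: x, y, z, intensity."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_boxes(label_path: Path) -> list:
    """Parse 3D boxes from a JSON label file (illustrative schema)."""
    with open(label_path) as f:
        labels = json.load(f)
    return [
        {
            "category": obj["category"],
            "center": np.array(obj["center"]),  # (x, y, z) in meters
            "size": np.array(obj["size"]),      # (l, w, h) in meters
            "yaw": obj["yaw"],                  # heading in radians
        }
        for obj in labels["objects"]
    ]

def fuse_clouds(vehicle_pts, infra_pts, infra_to_vehicle):
    """Transform infrastructure points into the vehicle frame, then stack."""
    ones = np.ones((len(infra_pts), 1), dtype=np.float32)
    xyz1 = np.hstack([infra_pts[:, :3], ones])           # homogeneous coords
    infra_xyz = (infra_to_vehicle @ xyz1.T).T[:, :3]     # apply 4x4 extrinsics
    infra_in_vehicle = np.hstack([infra_xyz, infra_pts[:, 3:4]])
    return np.vstack([vehicle_pts, infra_in_vehicle])

root = Path("tumtraf_v2x/frame_0001")  # hypothetical frame directory
vehicle = load_point_cloud(root / "vehicle_lidar.bin")
infra = load_point_cloud(root / "infrastructure_lidar.bin")
T = np.loadtxt(root / "infra_to_vehicle.txt").reshape(4, 4)
merged = fuse_clouds(vehicle, infra, T)
boxes = load_boxes(root / "labels.json")
print(f"{len(merged)} fused points, {len(boxes)} labeled objects")
```

This early-fusion step (merging raw points in a common frame) is only one baseline strategy; challenge entries may equally fuse intermediate features or detections.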

The best-performing teams will be invited to present their solutions during the workshop, and the winners will receive prizes and recognition for their contributions to the field.

Challenge Awards

Organizers

Invited Program Committee

Sponsors

We are currently seeking further sponsorship opportunities and would be delighted to discuss potential collaborations. Interested parties are kindly requested to contact us by email at walter.zimmer@cs.tum.edu for further details.