DriveX 🚗 (2nd Edition)

Workshop on Foundation Models for

V2X-Based Cooperative Autonomous Driving

In conjunction with ICCV 2025, October 19 in Honolulu, Hawai'i, USA

Introduction

The 2nd Edition of our DriveX Workshop explores the integration of foundation models and V2X-based cooperative systems to improve perception, planning, and decision-making in autonomous vehicles. While traditional single-vehicle systems have advanced tasks such as 3D object detection, emerging challenges such as holistic scene understanding and 3D occupancy prediction require more comprehensive solutions. Collaborative driving systems, which utilize V2X communication and roadside infrastructure, extend sensory range, provide hazard warnings, and improve decision-making through shared data. At the same time, foundation models such as Vision-Language Models (VLMs) offer strong generalization, enabling zero-shot learning, open-vocabulary recognition, and scene explanation in novel scenarios. Recent advances in end-to-end driving and foundation models such as DriveLLM further strengthen autonomous systems. The workshop aims to bring together experts to explore these technologies, address open challenges, and advance road safety.

Topics

  • Foundation Models for Cooperative Autonomous Driving and Intelligent Transportation Systems
  • Vision-Language Models (VLMs) for Traffic Scene Understanding
  • Large Language Model (LLM)-assisted Cooperative Systems
  • Cooperative Perception & V2X communication for Autonomous Vehicles
  • Dataset Curation and Data Labeling for Autonomous Driving
  • Datasets and Benchmarks for Foundation Models and Cooperative Perception
  • 3D Occupancy Prediction, 3D Object Detection, 3D Semantic Segmentation, and 3D Scene Understanding
  • End-to-end Perception and Real-time Decision-Making Systems
  • Vehicle-to-Infrastructure (V2I) Interaction

Schedule

08:30 - 08:50 Opening Remarks (Welcome & Introduction)
08:50 - 09:20 Keynote 1: Dr. Mingxing Tan (Waymo)
09:20 - 09:50 Keynote 2: Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
09:50 - 10:20 Keynote 3: Prof. Dr. Sharon Li (University of Wisconsin-Madison)
10:20 - 10:30 Coffee Break
10:30 - 11:00 Keynote 4: Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
11:00 - 11:30 Keynote 5: Prof. Dr. Trevor Darrell (University of California Berkeley, UCB)
11:30 - 12:00 Panel Discussion I
12:00 - 13:00 Lunch Break
13:00 - 13:30 Keynote 6: Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
13:30 - 14:00 Keynote 7: Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
14:00 - 14:30 Keynote 8: Prof. Dr. Marc Pollefeys (ETH Zurich)
14:30 - 15:00 Keynote 9: Prof. Dr. Angela Dai (Technical University of Munich, TUM)
15:00 - 15:10 Coffee Break
15:10 - 15:40 Keynote 10: Prof. Dr. Alina Roitberg (University of Stuttgart)
15:40 - 16:10 Panel Discussion II
16:10 - 16:20 Oral Paper Presentation 1
16:20 - 16:30 Oral Paper Presentation 2
16:30 - 16:40 Oral Paper Presentation 3
16:40 - 16:50 Oral Paper Presentation 4
16:50 - 17:00 Oral Paper Presentation 5
17:00 - 17:10 Best Paper Presentation & Best Paper Award
17:10 - 17:20 Competition Presentation & Competition Winner Award
17:20 - 17:30 Group Picture (with all organizers and speakers)
17:30 - 18:00 Poster Session
18:00 - 20:00 Social Mixer, Networking, Dinner (Location will be announced during the workshop)

Confirmed Keynote Speakers

Paper Track

We accept novel full 8-page papers for publication in the proceedings, as well as shorter 4-page extended abstracts or 8-page papers of previously published work that will not be included in the proceedings. Full papers should use the official LaTeX or Typst ICCV 2025 template. For extended abstracts, we recommend using the same template.

Paper Awards

Challenge

We host multiple challenges based on our TUM Traffic Datasets, e.g., the TUMTraf V2X Cooperative Perception Dataset (CVPR'24), which provides high-quality, real-world V2X perception data for cooperative 3D object detection and tracking in autonomous driving. The datasets are available here, and we provide a dataset development kit for working with them.
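As a rough illustration of how one might iterate over cooperative frames, the sketch below loads a fused point cloud and its OpenLABEL-style 3D box annotations with generic Python tooling. The directory layout, file names, and JSON keys are assumptions made for illustration only and do not reflect the official devkit API; please use the dataset development kit's loaders for actual challenge submissions.

```python
# Hypothetical sketch: reading one frame from a TUMTraf-V2X-style export.
# Paths and JSON keys below are assumptions, not the official devkit API.
import json
from pathlib import Path

import numpy as np
import open3d as o3d  # generic point-cloud reader, not part of the devkit


def load_frame(root: Path, frame_id: str):
    """Load LiDAR points and 3D box labels for one frame (assumed layout)."""
    # Assumed layout: <root>/point_clouds/<frame_id>.pcd and
    #                 <root>/labels/<frame_id>.json (OpenLABEL-style)
    pcd = o3d.io.read_point_cloud(str(root / "point_clouds" / f"{frame_id}.pcd"))
    points = np.asarray(pcd.points)  # (N, 3) xyz coordinates

    with open(root / "labels" / f"{frame_id}.json") as f:
        labels = json.load(f)

    # OpenLABEL nests objects under frames; the keys here are illustrative only.
    boxes = []
    for frame in labels.get("openlabel", {}).get("frames", {}).values():
        for obj in frame.get("objects", {}).values():
            cuboid = obj["object_data"]["cuboid"]["val"]  # assumed box parameters
            boxes.append(cuboid)

    return points, np.asarray(boxes)


if __name__ == "__main__":
    pts, boxes = load_frame(Path("tumtraf_v2x/train"), "000123")
    print(f"{pts.shape[0]} points, {boxes.shape[0]} labelled objects")
```

Such a loop is typically the starting point for feeding frames into a cooperative 3D detection or tracking pipeline; the devkit additionally handles coordinate transformations between vehicle and infrastructure sensors.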

Competition Timeline:
The best-performing teams will be invited to present their solutions during the workshop, and the winners will receive prizes and recognition for their contributions to the field.

Challenge Awards

Accepted Papers

Learning 3D Perception from Others' Predictions

Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao

PDF

Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning

Yue Li, Meng Tian, Dechang Zhu, Jiangtong Zhu, Zhenyu Lin, Zhiwei Xiong, Xinhai Zhao

PDF

Contextual-Personalized Adaptive Cruise Control via Fine-Tuned Large Language Models

Ziye Qin, Xue Yao, Chuheng Wei, Ang Ji, Guoyuan Wu, Zhanbo Sun

PDF

RG-Attn: Radian Glue Attention for Multi-modal Multi-agent Cooperative Perception

Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun

PDF

D3FNet: A Differential Attention Fusion Network for Fine-Grained Road Structure Extraction in Remote Perception Systems

Chang Liu, Yang Xu, Tamas Sziranyi

PDF

Understanding What Vision-Language Models See in Traffic: PixelSHAP for Object-Level Attribution in Autonomous Driving

Roni Goldshmidt

PDF

SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception

Melih Yazgan, Qiyuan Wu, Iramm Hamdard, Shiqi Li, J. Marius Zoellner

PDF

Robust Scenario Mining Assisted by Multimodal Semantics

Yifei Chen, Ross Greer

PDF

MAP: End-to-End Autonomous Driving with Map-Assisted Planning

Huilin Yin, Yiming Kan, Daniel Watzenig

PDF

The Role of Radar in End-to-End Autonomous Driving

Philipp Wolters, Johannes Gilg, Torben Teepe, Gerhard Rigoll

PDF

Improving Event-Phase Captions in Multi-View Urban Traffic Videos via Prompt-Aware LoRA Tuning of Vision Language Models

Ricardo Ornelas, Alton Chao, Shyam Gupta, Edmund Chao, Ross Greer

PDF

Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition

Ruiyang Hao, Haibao Yu, Jiaru Zhong, Chuanye Wang, Jiahao Wang, Yiming Kan, Wenxian Yang, Siqi Fan, Huilin Yin, Jianing Qiu, Yao Mu, Jiankai Sun, Li Chen, Walter Zimmer, Dandan Zhang, Shanghang Zhang, Mac Schwager, Ping Luo, Zaiqing Nie

PDF

MIC-BEV: Infrastructure-Based Multi-Camera Bird's-Eye-View Perception Transformer for 3D Object Detection

Yun Zhang, Zhaoliang Zheng, Johnson Liu, Zhiyu Huang, Zewei Zhou, Zonglin Meng, Tianhui Cai, Jiaqi Ma

PDF

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma

PDF

Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection

Jens Petersen

PDF

Multi-modal Large Language Model for Training-free Vision-based Driver State

Chuanfei Hu, Xinde Li, Hang Shao

PDF

HetroD: A High-Fidelity Drone Dataset and Benchmark for Heterogeneous Traffic in Autonomous Driving

Yu-Hsiang Chen, Wei-Jer Chang, Wei Zhan, Masayoshi Tomizuka, Yi-Ting Chen

PDF

V2X-based Logical Scenario Understanding with Vision-Language Models

Cheng-Liang Chi, Zi-Hui Li, Yu-Hsiang Chen, Yi-Ting Chen

PDF

Cross-camera Monocular 3D Detection for Autonomous Racing

Elena Govi, Davide Malvezzi, Davide Sapienza, Marko Bertogna

PDF

Organizers

Invited Program Committee

Sponsors

We are currently seeking further sponsorship opportunities and would be delighted to discuss potential collaborations. Interested parties are kindly requested to contact us via email at walter.zimmer@cs.tum.edu for further details.