DriveX 🚗

Workshop on Foundation Models for

V2X-Based Cooperative Autonomous Driving

In conjunction with CVPR 2025, June 12 in Nashville, USA

https://us06web.zoom.us/j/89345633672?pwd=oncpaCSbRoiDZpYuAdfoeUylt1RWWs.1

Thursday 08:00 - 18:00 - Room: TBA

Introduction

The DriveX Workshop explores the integration of foundation models and V2X-based cooperative systems to improve perception, planning, and decision-making in autonomous vehicles. While traditional single-vehicle systems have advanced tasks such as 3D object detection, emerging challenges such as holistic scene understanding and 3D occupancy prediction require more comprehensive solutions. Collaborative driving systems, which use V2X communication and roadside infrastructure, extend the sensory range, provide hazard warnings, and improve decision-making through shared data. At the same time, foundation models such as Vision-Language Models (VLMs) offer strong generalization, enabling zero-shot learning, open-vocabulary recognition, and scene explanation in novel scenarios. Recent advances in end-to-end systems and driving-specific foundation models such as DriveLLM further strengthen autonomous systems. The workshop brings together experts to explore these technologies, address open challenges, and advance road safety.

Topics

  • Foundation Models for Cooperative Autonomous Driving and Intelligent Transportation Systems
  • Vision-Language Models (VLMs) for Traffic Scene Understanding
  • Large Language Model (LLM)-assisted Cooperative Systems
  • Communication-Efficient Cooperative Perception for Autonomous Vehicles
  • Efficient and Intelligent Vehicle-to-everything (V2X) Communication
  • Dataset Curation and Data Labeling for Autonomous Driving
  • Datasets and Benchmarks for Foundation Models and Cooperative Perception
  • 3D Object Detection and Semantic Segmentation of Vulnerable Road Users (VRUs)
  • 3D Occupancy Prediction and Scene Understanding
  • End-to-end Perception and Real-time Decision-Making Systems
  • Vehicle-to-Infrastructure (V2I) Interaction
  • Safety and Standards in Autonomous Driving

Schedule

08:00 - 08:10 Opening Remarks & Welcome
08:10 - 08:30 Intro: Walter Zimmer (Roadside 3D Perception for Autonomous Driving), TUM, Germany
08:30 - 08:50 Keynote 1: (remote) Maria Lyssenko (Technical Uni. of Munich, TUM & Bosch, Germany)
08:50 - 09:10 Keynote 2: Prof. Dr. Jiaqi Ma (Uni. of California, Los Angeles, UCLA, USA)
09:10 - 09:30 Keynote 3: Dr. Mingxing Tan (Waymo, USA)
09:30 - 09:50 Keynote 4: Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
09:50 - 10:00 Coffee Break
10:00 - 10:20 Keynote 5: (remote) Prof. Dr. Cyrill Stachniss (University of Bonn, Germany)
10:20 - 10:40 Keynote 6: Prof. Dr. Trevor Darrell (University of California Berkeley, USA)
10:40 - 11:00 Keynote 7: Prof. Dr. Marco Pavone (NVIDIA & Stanford University, USA)
11:00 - 11:20 Keynote 8: Prof. Dr. Laura Leal-Taixé (NVIDIA & Technical University of Munich, Germany)
11:20 - 11:40 Keynote 9: Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD, USA)
11:40 - 12:00 Panel Discussion I
12:00 - 13:00 Lunch Break
13:00 - 13:20 Keynote 10: Dr. Katie Luo (Stanford University, USA)
13:20 - 13:40 Keynote 11: Prof. Dr. Chen Feng (New York University, USA)
13:40 - 14:00 Keynote 12: Prof. Dr. Wolfram Burgard (Uni. of Technology Nuremberg, UTN, Germany)
14:00 - 14:20 Keynote 13: Dr. Gianluca Corrado (Wayve, UK)
14:20 - 14:40 Keynote 14: Prof. Dr. Daniel Cremers (Technical Uni. of Munich, TUM, Germany)
14:40 - 15:00 Keynote 15: Prof. Dr. Angela Dai (Technical Uni. of Munich, TUM, Germany)
15:00 - 15:40 Panel Discussion II
15:40 - 15:50 Coffee Break
15:50 - 16:00 Oral Paper Presentation 1
16:00 - 16:10 Oral Paper Presentation 2
16:10 - 16:20 Oral Paper Presentation 3
16:20 - 16:30 Oral Paper Presentation 4
16:30 - 16:40 Oral Paper Presentation 5
16:40 - 16:50 Best Paper Awards (Announcement & Handover of Prizes) + Group Photo
16:50 - 17:10 Keynote 16: (remote) Prof. Dr. Manabu Tsukada (University of Tokyo, Japan)
17:10 - 17:20 Competition Winner Presentation
17:20 - 17:30 Competition Winner Awards (Announcement & Handover of Prizes) + Group Photo
17:30 Announcement of Evening Social Event (Location & Time), Group Picture of all Organizers and Speakers
17:30 - 18:00 Poster Session & Networking
19:00 - 21:00 Social Mixer, Networking, Dinner with Workshop Organizers, Speakers and Invited Guests (Location will be announced during the workshop)

Keynote Speakers

Paper Track

We accept novel papers (8 pages excl. references) for publication in the proceedings, as well as shorter extended abstracts (4 pages excl. references). Full papers should use the official LaTeX or Typst CVPR 2025 template; for extended abstracts we recommend the same template. Submission Portal: OpenReview
Submission Opens: February 26, 2025
Paper Abstract Submission Deadline: March 26, 2025 (23:59 GMT; extended from March 25, 2025, 23:59 PST)
Paper Submission Deadline: March 26, 2025 (23:59 GMT; extended from March 25, 2025, 23:59 PST)
Notification to Authors: March 31, 2025
Camera-ready Submission: April 7, 2025 (23:59 GMT)

Paper Awards

Accepted Papers

CooPre: Cooperative Pretraining for V2X Cooperative Perception

Seth Z. Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, Jiaqi Ma

PDF

V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F Smith, Yu-Chiang Frank Wang, Min-Hung Chen

PDF

V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving

Esteban Rivera, Jannik Lübberstedt, Nico Uhlemann, Markus Lienkamp

PDF

Can Vision-Language Models Understand and Interpret Dynamic Gestures from Pedestrians? Pilot Datasets and Exploration Towards Instructive Nonverbal Commands for Cooperative Autonomous Vehicles

Tonko Emil Westerhof Bossen, Andreas Møgelmose, Ross Greer

PDF

Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation

Johannes Spoecklberger, Pedro Hermosilla, Wei Lin, Sivan Doveh, Horst Possegger, Muhammad Jehanzeb Mirza

PDF

VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction

Junsu Kim, Junhee Lee, Ukcheol Shin, Jean Oh, Kyungdon Joo

PDF

Investigating Vision-Language Model for Point Cloud-based Vehicle Classification

Yiqiao Li, Jie Wei, Camille Kamga

PDF

Challenge

We host a 10-week challenge based on the TUMTraf V2X Cooperative Perception Dataset (CVPR'24), which provides high-quality, real-world V2X perception data for the cooperative 3D object detection and tracking task in autonomous driving. The dataset is available here, and we provide a dataset development kit for working with it.
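
For orientation, detections and labels in the dataset are stored as ASAM OpenLABEL (.json) files. Below is a minimal sketch of inspecting one such file with the Python standard library; the file name and the key layout are assumptions based on the OpenLABEL convention, so verify them against the dataset schema and the dev-kit.

```python
import json

# Minimal OpenLABEL inspection sketch (standard library only).
# Assumed layout: {"openlabel": {"frames": {frame_id: {"objects": {...}}}}}
# -- check this against the TUMTraf V2X schema before relying on it.
with open("frame_0001.json") as f:  # hypothetical label file name
    data = json.load(f)

for frame_id, frame in data["openlabel"]["frames"].items():
    objects = frame.get("objects", {})
    print(f"frame {frame_id}: {len(objects)} labeled objects")
```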

Competition Timeline:
Dataset Final Release: December 31, 2024
Competition Announcement: January 8, 2025
Competition Starts (Leaderboard opens on EvalAI): March 17, 2025
Competition Ends (Leaderboard closes on EvalAI): May 26, 2025 (23:59 GMT)
Abstract Submission Deadline (via email): May 26, 2025 (23:59 GMT)
Challenge Report (4-8 pages) Submission Deadline (via email): June 1, 2025 (23:59 GMT)
Notification to Participants (1st, 2nd, 3rd Prize Winner Announcement): June 3, 2025
Video Presentation (10 min) Submission (via email): June 10, 2025 (23:59 GMT)

Evaluation Metrics:
Challenge Submission:
CoopDet3D generates detection (inference) files in OpenLABEL (.json) format (see example). If you are developing your own model, please output the detection results in the OpenLABEL format following this schema. After you have generated the detection files for the test set (100 frames), combine them into a single OpenLABEL (.json) file using this script. Test the evaluation locally on your machine with our evaluation script in the dev-kit before submitting to the leaderboard. Finally, submit the accumulated OpenLABEL prediction file (submission.json) to the EvalAI leaderboard platform. The best-performing teams will be invited to present their solutions during the workshop, and the winners will receive prizes and recognition for their contributions to the field.
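
The combining step itself is simple enough to sketch. The snippet below merges per-frame OpenLABEL prediction files into a single submission.json, assuming each file follows the {"openlabel": {"frames": {...}}} layout; the folder name and key paths are illustrative assumptions, and the dev-kit's own script remains authoritative.

```python
import json
from pathlib import Path

# Sketch: merge per-frame OpenLABEL prediction files into one submission.json.
# Assumes each file looks like {"openlabel": {"frames": {frame_id: {...}}}}.
merged = {"openlabel": {"frames": {}}}

for path in sorted(Path("predictions").glob("*.json")):  # hypothetical folder
    with open(path) as f:
        frames = json.load(f)["openlabel"]["frames"]
    # Frame IDs are assumed unique across files; a real script should
    # detect clashes instead of silently overwriting duplicates.
    merged["openlabel"]["frames"].update(frames)

with open("submission.json", "w") as f:
    json.dump(merged, f)

print(f"wrote {len(merged['openlabel']['frames'])} frames to submission.json")
```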

Challenge Awards

Organizers

Invited Program Committee

Sponsors

We sincerely thank Qualcomm Inc., Virtual Vehicle Research GmbH, Abaka AI and Autodriving-Heart for their generous sponsorship of our workshop. Their support enables us to recognize outstanding research through prestigious awards.
We are currently seeking further sponsorship opportunities and would be delighted to discuss potential collaborations. Interested parties are kindly requested to contact us via email at walter.zimmer@cs.tum.edu for further details.