DriveX 🚗 (2nd Edition)

Workshop on Foundation Models for

V2X-Based Cooperative Autonomous Driving

In conjunction with ICCV 2025, October 19 in Honolulu, Hawai'i, USA

Sunday 08:00 - 18:00 - Room: Ballroom C (Level 4)

Introduction

The 2nd edition of our DriveX Workshop explores the integration of foundation models and V2X-based cooperative systems to improve perception, planning, and decision-making in autonomous vehicles. While traditional single-vehicle systems have advanced tasks such as 3D object detection, emerging challenges such as holistic scene understanding and 3D occupancy prediction require more comprehensive solutions. Collaborative driving systems, which use V2X communication and roadside infrastructure, extend the sensory range beyond a single vehicle's field of view, provide early hazard warnings, and improve decision-making through shared data. At the same time, foundation models such as Vision-Language Models (VLMs) offer strong generalization, enabling zero-shot learning, open-vocabulary recognition, and scene explanation in novel scenarios. Recent advances in end-to-end systems and foundation models such as DriveLLM further enhance autonomous driving. The workshop brings together experts to explore these technologies, address open challenges, and advance road safety.
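To make the open-vocabulary idea concrete, the following is a minimal sketch of zero-shot recognition on a traffic scene using an off-the-shelf VLM (CLIP via the Hugging Face transformers library). The image path and label prompts are placeholders for illustration; the point is that rare categories can be queried at inference time simply by adding a text prompt, with no retraining.

```python
# Zero-shot, open-vocabulary recognition with an off-the-shelf VLM (CLIP).
# The image path and label set below are placeholders, not workshop assets.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("traffic_scene.jpg")  # placeholder image
prompts = [
    "a photo of a car",
    "a photo of a cyclist",
    "a photo of an overturned truck",            # rare, long-tail category
    "a photo of construction debris on the road",
]

# Score the image against every text prompt in one forward pass.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape: (1, len(prompts))
probs = logits.softmax(dim=-1).squeeze(0)

for prompt, p in zip(prompts, probs.tolist()):
    print(f"{p:.3f}  {prompt}")
```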

Topics

  • Foundation Models for Cooperative Autonomous Driving and Intelligent Transportation Systems
  • Vision-Language Models (VLMs) for Traffic Scene Understanding
  • Large Language Model (LLM)-assisted Cooperative Systems
  • Cooperative Perception & V2X communication for Autonomous Vehicles
  • Dataset Curation and Data Labeling for Autonomous Driving
  • Datasets and Benchmarks for Foundation Models and Cooperative Perception
  • 3D Occupancy Prediction, 3D Object Detection, 3D Semantic Segmentation, and 3D Scene Understanding
  • End-to-end Perception and Real-time Decision-Making Systems
  • Vehicle-to-Infrastructure (V2I) Interaction

Schedule

08:00 - 08:10 Opening Remarks (Welcome & Introduction): Dr. Walter Zimmer
08:10 - 08:30 Opening Keynote: Dr. Walter Zimmer
Research Highlights: Roadside 3D Perception for Autonomous Driving
08:30 - 08:50 Keynote 1: Prof. Dr. Angela Schoellig (TUM)
08:50 - 09:10 Keynote 2: Dr. Mingxing Tan (Waymo)
Waymo Research: VLMs for E2E Autonomous Driving
09:20 - 09:40 Keynote 3: Prof. Dr. Sharon Li (University of Wisconsin-Madison)
09:40 - 10:00 Keynote 4: Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
Foundational Models for V2X Intersection Safety
10:00 - 11:00 Poster Session I (ExHall II) & Coffee Break
11:00 - 11:20 Keynote 5: Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
A Multi-Agent Future of Mobility with Intelligent Vehicles and Infrastructure
11:20 - 12:00 Panel Discussion I
12:00 - 13:00 Lunch Break (Exhibit Hall II / Overflow Seating Rooftop)
13:00 - 13:20 Keynote 6: Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
Robust Autonomy Emerges from Self-Play
13:20 - 13:40 Keynote 7: Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
Dynamic 3D Scene Understanding for Autonomous Vehicles
13:40 - 14:00 Keynote 8: Prof. Dr. Marc Pollefeys (ETH Zurich)
14:00 - 14:20 Keynote 9: Prof. Dr. Jiachen Li (UC Riverside)
14:20 - 14:30 Oral Paper Presentation 1: Jinsu Yoo
Learning 3D Perception from Others' Predictions
14:30 - 14:40 Oral Paper Presentation 2: Lantao Li
RG-Attn: Radian Glue Attention for Multi-modal Multi-agent Cooperative Perception
14:40 - 14:50 Oral Paper Presentation 3: Ruiyang Hao
Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition
14:50 - 15:00 Oral Paper Presentation 4: Yun Zhang
MIC-BEV: Infrastructure-Based Multi-Camera Bird's-Eye-View Perception Transformer for 3D Object Detection
15:00 - 16:00 Poster Session II (ExHall II) & Coffee Break
16:00 - 16:20 Keynote 10: Prof. Dr. Angela Dai (Technical University of Munich, TUM)
From Quantity to Quality for 3D Perception
16:20 - 16:40 Keynote 11: Dr. Boris Ivanovic (NVIDIA) & Dr. Yan Wang (NVIDIA)
16:40 - 17:20 Panel Discussion II
17:20 - 17:30 Oral Paper Presentation 5: Zewei Zhou
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
17:30 - 17:40 Best Paper Awards Ceremony
17:40 - 17:50 Competition & Challenge Awards Ceremony
17:50 - 18:00 Closing & Group Picture with all organizers and speakers
19:00 - 21:00 Workshop Reception: Social Mixer, Networking, Dinner (Location will be announced during the workshop)

Confirmed Keynote Speakers

  • Prof. Dr. Angela Schoellig (Technical University of Munich, TUM)
  • Dr. Mingxing Tan (Waymo)
  • Prof. Dr. Sharon Li (University of Wisconsin-Madison)
  • Prof. Dr. Manmohan Chandraker (University of California San Diego, UCSD)
  • Prof. Dr. Jiaqi Ma (University of California, Los Angeles, UCLA)
  • Prof. Dr. Philipp Krähenbühl (University of Texas at Austin)
  • Prof. Dr. Daniel Cremers (Technical University of Munich, TUM)
  • Prof. Dr. Marc Pollefeys (ETH Zurich)
  • Prof. Dr. Jiachen Li (UC Riverside)
  • Prof. Dr. Angela Dai (Technical University of Munich, TUM)
  • Dr. Boris Ivanovic (NVIDIA)
  • Dr. Yan Wang (NVIDIA)
Paper Track

We accept novel full 8-page papers, which will be published in the proceedings, as well as shorter 4-page extended abstracts or 8-page papers of previously published work, which will not be included in the proceedings. Full papers should use the official ICCV 2025 LaTeX or Typst template; we recommend using the same template for extended abstracts.

Paper Awards

Challenge

We host multiple challenges based on our TUM Traffic Datasets, e.g. the TUMTraf V2X Cooperative Perception Dataset (CVPR'24), which provides high-quality, real-world V2X perception data for cooperative 3D object detection and tracking in autonomous driving. The datasets are available here. We also provide a dataset development kit for working with the data.
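For orientation, here is a minimal sketch of the kind of data a cooperative 3D detection pipeline consumes: a vehicle point cloud, an infrastructure point cloud, the extrinsic transform between the two sensor frames, and 3D box labels. All directory names, file formats, and JSON fields below are illustrative assumptions, not the dev kit's actual API; please refer to the official development kit for the real interfaces.

```python
# Sketch of loading and fusing one cooperative frame from a TUMTraf-style
# V2X dataset. File layout and label schema are assumptions for illustration.
import json
from pathlib import Path

import numpy as np

def load_point_cloud(bin_path: Path) -> np.ndarray:
    """Read an (N, 4) float32 point cloud: x, y, z, intensity."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_boxes(label_path: Path) -> list:
    """Parse 3D boxes from a JSON label file (illustrative schema)."""
    with open(label_path) as f:
        labels = json.load(f)
    return [
        {
            "category": obj["category"],
            "center": np.array(obj["center"]),  # (x, y, z) in meters
            "size": np.array(obj["size"]),      # (l, w, h) in meters
            "yaw": obj["yaw"],                  # heading in radians
        }
        for obj in labels["objects"]
    ]

def fuse_clouds(vehicle_pts, infra_pts, infra_to_vehicle):
    """Transform infrastructure points into the vehicle frame, then stack."""
    ones = np.ones((len(infra_pts), 1), dtype=np.float32)
    xyz1 = np.hstack([infra_pts[:, :3], ones])           # homogeneous coords
    infra_xyz = (infra_to_vehicle @ xyz1.T).T[:, :3]     # apply 4x4 extrinsics
    infra_in_vehicle = np.hstack([infra_xyz, infra_pts[:, 3:4]])
    return np.vstack([vehicle_pts, infra_in_vehicle])

root = Path("tumtraf_v2x/frame_0001")  # hypothetical frame directory
vehicle = load_point_cloud(root / "vehicle_lidar.bin")
infra = load_point_cloud(root / "infrastructure_lidar.bin")
T = np.loadtxt(root / "infra_to_vehicle.txt").reshape(4, 4)
merged = fuse_clouds(vehicle, infra, T)
boxes = load_boxes(root / "labels.json")
print(f"{len(merged)} fused points, {len(boxes)} labeled objects")
```

This early-fusion step (merging raw points in a common frame) is only one baseline strategy; challenge entries may equally fuse intermediate features or detections.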

The best-performing teams will be invited to present their solutions during the workshop, and the winners will receive prizes and recognition for their contributions to the field.

Challenge Awards

Organizers

Invited Program Committee

Sponsors

We are currently seeking further sponsorship opportunities and would be delighted to discuss potential collaborations. Interested parties are kindly requested to contact us by email at walter.zimmer@cs.tum.edu for further details.