DriveX - Foundation Models for V2X-based Cooperative Autonomous Driving 🚗

Introduction

The 3rd edition of this full-day workshop explores advances in Foundation Models, Cooperative Autonomous Driving, and 3D Perception. Topics include V2X communication, cooperative perception, 3D object detection, semantic segmentation, sensor fusion, and Vision-Language Models (VLMs).

V2X communication enables vehicles and roadside infrastructure to share real-time information, improving situational awareness, extending sensory range, and providing early hazard warnings. Cooperative perception addresses the limitations of single-vehicle perception by fusing sensor data from multiple sources, which is crucial for understanding complex traffic scenarios.

Foundation models, such as VLMs, have demonstrated strong generalization abilities by leveraging large-scale, cross-domain data. These models enable zero-shot learning, open-vocabulary object recognition, and scene interpretation, allowing autonomous vehicles to better handle unseen objects and novel traffic situations. The workshop also explores how Large Language Models (LLMs) can improve perception accuracy, dataset curation, and novelty detection.

By uniting experts across the perception, V2X, and foundation model domains, this workshop aims to foster innovation in cooperative autonomous driving and intelligent transportation systems. Previous workshop editions are available here: https://drivex-workshop.github.io

Topics

3D Environment Perception

3D Scene Understanding
3D Occupancy Prediction
3D Instance Segmentation
3D Detection and Tracking

Cooperative Perception

V2X Communication
Vehicle-Infrastructure Fusion
Roadside & ITS Sensors (RSUs)
Multi-modal Sensor Data Fusion

Foundation Models (FMs)

LLM-assisted Perception & Prediction
Vision-Language Models (VLMs)
FMs for Dataset Curation & Labeling
FMs for Accident & Novelty Detection

Schedule

08:00 - 08:10	Introduction
08:10 - 08:30	Research Highlights — Dr. Walter Zimmer (TUM)
08:30 - 08:50	Keynote Presentation 1 — Prof. Dr. Jiaqi Ma (UCLA)
08:50 - 09:10	Keynote Presentation 2 — Dr. Xiao Wang (Anhui University)
09:10 - 09:30	Keynote Presentation 3 — Prof. Dr. Bassam Alrifaee (UniBW Munich) Infrastructure-Assisted Perception and Foundation Models in Cooperative Autonomous Driving
09:30 - 09:50	Keynote Presentation 4 — Prof. Dr. Manabu Tsukada (Tokyo Uni.)
10:00 - 10:30	Morning Refreshment & Coffee Break
10:30 - 10:50	Keynote Presentation 5 (remote) — Prof. Dr. Aleksandr Petiushko (Sofia University) Safe Planning in Autonomous Driving
10:50 - 11:10	Keynote Presentation 6 — Dr. Elnaz Irannezhad (UNSW Sydney) Intelligent Ports of the Future: Generative AI-Driven Decision-Making in Logistics
11:10 - 11:30	Keynote Presentation 7 — Prof. Jonathan Sprinkle (Vanderbilt Uni.) Vehicle to Infrastructure Communication and Lagrangian Variable Speed Controllers
11:30 - 11:50	Keynote Presentation 8 — Prof. Dr. Ziran Wang (Purdue Uni.) ViLaD: Large Vision-Language Diffusion Models for End-to-End Autonomous Driving
11:50 - 12:30	Panel Discussion I
12:30 - 13:30	Lunch
13:30 - 13:50	Keynote Presentation 9 — Prof. Dr. Cathy Wu (MIT) Empowering Smart Mobility with Computational Social Intelligence
13:50 - 14:10	Keynote Presentation 10 — Dr. Mao Shan (Uni. of Sydney) Foundational Models for Animal Detection on Roads
14:10 - 14:30	Keynote Presentation 11 — Linda Lim (UC Berkeley) From Corridors to Networks: Data-Driven Traffic Modeling for Connected and Automated Vehicles
14:30 - 15:00	Panel Discussion II
15:00 - 15:10	Paper Pitch 1 — Zhenxing Ming (University of Sydney) OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction
15:10 - 15:20	Paper Pitch 2 — Roy Parthib (UC Merced) DoScenes: An Autonomous Driving Dataset with Natural Language Instruction
15:20 - 15:30	Paper Pitch 3 — Michael Kösel (Uni Ulm) ALOOD: LiDAR-Based Out-Of-Distribution Object Detection
15:30 - 16:00	Afternoon Refreshments & Coffee Break
16:00 - 16:10	Paper Pitch 4 — Mateus Karvat (Queen's University) Adver-City: Multi-Modal Dataset under Adverse Weather
16:10 - 16:20	Paper Pitch 5 — William Barbour (Vanderbilt Uni.) Persistent Monitoring and Analysis at a Corridor Scale
16:20 - 16:30	Paper Pitch 6 — Felix Neumann (Siemens) LiDAR Ground Segmentation with Gaussian Mixture Models
16:30 - 16:40	Paper Pitch 7 — Zhichao Liu (Southeast University) AlignOcc: LiDAR-Camera Fusion for 3D Occupancy Prediction
16:40 - 16:50	Paper Pitch 8 — Jörg Gamerdinger (Uni Tuebingen) SnowyLane: Lane Detection on Snow-Covered Rural Roads
16:50 - 17:00	Paper Pitch 9 — Tzu-Yun Tseng (University of Sydney) Panoptic-CUDAL: Rural Australia Point Cloud Dataset
17:00 - 17:10	Award Ceremony — Dr. Walter Zimmer (TUM)
17:10 - 17:20	Closing & Final Remarks — Dr. Walter Zimmer (TUM)
17:20 - 17:30	Group Picture — All organizers & speakers