💥 Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes

Anthony Gosselin, Ge Ya Luo, Luis Lara, Florian Golemo,
Derek Nowrouzezahrai, Liam Paull, Alexia Jolicoeur-Martineau, and Christopher Pal
Mila - Quebec AI Institute

Generated Crash Videos 🚗💥

Abstract

Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To address this, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation, where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance on quantitative video quality metrics (e.g., FVD and JEDi) and on qualitative measures based on human evaluation of physical realism and video quality, compared to prior diffusion-based methods.
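The abstract mentions classifier-free guidance with an independently tunable scale per conditioning signal. A common way to combine several conditions is to add each condition's guidance direction to the unconditional prediction, weighted by its own scale. The sketch below illustrates that generic pattern with NumPy; the function name and signature are illustrative, not the paper's actual implementation.

```python
import numpy as np

def multi_cond_cfg(eps_uncond, eps_conds, scales):
    """Combine denoiser outputs under several conditioning signals
    (e.g. bounding boxes, crash type) using one guidance scale per signal.

    Generic multi-condition classifier-free guidance sketch:
        eps = eps_uncond + sum_i s_i * (eps_cond_i - eps_uncond)

    This is an illustrative formulation, not Ctrl-Crash's exact code.
    """
    out = eps_uncond.copy()
    for eps_c, s in zip(eps_conds, scales):
        # Each scale s independently strengthens or weakens its signal.
        out += s * (eps_c - eps_uncond)
    return out

# Toy usage: two conditioning signals with different strengths.
eps_u = np.zeros((4, 4))          # unconditional noise prediction
eps_bbox = np.ones((4, 4))        # prediction conditioned on bounding boxes
eps_type = np.full((4, 4), 2.0)   # prediction conditioned on crash type
guided = multi_cond_cfg(eps_u, [eps_bbox, eps_type], [1.5, 0.5])
```

Setting a scale to 0 removes that signal's influence entirely, which is what allows, e.g., generating from boxes alone or from a crash type alone at inference time.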

Ctrl_Crash_teaser
Ground-truth
Generated

Counterfactual Crash Generation

Generation of Scenarios with Varying Crash Types

These examples illustrate scenarios conditioned on several distinct crash types (which specify which actors are involved in the crash):

Ground-truth

No Crash
Ego-Only crash
Ego-and-Vehicle crash
Vehicle-Only crash
Vehicle-and-Vehicle crash

Crash Reconstruction

Crash predicted by Ctrl-Crash using only the initial ground-truth frame and all bounding-box frames as input:

Ground-truth Clip
Bounding-box Frames
Predicted Crash Clip

Crash Prediction

Crash predicted by Ctrl-Crash using the initial frame and the first 9 bounding-box frames as input (white frames indicate that the bounding boxes were masked):

Ground-truth Clip
Bounding-box Frames
Predicted Crash Clip

Crash Generation from Non-Crash Data

Generating crashes from the non-accident BDD100K dataset by conditioning on the initial frame and the first 9 bounding-box frames:

Ground-truth
Generated Crash

Baseline Comparisons

Other methods struggle to generate realistic crashes.

BibTeX


        @misc{gosselin2025ctrlcrashcontrollablediffusionrealistic,
          title={Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes}, 
          author={Anthony Gosselin and Ge Ya Luo and Luis Lara and Florian Golemo and Derek Nowrouzezahrai and Liam Paull and Alexia Jolicoeur-Martineau and Christopher Pal},
          year={2025},
          eprint={2506.00227},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2506.00227}, 
        }
  

References

  1. Cosmos - Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical AI, 2025.
  2. Sora - Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, and OpenAI. Video generation models as world simulators, 2024.
  3. AVD2 - Cheng Li, Keyuan Zhou, Tong Liu, Yu Wang, Mingqiao Zhuang, Huan-ang Gao, Bu Jin, and Hao Zhao. AVD2: Accident video diffusion for accident video description, 2025.
  4. DrivingGen - Zipeng Guo, Yuchen Zhou, and Chao Gou. DrivingGen: Efficient safety-critical driving video generation with latent diffusion models. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, 2024.
  5. Ctrl-V - Ge Ya Luo, ZhiHao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, and Christopher Pal. Ctrl-V: Higher fidelity autonomous vehicle video generation with bounding-box controlled object motion. Transactions on Machine Learning Research, 2025.