💥 Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes

Anthony Gosselin, Ge Ya Luo, Luis Lara, Florian Golemo,
Derek Nowrouzezahrai, Liam Paull, Alexia Jolicoeur-Martineau, and Christopher Pal
Mila - Quebec AI Institute

Generated Crash Videos 🚗💥

Abstract

Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To address this, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation, where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance on quantitative video quality metrics (e.g., FVD and JEDi) and on qualitative measures based on human evaluation of physical realism and video quality, compared to prior diffusion-based methods.
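The abstract mentions classifier-free guidance with an independently tunable scale per conditioning signal. A common way to combine several conditions is to add each condition's guidance direction to the unconditional prediction, weighted by its own scale. The sketch below illustrates that generic pattern with NumPy; the function name and signature are illustrative, not the paper's actual implementation.

```python
import numpy as np

def multi_cond_cfg(eps_uncond, eps_conds, scales):
    """Combine denoiser outputs under several conditioning signals
    (e.g. bounding boxes, crash type) using one guidance scale per signal.

    Generic multi-condition classifier-free guidance sketch:
        eps = eps_uncond + sum_i s_i * (eps_cond_i - eps_uncond)

    This is an illustrative formulation, not Ctrl-Crash's exact code.
    """
    out = eps_uncond.copy()
    for eps_c, s in zip(eps_conds, scales):
        # Each scale s independently strengthens or weakens its signal.
        out += s * (eps_c - eps_uncond)
    return out

# Toy usage: two conditioning signals with different strengths.
eps_u = np.zeros((4, 4))          # unconditional noise prediction
eps_bbox = np.ones((4, 4))        # prediction conditioned on bounding boxes
eps_type = np.full((4, 4), 2.0)   # prediction conditioned on crash type
guided = multi_cond_cfg(eps_u, [eps_bbox, eps_type], [1.5, 0.5])
```

Setting a scale to 0 removes that signal's influence entirely, which is what allows, e.g., generating from boxes alone or from a crash type alone at inference time.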

Ctrl_Crash_teaser
Ground-truth
Generated

Counterfactual Crash Generation

Generation of Scenarios with Varying Crash Types

These examples illustrate scenarios conditioned on several distinct crash types (which specify which actors are involved in the crash):

Ground-truth

No Crash
Ego-Only crash
Ego-and-Vehicle crash
Vehicle-Only crash
Vehicle-and-Vehicle crash

Crash Reconstruction

Crash predicted by Ctrl-Crash using only the initial ground-truth frame and all bounding-box frames as input:

Ground-truth Clip
Bounding-box Frames
Predicted Crash Clip

Crash Prediction

Crash predicted by Ctrl-Crash using the initial frame and the first 9 bounding-box frames as input (white frames indicate that the bounding boxes were masked):

Ground-truth Clip
Bounding-box Frames
Predicted Crash Clip

Crash Generation from Non-Crash Data

Generating crashes from the non-accident BDD100K dataset by conditioning on the initial frame and the first 9 bounding-box frames:

Ground-truth
Generated Crash

Baseline Comparisons

Other methods struggle to generate realistic crashes.

BibTeX


        @misc{gosselin2025ctrlcrashcontrollablediffusionrealistic,
          title={Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes}, 
          author={Anthony Gosselin and Ge Ya Luo and Luis Lara and Florian Golemo and Derek Nowrouzezahrai and Liam Paull and Alexia Jolicoeur-Martineau and Christopher Pal},
          year={2025},
          eprint={2506.00227},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2506.00227}, 
        }
  

References

  1. Cosmos - Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical AI, 2025.
  2. Sora - Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, and OpenAI. Video generation models as world simulators, 2024.
  3. AVD2 - Cheng Li, Keyuan Zhou, Tong Liu, Yu Wang, Mingqiao Zhuang, Huan-ang Gao, Bu Jin, and Hao Zhao. AVD2: Accident video diffusion for accident video description, 2025.
  4. DrivingGen - Zipeng Guo, Yuchen Zhou, and Chao Gou. DrivingGen: Efficient safety-critical driving video generation with latent diffusion models. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, 2024.
  5. Ctrl-V - Ge Ya Luo, ZhiHao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, and Christopher Pal. Ctrl-V: Higher fidelity autonomous vehicle video generation with bounding-box controlled object motion. Transactions on Machine Learning Research, 2025.