Selective Temporal Fusion using Recurrent Attention for End-to-End Autonomous Driving
DOI: https://doi.org/10.5324/zpjv0r03

Keywords: End-to-End Autonomous Driving, Recurrent Neural Networks, Imitation Learning, Attention, Temporal Processing, CARLA

Abstract
In end-to-end autonomous driving (E2E-AD), understanding the complex and dynamic environment of the driving scene is crucial. Temporal information supports this by extending perception beyond what is observable in a single frame. While some E2E-AD architectures, such as TransFuser++, operate without temporal modeling, various methods for temporal fusion have been explored, from frame-stacking to memory-based methods and, most recently, attention-based recurrent methods. However, existing recurrent attention methods lack a mechanism for forgetting information, distributing attention across all past features even when they are no longer relevant. In this paper, we present a recurrent attention-based temporal fusion module (TFM) with selective forgetting, designed as a drop-in extension for E2E-AD architectures. The TFM fuses current and past information using cross-attention, enabling temporal modeling with minimal impact on inference time, and allows for interpretable retention through attention weight visualization. We integrate a selection mechanism using a void token to allow selective forgetting of irrelevant past information. Applied to the TransFuser++ architecture, our method achieves a driving score of 83.69% on the closed-loop Bench2Drive benchmark and provides qualitative insights into how models retain past information. These results demonstrate its potential as a temporal extension to otherwise temporally unaware architectures.
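The abstract describes fusing current and past features via cross-attention, with a void token that lets the model assign attention mass to "nothing" and thereby forget irrelevant history. The sketch below illustrates that idea in minimal NumPy; the function name, shapes, and the choice to simply drop the void slot's contribution are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_fusion(query, memory, void_token):
    """Cross-attention over past features plus a void token (illustrative sketch).

    query:      (d,)    current-frame feature vector
    memory:     (T, d)  features retained from past frames
    void_token: (d,)    learnable "forget" slot (hypothetical name)
    """
    d = query.shape[0]
    # Keys/values are the past features plus the void token.
    keys = np.vstack([memory, void_token[None, :]])       # (T+1, d)
    scores = keys @ query / np.sqrt(d)                    # (T+1,)
    weights = softmax(scores)                             # sums to 1 incl. void slot
    # Attention mass captured by the void token contributes nothing to the
    # output, so past features attended via the void slot are effectively
    # forgotten.
    fused = weights[:-1] @ memory                         # (d,)
    return fused, weights

# Toy usage: 3 past frames, 4-dimensional features.
rng = np.random.default_rng(0)
fused, weights = temporal_fusion(
    rng.standard_normal(4), rng.standard_normal((3, 4)), rng.standard_normal(4)
)
```

If the void token wins most of the attention, `weights[:-1]` shrinks toward zero and the fused output suppresses the past; visualizing `weights` per step is what makes retention interpretable.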
License
Copyright (c) 2025 Andreas Bentzen Winje, Florian Wintel, Gabriel Hanssen Kiss, Frank Lindseth

This work is licensed under a Creative Commons Attribution 4.0 International License.