Selective Temporal Fusion using Recurrent Attention for End-to-End Autonomous Driving

Authors

DOI:

https://doi.org/10.5324/zpjv0r03

Keywords:

End-to-End Autonomous Driving, Recurrent Neural Networks, Imitation Learning, Attention, Temporal Processing, CARLA

Abstract

In end-to-end autonomous driving (E2E-AD), understanding the complex and dynamic environment of the driving scene is crucial. Temporal information supports this by extending perception beyond what is observable in a single frame. While some E2E-AD architectures, such as TransFuser++, operate without temporal modeling, various methods for temporal fusion have been explored, from frame stacking to memory-based methods and, most recently, attention-based recurrent methods. However, existing recurrent attention methods lack a mechanism for forgetting information, distributing attention across all past features even when they are no longer relevant. In this paper, we present a recurrent attention-based temporal fusion module (TFM) with selective forgetting, designed as a drop-in extension for E2E-AD architectures. The TFM fuses current and past information using cross-attention, enabling temporal modeling with minimal impact on inference time, and allows for interpretable retention through attention weight visualization. We integrate a selection mechanism using a void token to allow selective forgetting of irrelevant past information. Applied to the TransFuser++ architecture, our method achieves a driving score of 83.69% on the closed-loop Bench2Drive benchmark and provides qualitative insights into how models retain past information. These results demonstrate its potential as a temporal extension to otherwise temporally unaware architectures.
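
The abstract describes cross-attention between current-frame features and retained past features, with a void token that lets attention mass fall on "nothing" to forget irrelevant history. The PyTorch sketch below illustrates one plausible reading of that mechanism under stated assumptions: the class and parameter names (TemporalFusionModule, void_token, memory_len) and the mean-pooled recurrent memory update are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): recurrent cross-attention temporal
# fusion with a learnable "void" token for selective forgetting.
from typing import Optional

import torch
import torch.nn as nn


class TemporalFusionModule(nn.Module):  # name assumed for illustration
    def __init__(self, dim: int = 256, num_heads: int = 8, memory_len: int = 4):
        super().__init__()
        self.memory_len = memory_len
        # Learnable void token prepended to the past-feature memory; attending
        # to it instead of real memory entries acts as selective forgetting.
        self.void_token = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current: torch.Tensor, memory: Optional[torch.Tensor]):
        """current: (B, N, D) current-frame features.
        memory:  (B, T, D) features retained from past frames, or None."""
        b = current.size(0)
        void = self.void_token.expand(b, -1, -1)
        kv = void if memory is None else torch.cat([void, memory], dim=1)
        # Cross-attention: current features query the (void + past) memory.
        fused, attn_weights = self.cross_attn(current, kv, kv, need_weights=True)
        fused = self.norm(current + fused)
        # Recurrent update (assumed here): roll a pooled summary of the fused
        # features into a bounded memory for the next timestep.
        new_entry = fused.mean(dim=1, keepdim=True)  # (B, 1, D)
        memory = new_entry if memory is None else torch.cat([memory, new_entry], dim=1)
        memory = memory[:, -self.memory_len:]
        return fused, memory, attn_weights


if __name__ == "__main__":
    tfm = TemporalFusionModule()
    mem = None
    for _ in range(3):  # simulate three consecutive frames
        frame_feats = torch.randn(2, 16, 256)
        out, mem, weights = tfm(frame_feats, mem)
    print(out.shape, mem.shape, weights.shape)  # (2, 16, 256) (2, 3, 256) (2, 16, 3)
```

Inspecting attn_weights shows how much mass each current-frame query assigns to the void token versus each retained past frame, which is one way to obtain the kind of interpretable retention visualization mentioned in the abstract.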

Published

2025-11-24

How to Cite

[1]
“Selective Temporal Fusion using Recurrent Attention for End-to-End Autonomous Driving”, NIKT, vol. 37, no. 1, Nov. 2025, doi: 10.5324/zpjv0r03.