Abstract: BEV-based 3D perception with multi-frame images input is crucial for autonomous driving. However, current methods for temporal BEV perception fail to fully utilize long sequence features ...