Abstract: Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), what drives their video perception remains poorly understood. Consequently, many design ...