Abstract: Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), what drives their video perception remains poorly understood. Consequently, many design ...