LOCUS: Local Visual Cue Search for Enhancing Fine-Grained Perception in Multimodal Large Language Models


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.16586