The field of Vision-Language Models (VLMs) has grown rapidly, with diverse approaches emerging, including Multimodal Large Language Models (MLLMs) and Large Multimodal Models (LMMs). Notable examples of these models include Flamingo, LLaVA,...