Many documents contain a mixture of content types, including images and text. Yet the information captured in images is lost in most RAG applications. With the emergence of multimodal LLMs such as GPT-4V, LLaVA, or Fuyu-8B, it is worth considering how to ut...