Unlocking Multimodal RAG: A Guide to Using Cohere Embed with Azure AI Search
In today's digital landscape, organizations are dealing with an ever-growing collection of both textual and visual data. The ability to effectively search across these different modalities has become crucial for building modern enterprise applications. With the recent release of Cohere's Embed v3 model on Azure AI Studio, developers can now implement powerful multimodal search capabilities within their Azure AI Search solutions.
Key Features of Cohere Embed v3 on Azure AI Search
Unified Vector Space
Text and image embeddings share the same semantic space
Enables seamless cross-modal search capabilities
No need for separate indexes or complex routing logic
Enterprise-Grade Performance
Support for 100+ languages
Optimized for real-world business data
Exceptional accuracy on retrieval tasks
Integration Benefits
Native integration with Azure AI Studio
Simplified deployment and scaling
Built-in support for Azure AI Search vector search capabilities
What's New with Cohere Embed v3?
Cohere's latest Embed v3 model brings groundbreaking multimodal capabilities to Azure AI Studio. This state-of-the-art model can generate embeddings for both text and images, placing them in a unified vector space. This means you can:
Search images using text queries
Find relevant text using image queries
Perform cross-modal searches
Build sophisticated retrieval-augmented generation (RAG) systems
The model supports 100+ languages and maintains exceptional performance across various retrieval tasks, making it ideal for enterprise applications.
Setting Up Your Environment
Let's walk through implementing a multimodal search solution using Cohere Embed v3 and Azure AI Search. First, you'll need to install the required packages:
!pip install azure-search-documents==11.6.0b6
!pip install cohere python-dotenv azure-identity tqdm requests
Configuration and Authentication
Set up your environment variables and initialize the necessary clients:
import os

import cohere
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
AZURE_SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
AZURE_SEARCH_ADMIN_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
INDEX_NAME = "multimodal-cohere-index"

# Initialize the Azure AI Search and Cohere clients
azure_search_credential = AzureKeyCredential(AZURE_SEARCH_ADMIN_KEY)
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE_ENDPOINT, credential=azure_search_credential)
search_client = SearchClient(endpoint=AZURE_SEARCH_SERVICE_ENDPOINT, index_name=INDEX_NAME, credential=azure_search_credential)
cohere_client = cohere.Client(api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY, base_url=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT)
Creating a Multimodal Search Index
The heart of our solution lies in creating an Azure AI Search index that can handle both text and image embeddings:
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    SearchableField,
    SimpleField,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(name="imageUrl", type=SearchFieldDataType.String),
    # caption uses SearchableField so it supports full-text search;
    # all of these fields are retrievable by default
    SearchableField(name="caption", type=SearchFieldDataType.String),
    SearchField(
        name="imageVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,
        vector_search_profile_name="vector_profile"
    ),
    SearchField(
        name="captionVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,
        vector_search_profile_name="vector_profile"
    )
]
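Both vector fields reference a profile named vector_profile, which must be defined on the index before it's created. Here's a minimal sketch of the matching configuration, assuming the HNSW algorithm and the beta SDK pinned above:

from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchIndex,
    VectorSearch,
    VectorSearchProfile,
)

# The profile name must match vector_search_profile_name on the vector fields
vector_search = VectorSearch(
    algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
    profiles=[
        VectorSearchProfile(
            name="vector_profile",
            algorithm_configuration_name="hnsw-config",
        )
    ],
)

index = SearchIndex(name=INDEX_NAME, fields=fields, vector_search=vector_search)
index_client.create_or_update_index(index)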
Implementing Different Search Modes
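The search functions below rely on three helpers that generate embeddings through the Cohere client and prepare images for the embed API, plus the VectorizedQuery model from the Azure SDK. Here's a minimal sketch of those helpers; the model name and response shape assume the Cohere v1 Python client, so verify them against your deployment:

import base64

import requests
from azure.search.documents.models import VectorizedQuery

def generate_text_embedding(text, input_type="search_query"):
    # Embed text into the shared 1024-dimensional vector space.
    # Use input_type="search_document" when embedding captions for indexing.
    response = cohere_client.embed(
        texts=[text],
        model="embed-english-v3.0",
        input_type=input_type,
    )
    return response.embeddings[0]

def encode_image_to_base64(image_url):
    # Download the image and encode it as a base64 data URL,
    # the format the embed API expects for image inputs
    resp = requests.get(image_url)
    resp.raise_for_status()
    content_type = resp.headers.get("Content-Type", "image/jpeg")
    encoded = base64.b64encode(resp.content).decode("utf-8")
    return f"data:{content_type};base64,{encoded}"

def generate_image_embedding(image_base64):
    # Embed an image into the same vector space as the text embeddings
    response = cohere_client.embed(
        images=[image_base64],
        model="embed-english-v3.0",
        input_type="image",
    )
    return response.embeddings[0]

The functions also assume the index has been populated. A sketch of ingestion, where image_data is a hypothetical list of dicts with id, imageUrl, and caption keys:

documents = []
for item in image_data:
    documents.append({
        "id": item["id"],
        "imageUrl": item["imageUrl"],
        "caption": item["caption"],
        "captionVector": generate_text_embedding(item["caption"], input_type="search_document"),
        "imageVector": generate_image_embedding(encode_image_to_base64(item["imageUrl"])),
    })

search_client.upload_documents(documents)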
1. Text-to-Text Search
Search for similar text descriptions based on a text query:
def text_to_text_search(query_text):
text_embedding = generate_text_embedding(query_text)
text_vector_query = VectorizedQuery(
vector=text_embedding,
k_nearest_neighbors=1,
fields="captionVector"
)
results = search_client.search(
search_text=None,
vector_queries=[text_vector_query],
top=1
)
return results
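A quick usage sketch, with an illustrative query string:

results = text_to_text_search("a dog playing in the snow")
for result in results:
    print(result["caption"], result["@search.score"])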
2. Text-to-Image Search
Find images that match a text description:
def text_to_image_search(query_text):
text_embedding = generate_text_embedding(query_text)
text_to_image_query = VectorizedQuery(
vector=text_embedding,
k_nearest_neighbors=1,
fields="imageVector"
)
results = search_client.search(
search_text=None,
vector_queries=[text_to_image_query],
top=1
)
return results
3. Image-to-Text Search
Search for relevant text descriptions using an image:
def image_to_text_search(image_url):
image_base64 = encode_image_to_base64(image_url)
image_embedding = generate_image_embedding(image_base64)
image_to_text_query = VectorizedQuery(
vector=image_embedding,
k_nearest_neighbors=1,
fields="captionVector"
)
results = search_client.search(
search_text=None,
vector_queries=[image_to_text_query],
top=1
)
return results
4. Cross-Field Vector Search with Text
Search across both the caption and image vector fields simultaneously, given a text input query. Note that k_nearest_neighbors should be at least as large as top, otherwise the query can't fill the requested result count:
def text_embedding_cross_field_search(query_text):
    text_embedding = generate_text_embedding(query_text)
    cross_field_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=3,
        fields="imageVector, captionVector"
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[cross_field_query],
        top=3
    )
    return results
5. Cross-Field Vector Search with Images
The same cross-field search also works with an image as the input query:
def image_embedding_cross_field_search(image_url):
image_base64 = encode_image_to_base64(image_url)
image_embedding = generate_image_embedding(image_base64)
cross_field_query = VectorizedQuery(
vector=image_embedding,
k_nearest_neighbors=1,
fields="imageVector, captionVector"
)
return search_client.search(
search_text=None,
vector_queries=[cross_field_query],
top=1
)
6. Multi-Vector Search
Combine multiple vector queries for more precise results:
def text_and_image_query_multi_vector(query_text, image_url):
text_embedding = generate_text_embedding(query_text)
image_base64 = encode_image_to_base64(image_url)
image_embedding = generate_image_embedding(image_base64)
text_vector_query = VectorizedQuery(
vector=text_embedding,
k_nearest_neighbors=1,
fields="captionVector"
)
image_vector_query = VectorizedQuery(
vector=image_embedding,
k_nearest_neighbors=1,
fields="imageVector"
)
results = search_client.search(
search_text=None,
vector_queries=[text_vector_query, image_vector_query],
top=2
)
return results
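Azure AI Search merges the result sets of the two vector queries using Reciprocal Rank Fusion (RRF), so documents that rank well against both the caption and the image rise to the top. A usage sketch, with an illustrative query and a hypothetical image URL:

results = text_and_image_query_multi_vector(
    "a red sports car parked by the beach",
    "https://example.com/sports-car.jpg",  # hypothetical image URL
)
for result in results:
    print(result["caption"], result["imageUrl"])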
Building RAG Applications
The combination of Cohere Embed v3 and Azure AI Search enables powerful RAG applications. Here's how to implement a simple RAG system (for this example, I'll arbitrarily select the text_embedding_cross_field_search retrieval configuration):
def ask(query_text):
search_results = text_embedding_cross_field_search(query_text)
documents = [{"text": result["caption"]} for result in search_results]
chat_response = co_chat.chat(
message=query_text,
documents=documents,
max_tokens=100
)
return chat_response
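One thing the snippet leaves implicit: co_chat is a second Cohere client pointed at a chat-capable model (for example, Command R) deployed in Azure AI Studio, separate from the embed deployment. A minimal sketch, with hypothetical environment variable names:

# Hypothetical variables for a separately deployed Cohere Command chat model
AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY")
AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT")

co_chat = cohere.Client(
    api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY,
    base_url=AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT,
)

response = ask("What is the red object in these photos?")  # illustrative question
print(response.text)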
Real-World Applications
The multimodal search capabilities enabled by this integration open up numerous possibilities:
E-commerce Product Discovery: Enable customers to find products using both text descriptions and visual similarities.
Content Management: Efficiently organize and retrieve mixed media content including documents, images, and presentations.
Knowledge Management: Build sophisticated enterprise search systems that understand both textual and visual content.
Design Asset Management: Help creative teams quickly find relevant design assets using natural language descriptions.
Conclusion
The integration of Cohere Embed v3 with Azure AI Search represents a significant advancement in multimodal search capabilities. This powerful combination provides enterprises with the tools they need to build next-generation search experiences. Whether you're building an e-commerce platform, a content management system, or a knowledge base, this technology stack enables you to create more intelligent and user-friendly applications.
Get started today by deploying Cohere Embed v3 through Azure AI Studio and revolutionize how your applications handle multimodal search.