Query Splitting Magic: Tackling Multi-Topic Chatbot Conversations

One of the first problems you have to solve when building your first chatbot is creating embeddings for your knowledge, storing them in a database, and finding the relevant knowledge for each question.

That works well for a while, until users start posing multiple questions in a single message: "Hey! Do you happen to know what the weather is like in SF? Btw, when did WW2 end and why do I start crying when I peel an onion?"

Three completely unrelated questions. You're unlikely to find good knowledge matches if you simply turn this entire message into a single embedding and run one vector search. Instead, you need to extract each query separately and retrieve knowledge for each one.
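For reference, cosine similarity, the metric the prompt below asks the model to optimize for and the one most vector databases use for ranking, is just the normalized dot product of two embedding vectors. A minimal sketch:

```typescript
// Cosine similarity between two embedding vectors:
// 1 = same direction (very similar), 0 = orthogonal (unrelated).
// A vector search ranks stored chunks by this score against the query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

An embedding of the whole three-topic message sits somewhere "between" the three topics, so it scores mediocre similarity against all of them instead of high similarity against any one.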

Here’s a code example that does just that. It uses TypeScript, the Vercel AI SDK, and Claude 3.7 Sonnet as the LLM.

import { z } from "zod";
import { generateObject, type CoreMessage } from "ai";

// Flatten the chat history into plain text for the prompt
const stringifyChatHistory = (chatHistory: CoreMessage[]) =>
  chatHistory
    .map((message) => `${message.role}: ${message.content}`)
    .join("\n");

const getPrompt = (
  targetLanguage: string,
  chatHistory: CoreMessage[]
) => `Generate concise search queries for an embeddings database using cosine similarity based on this chat history:

<chat_history>
${stringifyChatHistory(chatHistory)}
</chat_history>

Guidelines:
- Focus on most recent intent/topics
- Ignore previous irrelevant topics
- For "what is X" questions, NEVER include potential answers

Multiple queries (max 3) ONLY for COMPLETELY UNRELATED topics:
- One query for single topics with multiple aspects
- Combine related aspects (e.g., "studying tips", "study techniques") into ONE query

For each query:
- Keep very brief queries intact
- Summarize longer queries to core intent
- Focus on key terms representing the topic
- Write in ${targetLanguage}
- NEVER include potential answers

Examples:
"User: What's the weather? Assistant: It's sunny. User: Great. How about movies?" → "popular movies current releases"
"User: Tell me about quantum computing and banana bread?" → Two queries: "quantum computing basics" and "banana bread recipe ingredients" 
"User: What is the capital of Sweden?" → "huvudstad Sverige" (NOT "Stockholm huvudstad Sverige")
"User: How do I start a business with registration, marketing, and funding?" → "how to start a business registration marketing funding steps"`;

const QuerySchema = z.object({
  queries: z
    .array(z.string())
    .min(1)
    .max(3)
    .describe(
      "The search queries to be used for the embeddings search. Return between 1 and 3 queries, one per distinct topic."
    ),
});

const result = await generateObject({
  // `model` is your configured LLM, e.g. Claude 3.7 Sonnet via @ai-sdk/anthropic
  model,
  prompt: getPrompt(targetLanguage, [
    {
      role: "user",
      content:
        "Hey! Do you happen to know what the weather is like in SF? Btw, when did WW2 end and why do I start crying when I peel an onion?",
    },
  ]),
  schema: QuerySchema,
});

// Example result.object.queries
//[
//    'current weather in San Francisco',
//    'when did World War 2 end',
//    'why onions make people cry'
//]

// Get search results for each of the queries; findKnowledge is your own
// vector-search function (embed the query, search the database)
const knowledge = await Promise.all(result.object.queries.map(findKnowledge));
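Since `knowledge` above is one result array per query, the per-query results need to be merged before being handed to the LLM. A minimal sketch, assuming each knowledge chunk carries an `id` field (the field name is an assumption, not part of the original code):

```typescript
// Hypothetical chunk shape: adjust to whatever your vector store returns.
type Chunk = { id: string; text: string };

// Merge per-query results and drop duplicate chunks: even unrelated queries
// can surface the same knowledge chunk, and repeating it wastes context.
function mergeKnowledge(perQuery: Chunk[][]): Chunk[] {
  const seen = new Set<string>();
  const merged: Chunk[] = [];
  for (const chunk of perQuery.flat()) {
    if (!seen.has(chunk.id)) {
      seen.add(chunk.id);
      merged.push(chunk);
    }
  }
  return merged;
}
```

Keeping insertion order means chunks for the user's first question come first, which tends to read naturally in the final prompt.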

Some key remarks here:

  • We use a targetLanguage for the queries, matching the language the knowledge content is written in. There are embeddings models that handle multiple languages, but I find I get better results when I keep the original content and the search queries in the same language.

  • If you’re not careful, the LLM will likely produce multiple queries on the same topic, which is counterproductive to what we want to achieve. That is why the prompt insists on one query per topic.
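As a cheap safety net against same-topic duplicates, you can also deduplicate the queries after generation. A sketch that drops queries which are identical after normalizing case and punctuation (it won't catch paraphrases, only near-duplicates):

```typescript
// Post-processing guard: drop near-duplicate queries, in case the model
// ignores the "one query per topic" instruction in the prompt.
function dedupeQueries(queries: string[]): string[] {
  const seen = new Set<string>();
  return queries.filter((query) => {
    // Normalize: lowercase, strip everything except letters/digits/whitespace
    const key = query.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").trim();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```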

The Takeaway

By implementing intelligent query extraction, your chatbot transforms from a one-dimensional responder into a conversation partner that understands context and handles complexity. This approach not only improves the accuracy of responses but also creates a more natural user experience - even when your users jump between asking about the weather in San Francisco, World War II facts, and the science of onion-induced tears in a single breath.

Having a large knowledge base is of no use if you don’t find the relevant parts for each query.

Written by

Andreas Du Rietz