Relevancy and Accuracy in Spring AI: Evaluating LLM Responses the Right Way

Ilkay Polat

In the world of generative AI, one of the biggest challenges is making sure your Large Language Model (LLM) is not just answering questions—but doing so in a way that’s both relevant and factually accurate. This gets especially tricky when you're building applications on top of AI models and want to maintain user trust.

But here's the catch: LLMs are non-deterministic. Ask the same question twice, and you might get different responses. So how can we verify that our AI-generated responses are acceptable?

That’s where Spring AI’s Evaluators come in. 💡

✅ 1. Relevancy: Is the Answer Even on Topic?

The first line of defense is relevancy—did the model even understand the question?

Spring AI’s RelevancyEvaluator allows us to automatically assess whether the LLM's response addresses the intent of the original prompt.

📌 Example:

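// Asks the evaluator's judge model whether the answer addresses the question.
private fun evaluateRelevancy(question: Question, answer: String?) {
    // A null answer is evaluated as an empty string.
    val evaluationRequest = EvaluationRequest(question.question, answer ?: "")
    val evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest)

    // Throwing a dedicated exception lets the caller decide whether to retry.
    if (!evaluationResponse.isPass) {
        throw AnswerNotRelevantException("Answer is not relevant to the question")
    }
}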

In this snippet, we're using RelevancyEvaluator to check if the AI's answer aligns with the user's original question. If not, an exception is thrown.
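To make the idea of an LLM judging another LLM more concrete, here is a minimal plain-Kotlin sketch of a yes/no relevancy check. It is not Spring AI's implementation: `JudgeModel`, `relevancyPrompt`, `isRelevant`, and the prompt wording are all hypothetical names chosen for illustration, and the judge is passed in as a function so the sketch runs without any model.

```kotlin
// Hypothetical stand-in for a call to a judge model: prompt in, raw text out.
typealias JudgeModel = (prompt: String) -> String

// Builds a grading prompt. The wording is illustrative, not Spring AI's template.
fun relevancyPrompt(question: String, answer: String): String =
    "Decide whether the ANSWER addresses the QUESTION.\n" +
        "Reply with exactly YES or NO.\n\n" +
        "QUESTION: $question\n" +
        "ANSWER: $answer"

// Passes only on an unambiguous YES; any other reply counts as a failure.
fun isRelevant(question: String, answer: String, judge: JudgeModel): Boolean =
    judge(relevancyPrompt(question, answer)).trim().equals("YES", ignoreCase = true)
```

Treating every reply other than a clear YES as a failure is a deliberately strict choice in this sketch: a vague verdict from the judge triggers a retry instead of letting a doubtful answer through.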

🧾 2. Accuracy: Is the Answer Factually Correct?

Relevance alone isn't enough—an answer can sound right but be totally wrong.

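Spring AI’s FactCheckingEvaluator uses a second LLM-based evaluation to judge whether the answer is factually supported. This reduces the risk of confidently wrong answers, although it cannot eliminate it.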

📌 Example:

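// Asks the fact-checking evaluator whether the answer holds up factually.
private fun evaluateAccuracy(question: Question, answer: String?) {
    // A null answer is evaluated as an empty string.
    val evaluationRequest = EvaluationRequest(question.question, answer ?: "")
    val evaluationResponse = accuracyEvaluator.evaluate(evaluationRequest)

    // A separate exception type keeps accuracy failures distinct from relevancy failures.
    if (!evaluationResponse.isPass) {
        throw AnswerNotAccurateException("Answer is not accurate for the given question")
    }
}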

With FactCheckingEvaluator, we’re not just relying on the AI’s confidence—we’re adding a second layer of AI-based judgment to validate the answer.
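The snippets above use `relevancyEvaluator` and `accuracyEvaluator` without showing where they come from. One possible way to wire them is as Spring beans built from the `ChatClient.Builder` that Spring AI's Boot auto-configuration provides. This is a sketch under assumptions: the class name `EvaluatorConfig` is made up here, and the evaluator package shown is the one I would expect in Spring AI 1.0 (earlier milestones used a different package), so check the imports against the version you are using.

```kotlin
import org.springframework.ai.chat.client.ChatClient
// Assumed package for Spring AI 1.0; earlier milestone releases placed the
// evaluators under org.springframework.ai.evaluation instead.
import org.springframework.ai.chat.evaluation.FactCheckingEvaluator
import org.springframework.ai.chat.evaluation.RelevancyEvaluator
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Hypothetical configuration class; the name is not from the original project.
@Configuration
class EvaluatorConfig {

    // Judges whether a response addresses the question.
    @Bean
    fun relevancyEvaluator(chatClientBuilder: ChatClient.Builder): RelevancyEvaluator =
        RelevancyEvaluator(chatClientBuilder)

    // Judges whether a response is factually supported.
    @Bean
    fun accuracyEvaluator(chatClientBuilder: ChatClient.Builder): FactCheckingEvaluator =
        FactCheckingEvaluator(chatClientBuilder)
}
```

Both evaluators call a chat model themselves, so every evaluated answer costs extra model calls on top of the original request.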

⚙️ 3. Runtime Evaluation with Retry and Recovery

By applying evaluators at runtime, we can catch and retry poor-quality responses before they reach end users.

📌 Core service using both evaluators + retry:

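// Retry the whole method whenever either evaluator rejects the answer.
@Retryable(retryFor = [AnswerNotRelevantException::class, AnswerNotAccurateException::class])
fun ask(question: Question): Answer {
    // content() may return null, so answer is treated as nullable below.
    val answer = chatClient.prompt().user(question.question).call().content()

    // Each check throws if the answer fails, which triggers the next attempt.
    evaluateRelevancy(question, answer)
    evaluateAccuracy(question, answer)

    return Answer(answer ?: "No answer")
}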

📌 Graceful fallback when all retries fail:

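// Called once retries are exhausted and the last failure was a relevancy failure.
@Recover
fun recover(exception: AnswerNotRelevantException) = Answer("Unable to answer the question in terms of relevancy")

// Called once retries are exhausted and the last failure was an accuracy failure.
@Recover
fun recover(exception: AnswerNotAccurateException) = Answer("Unable to answer the question in terms of accuracy")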

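With this setup, your application becomes more resilient. Spring Retry re-invokes ask() whenever an evaluator rejects the answer, up to the configured maximum number of attempts (3 by default), and then returns the controlled fallback from the matching @Recover method. The annotations only take effect when retry support is enabled, for example with @EnableRetry on a configuration class.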
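If the annotation-driven flow feels like magic, the following plain-Kotlin sketch shows roughly the control flow it stands for: try a bounded number of times, and hand the last failure to a recovery function when every attempt is rejected. It is an illustration only, not how Spring Retry is implemented; `AnswerRejectedException` and `retryThenRecover` are hypothetical names, and details such as backoff between attempts are left out.

```kotlin
// Hypothetical exception standing in for the two evaluator exceptions above.
class AnswerRejectedException(message: String) : RuntimeException(message)

// Runs `attempt` up to `maxAttempts` times. Returns the first result that is
// not rejected; if every attempt is rejected, returns `recover(lastFailure)`.
fun <T> retryThenRecover(
    maxAttempts: Int = 3,
    attempt: (attemptNumber: Int) -> T,
    recover: (lastFailure: AnswerRejectedException) -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var lastFailure: AnswerRejectedException? = null
    for (n in 1..maxAttempts) {
        try {
            return attempt(n)
        } catch (e: AnswerRejectedException) {
            // Remember the failure and fall through to the next attempt.
            lastFailure = e
        }
    }
    // Reached only when all attempts failed, so lastFailure is set.
    return recover(checkNotNull(lastFailure))
}
```

Only `AnswerRejectedException` is caught here, mirroring the `retryFor` list: any other exception propagates immediately instead of being retried.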

🧠 Wrapping Up

Spring AI’s RelevancyEvaluator and FactCheckingEvaluator are game-changers for anyone building with LLMs. They help ensure that your generative AI is trustworthy, consistent, and production-ready.

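Whether you’re writing tests or evaluating responses on the fly, these tools put a repeatable quality gate in front of non-deterministic output. Because the evaluators are themselves LLM-based, their verdicts can vary too, so treat them as a strong filter rather than a guarantee.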

Exploring More

For a practical implementation of a Spring AI-based project, you can check out this GitHub repository: Board Game Buddy - GitHub


Are you already using evaluator-based testing in your AI applications? I’d love to hear how you're managing quality in the age of LLMs!

#SpringAI #GenerativeAI #Kotlin #LLM #OpenAI #SoftwareEngineering #AIQuality #SpringBoot #DeveloperTools
