Relevancy and Accuracy in Spring AI: Evaluating LLM Responses the Right Way


In the world of generative AI, one of the biggest challenges is making sure your Large Language Model (LLM) is not just answering questions—but doing so in a way that’s both relevant and factually accurate. This gets especially tricky when you're building applications on top of AI models and want to maintain user trust.
But here's the catch: LLMs are non-deterministic. Ask the same question twice, and you might get different responses. So how can we verify that our AI-generated responses are acceptable?
That’s where Spring AI’s Evaluators come in. 💡
✅ 1. Relevancy: Is the Answer Even on Topic?
The first line of defense is relevancy—did the model even understand the question?
Spring AI’s RelevancyEvaluator allows us to automatically assess whether the LLM’s response addresses the intent of the original prompt.
📌 Example:
private fun evaluateRelevancy(question: Question, answer: String?) {
    val evaluationRequest = EvaluationRequest(question.question, answer ?: "")
    val evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest)
    if (!evaluationResponse.isPass) {
        throw AnswerNotRelevantException("Answer is not relevant to the question")
    }
}
In this snippet, we’re using RelevancyEvaluator to check if the AI’s answer aligns with the user’s original question. If not, an exception is thrown.
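The relevancyEvaluator used above is just a field on the service. Here is a minimal wiring sketch, assuming RelevancyEvaluator’s constructor that takes a ChatClient.Builder and the auto-configured builder from Spring Boot (the configuration class and bean names are illustrative):
📌 Wiring the evaluator:
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.evaluation.RelevancyEvaluator
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class EvaluatorConfig {

    // The evaluator performs its own LLM call, so it is built from a ChatClient.Builder
    @Bean
    fun relevancyEvaluator(chatClientBuilder: ChatClient.Builder): RelevancyEvaluator =
        RelevancyEvaluator(chatClientBuilder)
}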
🧾 2. Accuracy: Is the Answer Factually Correct?
Relevance alone isn't enough—an answer can sound right but be totally wrong.
Spring AI’s FactCheckingEvaluator uses another LLM-based evaluation to ensure factual accuracy.
📌 Example:
private fun evaluateAccuracy(question: Question, answer: String?) {
    val evaluationRequest = EvaluationRequest(question.question, answer ?: "")
    val evaluationResponse = accuracyEvaluator.evaluate(evaluationRequest)
    if (!evaluationResponse.isPass) {
        throw AnswerNotAccurateException("Answer is not accurate for the given question")
    }
}
With FactCheckingEvaluator, we’re not just relying on the AI’s confidence—we’re adding a second layer of AI-based judgment to validate the answer.
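If your pipeline already retrieves context (for example in a RAG setup), EvaluationRequest can also carry supporting documents, which gives the fact check concrete material to verify the answer against. A hedged variant of the method above, assuming a hypothetical contextDocuments list of Spring AI Document objects:
📌 Variant with context documents:
import org.springframework.ai.document.Document
import org.springframework.ai.evaluation.EvaluationRequest

private fun evaluateAccuracyWithContext(question: Question, contextDocuments: List<Document>, answer: String?) {
    // Pass the retrieved documents alongside the question and answer
    val evaluationRequest = EvaluationRequest(question.question, contextDocuments, answer ?: "")
    val evaluationResponse = accuracyEvaluator.evaluate(evaluationRequest)
    if (!evaluationResponse.isPass) {
        throw AnswerNotAccurateException("Answer is not accurate for the given question")
    }
}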
⚙️ 3. Runtime Evaluation with Retry and Recovery
By applying evaluators at runtime, we can catch and retry poor-quality responses before they reach end users.
📌 Core service using both evaluators + retry:
@Retryable(retryFor = [AnswerNotRelevantException::class, AnswerNotAccurateException::class])
fun ask(question: Question): Answer {
    val answer = chatClient.prompt().user(question.question).call().content()
    // Both checks throw on failure, which triggers a retry of the whole method
    evaluateRelevancy(question, answer)
    evaluateAccuracy(question, answer)
    return Answer(answer ?: "No answer")
}
📌 Graceful fallback when all retries fail:
@Recover
fun recover(exception: AnswerNotRelevantException) = Answer("Unable to answer the question in terms of relevancy")

@Recover
fun recover(exception: AnswerNotAccurateException) = Answer("Unable to answer the question in terms of accuracy")
With this setup, your application becomes resilient. It automatically retries until it receives a relevant and accurate response—or returns a controlled fallback if it can’t.
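One caveat: @Retryable and @Recover come from Spring Retry, not Spring AI, so the retry behaviour only kicks in when Spring Retry is on the classpath, retry support is enabled, and ask() is invoked through the Spring proxy rather than by a direct internal call. By default, @Retryable makes three attempts before delegating to the matching @Recover method. A minimal setup sketch (the configuration class name is illustrative):
📌 Enabling retry support:
import org.springframework.context.annotation.Configuration
import org.springframework.retry.annotation.EnableRetry

// Requires org.springframework.retry:spring-retry plus Spring AOP
// (e.g. spring-boot-starter-aop); without @EnableRetry the annotations are ignored.
@Configuration
@EnableRetry
class RetryConfig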
🧠 Wrapping Up
Spring AI’s RelevancyEvaluator and FactCheckingEvaluator are game-changers for anyone building with LLMs. They help ensure that your generative AI is trustworthy, consistent, and production-ready.
Whether you’re writing tests or evaluating responses on the fly, these tools bring much-needed determinism to a non-deterministic world.
Exploring More
For a practical implementation of a Spring AI-based project, you can check out this GitHub repository: Board Game Buddy - GitHub
Are you already using evaluator-based testing in your AI applications? I’d love to hear how you're managing quality in the age of LLMs!
#SpringAI #GenerativeAI #Kotlin #LLM #OpenAI #SoftwareEngineering #AIQuality #SpringBoot #DeveloperTools
