Testing LLM Responses Using Spring AI Evaluators

Date: 2025-03-28
This article demonstrates how to build a Spring Boot application that tests Large Language Model (LLM) responses with Spring AI's RelevancyEvaluator and FactCheckingEvaluator. It uses Ollama to run LLMs locally in Docker and Testcontainers to manage those containers from within the tests. The application sends prompts to Ollama, receives the responses, evaluates their relevance and factual accuracy, and returns the results as JSON; unit tests verify the behavior. The setup provides a controlled, reproducible environment for assessing LLM reliability.
Read more: https://www.javacodegeeks.com/spring-ai-testing-ai-evaluators-example.html
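As a rough illustration of the approach (not the article's actual code), the sketch below shows how such an integration test might look. It assumes Spring AI 1.x's evaluation API (RelevancyEvaluator, FactCheckingEvaluator, EvaluationRequest, EvaluationResponse), the Spring AI Ollama starter, and the Testcontainers Ollama module; the class name, model (llama3.2), prompts, and context strings are illustrative, and exact constructor signatures can differ between Spring AI versions.

```java
// Illustrative sketch only; names, model, and prompts are assumptions, not the article's code.
package com.example.llm;

import java.util.List;

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.FactCheckingEvaluator;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.ollama.OllamaContainer;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Assumes a @SpringBootApplication class exists in the package hierarchy.
@SpringBootTest
@Testcontainers
class LlmEvaluationTests {

    // Spins up Ollama in Docker for the duration of the test class.
    @Container
    static OllamaContainer ollama = new OllamaContainer("ollama/ollama:latest");

    // Point the Spring AI Ollama auto-configuration at the container.
    @DynamicPropertySource
    static void ollamaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.ai.ollama.base-url",
                () -> "http://" + ollama.getHost() + ":" + ollama.getMappedPort(11434));
        registry.add("spring.ai.ollama.chat.options.model", () -> "llama3.2");
    }

    @BeforeAll
    static void pullModel() throws Exception {
        // Download the model inside the container before any test runs.
        ollama.execInContainer("ollama", "pull", "llama3.2");
    }

    @Autowired
    ChatClient.Builder chatClientBuilder;

    @Test
    void responseIsRelevantToTheQuestion() {
        String question = "Why is the sky blue?";
        String context = "Sunlight is scattered by molecules in the atmosphere; "
                + "blue light scatters more strongly, so the sky appears blue.";
        String answer = chatClientBuilder.build()
                .prompt(question)
                .call()
                .content();

        // RelevancyEvaluator asks the model whether the answer addresses the question in light of the context.
        var evaluator = new RelevancyEvaluator(chatClientBuilder);
        EvaluationResponse result = evaluator.evaluate(
                new EvaluationRequest(question, List.of(new Document(context)), answer));

        assertTrue(result.isPass(), "Answer was judged irrelevant to the question");
    }

    @Test
    void claimIsSupportedByTheGivenContext() {
        String context = "Spring AI provides evaluators for testing LLM output.";
        String claim = "Spring AI includes support for evaluating LLM responses.";

        // FactCheckingEvaluator checks whether the claim is grounded in the supplied documents.
        var evaluator = new FactCheckingEvaluator(chatClientBuilder);
        EvaluationResponse result = evaluator.evaluate(
                new EvaluationRequest(List.of(new Document(context)), claim));

        assertTrue(result.isPass(), "Claim was not supported by the context");
    }
}
```

Pointing spring.ai.ollama.base-url at the container through @DynamicPropertySource keeps the test self-contained: no locally installed Ollama is required, and the model is pulled fresh inside the throwaway container on every run, which is what makes the setup reproducible.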