Building a Retrieval-Augmented Generation (RAG) System with Java 21, Quarkus and Vaadin

Connect a modern LLM backend to a clean Vaadin UI using Java 21 microservices
In this article, we'll build a modern LLM-powered Retrieval-Augmented Generation (RAG) system using Java 21, Quarkus for the backend, and Vaadin for a clean interactive UI. This end-to-end system demonstrates embedding, prompt engineering, and LLM integration in a microservices architecture.
What You'll Build
Two Java 21 microservices:

rag-service (Quarkus)
- Accepts a user query
- Generates embeddings (stubbed)
- Retrieves similar documents (mocked vector store)
- Calls the LLM (stubbed or a real OpenAI/GPT model)
- Returns the generated response

rag-ui (Vaadin)
- A clean web UI where users ask questions
- Sends questions to the backend
- Displays the AI-generated answers
Project Structure
rag-poc/
├── rag-service/   # Quarkus backend
└── rag-ui/        # Vaadin frontend
rag-service
A reactive RAG API built with Quarkus
pom.xml
<dependencies>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-resteasy-reactive</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-jackson</artifactId>
</dependency>
</dependencies>
RagService.java
import java.util.List;

import io.smallrye.mutiny.Uni;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/rag")
@ApplicationScoped
public class RagService {

    @Inject EmbeddingService embeddingService;
    @Inject DocumentStore documentStore;
    @Inject LlmClient llmClient;

    @POST
    @Produces(MediaType.TEXT_PLAIN)
    @Consumes(MediaType.TEXT_PLAIN)
    public Uni<String> ask(String userQuestion) {
        return embeddingService.embed(userQuestion)
            .onItem().transformToUni(vector -> documentStore.searchSimilarDocuments(vector))
            .onItem().transformToUni(docs -> {
                String prompt = buildPrompt(userQuestion, docs);
                return llmClient.ask(prompt);
            });
    }

    private String buildPrompt(String question, List<String> docs) {
        String context = String.join("\n", docs);
        return """
            You are a helpful assistant. Use the context below to answer the question.

            Context:
            %s

            Question: %s
            Answer:""".formatted(context, question);
    }
}
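The prompt template itself needs nothing from Quarkus: it is plain Java 21, a text block plus String.formatted(). A standalone sketch you can run to inspect the prompt before it reaches the LLM (class name and sample strings are illustrative, not part of the project):

```java
import java.util.List;

public class PromptDemo {

    /** Mirrors the buildPrompt logic: joins retrieved docs into a context block. */
    static String buildPrompt(String question, List<String> docs) {
        String context = String.join("\n", docs);
        return """
            You are a helpful assistant. Use the context below to answer the question.

            Context:
            %s

            Question: %s
            Answer:""".formatted(context, question);
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
            "What is an embedding?",
            List.of("Doc 2: Embeddings convert text into vectors."));
        System.out.println(prompt);
    }
}
```

Printing the assembled prompt like this is a cheap way to catch template bugs (missing newlines, context in the wrong slot) before wiring in a real model.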
Stubbed Services
EmbeddingService.java
@ApplicationScoped
public class EmbeddingService {

    public Uni<float[]> embed(String text) {
        // Stub: returns a fixed vector; swap in a real embedding model later
        return Uni.createFrom().item(new float[] {0.1f, 0.2f, 0.3f});
    }
}
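A fixed vector means every query "looks the same" to the retriever. If you want the stub to at least be deterministic per input (identical questions retrieve identical documents, different questions usually differ), one throwaway approach is to derive a pseudo-vector from a hash of the text. This is not a real embedding, just a placeholder idea:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StubEmbedding {

    /** Derives a fixed-length pseudo-vector from a SHA-256 hash of the text. */
    public static float[] embed(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(text.getBytes(StandardCharsets.UTF_8));
            float[] vector = new float[8];
            for (int i = 0; i < vector.length; i++) {
                // Map each byte to [0, 1) so the stub resembles a normalized embedding
                vector[i] = (hash[i] & 0xFF) / 256f;
            }
            return vector;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available on the JVM
        }
    }
}
```

Hash-derived vectors carry no semantic meaning, so similar questions do not get similar vectors; this only makes the stub's behavior reproducible while you develop the rest of the pipeline.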
DocumentStore.java
@ApplicationScoped
public class DocumentStore {

    public Uni<List<String>> searchSimilarDocuments(float[] queryVector) {
        // Mock vector store: ignores the query vector and returns canned documents
        return Uni.createFrom().item(List.of(
            "Doc 1: LLMs are models that generate text.",
            "Doc 2: Embeddings convert text into vectors."
        ));
    }
}
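When you swap the mock for a real store (or even a simple in-memory one), retrieval boils down to ranking documents by cosine similarity against the query vector. A minimal plain-Java sketch of that core step (class and method names are illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SimilaritySearch {

    /** Cosine similarity between two equal-length vectors. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns document texts sorted by descending similarity to the query. */
    static List<String> topMatches(float[] query, Map<String, float[]> docs) {
        return docs.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> -cosine(query, e.getValue())))
            .map(Map.Entry::getKey)
            .toList();
    }
}
```

Dedicated vector databases like Qdrant or Weaviate do exactly this ranking, but with approximate-nearest-neighbor indexes so it stays fast at millions of documents.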
LlmClient.java
@ApplicationScoped
public class LlmClient {

    public Uni<String> ask(String prompt) {
        // Stub: returns a canned answer; replace with a real OpenAI/GPT call
        return Uni.createFrom().item("This is the answer based on the context.");
    }
}
rag-ui
A Vaadin frontend on Spring Boot
pom.xml
<dependencies>
<dependency>
<groupId>com.vaadin</groupId>
<artifactId>vaadin-spring-boot-starter</artifactId>
<version>24.4.0</version>
</dependency>
</dependencies>
<properties>
<java.version>21</java.version>
</properties>
RagUiApplication.java
@SpringBootApplication
public class RagUiApplication {

    public static void main(String[] args) {
        SpringApplication.run(RagUiApplication.class, args);
    }
}
MainView.java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.vaadin.flow.component.button.Button;
import com.vaadin.flow.component.notification.Notification;
import com.vaadin.flow.component.orderedlayout.VerticalLayout;
import com.vaadin.flow.component.textfield.TextArea;
import com.vaadin.flow.component.textfield.TextField;
import com.vaadin.flow.router.PageTitle;
import com.vaadin.flow.router.Route;

@Route("")
@PageTitle("RAG Assistant")
public class MainView extends VerticalLayout {

    // Reuse one client for all requests instead of creating one per call
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    private final TextField questionField = new TextField("Ask a question:");
    private final Button submitButton = new Button("Submit");
    private final TextArea resultArea = new TextArea("Answer");

    public MainView() {
        resultArea.setWidthFull();
        resultArea.setHeight("200px");
        resultArea.setReadOnly(true);
        submitButton.addClickListener(e -> askQuestion());
        add(questionField, submitButton, resultArea);
    }

    private void askQuestion() {
        String question = questionField.getValue();
        if (question == null || question.isBlank()) {
            Notification.show("Please enter a question");
            return;
        }
        try {
            String answer = sendQuestionToRagService(question);
            resultArea.setValue(answer);
        } catch (IOException ex) {
            resultArea.setValue("Error: " + ex.getMessage());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            resultArea.setValue("Error: " + ex.getMessage());
        }
    }

    private String sendQuestionToRagService(String question) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8080/rag"))
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(question))
            .build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
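One caveat with the view above: CLIENT.send(...) blocks the Vaadin UI thread until the backend answers, which freezes the page on slow LLM calls. java.net.http also offers sendAsync, which returns a CompletableFuture; in a real Vaadin view you would combine it with @Push and ui.access(...) to update resultArea when the answer arrives. A sketch of just the client half, under those assumptions (the Vaadin push wiring is omitted):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class AsyncRagClient {

    private final HttpClient client = HttpClient.newHttpClient();
    private final URI endpoint;

    public AsyncRagClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    /** Sends the question without blocking the calling thread. */
    public CompletableFuture<String> ask(String question) {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(endpoint)
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(question))
            .build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
            .thenApply(HttpResponse::body);
    }
}
```

The caller would chain `.thenAccept(answer -> ui.access(() -> resultArea.setValue(answer)))` so the UI update happens safely on the Vaadin session.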
How to Run It
- Start the backend (rag-service):
  cd rag-service
  ./mvnw quarkus:dev
- Start the frontend (rag-ui). The Quarkus backend already occupies port 8080, so first set server.port=8081 in rag-ui's application.properties:
  cd rag-ui
  ./mvnw spring-boot:run
- Open your browser at: http://localhost:8081
Next Steps
Integrate real OpenAI embeddings and completions
Replace stubbed vector search with Qdrant or Weaviate
Add document upload + ingestion pipeline
Stream answers from the backend using Server-Sent Events (SSE)
Why This Matters
This architecture demonstrates how modern Java can play a significant role in building practical AI systems. By combining Quarkus for performant APIs and Vaadin for elegant UIs, you can create powerful, interactive RAG systems that leverage the best of LLMs, all in Java.
Repo Template?
Interested in a GitHub repo template or Docker Compose setup? Let me know in the comments!