My Personal Exam: How I Built an MVP LLM Agent on Google ADK

For several years now, I have been working as the CTO of ASRP, a company that develops educational projects and explores new approaches in EdTech. One of our focus areas has been testing the capabilities of Google ADK in educational scenarios and building a prototype of an intelligent knowledge assessment system. As an engineer and researcher, I am especially inspired by how artificial intelligence and LLMs can influence and transform education. In this project, my role was chief architect and engineer: from designing the concept to implementing the system's key modules.

Features of Google ADK

To create the agent, I chose Google ADK (Agent Development Kit) — a framework specifically designed for building systems with LLM agents. The Google ADK documentation emphasizes that the LlmAgent serves as the “thinking part” of the application: it leverages the power of LLMs for reasoning, natural language understanding, decision-making, and response generation [1].

In addition to LlmAgent, Google ADK provides tools for strictly defined execution sequences. For example, the SequentialAgent is a workflow agent that executes nested sub-agents in a fixed order [2]. Essentially, it ensures that tasks are carried out step by step, exactly as specified. In my experiments, I first relied on LlmAgent with extended instructions and reasoning inside the model, but later shifted to explicit orchestration through code: connecting sub-agents and tools (AgentTool) to keep the process under external control.

Architectural Decisions

In the first version, I tried to fit all the business logic into a single LlmAgent: I carefully described the agent’s role, goals, and expectations, and relied on the LLM itself to perform all computations and evaluate the student’s answers. This approach made it possible to quickly produce a working prototype, but its limitations soon became clear: the model became “overloaded” with lengthy instructions and context, token consumption per request grew, and debugging such a “monolithic” solution proved difficult [3].

Gradually, I shifted to a more modular architecture: the system was divided into multiple agents and stages. For example, one LlmAgent was responsible for generating questions, another for evaluating and analyzing answers, and a third for producing comments and hints. To connect these stages, I used SequentialAgent and ParallelAgent: the former ensured that steps were executed in the correct order, while the latter allowed independent tasks to run simultaneously. This multi-agent approach helped reduce the load on each individual agent and made the processes more reliable. Task separation also enabled the creation of more stable processing chains for handling answers and simplified the testing of each module.
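
To make this concrete, here is a minimal sketch of such a pipeline. The agent names, instructions, and model id are illustrative rather than our production configuration; the {question} and {evaluation} placeholders rely on ADK's instruction templating, which injects values from session state.

```python
from google.adk.agents import LlmAgent, ParallelAgent, SequentialAgent

# Each stage is a small, focused agent; output_key stores its result
# in session state so the next stage can reference it.
question_agent = LlmAgent(
    name="question_generator",
    model="gemini-2.0-flash",
    instruction="Generate one exam question on the given topic.",
    output_key="question",
)

evaluator_agent = LlmAgent(
    name="answer_evaluator",
    model="gemini-2.0-flash",
    instruction="Evaluate the student's answer to the question: {question}",
    output_key="evaluation",
)

feedback_agent = LlmAgent(
    name="feedback_writer",
    model="gemini-2.0-flash",
    instruction="Write a short comment and a hint based on: {evaluation}",
    output_key="feedback",
)

# SequentialAgent fixes the order: generate -> evaluate -> comment.
exam_pipeline = SequentialAgent(
    name="exam_pipeline",
    sub_agents=[question_agent, evaluator_agent, feedback_agent],
)

# Independent checks can instead run side by side with ParallelAgent.
style_check = LlmAgent(
    name="style_check",
    model="gemini-2.0-flash",
    instruction="Check the clarity of the student's answer.",
    output_key="style_report",
)
fact_check = LlmAgent(
    name="fact_check",
    model="gemini-2.0-flash",
    instruction="Check the factual accuracy of the student's answer.",
    output_key="fact_report",
)
parallel_review = ParallelAgent(
    name="parallel_review",
    sub_agents=[style_check, fact_check],
)
```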

Evolution of agent architecture: from monolith to modularity

As the dialogue grew and the session history accumulated, the amount of text passed to the model increased rapidly, driving up computational costs. To address this, I implemented token-level optimization. Instead of sending the entire chat history at every step, I began actively using memory features: rather than keeping the whole dialogue in context, I saved key information in ctx.session.state and in ADK's memory service (for example, InMemoryMemoryService). This made it possible to exclude repetitive parts from requests and reduce token usage. I also split complex tasks into smaller steps and used shorter prompts to avoid overloading the model.
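
In practice this meant two things: wiring a memory service into the runner, and saving key facts into state once instead of repeating them in every prompt. A minimal sketch, assuming the exam_pipeline agent from the previous snippet (the tool function and state key here are illustrative):

```python
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import ToolContext

def save_student_level(level: str, tool_context: ToolContext) -> dict:
    """Persist a key fact in session state once, instead of repeating it
    in every prompt; agents can later read it via a {student_level}
    placeholder. Passed to an agent via its tools=[...] list."""
    tool_context.state["student_level"] = level
    return {"status": "saved"}

runner = Runner(
    agent=exam_pipeline,                       # pipeline from the sketch above
    app_name="exam_mvp",
    session_service=InMemorySessionService(),  # short-term, per-session state
    memory_service=InMemoryMemoryService(),    # searchable long-term memory
)
```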

Technical Challenges and Engineering Hacks

The model sometimes produced unexpected or incomplete responses, so I had to add extra validation. For example, I defined strict format checks in the instructions: after receiving an evaluator agent’s output, the system verified that it matched the expected structure (e.g., valid JSON or a clear schema). If the format was broken, the script “re-asked” the model with clarifying prompts.
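
The re-asking logic lived outside the model, in plain code. A simplified sketch of that loop (the ask_evaluator callable, the schema, and the retry budget are all assumptions for illustration):

```python
import json

REQUIRED_KEYS = {"score", "verdict"}

def parse_evaluation(raw: str):
    """Return the parsed evaluation dict, or None if the format is broken."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    return data

async def evaluate_with_retries(ask_evaluator, answer: str, max_retries: int = 2):
    """ask_evaluator is any async callable that sends a prompt to the
    evaluator agent and returns its text reply."""
    prompt = f"Evaluate this answer. Reply with JSON only: {answer}"
    for _ in range(max_retries + 1):
        raw = await ask_evaluator(prompt)
        parsed = parse_evaluation(raw)
        if parsed is not None:
            return parsed
        # Re-ask with a clarifying prompt when the structure is broken.
        prompt = ("Your previous reply was not valid JSON with the keys "
                  "'score' and 'verdict'. Resend only the corrected JSON.")
    raise ValueError("Evaluator failed to produce valid JSON")
```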

I also implemented a feedback mechanism: a dedicated reviewer agent checked the quality of responses before presenting them to the user. This allowed errors to be caught early and significantly improved system reliability.
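
Building on the pipeline sketch above, the reviewer is simply another LlmAgent placed at the end of the chain; the "failure" convention below is our own protocol, not an ADK feature:

```python
reviewer_agent = LlmAgent(
    name="reviewer",
    model="gemini-2.0-flash",
    instruction=(
        "Review the feedback in {feedback}. If it is accurate and helpful, "
        "repeat it verbatim. Otherwise reply exactly 'failure' so the "
        "orchestrating code can request a refinement."
    ),
    output_key="reviewed_feedback",
)
# In the pipeline above, reviewer_agent goes last in sub_agents, after
# feedback_writer, so quality control is part of the fixed execution order.
```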

Another engineering trick was making use of ADK’s built-in tools. I actively used AgentTool: essentially turning one agent into a tool for another. This gave us flexibility — the main LlmAgent could dynamically invoke sub-agents depending on the request. This approach allowed us to combine the strengths of different models inside one unified logic.
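
A sketch of that pattern; the specialist here is hypothetical, but AgentTool itself is the standard ADK wrapper for turning an agent into a callable tool:

```python
from google.adk.agents import LlmAgent
from google.adk.tools.agent_tool import AgentTool

# A narrow specialist; it could even run on a different model.
hint_specialist = LlmAgent(
    name="hint_specialist",
    model="gemini-2.0-flash",
    instruction="Produce a single hint that guides without revealing the answer.",
)

# The main agent decides at runtime whether to invoke the specialist.
exam_assistant = LlmAgent(
    name="exam_assistant",
    model="gemini-2.0-flash",
    instruction=("Run the exam dialogue. When the student is stuck, "
                 "call the hint_specialist tool for a hint."),
    tools=[AgentTool(agent=hint_specialist)],
)
```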

To optimize generation, I used streaming where appropriate and fine-tuned model parameters (temperature, max_output_tokens, etc.). In critical steps, I set temperature=0 to enforce deterministic outputs. I also stored key results (output_key) within the session so they could be passed from one agent to another without redundant text duplication.
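
For example, the grading step pinned its parameters roughly like this (the values are illustrative; GenerateContentConfig comes from the google.genai package that ADK builds on):

```python
from google.adk.agents import LlmAgent
from google.genai import types

grading_agent = LlmAgent(
    name="grader",
    model="gemini-2.0-flash",
    instruction="Grade the answer strictly against the rubric. Output JSON only.",
    generate_content_config=types.GenerateContentConfig(
        temperature=0,           # deterministic output on a critical step
        max_output_tokens=512,   # cap the response size to save tokens
    ),
    output_key="grade",  # downstream agents read this from session state
)
```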

All of these techniques helped save tokens and made debugging easier. It required extensive testing and fine-tuning, but ultimately led to a much more stable system.

The Evolution of the CTO Role

Balance Between Code and Strategy

During development, my role expanded significantly. At first, I was mostly focused on engineering tasks: designing the architecture, writing code, and experimenting with models. But as the project quickly grew, I had to shift toward management. I found myself not only designing systems but also building a development team, delegating tasks, and overseeing execution. I increasingly took part in project planning, aligning timelines with partners, managing the budget, and training colleagues to work with new tools.

Each new stage of the project became an exam for me personally. When integrating new APIs or scaling the system on Kubernetes, my responsibilities went far beyond “just code” — I had to mentor the team, distribute roles, handle documentation, and coordinate with the business side. I realized that being a CTO meant not only being a “technical architect,” but also a “team leader” and a “bridge” between developers and business. At times, I had to act as a mentor and coach, and at other times as an inspector, ensuring quality. These new “exams” in project and people management became an essential part of my role.

Philosophical Reflections

Exam as a Metaphor: A Dialogue Between Human and LLM Agent

Creating such an “exam” made me reflect on the broader meaning of work. Every project in IT feels like an exam: a test of logic, creativity, and the ability to keep learning. Working with LLM agents becomes more like a partnership: they don’t just execute commands but enter into dialogue, suggest ideas, or ask clarifying questions. More and more, I see them not just as tools but as colleagues — it’s important to give them clear tasks, yet also leave room for the model’s own “thought process.”

I look to the future of EdTech with optimism. These systems will undoubtedly transform education: personal tutors, instant feedback, adaptive courses. Learning becomes personalized — and that changes the game. This is why at ASRP we bring the lessons learned from prototypes into our own products, including the Arcanum12th platform. There, LLM agents already support students with assignments, tests, and interactive dialogues. The experience with ADK became a foundation: modularity, memory, and dialogue scenarios have all been integrated into Arcanum12th.

Still, it’s important to remember that the human remains at the center of the process. An exam is always a test of knowledge and skills, and agents are only assistants. They expand possibilities, but responsibility for the result still rests with us.

10 Lessons of a CTO from an AI Exam

Analyzing my experience, I distilled the following ten lessons from my own "AI exam."


  1. Delegate tasks through agents.

    Don’t try to pack all logic into one “super-agent” — such systems often collapse under their own weight due to overloaded instructions. A team of specialized agents provides better accuracy and scalability. Break the project into stages and wrap separate functionalities into individual agents.

  2. Design architecture with tokens in mind.

    Limit the size of requests and responses (configure max_output_tokens), and use session state and memory services to avoid sending redundant context to the model. This reduces computational costs and speeds up the system.

  3. Leverage ADK’s strengths.

    Beyond LlmAgent, learn to use SequentialAgent and other workflow agents — they add reliability and predictability to processes. Transform agents into AgentTools for flexible interaction, so the main agent can call the right “specialist” at the right time.

  4. Parallelize independent tasks.

    If a chain has steps that don’t depend on each other, run them in parallel. ADK offers ParallelAgent for this. For example, searching for information and checking answers can be done simultaneously, saving time without compromising quality.

  5. Track session and state.

    Store essential data (temporary results, counters, output keys) in ctx.session.state. This makes implementation more transparent and helps agents switch context. Clearly define output_key to pass data between agents without unnecessary duplication.

  6. Add feedback and validation.

    A good agent double-checks its work: I created a separate reviewer agent that ensures quality before delivering the final result. This allows errors to be caught early and enables iterative improvement (e.g., returning “failure” and requesting refinement).

  7. Log and analyze.

    Implement callbacks and logging (e.g., AgentOps, Phoenix, or custom solutions). Understanding how agents interact and how many resources are consumed is crucial for optimization and system stability; see the minimal callback sketch after this list.

  8. Embrace experimentation.

    Working with AI is a continuous experiment. Models can surprise you, so patience and adaptability are essential. Even failed attempts teach us something new.

  9. Collaboration with AI is the new standard.

    Learn to treat LLMs as “colleagues”: give clear tasks, verify results, and combine ideas. This shifts the culture of development from purely human-driven to a hybrid “human + AI” approach.

  10. The future belongs to learning.

    From a philosophical perspective: invest in knowledge and knowledge-sharing. The exam as a project is not just a lesson for students, but also for developers. Continuous learning, readiness to adapt, and the ability to teach others are what will keep us in step with technology.
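
To illustrate lesson 7, here is a minimal logging callback. What you log and where you send it is up to you; the after_model_callback hook itself is standard ADK, and the agent below is illustrative:

```python
import logging

from google.adk.agents import LlmAgent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmResponse

logger = logging.getLogger("exam_agents")

def log_model_response(callback_context: CallbackContext,
                       llm_response: LlmResponse) -> None:
    """Log which agent answered and roughly how large the reply was."""
    text = ""
    if llm_response.content and llm_response.content.parts:
        text = llm_response.content.parts[0].text or ""
    logger.info("agent=%s response_chars=%d",
                callback_context.agent_name, len(text))
    # Returning None keeps the original response unchanged.

observed_agent = LlmAgent(
    name="observed_evaluator",
    model="gemini-2.0-flash",
    instruction="Evaluate the student's answer.",
    after_model_callback=log_model_response,
)
```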

I invite everyone interested in creating LLM agents to join my course. In it, I share practical development experience, explain the capabilities of Google ADK, and demonstrate how to apply the techniques described here in practice. Together, we will learn how to build reliable systems!

Sources

  1. Agent Development Kit - LLM agents - link.

  2. Agent Development Kit - Sequential agents - link.

  3. Agent Patterns with ADK (1 Agent, 5 Ways!) - link.
