How to Integrate Google ADK with a Custom Interface: A Step-by-Step Guide with Examples


Hello everyone!
My name is Mykhailo Kapustin. I’m a developer with 10 years of experience, co-founder and CTO of the research project Advanced Scientific Research Projects, and a Senior AI/ML Engineer in the education sector at Woolf, a Silicon Valley-based company.
Recently, I was tasked with building an interactive agent with a custom UI that could be integrated into our project’s existing infrastructure. I started researching, had discussions with other engineers — including Grigory Nosyrev (Senior Software Engineer at Workato) — and realized something important: there is very little documentation or public guidance on how to integrate Google’s Agent Development Kit (ADK) into real-world interfaces.
Most companies solve this challenge internally and rarely share the details. The goal of this article is to document this journey publicly and make it easier for other developers facing the same task.
What is Google ADK: A Brief Overview
The Google Agent Development Kit (ADK) is an open-source toolkit introduced by Google on April 9, 2025. It’s designed for building LLM agents capable of holding conversations, invoking external functions, managing internal state, connecting to documents, and performing complex reasoning processes. ADK enables developers to create agents locally or in the cloud with minimal code and a fairly transparent architecture.
Despite being relatively new and having some technical limitations, ADK is rapidly evolving. The Google team is actively working on improvements, and every update reflects a clear intention to make agent-based architecture more predictable, transparent, and manageable.
Creating a Simple Agent with Google ADK Is Easy
Google ADK is a surprisingly well-designed technology. The interfaces are simple, the documentation is concise, and most basic use cases require very little code. Even if you don’t have much programming experience, you can still build your first agent.
Here’s an example of a minimal agent using the official adk-python package:
from google.adk.agents import Agent
from google.adk.tools import google_search

root_agent = Agent(
    name="search_assistant",
    model="gemini-2.0-flash",  # Or your preferred Gemini model
    instruction="You are a helpful assistant. Answer user questions using Google Search when needed.",
    description="An assistant that can search the web.",
    tools=[google_search],
)
You define the instructions, connect a tool (like google_search), and that’s it — the agent is ready to chat with you and search for information via Google Search.
If you’d like to start with something more advanced, I recommend checking out the official adk-samples repository, where you’ll find working templates, including agents with function calls, reasoning chains, internal state, and built-in tests.
📎 To launch the agent, simply run the command: adk web
This will start a ready-to-use web interface where you can immediately interact with the agent — just like in a full-featured chat.
Everything runs locally — no additional code or infrastructure setup required.
Google ADK Agent Architecture: Sessions, Reasoning, and State
When you send the first message via the “Type a message…” input in the adk web interface, Google ADK automatically creates a unique session for the ongoing dialogue. From that point on, everything that happens between you and the LLM is stored in the context of this session.
Inside each session, the agent maintains its own internal state, which includes:
message history,
reasoning steps and intermediate thoughts,
a list of invoked functions and events,
artifacts (if generated by the agent).
If you're using adk web, you don't need to worry about session creation or manually managing state — everything happens automatically. The web interface sends requests in the correct format and maintains session structure for you.
📌 This makes adk web an excellent starting point for early experiments — you can focus on your agent's logic without needing to handle transport or data serialization.
But what if you need a custom interface for interacting with the LLM — with your own design, logic, authentication, and business integration?
Connecting a Custom UI to Google ADK via FastAPI
After my initial experiments with adk web, a natural question came up:
if the web interface can already communicate with the agent, then it must be using an API under the hood — so can I use that API for my own custom interface?
The answer is yes. Google ADK runs a full FastAPI server, and adk web interacts with it directly. This means you can build your own UI — with custom authentication, logic, and user experience — using the same set of API methods.
📦 To launch the ADK API, just run the following command:
adk api_server
After that, a local FastAPI server is launched on port 8000:
INFO: Started server process [18018]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
And you can access the Google ADK API documentation at: http://127.0.0.1:8000/docs
📎 Here, you’ll find the complete FastAPI documentation with all available endpoints, including:
POST /run — for single-turn agent queries;
GET /run_sse — for streaming responses (SSE);
POST /sessions — for managing user sessions;
POST /evals — for launching built-in evaluation routines;
GET /artifacts — for accessing artifacts created by the agent.
This means you can fully replicate the functionality of adk web — but inside your own application.
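To make this concrete, here’s a rough sketch of how a custom backend might call the local API from Python. It assumes the session route and /run payload shape that adk api_server exposed at the time of writing — check your own http://127.0.0.1:8000/docs, since these may change between ADK versions.
# A sketch of talking to the local ADK API server from Python.
# Routes and payloads follow the /docs page of `adk api_server`; verify them
# against your local http://127.0.0.1:8000/docs before relying on them.
import requests

BASE_URL = "http://127.0.0.1:8000"
APP_NAME = "search_assistant"   # the agent folder ADK discovered
USER_ID = "u_123"               # any identifier your UI assigns to the user
SESSION_ID = "s_123"            # any identifier for this conversation

# 1. Create a session, optionally seeding its state.
requests.post(
    f"{BASE_URL}/apps/{APP_NAME}/users/{USER_ID}/sessions/{SESSION_ID}",
    json={"state": {"user": {"city": "Berlin"}}},
    timeout=30,
).raise_for_status()

# 2. Send a single-turn message via POST /run and print the returned events.
response = requests.post(
    f"{BASE_URL}/run",
    json={
        "app_name": APP_NAME,
        "user_id": USER_ID,
        "session_id": SESSION_ID,
        "new_message": {"role": "user", "parts": [{"text": "Hi! What can you do?"}]},
    },
    timeout=60,
)
response.raise_for_status()
for event in response.json():
    print(event)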
Everything works well in local experiments. But once you move toward production, a new challenge appears: you need to deploy the model somewhere accessible externally — reliably, securely, and without depending on your local machine.
How to Deploy Your Agent: Vertex AI, Cloud Run, or Docker
Google offers three official deployment options — each requiring you to package your agent as a container:
Vertex AI Agent Engine — a fully managed solution from Google Cloud;
Cloud Run — a lightweight way to run containers in the cloud;
Custom Infrastructure — self-managed deployment using Docker, GKE, or on-premise solutions.
Of the three options, I chose Vertex AI Agent Engine — a fully managed platform built specifically for deploying LLM agents in production. It allows you to focus on logic without worrying about infrastructure, networking, load balancing, or authentication.
The deployment process is extremely straightforward. Once you've created your agent:
from vertexai.preview import reasoning_engines

app = reasoning_engines.AdkApp(
    agent=root_agent,
    enable_tracing=True,
)
You can deploy it with a single command:
from vertexai import agent_engines

remote_app = agent_engines.create(
    agent_engine=app,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
    extra_packages=["./path_to_folder_with_agent"],
)
And that’s it. Agent Engine automatically creates the container, launches the server, and provides an endpoint you can connect to from your frontend or access via REST API.
Compared to Cloud Run or especially Custom Infrastructure, Agent Engine minimizes engineering overhead, speeds up development, and allows your team to focus on testing and iteration. And if needed — you can always extend it with your own layers of authentication, logging, or analytics.
📄 Deployment documentation: https://google.github.io/adk-docs/deploy.
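Before building a full UI, it’s worth smoke-testing the deployed agent from Python. Here’s a sketch based on the same agent_engines client used above; the create_session and stream_query methods are described in the next section, and the resource-name placeholders are your own project values — confirm the exact signatures against the documentation for your SDK version.
# A quick smoke test of the deployed agent, sketched with the agent_engines
# client. `agent_engines.get(...)` reconnects to an already-deployed engine;
# you can also reuse the `remote_app` returned by agent_engines.create(...).
from vertexai import agent_engines

remote_app = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

# Create a session for a user, then stream a query against it.
session = remote_app.create_session(user_id="u_123")
for event in remote_app.stream_query(
    user_id="u_123",
    session_id=session["id"],
    message="What's the weather like in Berlin today?",
):
    print(event)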
After Deployment: Getting to Know the Agent Engine API
Once you deploy your agent using Vertex AI Agent Engine, a key transition happens behind the scenes: you no longer interact directly with the Google ADK API. Instead, you work with a separate Agent Engine API that Google has built on top of ADK.
While both APIs are conceptually similar, there are differences in format, routes, and capabilities that are important to understand.
You'll have access to essential methods for managing sessions and interacting with your model:
Session Management Methods:
create_session — creates a new session
get_session / async_get_session — retrieves information about a session
list_sessions — lists all user sessions
delete_session — deletes a session
Agent Interaction Methods:
stream_query — the primary method for interactive communication with the agent (via SSE)
async_stream_query — a similar method with asynchronous support
streaming_agent_run_with_events — an advanced endpoint that returns step-by-step events (e.g., reasoning, function calls, and artifacts)
📌 It’s important to understand: all dialog logic now relies on this API. This means you can build your own UI, backend, or microservice using these exact endpoints.
🔗 Official Agent Engine API documentation:
https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/use/overview
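Continuing the smoke-test sketch from the deployment section, the session-management methods above might be used like this — signatures are hedged, so confirm them against the documentation linked above:
# Session management on the same `remote_app` client from the earlier sketch.
session = remote_app.create_session(user_id="u_123")
session_id = session["id"]

print(remote_app.list_sessions(user_id="u_123"))       # all sessions for this user
print(remote_app.get_session(user_id="u_123", session_id=session_id))  # includes the current state
remote_app.delete_session(user_id="u_123", session_id=session_id)      # clean up when done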
Integrating with Production: How to Connect Your Model to a Product
Now comes the key question — how do you integrate this into your architecture?
I explored several architectural approaches:
Frontend → LLM Directly
Simple and fast, but insecure: you'd have to give the agent access to your database and internal APIs — which means dealing with authentication, data protection, and model control.
Proxy Backend Between Frontend and LLM
Provides more control: you can log messages, filter responses, and implement authorization. However, you'll need to proxy streaming (via WebSocket or SSE) and synchronize session state between the model and the backend.
Shared Database Between LLM and Proxy
Simplifies state access, but introduces risks around race conditions, data consistency, and long-term support.
All of these approaches are valid, especially if you have the experience and resources to support a thoughtful architecture. In my project, I chose a proxy backend that:
authenticates users,
filters/enriches responses,
logs data, and
requests session state from the model to understand what's going on inside.
Example workflow:
The frontend sends a message to the backend.
The backend forwards it to the LLM (query or stream_query).
The LLM processes the request and updates the session state.
The LLM returns its response.
The backend fetches the session state using session_id.
The final result is sent to the frontend.
The only caveat is streaming: you’ll need to proxy stream_query through WebSocket (or SSE) so that the response arrives in real time.
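To illustrate the proxy approach, here’s a minimal FastAPI sketch of the workflow above. The auth check is a placeholder for your own logic, remote_app is the Agent Engine client from the deployment section, and streaming is relayed to the browser as Server-Sent Events — adapt it to WebSocket if that fits your stack better.
# A minimal sketch of the proxy-backend idea: authenticate, forward the
# message to the deployed agent, relay streamed events, then fetch the
# session state for logging or analytics.
import json

from fastapi import Depends, FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from vertexai import agent_engines

app = FastAPI()
remote_app = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

class ChatRequest(BaseModel):
    session_id: str
    message: str

def verify_user() -> str:
    # Placeholder: validate the JWT / session cookie and return the user id.
    return "u_123"

@app.post("/chat")
def chat(request: ChatRequest, user_id: str = Depends(verify_user)):
    def event_stream():
        # Relay each agent event to the client as it arrives.
        for event in remote_app.stream_query(
            user_id=user_id,
            session_id=request.session_id,
            message=request.message,
        ):
            yield f"data: {json.dumps(event, default=str)}\n\n"
        # After the turn completes, fetch the session state for logging/analytics.
        session = remote_app.get_session(user_id=user_id, session_id=request.session_id)
        yield f"data: {json.dumps({'state': session.get('state', {})}, default=str)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")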
Now we have everything needed to build a production-ready integration: the agent is deployed in Vertex AI Agent Engine, we can send queries and receive responses, and we’re able to inspect the session state to understand what’s happening inside. One last — but critically important — step remains: learning how to update the state during the conversation with the model, so we can preserve key data, track progress, and build memory-aware dialogue.
Managing Session State: How to Update State During a Conversation
To update the state during a conversation, ADK provides a callback mechanism — functions that are triggered automatically after a response is generated or a tool is executed.
Callback documentation:
📚 https://google.github.io/adk-docs/callbacks/
Here’s a basic example: the memorize_list function adds a value to a list under a specific key in the state.
from google.adk.tools import ToolContext

def memorize_list(key: str, value: str, tool_context: ToolContext) -> dict:
    mem_dict = tool_context.state
    if key not in mem_dict:
        mem_dict[key] = []
    if value not in mem_dict[key]:
        mem_dict[key].append(value)
    return {"status": f'Stored "{key}": "{value}"'}
Now let’s connect this function to the agent using after_model_callback, so that data is saved after each LLM response:
from typing import Optional
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmResponse

def save_to_state(callback_context: CallbackContext, llm_response: LlmResponse) -> Optional[LlmResponse]:
    # memorize_list only uses .state, which CallbackContext also exposes
    memorize_list("progress", llm_response.content.parts[0].text, callback_context)
This callback will be automatically triggered after each response is generated, allowing the agent to update its internal state without any involvement from the frontend or backend. Thanks to this approach, the agent becomes "memory-aware": it knows which step the user is on, what has already been completed, and what to do next.
This is a powerful tool — you gain controllable memory that can be used both to adapt the agent's behavior and for later analysis or training.
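For completeness, here’s how the callback might be wired into the agent — a small sketch reusing the root_agent definition from the beginning of the article:
from google.adk.agents import Agent
from google.adk.tools import google_search

root_agent = Agent(
    name="search_assistant",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant. Answer user questions using Google Search when needed.",
    description="An assistant that can search the web.",
    tools=[google_search],
    after_model_callback=save_to_state,  # state is updated after every LLM response
)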
Now you have everything you need:
🔹 An agent deployed in the cloud,
🔹 An API for communication,
🔹 A proxy handling requests,
🔹 And a session state that can evolve during the conversation.
The only question left is: what exactly do you want your agent to “remember” — and why?
And that’s no longer a question of code, but of business logic and product goals. At this point, the LLM is no longer just a model — it becomes part of a complete user experience.
Direct Integration with a Deployed Agent
Once your agent is deployed via Vertex AI Agent Engine, it becomes accessible through a REST endpoint:
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:MODE
The API provides two interaction modes: query and streamQuery.
We’ll use the query mode to initialize a session with a predefined state:
Example request:
{
  "classMethod": "create_session",
  "input": {
    "user_id": "user_id",
    "state": {
      "user": {
        "gender": "man",
        "city": "Berlin"
      }
      /* state structure here */
    }
  }
}
Example response:
{
  "output": {
    "state": {
      "user": {
        "gender": "man",
        "city": "Berlin"
      }
    },
    "lastUpdateTime": 1753711142.2750449,
    "userId": "user_id",
    "appName": "app_name",
    "id": "2971967285494808576",
    "events": []
  }
}
Here, the id field represents the session ID.
Once the session is created, we can send a message using the streamQuery mode.
Example request:
{
  "user_id": "user_id",
  "session_id": "sessionId",
  "message": "What's the weather like in Berlin today?"
}
Example response:
{
  "content": {
    "parts": [
      {
        "text": "{LLM response to your message}"
      }
    ],
    "role": "model"
  },
  "usage_metadata": {
    "candidates_token_count": 74,
    "prompt_token_count": 2990,
    "total_token_count": 3064
  },
  "invocation_id": "e-d4bf65a1-b728-4c4d-9ead-bd1358e29275",
  "author": "your_llm_name",
  "actions": {
    "state_delta": {},
    "artifact_delta": {},
    "requested_auth_configs": {}
  },
  "id": "78b41809-90ad-427b-a8df-bc9ce8167c6a",
  "timestamp": 1753711240.486078
}
After this interaction, you can query the session state again using query mode to inspect what’s happening inside the LLM and how it’s evolving over time.
Example request:
{
  "classMethod": "get_session",
  "input": {
    "session_id": "{sessionId}",
    "user_id": "{userId}"
  }
}
Example response:
{
  "events": [
    {
      "content": {
        "parts": [
          {
            "text": "{LLM response to your message}"
          }
        ],
        "role": "model"
      },
      "groundingMetadata": null,
      "errorCode": null,
      "invocationId": "e-eb9b81ac-d901-4117-a79e-4766e7eb955a",
      "timestamp": 1753708785.467739,
      "errorMessage": null,
      "longRunningToolIds": null,
      "branch": null,
      "author": "user",
      "interrupted": null,
      "turnComplete": null,
      "usageMetadata": null,
      "customMetadata": null,
      "actions": { /* ... */ },
      "partial": null,
      "id": "2337804162965700608"
    }
  ],
  "appName": "app_name",
  "userId": "bec12377-b51e-4172-b7f3-d287012490f6",
  "id": "8160114056225619968",
  "state": {
    "user": {
      "gender": "man",
      "city": "Berlin"
    },
    "current_question": "What's the weather like in Berlin today?"
  },
  "lastUpdateTime": 1753711273.97866
}
Now we have everything we need: we can send messages to the agent, update the session state with each interaction, and retrieve the latest state at any moment. This opens the door to building memory-aware agents that track progress and context dynamically.
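Putting the raw REST calls together, a backend-side sketch might look like this. The URL pattern and payload fields are copied from the examples above; authentication uses google-auth’s AuthorizedSession, and the exact streaming parameters for streamQuery are worth verifying against the official documentation.
# A sketch of the create_session → streamQuery → get_session flow over REST.
# Placeholders (LOCATION, PROJECT_ID, RESOURCE_ID) are your own deployment values.
import google.auth
from google.auth.transport.requests import AuthorizedSession

LOCATION = "us-central1"
PROJECT_ID = "my-project"
RESOURCE_ID = "1234567890"
BASE = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/reasoningEngines/{RESOURCE_ID}"
)

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# 1. Create a session with a predefined state (query mode, create_session).
created = session.post(
    f"{BASE}:query",
    json={
        "classMethod": "create_session",
        "input": {"user_id": "user_id", "state": {"user": {"gender": "man", "city": "Berlin"}}},
    },
).json()
session_id = created["output"]["id"]

# 2. Send a message (streamQuery mode); the response arrives as a stream of JSON events.
with session.post(
    f"{BASE}:streamQuery",
    json={"user_id": "user_id", "session_id": session_id, "message": "What's the weather like in Berlin today?"},
    stream=True,
) as stream:
    for line in stream.iter_lines():
        if line:
            print(line.decode())

# 3. Fetch the session state again (query mode, get_session).
state = session.post(
    f"{BASE}:query",
    json={"classMethod": "get_session", "input": {"session_id": session_id, "user_id": "user_id"}},
).json()
print(state)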
📄 Official documentation:
https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/use/overview
Conclusion
Integrating Google ADK and Vertex AI Agent Engine is more than just a way to connect an LLM to your product. It’s an opportunity to build a flexible and scalable architecture where the model becomes an active participant in the dialogue — not just a text-wrapping API.
With the help of state, callbacks, and a thoughtful backend structure, you can:
manage the agent’s memory and behavior,
monitor and analyze sessions,
tailor the interaction to your product’s needs.
Most importantly — you are not limited.
You can experiment freely, connect your analytics, switch models, adapt responses, and evolve your agent into a fully integrated part of your business logic.
The LLM interface isn’t the final destination.
It’s the beginning of a new branch in your product’s architecture.