Unlocking Agent Potential-A2A Protocol


You all would have heard the new buzzword A2A. In this writeup, I want to share a quick research recap on this subject.
Here is the storyline -
What is A2A and why do we need it?
How is this going to change the agentic world?
How does A2A support the vision of a team of agents?
Communication is fundamental
Agent Discovery
Security
Technical Implementation
How does Agent Card help in agent discovery?
Sequence Flow: Multi-Vendor Interoperability
Sequence Flow: Discovery and Task stages between Client and Server
What is next?
What is A2A? Why do we need it?
We are all noticing how important agents have become recently, and I personally expect their continued growth. In present world, these agents tend to function independently, silo fashion, resulting in limited capabilities, redundancies and their inability to handle complex tasks. Clearly there is a ask on how can we enable secure, standardized collaboration among them which can be a real game-changer for us all.
That is exactly what Agent 2 Agent Protocol is going to do for us! 😊 Google announced its Agent to Agent protocol (A2A) on April 9th. It is interesting time because you think about MCP, and it's been great for connecting “agents to tools”. But it always struck me that there wasn't a clear, standard way for the agents themselves to just... talk to each other. A2A seems to be directly addressing that, which is pretty significant
How will this transform the world of agents?
Let us deep dive to understand key features
Easy to scale to new capabilities: Easily integrate new or existing specialized agents into the network. Secure communication protocols ensure immediate and trustworthy contributions to expanded capabilities.
Security and Inter-operability: Standardized and secure protocols enable trusted communication between agents from diverse teams or organizations, fostering secure innovation.
Ready for complex problems: Secure agent communication and collaboration via A2A allows trusted combination of strengths and knowledge to tackle tasks beyond individual agent capabilities with confidence
To sum it up you can think multi agent system now operating as a “team” of specialists working together on a multifaceted project.
How does A2A support this vision of team of agents?
To understand let us take a quick look on the key design principles. It is a long list, but it’s definitely worth a read because it gives us a good understanding on how this is capable of innovation and its fitment into various use cases.
To make it simple to understand, I am going to call out 3 main categories
Communication
Agent Discovery
Security
Communication is fundamental
Standardize communication for multi-vendor environments:
This protocol provides a universal "language" based on “existing” web standards like HTTP, Server-Sent Events (SSE), and JSON-RPC, enabling agents built on different frameworks and by different vendors to interact.
Interoperability: My personal favorite. In this way it allows agents developed with diverse technologies and hosted on various platforms (including different cloud environments) to communicate effectively without needing to share code, memory, or internal tools.
Modality Agnostic
- We want it all! right? 😊 So, A2A supports various communication formats beyond text, including audio and video streaming.
Task Management:
When agents talk to each other, the unit of work they understand is “task”.
This protocol defines a lifecycle for “tasks” with different states (submitted, working, input-required, completed, etc.) and mechanisms for tracking their progress.
Streaming:
- It supports Server-Sent Events (SSE) to provide real-time updates on the task status and intermediate results (artifacts).
Push Notifications (Optional):
- Servers can proactively send task updates to a client-provided webhook URL, allowing for more immediate notifications.
Content Negotiation:
- Agents can negotiate the format of the content they exchange based on what each agent can handle or display (e.g., choosing between raw data or an image for a chart).
Agent Discovery
Agent Cards and Capability discovery
Agents can showcase their capabilities, skills, endpoints, and authentication requirements through a standardized JSON metadata file called an "Agent Card," typically hosted at a well-known URL (/.well-known/agent.json). This allows client agents to discover suitable remote agents for specific tasks.
Client agents can use the information in Agent Cards to identify and select the most appropriate agent to handle a given task.
Security
Secure Communication
A2A emphasizes secure communication through encryption and authentication methods to protect data exchanged between agents. Production systems are expected to use HTTPS with modern TLS ciphers.
The protocol includes mechanisms for agents to authenticate themselves, often described through digital signatures and potentially dynamic credential validation.
Monitoring
- A2A facilitates real-time feedback, notifications, and state updates for users involved in agent interactions.
Technical Implementation
How does Agent Card help in agent discovery?
Imagine a shop. The shop (server agent) has a sign (Agent Card) listing what it sells (capabilities), how to order (endpoints, input requirements), and how they accept payment (security schemes). A customer (client agent) reads the sign to decide if this shop can fulfill their needs and how to interact with it. The shop is responsible for putting up and maintaining its own sign
{
"agent_name": "FlightBookingAgent",
"version": "1.2",
"description": "Facilitates flight bookings by searching for available flights and making reservations.",
"vendor": "Global Travel Solutions Inc.",
"contact": {
"email": "api-support@globaltravel.com",
"website": "https://globaltravel.com/api"
},
"capabilities": [
{
"name": "findFlights",
"description": "Searches for available flights based on specified criteria.",
"input_schema": {
"type": "object",
"properties": {
"departure_airport": {
"type": "string",
"description": "IATA code of the departure airport (e.g., BLR)."
},
"arrival_airport": {
"type": "string",
"description": "IATA code of the arrival airport (e.g., DEL)."
},
"departure_date": {
"type": "string",
"format": "date",
"description": "Departure date (YYYY-MM-DD)."
},
"return_date": {
"type": "string",
"format": "date",
"description": "Return date (YYYY-MM-DD, optional for one-way trips)."
}
},
"required": [
"departure_airport",
"arrival_airport",
"departure_date"
]
},
"output_schema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"flight_number": {
"type": "string"
},
"airline_code": {
"type": "string"
},
"departure_airport": {
"type": "string"
},
"arrival_airport": {
"type": "string"
},
"departure_time": {
"type": "string",
"format": "date-time"
},
"arrival_time": {
"type": "string",
"format": "date-time"
},
"price": {
"type": "number",
"format": "float",
"description": "Total price for all passengers in the requested currency."
},
"currency": {
"type": "string",
"description": "Currency code (e.g., USD, INR)."
}
},
"required": [
"flight_number",
"airline_code",
"departure_airport",
"arrival_airport",
"departure_time",
"arrival_time",
"price",
"currency"
]
}
},
"endpoints": [
{
"uri": "/flights/search",
"method": "GET",
"security": [
"bearerAuth"
]
}
]
}
],
"security_schemes": {
"bearerAuth": {
"type": "http",
"scheme": "bearer",
"bearerFormat": "JWT"
}
}
}
It is like any other schema, very easy to read - It talks about the capability description, endpoint details, http method, request parameters, response parameters and of course the security schema.
Sequence Flow: Multi-Vendor Interoperability
This emphasizes the interoperability amongst "multi-vendor" aspect by explicitly labeling the vendors of the client agent, server agents, and underlying tools/services.
It shows how A2A acts as the unifying protocol allowing a client from one vendor to interact with servers and services from other vendors.
Sequence Flow: Discovery and Task stages between Client and Server
Explanation of sequence diagram
This sequence diagram illustrates the communication flow between a Client and a Server in a typical interaction pattern, likely for initiating and managing tasks. Let's break down each phase:
1. Discovery Phase:
Client,Server: Discovery Phase: This indicates the start of the discovery phase, where the Client tries to find information about the Server's capabilities.
Client->>Server: GET /well-known/agent-card: The Client sends an HTTP GET request to a well-known URL (/well-known/agent-card).
Server-->>Client: Agent Card: The Server responds with the "Agent Card," which likely contains information about the Server's supported functionalities, endpoints, and possibly authentication details.
2. Initiation Phase:
Client,Server: Initiation Phase: This phase involves the Client starting a new task on the Server.
Client->>Server: POST /tasks/send or /tasks/sendSubscribe: The Client sends an HTTP POST request to either /tasks/send or /tasks/sendSubscribe. This suggests two possible ways to initiate a task:
/tasks/send: Likely for a one-off task where the result is expected in a single response.
/tasks/sendSubscribe: Possibly for tasks that involve a stream of updates or results.
Client: Initial message & Task ID: Along with the POST request, the Client sends an initial message containing the details of the task it wants to execute. The Server likely assigns a unique Task ID to this request for tracking.
3. Processing Phase:
Client,Server: Processing Phase: This is where the Server processes the task initiated by the Client. The diagram shows two alternative scenarios:
alt Streaming: This branch represents a scenario where the Server provides updates in real-time.
Server-->>Client: SSE Events (status, artifacts): The Server sends Server-Sent Events (SSE). These events likely contain updates on the task's status and any intermediate results or "artifacts" produced during processing.
else Non-Streaming: This branch represents a scenario where the Server processes the task and sends a single final response.
Server-->>Client: Final Task Object: The Server sends a complete object containing the final result of the task.
4. Interaction Phase (Optional):
Client,Server: Interaction Phase (Optional): This phase is optional and occurs if the Server requires further input from the Client during the task processing.
alt Input Required: This alternative indicates that the Server has signalled a need for more information.
Client->>Server: POST /tasks/send or /tasks/sendSubscribe: The Client sends another POST request (potentially to the same or a different endpoint) with the required subsequent messages or input.
Client: Subsequent messages: This note clarifies that these POST requests contain the additional information needed by the Server.
5. Completion Phase:
Client,Server: Completion Phase: This marks the end of the task lifecycle from the Client's perspective.
Server-->>Client: Terminal State (completed/failed/canceled): The Server sends a final message indicating the terminal state of the task. This could be that the task was completed successfully, failed due to an error, or was explicitly cancelled.
What is next?
There is sure lots of potential when agents start to work together, and I am sure this protocol will ease the implementation and will bring waves of innovation.
But as this evolves, I still have some question running in my thoughts. Will be happy to get your perspective or feedback.
Q1: Are we going to see new players following similar or disruptive thought process?
Q2: Can MCP or A2A evolve and give us a protocol that can talk to both tools as well as agents
Q3: Agents are trained with lots of data and are expensive. How is the money or billing going to play in this world of collaboration.
Subscribe to my newsletter
Read articles from Honey Baweja directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
