Your agents are talking past each other
In the previous lesson, you learned that agents sharing state need clear rules for how information flows between them. But shared state is only half the problem. The other half — and often the harder half — is communication: how does the output of one agent become the input of another without losing meaning, dropping context, or introducing ambiguity?
This is the question that has consumed researchers in linguistics, artificial intelligence, organizational design, and cognitive science for decades. The answer is always the same: you need a protocol. Not a vague understanding. Not a hope that the receiving agent will figure it out. A protocol — a structured, explicit agreement about what information gets transmitted, in what format, with what metadata, and under what constraints.
Without a protocol, every handoff between agents is a game of telephone. With one, it becomes an engineered interface. The difference between these two determines whether your multi-agent system coordinates or merely coexists.
Speech act theory: how communication became computable
The intellectual foundation of agent communication protocols comes from an unexpected source — the philosophy of language.
In 1962, J.L. Austin published How to Do Things with Words, based on his 1955 William James Lectures at Harvard. Austin's central insight was that language is not merely descriptive. When you say "I promise to deliver the report by Friday," you are not describing a promise — you are performing one. Austin called these performative utterances, and he distinguished three layers in every communicative act: the locutionary act (the literal content of what is said), the illocutionary act (the intended function — requesting, promising, declaring, asserting), and the perlocutionary act (the actual effect on the listener).
Austin's student John Searle systematized these ideas in Speech Acts (1969), developing a taxonomy of illocutionary types: assertives (stating facts), directives (requesting action), commissives (making commitments), expressives (conveying attitudes), and declarations (changing states of affairs by the act of saying them). Searle's taxonomy was not just philosophical classification. It was the blueprint for making communication computable.
Why does this matter for your cognitive infrastructure? Because every time one of your internal agents hands off work to another, it is performing a speech act. Your planning agent does not just output text — it issues a directive. Your review agent does not just produce notes — it makes an assertion about quality. Your commitment-tracking agent does not just log tasks — it registers commissives. When you fail to recognize the type of communication happening between agents, you treat all outputs as raw information. But information without illocutionary force — without a clear specification of what the receiver is supposed to do with it — is noise.
FIPA and the engineering of agent languages
The leap from philosophy to engineering happened in the 1990s. The Foundation for Intelligent Physical Agents (FIPA), an international standards body, recognized that if software agents were going to coordinate effectively, they needed a shared language — not just shared data.
FIPA ratified its Agent Communication Language (FIPA-ACL) in 2000, building directly on Searle's speech act theory. The protocol defines approximately twenty communicative acts — inform, request, agree, refuse, propose, confirm, cancel — each with precise preconditions and postconditions grounded in the agents' mental states: their beliefs, desires, and intentions. When Agent A sends a request message to Agent B, the protocol specifies that A believes B can perform the action, A intends for B to perform it, and A does not believe B would perform it without being asked. The semantics are not informal. They are formal logical conditions.
This matters because it solves the fundamental problem of inter-agent ambiguity. Without formal communicative acts, Agent A sends a message that says "the deadline is Friday" and Agent B has no way to know whether A is informing (stating a fact), requesting (asking B to meet the deadline), or declaring (setting the deadline by authority). The message content is identical. The communicative act — the protocol-level metadata — determines what the receiver should do with it.
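A minimal sketch of this disambiguation in code. The performative names follow FIPA-ACL, but the message class itself is illustrative, not taken from any real FIPA library: the same content string carries three different obligations depending on the act attached to it.

```python
from dataclasses import dataclass
from enum import Enum


class Performative(Enum):
    """A small subset of FIPA-ACL communicative acts."""
    INFORM = "inform"    # assert a fact the sender believes
    REQUEST = "request"  # ask the receiver to perform an action
    PROPOSE = "propose"  # offer an action, pending acceptance


@dataclass
class AclMessage:
    performative: Performative
    sender: str
    receiver: str
    content: str


# Identical content, three different communicative acts.
# What the receiver is obligated to do differs in each case.
fact  = AclMessage(Performative.INFORM,  "planner", "executor", "the deadline is Friday")
order = AclMessage(Performative.REQUEST, "planner", "executor", "the deadline is Friday")
offer = AclMessage(Performative.PROPOSE, "planner", "executor", "the deadline is Friday")

assert fact.content == order.content == offer.content
```

The content field alone cannot tell the executor whether to update its beliefs, schedule work, or evaluate an offer; only the performative can.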
FIPA-ACL also specifies interaction protocols: structured sequences of communicative acts that define entire conversations. A contract-net protocol, for example, defines a standard flow where one agent issues a call for proposals, other agents submit bids, the first agent evaluates and awards, and the winning agent confirms. The protocol is not just about individual messages. It is about the grammar of entire interactions.
Shared intentionality: the cognitive science of coordination
The FIPA engineers were building formal systems. But cognitive scientists were discovering the same principles in how humans naturally coordinate.
Michael Tomasello's research on shared intentionality — published across multiple works including A Natural History of Human Thinking (2014) — revealed that human cooperation depends on a cognitive capacity that most other species lack: the ability to form shared goals, shared plans, and shared knowledge about the roles each participant plays. Communication is not just information transfer. It is a joint action that presupposes a shared understanding of what each party is trying to accomplish together.
Research published in Communications Biology (2023) demonstrated this at the neural level. When pairs of participants established novel communication systems in laboratory settings, their brain activity synchronized — specifically in the right superior temporal gyrus — and this synchronization correlated with both shared intentionality and communicative accuracy. The better the participants understood each other's communicative goals, the more their neural activity aligned, and the more accurately they communicated.
The implication for your multi-agent cognitive system is direct. When your agents share intentionality — when the sending agent and the receiving agent both understand the purpose of the communication, not just its content — the protocol works. When they do not, no amount of structured formatting will save the handoff. A research agent that understands it is feeding a writing agent will produce different output than a research agent that is just "doing research." The protocol must encode not just what to transmit, but why.
Protocols in the age of AI agents
The principles Austin and Searle identified in human language, and FIPA formalized for software agents, have re-emerged with renewed urgency in the era of large language model-based multi-agent systems.
Modern AI agent frameworks — CrewAI, LangGraph, MetaGPT, AutoGen — all face the same fundamental problem: how does the output of one LLM-powered agent become usable input for another? The answer has converged on structured message passing. Agents do not exchange raw text. They exchange structured objects — typically JSON — that specify the content of the message, the type of communicative act, the context from prior interactions, and metadata about confidence, sources, and constraints.
In 2025, the industry began standardizing these patterns. Google released the Agent-to-Agent (A2A) protocol. Anthropic had published the Model Context Protocol (MCP) in late 2024, and it saw rapid adoption. IBM introduced the Agent Communication Protocol (ACP). Each addresses the same core problem from a different angle, but they share a common insight: agents that communicate through structured, typed, semantically explicit messages coordinate far more reliably than agents that pass unstructured text.
The architectural patterns mirror what cognitive science already knew. Sequential pipelines (agent A hands off to agent B, which hands off to agent C) work when the protocol between each pair is well-defined. Network architectures (every agent can communicate with every other agent) require more sophisticated protocols because the combinatorial explosion of possible interactions demands stricter message typing. Role-based teams (agents assigned specific professional roles, as in MetaGPT's software development simulation) work because the roles themselves constrain the protocol: a product manager communicates differently with an engineer than with a designer, and the protocol encodes those differences.
The lesson for your own cognitive infrastructure is the same one the AI industry is learning at massive scale: capable agents with poor communication protocols produce worse outcomes than mediocre agents with excellent protocols. The protocol is not overhead. It is the system.
The anatomy of a cognitive communication protocol
You do not need FIPA-ACL or JSON schemas to build communication protocols between your own internal agents. But you do need the same structural elements that every effective protocol contains.
Message type. Every communication between agents must declare what it is: a request, an assertion, a proposal, a status update, a handoff. Your research agent handing material to your writing agent is performing a different act than your review agent sending feedback to your revision agent. The receiving agent needs to know which act is being performed to know what to do next.
Structured payload. The content of the message must follow a predictable format. If your planning agent always outputs three fields — task description, success criteria, and time constraint — then your execution agent can reliably consume that output without guessing. If the format varies every time, the receiving agent spends its energy parsing rather than acting.
Context window. Every message must carry enough context for the receiving agent to act without needing to reconstruct the entire history. This does not mean transmitting everything. It means transmitting the minimal sufficient context: what the previous agent did, what it decided, what constraints apply, and what the receiving agent specifically needs to know. Anthropic's Model Context Protocol (MCP) is built on the same principle: structured context accompanies every interaction, rather than being left for the receiver to reconstruct.
Completion criteria. The sending agent must specify what "done" looks like for the receiving agent. Without this, the receiving agent either over-delivers (wasting resources) or under-delivers (failing the handoff). A research agent that says "here are notes on the topic" leaves the writing agent guessing about depth, scope, and angle. A research agent that says "here are five sourced claims supporting the argument that X, organized by strength of evidence, sufficient for a 1,500-word article" gives the writing agent a contract to fulfill.
Acknowledgment mechanism. The receiving agent must confirm that it received the message, understood the type, and can act on the payload. In human systems, this is the head nod, the "got it," the email reply confirming receipt. In your cognitive system, it is the moment where you verify that your next agent actually has what it needs before you let the previous agent stand down.
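The five elements above can be combined into a single message shape. This is a sketch for this lesson: the field names and the acknowledgment handshake are invented here, not taken from any specific framework.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    message_type: str           # request, assertion, proposal, status, handoff
    payload: dict               # structured, predictable fields
    context: dict               # minimal sufficient background
    completion_criteria: str    # what "done" looks like for the receiver
    acknowledged: bool = False  # set by the receiver, never the sender


def acknowledge(msg: Handoff, required_payload_keys: set) -> Handoff:
    """The receiver confirms it can act before the sender stands down."""
    missing = required_payload_keys - msg.payload.keys()
    if missing:
        raise ValueError(f"cannot acknowledge, payload missing: {missing}")
    msg.acknowledged = True
    return msg


plan = Handoff(
    message_type="handoff",
    payload={"task": "draft intro", "success_criteria": "covers 3 claims",
             "time_constraint": "2 hours"},
    context={"purpose": "article arguing X", "audience": "practitioners"},
    completion_criteria="500-word draft the reviewer can evaluate",
)
acknowledge(plan, {"task", "success_criteria", "time_constraint"})
assert plan.acknowledged
```

The acknowledgment step is deliberately capable of failing: a handoff that cannot be acknowledged is caught at the interface, not three agents downstream.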
Where protocols break and what to do about it
Protocols fail in three characteristic ways, and knowing these failure modes is as important as knowing the protocol itself.
Type mismatch. The sending agent performs one communicative act, and the receiving agent interprets it as another. Your planning agent makes a suggestion ("we could try X"), and your execution agent treats it as a directive ("we are doing X"). In FIPA-ACL terms, a propose was received as a request. The fix is explicit type labeling — when you communicate between agents, state the type: "This is a suggestion, not a commitment."
Payload degradation. The structured information degrades into unstructured noise as it passes between agents. Your research agent produces detailed, organized notes. By the time those notes reach your writing agent — after sitting in a tab for three hours while you did other things — you have forgotten the organization, lost the hierarchy, and are left with a pile of text. The fix is persistent structure: write the protocol output down in a format that survives the passage of time and context switches.
Context starvation. The receiving agent gets the payload but not the context. Your execution agent knows what to do but not why, so it optimizes for the wrong thing. Your review agent has the deliverable but not the success criteria, so it evaluates against default standards instead of the specific ones the planner intended. The fix is mandatory context fields — every handoff must include not just the artifact but the purpose, constraints, and evaluation criteria.
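The three failure modes suggest a common defense: validate every handoff before accepting it. A sketch of such a check, using this lesson's message types and mandatory context fields (the field names are ours, not a standard):

```python
ALLOWED_TYPES = {"request", "assertion", "proposal", "status", "handoff"}
REQUIRED_CONTEXT = {"purpose", "constraints", "evaluation_criteria"}


def validate_handoff(message: dict) -> list:
    """Return a list of protocol violations; an empty list means safe to accept."""
    errors = []
    # Guard against type mismatch: the communicative act must be labeled.
    if message.get("type") not in ALLOWED_TYPES:
        errors.append("type mismatch risk: communicative act not labeled")
    # Guard against payload degradation: structure, not a pile of text.
    if not isinstance(message.get("payload"), dict) or not message.get("payload"):
        errors.append("payload degradation: no structured payload")
    # Guard against context starvation: mandatory context fields.
    missing = REQUIRED_CONTEXT - set(message.get("context", {}))
    if missing:
        errors.append(f"context starvation: missing {sorted(missing)}")
    return errors


ok = validate_handoff({
    "type": "handoff",
    "payload": {"task": "revise draft"},
    "context": {"purpose": "tighten argument", "constraints": "keep length",
                "evaluation_criteria": "reviewer checklist"},
})
assert ok == []
```

Running the same check on an empty message returns all three violations, which is exactly the point: the protocol makes its own failure modes detectable.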
From shared state to structured conversation
In the previous lesson (L-0506), you learned to define how information flows between agents that share state. This lesson takes that further: information flow is necessary but not sufficient. The information must be structured as communicative acts — typed, formatted, contextualized, and acknowledged — or it degrades into noise.
The core principle is simple and non-negotiable: the output of one agent must be engineered to be the input of the next. Not adapted. Not translated. Not interpreted. Engineered. This means the sending agent must know who the receiving agent is, what that agent needs, and in what format it needs it. It means the receiving agent must be able to consume the message without reconstruction, guessing, or improvisation.
In the next lesson (L-0508), you will meet the orchestrator agent — a meta-agent whose entire purpose is to coordinate other agents by deciding which should run when. The orchestrator depends on protocols. It cannot route work between agents if those agents do not speak a shared language. The communication protocols you define now become the infrastructure the orchestrator manages. Build them well, and orchestration becomes possible. Build them poorly — or not at all — and no orchestrator can save you.
Sources:
- Austin, J.L. (1962). How to Do Things with Words. Oxford University Press.
- Searle, J.R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.
- FIPA (2000). FIPA Agent Communication Language Specifications. Foundation for Intelligent Physical Agents.
- Tomasello, M. (2014). A Natural History of Human Thinking. Harvard University Press.
- Lu, Y., et al. (2023). "Shared intentionality modulates interpersonal neural synchronization at the establishment of communication system." Communications Biology, 6, 830.
- Desai, A., et al. (2025). "A Survey of Agent Interoperability Protocols: MCP, ACP, A2A, and ANP." arXiv:2505.02279.
- Li, G., et al. (2023). "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework." arXiv:2308.00352.