How AI Agents Actually Work: The Message Loop Explained
Scenario
- Context: You're building an AI agent and need to understand what happens behind the scenes
- Goal: Understand the message structure, tool calling mechanics, and agent loop that power AI coding assistants
- Anti-pattern: Treating agents as black boxes without understanding their mechanics
Conversation
What Is an Agent?
An AI agent is an LLM in a loop that can take actions. The core pattern:
```python
while not done:
    response = llm.generate(messages)
    if response.has_tool_calls:
        results = execute_tools(response.tool_calls)
        messages.append(response)
        messages.append(results)
    else:
        done = True

return response.text
```
That's it. Everything else—Claude Code, Cursor, Copilot agents—is variations on this loop.
The Message Array: An Agent's Memory
Every agent conversation is a growing array of messages. Each message has a role:
| Role | Purpose |
|---|---|
| `system` | Instructions, personality, constraints (set once) |
| `user` | Human input |
| `assistant` | LLM responses (may include tool calls) |
| `tool` | Results from tool execution |
The LLM sees the entire array on every turn. This is how it "remembers" context.
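To make the structure concrete, here is a minimal TypeScript sketch of these message shapes, loosely modeled on OpenAI-style chat APIs; exact field names vary by provider, so treat this as an illustration rather than any specific SDK's types:

```typescript
// Minimal message shapes, loosely modeled on OpenAI-style chat APIs.
// Field names are illustrative; check your provider's SDK for exact types.
type SystemMessage = { role: "system"; content: string };
type UserMessage = { role: "user"; content: string };

type ToolCall = {
  id: string; // links the call to its result
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type AssistantMessage = {
  role: "assistant";
  content: string | null; // null when the turn is only tool calls
  tool_calls?: ToolCall[];
};

type ToolMessage = {
  role: "tool";
  tool_call_id: string; // must match a ToolCall id
  content: string; // the tool's result, serialized as text
};

type Message = SystemMessage | UserMessage | AssistantMessage | ToolMessage;
```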
Example: A Simple File-Reading Agent
Let's trace through exactly what happens when you ask an agent to read a file.
Initial state:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a coding assistant. Use tools to help the user."
    },
    {
      "role": "user",
      "content": "What's in src/config.ts?"
    }
  ],
  "tools": [
    {
      "name": "read_file",
      "description": "Read contents of a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": { "type": "string", "description": "File path to read" }
        },
        "required": ["path"]
      }
    }
  ]
}
```
Step 1: LLM Generates Response with Tool Call
The LLM doesn't output text—it outputs a structured tool call:
```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "read_file",
        "arguments": "{\"path\": \"src/config.ts\"}"
      }
    }
  ]
}
```
Step 2: Your Code Executes the Tool
The agent framework (your code) intercepts this and runs the actual tool:
```typescript
// Agent loop pseudocode
const response = await llm.chat(messages, tools);

if (response.tool_calls) {
  // Append the assistant message first, so its tool_calls precede
  // the tool results that answer them
  messages.push(response);

  for (const call of response.tool_calls) {
    // Actually execute the tool
    const result = await executeTool(
      call.function.name,
      JSON.parse(call.function.arguments)
    );

    // Append the tool result, linked back by tool_call_id
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: result // The actual file contents
    });
  }

  // Continue the loop - call the LLM again with the tool results
  return agentLoop(messages, tools); // Recurse
}
```
Step 3: Messages Array After Tool Execution
Now the messages array has grown:
```json
[
  { "role": "system", "content": "You are a coding assistant..." },
  { "role": "user", "content": "What's in src/config.ts?" },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [{ "id": "call_abc123", "function": { "name": "read_file", ... } }]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "export const config = {\n  port: 3000,\n  debug: true\n};"
  }
]
```
Step 4: LLM Generates Final Response
With the file contents in context, the LLM can now answer:
```json
{
  "role": "assistant",
  "content": "The `src/config.ts` file exports a config object with two properties:\n\n- `port`: 3000 (the server port)\n- `debug`: true (debug mode enabled)\n\nThis is a simple configuration file for your application settings."
}
```
Key Insight: The LLM Decides Everything
The LLM makes all decisions:
- Whether to call a tool (or just respond)
- Which tool to call
- What arguments to pass
- When to stop (respond without tool calls)
Your code just executes what it asks for and feeds results back.
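What `executeTool` actually does is entirely up to your code. Here's a minimal sketch of one way to wire it, assuming a plain name-to-handler registry; the registry shape and error handling are assumptions for illustration, not a standard API:

```typescript
import { promises as fs } from "fs";

// Hypothetical tool registry: maps a tool name to an async handler.
const toolHandlers: Record<string, (args: any) => Promise<string>> = {
  read_file: async ({ path }) => fs.readFile(path, "utf8"),
  write_file: async ({ path, content }) => {
    await fs.writeFile(path, content, "utf8");
    return "File written successfully";
  },
};

async function executeTool(name: string, args: unknown): Promise<string> {
  const handler = toolHandlers[name];
  if (!handler) {
    // Return errors as text so the LLM can see them and recover
    return `Error: unknown tool "${name}"`;
  }
  try {
    return await handler(args);
  } catch (err) {
    return `Error: ${(err as Error).message}`;
  }
}
```

Returning errors as ordinary tool results, rather than throwing, lets the LLM read the failure and try a different approach on its next turn.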
Multi-Tool Chains: Reading Then Writing
Agents often chain multiple tools. Here's a "read file, modify it, write it back" flow:
User: "Add a timeout field to config.ts, default 30000"
Turn 1: LLM calls read_file("src/config.ts")
→ Tool returns file contents
Turn 2: LLM calls write_file("src/config.ts", "...updated contents...")
→ Tool returns "File written successfully"
Turn 3: LLM responds "Done! Added timeout: 30000 to the config."
Each turn adds to the messages array. By Turn 3, the LLM has seen:
- The original request
- Its decision to read
- The file contents
- Its decision to write
- Confirmation of the write
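Sketched as a messages array (with contents abbreviated and IDs invented for illustration), the context the LLM sees at Turn 3 looks roughly like this:

```typescript
// Rough shape of the messages array by Turn 3 (contents abbreviated)
const history = [
  { role: "system", content: "You are a coding assistant. Use tools to help the user." },
  { role: "user", content: "Add a timeout field to config.ts, default 30000" },
  { role: "assistant", tool_calls: [{ id: "call_1", function: { name: "read_file", arguments: '{"path": "src/config.ts"}' } }] },
  { role: "tool", tool_call_id: "call_1", content: "export const config = { ... };" },
  { role: "assistant", tool_calls: [{ id: "call_2", function: { name: "write_file", arguments: '{"path": "src/config.ts", "content": "..."}' } }] },
  { role: "tool", tool_call_id: "call_2", content: "File written successfully" },
  { role: "assistant", content: "Done! Added timeout: 30000 to the config." },
];
```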
Parallel Tool Calls
Modern LLMs can request multiple tools simultaneously:
```json
{
  "tool_calls": [
    { "id": "call_1", "function": { "name": "read_file", "arguments": "{\"path\": \"src/a.ts\"}" } },
    { "id": "call_2", "function": { "name": "read_file", "arguments": "{\"path\": \"src/b.ts\"}" } },
    { "id": "call_3", "function": { "name": "read_file", "arguments": "{\"path\": \"src/c.ts\"}" } }
  ]
}
```
Your agent should execute these in parallel for efficiency, then return all results:
```json
[
  { "role": "tool", "tool_call_id": "call_1", "content": "...contents of a.ts..." },
  { "role": "tool", "tool_call_id": "call_2", "content": "...contents of b.ts..." },
  { "role": "tool", "tool_call_id": "call_3", "content": "...contents of c.ts..." }
]
```
Context Accumulation: The Blessing and Curse
Every turn adds to the messages array:
- User messages
- Assistant responses
- Tool calls
- Tool results (which can be large!)
Problem: Context windows have limits (128K-200K tokens typically).
Solutions agents use:
- Truncate old messages - Drop earlier turns
- Summarize tool results - "File has 500 lines, showing first 100"
- Reference instead of inline - "See file contents in previous turn"
- Clear context - `/clear` commands start fresh
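As a concrete example of the first strategy, here's a minimal truncation sketch. It keeps the system prompt and drops the oldest turns until an estimated token count fits the budget; the characters-per-token estimate is a rough assumption, and a real implementation would use an actual tokenizer and keep each assistant tool-call message together with its tool results:

```typescript
// Crude truncation sketch: keep the system prompt, drop the oldest turns
// until the estimated token count fits the budget.
// Assumptions: messages[0] is the system prompt, and ~4 characters per token.
// A real agent would use a proper tokenizer and avoid separating an
// assistant tool_calls message from its tool results.
function estimateTokens(messages: Message[]): number {
  return messages.reduce(
    (sum, m) => sum + Math.ceil((m.content ?? "").length / 4),
    0
  );
}

function truncateHistory(messages: Message[], maxTokens: number): Message[] {
  const [system, ...rest] = messages;
  while (rest.length > 1 && estimateTokens([system, ...rest]) > maxTokens) {
    rest.shift(); // drop the oldest non-system message
  }
  return [system, ...rest];
}
```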
Tool Schemas: Teaching the LLM What's Available
Tools are defined with JSON Schema. The schema teaches the LLM:
- What the tool does (description)
- What parameters it needs
- Which parameters are required
```json
{
  "name": "search_codebase",
  "description": "Search for text patterns across all files. Use for finding function definitions, usages, or specific strings.",
  "parameters": {
    "type": "object",
    "properties": {
      "pattern": {
        "type": "string",
        "description": "Regex pattern to search for"
      },
      "file_glob": {
        "type": "string",
        "description": "Optional glob to filter files, e.g. '*.ts'"
      },
      "max_results": {
        "type": "integer",
        "description": "Maximum results to return",
        "default": 20
      }
    },
    "required": ["pattern"]
  }
}
```
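One way to keep schemas honest (an organizational pattern, not a required structure) is to define each tool's schema next to the handler that implements it, so the description the LLM reads can't drift from the code that actually runs:

```typescript
// Hypothetical pattern: schema and handler defined side by side.
const searchCodebaseTool = {
  schema: {
    name: "search_codebase",
    description:
      "Search for text patterns across all files. Use for finding function definitions, usages, or specific strings.",
    parameters: {
      type: "object",
      properties: {
        pattern: { type: "string", description: "Regex pattern to search for" },
        max_results: { type: "integer", description: "Maximum results to return", default: 20 },
      },
      required: ["pattern"],
    },
  },
  // The handler receives the parsed arguments the LLM produced.
  handler: async (args: { pattern: string; max_results?: number }): Promise<string> => {
    // ...run ripgrep, a search index, etc., and return results as text...
    return "...search results...";
  },
};
```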
The Complete Agent Loop
```typescript
async function agentLoop(
  messages: Message[],
  tools: Tool[],
  maxIterations = 20
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const response = await llm.chat({ messages, tools });

    // No tool calls = final response
    if (!response.tool_calls?.length) {
      return response.content;
    }

    messages.push(response); // Include assistant's tool_calls

    // Execute all tool calls (in parallel)
    const toolResults = await Promise.all(
      response.tool_calls.map(async (call) => ({
        role: "tool" as const,
        tool_call_id: call.id,
        content: await executeTool(
          call.function.name,
          JSON.parse(call.function.arguments)
        ),
      }))
    );
    messages.push(...toolResults);
  }

  throw new Error("Agent exceeded max iterations");
}
```
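Wiring it together might look like this; `llm`, `executeTool`, and the `tools` array are the assumed pieces from earlier, not a specific SDK:

```typescript
// Hypothetical usage of the loop above.
const messages: Message[] = [
  { role: "system", content: "You are a coding assistant. Use tools to help the user." },
  { role: "user", content: "What's in src/config.ts?" },
];

const answer = await agentLoop(messages, tools);
console.log(answer);
```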
Why This Matters for AI Engineers
Understanding the message loop lets you:
- Debug agent behavior - Inspect the messages array to see what the LLM saw
- Optimize context usage - Know what's eating your token budget
- Design better tools - Write descriptions that guide the LLM correctly
- Build guardrails - Intercept tool calls before execution
- Implement caching - Recognize repeated tool calls
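For instance, the guardrails point becomes concrete once you own the loop: you can vet each tool call before it runs. A minimal sketch, with an invented policy and function names:

```typescript
// Hypothetical guardrail: vet each tool call before executing it.
const WRITE_TOOLS = new Set(["write_file", "delete_file", "run_command"]);

async function executeToolWithGuardrails(call: ToolCall): Promise<string> {
  const args = JSON.parse(call.function.arguments);

  // Illustrative policy: only allow writes inside src/
  if (WRITE_TOOLS.has(call.function.name) && args.path && !args.path.startsWith("src/")) {
    return "Error: writes are only allowed inside src/";
  }

  return executeTool(call.function.name, args);
}
```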
The agent isn't magic—it's a loop with a very smart decision-maker inside.
Summary
| Concept | What It Is |
|---|---|
| Messages array | The agent's memory—grows each turn |
| Tool call | LLM's structured request to execute something |
| Tool result | Your code's response, fed back to LLM |
| Agent loop | Generate → execute tools → append results → repeat |
| Termination | LLM responds without tool calls |
| tool_call_id | Links results to requests (enables parallel calls) |
Every AI coding assistant—from simple chatbots to Claude Code—is built on this foundation.
Key Takeaways
- An agent is an LLM in a loop that can call tools and see their results
- The messages array is the agent's memory—it grows with each turn
- The LLM decides when to call tools vs when to respond
- tool_call_id links parallel tool results back to their requests
- Context accumulation is both powerful (memory) and limiting (token budget)
Sources
OpenAI Function Calling Documentation
OpenAI function calling enables LLMs to generate structured JSON that maps to external function calls. The model decides when to invoke functions based on conversation context. Critical protocol for building AI agents that interact with external systems, APIs, and tools.
- Function calling allows models to generate structured JSON for function arguments
- Models decide when to call functions based on user input
- Supports parallel function calls in a single response