🛠️ Building AI Agents
Building a reliable agentic system requires choosing the right architectural abstraction. This guide covers how to design, implement, and secure production-ready agents and workflows.
1. ⚖️ Workflows vs. Agents
Not every system requires a fully autonomous agent. When building, choose between structured Workflows (predictable code paths) and autonomous Agents (dynamic model paths).
| Dimension | Workflows | Agents |
|---|---|---|
| Orchestration | Hardcoded logic, state machines, and sequential conditions. | Dynamic planning, tool selection, and model self-reflection. |
| Predictability | High. Consistent, repeatable paths that are easy to test. | Lower. Paths vary based on model reasoning. |
| Cost & Latency | Low. Minimal LLM calls per execution cycle. | Higher. Multiple reasoning turns and tool invocations. |
| Best For | Structured data ingestion, report generation, predictable tasks. | Open-ended research, dynamic debugging, complex scheduling. |
🎯 Decision Criteria: When to Use Agents
Recommended Use Cases
- Exploratory Problem-Solving: Tasks requiring dynamic adjustments where the exact execution steps cannot be hardcoded in advance (e.g., searching web docs, debugging code).
- High-Context Scenarios: Tasks where the optimal next step is highly variable and depends on the specific outcome of the previous tool execution.
- Semantic Dispatching: Operations that require natural language understanding to select routes, tools, or policies dynamically.
When to Avoid (Choose Workflows Instead)
- High Predictability Demands: Critical systems where the path must be strictly audit-trailed, identical every run, and easily testable (e.g., transaction postings).
- Strict Latency SLAs: Real-time web requests where multi-turn LLM agent reasoning would introduce unacceptable delays.
- Cost-Sensitive Pipelines: High-throughput batch operations where running recursive reasoning loops would cause API costs to balloon.
- Simple Rules-Based Logic: Straightforward tasks that can be fully resolved using standard regex, if-else logic, or simple API integrations.
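The criteria above can be read as a short checklist. The sketch below encodes them as a function, purely for illustration: `TaskProfile`, `choose_architecture`, and the 5000 ms latency floor are hypothetical names and values, not part of any library, and a real decision would weigh more factors than these four.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    steps_known_in_advance: bool  # can the execution path be hardcoded?
    needs_identical_runs: bool    # strict audit or repeatability demands
    latency_budget_ms: int        # end-to-end budget for one request
    high_volume_batch: bool       # cost-sensitive, high-throughput pipeline


def choose_architecture(task: TaskProfile, agent_latency_floor_ms: int = 5000) -> str:
    """Return 'workflow' or 'agent' based on the checklist above.

    agent_latency_floor_ms is an illustrative placeholder for how long a
    multi-turn agent run is assumed to take; measure your own system.
    """
    if task.steps_known_in_advance:
        return "workflow"  # simple rules-based logic needs no agent
    if task.needs_identical_runs or task.high_volume_batch:
        return "workflow"  # predictability or cost rules out reasoning loops
    if task.latency_budget_ms < agent_latency_floor_ms:
        return "workflow"  # multi-turn reasoning would exceed the latency budget
    return "agent"


research_task = TaskProfile(
    steps_known_in_advance=False,
    needs_identical_runs=False,
    latency_budget_ms=60_000,
    high_volume_batch=False,
)
print(choose_architecture(research_task))
```

Checking the workflow conditions first reflects the guide's default: reach for an agent only when none of the cheaper, more predictable options fit.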
2. 🔗 Core Workflow Patterns
Workflows use static code structures to coordinate LLM calls. The four foundational patterns are:
A. Prompt Chaining
Sequentially piping the output of one LLM call directly as the input to the next.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_prompt_chain(user_query: str) -> str:
    # Step 1: Draft a response
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Draft an answer: {user_query}"}],
    ).choices[0].message.content

    # Step 2: Pass the draft to a second call that formats it
    formatted = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Format this draft in Markdown:\n{draft}"}],
    ).choices[0].message.content
    return formatted
```
B. Routing
Classifying an input and redirecting execution to a specialized path or prompt.
```python
def run_router(user_query: str) -> str:
    # Classify the input query. handle_code_query and handle_billing_query are
    # application-specific handlers assumed to be defined elsewhere.
    route = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Classify query as: 'code' or 'billing'."},
            {"role": "user", "content": user_query},
        ],
    ).choices[0].message.content.strip().lower()

    # Redirect to the corresponding handler; anything not labeled 'code' goes to billing
    if "code" in route:
        return handle_code_query(user_query)
    return handle_billing_query(user_query)
```
C. Parallelization
Running independent LLM calls concurrently and aggregating the results.
```python
import asyncio

from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # async counterpart of the OpenAI client


async def fetch_insights(topic: str, perspective: str) -> str:
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Analyze {topic} from {perspective} view."}],
    )
    return response.choices[0].message.content


async def run_parallel_analysis(topic: str) -> str:
    # Run both analyst calls concurrently; gather returns results in argument order
    results = await asyncio.gather(
        fetch_insights(topic, "Security"),
        fetch_insights(topic, "Performance"),
    )
    return f"Security: {results[0]}\nPerformance: {results[1]}"
```
D. Evaluator-Optimizer
A generator creates a draft, and an evaluator reviews it, looping back until criteria are met.
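```python
def run_eval_loop(target_task: str, max_attempts: int = 3) -> str:
    # Generate the initial draft from the task
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Write a draft for this task: {target_task}"}],
    ).choices[0].message.content

    for attempt in range(max_attempts):
        # Evaluate the draft against the task
        review = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Task: {target_task}\nRate this draft: {draft}\nOutput APPROVED or feedback."}],
        ).choices[0].message.content
        if "APPROVED" in review:
            return draft

        # Optimize the draft based on the feedback
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Improve draft: {draft} using feedback: {review}"}],
        ).choices[0].message.content

    # Attempts exhausted: return the latest draft even though it was never approved
    return draft
```
3. 🧠 Agent Architecture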
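An autonomous agent runs in a loop: the model reasons about the current state, selects a tool, observes the tool's output, and corrects course when a step fails.
The loop continues until the LLM decides the user’s objective is fully resolved or a turn limit is reached.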
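The loop can be sketched without any model API at all. In the example below, `run_agent_loop`, the decision dictionary format, and `scripted_model` are hypothetical constructs chosen for illustration; `scripted_model` is a stand-in for an LLM so the sketch runs offline.

```python
from typing import Callable

# Hypothetical decision format: the "model" returns either
# {"action": "<tool name>", "input": "<argument>"} or {"final": "<answer>"}.


def run_agent_loop(
    decide: Callable[[list], dict],
    tools: dict,
    goal: str,
    max_turns: int = 5,
) -> str:
    history = [f"goal: {goal}"]
    for _ in range(max_turns):
        decision = decide(history)  # reason over everything observed so far
        if "final" in decision:     # the model judges the objective resolved
            return decision["final"]
        try:
            observation = tools[decision["action"]](decision["input"])
        except Exception as exc:
            # Surface the failure so the next turn can self-correct
            observation = f"error: {exc}"
        history.append(f"{decision['action']}({decision['input']}) -> {observation}")
    return "stopped: turn limit reached"


def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: look up the order once, then answer."""
    if len(history) == 1:
        return {"action": "lookup", "input": "ORD-1234"}
    return {"final": f"Done after seeing: {history[-1]}"}


answer = run_agent_loop(
    scripted_model,
    {"lookup": lambda order_id: "shipped"},
    "Where is my order?",
)
print(answer)  # prints "Done after seeing: lookup(ORD-1234) -> shipped"
```

The turn limit is what keeps a model that never emits a final answer from looping forever.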
4. 🛠️ Tool Design & Schema Registration
When registering tools with LLM APIs, tool descriptions act as part of the prompt: the model reads them to decide when and how to call each tool. Follow these principles:
- Tightly Scoped Types: Use strict primitives (strings, integers) and define JSON Schemas.
- Detailed Descriptions: Explicitly declare when the tool should and should not be invoked.
- Keep Tool Count Small: LLMs struggle to select from dozens of tools. Keep the active tool registry to roughly 10-15 tools at most to prevent selection errors and token bloat. If more are needed, partition them behind specialized router agents.
- Provide Few-Shot Tool Examples: If a model struggles to format tool parameters correctly, include example tool invocations and output responses directly in the system instructions.
```python
tools_schema = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Query database for customer orders. Use ONLY when querying order status, tracking links, or balances. Do NOT use for editing or creating items.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Alphanumeric order ID (e.g. 'ORD-1234')"}
                },
                "required": ["order_id"],
            },
        },
    }
]
```
For guidelines on packaging capabilities, refer to the Agent Skills & Capabilities page.
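One way to keep the active registry small is to group tools by domain and expose only the group a router selected. The sketch below assumes a hypothetical `TOOL_GROUPS` mapping and `select_active_tools` helper; apart from `query_database`, the tool names are placeholders, and the stub schemas omit descriptions and parameters for brevity.

```python
# Hypothetical grouping of tool names by domain.
TOOL_GROUPS = {
    "orders": ["query_database", "get_tracking_link"],
    "billing": ["get_invoice", "get_balance"],
    "code": ["search_docs", "run_tests"],
}

MAX_ACTIVE_TOOLS = 15  # upper bound suggested in the principles above


def select_active_tools(route: str, all_schemas: dict) -> list:
    """Return only the tool schemas that belong to the routed domain."""
    names = TOOL_GROUPS.get(route, [])
    active = [all_schemas[name] for name in names if name in all_schemas]
    if len(active) > MAX_ACTIVE_TOOLS:
        raise ValueError(f"route '{route}' exposes {len(active)} tools; split it further")
    return active


# Stub schemas keyed by tool name; real entries would carry full JSON Schemas.
ALL_SCHEMAS = {
    name: {"type": "function", "function": {"name": name}}
    for group in TOOL_GROUPS.values()
    for name in group
}

billing_tools = select_active_tools("billing", ALL_SCHEMAS)
print([schema["function"]["name"] for schema in billing_tools])
```

The returned list would be passed as the tool list for that request, so the model only ever chooses among the tools relevant to the routed domain.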
5. 💻 The ReAct Loop (Implementation)
Below is a working Python implementation of the ReAct (reason and act) loop. It catches tool execution exceptions and feeds the error message back to the LLM to trigger self-correction:
```python
import json

from openai import OpenAI

client = OpenAI()

# query_database is the application's own tool function, assumed to be defined
# elsewhere; tools_schema is the registration list from the Tool Design section.
tools_map = {"query_database": query_database}


def run_react_agent(query: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": query}]
    for turn in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools_schema,
        )
        message = response.choices[0].message
        messages.append(message)
        if not message.tool_calls:
            return message.content  # Final answer

        # Execute each requested tool call
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            try:
                # Parse arguments and run the tool; either step can fail on bad model output
                func_args = json.loads(tool_call.function.arguments)
                tool_output = str(tools_map[func_name](**func_args))
            except Exception as e:
                # Capture the exception and feed it back to the LLM for self-correction
                tool_output = f"Error: {e}. Please correct your parameters and retry."
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": func_name,
                "content": tool_output,
            })
    return "Error: Maximum agent loops exceeded."
```
6. 🧩 Memory Integration
To keep agent execution within context limits, write memory truncation logic directly into the orchestration loop.
```python
def truncate_active_context(messages: list, max_tokens: int) -> list:
    """Trim oldest messages if they exceed context bounds, keeping system instructions."""
    system_prompt = messages[0] if messages and messages[0]["role"] == "system" else None
    # Copy the history so the caller's list is never mutated
    history = list(messages[1:] if system_prompt else messages)
    # count_tokens is a token-counting helper assumed to be defined elsewhere
    while history and count_tokens(history) > max_tokens:
        history.pop(0)  # Trim oldest entry
    return [system_prompt] + history if system_prompt else history
```
For advanced long-term retrieval and episodic strategies, see Agent Memory Systems.
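The truncation function calls `count_tokens`, which this guide does not define. A rough stand-in is sketched below, assuming messages are dictionaries with a `content` key. The ratio of about four characters per token is a common rule of thumb for English text and the per-message overhead is a guess; an exact count requires the model's own tokenizer.

```python
import math


def count_tokens(messages: list, chars_per_token: float = 4.0, per_message_overhead: int = 4) -> int:
    """Estimate the token count of a list of chat messages.

    Both defaults are approximations (assumptions of this sketch), so leave
    headroom below the real context limit when using the estimate.
    """
    total = 0
    for message in messages:
        content = message.get("content") or ""  # tool-call messages may carry no text
        total += math.ceil(len(content) / chars_per_token) + per_message_overhead
    return total


estimate = count_tokens([
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "Where is order ORD-1234?"},
])
print(estimate)
```

Because the estimate can be off in either direction, setting `max_tokens` comfortably below the model's actual limit is the safer choice.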
7. 🤝 Human-in-the-Loop (HITL) Gateways
Destructive actions (e.g., executing transactions, mutating database records) require human authorization.
Implement this in code by separating tool execution into Proposed and Confirmed phases.
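A minimal sketch of the two-phase split follows. `ProposedAction`, `ApprovalGateway`, and the `refund` tool are hypothetical names for illustration; a production system would also persist pending proposals and record who approved them.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    tool_name: str
    arguments: dict
    action_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ApprovalGateway:
    def __init__(self, tools: dict):
        self._tools = tools
        self._pending = {}

    def propose(self, tool_name: str, arguments: dict) -> ProposedAction:
        """Proposed phase: record the request without executing anything."""
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        action = ProposedAction(tool_name, arguments)
        self._pending[action.action_id] = action
        return action

    def confirm(self, action_id: str, approved: bool):
        """Confirmed phase: run the tool only after an explicit human decision."""
        action = self._pending.pop(action_id)  # each proposal can be decided only once
        if not approved:
            return "Rejected by reviewer; no changes were made."
        return self._tools[action.tool_name](**action.arguments)


ledger = []


def refund(order_id: str, amount: int) -> str:
    ledger.append((order_id, amount))  # stands in for a destructive side effect
    return "refunded"


gateway = ApprovalGateway({"refund": refund})
proposal = gateway.propose("refund", {"order_id": "ORD-1234", "amount": 10})
print(len(ledger))  # 0: proposing has no side effects
print(gateway.confirm(proposal.action_id, approved=True))  # prints "refunded"
```

In an agent loop, the tool handler would call `propose` and return the pending `action_id` to the model, while `confirm` is invoked only from the human-facing approval interface.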
8. 🛡️ Production Guardrails
Protect your systems from runaway costs and logic loops using these constraints:
| Guardrail | Threat | Mitigation |
|---|---|---|
| Max Iterations | Infinite execution loops | Set a hard limit on turns (e.g., max_turns = 5). |
| Schema Validation | Injection of invalid arguments | Enforce schema parsing using Pydantic and pass parsing failures back to the LLM. |
| API Rate Limits | Sudden token spend spikes | Cap model requests and token spend per session. |
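The schema validation row can be sketched as follows. To stay dependency-free, this example swaps in a small standard-library check in place of Pydantic; `ArgumentError`, `QUERY_DATABASE_SPEC`, `parse_arguments`, and `validate_or_report` are hypothetical names, and the spec format (field name mapped to expected type) is an assumption of the sketch.

```python
import json


class ArgumentError(ValueError):
    """Raised when model-supplied tool arguments fail validation."""


# Hypothetical spec for the query_database tool: field name -> expected type.
QUERY_DATABASE_SPEC = {"order_id": str}


def parse_arguments(raw_json: str, spec: dict) -> dict:
    try:
        args = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ArgumentError(f"arguments are not valid JSON: {exc.msg}") from exc
    if not isinstance(args, dict):
        raise ArgumentError("arguments must be a JSON object")
    missing = sorted(set(spec) - set(args))
    unexpected = sorted(set(args) - set(spec))
    if missing:
        raise ArgumentError(f"missing required fields: {missing}")
    if unexpected:
        raise ArgumentError(f"unexpected fields: {unexpected}")
    for name, expected in spec.items():
        if not isinstance(args[name], expected):
            raise ArgumentError(
                f"field '{name}' must be {expected.__name__}, got {type(args[name]).__name__}"
            )
    return args


def validate_or_report(raw_json: str, spec: dict):
    """Return (args, None) on success or (None, message_for_the_model) on failure."""
    try:
        return parse_arguments(raw_json, spec), None
    except ArgumentError as exc:
        return None, f"Error: {exc}. Please correct your parameters and retry."


args, error = validate_or_report('{"order_id": 42}', QUERY_DATABASE_SPEC)
print(error)
```

The error string would be sent back as the tool message content, giving the model a specific reason to retry with corrected arguments.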
9. 🚀 Production Considerations
- Code Sandboxing: Execute model-generated code in isolated sandboxes using services like E2B or secure Docker containers.
- Observability: Log step-by-step tool traces, latencies, and token costs per session. See Evaluation Tools for details.
- Caching: Cache repeated tool execution results or prompt headers to lower costs and reduce latency.
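As one illustration of the caching point, the sketch below memoizes tool results by tool name and arguments. `ToolResultCache` and `lookup_status` are hypothetical names, and the approach is only appropriate for read-only, deterministic tools; it has no expiry, so stale results are possible if the underlying data changes.

```python
import json


class ToolResultCache:
    """In-memory cache keyed by tool name and JSON-serialized arguments."""

    def __init__(self, tools: dict):
        self._tools = tools
        self._results = {}
        self.hits = 0

    def call(self, name: str, arguments: dict):
        # Sorting keys makes logically equal argument dicts map to the same cache key
        key = f"{name}:{json.dumps(arguments, sort_keys=True)}"
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = self._tools[name](**arguments)
        self._results[key] = result
        return result


backend_calls = []


def lookup_status(order_id: str) -> str:
    backend_calls.append(order_id)  # track how often the real tool runs
    return f"{order_id}: shipped"


cache = ToolResultCache({"lookup_status": lookup_status})
first = cache.call("lookup_status", {"order_id": "ORD-1234"})
second = cache.call("lookup_status", {"order_id": "ORD-1234"})
print(first == second, len(backend_calls), cache.hits)  # prints "True 1 1"
```

Tools that mutate state, such as the transactional tools discussed under Human-in-the-Loop, should bypass a cache like this entirely.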
10. 💼 Real-World Agent Applications
When combining these architectural patterns, keep domain constraints in mind:
- Customer Support (Router + memory + HITL): Use a router to classify incoming user queries. Direct simple FAQs to retrieval workflows (chains). Route billing or high-risk account modifications to a stateful ReAct agent equipped with transactional database tools, and enforce a Human-in-the-Loop approval queue before applying permanent changes.
- Code Development (Reflection Loop): Configure a generator model to draft software and an evaluator model (or test suite run inside a sandboxed E2B container) to test the output. Feed all compilation errors and test tracebacks back into the generator’s context for recursive self-correction.
🔗 Related Sections
- Anatomy of AI Agents — Conceptual components breakdown.
- Agent Memory Systems — Core sliding window and RAG memory logic.
- Agent Skills & Capabilities — Designing reliable schemas.
- Multi-Agent Systems — Supervisor, Pipeline, and Debate architectures.
- Prompt Engineering — Advanced prompting tactics.