
🛠️ Building AI Agents

Building a reliable agentic system requires choosing the right architectural abstraction. This guide covers how to design, implement, and secure production-ready agents and workflows.


1. ⚖️ Workflows vs. Agents

Not every system requires a fully autonomous agent. When building, choose between structured Workflows (predictable code paths) and autonomous Agents (dynamic model paths).

| Dimension | Workflows | Agents |
| --- | --- | --- |
| Orchestration | Hardcoded logic, state machines, and sequential conditions. | Dynamic planning, tool selection, and model self-reflection. |
| Predictability | High. Consistent, repeatable paths that are easy to test. | Lower. Paths vary based on model reasoning. |
| Cost & Latency | Low. Minimal LLM calls per execution cycle. | Higher. Multiple reasoning turns and tool invocations. |
| Best For | Structured data ingestion, report generation, predictable tasks. | Open-ended research, dynamic debugging, complex scheduling. |

🎯 Decision Criteria: When to Use Agents

  • Exploratory Problem-Solving: Tasks requiring dynamic adjustments where the exact execution steps cannot be hardcoded in advance (e.g., searching web docs, debugging code).
  • High-Context Scenarios: Tasks where the optimal next step is highly variable and depends on the specific outcome of the previous tool execution.
  • Semantic Dispatching: Operations that require natural language understanding to select routes, tools, or policies dynamically.

When to Avoid (Choose Workflows Instead)

  • High Predictability Demands: Critical systems where the path must be fully auditable, identical on every run, and easily testable (e.g., transaction postings).
  • Strict Latency SLAs: Real-time web requests where multi-turn LLM agent reasoning would introduce unacceptable delays.
  • Cost-Sensitive Pipelines: High-throughput batch operations where running iterative reasoning loops would cause API costs to balloon.
  • Simple Rules-Based Logic: Straightforward tasks that can be fully resolved with regular expressions, if-else logic, or simple API integrations.

2. 🔗 Core Workflow Patterns

Workflows use static code structures to coordinate LLM calls. Four foundational patterns are:

A. Prompt Chaining

Sequentially piping the output of one LLM call directly as the input to the next.

```python
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def run_prompt_chain(user_query: str) -> str:
    # Step 1: Draft a response
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Draft an answer: {user_query}"}]
    ).choices[0].message.content

    # Step 2: Format the draft; the output of step 1 is the input to step 2
    formatted = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Format this draft in Markdown:\n{draft}"}]
    ).choices[0].message.content
    return formatted
```
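Workflows often add a programmatic gate between steps, so a bad intermediate output stops the chain instead of propagating to the next call. The sketch below is illustrative: `llm` is a hypothetical callable standing in for any client call (which also makes the chain testable offline), and the two checks (non-empty draft, word limit) are assumed examples of gate conditions.

```python
from typing import Callable

# Hypothetical interface: any function that sends a prompt to a model and returns its text reply
LLM = Callable[[str], str]


class ChainGateError(Exception):
    """Raised when an intermediate output fails a programmatic check."""


def run_gated_chain(user_query: str, llm: LLM, max_words: int = 200) -> str:
    # Step 1: draft an answer
    draft = llm(f"Draft an answer: {user_query}")

    # Gate: plain-code checks between steps stop a bad draft from propagating
    if not draft.strip():
        raise ChainGateError("Step 1 returned an empty draft")
    if len(draft.split()) > max_words:
        raise ChainGateError(f"Draft exceeds {max_words} words")

    # Step 2: format the validated draft
    return llm(f"Format this draft in Markdown:\n{draft}")
```

Passing a fake `llm` in unit tests exercises the gate logic without any network calls.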

B. Routing

Classifying an input and redirecting execution to a specialized path or prompt.

```python
from openai import OpenAI

client = OpenAI()

def run_router(user_query: str) -> str:
    # Step 1: Classify the query into a route label
    route = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": "Classify the query. Reply with exactly one word: code or billing."},
                  {"role": "user", "content": user_query}]
    ).choices[0].message.content.strip().lower()

    # Step 2: Redirect to the corresponding handler
    # (handle_code_query and handle_billing_query are application-specific, not shown)
    if "code" in route:
        return handle_code_query(user_query)
    return handle_billing_query(user_query)
```
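The router above matches with a substring test, so a reply such as "this is not about code" would still reach the code handler. A stricter variant normalizes the label, accepts only exact matches, and falls back to a default route. In this sketch `classify` is a hypothetical callable returning the classifier model's raw reply, and the normalization rules are assumptions.

```python
from typing import Callable

# Hypothetical interfaces: a classifier returning raw model text, and route handlers
Classifier = Callable[[str], str]
Handler = Callable[[str], str]


def parse_route(raw_label: str, allowed: set[str], default: str) -> str:
    # Normalize: trim whitespace and quotes, lowercase, drop trailing punctuation
    label = raw_label.strip().strip("'\"").lower().rstrip(".!")
    # Accept only an exact match; anything else goes to the fallback route
    return label if label in allowed else default


def route_query(user_query: str, classify: Classifier, handlers: dict[str, Handler], default: str) -> str:
    label = parse_route(classify(user_query), set(handlers), default)
    return handlers[label](user_query)
```

Routing unknown labels to a safe default (for example, a human queue) keeps a confused classifier from silently picking a specialized path.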

C. Parallelization

Running independent LLM calls concurrently and aggregating the results.

```python
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def fetch_insights(topic: str, perspective: str) -> str:
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Analyze {topic} from a {perspective} perspective."}]
    )
    return response.choices[0].message.content

async def run_parallel_analysis(topic: str) -> str:
    # Run independent analysts concurrently; gather returns results in argument order
    security, performance = await asyncio.gather(
        fetch_insights(topic, "security"),
        fetch_insights(topic, "performance"),
    )
    # Aggregate the results
    return f"Security: {security}\nPerformance: {performance}"

# Usage: print(asyncio.run(run_parallel_analysis("our caching layer")))
```
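The example above runs different subtasks side by side (sectioning). Another common form of parallelization is voting: run the same prompt several times and aggregate the answers. This sketch assumes a hypothetical async `llm` callable, normalizes answers by lowercasing, and uses an `asyncio.Semaphore` to cap concurrent requests, which helps stay within provider rate limits.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

# Hypothetical interface: an async function that sends a prompt and returns the reply text
AsyncLLM = Callable[[str], Awaitable[str]]


async def vote(prompt: str, llm: AsyncLLM, n: int = 5, max_concurrency: int = 3) -> tuple[str, float]:
    semaphore = asyncio.Semaphore(max_concurrency)  # Cap in-flight requests

    async def one_sample() -> str:
        async with semaphore:
            return (await llm(prompt)).strip().lower()

    # Run the same prompt n times concurrently
    answers = await asyncio.gather(*(one_sample() for _ in range(n)))
    # Majority vote: the most common answer and the share of samples that agreed
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n
```

The agreement share is a cheap confidence signal: a low value can trigger escalation instead of acting on a split vote.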

D. Evaluator-Optimizer

A generator creates a draft, and an evaluator reviews it, looping back until criteria are met.

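```python
from openai import OpenAI

client = OpenAI()

def run_eval_loop(target_task: str, max_attempts: int = 3) -> str:
    # Generate the initial draft from the task itself
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Complete this task:\n{target_task}"}]
    ).choices[0].message.content

    for _ in range(max_attempts):
        # Evaluate the draft against the original task
        review = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Task:\n{target_task}\n\nDraft:\n{draft}\n\n"
                "If the draft fully satisfies the task, reply with only APPROVED. "
                "Otherwise, reply with specific feedback."
            )}]
        ).choices[0].message.content

        if review.strip().upper().startswith("APPROVED"):
            return draft

        # Optimize the draft based on the evaluator's feedback
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Task:\n{target_task}\n\nImprove this draft:\n{draft}\n\nFeedback:\n{review}"
            )}]
        ).choices[0].message.content
    return draft  # Latest draft; not approved within max_attempts
```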
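Keyword matching on APPROVED is fragile, because the evaluator may place the word inside a longer sentence. One sturdier option is to ask the evaluator for JSON and fail closed when parsing fails. The reply shape `{"approved": bool, "feedback": str}` and the `Verdict` type below are assumptions for illustration; a JSON output mode in the model API, if available, can make such replies more reliable.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    approved: bool
    feedback: str


def parse_verdict(raw: str) -> Verdict:
    """Parse an evaluator reply of the assumed form {"approved": bool, "feedback": str}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output counts as a rejection, so the loop never approves by accident
        return Verdict(False, f"Evaluator returned invalid JSON: {raw[:200]}")
    if not isinstance(data, dict):
        return Verdict(False, "Evaluator JSON was not an object")
    approved = data.get("approved") is True  # Only a literal JSON true counts as approval
    return Verdict(approved, str(data.get("feedback", "")))
```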

3. 🧠 Agent Architecture

An autonomous agent works in a loop: the LLM decides on an action (usually a tool call), observes the tool's output, and self-corrects based on what it sees.

The loop continues until the LLM decides the user’s objective is fully resolved.


4. 🛠️ Tool Design & Schema Registration

When registering tools with LLM APIs, treat tool and parameter descriptions as part of the prompt: the model reads them to decide when and how to call each tool. Follow these principles:

  • Tightly Scoped Types: Use strict primitives (strings, integers) and define JSON Schemas.
  • Detailed Descriptions: Explicitly declare when the tool should and should not be invoked.
  • Keep Tool Count Small: LLMs struggle to select from dozens of tools. Keep the active tool registry under 10-15 tools to prevent selection errors and token bloat. If more are needed, partition them behind specialized router agents.
  • Provide Few-Shot Tool Examples: If a model struggles to format tool parameters correctly, include example tool invocations and output responses directly in the system instructions.

```python
tools_schema = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Query database for customer orders. Use ONLY when querying order status, tracking links, or balances. Do NOT use for editing or creating items.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Alphanumeric order ID (e.g. 'ORD-1234')"}
                },
                "required": ["order_id"]
            }
        }
    }
]
```
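Hand-written schemas like the one above can drift from the functions they describe. The sketch below derives an OpenAI-style tool definition from a function's type hints and docstring, and refuses parameters that lack a strict primitive annotation. `tool_schema` is a hypothetical helper that covers only `str`, `int`, `float`, and `bool`; Pydantic's `model_json_schema()` is a fuller alternative.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Map Python annotations to JSON Schema primitive types (deliberately small)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func: Callable[..., Any], param_docs: dict[str, str]) -> dict:
    """Build an OpenAI-style function tool definition from a type-annotated function."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(name))
        if json_type is None:
            # Fail fast on loose or missing types instead of emitting a vague schema
            raise TypeError(f"Parameter {name!r} needs a str/int/float/bool annotation")
        properties[name] = {"type": json_type, "description": param_docs.get(name, "")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # Parameters without defaults are required
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```

The docstring becomes the tool description, so the "use ONLY when / do NOT use for" guidance lives next to the code it describes.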

For guidelines on packaging capabilities, refer to the Agent Skills & Capabilities page.


5. 💻 The ReAct Loop (Implementation)

Below is a minimal Python implementation of the ReAct runtime. When a tool call fails (malformed arguments, an unknown tool, or an exception inside the tool), it feeds the error message back to the LLM as the tool result so the model can self-correct:

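```python
import json
from openai import OpenAI

client = OpenAI()


def query_database(order_id: str) -> dict:
    # Hypothetical stub for illustration; replace with your real data-access call
    return {"order_id": order_id, "status": "shipped"}


tools_map = {"query_database": query_database}


def run_react_agent(query: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": query}]

    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools_schema,  # The schema list from the Tool Design section
        )
        message = response.choices[0].message
        messages.append(message)  # Keep the assistant turn, including its tool calls

        if not message.tool_calls:
            return message.content  # Final answer

        # Execute every tool call requested in this turn
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            try:
                # Parsing and lookup sit inside the try block so malformed JSON
                # arguments or an unknown tool name are also reported to the model
                func_args = json.loads(tool_call.function.arguments)
                result = tools_map[func_name](**func_args)
                tool_output = result if isinstance(result, str) else json.dumps(result)
            except Exception as e:
                # Feed the error back to the LLM so it can correct its next call
                tool_output = f"Error: {e}. Please correct your parameters and retry."

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": tool_output,  # Tool message content must be a string
            })
    return "Error: Maximum agent loops exceeded."
```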
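The loop above turns any exception into feedback. A variant validates arguments against the tool's registered JSON Schema before running it, which gives the model a more specific error message. `execute_tool_call` is a hypothetical helper, and its validator handles only flat objects with primitive types (the shape recommended in the Tool Design section); for richer schemas, a library such as Pydantic or jsonschema is the sturdier choice.

```python
import json
from typing import Any, Callable

# JSON Schema primitive types mapped to the Python types json.loads produces
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def execute_tool_call(name: str, raw_args: str, tools: dict[str, Callable[..., Any]],
                      schemas: dict[str, dict]) -> str:
    """Validate and run one tool call, always returning a string for the tool message."""
    if name not in tools:
        return f"Error: unknown tool {name!r}. Available tools: {sorted(tools)}"
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return f"Error: arguments are not valid JSON ({e.msg})."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."
    params = schemas[name]["function"]["parameters"]
    missing = [p for p in params.get("required", []) if p not in args]
    if missing:
        return f"Error: missing required arguments: {missing}"
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            return f"Error: unexpected argument {key!r}."
        expected = _PY_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields
        is_bad_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or is_bad_bool):
            return f"Error: argument {key!r} must be of type {spec['type']}."
    try:
        result = tools[name](**args)
    except Exception as e:  # Surface runtime failures to the model instead of crashing the loop
        return f"Error: {type(e).__name__}: {e}"
    return result if isinstance(result, str) else json.dumps(result)
```

Inside the agent loop, the whole `try`/`except` body could then become `tool_output = execute_tool_call(func_name, tool_call.function.arguments, tools_map, {"query_database": tools_schema[0]})`.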

6. 🧩 Memory Integration

To keep agent execution within context limits, write memory truncation logic directly into the orchestration loop.

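```python
def truncate_active_context(messages: list[dict], max_tokens: int) -> list[dict]:
    """Trim the oldest messages until the history fits max_tokens, keeping system instructions.

    Assumes messages are plain dicts and that count_tokens() is a token-counting helper you supply.
    """
    system_prompt = messages[0] if messages and messages[0]["role"] == "system" else None
    history = messages[1:] if system_prompt else list(messages)  # Work on a copy

    while history and count_tokens(history) > max_tokens:
        history.pop(0)  # Trim the oldest entry
        # A "tool" result must follow the assistant message that requested it,
        # so also drop any tool results whose originating call was just trimmed
        while history and history[0]["role"] == "tool":
            history.pop(0)

    return [system_prompt] + history if system_prompt else history
```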
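`truncate_active_context` relies on a `count_tokens` helper that this page does not define. A crude stand-in is sketched below, assuming plain dict messages and the rule of thumb of roughly four characters per token for English text; the fixed overhead of 4 tokens per message is an assumed allowance, not any model's exact accounting. For accurate counts, use the tokenizer that matches your model (for OpenAI models, the `tiktoken` library).

```python
import json
import math


def count_tokens(messages: list[dict]) -> int:
    """Roughly estimate the tokens in a list of chat messages (about 4 characters per token)."""
    total = 0
    for message in messages:
        content = message.get("content") or ""
        if not isinstance(content, str):  # e.g. multi-part content lists
            content = json.dumps(content)
        # Assumed 4-token allowance per message for role and formatting markers
        total += 4 + math.ceil(len(content) / 4)
        if message.get("tool_calls"):
            # Tool-call arguments also consume context
            total += math.ceil(len(json.dumps(message["tool_calls"])) / 4)
    return total
```

An estimate that errs high is the safer direction here, because undercounting lets the request overflow the context window.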

For advanced long-term retrieval and episodic strategies, see Agent Memory Systems.


7. 🤝 Human-in-the-Loop (HITL) Gateways

Destructive actions (e.g., executing transactions, mutating database records) require human authorization.

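Implement this in code by separating tool execution into two phases: in the Proposed phase the agent can only queue a destructive call, and in the Confirmed phase the call runs after a human approves it.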
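A minimal in-memory sketch of the two phases follows. `ApprovalGateway`, `ProposedAction`, and the `PENDING_APPROVAL` message format are hypothetical names: the agent loop calls `request()` instead of invoking tools directly, and only a human-facing path calls `confirm()` or `reject()`. A production version would persist the queue and authenticate approvers.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ProposedAction:
    tool_name: str
    args: dict[str, Any]
    action_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"  # pending -> executed | rejected


class ApprovalGateway:
    """Holds destructive tool calls until a human confirms them (in-memory sketch)."""

    def __init__(self, tools: dict[str, Callable[..., Any]], destructive: set[str]):
        self.tools = tools
        self.destructive = destructive  # Names of tools that require human approval
        self.pending: dict[str, ProposedAction] = {}

    def request(self, tool_name: str, args: dict[str, Any]) -> str:
        """Called by the agent loop in place of a direct tool call."""
        if tool_name not in self.destructive:
            return str(self.tools[tool_name](**args))  # Safe tools run immediately
        # Proposed phase: record the call without executing it
        action = ProposedAction(tool_name, args)
        self.pending[action.action_id] = action
        # Returned to the model as the tool result so it can tell the user what happens next
        return f"PENDING_APPROVAL: action {action.action_id} awaits human review and has NOT been executed."

    def confirm(self, action_id: str) -> Any:
        """Confirmed phase: invoked from a human-facing UI or CLI, never exposed to the model."""
        action = self.pending.pop(action_id)
        action.status = "executed"
        return self.tools[action.tool_name](**action.args)

    def reject(self, action_id: str) -> None:
        self.pending.pop(action_id).status = "rejected"
```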


8. 🛡️ Production Guardrails

Protect your systems from runaway costs and logic loops using these constraints:

| Guardrail | Threat | Mitigation |
| --- | --- | --- |
| Max Iterations | Infinite execution loops | Set a hard limit on turns (e.g., max_turns = 5). |
| Schema Validation | Invalid or injected tool arguments | Enforce schema parsing (e.g., with Pydantic) and pass parsing failures back to the LLM. |
| API Rate Limits | Sudden token spend spikes | Throttle model requests and token spend per session over a time window. |
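The rate-limit row can be sketched as a per-session token budget over a sliding time window. `SessionTokenLimiter` is a hypothetical class; the budget and window length are parameters you would tune, and the injectable `clock` exists so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SessionTokenLimiter:
    """Caps LLM token spend per session over a sliding time window (in-memory sketch)."""

    def __init__(self, max_tokens: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.clock = clock
        self.events: dict[str, deque] = defaultdict(deque)  # session_id -> (timestamp, tokens)

    def _spent(self, session_id: str) -> int:
        now = self.clock()
        events = self.events[session_id]
        while events and now - events[0][0] >= self.window:
            events.popleft()  # Forget spend that has aged out of the window
        return sum(tokens for _, tokens in events)

    def allow(self, session_id: str, estimated_tokens: int) -> bool:
        # Check before calling the model; the caller backs off or fails when this is False
        return self._spent(session_id) + estimated_tokens <= self.max_tokens

    def record(self, session_id: str, tokens_used: int) -> None:
        # Record actual usage after the call returns
        self.events[session_id].append((self.clock(), tokens_used))
```

Call `allow()` with an estimated token count before each model request and `record()` with the actual usage afterwards (OpenAI chat responses report it in `response.usage.total_tokens`).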

9. 🚀 Production Considerations

  • Code Sandboxing: Execute model-generated code in isolated sandboxes using services like E2B or secure Docker containers.
  • Observability: Log step-by-step tool traces, latencies, and token costs per session. See Evaluation Tools for details.
  • Caching: Cache repeated tool execution results or shared prompt prefixes to lower costs and reduce latency.
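Caching tool results can be as simple as a time-to-live (TTL) wrapper around read-only tools. In this sketch, `ttl_cache` is a hypothetical decorator that keys on keyword arguments, matching how the ReAct loop above calls tools with `**func_args`. The store is unbounded, and destructive tools must never be cached.

```python
import functools
import json
import time
from typing import Any, Callable


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Cache a read-only tool's results for ttl_seconds, keyed on its keyword arguments."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        store: dict[str, tuple[float, Any]] = {}

        @functools.wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            # Tool arguments come from parsed JSON, so a sorted JSON dump is a stable key
            key = json.dumps(kwargs, sort_keys=True)
            hit = store.get(key)
            if hit is not None and clock() - hit[0] < ttl_seconds:
                return hit[1]  # Fresh cached result
            result = func(**kwargs)
            store[key] = (clock(), result)
            return result
        return wrapper
    return decorator
```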

10. 💼 Real-World Agent Applications

When combining these architectural patterns, keep domain constraints in mind:

  • Customer Support (Router + memory + HITL): Use a router to classify incoming user queries. Direct simple FAQs to retrieval workflows (chains). Route billing or high-risk account modifications to a stateful ReAct agent equipped with transactional database tools, and enforce a Human-in-the-Loop approval queue before applying permanent changes.
  • Code Development (Reflection Loop): Configure a generator model to draft software and an evaluator model (or a test suite run inside a sandboxed E2B container) to test the output. Feed all compilation errors and test tracebacks back into the generator’s context for iterative self-correction.
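For the code-development loop, the evaluator step can be as simple as running the generated code and its tests in a separate process and returning the traceback as feedback. The sketch assumes plain `assert`-based tests and a hypothetical `run_candidate` helper. A local subprocess is not a security boundary; for untrusted model output, run the same step inside E2B or a locked-down container as described above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code: str, test_code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run generated code plus its tests in a separate Python process.

    Returns (passed, feedback), where feedback is the traceback to send back to the generator.
    NOTE: a subprocess is NOT a sandbox; use E2B or a locked-down container for untrusted code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code + "\n")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout_s, cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False, f"Timed out after {timeout_s}s (possible infinite loop)."
    if proc.returncode == 0:
        return True, ""
    # Keep the tail of the traceback so the feedback fits in the generator's context
    return False, proc.stderr[-2000:]
```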


Developer Handbook 2026 © Exemplar.