
🧩 AI Agent Architectures: Workflows vs. Agents

In production generative AI systems, “agent” is not a single architectural pattern. Instead, system designs exist along a spectrum of autonomy, ranging from fully deterministic workflows (predefined execution graphs) to open-ended autonomous agents (LLM-driven tool calling loops).

Selecting the correct architecture is a balancing act between autonomy (the system’s ability to handle novel inputs) and predictability (the system’s guarantee of consistent, low-latency, and cost-effective execution).


📊 1. The Autonomy Spectrum

The two ends of this spectrum correspond to the two primary paradigms for LLM applications: Workflows and Agents.

  • Workflows orchestrate LLMs using deterministic programmatic steps. The application code dictates the control flow, using LLMs for transformations, extraction, or routing decisions.
  • Agents give control of the execution loop to the LLM. The LLM determines which tools to call, in what order, and when to terminate the loop and return an answer.
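
To make the workflow side concrete, here is a minimal sketch of a two-step prompt chain (summarize, then translate). The `llm` callable and the `fake_llm` stub are hypothetical stand-ins for a real model client, not part of any library; the point is only that application code fixes the order of the steps.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def summarize_then_translate(text: str, llm: LLM, target_language: str = "French") -> str:
    """Workflow: application code fixes the order of the two LLM steps."""
    summary = llm(f"Summarize in one sentence:\n{text}")
    # A programmatic gate between steps; the model never chooses what runs next.
    if not summary.strip():
        raise ValueError("Summarization step returned empty output")
    return llm(f"Translate into {target_language}:\n{summary}")


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without an API key."""
    instruction, _, body = prompt.partition("\n")
    return f"[{instruction.rstrip(':')}] {body}"
```

Calling `summarize_then_translate("Long report.", fake_llm)` runs both steps in the same order every time. In an agent, by contrast, the model's own output would decide which step runs next.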

Architectural Trade-offs

| Design Pattern | Autonomy Level | Primary Decision Maker | Latency and Cost | Best Production Use Case |
| --- | --- | --- | --- | --- |
| Prompt Chaining | None (Static) | Application Code | Low and Predictable | Linear pipelines (e.g., summarize then translate) |
| Router Agents | Low (Conditional) | LLM (Outputs route key) | Low | Support triage, intent classification |
| Orchestrator-Workers | Medium (Dynamic Split) | LLM (Task Planner) | Medium | Document analysis, multi-source research tasks |
| Autonomous ReAct | High (Open Loop) | LLM (Tool Call Selector) | High and Variable | Complex database querying, exploratory searches |
| Reflection Loop | High (Iterative Self-Fix) | LLM (Evaluator Model) | High (Multi-turn LLM) | Code generation, strict schema validation |

🔀 2. Deterministic Workflow Topologies

Workflows maximize reliability and are preferred when the steps required to complete a task are known beforehand.

A. Router Agents

A Router Agent uses an LLM to classify user intent and select a specialized downstream execution path. This pattern replaces complex, fragile regex rules with a semantic decision-maker.

To prevent parsing failures, production routers must use Structured Outputs (e.g., JSON Schema enforcement) rather than relying on parsing raw text strings.

from typing import Literal
from pydantic import BaseModel, Field
 
class RouteSelection(BaseModel):
    """Schema for routing customer support tickets to specialized agents."""
    route: Literal["billing", "technical_support", "general_inquiry"] = Field(
        description="Select the specialized route based on the ticket content"
    )
    confidence: float = Field(
        description="Confidence score for the route selection, between 0.0 and 1.0"
    )
    reasoning: str = Field(
        description="Detailed justification for why this route was selected"
    )
 
def handle_route(ticket_text: str) -> str:
    # Example demonstrating structured routing logic
    # In practice: response = client.beta.chat.completions.parse(..., response_format=RouteSelection)
    # router_decision = response.choices[0].message.parsed
    print(f"Routing ticket: '{ticket_text}'")
    
    # Mocking parser output for demonstration
    decision = RouteSelection(
        route="technical_support",
        confidence=0.95,
        reasoning="The ticket mentions database crash and SQL connection errors."
    )
    return decision.route
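
Once a route key has been produced, application code still has to act on it. The following is one possible sketch of that dispatch step. The handler functions, the `HANDLERS` table, and the 0.7 confidence threshold are illustrative assumptions rather than part of the schema above, and plain arguments are used instead of the Pydantic model so the block runs without third-party packages.

```python
from typing import Callable, Dict


# Hypothetical handlers; in a real system each would wrap a specialized prompt or service.
def handle_billing(ticket: str) -> str:
    return f"billing queue: {ticket}"


def handle_technical(ticket: str) -> str:
    return f"technical queue: {ticket}"


def handle_general(ticket: str) -> str:
    return f"general queue: {ticket}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": handle_billing,
    "technical_support": handle_technical,
    "general_inquiry": handle_general,
}


def dispatch(route: str, confidence: float, ticket: str, min_confidence: float = 0.7) -> str:
    """Application code, not the LLM, decides what runs for a given route key."""
    # Assumed policy: low-confidence or unknown routes fall back to the general path.
    if confidence < min_confidence or route not in HANDLERS:
        return handle_general(ticket)
    return HANDLERS[route](ticket)
```

Because the model only emits a key from a closed set, the set of reachable code paths stays fixed and auditable, which is what keeps this pattern on the workflow side of the spectrum.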

B. Orchestrator-Workers

The Orchestrator-Workers pattern uses a central planner LLM to break a complex goal down into independent sub-tasks, delegates those tasks to concurrent workers (programmatic prompts or sub-agents), and compiles the results.

This pattern is highly effective for gathering information across distinct databases, generating structured reports, and running parallel checks (e.g., auditing compliance clauses across multiple contracts concurrently).
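
The plan, fan-out, and compile stages can be sketched roughly as follows. `plan_subtasks` and `run_worker` are hypothetical stubs standing in for LLM calls, and a thread pool is just one assumed way to run independent workers concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def plan_subtasks(goal: str, sources: List[str]) -> List[str]:
    """Stub planner. A real orchestrator would ask an LLM to produce this list."""
    return [f"{goal} in {source}" for source in sources]


def run_worker(subtask: str) -> str:
    """Stub worker. A real worker would be a prompt or sub-agent call."""
    return f"done: {subtask}"


def orchestrate(goal: str, sources: List[str], max_workers: int = 4) -> str:
    subtasks = plan_subtasks(goal, sources)
    # Sub-tasks are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_worker, subtasks))  # map preserves input order
    # Compile step: a simple join here; often another LLM call in practice.
    return "\n".join(results)
```

For example, `orchestrate("audit clauses", ["contract_a", "contract_b"])` returns one result line per contract, in the order the planner listed them.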


🔄 3. Autonomous Reasoning Loops

When the execution path cannot be planned in advance, control must be handed to the LLM to decide actions dynamically.

A. ReAct / Tool-Calling Agents

The ReAct (Reason + Act) loop is the foundational pattern for autonomous agents. At each step, the agent:

  1. Thinks: Analyzes the current history to decide how to proceed.
  2. Acts: Invokes a tool (e.g., SQL query, Web Search API) with generated parameters.
  3. Observes: Receives the tool’s output (the observation) and appends it to its context window.

This cycle continues iteratively until the agent decides it has gathered enough information to output the final answer.

[!WARNING] Because autonomous loops are open-ended, they can get stuck in infinite loops (e.g., calling the same failing tool repeatedly). Production ReAct engines must enforce hard limits on maximum loop iterations and total token usage to prevent runaway API costs.
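
A minimal sketch of the think/act/observe cycle with a hard iteration cap might look like this. The action tuple format, the `Policy` type, and `scripted_policy` are assumptions made for illustration; in a real agent the policy would be an LLM tool-calling request, and a token budget could be enforced next to the step limit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action format: ("call", tool_name, argument) or ("finish", answer, "").
Action = Tuple[str, str, str]
Policy = Callable[[List[str]], Action]


def run_react(policy: Policy, tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    history: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, name, arg = policy(history)  # Think: choose the next action
        if kind == "finish":
            return name  # the second field carries the final answer
        if name not in tools:
            observation = f"error: unknown tool {name!r}"
        else:
            observation = tools[name](arg)  # Act: invoke the tool
        history.append(f"Action: {name}({arg})")
        history.append(f"Observation: {observation}")  # Observe: extend the context
    # Hard stop: the loop never runs more than max_steps iterations.
    raise RuntimeError(f"No final answer within {max_steps} steps")


def scripted_policy(history: List[str]) -> Action:
    """Stub standing in for the LLM: look something up once, then answer."""
    prefix = "Observation: "
    observations = [h for h in history if h.startswith(prefix)]
    if not observations:
        return ("call", "lookup", "capital of France")
    return ("finish", observations[-1][len(prefix):], "")
```

With `tools = {"lookup": lambda q: "Paris"}`, the scripted policy finishes in two steps. A policy that keeps calling the same tool hits the cap and raises instead of looping forever.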

B. Reflection & Self-Correction

A Reflection Agent uses two distinct LLM configurations, a Generator and an Evaluator, to build a self-correcting loop. The Generator creates a draft, the Evaluator checks it for syntax, style, or security violations, and the loop continues until the Evaluator approves or a maximum iteration limit is hit.

Below is a self-contained sketch of a generator-evaluator reflection loop for code. Both agents are mocked with simple string checks so the example runs without a model:

from typing import Dict, Any
 
class GeneratorAgent:
    def generate_draft(self, prompt: str, feedback: str = "") -> str:
        # Generate initial draft or update based on feedback
        if feedback:
            return f"def calculate_sum(a, b):\n    # Fixed issue: {feedback}\n    return a + b"
        return "def calculate_sum(a, b):\n    return a - b"  # Intentional bug in first draft
 
class EvaluatorAgent:
    def evaluate(self, code: str) -> Dict[str, Any]:
        # Validate code and provide logical verification feedback
        if "a - b" in code:
            return {"status": "failed", "feedback": "Subtracted instead of adding."}
        if "a + b" in code:
            return {"status": "passed", "feedback": ""}
        return {"status": "failed", "feedback": "Function calculate_sum not found."}
 
def run_reflection_loop(task: str, max_iterations: int = 3) -> str:
    generator = GeneratorAgent()
    evaluator = EvaluatorAgent()

    draft = generator.generate_draft(task)

    for iteration in range(1, max_iterations + 1):
        result = evaluator.evaluate(draft)
        if result["status"] == "passed":
            print(f"Iteration {iteration}: Evaluation passed!")
            return draft

        feedback = result["feedback"]
        print(f"Iteration {iteration}: Evaluation failed. Feedback: {feedback}")
        # Skip regeneration on the final pass: that draft would never be evaluated.
        if iteration < max_iterations:
            draft = generator.generate_draft(task, feedback=feedback)

    raise RuntimeError("Failed to generate correct output within maximum reflection iterations.")
 
# Example execution:
# final_output = run_reflection_loop("Write a sum function")

⏱️ 4. Stateful vs. Stateless Execution

When choosing where to host your agents, match the runtime to the agent's lifecycle:

Stateless Wrappers (Serverless/FaaS)

  • Architecture: Ephemeral stateless handlers (e.g., AWS Lambda, Google Cloud Run).
  • Fit: Prompt chains, simple routing agents, and quick single-turn classification.
  • Benefits: No hosting cost while idle, automatic scaling (within provider concurrency limits), and low operational overhead.
  • Limitations: Hard execution timeouts (e.g., a 15-minute maximum on AWS Lambda) and no persistent in-process memory between invocations.
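
As a rough illustration, a stateless handler for single-turn classification could look like the sketch below. The `(event, context)` signature follows the AWS Lambda Python handler convention. The event shape (a JSON string under `body`), the response shape, and the `classify` stub are assumptions for this example.

```python
import json
from typing import Any, Dict


def classify(text: str) -> str:
    """Stub for a single-turn LLM classification call."""
    return "billing" if "invoice" in text.lower() else "general_inquiry"


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Stateless entry point: everything it needs arrives in the event."""
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    if not text:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'text'"})}
    # No module-level caches or session state are read or written here,
    # so any instance can serve any request.
    return {"statusCode": 200, "body": json.dumps({"route": classify(text)})}
```

Since nothing survives between calls, any conversation history or intermediate result would have to be passed in with the request or stored externally.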

Stateful Orchestration Engines

  • Architecture: Durable execution backends (e.g., Temporal, LangGraph checkpointers, database-backed Celery queues).
  • Fit: Multi-step autonomous loops, Human-in-the-loop (HITL) authorization gates, and long-running background tasks.
  • Benefits: Ability to pause execution (e.g., awaiting human approval), state graph recovery during crash/retry cycles, and session thread persistence.
  • Limitations: Elevated hosting complexity, database write overhead, and state serialization bottlenecks.
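
The pause-and-resume behavior can be approximated with a small checkpoint store. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a durable backend. The table layout, the step names, and the `advance` function are hypothetical, and real engines such as Temporal or LangGraph checkpointers expose their own APIs, which are not shown here.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)"
    )
    return conn


def save(conn: sqlite3.Connection, thread_id: str, state: Dict[str, Any]) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)", (thread_id, json.dumps(state))
    )
    conn.commit()


def load(conn: sqlite3.Connection, thread_id: str) -> Dict[str, Any]:
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {"step": "draft", "approved": False}


def advance(conn: sqlite3.Connection, thread_id: str, approved: Optional[bool] = None) -> str:
    """Run until the workflow finishes or must wait for a human decision."""
    state = load(conn, thread_id)
    if approved is not None:
        state["approved"] = approved
    if state["step"] == "draft":
        state["draft"] = "refund $20"  # stub for an LLM-proposed action
        state["step"] = "awaiting_approval"
    if state["step"] == "awaiting_approval" and state["approved"]:
        state["step"] = "done"
    save(conn, thread_id, state)  # checkpoint after every call
    return state["step"]
```

Each call to `advance` reloads the thread's state, so the process can exit while waiting for approval and a later call (possibly on another machine, given a shared database file or server) resumes from the saved step.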


Developer Handbook 2026 © Exemplar.