Workflow vs Agent vs Multi-Agent: The Architecture Decision Nobody Explains
Everyone wants to build agents. Almost nobody should start there.
In my previous post on agent definitions, I mapped the five competing definitions of “AI agent” floating around the industry. But here’s what I didn’t address: once you pick a definition, how do you decide what architecture to actually build?
Towards AI’s Agent Architecture Cheat Sheet provides the clearest framework I’ve found.
The core insight is brutal in its simplicity: stay as far left as possible on the complexity spectrum.
Workflow → Single Agent + Tools → Multi-Agent
Each step right increases token costs (~4x to ~15x), latency, and debugging complexity. The question isn’t “can we build an agent?” It’s “should we?”
The Autonomy Test
Before writing any code, answer one question:
Who controls the steps and their order?
| Answer | Architecture |
|---|---|
| You define steps + order in code | Workflow |
| Model decides what to do next | Agent |
That’s it. If you’re writing if/else logic to determine the next action, you’re building a workflow. If the LLM chooses which tool to call and what to do after, you’re building an agent.
The common mistake: confusing tool use with agency. One model calling 10 APIs is one agent with 10 tools, not a multi-agent system. Tools are capabilities. An agent is the decision-maker that chooses tools and determines next steps.
```mermaid
%%{init: {"layout": "dagre"}}%%
flowchart LR
    subgraph Workflow["Workflow (Predetermined)"]
        W1[Step 1] --> W2[Step 2] --> W3[Step 3]
    end
    subgraph Agent["Agent (Dynamic)"]
        A1[LLM] -->|decides| T1[Tool A]
        A1 -->|decides| T2[Tool B]
        A1 -->|decides| T3[Tool C]
        T1 --> A1
        T2 --> A1
        T3 --> A1
    end
```
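The autonomy test translates directly into code. Here is a minimal sketch of the two shapes; every function is a stub, and `choose_tool` stands in for whatever model API you use:

```python
# Stubbed pipeline steps; in practice each might call an LLM internally.
def classify(ticket: str) -> str:
    return "billing" if "refund" in ticket else "general"

def route(category: str) -> str:
    return f"{category}-queue"

def draft_reply(ticket: str, queue: str) -> str:
    return f"[{queue}] Thanks for your message."

# Workflow: YOUR code fixes the order of steps. A model may power a step,
# but it never chooses what happens next.
def run_workflow(ticket: str) -> str:
    category = classify(ticket)            # step 1, always
    queue = route(category)                # step 2, always
    return draft_reply(ticket, queue)      # step 3, always

# Agent: the MODEL picks the next tool each turn until it decides to stop.
def run_agent(task: str, tools: dict, choose_tool) -> str:
    history = [task]
    while True:
        name, args = choose_tool(history)  # the model's decision
        if name == "finish":
            return args
        result = tools[name](args)         # tools execute mechanics
        history.append(f"{name} -> {result}")
```

Note that `run_workflow` contains no decision loop at all, while `run_agent` contains nothing *but* the decision loop: that asymmetry is the whole distinction.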
Choose a Workflow When…
Start here. Default to workflows. Only move right when forced by constraints.
Workflows win when:
- Steps are known and stable (same order most runs)
- Predictability matters: easy unit tests per step, clear traces, deterministic gates
- Cost/latency matters: fewer thinking loops, fewer tool decisions
Example: A support ticket pipeline where every request goes through classify → route → draft → policy check → send, in the same order every time.
```mermaid
%%{init: {"layout": "dagre"}}%%
flowchart LR
    Input[Ticket] --> Classify --> Route --> Draft --> PolicyCheck[Policy Check] --> Send
```
No LLM is deciding “should I classify first or draft first?” The order is fixed. The LLM might power individual steps (classifying the ticket, drafting the response), but the orchestration is deterministic.
Why this matters: Workflows are testable. You can write unit tests for each step. You can predict costs. You can audit decisions. When something breaks, you know exactly where.
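That testability is concrete: each step can be a plain, pure function with its own unit test. A minimal sketch of the classify step (the keyword rules here are illustrative, not a real classifier):

```python
# One pipeline step: deterministic, pure, trivially testable in isolation.
def classify(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account"
    return "general"

# A plain unit test per step; no agent loop to mock, no model to stub.
def test_classify():
    assert classify("Please refund my last charge") == "billing"
    assert classify("I need to reset my password") == "account"
    assert classify("Where is my order?") == "general"
```

Swap the keyword rules for an LLM call and the test strategy stays the same: you test the step's contract, not the orchestration around it.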
Choose a Single Agent + Tools When…
Move to an agent when the path can’t be predetermined.
Agents win when:
- Steps are interdependent and the path changes based on findings (retries, fallbacks, clarification, partial data)
- Tightly coupled tasks where global context matters end-to-end
- Limited toolset (typically <10-20 tools)
Example: An agent that writes code, runs it, inspects errors, fixes the code, and retries until tests pass. The number of iterations isn’t known upfront. The specific errors determine the next action.
```mermaid
%%{init: {"layout": "dagre"}}%%
flowchart TB
    Write[Write Code] --> Run[Run Tests]
    Run -->|Pass| Done[Done]
    Run -->|Fail| Inspect[Inspect Error]
    Inspect --> Fix[Fix Code]
    Fix --> Run
```
The tool limit is real. Past ~20 tools, tool selection degrades. Every tool’s name, description, and schema burns context before the agent even starts your task. I’ve seen agents with 50+ tools that spent more tokens parsing tool definitions than doing actual work.
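The fix-and-retry loop in the example above can be sketched as a bounded loop. `propose_fix` is a stand-in for the model call, and the iteration cap keeps an agent that can't converge from looping forever:

```python
def run_fix_loop(code: str, run_tests, propose_fix, max_iters: int = 5):
    """Write -> run -> inspect -> fix until tests pass or we give up.

    `run_tests` returns (passed, error); `propose_fix` stands in for the
    model call that rewrites code given the error message.
    """
    for attempt in range(max_iters):
        passed, error = run_tests(code)
        if passed:
            return code, attempt           # converged
        code = propose_fix(code, error)    # the model decides the fix
    raise RuntimeError(f"no passing version after {max_iters} attempts")
```

The loop body is fixed, but the number of iterations and the content of each fix are not: that is exactly the "path changes based on findings" property that justifies an agent here.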
Go Multi-Agent Only When Forced
Multi-agent is rarely the answer. It’s the architecture of last resort when specific constraints make simpler approaches impossible.
Multi-agent only when:
- True parallelism: independent subtasks must run simultaneously for throughput/speed
- Context/tool overload: >200K tokens or >20 tools degrades performance; split by domain
- Distinct competencies: phases need fundamentally different reasoning modes (e.g., exploratory research vs deterministic writing)
- Hard separation: security boundaries, compliance isolation, sensitive data handling
Example: One agent independently researches a topic in depth while another agent focuses solely on writing a structured technical article from the research agent’s output.
```mermaid
%%{init: {"layout": "dagre"}}%%
flowchart TB
    Task[Task] --> Orchestrator
    Orchestrator -->|parallel| Researcher
    Orchestrator -->|parallel| Writer
    Researcher -->|artifact| Orchestrator
    Writer -->|artifact| Orchestrator
    Orchestrator --> Result
```
The pattern that works: Orchestrator → worker or sequential handoff with explicit artifacts/contracts. Avoid everyone talking to everyone. That path leads to information silos and coordination failures.
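A minimal orchestrator-worker sketch, assuming the workers are plain functions that return explicit artifacts (no shared mutable state, no agent-to-agent chatter):

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task: str, workers: dict) -> dict:
    """Fan independent subtasks out in parallel; collect named artifacts.

    Each worker receives only the task and returns only an artifact.
    All coordination lives here, in the orchestrator.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, task) for name, fn in workers.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because every worker's output flows back through one place, you get a single point to log, validate, and merge artifacts, which is what the everyone-talks-to-everyone topology gives up.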
When multi-agent doesn’t fit:
- Tasks requiring all agents to share the same context
- Many dependencies between agents requiring real-time coordination
- Most coding tasks (fewer parallelizable subtasks than research)
Core Engineering Rules
Architecture choice matters less than execution. Here’s what makes any architecture reliable:
Thin agent, heavy tools. The agent plans and decides. Tools execute mechanics and enforce deterministic constraints. Business rules belong in tools, not prompts.
```python
from dataclasses import dataclass

@dataclass
class RefundResult:
    status: str
    reason: str = ""

# Bad: rule buried in a prompt, where the model is free to ignore it
# "Never allow refunds over $500 without manager approval"

# Good: rule enforced deterministically in the tool
def process_refund(amount: float) -> RefundResult:
    if amount > 500:
        return RefundResult(
            status="pending_approval",
            reason="Amount exceeds $500 threshold",
        )
    # ... process the refund ...
    return RefundResult(status="completed")
```
Validation loops are non-negotiable. Generate → validate → fix with actionable feedback. Hard checks first (syntax, schema, constraints), then soft checks (LLM-as-judge for quality).
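A generate → validate → fix loop might look like the following sketch. `generate` and `soft_check` stand in for model calls; the hard checks run first because they are cheap and deterministic:

```python
def validate_and_fix(prompt: str, generate, hard_checks, soft_check,
                     max_iters: int = 3) -> str:
    """Hard checks (syntax, schema, constraints) first, then a soft check.

    Every failed check returns actionable feedback, which is appended to
    the prompt for the next generation attempt.
    """
    feedback = ""
    for _ in range(max_iters):
        draft = generate(prompt + feedback)
        errors = [msg for check in hard_checks if (msg := check(draft))]
        if not errors:
            ok, msg = soft_check(draft)  # e.g. LLM-as-judge for quality
            if ok:
                return draft
            errors = [msg]
        feedback = "\nFix these issues: " + "; ".join(errors)
    raise ValueError("could not produce a valid output")
```

The key design point is that feedback is *actionable*: "too short" or "missing field `status`" gives the next attempt something to act on, where a bare retry does not.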
Observability from day one. Trace prompts, tools, token costs, outputs. Build auto-evals. The difference between a demo and production is knowing when things break.
Human-in-the-loop is a design choice. Place human checkpoints on key artifacts: plans, research notes, drafts, before irreversible actions. Autonomy isn’t binary. The best systems know when to pause.
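One way to make that pause a first-class design element is a gate in front of irreversible actions. `request_approval` here is a hypothetical hook (a review queue, a Slack ping, a UI button), not part of any real library:

```python
def guarded(action, request_approval, irreversible: bool):
    """Wrap `action`: run it directly when safe, pause for a human when not."""
    def run(payload):
        if irreversible and not request_approval(payload):
            # Surface the pending work as an artifact a human can review.
            return {"status": "paused", "payload": payload}
        return {"status": "done", "result": action(payload)}
    return run
```

For example, `send_email = guarded(do_send, request_approval, irreversible=True)` pauses on denial, while a read-only lookup wrapped with `irreversible=False` runs straight through.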
The Practical Decision Process
Use this in kickoffs. Before writing code:
1. Clarify scope: Deliverable (prototype vs production vs handoff), demo cadence, documentation, who maintains it.
2. Map task shape:
   - Sequential vs branching (parallel executions)
   - Exploratory (flexible) vs deterministic (fixed)
   - Single domain vs multiple domains
3. Pick the minimum architecture: Workflow first, then single agent, multi-agent only for true constraints.
4. Design tools: Group by domain. Enforce deterministic rules in code. Return structured outputs + actionable errors.
5. Choose orchestration: Simple loop if stateless. Graph or framework only if you need state persistence, branching execution, or pause/resume for long-running work.
6. Pick models by step difficulty: Strong models for planning/judgment. Cheaper models for narrow execution/cleanup.
7. Document decisions: Record what you chose and why. Prevents rework when stakeholders change.
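Picking models by step difficulty can be as simple as a routing table. The model names below are placeholders, not recommendations:

```python
# Hypothetical routing table: strong model for planning and judgment,
# cheap model for narrow execution and cleanup. Names are placeholders.
MODEL_BY_STEP = {
    "plan": "strong-model",
    "judge": "strong-model",
    "extract": "cheap-model",
    "cleanup": "cheap-model",
}

def model_for(step: str) -> str:
    # Default unclassified steps to the strong model: safer and easier
    # to notice in cost reports than a silently wrong cheap answer.
    return MODEL_BY_STEP.get(step, "strong-model")
```

Keeping the mapping in one place also makes it a documented decision (step 7): when a stakeholder asks why cleanup is cheap, the answer is in the table, not scattered across call sites.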
The Bottom Line
The agent hype cycle wants you to build multi-agent systems for everything. The reality is simpler.
Most production AI systems are workflows. They’re predictable, testable, and cost-effective. When you do need agency, start with a single agent and a small toolset. Multi-agent is the architecture of last resort, not the default.
The question isn’t “how do I build an agent?” It’s “what’s the simplest architecture that solves my problem?”
Start left. Move right only when forced.
Building agentic systems? I’d love to hear what architecture you landed on and why. Reach out on LinkedIn.