March 15, 2025 Β· Tags: ai, agents, architecture, plan-flow

Stop Coding Agents. Structure Them.

What I learned after building AI workflows both ways β€” and why I stopped writing framework code.


I've built AI agents with LangChain. I've wired up pipelines with the Anthropic SDK. I've written the Python glue code, the orchestration classes, the retry logic, the prompt chaining. I know what it feels like to ship something with those tools and feel productive.

Then I built plan-flow β€” a structured workflow system for AI-assisted development β€” and I didn't use any of them. No LangChain. No Semantic Kernel. No CrewAI. No agent SDK. Just markdown files, directories, and a thin CLI that copies files around.

And it works better than anything I built with frameworks.

Look, maybe I'm just some guy who doesn't know enough about the "right" way to build AI systems. Maybe the experts would look at what I'm doing and laugh. But after living in both worlds, something clicked for me that I can't unsee β€” and in my head, it makes total sense.

I've Been on the Treadmill

Here's the cycle I kept repeating with frameworks:

Write orchestration code. Ship it. A model update drops. Half the framework logic becomes redundant β€” the chain-of-thought wrapper I spent three weeks on? The new model does it natively. The retry logic? Built in now. The routing layer? Replaced by a system prompt tweak.

So I'd rewrite. Ship again. Another update. Rewrite again.

Every time, the same thought: "AI is just moving too fast."

But after building plan-flow, I realized that wasn't true. The problem wasn't the speed of AI. It was where I was building.

The Layer Problem

When I stepped back and looked at what I was actually doing with frameworks, I saw this:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  THE AI STACK                          β”‚
β”‚                                                       β”‚
β”‚  Layer 4: Orchestration Frameworks                    β”‚  ← I was here.
β”‚           (LangChain, CrewAI, Semantic Kernel)        β”‚     This keeps breaking.
β”‚  ──────────────────────────────────────────────────── β”‚
β”‚  Layer 3: Provider APIs                               β”‚  ← Shifts every quarter
β”‚           (Claude API, OpenAI API, Gemini API)        β”‚
β”‚  ──────────────────────────────────────────────────── β”‚
β”‚  Layer 2: Foundation Models                           β”‚  ← Evolves constantly
β”‚           (Claude, GPT, Gemini, Llama)                β”‚
β”‚  ──────────────────────────────────────────────────── β”‚
β”‚  Layer 1: The File System                             β”‚  ← Hasn't changed since
β”‚           (directories, files, plain text)            β”‚     the 1970s. Won't.
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

I was building at Layer 4 β€” the most volatile surface in the entire stack. Every model improvement erodes it from below. Every new provider feature makes a chunk of your code pointless.

With plan-flow, I accidentally dropped to Layer 1. And suddenly, updates stopped being a threat.

What Plan-Flow Taught Me

When I built plan-flow without frameworks, I had to figure out how to give AI the right instructions, the right tools, and the right context β€” without code. What I ended up with was embarrassingly simple:

Directories and files. That's it.

Here's the actual structure:

.claude/
β”œβ”€β”€ commands/                      # Entry points β€” one .md file per workflow
β”‚   β”œβ”€β”€ discovery-plan.md          # "How to gather requirements"
β”‚   β”œβ”€β”€ create-plan.md             # "How to create an implementation plan"
β”‚   β”œβ”€β”€ execute-plan.md            # "How to execute a plan phase by phase"
β”‚   β”œβ”€β”€ review-code.md             # "How to review code changes"
β”‚   └── ...                        # 13 commands total
β”‚
β”œβ”€β”€ rules/core/                    # Behavioral constraints β€” always loaded
β”‚   β”œβ”€β”€ allowed-patterns.md        # "What you should do"
β”‚   └── forbidden-patterns.md      # "What you must never do"
β”‚
└── resources/                     # On-demand reference material
    β”œβ”€β”€ skills/                    # Detailed step-by-step workflows
    β”œβ”€β”€ patterns/                  # Templates and examples
    └── tools/                     # Tool descriptions (MCP, testing, etc.)

flow/                              # Runtime β€” grows as you work
β”œβ”€β”€ discovery/                     # Output from discovery workflows
β”œβ”€β”€ plans/                         # Output from planning workflows
β”œβ”€β”€ brain/                         # Knowledge vault (Obsidian-compatible)
β”‚   β”œβ”€β”€ features/                  # What was built and why
β”‚   └── errors/                    # Reusable error patterns
β”œβ”€β”€ memory.md                      # What was completed
└── tasklist.md                    # What's in progress

No orchestration classes. No prompt chain code. No state management library. The AI reads markdown files from folders, uses MCP servers as tools, and writes its output back to the file system.
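To make "writes its output back to the file system" concrete: appending to a plain file like flow/memory.md is the entire state model. A minimal sketch, assuming Node.js and hypothetical function names (this is not plan-flow's actual source):

```typescript
import { appendFileSync, readFileSync, mkdirSync, existsSync } from "node:fs";
import { join } from "node:path";

// Record a completed step by appending a line to flow/memory.md.
// No database, no state library: the file is the state.
function recordCompletion(root: string, entry: string): void {
  const flowDir = join(root, "flow");
  if (!existsSync(flowDir)) mkdirSync(flowDir, { recursive: true });
  appendFileSync(join(flowDir, "memory.md"), `- ${entry}\n`, "utf8");
}

// Any agent (or human) recovers the state by reading the same file back.
function readMemory(root: string): string[] {
  const file = join(root, "flow", "memory.md");
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.startsWith("- "));
}
```

Because the state is a markdown file, it's also human-auditable and diffable for free.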

Every single workflow in plan-flow β€” discovery, planning, execution, code review, testing, knowledge capture β€” is just a folder with files inside it.

The Pattern I Can't Unsee

After building this, I went back and looked at every agent framework I'd used before. And I realized: they all decompose into the same thing.

Every agent needs three things to not be useless:

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  USEFUL AI        β”‚
                    β”‚  = Right Routing  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β–Ό                 β–Ό                 β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ INSTRUCTIONS β”‚  β”‚ CAPABILITIES β”‚  β”‚   CONTEXT    β”‚
   β”‚              β”‚  β”‚              β”‚  β”‚              β”‚
   β”‚  What to do  β”‚  β”‚  What it     β”‚  β”‚  What it     β”‚
   β”‚  and how to  β”‚  β”‚  can call    β”‚  β”‚  should know β”‚
   β”‚  behave      β”‚  β”‚              β”‚  β”‚              β”‚
   β”‚              β”‚  β”‚  β€’ APIs      β”‚  β”‚  β€’ Schemas   β”‚
   β”‚  β€’ Prompts   β”‚  β”‚  β€’ MCP       β”‚  β”‚  β€’ Examples  β”‚
   β”‚  β€’ Rules     β”‚  β”‚    servers   β”‚  β”‚  β€’ Domain    β”‚
   β”‚  β€’ Guardrailsβ”‚  β”‚  β€’ CLI tools β”‚  β”‚    knowledge β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚                 β”‚                 β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  = A directory    β”‚
                    β”‚    with files.    β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

That's not a metaphor. In plan-flow, that's literally what each workflow is. A folder with a prompt file, some tool configs, and relevant data files.
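In code terms, the decomposition is almost embarrassingly literal. Here's a hypothetical loader sketch in TypeScript; the folder names (prompt.md, capabilities/, context/) mirror the diagram, not plan-flow's exact schema:

```typescript
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// The three ingredients from the diagram, read straight off the file system.
interface AgentSpec {
  instructions: string;   // prompt.md: what to do and how to behave
  capabilities: string[]; // capabilities/: tool configs the agent may call
  context: string[];      // context/: schemas, examples, domain knowledge
}

// "An agent" is nothing more than what this function returns.
function loadAgent(dir: string): AgentSpec {
  const listDir = (sub: string): string[] => {
    const full = join(dir, sub);
    return existsSync(full) ? readdirSync(full).map((f) => join(full, f)) : [];
  };
  return {
    instructions: readFileSync(join(dir, "prompt.md"), "utf8"),
    capabilities: listDir("capabilities"),
    context: listDir("context"),
  };
}
```

Everything a framework's agent class holds in objects and registries is here a few reads from disk.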

The LangChain agent I built last year with 400 lines of Python? It was doing the same thing β€” just wrapped in classes and decorators that added complexity without adding value.

What Coding Agents Already Do

Here's the part that really changed my perspective: coding agents already work this way natively.

Claude Code, Cursor, GitHub Copilot β€” they navigate your file system, read instructions from markdown, call tools via MCP servers, and spawn sub-processes for parallel work.

What I used to code in frameworks     What the agent does on its own
─────────────────────────────────     ──────────────────────────────
Prompt template engine                Reads a .md file
Tool registry                         Connects to MCP servers
Context injection                     Reads files in a directory
Sub-agent orchestration               Spawns child processes
State management                      Writes files to disk
Workflow routing                      Navigates the directory tree

Plan-flow's CLI is about 500 lines of TypeScript. And all it does is copy files to the right places and manage a background daemon. Zero business logic. Zero orchestration. The AI handles all of that by reading the file tree.
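To show the shape of "copy files to the right places," here is a minimal sketch of that kind of install step. This is not plan-flow's actual source; the function name and runtime subdirectories are assumptions drawn from the structure shown earlier:

```typescript
import { cpSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Install the workflow templates into a project: copy .claude/ into place,
// then create the empty runtime tree. That is the entire "framework".
function install(templateDir: string, projectDir: string): void {
  cpSync(join(templateDir, ".claude"), join(projectDir, ".claude"), { recursive: true });
  for (const sub of ["discovery", "plans", "brain/features", "brain/errors"]) {
    mkdirSync(join(projectDir, "flow", sub), { recursive: true });
  }
}
```

Note what's absent: no prompt templating, no tool registry, no run loop. Once the files are in place, the coding agent does the rest.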

Why This Survives Updates

This is what really sold me. With frameworks, every model update is destructive. With directories, every model update is a chance to simplify.

FRAMEWORK WORLD                    DIRECTORY WORLD
───────────────                    ───────────────

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   Model             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   Model
  β”‚  Your   β”‚   Update            β”‚  Your   β”‚   Update
  β”‚  Code   β”‚ ─────────►  BROKE   β”‚  Tree   β”‚ ─────────►  SIMPLER
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  Rewrite everything.             Delete a folder. Simplify a prompt.

When a provider ships a new feature that covers part of your workflow, you don't rewrite integration classes. You delete a subdirectory and maybe add a line to an existing prompt file.

BEFORE                                AFTER
──────                                ─────

onboarding/                           onboarding/
β”œβ”€β”€ collect-user-info/                β”œβ”€β”€ collect-user-info/
β”‚   β”œβ”€β”€ prompt.md                     β”‚   β”œβ”€β”€ prompt.md
β”‚   β”œβ”€β”€ capabilities/                 β”‚   β”œβ”€β”€ capabilities/
β”‚   └── context/                      β”‚   β”‚   └── provider-onboarding-v2.json  ← absorbed
β”œβ”€β”€ verify-identity/    ───┐          β”‚   └── context/
β”‚   β”œβ”€β”€ prompt.md          β”‚          └── send-welcome/
β”‚   β”œβ”€β”€ capabilities/      β”œβ”€β”€ gone       β”œβ”€β”€ prompt.md
β”‚   └── context/           β”‚              └── ...
β”œβ”€β”€ send-welcome/       β”€β”€β”€β”˜
β”‚   └── ...

I've seen this happen in plan-flow already. When models got better at following complex instructions, some of my multi-step skill files got shorter. When Claude Code shipped better tool support, I removed workaround prompts. The structure absorbed the improvements instead of fighting them.

The Two Timelines

Looking back at my own journey:

My Framework Projects:

  Code ───► Ship ───► Update Drops ───► Rewrite ───► Ship ───► Update ───► Rewrite
  8 weeks    1 week        ↓               6 weeks     1 week       ↓          6 weeks
                       "We need to                              "Again??"
                        refactor"


Plan-Flow:

  Structure ───► Ship ───► Update Drops ───► Simplify ───► Ship ───► Update ───► Simplify
    3 days        1 week         ↓               2 hours     1 week       ↓          2 hours
                             "Nice, we can                             "Even less
                              remove that"                              to maintain"

One is a treadmill. The other only moves forward.

Am I Crazy?

Honestly, I don't know. Maybe there's something I'm missing. Maybe there's a class of problems where you genuinely need a framework with custom orchestration logic in Python or C#. I'm not claiming to have all the answers.

But here's what I do know:

  • β€’I built AI workflows both ways.
  • β€’The directory-based approach is simpler, faster to build, and doesn't break on updates.
  • β€’Plan-flow handles discovery, planning, execution, code review, testing, and knowledge capture β€” all through markdown files in folders.
  • β€’The only code I wrote is a file copier and a daemon. Everything else is plain text.

If I'm wrong, I'm wrong in a way that still ships working software with minimal maintenance. I can live with that.

What I'd Tell You to Try

  1. Pick one workflow you're building with a framework. Something small β€” a support classifier, a code reviewer, a data pipeline.

  2. Decompose it into folders. One folder per step. Inside each: a prompt file, tool configs, context files.

  3. Point a coding agent at it. Claude Code, Cursor, whatever. Tell it to follow the instructions in the directory.

  4. See what happens. You might be surprised how little code you actually needed.

  5. Wait for the next model update. Notice how your directory structure survives it.

The Point

"You're not falling behind because AI moves too fast. You're falling behind because you're building at a layer that's designed to be replaced."

The file system is the oldest, most stable, most universal abstraction in computing. It has survived mainframes, personal computers, the internet, mobile, cloud, and now AI.

I didn't set out to prove this. I just built something without frameworks and noticed the pattern after the fact. Maybe that makes it more credible. Maybe it makes it less. Either way, plan-flow works, it doesn't break on updates, and the entire "orchestration layer" is a folder structure anyone can read.

That's enough for me.


Build less. Structure more.