From Fork to Framework, Part 7: How Claude Code and DevFlow Made This Possible

One engineer, six months, five iterations, 1,400+ commits. Here is how AI-assisted development with structured tooling made that timeline real.

[Illustration: abstract hexagonal framework core connecting to multiple product modules]

The Numbers That Need Explaining

If you have read the first six parts of this series, you may have noticed something unusual about the timeline. A single engineer, working from October 2025 through March 2026, shipped five distinct iterations of a multi-tenant AI platform, an LLM security gateway, a team communications product, a shared platform framework, and a cross-platform Flutter client. The commit counts tell the story: 522 on the LibreChat fork, 242 on the Python rewrite, 286 on the Rails iteration, 118 on the Flutter client in four days, 50 on Eden Circle in eight days, and 19 on the Eden Libs framework in three days. Over 1,400 commits across five technology stacks and a dozen repositories.

This is not a story about working nights and weekends. It is a story about a development methodology that treats AI code generation as a first-class engineering discipline rather than an autocomplete feature. The methodology is called DevFlow, and the AI is Claude Code. This post explains how they work together, why the combination produces results that look implausible on a traditional timeline, and what the honest limitations are.

Claude Code as an Engineering Partner

Claude Code is Anthropic’s CLI tool for working with Claude directly in the terminal. It reads your codebase, understands your architecture, executes shell commands, writes and edits files, and runs tests. It operates with the full context of your repository rather than receiving pasted snippets through a chat window.

The difference between using Claude through a chat interface and using Claude Code in your terminal is the difference between describing a problem to a consultant over the phone and having that consultant sit at your desk with your editor open. Claude Code reads the files it needs to modify, understands the patterns already established in the codebase, runs the tests after making changes, and adjusts when tests fail. It does not need you to copy-paste context. It navigates the project the way an engineer would.

This matters for velocity because the bottleneck in AI-assisted development is rarely the generation speed. It is the context-setting overhead: explaining the architecture, providing relevant files, correcting misunderstandings about existing patterns. Claude Code eliminates most of that overhead because it reads the codebase directly. When I asked Claude Code to port the Rails ChatService to Go, it read the Rails service, the database schema, the existing Go patterns in the project, and produced a port that matched the established conventions without being told what those conventions were.

The Context Problem

Raw Claude Code, without any additional structure, works well for tasks that complete within a single session. Write a function, fix a bug, add a test. The challenge appears when the work spans hours or days. Claude’s context window is finite. As the conversation grows, earlier context is compressed or dropped. Quality degrades. Instructions from the beginning of the session lose influence over decisions at the end. The AI begins to forget the architectural decisions you established, the patterns you agreed on, and the constraints you specified.

This is not a Claude limitation. It is a fundamental property of finite context windows. Any engineer who has spent four hours in a Claude Code session has experienced the moment where the AI suggests a pattern it would have rejected an hour earlier, because the reasoning that established the rejected pattern has been compressed out of the active context.

We call this context rot, and solving it is the central design goal of DevFlow.

DevFlow: Structured Development for AI Agents

DevFlow is a meta-prompting and spec-driven development system that we built for Claude Code. It is an npm package that installs skills, agents, templates, and hooks into the Claude Code environment. The core insight is simple: if you structure your development process into discrete, well-defined units of work, each unit can execute in a fresh context window with full quality, and the artifacts it produces become the context for the next unit.

The system works through four phases that repeat for each objective in a project:

```mermaid
graph LR
    DISCUSS["Discuss"] --> PLAN["Plan"]
    PLAN --> EXECUTE["Execute"]
    EXECUTE --> VERIFY["Verify"]
    VERIFY -->|"Issues found"| PLAN
    VERIFY -->|"Approved"| NEXT["Next Objective"]
```

Phase 1: Discuss

Before any planning begins, DevFlow captures implementation preferences through structured questions. For a visual feature, it asks about layout, density, interactions, and empty states. For an API, it asks about response formats, error handling, and authentication. The output is a Context document that feeds into planning, ensuring the AI builds what you actually want rather than what it assumes you want.

This phase typically takes ten minutes and prevents hours of revision later. When I discussed the Flutter client before planning, the questions surfaced decisions about navigation patterns, state management, and brand theming that would have otherwise emerged as corrections during implementation.

Phase 2: Plan

DevFlow spawns four parallel research agents that investigate the technology stack, feature requirements, architectural patterns, and potential pitfalls. A synthesizer aggregates their findings into a Research document. Then a planner agent reads the project vision, requirements, user context, and research, and produces Technical Requirements Documents.

A TRD is the atomic unit of execution. It contains structured XML tasks with explicit instructions, verification commands, and completion criteria. Each TRD is sized to complete within approximately fifty percent of Claude’s context window, which means the executing agent has room for the codebase context it needs without running into quality degradation.

This sizing constraint is the key architectural decision. A TRD that tries to do too much will push the executing agent past the point where context rot begins. A TRD that does too little creates unnecessary coordination overhead. The sweet spot is two to three tasks per TRD, each with clear verification criteria.
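To make the shape of a TRD concrete, here is a hypothetical sketch. The series does not show DevFlow's actual schema, so every tag name and command below is an invented illustration of "structured XML tasks with explicit instructions, verification commands, and completion criteria":

```xml
<!-- Hypothetical TRD structure; tag names are illustrative, not DevFlow's schema -->
<trd id="08" objective="User registration">
  <task id="08-01">
    <instructions>Create the User model and migration with email uniqueness.</instructions>
    <verify>run the model test suite</verify>
    <done-when>Migration applied; model tests pass.</done-when>
  </task>
  <task id="08-02">
    <instructions>Create the user registration endpoint returning validation errors as JSON.</instructions>
    <verify>POST to the registration endpoint and inspect the response</verify>
    <done-when>Endpoint returns 201 with a session token.</done-when>
  </task>
</trd>
```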

Phase 3: Execute

DevFlow analyzes TRD dependencies and groups them into waves. Independent TRDs execute in parallel, each in a fresh context window. Dependent TRDs wait for their prerequisites to complete.

When the AODex Flutter client was planned with eight objectives, the execution looked like this: foundation TRDs (authentication, app shell) ran first because everything depended on them. Once the foundation was verified, the chat, personas, projects, teams, knowledge, and notifications objectives could run in parallel because they were independent features. The final objective, which wired everything together, ran last.

Each task within a TRD produces an atomic git commit immediately upon completion. The commit message follows a conventional format: feat(08-02): create user registration endpoint. This granularity means any task can be individually reverted, and git bisect can identify the exact task that introduced a regression.
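A commit convention like this can be checked mechanically. The pattern below is guessed from the single example in the text (type, objective-task pair, subject), not taken from DevFlow itself:

```typescript
// Parse a conventional task commit of the assumed form
// "type(objective-task): subject", e.g. "feat(08-02): create user registration endpoint".
// The accepted types and two-digit numbering are assumptions based on the example above.
function parseTaskCommit(message: string) {
  const m = /^(feat|fix|refactor|test|chore|docs)\((\d{2})-(\d{2})\): (.+)$/.exec(message);
  return m ? { type: m[1], objective: m[2], task: m[3], subject: m[4] } : null;
}

console.log(parseTaskCommit("feat(08-02): create user registration endpoint"));
// → { type: "feat", objective: "08", task: "02", subject: "create user registration endpoint" }
```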

Phase 4: Verify

After execution, a verifier agent performs goal-backward analysis. It does not check whether tasks were completed. It checks whether the objective’s goals were actually achieved. A task “create chat component” can be marked complete when the component is a placeholder. The task is done, but the goal “working chat interface” is not. The verifier checks that files exist, contain real implementation rather than stubs, are wired to the rest of the system, and function when invoked.

If verification fails, DevFlow generates fix TRDs and re-executes them, up to two cycles. If the fix cycle fails, it stops for human input rather than looping indefinitely.
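The distinction between "task complete" and "goal achieved" can be sketched as a stub check: a file can exist and still fail verification if it contains no real implementation. The markers and length threshold here are illustrative assumptions, not DevFlow's actual heuristics:

```typescript
// Minimal sketch of one goal-backward check: does a file contain a real
// implementation rather than a placeholder? Markers and threshold are
// illustrative assumptions, not DevFlow's internals.
type Check = { file: string; content: string };

function looksLikeStub(content: string): boolean {
  const markers = [/TODO/i, /not implemented/i, /placeholder/i];
  return content.trim().length < 40 || markers.some(m => m.test(content));
}

function verify(checks: Check[]): string[] {
  // Return the files that exist but do not look like real implementations.
  return checks.filter(c => looksLikeStub(c.content)).map(c => c.file);
}

const failures = verify([
  { file: "chat_screen.dart", content: "// TODO: build chat UI" },
  { file: "auth_service.dart", content: "class AuthService { /* substantial real implementation */ }" },
]);
console.log(failures); // → ["chat_screen.dart"]
```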

How This Played Out in Practice

The Go Migration: 10 Objectives in One Day

The AODex Go migration documented in Part 4 was the most structured DevFlow execution in the project. On March 21, I ran /df:plan-objective for all ten migration objectives. DevFlow produced TRDs covering the Go module bootstrap, database layer with sqlc, authentication middleware, authorization policies, REST API handlers, real-time WebSocket infrastructure, service layer, background job workers, email and telemetry, and integration testing.

The execution ran in waves. Objective 1 (foundation) executed first. Objectives 2 and 3 (authentication and authorization) ran in parallel in wave two, since neither depended on the other. Objectives 4, 5, and 6 (REST API, extended API, and real-time) ran in wave three. The service layer, background jobs, and infrastructure objectives followed.

By the end of March 21, the aodex repository contained 166 commits covering the complete Go backend, with every Rails feature ported: authentication, OAuth, two-factor, rate limiting, sessions, API keys, policies, conversations, personas, projects, teams, knowledge, memories, notifications, settings, billing, WebSocket hub, SSE streaming, River workers, email delivery, and OpenTelemetry instrumentation.

Each commit was individually meaningful. Each had been verified against the TRD’s acceptance criteria. The git history reads like a structured migration checklist because that is exactly what it was.

Flutter in Four Days: 118 Commits

The AODex Flutter client went from project initialization to feature-complete in four days because DevFlow’s planning phase had already decomposed the work into parallelizable objectives. The foundation objective (scaffold, auth, routing) ran first. Then six feature objectives ran in parallel waves: chat with streaming, persona browsing, project management, team collaboration, knowledge and memories, and notifications.

Each objective’s TRDs were sized for the executing agent to complete in a single session without context degradation. The chat streaming TRD, for example, contained three tasks: create the domain models and repository, implement the ActionCable streaming service, and build the conversation list and chat screen widgets. The executing agent loaded the TRD, the existing Flutter project structure, and the eden-ui-flutter component library, and had enough context remaining to produce quality code.

The parallel execution meant that while one agent was building the chat interface, another was building the team management screens, and a third was building the notification system. They did not interfere with each other because DevFlow’s dependency analysis had correctly identified them as independent.

Eden Circle: Proof at Scale

Eden Circle, the team communications platform described in Part 6, was the cleanest DevFlow execution. Ten objectives, fifty commits, eight days. The eden-libs platform handled authentication, RBAC, and multi-tenancy. DevFlow planned only the domain-specific work: messaging, presence, conferencing, timeline, notifications, AI features, deployment, and platform integration.

The planning phase was faster because the Research agents identified eden-libs as the platform layer and scoped the TRDs to domain code only. The execution phase was faster because the platform packages eliminated boilerplate. The verification phase was faster because the interceptor-based middleware meant auth and authorization worked automatically for every new endpoint.

The Twelve Agents

DevFlow orchestrates twelve specialized agents, each optimized for a specific type of work:

| Agent | Purpose | When It Runs |
| --- | --- | --- |
| Planner | Architecture decisions, task decomposition | Planning phase |
| Executor | Implement tasks, atomic commits | Execution phase |
| Verifier | Goal-backward verification | After execution |
| Job Checker | Validate plans achieve goals | Before execution |
| 4x Researchers | Stack, features, architecture, pitfalls | Planning phase (parallel) |
| Research Synthesizer | Aggregate findings | After research |
| Debugger | Systematic failure diagnosis | When tasks fail |
| Codebase Mapper | Analyze existing code | Brownfield projects |
| Integration Checker | Cross-module wiring verification | After multi-objective execution |

The key insight is that each agent runs in a fresh context window. The planner does not carry the executor’s context. The verifier does not carry the planner’s context. Each agent receives only the artifacts it needs: the planner reads the project vision, requirements, and research. The executor reads the TRD and relevant source files. The verifier reads the objective goals and the actual codebase.

This segregation is what prevents context rot across multi-hour sessions. The orchestrator stays lightweight, routing between agents. The agents do the heavy lifting in fresh windows.
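This artifact routing can be pictured as a lookup from agent to the inputs it is seeded with; every invocation starts from an empty conversation containing only those artifacts. This is a simplified illustration under assumed names, not DevFlow's internals:

```typescript
// Simplified illustration (not DevFlow's real code) of per-agent context
// segregation: each agent run starts fresh, seeded only with the artifacts
// it needs, so no agent inherits another agent's conversation history.
const agentInputs: Record<string, string[]> = {
  planner: ["vision", "requirements", "research"],
  executor: ["trd", "sources"],
  verifier: ["goals", "codebase"],
};

function seedContext(agent: string, store: Record<string, string>) {
  // Fresh context window: only the listed artifacts, nothing carried over.
  return (agentInputs[agent] ?? []).map(name => ({ artifact: name, content: store[name] }));
}

const store = {
  vision: "project vision doc",
  requirements: "requirements doc",
  research: "research synthesis",
  trd: "TRD 08: user registration",
  sources: "relevant source files",
  goals: "objective goals",
  codebase: "repository snapshot",
};

console.log(seedContext("executor", store).map(c => c.artifact)); // → ["trd", "sources"]
```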

What DevFlow Does Not Do

Intellectual honesty requires naming the limitations.

DevFlow does not replace architectural judgment. It structures the execution of decisions you have already made. The system will faithfully implement a bad architecture as efficiently as a good one. Every iteration described in this series involved human judgment about when to stop and change direction. DevFlow made the pivots faster, but it did not make the pivots for us.

DevFlow does not eliminate debugging. It reduces the frequency of bugs through structured verification and test requirements, but complex integration issues still require human reasoning. The /df:debug command helps by maintaining persistent state across debug sessions, but the diagnosis itself requires engineering skill.

DevFlow does not make Claude Code infallible. AI-generated code still needs review. The difference is that DevFlow’s atomic commit structure makes review tractable: each commit is a single, well-defined change rather than a thousand-line diff at the end of a session. The verification phase catches many issues, but code review remains essential.

DevFlow does not scale linearly. The returns diminish as objectives grow more interconnected. Independent features parallelize well. Features with deep interdependencies require sequential execution and careful integration testing. The Eden Circle execution was fast partly because its domain was self-contained. A project with pervasive cross-cutting concerns would see less parallelization benefit.

The Honest Multiplier

I estimate that the Claude Code plus DevFlow combination provides a three to five times velocity multiplier over solo development without AI assistance, for the types of work documented in this series. That multiplier is highest for well-understood patterns (CRUD endpoints, authentication flows, UI scaffolding) and lowest for novel architecture and complex debugging.

The multiplier is not uniform across the development lifecycle. Planning is perhaps one and a half times faster: the research agents surface useful context, but the human still makes the decisions. Execution is five to ten times faster for implementation work that follows established patterns. Verification is roughly equivalent to manual testing, with the advantage that the verifier checks systematically rather than relying on the developer to remember every acceptance criterion. Debugging is perhaps two times faster, with the debugger agent maintaining hypothesis state across sessions.

The aggregate effect is that a single engineer with Claude Code and DevFlow can sustain the output velocity of a small team. Not by working harder, but by eliminating the coordination overhead, context-switching cost, and ramp-up time that small teams spend on alignment. The AI does not attend standups. It does not lose context between sprints. It does not need to be onboarded to a codebase it read yesterday.

The Development Model

What emerged from this project is a development model where the human provides architectural vision, makes technology choices, reviews output, and handles the genuinely novel problems. The AI handles implementation velocity, pattern replication, boilerplate generation, and systematic verification. DevFlow provides the structure that keeps both parties productive across hours and days of sustained execution.

This is not pair programming. The AI is not suggesting alternatives while you type. It is executing a structured plan, task by task, with verification at each step and atomic commits that make the work reviewable and reversible. The human’s job shifts from writing code to reviewing code, from implementing features to validating features, from typing to thinking.

The five iterations documented in this series were not five attempts to get it right. They were five deliberate explorations, each building on the last, each executed at a pace that made exploration affordable. When a technology choice did not work, we could afford to try another because the cost of trying was days, not months. Claude Code provided the implementation speed. DevFlow provided the structure that kept that speed productive. The combination made iteration cheap enough that we could find the right answer by building all the wrong ones first.

That is the real lesson. AI-assisted development does not make you faster at building the first thing you think of. It makes you fast enough to build several things and choose the best one. The journey from LibreChat fork to Go and Flutter framework was not despite the iterations. It was because we could afford them.
