Structured Handoffs

Seven specialized agents, each with one job. Work flows between them through structured handoffs — never improvised, never blind.

That one sentence is the whole idea. Safety in an agentic workflow does not come from trusting any individual agent to behave — it comes from dividing the work so that shipping requires several independent actors to agree, and a human to give final sign-off. The rest of this page makes that concrete.

The roster

Agent	Role	Hands off to
Product Manager	Strategy, vision, BUILD/DEFER/DECLINE decisions on new ideas.	Product Owner
Product Owner	Backlog management, ticket quality, Definition of Ready enforcement.	Worker
System Architect	Tech-stack decisions, design guidance, pattern library.	Product Owner / Worker
Quality Engineer	BDD test plans before implementation. Shift-left testing.	Worker / Reviewer
Worker	Picks one ticket. Implements. Opens a PR.	Reviewer
PR Reviewer	Reads the diff. Issues GO or NO-GO with reasons.	Human
DevOps Engineer	Deployment, previews, infra changes.	Human

The pattern is deliberate: the agent that decides what to build is never the agent that decides how to build it, and the agent that writes code is never the agent that approves it. Authority is split along the seams where mistakes are most expensive. The PM sets direction but does not manage the backlog; the PO shapes the backlog but does not make strategic calls; the Worker writes code but cannot merge; the Reviewer judges the code but cannot merge either.
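The split can be made checkable by writing the roster down as data and testing the invariants above. The sketch below is a minimal illustration under stated assumptions: the permission names (decide_what, decide_how, write_code, approve, merge, and the rest) and the ROSTER mapping are hypothetical, not Gemba Flow's real configuration.

```python
# Hypothetical encoding of the roster's authority split. Permission names
# are illustrative only; they are not a real Gemba Flow schema.
ROSTER: dict[str, set[str]] = {
    "Product Manager": {"decide_what"},
    "Product Owner": {"manage_backlog"},
    "System Architect": {"decide_how"},
    "Quality Engineer": {"write_test_plans"},
    "Worker": {"write_code"},
    "PR Reviewer": {"approve"},
    "DevOps Engineer": {"deploy"},
}

# Pairs of permissions that no single role may hold together.
FORBIDDEN_PAIRS = [
    ("decide_what", "decide_how"),
    ("write_code", "approve"),
]


def violations(roster: dict[str, set[str]]) -> list[str]:
    """Return one message per broken separation-of-duties rule."""
    problems = []
    for role, perms in roster.items():
        for a, b in FORBIDDEN_PAIRS:
            if a in perms and b in perms:
                problems.append(f"{role} holds both {a} and {b}")
        # Merging is reserved for the human, so no agent role may hold it.
        if "merge" in perms:
            problems.append(f"{role} can merge")
    return problems


if __name__ == "__main__":
    assert violations(ROSTER) == []
    bad = {**ROSTER, "Worker": {"write_code", "approve"}}
    print(violations(bad))  # ['Worker holds both write_code and approve']
```

Keeping the forbidden pairs in one list means a new seam (for example, between deploying and approving) is a one-line addition rather than a prompt rewrite.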

Why specialize?

A general-purpose agent asked to play every role tends to be mediocre at all of them. A specialized agent with one job, one context, and one set of guardrails is sharp. The cost of specialization is more handoffs, which is why every handoff is structured.

Specialization is also a control. The worker cannot review its own PRs. The reviewer cannot merge. Roles enforce separation of duties at the prompt level, with bot account separation enforcing it at the platform level.
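At the platform level, the same separation can be checked after the fact against a record of who did what on each pull request, which is also the kind of check a scheduled audit could run. A minimal sketch, assuming a simplified record and hypothetical account names (example-operator, example-worker-bot, example-reviewer-bot); on GitHub the underlying data would come from a pull request's author, its approving reviews, and its merged_by field.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical human account(s); a real deployment would use its own logins.
HUMAN_ACCOUNTS = {"example-operator"}


@dataclass
class PullRequestRecord:
    """Minimal, hypothetical view of who touched a pull request."""
    number: int
    author: str
    approvers: list[str] = field(default_factory=list)
    merged_by: Optional[str] = None


def duty_violations(pr: PullRequestRecord) -> list[str]:
    """Flag actions that separate accounts are supposed to make impossible."""
    problems = []
    # The worker cannot review its own PRs.
    if pr.author in pr.approvers:
        problems.append(f"#{pr.number}: author {pr.author} approved their own PR")
    # Only humans merge.
    if pr.merged_by is not None and pr.merged_by not in HUMAN_ACCOUNTS:
        problems.append(f"#{pr.number}: merged by non-human account {pr.merged_by}")
    return problems
```

A clean record, such as PullRequestRecord(7, "example-worker-bot", ["example-reviewer-bot"], "example-operator"), yields an empty list; any non-empty result is something the prevention layers should have stopped.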

The pipeline


Idea ──▶ PM ──▶ PO ──▶ Architect ─┐
                                  ▼
                         Worker ──▶ Reviewer ──▶ Human ──▶ merged
                            ▲          │
                            └──── QE ──┘

Each arrow is a structured handoff with a documented contract. Read the PR template, the ticket format, and the review template to see how each contract is enforced.
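One way to picture a handoff contract is as a typed message plus a table of allowed arrows. The sketch below reads the arrows off the pipeline diagram above; the ReviewHandoff fields and the assumption that a NO-GO is what sends work back toward the Worker through QE are illustrative, not the contents of the actual review template.

```python
from dataclasses import dataclass
from enum import Enum

# Allowed handoffs, read off the pipeline diagram. "Idea" and "merged"
# are endpoints, not agents.
PIPELINE: dict[str, set[str]] = {
    "Idea": {"PM"},
    "PM": {"PO"},
    "PO": {"Architect"},
    "Architect": {"Worker"},
    "Worker": {"Reviewer"},
    "Reviewer": {"Human", "QE"},
    "QE": {"Worker"},
    "Human": {"merged"},
}


class Verdict(Enum):
    GO = "GO"
    NO_GO = "NO-GO"


@dataclass(frozen=True)
class ReviewHandoff:
    """Hypothetical contract for what the Reviewer hands to the next stage."""
    pr_number: int
    verdict: Verdict
    reasons: tuple[str, ...]

    def __post_init__(self) -> None:
        # Verdicts come with reasons, so a bare GO or NO-GO is rejected.
        if not self.reasons:
            raise ValueError("a review verdict needs at least one reason")


def check_handoff(src: str, dst: str) -> None:
    """Raise if the pipeline has no arrow from src to dst."""
    if dst not in PIPELINE.get(src, set()):
        raise ValueError(f"no handoff from {src} to {dst}")


def route(review: ReviewHandoff) -> str:
    """Send a GO to the human; send a NO-GO back via QE (assumed routing)."""
    dst = "Human" if review.verdict is Verdict.GO else "QE"
    check_handoff("Reviewer", dst)
    return dst
```

Because the contract is validated when the message is built, an improvised handoff (no reasons, or an arrow the pipeline does not have) fails loudly instead of flowing downstream.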

Where humans live in the loop

You sit at exactly three places: defining what to build (with the PM), approving the Ready column (with the PO), and merging PRs (after the Reviewer). Everything in between runs on agents. That is the point of the harness.

What makes it real, not aspirational

A mental model is only reassuring if it is enforced. In Gemba Flow the roster is backed by overlapping platform controls, so that even an agent that misreads its instructions cannot cross the lines above. The layers that matter most to an evaluator:

Branch protection on main. Direct pushes are blocked; every change must arrive through a reviewed pull request with passing checks. This is the hard boundary — if it holds, most other failure modes are contained.
Account separation. The human operator, the worker bot, and the reviewer bot are three distinct accounts with scoped permissions. The worker cannot review its own PRs; the reviewer cannot merge; neither bot has admin rights. A pre-operation hook switches to the correct account automatically, so attribution in the audit trail is always honest.
A merge deny rule. The gh pr merge command is denied at the framework level. Even if an agent is told to merge, the tool call is blocked before it runs. This is the hard enforcement behind “only humans merge” — it does not depend on the agent choosing to comply.
Drift detection in CI. A policy linter runs on every pull request and fails the build if a safety instruction (such as a “NEVER merge” rule) is ever weakened or removed, so the guardrails cannot quietly erode over time.
Local and scheduled checks. A pre-push hook runs lint and tests before code reaches GitHub, and weekly audits scan for any restricted action — a bot merge, a direct push to main — and raise an alert if one slips through.
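The merge deny rule can be pictured as a check that runs on every proposed shell command before the tool executes it. A minimal sketch, assuming the framework hands the hook the command as a single string and blocks the call when the hook reports a denial; the function names and the True-means-blocked convention are hypothetical, not any specific framework's hook API.

```python
import shlex

# Hypothetical deny list: token prefixes that must never start a command.
DENIED_PREFIXES = [("gh", "pr", "merge")]

# Shell operators that separate one simple command from the next.
SEPARATORS = {"&&", "||", ";", "|", "|&", "&", "(", ")"}


def simple_commands(command: str) -> list[list[str]]:
    """Split a shell command line into its simple commands (token lists)."""
    # Treat newlines as command separators, as the shell does.
    lexer = shlex.shlex(command.replace("\n", ";"), posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, current = [], []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def is_denied(command: str) -> bool:
    """Return True if any simple command starts with a denied prefix."""
    try:
        commands = simple_commands(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return True
    return any(
        tuple(cmd[: len(prefix)]) == prefix
        for cmd in commands
        for prefix in DENIED_PREFIXES
    )
```

A prefix match like this is deliberately narrow: it catches gh pr merge 42 and cd repo && gh pr merge 42, but not the same merge wrapped in bash -c or sent through gh api to the pull request merge endpoint. That gap is one reason branch protection and account permissions sit underneath it.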

The design principle is defense in depth: prompt-level rules handle nuance, platform rules handle the hard boundaries, CI catches drift, and audits catch whatever prevention missed. No single layer is trusted to be sufficient on its own. See Layered Controls for the full layered architecture.
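The drift-detection layer can be sketched as a linter that asserts each agent's instructions still contain its safety rules. This assumes instructions live in per-agent Markdown files; the file names, directory, and regex patterns are hypothetical, and matching "NEVER" case-sensitively (so a rewrite to "never" or "try not to" fails) is a design choice of this sketch.

```python
import re
import sys
from pathlib import Path

# Hypothetical policy: the safety rules each instruction file must contain.
REQUIRED_RULES: dict[str, list[str]] = {
    "worker.md": [r"\bNEVER\s+merge\b", r"\bNEVER\s+push\s+directly\s+to\s+main\b"],
    "reviewer.md": [r"\bNEVER\s+merge\b"],
}


def find_drift(files: dict[str, str]) -> list[str]:
    """Return one message per required rule that is missing from its file."""
    problems = []
    for name, patterns in REQUIRED_RULES.items():
        text = files.get(name)
        if text is None:
            problems.append(f"{name}: file is missing")
            continue
        for pattern in patterns:
            # Case-sensitive on purpose: softening "NEVER" counts as drift.
            if not re.search(pattern, text):
                problems.append(f"{name}: missing rule /{pattern}/")
    return problems


def main(root: str = "agents") -> int:
    """CI entry point: report drift on stderr, return non-zero if any."""
    files = {p.name: p.read_text(encoding="utf-8") for p in Path(root).glob("*.md")}
    problems = find_drift(files)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A CI job would call sys.exit(main()) so that any missing rule fails the build. A textual check like this catches removal and rewording; it cannot judge every semantic weakening, which is why audits remain as the last layer.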

Where to go next

Read Honest Limits for where this model intentionally stops — the cases where you should not hand work to the agents.
Ready to try it? The Quickstart takes you from install to your first agent-authored pull request.