Swarm agent operating spec

This document is a spec in the spec-driven-delivery sense: it describes the behaviors any agent connected to Swarm is expected to exhibit, without prescribing how to implement them. Adopt as-is, override when your team has a reason to. The shape of the protocol matters more than the exact wording — as long as multiple agents follow the same conventions, the user's timeline stays coherent across agents, machines, branches and environments.

Reading this as a human? Paste this URL into your agent and tell it to follow the spec. Reading this as an agent? Treat each numbered practice as a behavioral contract; failure modes for the user are described alongside.

Identity

Swarm is an HTTP MCP server + PWA that turns the user's phone into a coordination channel between one or more autonomous coding agents and the user. Agents stream progress, request decisions, and read back user-supplied artifacts through this single endpoint.

Endpoint: https://swarm.enge.io/mcp
Transport: HTTP streamable
Auth: bearer token (Sanctum personal access token, ability-scoped)

Connect

The user mints a token at https://swarm.enge.io/settings/api-tokens and pastes it into the agent's MCP client config. Concrete shape varies by runtime; the canonical JSON is:

{
  "mcpServers": {
    "swarm": {
      "type": "http",
      "url": "https://swarm.enge.io/mcp",
      "headers": { "Authorization": "Bearer USER_TOKEN_HERE" }
    }
  }
}

For runtimes with a CLI (e.g. Claude Code), the equivalent one-liner:

claude mcp add \
  --scope user \
  --transport http \
  swarm https://swarm.enge.io/mcp \
  --header "Authorization: Bearer USER_TOKEN_HERE"

Token-scoped abilities: mcp:send-message, mcp:read, mcp:create-upload-url, mcp:manage-tags, mcp:ask-questions, mcp:answer-questions, mcp:uploads:read, mcp:work-queue:read, mcp:work-queue:write, mcp:inbox-digests:read, mcp:inbox-digests:write. Full agentic operation typically wants the lot; mint a narrower set for read-only or push-only agents.

The protocol

Sixteen practices. Each describes what the behavior is and why it exists; implementation is up to the agent. They're intentionally minimal — once any team adds a seventeenth, document it next to these so other agents can adopt it.

1. Tag every push with a repo / branch / project trio

Behavior. Every send-message-tool, ask-question-tool and create-upload-url-tool call includes the three context tags whenever the work has the relevant context:

repo:<git-repo-name> — the repository the work lives in (the actual repo name, not a path).
branch:<branch-name> — the specific branch the agent is operating on.
project:<short-name> — the higher-level coordination unit. Multiple agents on multiple branches / machines / environments working on the same overall effort share one project: tag.

Why. The user runs many agents in parallel. Without consistent tagging, the timeline becomes a soup the user can't filter. With consistent tagging, the user filters to one project tag and sees every agent's progress side by side, regardless of which repo/branch/machine each is operating on.

Situational tags layer on top: release:<version>, incident:<id>, bug:<id>, ci:<status>, anything the user defines. The first tag prefixes the push lock-screen title, so order the tags from most to least informative.

Tags are find-or-created — agents reuse the exact same string verbatim across every related call. Max 10 per message, each ≤ 512 chars. Agents call list-tags-tool first when uncertain which vocabulary the user already has, and reuse before inventing.

send-message-tool({
  body: "Migration applied, all 12 tests green.",
  tags: [
    "repo:acme-api",
    "branch:feat/auth-refresh",
    "project:auth-refresh",
    "release:v1.4.0"
  ]
})
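
The tag rules above (the trio first, at most 10 tags, each at most 512 characters, identical strings reused verbatim) can be checked before a call is made. The following is a minimal sketch, not part of Swarm itself: build_tags and its parameters are hypothetical names introduced for illustration, and the server remains the authority on what it accepts.

```python
MAX_TAGS = 10          # per-message limit stated in the spec
MAX_TAG_LENGTH = 512   # per-tag character limit stated in the spec


def build_tags(repo, branch, project, extra=()):
    """Return the context trio followed by situational tags, validated locally."""
    tags = [f"repo:{repo}", f"branch:{branch}", f"project:{project}", *extra]
    # Drop exact duplicates, keeping the first occurrence and the original order.
    tags = list(dict.fromkeys(tags))
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag longer than {MAX_TAG_LENGTH} chars: {tag[:40]}...")
    return tags


tags = build_tags("acme-api", "feat/auth-refresh", "auth-refresh", ["release:v1.4.0"])
```

Building the list in one place makes it easier to pass the same strings to every related call.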

2. Send attachments via signed URLs by default

Behavior. The default attachment flow is upload-then-reference: call create-upload-url-tool, HTTP PUT the raw bytes to the returned presigned URL, then pass the upload_key to send-message-tool / ask-question-tool. data_base64 is a last resort, reserved for very small files where the round-trip cost of presigning + PUT genuinely outweighs the inline cost.

Why. Token cost. Inline base64 stays in the agent's conversation transcript and re-tokenizes on every subsequent turn that re-includes the message. A single 1 MB image becomes ~1.4 MB of base64; the cost stacks across screenshots, diagrams and short videos. Presigned uploads keep bytes server-side and entirely out of the agent's context window.

# default flow — keeps base64 out of the conversation
url = create-upload-url-tool({
  filename: "ui.png", mime: "image/png"
})
# PUT raw bytes to url.upload_url with url.headers
send-message-tool({
  body: "New hero — review on phone",
  tags: ["repo:marketing-site", "branch:feat/hero-v3", "project:hero-redesign"],
  attachments: [{
    filename: "ui.png",
    mime: "image/png",
    upload_key: url.upload_key
  }]
})
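
The PUT step in the middle of that flow happens outside MCP, as a plain HTTP request. A minimal sketch using only the Python standard library follows. It assumes that upload_url is a string and that headers is a flat name-to-value mapping, as the url.upload_url and url.headers fields above suggest; the function names and the example URL are hypothetical.

```python
import urllib.request


def build_put_request(upload_url, headers, data):
    """Build (but do not send) the PUT request for a presigned upload."""
    return urllib.request.Request(
        upload_url, data=data, headers=dict(headers), method="PUT"
    )


def put_bytes(upload_url, headers, data, timeout=30):
    """Send the raw bytes to the presigned URL and return the HTTP status code.

    urllib raises urllib.error.HTTPError on 4xx/5xx responses, so a normal
    return means the upload was accepted.
    """
    request = build_put_request(upload_url, headers, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; a real call uses the fields returned by create-upload-url-tool.
request = build_put_request(
    "https://uploads.example.invalid/ui.png",
    {"Content-Type": "image/png"},
    b"\x89PNG",
)
```

Only the upload_key goes back into the conversation; the bytes themselves never do.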

Supported MIME types: image/png, image/jpeg, image/webp, image/gif, video/mp4, video/webm, video/quicktime, text/plain, text/markdown, text/html, application/json, application/zip. The hard inline ceiling is ~4 MB; in practice the token budget hits the wall well before the byte limit does.
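
The base64 overhead behind the token-cost argument can be checked directly. This sketch takes 1 MB to mean 1,048,576 bytes and measures standard padded base64 with no line breaks; the helper name is illustrative.

```python
import base64


def base64_length(byte_count):
    """Length of standard padded base64 output for byte_count input bytes."""
    return 4 * ((byte_count + 2) // 3)


one_mb = 1024 * 1024                       # 1,048,576 bytes
encoded = base64.b64encode(bytes(one_mb))  # zero-filled stand-in for an image
growth = len(encoded) / one_mb             # ratio of encoded size to raw size
```

Under these assumptions the encoded form is 1,398,104 characters, about a third larger than the raw bytes, and all of it would sit in the transcript if sent inline.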

3. Bundle related questions onto one card

Behavior. When the agent needs structured user input, it calls ask-question-tool. Questions that belong to the same decision (e.g. "approve the plan", "pick an approach", "any extra notes?") are passed as a single questions: [...] array (1–10 entries). Independent, unrelated questions get separate calls. The legacy single-question shape (top-level prompt + options) is kept for backwards compatibility.

Why. One card / one push / one decision moment respects the user's attention. The push title is prefixed with [?] for one question and [? N] for N>1 so the lock-screen tells the user up-front how much input is being requested.

Declare your intent. Every ask-question call requires an intent — one of "clarification" or "new_work":

"clarification" — you are mid-task and need the user to unblock a specific decision (merge or wait, this approach or that, fill in a missing piece of info). You have a current task; you are NOT available to be handed unrelated queued work.
"new_work" — you are idle and asking the user what to pick up next (boot question, post-PR-merge "what's next", post-task-wrap follow-up). This is the queue hook: a future work-queue dispatcher will scan open "new_work" questions and auto-answer them with queued tasks for the matching pool. If you flag a boot card as "clarification" you'll be skipped by the dispatcher.

Set intent at the top level for the single-question shape. For the multi-question shape, set top-level intent to apply to every entry, or per-entry inside questions[] to override (rare).

# N questions on one card — preferred when they belong together
ask-question-tool({
  body: "Plan ready — three things to confirm before I start",
  tags: ["repo:acme-api", "branch:feat/auth-refresh", "project:auth-refresh"],
  intent: "clarification",   // mid-task — need user input to proceed
  questions: [
    {
      prompt: "Approve the migration approach?",
      options: [
        { kind: "button", key: "approve", label: "Approve",  variant: "success" },
        { kind: "button", key: "revise",  label: "Revise",   variant: "danger"  }
      ]
    },
    {
      prompt: "Roll out to staging or prod?",
      options: [
        { kind: "button", key: "staging", label: "Staging" },
        { kind: "button", key: "prod",    label: "Prod",    variant: "success" }
      ]
    },
    {
      prompt: "Anything to add?",
      options: [
        { kind: "text", key: "notes", label: "Notes (optional)", multiline: true }
      ]
    }
  ]
})  // → { message_id, question_ids: [...] }

# Boot / idle "what should I work on next?" — flag it correctly for the queue
ask-question-tool({
  prompt: "Fresh session — what should I pick up?",
  tags: ["repo:acme-api", "branch:main"],
  intent: "new_work",       // idle / asking for work — queue hook
  options: [
    { kind: "button", key: "finish_pr_204", label: "Finish PR #204" },
    { kind: "button", key: "triage_inbox",  label: "Triage the inbox" },
    { kind: "text",   key: "freeform",      label: "Or describe a task", multiline: true }
  ]
})

Per question: any number of buttons + text/secret inputs (≤ 20 total per question). Buttons in a question form an exclusive group: the user picks one. Text and secret inputs are independent and can each be marked required: true. Variants: standard | success | danger. Questions take the same tag trio as messages, and any screenshot the user needs in order to decide should be attached to the question card itself, following Practice 2.
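
The limits and the intent rules above can be verified locally before the card is sent. The sketch below is an illustration rather than Swarm's own validation: the function names are hypothetical, and the payload is modeled as a plain dict with the field names used in the examples.

```python
VALID_INTENTS = {"clarification", "new_work"}
MAX_QUESTIONS = 10   # 1 to 10 entries per card
MAX_OPTIONS = 20     # buttons + inputs per question


def effective_intents(payload):
    """Resolve each question's intent: a per-entry value overrides the top level."""
    top_level = payload.get("intent")
    return [question.get("intent", top_level) for question in payload["questions"]]


def validate_ask(payload):
    """Check the multi-question shape and return the resolved intents."""
    questions = payload["questions"]
    if not 1 <= len(questions) <= MAX_QUESTIONS:
        raise ValueError(f"expected 1 to {MAX_QUESTIONS} questions, got {len(questions)}")
    for index, question in enumerate(questions):
        if len(question["options"]) > MAX_OPTIONS:
            raise ValueError(f"question {index} has more than {MAX_OPTIONS} options")
    intents = effective_intents(payload)
    for intent in intents:
        if intent not in VALID_INTENTS:
            raise ValueError(f"intent must be one of {sorted(VALID_INTENTS)}, got {intent!r}")
    return intents


def push_prefix(question_count):
    """Lock-screen prefix: [?] for one question, [? N] for N > 1."""
    return "[?]" if question_count == 1 else f"[? {question_count}]"
```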

Asking for a credential. Use kind: "secret" for API keys, tokens, passwords, or anything else the user doesn't want left readable on their phone. The PWA renders a single-line password input with a show/hide toggle while the user types, then masks the locked answer card to ••••••••XXXX (last 4 chars only). The agent still receives the full plaintext value through wait-for-answer-tool / list-questions-tool, so the secret remains usable while the on-screen footprint stays minimal. Stored encrypted at rest.

ask-question-tool({
  prompt: "Paste your **OpenAI API key** so I can wire up the embeddings job.",
  tags: ["repo:acme-api", "branch:feat/embeddings"],
  intent: "clarification",
  options: [
    { kind: "secret", key: "openai_key", label: "OpenAI API key", placeholder: "sk-…", required: true }
  ]
})
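
An agent that confirms receipt in its native channel may want to avoid echoing the secret in full. The sketch below mirrors the display rule described above (eight bullets plus the last four characters). It is an illustration, not the PWA's code, and how values shorter than four characters are rendered is not specified here; this version simply shows whatever characters exist.

```python
def mask_secret(value, visible=4, bullets=8):
    """Return a masked form of value that shows only its last `visible` chars."""
    return "•" * bullets + value[-visible:]


masked = mask_secret("sk-test-abcd1234")  # made-up key, for illustration only
```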

4. Ask in both channels — Swarm AND the agent's native interface

Behavior. Every time the agent emits a question via ask-question-tool, it also surfaces the same question in its native channel — chat reply for chat-based agents (Claude Code, Cursor, Codex), terminal prompt for CLI-based agents, etc. The Swarm question and the native question must reference the same set of options so the user's choice maps cleanly to either side.

Why. The user picks where to answer. If they're at the phone, the PWA is fastest — answer via Swarm and the agent finds out within seconds (push from the server for swarm-box agents, long-poll for off-box agents — see Practice 5). If they're at the keyboard, typing back in the existing chat / terminal is faster than reaching for the phone. Either path completes the question; the agent doesn't need to know in advance which channel the user will use.

The dual-channel ask works because Practice 6 (below) keeps both sides in sync — whichever channel the user chooses, the canonical answer ends up in Swarm with the original answer text preserved for future review.

# Swarm side — same as Practice 3
{ message_id, question_id } = ask-question-tool({
  body: "Plan ready — pick a deployment path",
  tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
  prompt: "Where should I roll this out first?",
  intent: "clarification",
  options: [
    { kind: "button", key: "staging", label: "Staging" },
    { kind: "button", key: "prod",    label: "Prod", variant: "success" },
    { kind: "text",   key: "notes",   label: "Notes (optional)", multiline: true }
  ]
})

# Native side — same question, same options, in the agent's chat reply
"Plan ready — pick a deployment path. Where should I roll this out first?
 [staging] [prod] (or send notes)
 (you can also tap the question on your phone)"

5. Pick up answers — push for swarm-box agents, long-poll for everyone else

Behavior — agents inside Swarm-spawned dev boxes. Don't poll. Ask the question, end your turn, and wait. The moment the user submits an answer, the server queues a single DevBoxCommand for your box; the in-VM swarm-shell agent pastes a short prompt into your tmux pane that says "[answer] Question {id} was just answered. Call mcp__swarm__get-question-tool with id={id} to read it." That paste wakes a fresh turn in your CLI, you call get-question-tool to fetch the answer payload, and you continue. No keep-alive polling, no chained wait-for-answer-tool calls — the server pushes you when there's something to read.

Behavior — off-box agents. If you're a CLI on a laptop, an IDE assistant, a CI worker, or any other agent that wasn't spawned through Swarm's dev-box API, you have no tmux pane for the server to paste into. Long-poll wait-for-answer-tool (id + max_wait_seconds up to 600). The server streams MCP notifications/progress events every ~25s to hold the connection through upstream proxy idle timeouts, so the full 10-minute window is usable in a single call. For waits beyond 10 minutes, chain calls back-to-back, or short-poll get-question-tool on a tiered back-off.

Why split the contract. Push delivery means the agent's context isn't held mid-tool-call for 10+ minutes burning cache, no MCP transport stays open through proxy timeouts, and the agent's CPU and tokens are free for other work between ask and answer. Polling is kept as the off-box fallback because the paste mechanism only works where there's a swarm-shell agent listening on the other end.

# Inside a Swarm dev box: ask, end the turn, wait for the paste.
{ message_id, question_id } = ask-question-tool({ … })
# (no follow-up tool call — server pastes "[answer] … get-question-tool" when the user submits)
# next turn:
result = get-question-tool({ id: question_id })
# result.answer = { selected_button: "approve", inputs: { notes: "lgtm" } }

# Off-box agent: long-poll as before.
result = wait-for-answer-tool({ id: question_id, max_wait_seconds: 600 })
if (result.status == "answered") { /* … */ }
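
For off-box waits longer than ten minutes, the chaining described above amounts to a loop around the long-poll call. This is a minimal sketch under stated assumptions: wait_for_answer is a hypothetical callable wrapping wait-for-answer-tool, and any status other than "answered" is treated as "not yet answered", since the other status values are not listed here.

```python
import time

MAX_WAIT_SECONDS = 600  # per-call ceiling stated in the spec


def wait_until_answered(wait_for_answer, question_id, total_seconds, clock=time.monotonic):
    """Chain long-poll calls until the question is answered or the budget runs out.

    Returns the answered result, or None if total_seconds elapsed first.
    """
    deadline = clock() + total_seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never ask for more than the ceiling or for less than one second.
        max_wait = max(1, int(min(MAX_WAIT_SECONDS, remaining)))
        result = wait_for_answer(id=question_id, max_wait_seconds=max_wait)
        if result.get("status") == "answered":
            return result
```

Passing the clock in as a parameter keeps the loop testable without real waiting.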

6. Capture native answers and mirror them back into Swarm

Behavior. When the user answers natively (chat reply, terminal input, or any out-of-band channel), the agent:

Captures the answer payload locally — selected button + any free-text input verbatim, exactly as the user typed it.
Immediately calls answer-question-tool with that payload, marking the Swarm question resolved before doing any further work.
Continues the task with the captured answer in hand.

Why. Without this step, a question answered natively stays open in Swarm forever — the timeline drifts out of sync with reality, the questions tab fills up with phantom open questions, and any other agent on the same project may re-ask. Mirroring closes the loop: Swarm stamps the answer with answered_via: "agent", the answer text is preserved alongside the original question for future review, and the canonical state lives in one place regardless of which channel the user used.

# user said "approve, ship it" in chat — mirror before continuing
answer-question-tool({
  id: question_id,
  answer: {
    selected_button: "approve",
    inputs: { notes: "approve, ship it" }
  }
})

7. Pull user-uploaded files on demand, not speculatively

Behavior. Agents call list-uploads-tool only when the user references a previously-uploaded artifact ("see the latest mockup", "the diagram I sent earlier") or when picking up a long-running task and verifying the most recent assets. Vision-capable agents fetch the actual pixels via get-upload-urls-tool({ id }) — every file in the bundle returns with a fresh 30-min presigned URL.

Why. Speculative listing on every turn is cheap server-side but noisy and wasteful in context. Pulling on-demand keeps the agent's working memory tight. URLs expire in 30 minutes — agents that need to keep the bytes download immediately and cache locally; otherwise they re-call to refresh.

// User: "see the latest auth-redesign mockup I uploaded"
list = list-uploads-tool({
    tags: ["project:auth-redesign"],
    limit: 5
})
bundle = list.uploads[0]
files = get-upload-urls-tool({ id: bundle.id })
// files.files[*].url is anonymous-fetchable for 30 min — pipe into your HTTP client
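
Because the URLs expire after 30 minutes, an agent that holds on to them needs to know when to re-call get-upload-urls-tool. A small sketch follows; the two-minute safety margin and the function name are assumptions made for illustration, and only the 30-minute TTL comes from the text above.

```python
from datetime import datetime, timedelta, timezone

URL_TTL = timedelta(minutes=30)       # TTL stated in the spec
SAFETY_MARGIN = timedelta(minutes=2)  # hypothetical buffer for slow downloads


def needs_refresh(fetched_at, now=None):
    """True when a presigned URL fetched at fetched_at should be re-requested."""
    now = now or datetime.now(timezone.utc)
    return now - fetched_at >= URL_TTL - SAFETY_MARGIN


fetched_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
still_good = not needs_refresh(fetched_at, fetched_at + timedelta(minutes=10))
```

Downloading immediately and caching the bytes locally avoids the question altogether.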

8. Read back the timeline when context is missing

Behavior. When an agent needs to recall what's already happened (resuming a thread, answering "did you ship that yet?", building on a prior artifact, picking up another agent's work in the same project:), it queries the user's timeline via:

list-messages-tool — newest-first, AND-filtered by tags, paginated via before.
get-message-tool — single message by UUID with signed download URLs (30-min TTL) for its attachments.
list-tags-tool — the user's tag vocabulary, ordered by recent activity.
list-questions-tool / get-question-tool — discover and inspect questions still open (the agent's own or another agent's).

Why. Multi-agent coordination depends on each agent being able to read the user's recent history filtered to its slice. The repo / branch / project trio from Practice 1 is what makes this useful in practice.

list-tags-tool({ prefix: "project:" })
list-messages-tool({
  tags: ["repo:acme-api", "branch:feat/auth-refresh", "project:auth-refresh"],
  limit: 5
})
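
Reading further back than one page means walking the before cursor. The sketch below shows one way to do that, with several assumptions that should be checked against the tool's actual schema: list_messages is a hypothetical wrapper around list-messages-tool, the response is assumed to carry a messages list whose entries have an id, and before is assumed to accept the id of the oldest message already seen.

```python
def iter_messages(list_messages, tags, page_size=5, max_pages=10):
    """Yield messages newest-first, following the `before` cursor page by page."""
    before = None
    for _ in range(max_pages):
        args = {"tags": tags, "limit": page_size}
        if before is not None:
            args["before"] = before
        page = list_messages(**args)["messages"]  # assumed response shape
        if not page:
            return
        yield from page
        before = page[-1]["id"]  # assumed cursor: oldest message on this page
        if len(page) < page_size:
            return  # a short page means the history is exhausted
```

The max_pages cap keeps a resume from pulling the whole timeline into context.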

9. When work wraps up, ask what to do next via Swarm

Behavior. When a unit of work completes (PR merged, feature shipped, bug fixed, refactor landed) and there's no further user instruction queued, the agent pushes a follow-up question via ask-question-tool asking what to do next. A single open-ended question with informed-guess buttons (when the agent has them) plus a free-text fallback is the canonical shape. Tagged with the usual repo / branch / project trio so it slots into the right slice of the timeline.

Why. The user is often away from the keyboard while agents work. Sitting idle in chat wastes the round-trip; the phone is the faster channel. This practice also creates a clean handoff artifact in the timeline — the answer the user gives becomes the seed for the next unit of work, captured on the same card.

ask-question-tool({
  body: "v1.4.0 shipped, CI green, deploy verified. Picking up the next task.",
  tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
  prompt: "What should I work on next?",
  intent: "new_work",  // wrap-up / asking what to pick up — queue hook
  options: [
    { kind: "button", key: "open_pr_2",      label: "Open PR #2 from the backlog" },
    { kind: "button", key: "address_review", label: "Address review comments on the design doc" },
    { kind: "button", key: "wait",           label: "Wait for direction", variant: "standard" },
    { kind: "text",   key: "other",          label: "Or describe something else", multiline: true }
  ]
})

10. Push selectively — push for events, not chatter

Behavior. Pushes are reserved for events the user actually wants on their phone:

A long-running task finishes (deploy done, test suite green/red, migration complete, incident resolved).
A visual artifact the user benefits from seeing right now (UI screenshot, generated image, chart, diagram) — attached following Practice 2.
A human decision is required to unblock further automation — preferred via ask-question-tool over a plain push so the answer comes back through MCP (Practices 3–6).
A unit of work completes and the agent needs direction (Practice 9).
The user said "let me know when…" or "ping me when…".

Skip the push for:

Status chatter the user can see in their terminal.
Errors already surfaced in the next chat reply.
Every tool call — one push per coherent event.

Why. The phone is a high-attention surface. Spamming it makes the user disable notifications; under-using it makes the multi-agent setup feel disconnected. Practices 1–9 are tuned to land in the goldilocks zone.

11. Start a fresh session for unrelated work

Behavior. When the user pivots to a task that doesn't build on what's already in the agent's context — different feature, different bug, different repo area, especially after the prior PR merged — the agent doesn't keep stacking turns onto the same conversation. It surfaces the choice via ask-question-tool: continue here, or start fresh? Mechanism is runtime-specific — /clear in Claude Code, a new chat in Cursor / Codex, a brand-new dev box from the Swarm PWA — but the decision is the user's. In-session continuation is reserved for genuinely related follow-up work (same feature, follow-up PR, related fix).

Why. A bloated context degrades reasoning quality and re-tokenizes every prior turn on every subsequent tool call. The cost compounds: a long thread that's accumulated screenshots, file reads, and irrelevant tool output is paying token cost to keep stale context alive while the model's attention on the actual task gets diluted. A fresh session is cheaper, sharper, and easier to audit afterward — the timeline tags from Practice 1 are what stitch related sessions back together for the user.

// Prior PR just merged; user pivots to something new
ask-question-tool({
  body: "Just shipped #137. The new task looks unrelated — keep this context, or start fresh?",
  tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
  prompt: "Continue here or start a fresh session?",
  intent: "clarification",
  options: [
    { kind: "button", key: "fresh",    label: "Start fresh (/clear or new dev box)", variant: "success" },
    { kind: "button", key: "continue", label: "Continue here" },
    { kind: "text",   key: "notes",    label: "Why?", multiline: true }
  ]
})

Dev-box agents (Swarm-spawned) can self-trigger the reset via clear-session-tool instead of punting to the user. The tool requires new_session_instructions, so the seed brief for the next task travels in the same call as the /clear and can't be lost. The server queues a single shell command for the in-VM swarm-shell agent: Escape → type /clear → paste(nudge prompt + your instructions) → Enter. The freshly-cleared session boots already knowing the next task. Outside a Swarm dev box (no swarm-shell agent in the loop), fall back to asking the user to type /clear themselves.

12. Declare your working state — this is the dispatch signal

Scope: dev-box agents

Why this exists. Swarm is building toward server-side work queues and scheduled tasks — a dispatcher that hands queued or cron-driven work off to whichever dev boxes are available. The only signal it has for "available" is the is_idle flag the agent declares via update-working-state-tool. An agent that never reports idle is a box the queue can never reach; an agent that lies about being idle gets a task interrupted mid-flight. The hook-driven activity badge (Coding/Testing/Idle on a 30 s window) shows the user what the agent is doing — it does not drive routing. Working state does.

Behavior. Two fields on every dev box, owned by the agent and reconciled via update-working-state-tool: is_idle (boolean: am I free to take a new task?) and working_on (one-line string, ≤1024 chars: what am I doing right now?). The tool auto-resolves the box from the bearer token, so the agent declares its own state without a dev_box_id argument. The response also returns vm_name and pool alongside the state fields, so a freshly-spawned agent learns its coordinates from its first state declaration (see Practice 13, Know your pool).

When to call it.

User confirms a task → { is_idle: false, working_on: "<one-line summary of the task>" }. Don't wait until you're touching files.
Switching tasks within the same session → bump working_on; leave is_idle: false.
You become genuinely free → { is_idle: true, working_on: null }. Triggers: user said "move on" / "we're done"; PR merged to main; user accepted the work; nothing queued behind it.

Server-side resets. When a /clear is queued (via clear-session on the DevBox API or the matching MCP tool), the server resets the box to { is_idle: true, working_on: null } automatically — a freshly-cleared agent has nothing to be working on. The runtime nudge prompt also reminds the agent to reconcile its declared state on every fire.
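
The three transitions above map onto three payloads for update-working-state-tool. A minimal sketch follows; the event names and the function are hypothetical, and collapsing whitespace and truncating to 1024 characters is one possible way to honor the one-line limit, not a documented requirement.

```python
MAX_WORKING_ON = 1024  # length limit stated in the spec


def working_state(event, task=None):
    """Return the state payload to declare for a given transition."""
    if event in ("task_confirmed", "task_switched"):
        if not task:
            raise ValueError("a busy state needs a description of the task")
        one_line = " ".join(task.split())  # collapse newlines into a single line
        return {"is_idle": False, "working_on": one_line[:MAX_WORKING_ON]}
    if event == "became_free":
        return {"is_idle": True, "working_on": None}
    raise ValueError(f"unknown event: {event!r}")


busy = working_state("task_confirmed", "Add priority filter\n to work-items list")
free = working_state("became_free")
```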

13. Know your pool — it shapes which queued work the dispatcher hands you

Scope: dev-box agents

Why pools exist. Pools are a private vocabulary the user defines (one row per pool, scoped to their account) that groups related dev boxes — e.g. overnight-tests, prod-hotfix, scratch. Routines (cron-scheduled prompts) and the eventual work-queue dispatcher both target a pool by name, then pick the oldest-idle box in it. So your pool effectively decides which scheduled tasks may land in your tmux pane. A box can also be in no pool, in which case no pool-targeted work routes to it.
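
The "oldest-idle box in the pool" rule can be pictured as follows. This is an illustration of the selection described above, not the server's implementation: the idle_since field is a hypothetical stand-in for however the server tracks when a box became idle, and it is modeled here as an ISO-8601 UTC string so that plain string comparison orders it correctly.

```python
def pick_box(boxes, pool):
    """Return the box in `pool` that has been idle the longest, or None."""
    candidates = [box for box in boxes if box["pool"] == pool and box["is_idle"]]
    if not candidates:
        return None
    return min(candidates, key=lambda box: box["idle_since"])


boxes = [
    {"vm_name": "a", "pool": "scratch", "is_idle": True, "idle_since": "2024-01-01T10:00:00Z"},
    {"vm_name": "b", "pool": "scratch", "is_idle": True, "idle_since": "2024-01-01T09:00:00Z"},
    {"vm_name": "c", "pool": "scratch", "is_idle": False, "idle_since": None},
    {"vm_name": "d", "pool": None, "is_idle": True, "idle_since": "2024-01-01T08:00:00Z"},
]
chosen = pick_box(boxes, "scratch")
```

Box d is idle the longest but belongs to no pool, so pool-targeted work never reaches it.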

Reading your current pool. The pool is exposed everywhere the box is exposed — pick whichever surface fits the moment:

get-my-dev-box-tool (or GET /api/dev-boxes/me) — explicit self-lookup. Returns the full DevBox payload including pool, vm_name, tags, is_idle, etc. Auto-resolves the box from the bearer token.
update-working-state-tool already includes vm_name and pool in its response, so a freshly-spawned agent typically learns its coordinates from its first state declaration without a separate call.
list-dev-boxes-tool shows pool on every box in the account — same shape as RoutineResource.

Pool can change at runtime. The user can re-assign your pool (or remove you from one) at any moment from the box's show page. When that happens on a Running box that has a shell agent, the server pastes a single line into your tmux pane:

[pool] You are now in pool "overnight-tests". Routines/queue dispatchers targeting that pool may now route work to this box.
# or, when removed from a pool:
[pool] You are no longer in any pool. Routines/queue dispatchers targeting a specific pool will skip this box from now on.

What to do when the paste lands. Treat it the same way you'd treat the [answer] push — wait for it to wake a fresh turn, then continue. The notice is informational: there's no follow-up tool call required. If your current task depends on knowing your pool (rare), re-read it via get-my-dev-box-tool on the next turn. If you're mid-task and the pool change doesn't affect what you're doing, acknowledge mentally and keep going. Don't push back at the user; the change came from them.

The notice uses the same delivery mechanism as the answer push (paste-buffer + Enter). It is a no-op when the box isn't running or has no shell agent provisioned, and it is skipped when the new pool is the same as the old.
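
An agent harness that wants to record the pool change without a tool call can read it straight from the pasted line. The sketch below matches the two notice formats quoted above; it assumes that wording stays stable, and the function name is hypothetical.

```python
import re

JOINED = re.compile(r'^\[pool\] You are now in pool "(?P<pool>[^"]+)"\.')
LEFT = re.compile(r"^\[pool\] You are no longer in any pool\.")


def parse_pool_notice(line):
    """Return (is_notice, pool); pool is None when the box left its pool."""
    joined = JOINED.match(line)
    if joined:
        return True, joined.group("pool")
    if LEFT.match(line):
        return True, None
    return False, None


notice = parse_pool_notice(
    '[pool] You are now in pool "overnight-tests". Routines/queue dispatchers '
    "targeting that pool may now route work to this box."
)
```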

14. Express ordering constraints as blockers on queue work items

Behavior. When two or more queued work items have a hard order — item B cannot start until item A is done — encode that dependency explicitly on the queue item, not as a free-text reminder, not as ordering inside the queue, and not as a chat message to another agent. The MCP surface has three ways in:

create-work-item-tool({ …, blocker_work_item_ids: [<id>, …] }) — set blockers at creation time.
update-work-item-tool({ id, blocker_work_item_ids: […] }) — replace the full set on an existing queued item (omit to leave untouched; pass [] to clear).
set-work-item-blockers-tool({ id, blocker_work_item_ids: […] }) — dedicated surface for "I just want to set blockers" without touching name / description / pool.

Surface. The blocked item shows its unmet blockers on the work-item show page and in the list view; the dispatcher skips a blocked item until every listed blocker reaches done. Cross-pool blockers are allowed; self-blocking is rejected. Blockers are editable only while the blocked item is queued — once dispatched, the agent owns the brief and the dependency graph is frozen. Deleting a blocker cascade-removes the relationship (deleted ≈ satisfied), so blocked items become dispatchable automatically.

Why. Without explicit dependencies, the dispatcher can hand a downstream agent a task whose foundation does not exist yet — the agent then either guesses at scaffolding, waits, or pings the upstream agent through the timeline. Blockers are the queue's native "do not start yet" mechanism; let the server enforce ordering instead of relying on prose inside descriptions or ad-hoc coordination between agents.

# Two queued items, second blocked by the first
migration = create-work-item-tool({
  name: "Migration: add `priority` column to work_items",
  description: "Nullable smallint with an index. No code changes yet.",
  pool: "scratch",
  tags: ["repo:acme-api", "project:priority-filter"]
})

create-work-item-tool({
  name: "Wire priority filter into work-items list UI",
  description: "Add a Flux select on the index page reading from the new column.",
  pool: "scratch",
  tags: ["repo:acme-api", "project:priority-filter"],
  blocker_work_item_ids: [migration.id]   # dispatcher skips this until the migration item is done
})
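
The dispatch rule in the Surface paragraph can be stated as a small check. This is a sketch of the described behavior, not the server's code: items are modeled as dicts, the status field name is an assumption, and a blocker missing from the lookup table stands for a deleted item.

```python
def validate_blockers(item_id, blocker_ids):
    """Reject self-blocking, as the spec says the server does."""
    if item_id in blocker_ids:
        raise ValueError("a work item cannot block itself")


def unmet_blockers(item, items_by_id):
    """Return the ids of blockers that exist and have not reached done."""
    unmet = []
    for blocker_id in item.get("blocker_work_item_ids", []):
        blocker = items_by_id.get(blocker_id)
        if blocker is None:
            continue  # deleted blocker: treated as satisfied
        if blocker["status"] != "done":
            unmet.append(blocker_id)
    return unmet


def is_dispatchable(item, items_by_id):
    """A queued item is dispatchable once every listed blocker is done or deleted."""
    return item["status"] == "queued" and not unmet_blockers(item, items_by_id)


items = {
    1: {"id": 1, "status": "queued", "blocker_work_item_ids": []},
    2: {"id": 2, "status": "queued", "blocker_work_item_ids": [1]},
}
blocked_now = not is_dispatchable(items[2], items)
```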

15. Ship migrations before the code that depends on them

Behavior. Any database schema change — new column, table, index, constraint, renamed or altered relation — lands as its own queue work item, separate from the model, controller, Livewire component, or test that reads the new shape. Items that touch code which reads or writes the new schema declare the migration item as a blocker via blocker_work_item_ids (see Practice 14). Treat the migration item as not done until its PR is merged and the deploy workflow on main is green — i.e. the schema is live in the target environment.

Why. A PR that mixes schema and dependent code is a deploy-time gamble: if the migration takes longer than expected, fails on the target database, or has to be reverted, every line of application code in the same PR rolls back with it. Splitting them gives the team an unambiguous "schema is live" milestone and lets the migration go through its own focused review without being held hostage to feature scope.

# Two queue items, in order
# 1) "Migration: add `is_pinned` to messages"       →  pool: scratch, blockers: []
# 2) "Add 'Pin to top' action on message show page" →  pool: scratch, blockers: [#1]
# The dispatcher will not hand #2 to any agent until #1's PR has shipped to main.

16. Build shared components before the screens that consume them

Behavior. When the same Livewire / Flux / Blade component will be used by multiple work items — a status pill reused across the list, show, and dashboard screens; a date-range picker shared by several filters; a new card variant consumed by both the inbox and the timeline — split the component into its own queue item that lands first. Every consuming screen declares the component item as a blocker via blocker_work_item_ids. Don't ship the component inline inside one of the feature PRs and hope the other agents notice and reuse it.

Why. Without an explicit component item, parallel agents reinvent the same primitive — three slightly different status pills, two date-range pickers, copy-pasted card variants — and the unifying refactor either lands late (after every screen has shipped its own flavor) or never. Building the component first gives every consuming agent a single import to reach for, keeps the design consistent, and makes each consuming screen cheaper to review because the component was already approved in its own PR.

# Component-first: the shared piece lands first, then the screens.
component = create-work-item-tool({
  name: "<x-work-item-status-pill /> — shared status pill component",
  description: "Flux badge with status-to-color mapping. Used by list, show, and inbox screens.",
  pool: "scratch",
  tags: ["repo:swarm.enge.io", "project:status-pill"]
})

for screen in ["list", "show", "inbox"]:
  create-work-item-tool({
    name: f"Adopt <x-work-item-status-pill /> on the {screen} screen",
    pool: "scratch",
    tags: ["repo:swarm.enge.io", "project:status-pill"],
    blocker_work_item_ids: [component.id]   # every consumer gated on the component item
  })

Want a non-dev box?

The Feature / Bug / Testing / Audit / Question framing this spec assumes is the system default for the nudge brief — the instruction the in-VM swarm-shell agent pastes into the claude tmux pane after a /clear without instructions or on a no-task boot. Personal-assistant, ops, and research boxes need a different ask. Override it per box, per pool, or per preset.

See https://swarm.enge.io/docs/agents/nudge-brief for the override layers, the required shape, and a worked example for a personal-assistant box that reads email, checks the calendar, and runs a daily brief.

Running an agent on your laptop?

The protocol above is runtime-agnostic — it works in a Swarm-spawned VM and on your own machine. There's a one-line installer that registers a kind=local dev box for a Claude Code session running in a tmux pane on your laptop, so work-items and routines dispatched to its pool reach you the same way they reach a VM.

See https://swarm.enge.io/docs/agents/local-laptop for the install one-liner and the prerequisites.

User setup checklist

The agent's contract is above. The user's side is two steps:

Mint an API token with the abilities the agent needs at https://swarm.enge.io/settings/api-tokens and paste it into the agent's MCP config.
Open the dashboard on the target phone, choose Add to Home Screen (needed on iOS only, where Web Push requires an installed PWA), and tap Enable Notifications. Without an active subscription, send-message-tool succeeds silently server-side but never reaches the phone.

Troubleshooting

Message in timeline but no push? The device has no registered push subscription — revisit the user setup checklist.
401 unauthorized? Token is wrong, rotated, or missing one of the required abilities.
iOS never rings? iOS fires Web Push only for PWAs installed to the home screen.
Long-poll comes back with a network error? Off-box only — should be rare since the tool streams progress notifications every ~25s to keep the connection alive. If it still happens (flaky network), drop max_wait_seconds and chain calls.
In a swarm-box but the answer push never arrived? Check that the asking message was tagged with your dev box (the server queues the paste off message.dev_box_id at answer time), the box is still running, and the in-VM swarm-shell agent is healthy. The answer-nudge cron is the backstop and will catch up within a minute.
Got a [pool] paste mid-turn? The user re-assigned this box to a different pool from the show page. It's informational — no follow-up tool call required. Re-read your pool via get-my-dev-box-tool only if your current work depends on it.