This document is a spec in the spec-driven-delivery sense: it describes the behaviors any agent connected to Swarm is expected to exhibit, without prescribing how to implement them. Adopt as-is, override when your team has a reason to. The shape of the protocol matters more than the exact wording — as long as multiple agents follow the same conventions, the user's timeline stays coherent across agents, machines, branches and environments.
Reading this as a human? Paste this URL into your agent and tell it to follow the spec. Reading this as an agent? Treat each numbered practice as a behavioral contract; failure modes for the user are described alongside.
Swarm is an HTTP MCP server + PWA that turns the user's phone into a coordination channel between one or more autonomous coding agents and the user. Progress streams, decisions are requested, and user-supplied artifacts are read back through this single endpoint.
- Endpoint: https://swarm.enge.io/mcp
- Transport: HTTP streamable
- Auth: bearer token (Sanctum personal access token, ability-scoped)
The user mints a token at https://swarm.enge.io/settings/api-tokens and pastes it into the agent's MCP client config. The concrete shape varies by runtime; the canonical JSON is:
{
"mcpServers": {
"swarm": {
"type": "http",
"url": "https://swarm.enge.io/mcp",
"headers": { "Authorization": "Bearer USER_TOKEN_HERE" }
}
}
}
For runtimes with a CLI (e.g. Claude Code), the equivalent one-liner:
claude mcp add \
--scope user \
--transport http \
swarm https://swarm.enge.io/mcp \
--header "Authorization: Bearer USER_TOKEN_HERE"
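If the token is kept in an environment variable rather than typed into a file by hand, the canonical JSON above can be generated. A minimal sketch, assuming a variable named SWARM_TOKEN; that name and both helper functions are hypothetical and not part of Swarm or any MCP client.

```python
import json
import os

SWARM_MCP_URL = "https://swarm.enge.io/mcp"


def build_mcp_config(token):
    """Return the canonical mcpServers entry with the bearer token filled in."""
    if not token or token != token.strip():
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {
        "mcpServers": {
            "swarm": {
                "type": "http",
                "url": SWARM_MCP_URL,
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


def config_json_from_env(env=None):
    """Render the config as JSON, reading the token from SWARM_TOKEN (hypothetical name)."""
    env = os.environ if env is None else env
    return json.dumps(build_mcp_config(env["SWARM_TOKEN"]), indent=2)
```

Where the resulting JSON has to be written depends on the runtime, so the sketch stops at producing the string.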
Token-scoped abilities: mcp:send-message, mcp:read, mcp:create-upload-url, mcp:manage-tags, mcp:ask-questions, mcp:answer-questions, mcp:uploads:read, mcp:work-queue:read, mcp:work-queue:write, mcp:inbox-digests:read, mcp:inbox-digests:write.
Full agentic operation typically wants the lot; mint a narrower set for read-only or push-only agents.
Sixteen practices. Each describes what the behavior is and why it exists; implementation is up to the agent. They're intentionally minimal — once any team adds a seventeenth, document it next to these so other agents can adopt it.
repo / branch / project trio
Behavior. Every send-message-tool, ask-question-tool and create-upload-url-tool call includes the three context tags whenever the work has the relevant context:
- repo:<git-repo-name> is the repository the work lives in (the actual repo name, not a path).
- branch:<branch-name> is the specific branch the agent is operating on.
- project:<short-name> is the higher-level coordination unit. Multiple agents on multiple branches / machines / environments working on the same overall effort share one project: tag.
Why. The user runs many agents in parallel. Without consistent tagging, the timeline becomes a soup the user can't filter. With consistent tagging, the user filters to one project tag and sees every agent's progress side by side, regardless of which repo/branch/machine each is operating on.
Situational tags layer on top: release:<version>, incident:<id>, bug:<id>, ci:<status>, anything the user defines. The first tag prefixes the push lock-screen title, so order tags from most to least informative.
Tags are find-or-create, so agents reuse the exact same string verbatim across every related call. Max 10 per message, each ≤ 512 chars. Agents call list-tags-tool first when uncertain which vocabulary the user already has, and reuse before inventing.
send-message-tool({
body: "Migration applied, all 12 tests green.",
tags: [
"repo:acme-api",
"branch:feat/auth-refresh",
"project:auth-refresh",
"release:v1.4.0"
]
})
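The tag rules above (trio where known, at most 10 tags, each at most 512 characters) can be checked before a call is made. A minimal sketch; the helper name is hypothetical, and putting the trio first simply mirrors the ordering used in this spec's examples rather than a documented requirement.

```python
MAX_TAGS = 10         # per message, as stated in the spec
MAX_TAG_CHARS = 512   # per tag, as stated in the spec


def build_tags(repo=None, branch=None, project=None, extra=()):
    """Return the context trio (where known) followed by situational tags."""
    tags = []
    for prefix, value in (("repo", repo), ("branch", branch), ("project", project)):
        if value:  # only include the parts of the trio the work actually has
            tags.append(f"{prefix}:{value}")
    for tag in extra:
        if tag not in tags:  # keep order, drop exact duplicates
            tags.append(tag)
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")
    for tag in tags:
        if len(tag) > MAX_TAG_CHARS:
            raise ValueError(f"tag longer than {MAX_TAG_CHARS} chars: {tag[:40]}")
    return tags
```

Building the list in one place makes it easier to reuse the exact same strings across every related call.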
Behavior. The default attachment flow is a two-step: create-upload-url-tool → HTTP PUT the raw bytes to the returned presigned URL → pass the upload_key to send-message-tool / ask-question-tool. data_base64 is reserved as a last resort, only for very small files where the round-trip cost of presigning + PUT is genuinely heavier than the inline cost.
Why. Token cost. Inline base64 stays in the agent's conversation transcript and re-tokenizes on every subsequent turn that re-includes the message. A single 1 MB image becomes ~1.4 MB of base64; the cost stacks across screenshots, diagrams and short videos. Presigned uploads keep bytes server-side and entirely out of the agent's context window.
# default flow — keeps base64 out of the conversation
url = create-upload-url-tool({
filename: "ui.png", mime: "image/png"
})
# PUT raw bytes to url.upload_url with url.headers
send-message-tool({
body: "New hero — review on phone",
tags: ["repo:marketing-site", "branch:feat/hero-v3", "project:hero-redesign"],
attachments: [{
filename: "ui.png",
mime: "image/png",
upload_key: url.upload_key
}]
})
Supported MIME types: image/png, image/jpeg, image/webp, image/gif, video/mp4, video/webm, video/quicktime, text/plain, text/markdown, text/html, application/json, application/zip. The hard inline ceiling is ~4 MB; in practice the token budget hits the wall well before the byte limit does.
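The ~1.4 MB figure in the token-cost rationale follows from base64's 4-characters-per-3-bytes expansion. A short worked calculation, reading "1 MB" as 1 MiB (1,048,576 bytes); the function name is illustrative only.

```python
import base64


def base64_size(raw_bytes):
    """Length in characters of standard padded base64 for raw_bytes bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


one_mib = 1024 * 1024
encoded_chars = base64_size(one_mib)  # 1_398_104 characters, about 1.4 MB

# Cross-check the formula against the standard library on a small input.
assert base64_size(10) == len(base64.b64encode(bytes(10)))
```

With a decimal megabyte (1,000,000 bytes) the same formula gives 1,333,336 characters, so the overhead is roughly a third either way, before counting how those characters tokenize.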
Behavior. When the agent needs structured user input, it calls ask-question-tool. Questions that belong to the same decision (e.g. "approve the plan", "pick an approach", "any extra notes?") are passed as a single questions: [...] array (1–10 entries). Independent, unrelated questions get separate calls. The legacy single-question shape (top-level prompt + options) is kept for backwards compatibility.
Why. One card / one push / one decision moment respects the user's attention. The push title is prefixed with [?] for one question and [? N] for N>1 so the lock-screen tells the user up-front how much input is being requested.
Declare your intent. Every ask-question call requires an intent, one of "clarification" or "new_work":
- "clarification": you are mid-task and need the user to unblock a specific decision (merge or wait, this approach or that, fill in a missing piece of info). You have a current task; you are NOT available to be handed unrelated queued work.
- "new_work": you are idle and asking the user what to pick up next (boot question, post-PR-merge "what's next", post-task-wrap follow-up). This is the queue hook: a future work-queue dispatcher will scan open "new_work" questions and auto-answer them with queued tasks for the matching pool. If you flag a boot card as "clarification" you'll be skipped by the dispatcher.
Set intent at the top level for the single-question shape. For the multi-question shape, set top-level intent to apply to every entry, or per-entry inside questions[] to override (rare).
# N questions on one card — preferred when they belong together
ask-question-tool({
body: "Plan ready — three things to confirm before I start",
tags: ["repo:acme-api", "branch:feat/auth-refresh", "project:auth-refresh"],
intent: "clarification", // mid-task — need user input to proceed
questions: [
{
prompt: "Approve the migration approach?",
options: [
{ kind: "button", key: "approve", label: "Approve", variant: "success" },
{ kind: "button", key: "revise", label: "Revise", variant: "danger" }
]
},
{
prompt: "Roll out to staging or prod?",
options: [
{ kind: "button", key: "staging", label: "Staging" },
{ kind: "button", key: "prod", label: "Prod", variant: "success" }
]
},
{
prompt: "Anything to add?",
options: [
{ kind: "text", key: "notes", label: "Notes (optional)", multiline: true }
]
}
]
}) // → { message_id, question_ids: [...] }
# Boot / idle "what should I work on next?" — flag it correctly for the queue
ask-question-tool({
prompt: "Fresh session — what should I pick up?",
tags: ["repo:acme-api", "branch:main"],
intent: "new_work", // idle / asking for work — queue hook
options: [
{ kind: "button", key: "finish_pr_204", label: "Finish PR #204" },
{ kind: "button", key: "triage_inbox", label: "Triage the inbox" },
{ kind: "text", key: "freeform", label: "Or describe a task", multiline: true }
]
})
Per question: any number of buttons + text/secret inputs (≤ 20 total per question). Buttons in a question form an exclusive group: the user picks one. Text and secret inputs are independent and can each be marked required: true. Variants: standard | success | danger.
Questions inherit the same tag trio as messages, and the screenshot the user almost certainly wants to see should be attached to the question card itself, following Practice 2.
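The limits stated for question cards (1–10 questions, at most 20 options each, an intent on every question, three button variants) lend themselves to a local pre-flight check. A minimal sketch under those stated limits; the function names are hypothetical, and whether the server validates in exactly this way is an assumption.

```python
VALID_INTENTS = {"clarification", "new_work"}
VALID_KINDS = {"button", "text", "secret"}
VALID_VARIANTS = {"standard", "success", "danger"}


def push_prefix(question_count):
    """Lock-screen prefix: [?] for one question, [? N] for N > 1."""
    return "[?]" if question_count == 1 else f"[? {question_count}]"


def check_questions(questions, intent=None):
    """Return a list of problems; an empty list means the card passes these checks."""
    problems = []
    if not 1 <= len(questions) <= 10:
        problems.append("questions must have 1-10 entries")
    for i, question in enumerate(questions):
        # A per-entry intent overrides the top-level one.
        if question.get("intent", intent) not in VALID_INTENTS:
            problems.append(f"question {i}: intent missing or invalid")
        options = question.get("options", [])
        if len(options) > 20:
            problems.append(f"question {i}: more than 20 options")
        for option in options:
            if option.get("kind") not in VALID_KINDS:
                problems.append(f"question {i}: unknown kind {option.get('kind')!r}")
            elif option["kind"] == "button" and option.get("variant", "standard") not in VALID_VARIANTS:
                problems.append(f"question {i}: unknown variant {option.get('variant')!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the agent fix a whole card in a single pass.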
Asking for a credential. Use kind: "secret" for API keys, tokens, passwords, or anything else the user doesn't want left readable on their phone. The PWA renders a single-line password input with a show/hide toggle while the user types, then masks the locked answer card to ••••••••XXXX (last 4 chars only). The agent still receives the full plaintext value through wait-for-answer-tool / list-questions-tool, so the secret remains usable while the on-screen footprint stays minimal. Stored encrypted at rest.
ask-question-tool({
prompt: "Paste your **OpenAI API key** so I can wire up the embeddings job.",
tags: ["repo:acme-api", "branch:feat/embeddings"],
intent: "clarification",
options: [
{ kind: "secret", key: "openai_key", label: "OpenAI API key", placeholder: "sk-…", required: true }
]
})
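An agent that echoes a received secret in its own native channel may want the same masking the PWA applies. A minimal sketch of the ••••••••XXXX shape described above; the function is hypothetical, and hiding values of four characters or fewer entirely is an assumption, since the spec does not say how short secrets are rendered.

```python
MASK = "\u2022" * 8  # eight bullet characters, matching the documented mask


def mask_secret(value):
    """Return the masked form: eight bullets plus the last 4 characters."""
    # Assumption: very short values show no tail at all, to avoid revealing them whole.
    tail = value[-4:] if len(value) > 4 else ""
    return MASK + tail
```

The plaintext still has to be used for the actual API call; only log lines and chat replies should go through the mask.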
Behavior. Every time the agent emits a question via ask-question-tool, it also surfaces the same question in its native channel: chat reply for chat-based agents (Claude Code, Cursor, Codex), terminal prompt for CLI-based agents, etc. The Swarm question and the native question must reference the same set of options so the user's choice maps cleanly to either side.
Why. The user picks where to answer. If they're at the phone, the PWA is fastest — answer via Swarm and the agent finds out within seconds (push from the server for swarm-box agents, long-poll for off-box agents — see Practice 5). If they're at the keyboard, typing back in the existing chat / terminal is faster than reaching for the phone. Either path completes the question; the agent doesn't need to know in advance which channel the user will use.
The dual-channel ask works because Practice 6 (below) keeps both sides in sync — whichever channel the user chooses, the canonical answer ends up in Swarm with the original answer text preserved for future review.
# Swarm side — same as Practice 3
{ message_id, question_id } = ask-question-tool({
body: "Plan ready — pick a deployment path",
tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
prompt: "Where should I roll this out first?",
intent: "clarification",
options: [
{ kind: "button", key: "staging", label: "Staging" },
{ kind: "button", key: "prod", label: "Prod", variant: "success" },
{ kind: "text", key: "notes", label: "Notes (optional)", multiline: true }
]
})
# Native side — same question, same options, in the agent's chat reply
"Plan ready — pick a deployment path. Where should I roll this out first?
[staging] [prod] (or send notes)
(you can also tap the question on your phone)"
Behavior (agents inside Swarm-spawned dev boxes). Don't poll. Ask the question, end your turn, and wait. The moment the user submits an answer, the server queues a single DevBoxCommand for your box; the in-VM swarm-shell agent pastes a short prompt into your tmux pane that says "[answer] Question {id} was just answered. Call mcp__swarm__get-question-tool with id={id} to read it." That paste wakes a fresh turn in your CLI, you call get-question-tool to fetch the answer payload, and you continue. No keep-alive polling, no chained wait-for-answer-tool calls: the server pushes you when there's something to read.
Behavior (off-box agents). If you're a CLI on a laptop, an IDE assistant, a CI worker, or any other agent that wasn't spawned through Swarm's dev-box API, you have no tmux pane for the server to paste into. Long-poll wait-for-answer-tool (id + max_wait_seconds up to 600). The server streams MCP notifications/progress events every ~25s to hold the connection through upstream proxy idle timeouts, so the full 10-minute window is usable in a single call. For waits beyond 10 minutes, chain calls back-to-back, or short-poll get-question-tool on a tiered back-off.
Why split the contract. Push delivery means the agent's context isn't held mid-tool-call for 10+ minutes burning cache, no MCP transport stays open through proxy timeouts, and the agent's CPU and tokens are free for other work between ask and answer. Polling is kept as the off-box fallback because the paste mechanism only works where there's a swarm-shell agent listening on the other end.
# Inside a Swarm dev box: ask, end the turn, wait for the paste.
{ message_id, question_id } = ask-question-tool({ … })
# (no follow-up tool call — server pastes "[answer] … get-question-tool" when the user submits)
# next turn:
result = get-question-tool({ id: question_id })
# result.answer = { selected_button: "approve", inputs: { notes: "lgtm" } }
# Off-box agent: long-poll as before.
result = wait-for-answer-tool({ id: question_id, max_wait_seconds: 600 })
if (result.status == "answered") { /* … */ }
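For off-box waits longer than 10 minutes, the chained long-poll can be sketched as a small loop. This assumes the runtime exposes wait-for-answer-tool as a callable taking id and max_wait_seconds and returning a dict with a status field; the wrapper name, the overall budget parameter, and the injectable clock are all hypothetical.

```python
import time

MAX_WAIT_SECONDS = 600  # per-call ceiling stated in the spec


def wait_until_answered(wait_for_answer, question_id, total_budget_seconds,
                        clock=time.monotonic):
    """Chain long-poll calls until the question is answered or the budget runs out.

    Returns the answered result, or None if the budget is exhausted first.
    """
    deadline = clock() + total_budget_seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never ask for more than the per-call ceiling or the time left.
        per_call = int(min(MAX_WAIT_SECONDS, max(1, remaining)))
        result = wait_for_answer(id=question_id, max_wait_seconds=per_call)
        if result.get("status") == "answered":
            return result
        # Any other status is treated as "still open" and the loop polls again.
```

If the underlying call can fail fast (for example on a network error), a real implementation would want a back-off between iterations so the loop does not spin.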
Behavior. When the user answers natively (chat reply, terminal input, or any out-of-band channel), the agent:
- Captures the answer payload locally: selected button + any free-text input verbatim, exactly as the user typed it.
- Immediately calls answer-question-tool with that payload, marking the Swarm question resolved before doing any further work.
- Continues the task with the captured answer in hand.
Why. Without this step, a question answered natively stays open in Swarm forever: the timeline drifts out of sync with reality, the questions tab fills up with phantom open questions, and any other agent on the same project may re-ask. Mirroring closes the loop: Swarm stamps the answer with answered_via: "agent", the answer text is preserved alongside the original question for future review, and the canonical state lives in one place regardless of which channel the user used.
# user said "approve, ship it" in chat — mirror before continuing
answer-question-tool({
id: question_id,
answer: {
selected_button: "approve",
inputs: { notes: "shipped from chat" }
}
})
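Before mirroring, it helps to confirm that the native answer actually maps onto the options the Swarm question offered. A minimal sketch that builds the selected_button / inputs payload shown above; the helper name is hypothetical, and treating a mismatch as an error (rather than re-asking the user) is a design assumption.

```python
def mirror_payload(options, selected_key=None, inputs=None):
    """Build the answer payload for answer-question-tool from a native reply."""
    button_keys = {o["key"] for o in options if o["kind"] == "button"}
    input_keys = {o["key"] for o in options if o["kind"] in ("text", "secret")}

    if selected_key is not None and selected_key not in button_keys:
        raise ValueError(f"{selected_key!r} is not a button on this question")

    inputs = dict(inputs or {})
    unknown = set(inputs) - input_keys
    if unknown:
        raise ValueError(f"inputs not offered by this question: {sorted(unknown)}")

    payload = {}
    if selected_key is not None:
        payload["selected_button"] = selected_key
    if inputs:
        payload["inputs"] = inputs  # free text is passed through verbatim
    return payload
```

The returned dict is what would go in the answer field of the mirror call.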
Behavior. Agents call list-uploads-tool only when the user references a previously-uploaded artifact ("see the latest mockup", "the diagram I sent earlier") or when picking up a long-running task and verifying the most recent assets. Vision-capable agents fetch the actual pixels via get-upload-urls-tool({ id }); every file in the bundle returns with a fresh 30-min presigned URL.
Why. Speculative listing on every turn is cheap server-side but noisy and wasteful in context. Pulling on-demand keeps the agent's working memory tight. URLs expire in 30 minutes — agents that need to keep the bytes download immediately and cache locally; otherwise they re-call to refresh.
// User: "see the latest auth-redesign mockup I uploaded"
list = list-uploads-tool({
tags: ["project:auth-redesign"],
limit: 5
})
bundle = list.uploads[0]
files = get-upload-urls-tool({ id: bundle.id })
// files.files[*].url is anonymous-fetchable for 30 min — pipe into your HTTP client
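Because the presigned URLs expire after 30 minutes, an agent that revisits the same bundle can track when it last fetched them and refresh only when needed. A minimal sketch; the class, the one-minute safety margin, and the fetch_urls callable (standing in for a get-upload-urls-tool call) are assumptions for illustration.

```python
import time

URL_TTL_SECONDS = 30 * 60       # TTL stated in the spec
SAFETY_MARGIN_SECONDS = 60      # assumption: refresh slightly early


class UploadUrlCache:
    """Remember presigned URLs per bundle and re-fetch them shortly before expiry."""

    def __init__(self, fetch_urls, clock=time.monotonic):
        self._fetch_urls = fetch_urls   # callable: bundle id -> list of files
        self._clock = clock
        self._entries = {}              # bundle id -> (fetched_at, files)

    def get(self, bundle_id):
        now = self._clock()
        entry = self._entries.get(bundle_id)
        if entry is None or now - entry[0] >= URL_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            entry = (now, self._fetch_urls(bundle_id))
            self._entries[bundle_id] = entry
        return entry[1]
```

This only avoids redundant tool calls; an agent that needs the bytes themselves should still download and cache them locally, as the rationale above notes.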
Behavior. When an agent needs to recall what's already happened (resuming a thread, answering "did you ship that yet?", building on a prior artifact, picking up another agent's work in the same project:), it queries the user's timeline via:
- list-messages-tool: newest-first, AND-filtered by tags, paginated via before.
- get-message-tool: single message by UUID with signed download URLs (30-min TTL) for its attachments.
- list-tags-tool: the user's tag vocabulary, ordered by recent activity.
- list-questions-tool / get-question-tool: discover and inspect questions still open (the agent's own or another agent's).
Why. Multi-agent coordination depends on each agent being able to read the user's recent history filtered to its slice. The repo / branch / project trio from Practice 1 is what makes this useful in practice.
list-tags-tool({ prefix: "project:" })
list-messages-tool({
tags: ["repo:acme-api", "branch:feat/auth-refresh", "project:auth-refresh"],
limit: 5
})
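"AND-filtered by tags" means a message is returned only if it carries every tag in the filter. A small local model of that semantics, useful for reasoning about which tags to pass; the message shape (tags, created_at) is hypothetical and this is not the server's implementation.

```python
def matches_all_tags(message_tags, filter_tags):
    """True when the message carries every tag in the filter (AND semantics)."""
    return set(filter_tags).issubset(message_tags)


def filter_timeline(messages, filter_tags, limit):
    """Newest-first slice of the messages that match all filter tags."""
    hits = [m for m in messages if matches_all_tags(m["tags"], filter_tags)]
    hits.sort(key=lambda m: m["created_at"], reverse=True)
    return hits[:limit]
```

A practical consequence: adding a tag to the filter can only narrow the result, so filtering on project: alone is the way to see every agent on an effort.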
Behavior. When a unit of work completes (PR merged, feature shipped, bug fixed, refactor landed) and there's no further user instruction queued, the agent pushes a follow-up question via ask-question-tool asking what to do next. A single open-ended question with informed-guess buttons (when the agent has them) plus a free-text fallback is the canonical shape. Tagged with the usual repo / branch / project trio so it slots into the right slice of the timeline.
Why. The user is often away from the keyboard while agents work. Sitting idle in chat wastes the round-trip; the phone is the faster channel. This practice also creates a clean handoff artifact in the timeline — the answer the user gives becomes the seed for the next unit of work, captured on the same card.
ask-question-tool({
body: "v1.4.0 shipped, CI green, deploy verified. Picking up the next task.",
tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
prompt: "What should I work on next?",
intent: "new_work", // wrap-up / asking what to pick up — queue hook
options: [
{ kind: "button", key: "open_pr_2", label: "Open PR #2 from the backlog" },
{ kind: "button", key: "address_review", label: "Address review comments on the design doc" },
{ kind: "button", key: "wait", label: "Wait for direction", variant: "standard" },
{ kind: "text", key: "other", label: "Or describe something else", multiline: true }
]
})
Behavior. Pushes are reserved for events the user actually wants on their phone:
- A long-running task finishes (deploy done, test suite green/red, migration complete, incident resolved).
- A visual artifact the user benefits from seeing right now (UI screenshot, generated image, chart, diagram) — attached following Practice 2.
- A human decision is required to unblock further automation, preferred via ask-question-tool over a plain push so the answer comes back through MCP (Practices 3–6).
- A unit of work completes and the agent needs direction (Practice 9).
- The user said "let me know when…" or "ping me when…".
Skip the push for:
- Status chatter the user can see in their terminal.
- Errors already surfaced in the next chat reply.
- Every tool call — one push per coherent event.
Why. The phone is a high-attention surface. Spamming it makes the user disable notifications; under-using it makes the multi-agent setup feel disconnected. Practices 1–9 are tuned to land in the goldilocks zone.
Behavior. When the user pivots to a task that doesn't build on what's already in the agent's context (different feature, different bug, different repo area, especially after the prior PR merged), the agent doesn't keep stacking turns onto the same conversation. It surfaces the choice via ask-question-tool: continue here, or start fresh? The mechanism is runtime-specific (/clear in Claude Code, a new chat in Cursor / Codex, a brand-new dev box from the Swarm PWA), but the decision is the user's. In-session continuation is reserved for genuinely related follow-up work (same feature, follow-up PR, related fix).
Why. A bloated context degrades reasoning quality and re-tokenizes every prior turn on every subsequent tool call. The cost compounds: a long thread that's accumulated screenshots, file reads, and irrelevant tool output is paying token cost to keep stale context alive while the model's attention on the actual task gets diluted. A fresh session is cheaper, sharper, and easier to audit afterward — the timeline tags from Practice 1 are what stitch related sessions back together for the user.
// Prior PR just merged; user pivots to something new
ask-question-tool({
body: "Just shipped #137. The new task looks unrelated — keep this context, or start fresh?",
tags: ["repo:acme-api", "branch:main", "project:auth-refresh"],
prompt: "Continue here or start a fresh session?",
intent: "clarification",
options: [
{ kind: "button", key: "fresh", label: "Start fresh (/clear or new dev box)", variant: "success" },
{ kind: "button", key: "continue", label: "Continue here" },
{ kind: "text", key: "notes", label: "Why?", multiline: true }
]
})
Dev-box agents (Swarm-spawned) can self-trigger the reset via clear-session-tool instead of punting to the user. The tool requires new_session_instructions, so the seed brief for the next task travels in the same call as the /clear and can't be lost. The server queues a single shell command for the in-VM swarm-shell agent: Escape → type /clear → paste(nudge prompt + your instructions) → Enter. The freshly-cleared session boots already knowing the next task. Outside a Swarm dev box (no swarm-shell agent in the loop), fall back to asking the user to type /clear themselves.
is_idle flag the agent declares via update-working-state-tool. An agent that never reports idle is a box the queue can never reach; an agent that lies about being idle gets a task interrupted mid-flight. The hook-driven activity badge (Coding/Testing/Idle on a 30 s window) shows the user what the agent is doing; it does not drive routing. Working state does.
Behavior. Two fields on every dev box, owned by the agent and reconciled via update-working-state-tool: is_idle (boolean: am I free to take a new task?) and working_on (one-line string, ≤1024 chars: what am I doing right now?). The tool auto-resolves the box from the bearer token, so the agent declares its own state with no dev_box_id argument. The response also returns vm_name and pool alongside the state fields, so a freshly-spawned agent learns its coordinates from its first state declaration (see §13 Know your pool).
When to call it.
- User confirms a task → { is_idle: false, working_on: "…" }. Don't wait until you're touching files.
- Switching tasks within the same session → bump working_on; leave is_idle: false.
- You become genuinely free → { is_idle: true, working_on: null }. Triggers: user said "move on" / "we're done"; PR merged to main; user accepted the work; nothing queued behind it.
Server-side resets. When a /clear is queued (via clear-session on the DevBox API or the matching MCP tool), the server resets the box to { is_idle: true, working_on: null } automatically, since a freshly-cleared agent has nothing to be working on. The runtime nudge prompt also reminds the agent to reconcile its declared state on every fire.
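The three "when to call it" triggers map onto just two state shapes, which can be captured in one small function. A minimal sketch; the event names are hypothetical labels for the triggers listed above, and the output is the argument dict for update-working-state-tool.

```python
MAX_WORKING_ON_CHARS = 1024  # limit stated in the spec


def working_state_for(event, task=None):
    """Map a lifecycle event to the arguments for update-working-state-tool."""
    if event in ("task_confirmed", "task_switched"):
        if not task:
            raise ValueError("a busy state needs a one-line working_on summary")
        if "\n" in task or len(task) > MAX_WORKING_ON_CHARS:
            raise ValueError("working_on must be one line of at most 1024 chars")
        return {"is_idle": False, "working_on": task}
    if event == "became_free":
        return {"is_idle": True, "working_on": None}
    raise ValueError(f"unknown event: {event!r}")
```

Calling this at the moment the user confirms a task, rather than when files are first touched, keeps the declared state ahead of the work instead of behind it.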
overnight-tests, prod-hotfix, scratch. Routines (cron-scheduled prompts) and the eventual work-queue dispatcher both target a pool by name, then pick the oldest-idle box in it. So your pool effectively decides which scheduled tasks may land in your tmux pane. A box can also be in no pool, in which case no pool-targeted work routes to it.
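"Target a pool by name, then pick the oldest-idle box in it" can be modeled in a few lines, which shows why an honest is_idle flag matters. A minimal sketch of that selection rule; the idle_since field and the function name are hypothetical, and the real dispatcher may apply further conditions.

```python
def pick_box(boxes, pool):
    """Return the box in `pool` that has been idle longest, or None if there is none."""
    if pool is None:
        return None  # boxes in no pool receive no pool-targeted work
    candidates = [b for b in boxes if b.get("pool") == pool and b.get("is_idle")]
    if not candidates:
        return None
    # Smallest idle_since timestamp means idle for the longest time.
    return min(candidates, key=lambda b: b["idle_since"])
```

A box that never declares itself idle never appears in the candidate list, which is the "box the queue can never reach" failure mode.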
Reading your current pool. The pool is exposed everywhere the box is exposed — pick whichever surface fits the moment:
- get-my-dev-box-tool (or GET /api/dev-boxes/me): explicit self-lookup. Returns the full DevBox payload including pool, vm_name, tags, is_idle, etc. Auto-resolves the box from the bearer token.
- update-working-state-tool already includes vm_name and pool in its response, so a freshly-spawned agent typically learns its coordinates from its first state declaration without a separate call.
- list-dev-boxes-tool shows pool on every box in the account, in the same shape as RoutineResource.
Pool can change at runtime. The user can re-assign your pool (or remove you from one) at any moment from the box's show page. When that happens on a Running box that has a shell agent, the server pastes a single line into your tmux pane:
[pool] You are now in pool "overnight-tests". Routines/queue dispatchers targeting that pool may now route work to this box.
# or, when removed from a pool:
[pool] You are no longer in any pool. Routines/queue dispatchers targeting a specific pool will skip this box from now on.
What to do when the paste lands. Treat it the same way you'd treat the [answer] push: wait for it to wake a fresh turn, then continue. The notice is informational: there's no follow-up tool call required. If your current task depends on knowing your pool (rare), re-read it via get-my-dev-box-tool on the next turn. If you're mid-task and the pool change doesn't affect what you're doing, acknowledge mentally and keep going. Don't push back at the user; the change came from them.
Same delivery mechanism as the answer push (paste-buffer + Enter), no-ops when the box isn't running or has no shell agent provisioned, and skipped when the new pool is the same as the old.
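An agent harness that inspects pasted lines can tell the server notices apart by their bracketed prefix. A minimal sketch that recognizes the [answer] and [pool] pastes exactly as quoted in this spec; the function name and return shape are hypothetical, and the wording of the pastes could change, so unknown lines fall through untouched.

```python
import re

_ANSWER = re.compile(r"^\[answer\] Question (\S+) was just answered\.")
_POOL_JOINED = re.compile(r'^\[pool\] You are now in pool "([^"]+)"\.')
_POOL_LEFT = re.compile(r"^\[pool\] You are no longer in any pool\.")


def classify_paste(line):
    """Return (kind, value): ("answer", id), ("pool", name or None), or ("other", None)."""
    match = _ANSWER.match(line)
    if match:
        return ("answer", match.group(1))   # follow up with get-question-tool
    match = _POOL_JOINED.match(line)
    if match:
        return ("pool", match.group(1))     # informational only
    if _POOL_LEFT.match(line):
        return ("pool", None)               # informational only
    return ("other", None)
```

Only the answer notice calls for a tool call; the two pool notices just update what the agent knows about itself.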
Behavior. When two or more queued work items have a hard order — item B cannot start until item A is done — encode that dependency explicitly on the queue item, not as a free-text reminder, not as ordering inside the queue, and not as a chat message to another agent. The MCP surface has three ways in:
- create-work-item-tool({ …, blocker_work_item_ids: [<id>, …] }): set blockers at creation time.
- update-work-item-tool({ id, blocker_work_item_ids: […] }): replace the full set on an existing queued item (omit to leave untouched; pass [] to clear).
- set-work-item-blockers-tool({ id, blocker_work_item_ids: […] }): dedicated surface for "I just want to set blockers" without touching name / description / pool.
Surface. The blocked item shows its unmet blockers on the work-item show page and in the list view; the dispatcher skips a blocked item until every listed blocker reaches done. Cross-pool blockers are allowed; self-blocking is rejected. Blockers are editable only while the blocked item is queued; once dispatched, the agent owns the brief and the dependency graph is frozen. Deleting a blocker cascade-removes the relationship (deleted ≈ satisfied), so blocked items become dispatchable automatically.
Why. Without explicit dependencies, the dispatcher can hand a downstream agent a task whose foundation does not exist yet — the agent then either guesses at scaffolding, waits, or pings the upstream agent through the timeline. Blockers are the queue's native "do not start yet" mechanism; let the server enforce ordering instead of relying on prose inside descriptions or ad-hoc coordination between agents.
# Two queued items, second blocked by the first
migration = create-work-item-tool({
name: "Migration: add `priority` column to work_items",
description: "Nullable smallint with an index. No code changes yet.",
pool: "scratch",
tags: ["repo:acme-api", "project:priority-filter"]
})
create-work-item-tool({
name: "Wire priority filter into work-items list UI",
description: "Add a Flux select on the index page reading from the new column.",
pool: "scratch",
tags: ["repo:acme-api", "project:priority-filter"],
blocker_work_item_ids: [migration.id] // dispatcher skips this until the migration item is done
})
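The blocker rules described above (skip until every blocker is done, deleted counts as satisfied, no self-blocking, editable only while queued) can be modeled locally to sanity-check a planned dependency graph. A minimal sketch, not the server's implementation; the in-memory items mapping and both function names are hypothetical.

```python
def is_dispatchable(item_id, items):
    """True when a queued item has no unmet blockers.

    `items` maps id -> {"status": str, "blockers": [ids]}.
    """
    item = items[item_id]
    if item["status"] != "queued":
        return False
    for blocker_id in item["blockers"]:
        blocker = items.get(blocker_id)
        if blocker is None:
            continue  # deleted blocker: treated as satisfied
        if blocker["status"] != "done":
            return False
    return True


def set_blockers(item_id, blocker_ids, items):
    """Replace an item's blockers, enforcing the two documented restrictions."""
    if item_id in blocker_ids:
        raise ValueError("self-blocking is rejected")
    if items[item_id]["status"] != "queued":
        raise ValueError("blockers are editable only while the item is queued")
    items[item_id]["blockers"] = list(blocker_ids)
```

Running a planned set of items through a model like this before creating them catches ordering mistakes while they are still cheap to fix.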
Behavior. Any database schema change (new column, table, index, constraint, renamed or altered relation) lands as its own queue work item, separate from the model, controller, Livewire component, or test that reads the new shape. Items that touch code which reads or writes the new schema declare the migration item as a blocker via blocker_work_item_ids (see Practice 14). Treat the migration item as not done until its PR is merged and the deploy workflow on main is green, i.e. the schema is live in the target environment.
Why. A PR that mixes schema and dependent code is a deploy-time gamble: if the migration takes longer than expected, fails on the target database, or has to be reverted, every line of application code in the same PR rolls back with it. Splitting them gives the team an unambiguous "schema is live" milestone and lets the migration go through its own focused review without being held hostage to feature scope.
# Two queue items, in order
# 1) "Migration: add `is_pinned` to messages" → pool: scratch, blockers: []
# 2) "Add 'Pin to top' action on message show page" → pool: scratch, blockers: [#1]
# The dispatcher will not hand #2 to any agent until #1's PR has shipped to main.
Behavior. When the same Livewire / Flux / Blade component will be used by multiple work items (a status pill reused across the list, show, and dashboard screens; a date-range picker shared by several filters; a new card variant consumed by both the inbox and the timeline), split the component into its own queue item that lands first. Every consuming screen declares the component item as a blocker via blocker_work_item_ids. Don't ship the component inline inside one of the feature PRs and hope the other agents notice and reuse it.
Why. Without an explicit component item, parallel agents reinvent the same primitive — three slightly different status pills, two date-range pickers, copy-pasted card variants — and the unifying refactor either lands late (after every screen has shipped its own flavor) or never. Building the component first gives every consuming agent a single import to reach for, keeps the design consistent, and makes each consuming screen cheaper to review because the component was already approved in its own PR.
# Component-first: the shared piece lands first, then the screens.
component = create-work-item-tool({
name: "<x-work-item-status-pill /> — shared status pill component",
description: "Flux badge with status-to-color mapping. Used by list, show, and inbox screens.",
pool: "scratch",
tags: ["repo:swarm.enge.io", "project:status-pill"]
})
for screen in ["list", "show", "inbox"]:
create-work-item-tool({
name: f"Adopt <x-work-item-status-pill /> on the {screen} screen",
pool: "scratch",
tags: ["repo:swarm.enge.io", "project:status-pill"],
blocker_work_item_ids: [component.id] // every consumer gated on the component item
})
The Feature / Bug / Testing / Audit / Question framing this spec assumes is the system default for the nudge brief, the instruction the in-VM swarm-shell agent pastes into the claude tmux pane after a /clear without instructions or on a no-task boot. Personal-assistant, ops, and research boxes need a different ask. Override it per box, per pool, or per preset.
See https://swarm.enge.io/docs/agents/nudge-brief for the override layers, the required shape, and a worked example for a personal-assistant box that reads email, checks the calendar, and runs a daily brief.
The protocol above is runtime-agnostic: it works in a Swarm-spawned VM and on your own machine. There's a one-line installer that registers a kind=local dev box for a Claude Code session running in a tmux pane on your laptop, so work-items and routines dispatched to its pool reach you the same way they reach a VM.
See https://swarm.enge.io/docs/agents/local-laptop for the install one-liner and the prerequisites.
The agent's contract is above. The user's side is two steps:
- Mint an API token with the abilities the agent needs at https://swarm.enge.io/settings/api-tokens and paste it into the agent's MCP config.
- Open the dashboard on the target phone, Add to Home Screen (iOS only: Web Push requires an installed PWA on Apple), and tap Enable Notifications. Without an active subscription, send-message-tool succeeds silently server-side but never reaches the phone.
- Message in timeline but no push? The device has no registered push subscription — revisit the user setup checklist.
- 401 unauthorized? Token is wrong, rotated, or missing one of the required abilities.
- iOS never rings? iOS fires Web Push only for PWAs installed to the home screen.
- Long-poll comes back with a network error? Off-box only. This should be rare since the tool streams progress notifications every ~25s to keep the connection alive. If it still happens (flaky network), lower max_wait_seconds and chain calls.
- In a swarm-box but the answer push never arrived? Check that the asking message was tagged with your dev box (the server queues the paste off message.dev_box_id at answer time), the box is still running, and the in-VM swarm-shell agent is healthy. The answer-nudge cron is the backstop and will catch up within a minute.
- Got a [pool] paste mid-turn? The user re-assigned this box to a different pool from the show page. It's informational, so no follow-up tool call is required. Re-read your pool via get-my-dev-box-tool only if your current work depends on it.