Build Loop Tests#
These tests must exist before agents claim the Build Test Fix loop is done.
Agent Logging Test#
Prove agents really log their work.
- Simulate or run an agent calling `covibe_task` with `operation: "plan"`.
- Verify Workstreams displays the planned task.
- Simulate or run an agent calling `covibe_task` with `operation: "check"`.
- Simulate or run the same agent calling `covibe_task` with `operation: "start"` or `operation: "start_planned"`.
- Verify the task exists in structured storage.
- Verify a `task.started` work event exists.
- Verify a usage event exists for all MCP calls.
- Verify Workstreams displays the task.
- Verify Activity displays the event.
- Complete the task and verify Activity displays the result summary.
- Cancel a planned task and verify it leaves Workstreams while Activity shows the reason.
- Start an overlapping planned task with confirmation and verify Activity shows `warning.confirmed`.
- Log a blocker and decision and verify Coordination displays both.
- Verify `covibe_team` `operation: "state"` returns the planned work, blocker, and decision context.
- Verify `covibe_team` `operation: "state"` returns recent completions with result summaries.
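The storage and event checks above can be sketched against an in-memory stand-in. This is a minimal illustration only: `FakeTaskLog`, `runLoggingChecks`, and the field names are hypothetical, not the real Co-Vibe MCP server or its storage schema. Only the tool name `covibe_task`, the operations, and the `task.started` event type come from the checklist.

```typescript
// Hypothetical in-memory stand-in, used only to illustrate the assertions.
// It is not the real MCP server or its storage layer.
type TaskOperation = "plan" | "check" | "start";

interface LoggedWorkEvent { type: string; title: string }
interface LoggedUsageEvent { tool: string; operation: TaskOperation }

class FakeTaskLog {
  workEvents: LoggedWorkEvent[] = [];
  usageEvents: LoggedUsageEvent[] = [];
  taskStatus: Record<string, "planned" | "active"> = {};

  // Every call records a usage event, whatever the operation is.
  covibeTask(operation: TaskOperation, title: string): void {
    this.usageEvents.push({ tool: "covibe_task", operation });
    if (operation === "plan") {
      this.taskStatus[title] = "planned";
    } else if (operation === "start") {
      this.taskStatus[title] = "active";
      this.workEvents.push({ type: "task.started", title });
    }
    // "check" only reads state in this sketch, so it adds no work event.
  }
}

// Returns a list of failed checks; an empty list means the test passes.
function runLoggingChecks(log: FakeTaskLog, title: string, expectedCalls: number): string[] {
  const failures: string[] = [];
  if (log.taskStatus[title] !== "active") {
    failures.push("task missing from structured storage");
  }
  if (!log.workEvents.some((e) => e.type === "task.started" && e.title === title)) {
    failures.push("no task.started work event");
  }
  if (log.usageEvents.length !== expectedCalls) {
    failures.push("expected one usage event per MCP call");
  }
  return failures;
}
```

A real test would replace the fake with calls to the running server and read the events back from storage and the Activity view.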
Session Heartbeat Test#
Prove long-running agents do not look active forever after going quiet.
- Start an agent session.
- Call `covibe_session` with `operation: "heartbeat"`.
- Verify `last_seen_at` updates.
- Age a session beyond the stale threshold.
- Verify the dashboard/session view reports `stale`.
- Verify `covibe_team` `operation: "state"` returns `stale_sessions`.
- Verify stale sessions create warning feedback for agents.
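The heartbeat and staleness rule can be sketched as a pure function over timestamps. This is an assumption-laden illustration: the 15-minute threshold, the `HeartbeatSession` shape, and the function names are hypothetical, and the real threshold is whatever the product configures.

```typescript
// Hypothetical session shape; timestamps are epoch milliseconds.
interface HeartbeatSession { id: string; lastSeenAt: number }

// Assumed value for illustration only; the real stale threshold may differ.
const ASSUMED_STALE_THRESHOLD_MS = 15 * 60 * 1000;

// A heartbeat refreshes last-seen time without mutating the input.
function applyHeartbeat(session: HeartbeatSession, now: number): HeartbeatSession {
  return { ...session, lastSeenAt: now };
}

// Sessions whose last heartbeat is older than the threshold are stale.
function findStaleSessions(
  sessions: HeartbeatSession[],
  now: number,
  thresholdMs: number = ASSUMED_STALE_THRESHOLD_MS
): HeartbeatSession[] {
  return sessions.filter((s) => now - s.lastSeenAt > thresholdMs);
}
```

Passing `now` explicitly lets a test "age" a session by advancing the clock value instead of waiting in real time.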
Session Ownership Test#
Prove one developer token cannot close or revive another developer's run.
- Start an agent session with Developer A.
- Call `covibe_session` `operation: "heartbeat"` with Developer B's token and A's `session_id`.
- Verify the response is an error.
- Call `covibe_session` `operation: "end"` with Developer B's token and A's `session_id`.
- Verify the response is an error.
- Verify A's session remains active.
- Verify no `session.ended` work event was created by B.
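The ownership rule above amounts to comparing the caller with the session owner before any state change. The sketch below assumes a hypothetical `OwnedSession` record and an `endOwnedSession` helper with made-up error codes; it is not the server's actual authorization code.

```typescript
// Hypothetical shapes for illustration; not the real session model.
interface OwnedSession { id: string; ownerId: string; active: boolean }
type OwnershipResult = { ok: true } | { ok: false; error: string };

// Ends a session only when the caller owns it. On rejection nothing is
// mutated and no work event is recorded.
function endOwnedSession(
  sessions: Record<string, OwnedSession>,
  callerId: string,
  sessionId: string,
  workEvents: string[]
): OwnershipResult {
  const session = sessions[sessionId];
  if (!session) {
    return { ok: false, error: "not_found" }; // assumed error code
  }
  if (session.ownerId !== callerId) {
    return { ok: false, error: "forbidden" }; // assumed error code
  }
  session.active = false;
  workEvents.push("session.ended");
  return { ok: true };
}
```

The same owner check would guard the heartbeat operation, so that another developer's token cannot revive a run either.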
Parallel Work Audit Test#
Prove duplicate work can be caught after work has already started.
- Developer A starts a long-running workstream.
- Developer B starts an overlapping task with a confirmation reason.
- Call `covibe_team` with `operation: "audit_parallel_work"`.
- Verify the response returns a structured warning.
- Verify the conflict includes both pieces of work.
- Verify recently completed matching work is also reported.
- Verify a feedback event and `parallel_work.audit` work event were saved.
- Verify the dashboard `Audit parallel work` button shows the same conflict.
- Verify the dashboard audit does not mint an MCP token.
- Refresh and verify the latest audit still appears.
- Generate a weekly summary and verify `Parallel audits` includes the conflict.
- Verify `covibe_team` `operation: "state"` returns `latest_parallel_audits` and warns the agent for active conflicts.
- Complete both sides and verify `covibe_team` `operation: "state"` no longer warns on that historical audit.
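To make the audit behavior concrete, here is a toy audit that pairs tasks from different owners by word overlap. The Jaccard word-overlap score is a stand-in chosen for illustration, not Co-Vibe's matching method, and `AuditTask`, `auditParallelWork`, and the 0.25 threshold are all hypothetical.

```typescript
// Hypothetical task shape for illustration.
interface AuditTask { id: string; owner: string; title: string; status: "active" | "completed" }
interface AuditConflict { first: string; second: string; score: number }

// Lowercased words longer than two characters, de-duplicated.
function titleWords(title: string): string[] {
  const words = title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: shared words divided by all distinct words.
function wordOverlap(a: string[], b: string[]): number {
  const shared = a.filter((w) => b.indexOf(w) !== -1).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Reports cross-owner pairs where at least one side is still active, so a
// completed match is reported against active work, while a pair that is
// completed on both sides is treated as historical and skipped.
function auditParallelWork(tasks: AuditTask[], threshold: number = 0.25): AuditConflict[] {
  const conflicts: AuditConflict[] = [];
  for (let i = 0; i < tasks.length; i++) {
    for (let j = i + 1; j < tasks.length; j++) {
      const a = tasks[i];
      const b = tasks[j];
      if (a.owner === b.owner) continue;
      if (a.status === "completed" && b.status === "completed") continue;
      const score = wordOverlap(titleWords(a.title), titleWords(b.title));
      if (score >= threshold) {
        conflicts.push({ first: a.id, second: b.id, score });
      }
    }
  }
  return conflicts;
}
```

Each conflict names both pieces of work, which is what the "conflict includes both pieces of work" check looks for.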
Token Inventory Test#
Prove developers can manage agent tokens after refresh.
- Create an MCP token from the UI.
- Verify the one-time modal shows the raw token and the Settings → Agent connection stdio MCP config embeds no raw token (the stdio bridge resolves it from `~/.covibe/credentials.json` via `COVIBE_AGENT` + `COVIBE_BASE_URL`).
- Create a second token and verify default labels are distinguishable.
- Refresh the page.
- Verify the token metadata still appears.
- Revoke the active token from Settings → Agent connection.
- Verify the UI reports revocation.
- Verify the revoked raw token can no longer call `/api/mcp`.
- Verify Activity shows token creation and revocation events without raw token values.
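The two "no raw token" checks above can share one helper that searches a serialized payload for the token string. The config shape, the activity event names, and the placeholder values below are hypothetical and only illustrate the check; the environment variable names and the `npm exec -- covibe-mcp` command are the ones this document mentions.

```typescript
// True when the raw token string appears anywhere in the serialized payload.
function leaksRawToken(payload: object, rawToken: string): boolean {
  return JSON.stringify(payload).indexOf(rawToken) !== -1;
}

// Hypothetical stdio MCP config: it names the agent and base URL only, so the
// bridge has to resolve the credential elsewhere. All values are placeholders.
const exampleStdioConfig = {
  command: "npm",
  args: ["exec", "--", "covibe-mcp"],
  env: { COVIBE_AGENT: "example-agent", COVIBE_BASE_URL: "http://localhost:3000" },
};

// Hypothetical activity events carrying token metadata but no secret.
const exampleTokenActivity = [
  { type: "token.created", label: "example-agent token 1" },
  { type: "token.revoked", label: "example-agent token 1" },
];
```

In a browser or e2e test, the raw token captured from the one-time modal would be passed as `rawToken` for both the rendered config and the Activity payload.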
Customer Readiness Tests#
The local companion setup and hosted deployment canary checklists live in
customer-readiness-tests.md. They cover the
hosted `/downloads/co-vibe.tgz` install path,
`npm exec -- covibe-local setup --base-url <origin>`, manual
`snapshot --base-url <origin>`, `watch --base-url <origin> --once`, and
`npm exec -- covibe-mcp`.
Accuracy Gate#
Run `npm run check:accuracy` before customer handoff. It executes the committed
overlap scenarios, semantic calibration, and Performance quality read-model
tests so the duplicate-work and designed Performance functionality stay above
the 90% product-quality bar. The overlap scenario suite explicitly requires
100% recall for real duplicates and at least 90% precision over committed
stage-1 scenarios. `npm run readiness` includes this gate.
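The recall and precision thresholds can be read as a simple pass/fail rule over labeled scenario results. This sketch assumes a hypothetical `OverlapScenarioResult` record and is not the implementation behind the accuracy check; it only shows how the two numbers combine into a gate.

```typescript
// Hypothetical result record: whether the scenario is a real duplicate, and
// whether the overlap detector flagged it.
interface OverlapScenarioResult { isRealDuplicate: boolean; flagged: boolean }

interface GateOutcome { precision: number; recall: number; pass: boolean }

function evaluateOverlapGate(results: OverlapScenarioResult[]): GateOutcome {
  const truePositives = results.filter((r) => r.isRealDuplicate && r.flagged).length;
  const falsePositives = results.filter((r) => !r.isRealDuplicate && r.flagged).length;
  const falseNegatives = results.filter((r) => r.isRealDuplicate && !r.flagged).length;

  // With nothing flagged (or no real duplicates) the ratio is undefined;
  // this sketch treats that case as a perfect score.
  const flaggedTotal = truePositives + falsePositives;
  const duplicateTotal = truePositives + falseNegatives;
  const precision = flaggedTotal === 0 ? 1 : truePositives / flaggedTotal;
  const recall = duplicateTotal === 0 ? 1 : truePositives / duplicateTotal;

  // Gate: every real duplicate is caught, and at least 90% of flags are real.
  return { precision, recall, pass: falseNegatives === 0 && precision >= 0.9 };
}
```

Under this rule a single missed duplicate fails the gate regardless of precision, while up to one false alarm in ten flags is tolerated.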
Local Identity Test#
Prove local dogfood mode does not hardcode one developer.
- Open the UI without selecting a developer.
- Create an MCP token and verify it is created for `hakan`.
- Select `dev2` through the local-dev `/api/dev-login` test path (the sign-in form only creates new tenants through `/onboarding`).
- Create an MCP token.
- Verify the token is created for `dev2`.
- Use that token to start a task through MCP.
- Verify Workstreams shows the task under `@dev2`.
- Complete the task through MCP so the e2e run does not leave active test work behind.
Block And Warn Test#
Prove warnings and blocks are returned to the agent.
- Create an existing active task.
- Have another agent check similar work.
- Verify the MCP response returns a structured warning.
- Have the agent try to start without confirmation.
- Verify the MCP response returns a structured block.
- Verify the block tells the agent what to do next.
- Verify the warning and block are saved.
- Verify `covibe_team` `operation: "feedback"` returns the saved warning or block.
- Verify the UI shows the warning.
Example blocked response:
{
  "status": "blocked",
  "reason": "Possible duplicate work found.",
  "required_action": "Ask the developer for confirmation and a reason before starting.",
  "warning_id": "warn_123",
  "matches": [
    {
      "type": "task",
      "title": "Research Claude/Codex orchestration for overnight builds",
      "owner": "hakan"
    }
  ]
}
Duplicate Work Scenario#
The loop must include this test scenario.
- Developer A starts: "Research Claude/Codex orchestration for overnight builds."
- Developer B checks: "Compare Gas Town and Mission Control for overnight AI builds."
- Co-Vibe warns that the work may overlap.
- Developer B tries to start without confirmation.
- Co-Vibe rejects it.
- Developer B starts again with a confirmation reason.
- Co-Vibe accepts it and logs the override.
- The UI shows the warning and confirmation reason.
- Activity shows the warning, block, confirmation, and accepted start.
- Usage events show the tool calls that created the flow.
- Metrics show tool calls, warnings, and confirmed overlaps from the same records.
- Settings → Agent connection does not retain `demo-scenario` tokens.
- Browser and stdio e2e tests revoke tokens they create.
- Dashboard and weekly summary reads stay bounded after repeated dogfood runs.
- Weekly summary structured data includes `plannedTasks`.
This is the most important test.
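The block-then-confirm flow in this scenario can be sketched as a single decision function. The `blocked` variant mirrors the fields of the example blocked response; the `started` variant, the `block` activity label, and the function name are assumptions made for illustration, not the server's real response contract.

```typescript
// Match shape taken from the example blocked response.
interface OverlapMatch { type: "task"; title: string; owner: string }

// "blocked" follows the documented example; "started" is a hypothetical shape.
type StartOutcome =
  | { status: "blocked"; reason: string; required_action: string; warning_id: string; matches: OverlapMatch[] }
  | { status: "started"; confirmed_overlap: boolean };

// Blocks an overlapping start unless a confirmation reason is supplied, and
// appends what happened to an activity list so a test can assert on it.
function startWithOverlapCheck(
  matches: OverlapMatch[],
  confirmationReason: string | undefined,
  activity: string[]
): StartOutcome {
  const overlaps = matches.length > 0;
  if (overlaps && !confirmationReason) {
    activity.push("block"); // assumed activity label
    return {
      status: "blocked",
      reason: "Possible duplicate work found.",
      required_action: "Ask the developer for confirmation and a reason before starting.",
      warning_id: "warn_123", // placeholder id from the example response
      matches,
    };
  }
  if (overlaps) {
    activity.push("warning.confirmed");
  }
  activity.push("task.started");
  return { status: "started", confirmed_overlap: overlaps };
}
```

Developer B's two attempts map to two calls: the first without a reason returns the block, and the second with a reason is accepted and records the override.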
Required Test Areas And First Human Test#
The broad coverage checklist and first human test live in
build-loop-test-areas.md. The Build Test Fix loop
must still include a browser test for the main user journey when the app has a
browser UI.