# Build Loop Tests

These tests must exist before agents claim the Build Test Fix loop is done.

## Agent Logging Test

Prove agents really log their work.

1. Simulate or run an agent calling `covibe_task` with `operation: "plan"`.
2. Verify Workstreams displays the planned task.
3. Simulate or run an agent calling `covibe_task` with `operation: "check"`.
4. Simulate or run the same agent calling `covibe_task` with `operation:
   "start"` or `operation: "start_planned"`.
5. Verify the task exists in structured storage.
6. Verify a `task.started` work event exists.
7. Verify a usage event exists for each MCP call made in the steps above.
8. Verify Workstreams displays the task.
9. Verify Activity displays the event.
10. Complete the task and verify Activity displays the result summary.
11. Cancel a planned task and verify it leaves Workstreams while Activity shows the reason.
12. Start an overlapping planned task with confirmation and verify Activity shows `warning.confirmed`.
13. Log a blocker and a decision, then verify Coordination displays both.
14. Verify `covibe_team` `operation: "state"` returns the planned work, blocker, and decision context.
15. Verify `covibe_team` `operation: "state"` returns recent completions with result summaries.
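
The usage-event check in step 7 can be expressed as a small coverage helper. This is a minimal sketch, not the project's actual test code: the `McpCall` and `UsageEvent` shapes and the `call_id` field are hypothetical and stand in for whatever the structured storage really records.

```typescript
// Hypothetical record shapes; the real storage schema is not described here.
interface McpCall {
  call_id: string;
  tool: string; // e.g. "covibe_task"
  operation: string; // e.g. "plan", "check", "start"
}

interface UsageEvent {
  call_id: string;
}

// Returns every MCP call that has no matching usage event.
// An empty result means each call was logged at least once.
function callsMissingUsage(calls: McpCall[], usage: UsageEvent[]): McpCall[] {
  const logged = new Set(usage.map((event) => event.call_id));
  return calls.filter((call) => !logged.has(call.call_id));
}
```

A test would collect the calls it issued, read back the usage events, and assert that `callsMissingUsage` returns an empty array.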

## Session Heartbeat Test

Prove long-running agents do not look active forever after going quiet.

1. Start an agent session.
2. Call `covibe_session` with `operation: "heartbeat"`.
3. Verify `last_seen_at` updates.
4. Age a session beyond the stale threshold.
5. Verify the dashboard/session view reports `stale`.
6. Verify `covibe_team` `operation: "state"` returns `stale_sessions`.
7. Verify stale sessions create warning feedback for agents.
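
The stale check in steps 3 to 6 reduces to comparing `last_seen_at` against a threshold. The sketch below assumes epoch-millisecond timestamps and a five-minute threshold; both are illustrative assumptions, and the `Session` shape is hypothetical apart from the `last_seen_at` field named above.

```typescript
// Hypothetical session shape; only `last_seen_at` is named in the checklist.
interface Session {
  session_id: string;
  last_seen_at: number; // epoch milliseconds (assumed representation)
}

type SessionStatus = "active" | "stale";

// Assumed value for illustration; the real threshold is not stated here.
const STALE_THRESHOLD_MS = 5 * 60 * 1000;

// A session is stale only once it has been quiet for longer than the threshold.
function classifySession(
  session: Session,
  now: number,
  thresholdMs: number = STALE_THRESHOLD_MS,
): SessionStatus {
  return now - session.last_seen_at > thresholdMs ? "stale" : "active";
}

// A heartbeat returns a copy of the session with `last_seen_at` set to `now`.
function heartbeat(session: Session, now: number): Session {
  return { ...session, last_seen_at: now };
}

// Filters the sessions that would be reported as stale at time `now`.
function staleSessions(
  sessions: Session[],
  now: number,
  thresholdMs: number = STALE_THRESHOLD_MS,
): Session[] {
  return sessions.filter((s) => classifySession(s, now, thresholdMs) === "stale");
}
```

Passing `now` explicitly lets a test "age" a session without sleeping, which is one way to implement step 4.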

## Session Ownership Test

Prove one developer's token cannot close or revive another developer's run.

1. Start an agent session with Developer A.
2. Call `covibe_session` `operation: "heartbeat"` with Developer B's token and A's `session_id`.
3. Verify the response is an error.
4. Call `covibe_session` `operation: "end"` with Developer B's token and A's `session_id`.
5. Verify the response is an error.
6. Verify A's session remains active.
7. Verify no `session.ended` work event was created by B.
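
The ownership rule these steps exercise can be sketched as a guard that runs before any state change or event write. Everything below is a simplified in-memory model under assumed names (`OwnedSession`, `endSession`, the `not_session_owner` error code); only the `session.ended` event type comes from the checklist.

```typescript
// Hypothetical in-memory model of a session and its work events.
interface OwnedSession {
  session_id: string;
  owner: string;
  status: "active" | "ended";
}

interface WorkEvent {
  type: string;
  session_id: string;
  actor: string;
}

type EndResult = { ok: true } | { ok: false; error: string };

// Ends a session only when the caller owns it. On rejection nothing is
// mutated and no event is appended, which is what steps 6 and 7 verify.
function endSession(session: OwnedSession, caller: string, events: WorkEvent[]): EndResult {
  if (session.owner !== caller) {
    return { ok: false, error: "not_session_owner" }; // hypothetical error code
  }
  session.status = "ended";
  events.push({ type: "session.ended", session_id: session.session_id, actor: caller });
  return { ok: true };
}
```

Checking ownership first means a rejected call cannot leave a partial trace, so the test can assert on both the session status and the event log.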

## Parallel Work Audit Test

Prove duplicate work can be caught after work has already started.

1. Developer A starts a long-running workstream.
2. Developer B starts an overlapping task with a confirmation reason.
3. Call `covibe_team` with `operation: "audit_parallel_work"`.
4. Verify the response returns a structured warning.
5. Verify the conflict includes both pieces of work.
6. Verify recently completed matching work is also reported.
7. Verify a feedback event and `parallel_work.audit` work event were saved.
8. Verify the dashboard `Audit parallel work` button shows the same conflict.
9. Verify the dashboard audit does not mint an MCP token.
10. Refresh and verify the latest audit still appears.
11. Generate a weekly summary and verify `Parallel audits` includes the conflict.
12. Verify `covibe_team` `operation: "state"` returns `latest_parallel_audits` and warns the agent for active conflicts.
13. Complete both sides and verify `covibe_team` `operation: "state"` no longer warns on that historical audit.

## Token Inventory Test

Prove developers can manage agent tokens after refresh.

1. Create an MCP token from the UI.
2. Verify the one-time modal shows the raw token and the Settings → Agent
   connection stdio MCP config embeds no raw token (the stdio bridge resolves
   it from `~/.covibe/credentials.json` via `COVIBE_AGENT` + `COVIBE_BASE_URL`).
3. Create a second token and verify default labels are distinguishable.
4. Refresh the page.
5. Verify the token metadata still appears.
6. Revoke the active token from Settings → Agent connection.
7. Verify the UI reports revocation.
8. Verify the revoked raw token can no longer call `/api/mcp`.
9. Verify Activity shows token creation and revocation events without raw token values.
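
Step 9 can be checked mechanically by scanning serialized Activity events for the raw token string. This is a rough sketch: the `ActivityEvent` shape and the `token.created` / `token.revoked` type names are assumptions, and it assumes raw tokens contain no characters that JSON escapes (quotes, backslashes).

```typescript
// Hypothetical Activity event shape.
interface ActivityEvent {
  type: string; // e.g. "token.created" or "token.revoked" (assumed names)
  payload: Record<string, unknown>;
}

// Returns every event whose serialized form contains the raw token.
// An empty result means no event exposed the secret anywhere in its payload.
function eventsLeakingToken(events: ActivityEvent[], rawToken: string): ActivityEvent[] {
  if (rawToken.length === 0) {
    throw new Error("rawToken must be non-empty");
  }
  return events.filter((event) => JSON.stringify(event).includes(rawToken));
}
```

Serializing the whole event catches leaks in nested fields that a check on a single named property would miss.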

## Customer Readiness Tests

The local companion setup and hosted deployment canary checklists live in
[`customer-readiness-tests.md`](./customer-readiness-tests.md). They cover the
hosted `/downloads/co-vibe.tgz` install path,
`npm exec -- covibe-local setup --base-url <origin>`, manual
`snapshot --base-url <origin>`, `watch --base-url <origin> --once`, and
`npm exec -- covibe-mcp`.

## Accuracy Gate

Run `npm run check:accuracy` before customer handoff. It executes the committed
overlap scenarios, semantic calibration, and Performance quality read-model
tests so that duplicate-work detection and the designed Performance
functionality stay above the 90% product-quality bar. The overlap scenario suite
requires 100% recall for real duplicates and at least 90% precision over the
committed stage-1 scenarios. `npm run readiness` includes this gate.
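
To make the recall and precision thresholds concrete, the sketch below scores a list of scenario outcomes. It is an illustration of the arithmetic only, not the code behind `npm run check:accuracy`; the `ScenarioResult` shape and function names are hypothetical.

```typescript
// Hypothetical outcome of one overlap scenario.
interface ScenarioResult {
  isRealDuplicate: boolean; // ground truth from the committed scenario
  flagged: boolean; // whether the detector reported an overlap
}

interface Score {
  precision: number;
  recall: number;
}

// precision = TP / (TP + FP), recall = TP / (TP + FN).
// An empty denominator is scored as 1 (nothing to get wrong).
function scoreScenarios(results: ScenarioResult[]): Score {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const r of results) {
    if (r.flagged && r.isRealDuplicate) tp += 1;
    else if (r.flagged && !r.isRealDuplicate) fp += 1;
    else if (!r.flagged && r.isRealDuplicate) fn += 1;
  }
  return {
    precision: tp + fp === 0 ? 1 : tp / (tp + fp),
    recall: tp + fn === 0 ? 1 : tp / (tp + fn),
  };
}

// The gate as stated above: 100% recall and at least 90% precision.
function passesOverlapGate(results: ScenarioResult[]): boolean {
  const { precision, recall } = scoreScenarios(results);
  return recall === 1 && precision >= 0.9;
}
```

For example, nine correctly flagged duplicates plus one false alarm give a precision of 0.9 and a recall of 1, which passes; a single missed duplicate fails the gate regardless of precision.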

## Local Identity Test

Prove local dogfood mode does not hardcode one developer.

1. Open the UI without selecting a developer.
2. Create an MCP token and verify it is created for `hakan`.
3. Select `dev2` through the local-dev `/api/dev-login` test path (the sign-in
   form only creates new tenants through `/onboarding`).
4. Create an MCP token.
5. Verify the token is created for `dev2`.
6. Use that token to start a task through MCP.
7. Verify Workstreams shows the task under `@dev2`.
8. Complete the task through MCP so the e2e run does not leave active test work behind.

## Block And Warn Test

Prove warnings and blocks are returned to the agent.

1. Create an existing active task.
2. Have another agent check similar work.
3. Verify the MCP response returns a structured warning.
4. Have the agent try to start without confirmation.
5. Verify the MCP response returns a structured block.
6. Verify the block tells the agent what to do next.
7. Verify the warning and block are saved.
8. Verify `covibe_team` `operation: "feedback"` returns the saved warning or block.
9. Verify the UI shows the warning.

Example blocked response:

```json
{
  "status": "blocked",
  "reason": "Possible duplicate work found.",
  "required_action": "Ask the developer for confirmation and a reason before starting.",
  "warning_id": "warn_123",
  "matches": [
    {
      "type": "task",
      "title": "Research Claude/Codex orchestration for overnight builds",
      "owner": "hakan"
    }
  ]
}
```
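
A test for steps 5 and 6 could validate the response shape with a type guard. This is a minimal sketch based only on the fields in the example above; whether the real response carries additional or optional fields is not known, and the function names are illustrative.

```typescript
// Types mirror the example blocked response above.
interface BlockMatch {
  type: string;
  title: string;
  owner: string;
}

interface BlockedResponse {
  status: "blocked";
  reason: string;
  required_action: string;
  warning_id: string;
  matches: BlockMatch[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isBlockMatch(value: unknown): value is BlockMatch {
  if (!isRecord(value)) return false;
  const { type, title, owner } = value;
  return typeof type === "string" && typeof title === "string" && typeof owner === "string";
}

// True only for a structured block that also tells the agent what to do next
// (a non-empty `required_action`).
function isBlockedResponse(value: unknown): value is BlockedResponse {
  if (!isRecord(value)) return false;
  const { status, reason, required_action, warning_id, matches } = value;
  if (status !== "blocked") return false;
  if (typeof reason !== "string" || typeof warning_id !== "string") return false;
  if (typeof required_action !== "string" || required_action.length === 0) return false;
  if (!Array.isArray(matches)) return false;
  return matches.every(isBlockMatch);
}
```

Requiring a non-empty `required_action` turns "the block tells the agent what to do next" into an assertion the test can fail on.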

## Duplicate Work Scenario

The loop must include this test scenario.

1. Developer A starts: "Research Claude/Codex orchestration for overnight builds."
2. Developer B checks: "Compare Gas Town and Mission Control for overnight AI builds."
3. Co-Vibe warns that the work may overlap.
4. Developer B tries to start without confirmation.
5. Co-Vibe rejects it.
6. Developer B starts again with a confirmation reason.
7. Co-Vibe accepts it and logs the override.
8. The UI shows the warning and confirmation reason.
9. Activity shows the warning, block, confirmation, and accepted start.
10. Usage events show the tool calls that created the flow.
11. Metrics show tool calls, warnings, and confirmed overlaps from the same records.
12. Settings → Agent connection does not retain `demo-scenario` tokens.
13. Browser and stdio e2e tests revoke tokens they create.
14. Dashboard and weekly summary reads stay bounded after repeated dogfood runs.
15. Weekly summary structured data includes `plannedTasks`.

This is the most important test: it exercises the warn, block, confirm, and
log path in a single run.
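
The block-then-confirm gate in steps 4 to 7 can be modeled with a small in-memory function. This is a deliberately simplified sketch: overlap detection is passed in as a boolean rather than computed, `StartRequest` and `startTask` are hypothetical names, and the `task.blocked` event name is assumed. `warning.confirmed` and `task.started` are the event names used elsewhere in this document.

```typescript
// Hypothetical request shape; `overlaps` stands in for real overlap detection.
interface StartRequest {
  title: string;
  owner: string;
  overlaps: boolean;
  confirmationReason?: string;
}

type StartResult =
  | { status: "blocked"; required_action: string }
  | { status: "accepted" };

// Rejects an overlapping start without a reason, accepts it with one,
// and appends the event types Activity would be expected to show.
function startTask(req: StartRequest, activity: string[]): StartResult {
  const reason = (req.confirmationReason ?? "").trim();
  if (req.overlaps && reason === "") {
    activity.push("task.blocked"); // assumed event name
    return {
      status: "blocked",
      required_action: "Ask the developer for confirmation and a reason before starting.",
    };
  }
  if (req.overlaps) {
    activity.push("warning.confirmed"); // the override is logged before the start
  }
  activity.push("task.started");
  return { status: "accepted" };
}
```

Running Developer B's request twice, first without and then with a confirmation reason, should leave the activity log reading `task.blocked`, `warning.confirmed`, `task.started` in that order.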

## Required Test Areas And First Human Test

The broad coverage checklist and first human test live in
[`build-loop-test-areas.md`](./build-loop-test-areas.md). The Build Test Fix loop
must still include a browser test for the main user journey when the app has a
browser UI.