
MCP Overlap Contract

This file expands the overlap-specific stage-2 protocol referenced by mcp-contract.md.

Overlap Detection v2 (agent-judge)

Duplicate-work warnings are precise: the server retrieves candidates, and the session's own agent judges whether its task is the SAME SCOPE as each. The server records the verdict and gates the start on it.

covibe_task Check (Stage 1)

Optional input scope: a short "what this work actually is", richer than the title. It is folded into retrieval (embedding + lexical) and shown to the judge.

Optional precision inputs are additive and never reduce recall:

  • action: create|modify|remove|fix|audit
  • component: a layer tag up to 40 chars, such as server, client, db, or ui

A different declared component, or an opposed action, is treated as different work and will not escalate even on near-identical text. Both persist on the task and help the next agent's check. overlap-precision.md covers the signal set.
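The divergence rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's implementation: the `OPPOSED_ACTIONS` pairs and the `is_divergent` helper are hypothetical, since this file does not list which actions count as opposed (overlap-precision.md covers the actual signal set).

```python
from typing import Optional

# Hypothetical opposition pairs; the real signal set lives in overlap-precision.md.
OPPOSED_ACTIONS = {frozenset({"create", "remove"})}


def is_divergent(
    action_a: Optional[str],
    component_a: Optional[str],
    action_b: Optional[str],
    component_b: Optional[str],
) -> bool:
    """Return True when two pieces of work declare themselves as different work.

    Undeclared inputs never cause divergence.
    """
    # Both sides declared a component and the components differ.
    if component_a and component_b and component_a != component_b:
        return True
    # Both sides declared an action and the pair is listed as opposed.
    if action_a and action_b and frozenset({action_a, action_b}) in OPPOSED_ACTIONS:
        return True
    return False
```

Under these assumptions, `is_divergent("create", "server", "create", "client")` is true because the components differ, while a check that declares neither input is never treated as divergent.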

Stage 1 retrieves open, recent, and reserved work, then escalates only strong, non-divergent, non-low-information candidates from still-open work. No strong candidate means status: "ok", data.candidates: [], and data.requires_verdict: false. Matches against done/cancelled work inform (they appear in data.matches with their status) but never escalate and never block.

Weak warn-floor matches are still recorded for the dashboard, and they are fully disclosed: whenever a warning is recorded, the check response carries warning_id plus data.matches with each match's score and human-readable reason, even when requires_verdict is false. A warn-floor warning is visibility only; it never gates the start (see Stage 2). Strong candidates return status: "warning" with the judge instruction in feedback.required_action and:

```json
{
  "warning_id": "warn_123",
  "check_id": "chk_456",
  "requires_verdict": true,
  "candidates": [
    {
      "id": "task_1",
      "type": "task",
      "title": "...",
      "scope": "...",
      "owner": "hakan",
      "status": "active",
      "repo": "co-vibe",
      "score": 0.79,
      "reason": "Semantically similar (79%) to \"...\"."
    }
  ],
  "matches": []
}
```

Candidates carry the same score + reason evidence the workstream check exposes, so the judging agent sees what the engine saw. data.matches repeats the full warn-floor list in the same shape.
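The Stage 1 escalation rules can be sketched as a classification step over retrieved matches. Everything named here is an assumption for illustration: the two thresholds, the set of open status names, and the `Match` and `classify` names are hypothetical, and the sketch does not claim to reproduce the exact response layout.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not state the real values.
ESCALATION_THRESHOLD = 0.75
WARN_FLOOR = 0.25
# Assumed status names; the source confirms only "active", done, and cancelled.
OPEN_STATUSES = {"active", "planned", "reserved"}


@dataclass
class Match:
    id: str
    status: str
    score: float
    divergent: bool = False
    low_information: bool = False


def classify(matches: list[Match]) -> dict:
    """Split retrieved matches into escalated candidates and informational ones."""
    recorded = [m for m in matches if m.score >= WARN_FLOOR]
    candidates = [
        m for m in recorded
        if m.score >= ESCALATION_THRESHOLD
        and m.status in OPEN_STATUSES
        and not m.divergent
        and not m.low_information
    ]
    escalated = {m.id for m in candidates}
    return {
        "status": "warning" if candidates else "ok",
        "requires_verdict": bool(candidates),
        "candidates": [m.id for m in candidates],
        # Recorded but not escalated: visibility only, never blocking.
        "informational": [m.id for m in recorded if m.id not in escalated],
    }
```

A done task with a high score lands in `informational` rather than `candidates`, which mirrors the rule that finished or cancelled work informs but never escalates.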

A candidate with "type": "intent" is another agent's live reservation: work that has been checked but not yet started. Reservations close the simultaneous-start race. Two agents checking the same brand-new scope before either starts now see each other's intent as a candidate. Reservation hygiene:

  • a re-check by the same developer with the same title+scope fingerprint REPLACES the prior reservation — never duplicates it
  • plan, start, and start_planned consume the developer's matching reservation, so the real task row becomes the only candidate
  • a reservation can surface as an escalated candidate only at/above the escalation threshold; below it, it informs via data.matches and never blocks anyone
  • reservations have a short TTL and lazily expire
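The replace, consume, and lazy-expiry rules above can be illustrated with a small in-memory store. This is a sketch under assumptions: `ReservationStore`, its method names, and the 300-second TTL are hypothetical (the source says only that the TTL is short), and a real server would presumably persist reservations.

```python
import time
from typing import Optional

RESERVATION_TTL_SECONDS = 300.0  # hypothetical; the source only says "short TTL"


class ReservationStore:
    """In-memory sketch keyed by (developer, fingerprint)."""

    def __init__(self) -> None:
        self._expires_at: dict[tuple[str, str], float] = {}

    def reserve(self, developer: str, fingerprint: str, now: Optional[float] = None) -> None:
        # Same developer + same fingerprint overwrites the prior entry, never duplicates it.
        now = time.time() if now is None else now
        self._expires_at[(developer, fingerprint)] = now + RESERVATION_TTL_SECONDS

    def consume(self, developer: str, fingerprint: str) -> None:
        # plan / start / start_planned remove the developer's matching reservation.
        self._expires_at.pop((developer, fingerprint), None)

    def live(self, now: Optional[float] = None) -> list[tuple[str, str]]:
        # Lazy expiry: stale entries are dropped only when the store is read.
        now = time.time() if now is None else now
        self._expires_at = {k: t for k, t in self._expires_at.items() if t > now}
        return sorted(self._expires_at)
```

Keying on the developer plus fingerprint is what makes a re-check idempotent here: the second `reserve` call refreshes the expiry instead of adding a row.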

Start / Start Planned (Stage 2)

New inputs:

  • check_id: the id returned by the matching operation: "check". Required when the warning carries requires_verdict.
  • scope_verdict: { "same_scope_candidate_ids": ["task_1"], "reason": "..." }

The start is bound to the check by session plus a fingerprint of title, scope, and repo, so a check for one scope cannot authorize a different start.
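One way to picture the binding is a hash over normalized title, scope, and repo, compared together with the session. The normalization (lowercasing, whitespace collapsing), the SHA-256 choice, and the `session_id` and `fingerprint` field names are assumptions for this sketch; the source states only which inputs feed the fingerprint.

```python
import hashlib


def scope_fingerprint(title: str, scope: str, repo: str) -> str:
    """Hash normalized title, scope, and repo into a stable fingerprint."""
    parts = [" ".join(p.lower().split()) for p in (title, scope, repo)]
    # A unit-separator join keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def check_authorizes_start(check: dict, start: dict) -> bool:
    """A check authorizes a start only for the same session and fingerprint."""
    return (
        check["session_id"] == start["session_id"]
        and check["fingerprint"]
        == scope_fingerprint(start["title"], start["scope"], start["repo"])
    )
```

With this shape, a check recorded for one scope cannot be reused to start different work, because any change to title, scope, or repo yields a different fingerprint.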

Start blocks ONLY on the escalated path, that is, when the bound warning carries requires_verdict. On that path, each of the following blocks the start with a required_action that names the exact field to send next:

  • a missing or stale check_id
  • a missing scope_verdict
  • unknown candidate ids
  • a confirmed duplicate without confirmation_reason
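The blocking conditions can be sketched as an ordered gate that returns the next field to send. The `gate_start` name, the dictionary shapes, and the check order are assumptions for illustration; the source specifies the conditions but not their precedence, and a stale check_id is modeled here simply as a mismatch.

```python
from typing import Optional


def gate_start(
    warning: dict,
    check_id: Optional[str],
    scope_verdict: Optional[dict],
    confirmation_reason: Optional[str] = None,
) -> Optional[str]:
    """Return the field the agent must send next, or None when the start may proceed."""
    if not warning.get("requires_verdict"):
        return None  # weak warn-floor warnings never gate the start
    if check_id != warning["check_id"]:
        return "check_id"  # missing or stale
    if scope_verdict is None:
        return "scope_verdict"
    same = scope_verdict.get("same_scope_candidate_ids", [])
    if any(cid not in warning["candidate_ids"] for cid in same):
        return "scope_verdict.same_scope_candidate_ids"  # unknown candidate id
    if same and not confirmation_reason:
        return "confirmation_reason"  # confirmed duplicate needs a reason
    return None
```

An empty `same_scope_candidate_ids` list passes the gate without `confirmation_reason`, since the agent judged nothing to be the same scope.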

Weak warn-floor warnings (requires_verdict: false) NEVER block a start; they are dashboard visibility only. The start proceeds with no extra fields, and the warning stays pending for the dashboard unless the agent explicitly resolves it: a matching check_id plus an empty verdict dismisses it, and the legacy warning_id + confirmation_reason shape still confirms it. The live detection battery (2026-06-09) measured the old behavior, in which weak warnings silently hard-blocked starts, at a 7/7 false-block rate on non-duplicate traps with scores as low as 29%, which is why this gate is now escalation-only.

Candidates that completed or were cancelled between check and start drop from the required set. If none remain live, the warning auto-clears. The same liveness rule applies to covibe_workstream start: a warning whose matches are all done/cancelled auto-dismisses instead of blocking.
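The liveness rule amounts to filtering the required set by current status at start time. In this sketch the helper names and the `current_status` lookup are hypothetical; only the done/cancelled terminal states come from the text above.

```python
TERMINAL_STATUSES = {"done", "cancelled"}


def required_candidates(candidate_ids: list[str], current_status: dict[str, str]) -> list[str]:
    """Keep only candidates that are still live at start time."""
    return [
        cid for cid in candidate_ids
        if current_status.get(cid) not in TERMINAL_STATUSES
    ]


def warning_auto_clears(candidate_ids: list[str], current_status: dict[str, str]) -> bool:
    """A warning auto-clears when none of its candidates remain live."""
    return not required_candidates(candidate_ids, current_status)
```

A candidate with no known status stays in the required set here, which is a conservative choice made for the sketch rather than documented behavior.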

Started work is stamped with an overlap_status derived from how the warning resolved: confirmed only when the agent confirmed a duplicate, warning when a weak warn-floor warning is still pending, otherwise clear.
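The stamping rule reads as a three-way mapping. The `resolution` and `requires_verdict` keys on the warning record below are assumed field names for illustration, not the server's schema.

```python
from typing import Optional


def overlap_status(warning: Optional[dict]) -> str:
    """Derive the overlap_status stamp from how the bound warning resolved."""
    if warning is None:
        return "clear"  # no warning was recorded for this start
    if warning["resolution"] == "confirmed":
        return "confirmed"  # the agent confirmed a duplicate
    if not warning["requires_verdict"] and warning["resolution"] == "pending":
        return "warning"  # weak warn-floor warning still pending
    return "clear"
```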

On a passing verdict the server records one scope_verdicts row per candidate with the stage-1 score and scope fingerprints, resolves the warning, and writes a scope.judged work event.

The warning resolves as confirmed when a duplicate was confirmed, otherwise as dismissed from the agent's reason.

Activation Re-check (start_planned)

covibe_task and covibe_workstream operation: "start_planned" re-run overlap retrieval at activation time, because work can start between planning (or the agent's fresh check) and activation. The re-check folds in live intents and applies the same escalation bar and cooldown as Stage 1.

If strong candidates emerge that the bound check's warning does not already cover, the start does NOT proceed. It records a new requires_verdict warning plus preflight and returns status: "warning" with the standard escalation shape — warning_id, a NEW check_id, requires_verdict: true, and candidates. The agent judges the candidates and retries start_planned with that check_id and a scope_verdict; the retry flows through the same scope-verdict gate as a direct start (confirmed duplicates still need confirmation_reason). Candidates the bound check already escalated do not re-trigger the re-check — the gate enforces their verdict instead.
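The re-check decision can be sketched as a set difference between what activation-time retrieval escalates and what the bound check already covered. The function name, the argument shapes, and the choice to list only the uncovered candidates in the new warning are assumptions; the source does not say whether the new warning repeats already-covered candidates.

```python
from typing import Optional


def activation_recheck(
    recheck_candidate_ids: list[str],
    bound_check_candidate_ids: list[str],
    new_warning_id: str,
    new_check_id: str,
) -> Optional[dict]:
    """Return a new escalation response, or None when the bound check covers everything."""
    covered = set(bound_check_candidate_ids)
    fresh = [cid for cid in recheck_candidate_ids if cid not in covered]
    if not fresh:
        # Nothing new: the normal scope-verdict gate enforces the existing verdict.
        return None
    return {
        "status": "warning",
        "warning_id": new_warning_id,
        "check_id": new_check_id,  # a NEW check_id the retry must carry
        "requires_verdict": True,
        "candidates": fresh,
    }
```

The agent would then judge the returned candidates and retry start_planned with the new check_id and a scope_verdict.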

The re-check writes a task.checked / workstream.checked work event with payload.activation_recheck: true so audits can tell server re-checks from agent-called preflights.

Cooldown

Once an agent judges a candidate NOT a duplicate, the server stops re-escalating that same source-to-candidate pair on later checks until either side's title or scope text changes. A new fingerprint re-opens the evaluation.
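A cooldown keyed on fingerprint pairs gives the re-open behavior for free: when either side's text changes, its fingerprint changes and the stored pair no longer matches. The `Cooldown` class and its method names are hypothetical, and the sketch keeps state in memory only.

```python
class Cooldown:
    """Remember source/candidate fingerprint pairs judged NOT a duplicate."""

    def __init__(self) -> None:
        self._dismissed: set[tuple[str, str]] = set()

    def record_not_duplicate(self, source_fp: str, candidate_fp: str) -> None:
        self._dismissed.add((source_fp, candidate_fp))

    def should_escalate(self, source_fp: str, candidate_fp: str) -> bool:
        # A changed title or scope yields a new fingerprint, so the pair is re-evaluated.
        return (source_fp, candidate_fp) not in self._dismissed
```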

The trust model relies on the agent's verdict; the dashboard Audit signal is the backstop.
