MCP Overlap Contract
This file expands the overlap-specific stage-2 protocol referenced by
mcp-contract.md.
Overlap Detection v2 (agent-judge)
Duplicate-work warnings are precise: the server retrieves candidates, and the session's own agent judges whether its task has the SAME SCOPE as each candidate. The server records the verdict and gates the start on it.
covibe_task Check (Stage 1)
Optional input scope: a short "what this work actually is", richer than the
title. It is folded into retrieval (embedding + lexical) and shown to the judge.
Optional precision inputs are additive and never reduce recall:
- action: create|modify|remove|fix|audit
- component: a layer tag up to 40 chars, such as server, client, db, or ui
A different declared component, or an opposed action, is treated as different
work and will not escalate even on near-identical text. Both persist on the task
and help the next agent's check. overlap-precision.md covers the signal set.
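The divergence rule above can be sketched as a small predicate. This is a minimal illustration, not the server's implementation: the function name is hypothetical, and which action pairs count as "opposed" is an assumption here (overlap-precision.md owns the real signal set).

```python
# Assumption: the set of "opposed" action pairs is not stated in this file.
# create vs. remove is used purely as an illustrative pair.
OPPOSED_ACTIONS = {frozenset({"create", "remove"})}


def is_divergent(source: dict, candidate: dict) -> bool:
    """Return True when declared precision inputs mark the two as different work."""
    comp_a, comp_b = source.get("component"), candidate.get("component")
    # Only a *declared* mismatch diverges; a missing component never does,
    # which keeps the inputs additive.
    if comp_a and comp_b and comp_a != comp_b:
        return True
    act_a, act_b = source.get("action"), candidate.get("action")
    if act_a and act_b and frozenset({act_a, act_b}) in OPPOSED_ACTIONS:
        return True
    return False
```

Because an undeclared field never triggers divergence in this sketch, omitting action or component leaves the check behaving exactly as it would without them.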
Stage 1 retrieves open, recent, and reserved work, then escalates only strong,
non-divergent, non-low-information candidates from still-open work. No strong
candidate means status: "ok", data.candidates: [], and
data.requires_verdict: false. Matches against done/cancelled work inform
(they appear in data.matches with their status) but never escalate and never
block.
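As a rough sketch of the Stage 1 escalation filter described above: the threshold value, the function name, and the precomputed divergent / low_information flags are all assumptions made for illustration. Only the done/cancelled statuses and the response fields come from this document.

```python
# Hypothetical threshold; the real escalation bar is not stated in this file.
ESCALATION_THRESHOLD = 0.75
CLOSED_STATUSES = {"done", "cancelled"}


def stage1_response(matches: list) -> dict:
    """Split retrieved matches into escalated candidates and informational matches."""
    candidates = [
        m for m in matches
        if m["score"] >= ESCALATION_THRESHOLD          # strong
        and m["status"] not in CLOSED_STATUSES         # still-open work only
        and not m.get("divergent", False)              # assumed precomputed flag
        and not m.get("low_information", False)        # assumed precomputed flag
    ]
    escalated = bool(candidates)
    return {
        "status": "warning" if escalated else "ok",
        "data": {
            "candidates": candidates,
            "requires_verdict": escalated,
            # Every match is still disclosed, including done/cancelled ones.
            "matches": matches,
        },
    }
```

In this sketch a high-scoring match against done work stays in data.matches but cannot set requires_verdict, matching the "inform but never block" rule.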
Weak warn-floor matches are still recorded for the dashboard, and they are
fully disclosed: whenever a warning is recorded the check response carries
warning_id plus data.matches with each match's score and human reason —
even when requires_verdict is false. A warn-floor warning is visibility
only; it never gates the start (see Stage 2). Strong candidates return
status: "warning" with the judge instruction in feedback.required_action
and:

    {
      "warning_id": "warn_123",
      "check_id": "chk_456",
      "requires_verdict": true,
      "candidates": [
        {
          "id": "task_1",
          "type": "task",
          "title": "...",
          "scope": "...",
          "owner": "hakan",
          "status": "active",
          "repo": "co-vibe",
          "score": 0.79,
          "reason": "Semantically similar (79%) to \"...\"."
        }
      ],
      "matches": []
    }

Candidates carry the same score + reason evidence the workstream check
exposes, so the judging agent sees what the engine saw. data.matches repeats
the full warn-floor list in the same shape.
A candidate with "type": "intent" is another agent's live reservation: work
that has been checked but not yet started. Reservations close the
simultaneous-start race. Two agents checking the same brand-new scope before
either starts now see each other's intent as a candidate. Reservation hygiene:
- a re-check by the same developer with the same title+scope fingerprint REPLACES the prior reservation; it never duplicates it
- plan, start, and start_planned consume the developer's matching reservation, so the real task row becomes the only candidate
- a reservation can surface as an escalated candidate only at/above the escalation threshold; below it, it informs via data.matches and never blocks anyone
- reservations have a short TTL and lazily expire
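The reservation hygiene rules can be illustrated with a small in-memory store. This is a sketch under stated assumptions: the class, the SHA-256 fingerprint over normalized title and scope, and the TTL value are all hypothetical, not the server's actual storage or hashing.

```python
import hashlib
import time

RESERVATION_TTL_SECONDS = 600  # hypothetical; the doc only says "short"


def fingerprint(title: str, scope: str) -> str:
    """Assumed fingerprint: hash of the normalized title and scope text."""
    normalized = f"{title.strip().lower()}\n{scope.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ReservationStore:
    def __init__(self, ttl: float = RESERVATION_TTL_SECONDS, clock=time.monotonic):
        self._expires_at = {}  # (developer, fingerprint) -> expiry time
        self._ttl = ttl
        self._clock = clock

    def reserve(self, developer: str, title: str, scope: str) -> None:
        # Same developer + same fingerprint overwrites the prior reservation.
        key = (developer, fingerprint(title, scope))
        self._expires_at[key] = self._clock() + self._ttl

    def consume(self, developer: str, title: str, scope: str) -> None:
        # plan / start / start_planned drop the developer's matching reservation.
        self._expires_at.pop((developer, fingerprint(title, scope)), None)

    def live(self) -> list:
        # Lazy expiry: stale entries are purged only when the store is read.
        now = self._clock()
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        return list(self._expires_at)
```

Keying on (developer, fingerprint) is what makes a re-check idempotent in this sketch, while two different developers reserving the same scope still produce two entries that can see each other.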
Start / Start Planned (Stage 2)
New inputs:
- check_id: the id returned by the matching operation: "check". Required when the warning carries requires_verdict.
- scope_verdict: { "same_scope_candidate_ids": ["task_1"], "reason": "..." }
The start is bound to the check by session plus a fingerprint of title, scope, and repo, so a check for one scope cannot authorize a different start.
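On the agent side, turning a check response into the extra start inputs might look like the following. This is a minimal sketch: build_start_args and the judge callback are hypothetical names, and the judge stands in for the agent's own same-scope decision.

```python
from typing import Callable


def build_start_args(check_data: dict, judge: Callable[[dict], bool], reason: str) -> dict:
    """Build the extra start inputs from the data payload of a check response.

    judge(candidate) is the agent's own call: True means SAME SCOPE.
    """
    if not check_data.get("requires_verdict"):
        # Nothing escalated, so the start needs no extra fields.
        return {}
    same_scope_ids = [c["id"] for c in check_data["candidates"] if judge(c)]
    return {
        "check_id": check_data["check_id"],
        "scope_verdict": {
            "same_scope_candidate_ids": same_scope_ids,
            "reason": reason,
        },
    }
```

An empty same_scope_candidate_ids list is still a verdict: it states that the agent judged every candidate to be different work.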
Start blocks ONLY on the escalated path: when the bound warning carries
requires_verdict, a missing/stale check_id, a missing scope_verdict,
unknown candidate ids, or a confirmed duplicate without confirmation_reason
each block with a required_action that names the exact field to send next.
Weak warn-floor warnings (requires_verdict: false) NEVER block a start; they
are dashboard visibility only. The start proceeds with no extra fields; the
warning stays pending for the dashboard unless the agent explicitly resolves it
(matching check_id + empty verdict dismisses it; the legacy
warning_id + confirmation_reason shape still confirms it). The live detection
battery (2026-06-09) measured the old behavior — weak warnings silently
hard-blocking starts — at a 7/7 false-block rate on non-duplicate traps with
scores as low as 29%, which is why this gate is escalation-only now.
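The escalation-only gate can be sketched as a function that returns the field the agent must send next, or None when the start may proceed. The dict shapes (a warning holding its check_id, fingerprint, and candidate_ids) and the function name are assumptions for illustration, not the server's schema.

```python
from typing import Optional


def gate_start(warning: dict, start: dict) -> Optional[str]:
    """Return the name of the field to send next, or None if the start may proceed."""
    if not warning.get("requires_verdict"):
        return None  # weak warn-floor warning: visibility only, never blocks
    # Missing or stale check: the id and the title/scope/repo fingerprint must match.
    if (start.get("check_id") != warning["check_id"]
            or start.get("fingerprint") != warning["fingerprint"]):
        return "check_id"
    verdict = start.get("scope_verdict")
    if verdict is None:
        return "scope_verdict"
    same_scope_ids = verdict.get("same_scope_candidate_ids", [])
    known_ids = set(warning["candidate_ids"])
    if any(cid not in known_ids for cid in same_scope_ids):
        return "scope_verdict"  # unknown candidate ids
    if same_scope_ids and not start.get("confirmation_reason"):
        return "confirmation_reason"  # confirmed duplicate needs a reason
    return None
```

The first branch is the point of the design described above: a start bound to a non-escalated warning never reaches any of the blocking checks.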
Candidates that completed or were cancelled between check and start drop from
the required set. If none remain live, the warning auto-clears. The same
liveness rule applies to covibe_workstream start: a warning whose matches are
all done/cancelled auto-dismisses instead of blocking.
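The liveness rule amounts to filtering the candidate set by current status before applying the gate. A minimal sketch, with hypothetical function names and a status lookup assumed to be supplied by the caller:

```python
CLOSED_STATUSES = {"done", "cancelled"}


def required_candidates(candidate_ids: list, current_status: dict) -> list:
    """Keep only candidates that are still live at start time.

    Assumption: an id with no known status is treated as still live.
    """
    return [
        cid for cid in candidate_ids
        if current_status.get(cid) not in CLOSED_STATUSES
    ]


def warning_auto_clears(candidate_ids: list, current_status: dict) -> bool:
    """True when every candidate finished or was cancelled since the check."""
    return not required_candidates(candidate_ids, current_status)
```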
Started work is stamped with an overlap_status derived from how the warning
resolved: confirmed only when the agent confirmed a duplicate, warning when
a weak warn-floor warning is still pending, otherwise clear.
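The stamping rule reads as a three-way precedence, sketched here with a hypothetical function and boolean inputs:

```python
def overlap_status(confirmed_duplicate: bool, weak_warning_pending: bool) -> str:
    """Derive the overlap_status stamp from how the warning resolved."""
    if confirmed_duplicate:
        return "confirmed"
    if weak_warning_pending:
        return "warning"
    return "clear"
```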
On a passing verdict the server records one scope_verdicts row per candidate
with the stage-1 score and scope fingerprints, resolves the warning, and writes
a scope.judged work event.
The warning resolves as confirmed when a duplicate was confirmed, otherwise as
dismissed from the agent's reason.
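Recording a passing verdict could be sketched as follows. The row keys are hypothetical stand-ins for the scope_verdicts columns, and the scope fingerprints and the scope.judged event are omitted for brevity.

```python
def record_verdict(candidates: list, verdict: dict) -> tuple:
    """Return (rows, resolution) for a verdict that already passed the gate."""
    same_scope_ids = set(verdict["same_scope_candidate_ids"])
    # One row per candidate, whether or not it was judged same-scope.
    rows = [
        {
            "candidate_id": c["id"],
            "same_scope": c["id"] in same_scope_ids,
            "stage1_score": c["score"],
            "reason": verdict["reason"],
        }
        for c in candidates
    ]
    resolution = "confirmed" if same_scope_ids else "dismissed"
    return rows, resolution
```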
Activation Re-check (start_planned)
covibe_task and covibe_workstream operation: "start_planned" re-run
overlap retrieval at activation time, because work can start between planning
(or the agent's fresh check) and activation. The re-check folds in live intents
and applies the same escalation bar and cooldown as Stage 1.
If strong candidates emerge that the bound check's warning does not already
cover, the start does NOT proceed. It records a new requires_verdict warning
plus preflight and returns status: "warning" with the standard escalation
shape — warning_id, a NEW check_id, requires_verdict: true, and
candidates. The agent judges the candidates and retries start_planned with
that check_id and a scope_verdict; the retry flows through the same
scope-verdict gate as a direct start (confirmed duplicates still need
confirmation_reason). Candidates the bound check already escalated do not
re-trigger the re-check — the gate enforces their verdict instead.
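The core of the re-check is a set difference between what activation-time retrieval escalates and what the bound check already covered. A sketch under assumptions: the function name is hypothetical, candidates are reduced to ids, and the new warning_id and check_id are supplied by the caller rather than generated.

```python
from typing import Optional


def activation_recheck(strong_ids: list, bound_check_ids: list,
                       new_warning_id: str, new_check_id: str) -> Optional[dict]:
    """Return a new escalation response, or None when the bound check covers everything."""
    covered = set(bound_check_ids)
    fresh = [cid for cid in strong_ids if cid not in covered]
    if not fresh:
        # Already-escalated candidates are left to the scope-verdict gate.
        return None
    return {
        "status": "warning",
        "data": {
            "warning_id": new_warning_id,
            "check_id": new_check_id,  # the retry must carry this NEW id
            "requires_verdict": True,
            "candidates": fresh,
        },
    }
```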
The re-check writes a task.checked / workstream.checked work event with
payload.activation_recheck: true so audits can tell server re-checks from
agent-called preflights.
Cooldown
Once an agent judges a candidate NOT a duplicate, the server stops re-escalating that same source-to-candidate pair on later checks until either side's title or scope text changes. A new fingerprint re-opens the evaluation.
The trust model is to trust the agent's verdict; the dashboard Audit signal is the backstop.
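One way to picture the cooldown is a map from a source-to-candidate pair to the fingerprints that were current when the agent judged it not a duplicate. The class, its method names, and the idea of a stable source key are assumptions for illustration.

```python
class Cooldown:
    def __init__(self):
        # (source_key, candidate_id) -> (source_fp, candidate_fp) at verdict time
        self._cleared = {}

    def record_not_duplicate(self, source_key: str, candidate_id: str,
                             source_fp: str, candidate_fp: str) -> None:
        self._cleared[(source_key, candidate_id)] = (source_fp, candidate_fp)

    def suppresses(self, source_key: str, candidate_id: str,
                   source_fp: str, candidate_fp: str) -> bool:
        """True when this pair was already cleared and neither side's text changed."""
        return self._cleared.get((source_key, candidate_id)) == (source_fp, candidate_fp)
```

Comparing stored fingerprints against current ones means an edit to either side's title or scope re-opens the evaluation without any explicit invalidation step.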