Marc J. Schmidt bd178fcaf0 Initial release: MCP server enforcing Worker-Reviewer loop

Diligence prevents AI agents from shipping quick fixes that break things
by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- State persisted across sessions

Validated on production codebase: caught architectural mistake (broker
subscriptions on client-side code) that naive agent would have shipped.

2026-01-22 06:22:59 +01:00

@strikt/diligence

An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.

The Problem

AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add .clear() somewhere and call it done. But real codebases have:

- Broker events that need subscribing
- Caches in multiple locations
- Patterns that must be followed
- Client/server architecture boundaries
- Edge cases that cause production fires

A naive agent doesn't know what it doesn't know. It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.

The Solution

Diligence forces a research-verify loop before any code is written:

┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns APPROVED or NEEDS_WORK with specific feedback │
└─────────────────────────────────────────────────────────────┘
                            │
                    ┌───────┴───────┐
                    ▼               ▼
              NEEDS_WORK        APPROVED
              (loop back)           │
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
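
The loop in the diagram can be sketched in a few lines. This is a minimal illustration, not the code in index.mjs: `runLoop`, the callback signatures, and the toy `worker` and `reviewer` functions are all hypothetical stand-ins for the real sub-agents.

```javascript
// Hypothetical sketch of the research-verify loop; not the code in index.mjs.
// `worker` and `reviewer` are stand-ins for the two sub-agents.
function runLoop(task, worker, reviewer, maxRounds = 5) {
  const history = [];
  let feedback = null;
  for (let round = 1; round <= maxRounds; round++) {
    // The Worker sees the task plus any feedback from the previous round.
    const proposal = worker(task, feedback);
    // The Reviewer sees only the proposal, never the Worker's reasoning.
    const review = reviewer(proposal);
    history.push({ round, proposal, verdict: review.verdict });
    if (review.verdict === "APPROVED") {
      return { approved: true, round, proposal, history };
    }
    feedback = review.feedback;
  }
  // No approval within the round limit: nothing may be implemented.
  return { approved: false, round: maxRounds, proposal: null, history };
}

// Toy agents: the Reviewer rejects any proposal without a file:line citation.
const worker = (task, feedback) =>
  feedback ? `${task} (see team.service.ts:45)` : task;
const reviewer = (proposal) =>
  /\w+\.\w+:\d+/.test(proposal)
    ? { verdict: "APPROVED", feedback: null }
    : { verdict: "NEEDS_WORK", feedback: "Add file:line citations" };

const result = runLoop("Invalidate permission cache", worker, reviewer);
console.log(result.approved, result.round);
```

With these toy agents the first proposal is rejected for lacking a citation and the second is approved, so the loop ends in round 2.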

Real-World Validation

Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."

Round	Worker Proposed	Reviewer Found
1	Add broker event subscriptions to TeamService	Wrong - broker is server-side only, TeamService is client-side
2	Revised RPC-based approach	Missing error handling, incomplete pattern match
3	Simpler solution using ProjectService	Still gaps in implementation
4	Complete RPC stream solution	Approved

A naive agent would have shipped Round 1 - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.

Installation

npm install @strikt/diligence

Or clone directly:

git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install

Setup

1. Add MCP Server

In your project's .mcp.json or .claude/settings.json:

{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}

2. Create Codebase Context

Create .claude/CODEBASE_CONTEXT.md in your project:

# Architecture

[Describe your system architecture]

## Key Patterns

[Document patterns agents MUST follow]

## Common Pitfalls

[List things that break if done wrong]

## Events/Hooks

[List broker events, hooks, subscriptions that exist]

This context is loaded into both Worker and Reviewer briefs.

3. Add to CLAUDE.md (Recommended)

Add to your project's CLAUDE.md:

## Code Changes

For any code changes, use the diligence workflow:

1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes

With this in place, Claude follows the diligence workflow for code changes without being told to in each prompt.

How It Works

Workflow Phases

conversation → researching → approved → implementing → conversation
                   ↑              │
                   └──────────────┘
                    (NEEDS_WORK, max 5 rounds)
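
One way to picture these phases is as a small transition table. The sketch below is an assumption about how such a state machine could look, not a copy of the server's internals: the event names (`approve`, `needs_work`) and the `nextPhase` helper are hypothetical, and round counting is left out.

```javascript
// Hypothetical transition table; the real server's internals may differ.
const TRANSITIONS = {
  conversation: { start: "researching" },
  researching: { approve: "approved", needs_work: "researching" },
  approved: { implement: "implementing" },
  implementing: { complete: "conversation" },
};

function nextPhase(phase, event) {
  // `abort` resets from any phase.
  if (event === "abort") return "conversation";
  const next = TRANSITIONS[phase]?.[event];
  if (!next) {
    throw new Error(`"${event}" is not allowed in phase "${phase}"`);
  }
  return next;
}

// Walk the happy path with one NEEDS_WORK round in the middle.
let phase = "conversation";
for (const event of ["start", "needs_work", "approve", "implement", "complete"]) {
  phase = nextPhase(phase, event);
}
console.log(phase); // back in "conversation" after a full cycle
```

Keeping the transitions in data makes the enforcement rule easy to see: `implement` has an entry only under `approved`, so calling it from any other phase throws.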

MCP Tools

Tool	Description
`start`	Begin workflow with a task description
`get_worker_brief`	Get full context for Worker sub-agent
`propose`	Worker submits proposal with citations
`get_reviewer_brief`	Get full context for Reviewer sub-agent
`review`	Reviewer submits APPROVED or NEEDS_WORK
`implement`	Begin implementation (requires approval)
`complete`	Mark done, reset to conversation
`status`	Check current workflow state
`abort`	Cancel and reset
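
For readers driving the server from their own client rather than from Claude Code, a tool invocation is an MCP `tools/call` request. The sketch below builds one for `start`. The `method` and `params` shape follow the MCP specification, but the argument key `task` is a hypothetical placeholder; check the tool's input schema for the real parameter names.

```javascript
// Builds an MCP `tools/call` request object.
// Assumption: the `task` argument key is illustrative, not a documented
// diligence parameter.
let nextId = 1;
function toolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = toolCall("start", { task: "Fix permission cache invalidation" });
console.log(JSON.stringify(request, null, 2));
```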

Sub-Agent Pattern

The key insight: Worker and Reviewer are separate sub-agents with fresh context.

Main Claude Session
    │
    ├─► Worker Sub-Agent (Explore)
    │   - Receives: task + codebase context + previous feedback
    │   - Does: searches, reads, analyzes
    │   - Returns: proposal with file:line citations
    │
    └─► Reviewer Sub-Agent (Explore)
        - Receives: proposal + codebase context (NOT Worker's searches)
        - Does: independently verifies every claim
        - Returns: APPROVED or NEEDS_WORK with specifics

The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
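
The isolation described above can be illustrated with two brief builders. This is a sketch under stated assumptions: the function names and every field name (`summary`, `citations`, `searchLog`) are hypothetical and only show the idea of passing the proposal along while dropping the Worker's working material.

```javascript
// Hypothetical brief builders; field names are illustrative only.
function buildWorkerBrief({ task, codebaseContext, previousFeedback }) {
  return {
    role: "worker",
    task,
    codebaseContext,
    previousFeedback: previousFeedback ?? null,
  };
}

function buildReviewerBrief({ proposal, codebaseContext }) {
  // Copy only the proposal text and its citations. Anything else the Worker
  // produced (search results, scratch notes) is deliberately dropped.
  return {
    role: "reviewer",
    codebaseContext,
    proposal: { summary: proposal.summary, citations: [...proposal.citations] },
  };
}

const proposal = {
  summary: "Invalidate memoizedPermissions on role change",
  citations: ["team.service.ts:45"],
  searchLog: ["grep memoizedPermissions", "grep BusTeamRoleChange"], // Worker-only
};
const brief = buildReviewerBrief({ proposal, codebaseContext: "# Architecture" });
console.log("searchLog" in brief.proposal); // false
```

Because the Reviewer brief carries citations but not the searches behind them, the Reviewer has to open the cited files itself to confirm each claim.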

Project Structure

.claude/
├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
├── context/                 # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json    # Auto-generated workflow state
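
A rough sketch of how these files might be combined into a brief follows. It is an assumption, not the server's actual loader: `assembleContext` is a hypothetical helper, the `files` object stands in for reads from disk, and treating a missing CODEBASE_CONTEXT.md as an error is a guess based on the "Required" label.

```javascript
// Hypothetical loader: `files` maps paths under .claude/ to file contents,
// standing in for reads from disk.
function assembleContext(files, role) {
  const base = files["CODEBASE_CONTEXT.md"];
  if (base === undefined) {
    // Assumption: a missing required file is treated as an error.
    throw new Error("Missing .claude/CODEBASE_CONTEXT.md");
  }
  const roleFile = role === "worker" ? "WORKER_CONTEXT.md" : "REVIEWER_CONTEXT.md";
  const extras = Object.keys(files)
    .filter((path) => path.startsWith("context/"))
    .sort()
    .map((path) => files[path]);
  // Optional files are included only when present.
  return [base, files[roleFile], ...extras]
    .filter((part) => part !== undefined)
    .join("\n\n");
}

const files = {
  "CODEBASE_CONTEXT.md": "# Architecture",
  "REVIEWER_CONTEXT.md": "Verify every citation.",
  "context/permissions.md": "Permissions are cached per team.",
};
console.log(assembleContext(files, "reviewer"));
```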

Example Session

User: Fix the permission cache bug from the todo list

Claude: [reads todo, understands task]
Claude: [calls diligence.start]

Claude: [spawns Worker with Task tool]
Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]

Claude: [spawns Reviewer with Task tool]
Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]

Claude: [spawns new Worker with feedback]
Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]

Claude: [spawns Reviewer]
Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]

Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]

Why This Works

- Enforcement - Can't write code without approval
- Fresh Context - Reviewer has no bias from Worker's reasoning
- Verification - Claims are checked against actual codebase
- Iteration - Multiple rounds refine understanding
- Architecture Awareness - Both agents loaded with project context
- Audit Trail - Full history of proposals and feedback

Configuration

Environment Variables

Variable	Description
None required	State stored in `.claude/.diligence-state.json`

Constants (in index.mjs)

Constant	Default	Description
`MAX_ROUNDS`	5	Maximum Worker-Reviewer iterations before reset
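
To make the round limit concrete, here is a minimal sketch of how a verdict could update workflow state. The `applyReview` function and the state shape are hypothetical, and resetting to `conversation` when the limit is reached is an assumption drawn from the "before reset" wording above.

```javascript
const MAX_ROUNDS = 5; // default from the table above

// Hypothetical reducer: returns the next state after a Reviewer verdict.
function applyReview(state, verdict) {
  if (verdict === "APPROVED") {
    return { ...state, phase: "approved" };
  }
  if (state.round >= MAX_ROUNDS) {
    // Assumption: hitting the limit resets the workflow to conversation.
    return { phase: "conversation", round: 0 };
  }
  // NEEDS_WORK with rounds remaining: stay in researching, bump the counter.
  return { ...state, round: state.round + 1 };
}

// Five consecutive NEEDS_WORK verdicts exhaust the limit.
let state = { phase: "researching", round: 1 };
for (let i = 0; i < MAX_ROUNDS; i++) {
  state = applyReview(state, "NEEDS_WORK");
}
console.log(state);
```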

Testing

# Run workflow mechanics tests
npm run test:workflow

# Run mock scenario tests
npm run test:mock

# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice

Limitations

- Requires Claude Code with MCP support
- Sub-agents consume context window, so large or complex codebases may need a tightly focused CODEBASE_CONTEXT.md
- Not a replacement for human code review; it complements it

License

MIT

Contributing

Issues and PRs welcome at https://github.com/strikt/diligence
