diligence/README.md
Marc J. Schmidt bd178fcaf0 Initial release: MCP server enforcing Worker-Reviewer loop
Diligence prevents AI agents from shipping quick fixes that break things
by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- State persisted across sessions

Validated on production codebase: caught architectural mistake (broker
subscriptions on client-side code) that naive agent would have shipped.
2026-01-22 06:22:59 +01:00

@strikt/diligence

An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.

The Problem

AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add .clear() somewhere and call it done. But real codebases have:

  • Broker events that need subscribing
  • Caches in multiple locations
  • Patterns that must be followed
  • Client/server architecture boundaries
  • Edge cases that cause production fires

A naive agent doesn't know what it doesn't know. It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.

The Solution

Diligence forces a research-verify loop before any code is written:

┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns APPROVED or NEEDS_WORK with specific feedback │
└─────────────────────────────────────────────────────────────┘
                            │
                    ┌───────┴───────┐
                    ▼               ▼
              NEEDS_WORK        APPROVED
              (loop back)           │
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
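
The loop above can be sketched as a small driver. This is an illustration only, not the code in index.mjs: `runLoop`, `worker`, and `reviewer` are hypothetical names, and the two agents are passed in as plain async functions so the sketch runs without an MCP client. The 5-round cap matches the documented default.

```javascript
// Hypothetical sketch of the Worker-Reviewer loop; not the actual index.mjs code.
const MAX_ROUNDS = 5; // documented default

// worker(task, feedback)  -> proposal (string)
// reviewer(proposal)      -> { verdict: 'APPROVED' | 'NEEDS_WORK', feedback: string }
async function runLoop(task, worker, reviewer) {
  const history = []; // audit trail of every proposal and review
  let feedback = null; // no feedback exists before the first round

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const proposal = await worker(task, feedback);
    const review = await reviewer(proposal);
    history.push({ round, proposal, review });

    if (review.verdict === 'APPROVED') {
      return { approved: true, proposal, history };
    }
    feedback = review.feedback; // handed to the next Worker round
  }

  // Round limit reached without approval: nothing may be implemented.
  return { approved: false, proposal: null, history };
}
```

Passing the agents in as functions keeps the control flow testable with stubs; in a real session the main Claude session plays the role of this driver and spawns the sub-agents itself.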

Real-World Validation

Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."

| Round | Worker Proposed | Reviewer Found |
|-------|-----------------|----------------|
| 1 | Add broker event subscriptions to TeamService | Wrong - broker is server-side only, TeamService is client-side |
| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
| 3 | Simpler solution using ProjectService | Still gaps in implementation |
| 4 | Complete RPC stream solution | Approved |

A naive agent would have shipped Round 1 - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.

Installation

npm install @strikt/diligence

Or clone directly:

git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install

Setup

1. Add MCP Server

In your project's .mcp.json or .claude/settings.json:

{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}

2. Create Codebase Context

Create .claude/CODEBASE_CONTEXT.md in your project:

# Architecture

[Describe your system architecture]

## Key Patterns

[Document patterns agents MUST follow]

## Common Pitfalls

[List things that break if done wrong]

## Events/Hooks

[List broker events, hooks, subscriptions that exist]

This context is loaded into both Worker and Reviewer briefs.

3. Add to CLAUDE.md

Add the following to your project's CLAUDE.md:

## Code Changes

For any code changes, use the diligence workflow:

1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes

With this in place, Claude follows the diligence workflow on its own, without being told to in each prompt.

How It Works

Workflow Phases

conversation → researching → approved → implementing → conversation
                   ↑              │
                   └──────────────┘
                    (NEEDS_WORK, max 5 rounds)
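
One way to read the phase diagram is as a transition function keyed by tool call. The sketch below is an assumption about how such a guard could look, not the implementation in index.mjs: the state shape (`phase`, `round`) and the function name `transition` are made up for illustration, and it assumes that exhausting the round limit resets the workflow to conversation, as the MAX_ROUNDS description suggests.

```javascript
// Hypothetical phase guard; state shape and names are illustrative assumptions.
const MAX_ROUNDS = 5;
const IDLE = { phase: 'conversation', round: 0 };

function transition(state, tool, verdict) {
  const { phase, round } = state;
  switch (tool) {
    case 'start':
      if (phase !== 'conversation') throw new Error('workflow already active');
      return { phase: 'researching', round: 1 };
    case 'review':
      if (phase !== 'researching') throw new Error('nothing to review');
      if (verdict === 'APPROVED') return { phase: 'approved', round };
      // NEEDS_WORK: loop back, or reset once the round limit is exhausted
      return round >= MAX_ROUNDS
        ? { ...IDLE }
        : { phase: 'researching', round: round + 1 };
    case 'implement':
      if (phase !== 'approved') throw new Error('implement requires approval');
      return { phase: 'implementing', round };
    case 'complete':
      if (phase !== 'implementing') throw new Error('nothing to complete');
      return { ...IDLE };
    case 'abort':
      return { ...IDLE }; // allowed from any phase
    default:
      throw new Error(`unknown tool: ${tool}`);
  }
}
```

The guard on `implement` is the part that matters: a call made from any phase other than approved is rejected, which is what prevents code changes before a proposal has been verified.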

MCP Tools

| Tool | Description |
|------|-------------|
| start | Begin workflow with a task description |
| get_worker_brief | Get full context for Worker sub-agent |
| propose | Worker submits proposal with citations |
| get_reviewer_brief | Get full context for Reviewer sub-agent |
| review | Reviewer submits APPROVED or NEEDS_WORK |
| implement | Begin implementation (requires approval) |
| complete | Mark done, reset to conversation |
| status | Check current workflow state |
| abort | Cancel and reset |

Sub-Agent Pattern

The key insight: Worker and Reviewer are separate sub-agents with fresh context.

Main Claude Session
    │
    ├─► Worker Sub-Agent (Explore)
    │   - Receives: task + codebase context + previous feedback
    │   - Does: searches, reads, analyzes
    │   - Returns: proposal with file:line citations
    │
    └─► Reviewer Sub-Agent (Explore)
        - Receives: proposal + codebase context (NOT Worker's searches)
        - Does: independently verifies every claim
        - Returns: APPROVED or NEEDS_WORK with specifics

The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
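
The separation can be made concrete with two brief builders. This is a sketch under assumptions: the field names (`task`, `codebaseContext`, `feedback`, `proposal`, `searchLog`) and both function names are hypothetical, and the real briefs returned by get_worker_brief and get_reviewer_brief may be structured differently. It only illustrates that the Reviewer brief is built from the proposal and the shared context, and nothing else.

```javascript
// Hypothetical brief builders; field names are assumptions for illustration.
function buildWorkerBrief(state) {
  return {
    task: state.task,
    codebaseContext: state.codebaseContext,
    // Earlier NEEDS_WORK notes, empty on the first round
    previousFeedback: state.feedback ?? [],
  };
}

function buildReviewerBrief(state) {
  // Deliberately leaves out state.searchLog (the Worker's search results),
  // so the Reviewer has to redo the searches to verify each claim.
  return {
    codebaseContext: state.codebaseContext,
    proposal: state.proposal,
  };
}
```

Building each brief from an explicit allow-list of fields, rather than copying the whole state and deleting keys, means a field added to the state later does not leak into the Reviewer's context by accident.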

Project Structure

.claude/
├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
├── context/                 # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json    # Auto-generated workflow state
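
How these files might be combined per role can be sketched as a pure function over already-read file contents. Everything here is an assumption for illustration: `assembleContext`, the `files` map, and the ordering are hypothetical, files under context/ are left out for brevity, and index.mjs may load them differently. The sketch only reflects what the tree states, namely that CODEBASE_CONTEXT.md is required and the role-specific files are optional.

```javascript
// Hypothetical sketch: `files` maps a file name under .claude/ to its text.
function assembleContext(files, role) {
  const base = files['CODEBASE_CONTEXT.md'];
  if (base === undefined) {
    throw new Error('.claude/CODEBASE_CONTEXT.md is required');
  }
  // Role-specific guidance is optional and appended only when present.
  const roleFile = role === 'worker' ? 'WORKER_CONTEXT.md' : 'REVIEWER_CONTEXT.md';
  const parts = [base];
  if (files[roleFile] !== undefined) parts.push(files[roleFile]);
  return parts.join('\n\n');
}
```

For example, with only CODEBASE_CONTEXT.md and WORKER_CONTEXT.md present, the Worker gets both texts and the Reviewer gets the shared context alone.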

Example Session

User: Fix the permission cache bug from the todo list

Claude: [reads todo, understands task]
Claude: [calls diligence.start]

Claude: [spawns Worker with Task tool]
Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]

Claude: [spawns Reviewer with Task tool]
Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]

Claude: [spawns new Worker with feedback]
Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]

Claude: [spawns Reviewer]
Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]

Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]

Why This Works

  1. Enforcement - Can't write code without approval
  2. Fresh Context - Reviewer has no bias from Worker's reasoning
  3. Verification - Claims are checked against actual codebase
  4. Iteration - Multiple rounds refine understanding
  5. Architecture Awareness - Both agents loaded with project context
  6. Audit Trail - Full history of proposals and feedback

Configuration

Environment Variables

| Variable | Description |
|----------|-------------|
| None required | State stored in .claude/.diligence-state.json |

Constants (in index.mjs)

| Constant | Default | Description |
|----------|---------|-------------|
| MAX_ROUNDS | 5 | Maximum Worker-Reviewer iterations before reset |

Testing

# Run workflow mechanics tests
npm run test:workflow

# Run mock scenario tests
npm run test:mock

# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice

Limitations

  • Requires Claude Code with MCP support
  • Sub-agents consume context window, so large or complex codebases may need a focused CODEBASE_CONTEXT.md
  • Not a replacement for human code review - complements it

License

MIT

Contributing

Issues and PRs welcome at https://github.com/strikt/diligence