# @strikt/diligence

An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.

## The Problem

AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add `.clear()` somewhere and call it done. But real codebases have:

- Broker events that need subscribing
- Caches in multiple locations
- Patterns that must be followed
- Client/server architecture boundaries
- Edge cases that cause production fires

**A naive agent doesn't know what it doesn't know.** It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.

## The Solution

**Diligence** forces a research-verify loop before any code is written:

```
┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns NEEDS_WORK with specific feedback             │
└─────────────────────────────────────────────────────────────┘
                              │
                      ┌───────┴───────┐
                      ▼               ▼
                 NEEDS_WORK       APPROVED
                 (loop back)          │
                                      ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
```

## Real-World Validation

Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."

| Round | Worker Proposed | Reviewer Found |
|-------|-----------------|----------------|
| 1 | Add broker event subscriptions to TeamService | **Wrong** - broker is server-side only, TeamService is client-side |
| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
| 3 | Simpler solution using ProjectService | Still gaps in implementation |
| 4 | Complete RPC stream solution | **Approved** |

**A naive agent would have shipped Round 1** - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.

## Installation

```bash
npm install @strikt/diligence
```

Or clone directly:

```bash
git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install
```

## Setup

### 1. Add MCP Server

In your project's `.mcp.json` or `.claude/settings.json`:

```json
{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}
```

### 2. Create Codebase Context

Create `.claude/CODEBASE_CONTEXT.md` in your project:

```markdown
# Architecture

[Describe your system architecture]

## Key Patterns

[Document patterns agents MUST follow]

## Common Pitfalls

[List things that break if done wrong]

## Events/Hooks

[List broker events, hooks, subscriptions that exist]
```

This context is loaded into both Worker and Reviewer briefs.

### 3. Add to CLAUDE.md (Recommended)

Add to your project's `CLAUDE.md`:

```markdown
## Code Changes

For any code changes, use the diligence workflow:

1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes
```

This ensures Claude picks up diligence automatically without explicit instructions.
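
The setup steps above are easy to get subtly wrong, for instance by leaving the placeholder path in the MCP config. The following is a minimal sketch of a sanity check, not something shipped with the package: `checkDiligenceSetup` and `PLACEHOLDER_PATH` are hypothetical names, and the function takes the already-parsed config object plus a list of project-relative paths known to exist, so it makes no assumptions about how you read files.

```javascript
// Hypothetical helper, not part of @strikt/diligence.
// The placeholder path as it appears in the setup example.
const PLACEHOLDER_PATH = "/path/to/diligence/index.mjs";

// mcpConfig:     parsed contents of .mcp.json or .claude/settings.json
// existingFiles: project-relative paths that exist on disk
// Returns a list of human-readable problems (empty when setup looks fine).
function checkDiligenceSetup(mcpConfig, existingFiles) {
  const problems = [];
  const files = new Set(existingFiles);

  const server = mcpConfig?.mcpServers?.diligence;
  if (!server) {
    problems.push('missing "mcpServers.diligence" entry');
  } else if (!Array.isArray(server.args) || server.args.length === 0) {
    problems.push('"diligence" server has no "args" pointing at index.mjs');
  } else if (server.args.includes(PLACEHOLDER_PATH)) {
    problems.push("args still contain the placeholder path");
  }

  // CODEBASE_CONTEXT.md is the context file loaded into both briefs.
  if (!files.has(".claude/CODEBASE_CONTEXT.md")) {
    problems.push("missing .claude/CODEBASE_CONTEXT.md");
  }

  return problems;
}

// Example: config copied verbatim from the setup step, no context file yet.
const copiedConfig = {
  mcpServers: {
    diligence: { command: "node", args: ["/path/to/diligence/index.mjs"] },
  },
};
console.log(checkDiligenceSetup(copiedConfig, ["CLAUDE.md"]));
```

With the config copied verbatim and no context file, the sketch reports two problems: the unreplaced placeholder path and the missing `.claude/CODEBASE_CONTEXT.md`.
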

## How It Works

### Workflow Phases

```
conversation → researching → approved → implementing → conversation
                    ↑              │
                    └──────────────┘
               (NEEDS_WORK, max 5 rounds)
```

### MCP Tools

| Tool | Description |
|------|-------------|
| `start` | Begin workflow with a task description |
| `get_worker_brief` | Get full context for Worker sub-agent |
| `propose` | Worker submits proposal with citations |
| `get_reviewer_brief` | Get full context for Reviewer sub-agent |
| `review` | Reviewer submits APPROVED or NEEDS_WORK |
| `implement` | Begin implementation (requires approval) |
| `complete` | Mark done, reset to conversation |
| `status` | Check current workflow state |
| `abort` | Cancel and reset |

### Sub-Agent Pattern

The key insight: **Worker and Reviewer are separate sub-agents** with fresh context.

```
Main Claude Session
    │
    ├─► Worker Sub-Agent (Explore)
    │     - Receives: task + codebase context + previous feedback
    │     - Does: searches, reads, analyzes
    │     - Returns: proposal with file:line citations
    │
    └─► Reviewer Sub-Agent (Explore)
          - Receives: proposal + codebase context (NOT Worker's searches)
          - Does: independently verifies every claim
          - Returns: APPROVED or NEEDS_WORK with specifics
```

The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
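
The phases and tool gating described in this section can be pictured as a small state machine. The sketch below is illustrative only and is not the code in `index.mjs`: `createWorkflow`, `step`, and `requirePhase` are hypothetical names, the state shape is invented for the example, and it assumes that a NEEDS_WORK verdict in the final round resets the workflow to `conversation`.

```javascript
// Illustrative sketch only; not the actual index.mjs implementation.
const MAX_ROUNDS = 5; // documented default for Worker-Reviewer iterations

function createWorkflow() {
  return { phase: "conversation", round: 0, task: null, proposal: null };
}

// Throw when a tool is called outside the phase that allows it.
function requirePhase(state, tool, expected) {
  if (state.phase !== expected) {
    throw new Error(`${tool} is not allowed in phase "${state.phase}"`);
  }
}

// Apply one tool call and return the next state (the input is not mutated).
function step(state, tool, payload = {}) {
  switch (tool) {
    case "start":
      requirePhase(state, tool, "conversation");
      return { ...state, phase: "researching", round: 1, task: payload.task };
    case "propose":
      requirePhase(state, tool, "researching");
      return { ...state, proposal: payload.proposal };
    case "review":
      requirePhase(state, tool, "researching");
      if (state.proposal === null) throw new Error("review requires a proposal");
      if (payload.verdict === "APPROVED") return { ...state, phase: "approved" };
      if (payload.verdict !== "NEEDS_WORK") {
        throw new Error(`unknown verdict: ${payload.verdict}`);
      }
      // Assumption: running out of rounds resets to conversation.
      if (state.round >= MAX_ROUNDS) return createWorkflow();
      return { ...state, round: state.round + 1, proposal: null };
    case "implement":
      requirePhase(state, tool, "approved"); // code changes need approval
      return { ...state, phase: "implementing" };
    case "complete":
      requirePhase(state, tool, "implementing");
      return createWorkflow();
    case "abort":
      return createWorkflow(); // allowed from any phase
    case "status":
      return state;
    default:
      throw new Error(`unknown tool: ${tool}`);
  }
}

// Example: one rejected round, then approval.
let wf = createWorkflow();
wf = step(wf, "start", { task: "Fix permission cache" });
wf = step(wf, "propose", { proposal: "Subscribe to broker events" });
wf = step(wf, "review", { verdict: "NEEDS_WORK" });
wf = step(wf, "propose", { proposal: "Use an RPC stream" });
wf = step(wf, "review", { verdict: "APPROVED" });
console.log(wf.phase, wf.round); // prints "approved 2"
```

The point of modelling it this way is that `implement` is unreachable from any phase other than `approved`, which is the enforcement the workflow relies on.
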

## Project Structure

```
.claude/
├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
├── context/                 # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json    # Auto-generated workflow state
```

## Example Session

```
User: Fix the permission cache bug from the todo list

Claude: [reads todo, understands task]
Claude: [calls diligence.start]
Claude: [spawns Worker with Task tool]

Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]

Claude: [spawns Reviewer with Task tool]

Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]

Claude: [spawns new Worker with feedback]

Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]

Claude: [spawns Reviewer]

Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]

Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]
```

## Why This Works

1. **Enforcement** - Can't write code without approval
2. **Fresh Context** - Reviewer has no bias from Worker's reasoning
3. **Verification** - Claims are checked against actual codebase
4. **Iteration** - Multiple rounds refine understanding
5. **Architecture Awareness** - Both agents loaded with project context
6. **Audit Trail** - Full history of proposals and feedback

## Configuration

### Environment Variables

| Variable | Description |
|----------|-------------|
| None required | State stored in `.claude/.diligence-state.json` |

### Constants (in index.mjs)

| Constant | Default | Description |
|----------|---------|-------------|
| `MAX_ROUNDS` | 5 | Maximum Worker-Reviewer iterations before reset |

## Testing

```bash
# Run workflow mechanics tests
npm run test:workflow

# Run mock scenario tests
npm run test:mock

# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice
```

## Limitations

- Requires Claude Code with MCP support
- Sub-agents use context, so complex codebases may need focused CODEBASE_CONTEXT.md
- Not a replacement for human code review - complements it

## License

MIT

## Contributing

Issues and PRs welcome at https://github.com/strikt/diligence