Initial release: MCP server enforcing Worker-Reviewer loop

Diligence prevents AI agents from shipping quick fixes that break things by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching the codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- Persists state across sessions

Validated on a production codebase: caught an architectural mistake (broker subscriptions in client-side code) that a naive agent would have shipped.
2026-01-22 06:22:59 +01:00
commit bd178fcaf0
23 changed files with 4001 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,274 @@
+# @strikt/diligence
+
+An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.
+
+## The Problem
+
+AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add `.clear()` somewhere and call it done. But real codebases have:
+
+- Broker events that need subscribing
+- Caches in multiple locations
+- Patterns that must be followed
+- Client/server architecture boundaries
+- Edge cases that cause production fires
+
+**A naive agent doesn't know what it doesn't know.** It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.
+
+## The Solution
+
+**Diligence** forces a research-propose-verify loop before any code is written:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  1. WORKER (sub-agent)                                      │
+│     - Searches codebase thoroughly                          │
+│     - Traces data flow from origin to all consumers         │
+│     - Finds existing patterns for similar features          │
+│     - Proposes fix with file:line citations                 │
+└─────────────────────────────────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────────────────┐
+│  2. REVIEWER (separate sub-agent)                           │
+│     - Gets fresh context (no Worker bias)                   │
+│     - Verifies every claim by searching codebase            │
+│     - Checks against architecture patterns                  │
+│     - Returns APPROVED or NEEDS_WORK (with specifics)       │
+└─────────────────────────────────────────────────────────────┘
+                            │
+                    ┌───────┴───────┐
+                    ▼               ▼
+              NEEDS_WORK        APPROVED
+              (loop back)           │
+                                    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  3. IMPLEMENTATION                                          │
+│     - Only now can code be changed                          │
+│     - Follows the verified proposal                         │
+└─────────────────────────────────────────────────────────────┘
+```
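+
+A Reviewer's most basic check is whether a `file:line` citation actually points at what the Worker claims. The sketch below illustrates only that idea. `parseCitation` and `verifyCitation` are hypothetical names, not part of diligence. Files are passed in as an in-memory `Map` so the sketch runs without a real checkout. In practice, the Reviewer sub-agent does this with its own search and read tools.
+
+```javascript
+// Hypothetical sketch, not diligence's API: check a Worker claim that a symbol
+// appears at a cited file:line. Files come from an in-memory Map (path -> source).
+
+// Parse "team.service.ts:45" into { file, line }; returns null when malformed.
+function parseCitation(citation) {
+  const match = /^(.+):(\d+)$/.exec(citation.trim());
+  return match ? { file: match[1], line: Number(match[2]) } : null;
+}
+
+// Confirm the file exists, the line is in range, and the line mentions the symbol.
+function verifyCitation(files, citation, symbol) {
+  const cited = parseCitation(citation);
+  if (!cited) return { ok: false, reason: `malformed citation: ${citation}` };
+  const source = files.get(cited.file);
+  if (source === undefined) return { ok: false, reason: `no such file: ${cited.file}` };
+  const lines = source.split('\n');
+  if (cited.line < 1 || cited.line > lines.length) {
+    return { ok: false, reason: `line ${cited.line} is outside ${cited.file} (${lines.length} lines)` };
+  }
+  const text = lines[cited.line - 1];
+  return text.includes(symbol)
+    ? { ok: true, text: text.trim() }
+    : { ok: false, reason: `${citation} does not mention ${symbol}` };
+}
+
+// Usage against a tiny in-memory "codebase".
+const files = new Map([
+  ['team.service.ts', 'export class TeamService {\n  private memoizedPermissions = new Map();\n}'],
+]);
+console.log(verifyCitation(files, 'team.service.ts:2', 'memoizedPermissions'));
+console.log(verifyCitation(files, 'team.service.ts:9', 'memoizedPermissions'));
+```
+
+A citation that parses and points at a real line is still only a starting point. The Reviewer also has to judge whether the cited code supports the Worker's conclusion.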
+
+## Real-World Validation
+
+Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."
+
+| Round | Worker Proposed | Reviewer Found |
+|-------|-----------------|----------------|
+| 1 | Add broker event subscriptions to TeamService | **Wrong** - broker is server-side only, TeamService is client-side |
+| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
+| 3 | Simpler solution using ProjectService | Implementation still had gaps |
+| 4 | Complete RPC stream solution | **Approved** |
+
+**A naive agent would have shipped Round 1** - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.
+
+## Installation
+
+```bash
+npm install @strikt/diligence
+```
+
+Or clone directly:
+
+```bash
+git clone https://github.com/strikt/diligence ~/tools/diligence
+cd ~/tools/diligence && npm install
+```
+
+## Setup
+
+### 1. Add MCP Server
+
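+In your project's `.mcp.json` or `.claude/settings.json`, pointing `args` at `index.mjs` in your clone (or under `node_modules/@strikt/diligence/` if you installed from npm):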
+
+```json
+{
+  "mcpServers": {
+    "diligence": {
+      "command": "node",
+      "args": ["/path/to/diligence/index.mjs"]
+    }
+  }
+}
+```
+
+### 2. Create Codebase Context
+
+Create `.claude/CODEBASE_CONTEXT.md` in your project:
+
+```markdown
+# Architecture
+
+[Describe your system architecture]
+
+## Key Patterns
+
+[Document patterns agents MUST follow]
+
+## Common Pitfalls
+
+[List things that break if done wrong]
+
+## Events/Hooks
+
+[List broker events, hooks, subscriptions that exist]
+```
+
+This context is loaded into both Worker and Reviewer briefs.
+
+### 3. Add to CLAUDE.md (Recommended)
+
+Add to your project's `CLAUDE.md`:
+
+```markdown
+## Code Changes
+
+For any code changes, use the diligence workflow:
+
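+1. Call `mcp__diligence__start` with the task
+2. Spawn a Worker sub-agent with the brief from `mcp__diligence__get_worker_brief`; it researches and submits via `mcp__diligence__propose`
+3. Spawn a Reviewer sub-agent with the brief from `mcp__diligence__get_reviewer_brief`; it verifies and submits via `mcp__diligence__review`
+4. Repeat steps 2-3 until the review is APPROVED
+5. Call `mcp__diligence__implement`, make the changes, then call `mcp__diligence__complete`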
+```
+
+With this in place, Claude should follow the diligence workflow for code changes without being reminded each session.
+
+## How It Works
+
+### Workflow Phases
+
+```
+conversation → researching → approved → implementing → conversation
+                   ↑              │
+                   └──────────────┘
+                    (NEEDS_WORK, max 5 rounds)
+```
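+
+The phase diagram can be read as a small state machine driven by the MCP tool calls. Below is a minimal sketch. The tool names come from the table that follows, and `MAX_ROUNDS` uses the documented default. The state shape, the `apply` function, and the choice to reset after the final NEEDS_WORK are assumptions for illustration, not diligence's source.
+
+```javascript
+// Hypothetical sketch of the workflow phases; tool names follow the MCP Tools table,
+// but the state shape and error messages are illustrative, not diligence's code.
+const MAX_ROUNDS = 5; // documented default
+
+const freshState = () => ({ phase: 'conversation', task: null, round: 0, awaitingReview: false, history: [] });
+
+// Apply one tool call, returning the next state or throwing if the call is out of order.
+function apply(state, tool, payload = {}) {
+  const expectPhase = (phase) => {
+    if (state.phase !== phase) throw new Error(`${tool} not allowed in phase ${state.phase}`);
+  };
+  switch (tool) {
+    case 'start':
+      expectPhase('conversation');
+      return { ...freshState(), phase: 'researching', task: payload.task, round: 1 };
+    case 'propose':
+      expectPhase('researching');
+      return {
+        ...state,
+        awaitingReview: true,
+        history: [...state.history, { round: state.round, proposal: payload.proposal }],
+      };
+    case 'review': {
+      expectPhase('researching');
+      if (!state.awaitingReview) throw new Error('review needs a proposal for this round');
+      if (payload.verdict !== 'APPROVED' && payload.verdict !== 'NEEDS_WORK') {
+        throw new Error(`unknown verdict: ${payload.verdict}`);
+      }
+      const history = [...state.history, { round: state.round, verdict: payload.verdict }];
+      if (payload.verdict === 'APPROVED') return { ...state, phase: 'approved', awaitingReview: false, history };
+      // NEEDS_WORK: loop back for another round, or reset once MAX_ROUNDS is used up.
+      if (state.round >= MAX_ROUNDS) return freshState();
+      return { ...state, round: state.round + 1, awaitingReview: false, history };
+    }
+    case 'implement':
+      expectPhase('approved'); // no approval, no implementation
+      return { ...state, phase: 'implementing' };
+    case 'complete':
+      expectPhase('implementing');
+      return freshState();
+    case 'abort':
+      return freshState();
+    case 'status':
+      return state;
+    default:
+      throw new Error(`unknown tool: ${tool}`);
+  }
+}
+
+// Usage: one rejected round sends the workflow back to research.
+let s = freshState();
+s = apply(s, 'start', { task: 'Permission cache does not invalidate on role changes' });
+s = apply(s, 'propose', { proposal: 'Subscribe TeamService to broker events' });
+s = apply(s, 'review', { verdict: 'NEEDS_WORK' });
+console.log(s.phase, s.round); // researching 2
+```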
+
+### MCP Tools
+
+| Tool | Description |
+|------|-------------|
+| `start` | Begin workflow with a task description |
+| `get_worker_brief` | Get full context for Worker sub-agent |
+| `propose` | Worker submits proposal with citations |
+| `get_reviewer_brief` | Get full context for Reviewer sub-agent |
+| `review` | Reviewer submits APPROVED or NEEDS_WORK |
+| `implement` | Begin implementation (requires approval) |
+| `complete` | Mark done, reset to conversation |
+| `status` | Check current workflow state |
+| `abort` | Cancel and reset |
+
+### Sub-Agent Pattern
+
+The key insight: **Worker and Reviewer are separate sub-agents** with fresh context.
+
+```
+Main Claude Session
+    │
+    ├─► Worker Sub-Agent (Explore)
+    │   - Receives: task + codebase context + previous feedback
+    │   - Does: searches, reads, analyzes
+    │   - Returns: proposal with file:line citations
+    │
+    └─► Reviewer Sub-Agent (Explore)
+        - Receives: proposal + codebase context (NOT Worker's searches)
+        - Does: independently verifies every claim
+        - Returns: APPROVED or NEEDS_WORK with specifics
+```
+
+The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
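+
+The isolation rule shows up in how the two briefs are assembled. Below is a sketch under stated assumptions. `workerBrief`, `reviewerBrief`, the `searchLog` field, and the section headings are hypothetical. They are not what `get_worker_brief` and `get_reviewer_brief` actually return.
+
+```javascript
+// Hypothetical sketch: build Worker and Reviewer briefs so that only the proposal,
+// never the Worker's search transcript, reaches the Reviewer. Names are invented.
+
+function workerBrief({ task, codebaseContext, feedback = [] }) {
+  const parts = [`# Task\n${task}`, `# Codebase Context\n${codebaseContext}`];
+  if (feedback.length > 0) {
+    // Earlier NEEDS_WORK feedback is the only history a new Worker receives.
+    parts.push(`# Reviewer Feedback\n${feedback.map((f, i) => `${i + 1}. ${f}`).join('\n')}`);
+  }
+  parts.push('Research the codebase and propose a fix. Cite every claim as file:line.');
+  return parts.join('\n\n');
+}
+
+function reviewerBrief({ task, codebaseContext }, round) {
+  // Destructure only the proposal; round.searchLog is deliberately left behind.
+  const { proposal } = round;
+  return [
+    `# Task\n${task}`,
+    `# Codebase Context\n${codebaseContext}`,
+    `# Proposal To Verify\n${proposal}`,
+    'Verify every claim by searching the codebase yourself. Reply APPROVED or NEEDS_WORK with specifics.',
+  ].join('\n\n');
+}
+
+// Usage with illustrative data.
+const shared = {
+  task: 'Permission cache does not invalidate on role changes',
+  codebaseContext: 'Broker events are server-side only.',
+};
+const round = {
+  proposal: 'Invalidate memoizedPermissions (team.service.ts:45) on BusTeamRoleChange.',
+  searchLog: 'grep -rn memoizedPermissions src/',
+};
+console.log(reviewerBrief(shared, round).includes(round.searchLog)); // false
+```
+
+Because the Reviewer never receives the Worker's search results, an unsupported claim cannot be confirmed simply by re-reading the Worker's own evidence.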
+
+## Project Structure
+
+```
+.claude/
+├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
+├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
+├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
+├── context/                 # Optional - additional context files
+│   ├── voice.md
+│   └── permissions.md
+└── .diligence-state.json    # Auto-generated workflow state
+```
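+
+Below is a Node sketch of how the files above might be gathered for a brief. Some of its behavior is assumed rather than documented. It treats `CODEBASE_CONTEXT.md` as required and `WORKER_CONTEXT.md`/`REVIEWER_CONTEXT.md` as optional, and it loads every `.md` file under `context/` in name order. `loadProjectContext` is a hypothetical name.
+
+```javascript
+import { existsSync, mkdirSync, mkdtempSync, readFileSync, readdirSync, writeFileSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+
+// Hypothetical loader for the .claude/ layout above; not diligence's actual implementation.
+function loadProjectContext(projectRoot) {
+  const dir = join(projectRoot, '.claude');
+  const readOptional = (name) => {
+    const file = join(dir, name);
+    return existsSync(file) ? readFileSync(file, 'utf8') : null;
+  };
+
+  const codebase = readOptional('CODEBASE_CONTEXT.md');
+  if (codebase === null) throw new Error(`required file missing: ${join(dir, 'CODEBASE_CONTEXT.md')}`);
+
+  // Extra Markdown files under .claude/context/, sorted so briefs are stable between runs.
+  const extraDir = join(dir, 'context');
+  const extras = existsSync(extraDir)
+    ? readdirSync(extraDir)
+        .filter((name) => name.endsWith('.md'))
+        .sort()
+        .map((name) => ({ name, text: readFileSync(join(extraDir, name), 'utf8') }))
+    : [];
+
+  return {
+    codebase,
+    worker: readOptional('WORKER_CONTEXT.md'),
+    reviewer: readOptional('REVIEWER_CONTEXT.md'),
+    extras,
+  };
+}
+
+// Usage: build a throwaway project in a temp directory and load it.
+const root = mkdtempSync(join(tmpdir(), 'diligence-ctx-'));
+mkdirSync(join(root, '.claude', 'context'), { recursive: true });
+writeFileSync(join(root, '.claude', 'CODEBASE_CONTEXT.md'), '# Architecture\nThe broker is server-side only.\n');
+writeFileSync(join(root, '.claude', 'context', 'voice.md'), 'Voice notes\n');
+writeFileSync(join(root, '.claude', 'context', 'permissions.md'), 'Roles are cached per team.\n');
+const ctx = loadProjectContext(root);
+console.log(ctx.worker, ctx.extras.map((e) => e.name)); // null [ 'permissions.md', 'voice.md' ]
+```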
+
+## Example Session
+
+```
+User: Fix the permission cache bug from the todo list
+
+Claude: [reads todo, understands task]
+Claude: [calls diligence.start]
+
+Claude: [spawns Worker with Task tool]
+Worker: [searches for permission cache, finds memoizedPermissions]
+Worker: [searches for role events, finds BusTeamRoleChange]
+Worker: [searches for patterns, finds how chat.service handles this]
+Worker: [calls diligence.propose with full analysis]
+
+Claude: [spawns Reviewer with Task tool]
+Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
+Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
+Reviewer: [calls diligence.review with NEEDS_WORK]
+
+Claude: [spawns new Worker with feedback]
+Worker: [addresses feedback, adds missing event]
+Worker: [calls diligence.propose]
+
+Claude: [spawns Reviewer]
+Reviewer: [all claims verified]
+Reviewer: [calls diligence.review with APPROVED]
+
+Claude: [calls diligence.implement]
+Claude: [makes code changes following approved proposal]
+Claude: [calls diligence.complete]
+```
+
+## Why This Works
+
+1. **Enforcement** - `implement` is refused until a Reviewer has approved the proposal
+2. **Fresh Context** - Reviewer has no bias from Worker's reasoning
+3. **Verification** - Claims are checked against actual codebase
+4. **Iteration** - Multiple rounds refine understanding
+5. **Architecture Awareness** - Both agents loaded with project context
+6. **Audit Trail** - Full history of proposals and feedback
+
+## Configuration
+
+### Environment Variables
+
+No environment variables are required. Workflow state is stored in `.claude/.diligence-state.json`.
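+
+Because state lives in a plain JSON file, it survives across sessions. Below is a minimal persistence sketch built on three assumptions: a state shape like the one shown (the real file's fields are not documented here), the hypothetical names `loadState` and `saveState`, and a write-then-rename step so an interrupted save cannot leave a truncated file.
+
+```javascript
+import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { dirname, join } from 'node:path';
+
+// Hypothetical state shape; the real file's fields may differ.
+const freshState = () => ({ phase: 'conversation', task: null, round: 0, history: [] });
+
+const statePath = (projectRoot) => join(projectRoot, '.claude', '.diligence-state.json');
+
+// Load saved state, falling back to a fresh workflow if the file is absent or unreadable.
+function loadState(projectRoot) {
+  const file = statePath(projectRoot);
+  if (!existsSync(file)) return freshState();
+  try {
+    return { ...freshState(), ...JSON.parse(readFileSync(file, 'utf8')) };
+  } catch {
+    return freshState(); // corrupt JSON: start over rather than wedge the workflow
+  }
+}
+
+// Write via a temp file and rename, so an interrupted write never leaves truncated JSON behind.
+function saveState(projectRoot, state) {
+  const file = statePath(projectRoot);
+  mkdirSync(dirname(file), { recursive: true });
+  const tmp = `${file}.tmp`;
+  writeFileSync(tmp, JSON.stringify(state, null, 2));
+  renameSync(tmp, file);
+}
+
+// Usage: state written in one "session" is visible to the next.
+const root = mkdtempSync(join(tmpdir(), 'diligence-state-'));
+saveState(root, { ...loadState(root), phase: 'researching', task: 'Fix permission cache', round: 2 });
+console.log(loadState(root).phase, loadState(root).round); // researching 2
+```
+
+Silently resetting on a corrupt file keeps the workflow usable. The trade-off is that the proposal history in that file is lost.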
+
+### Constants (in index.mjs)
+
+| Constant | Default | Description |
+|----------|---------|-------------|
+| `MAX_ROUNDS` | 5 | Maximum Worker-Reviewer iterations before reset |
+
+## Testing
+
+```bash
+# Run workflow mechanics tests
+npm run test:workflow
+
+# Run mock scenario tests
+npm run test:mock
+
+# Dry-run against a real project
+node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice
+```
+
+## Limitations
+
+- Requires Claude Code with MCP support
+- Each sub-agent consumes context-window space, so large codebases may need a focused `CODEBASE_CONTEXT.md`
+- Not a replacement for human code review - complements it
+
+## License
+
+MIT
+
+## Contributing
+
+Issues and PRs welcome at https://github.com/strikt/diligence