Initial release: MCP server enforcing Worker-Reviewer loop

Diligence prevents AI agents from shipping quick fixes that break things
by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- State persisted across sessions

Validated on a production codebase: caught an architectural mistake
(broker subscriptions in client-side code) that a naive agent would
have shipped.
2026-01-22 06:22:59 +01:00
commit bd178fcaf0
23 changed files with 4001 additions and 0 deletions

README.md

@@ -0,0 +1,274 @@
# @strikt/diligence
An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.
## The Problem
AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add `.clear()` somewhere and call it done. But real codebases have:
- Broker events that need subscribing
- Caches in multiple locations
- Patterns that must be followed
- Client/server architecture boundaries
- Edge cases that cause production fires
**A naive agent doesn't know what it doesn't know.** It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.
## The Solution
**Diligence** forces a research-verify loop before any code is written:
```
┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns NEEDS_WORK with specific feedback             │
└─────────────────────────────────────────────────────────────┘
                       ┌───────┴───────┐
                       ▼               ▼
                  NEEDS_WORK       APPROVED
                  (loop back)          │
                                       ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
```
## Real-World Validation
Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."
| Round | Worker Proposed | Reviewer Found |
|-------|-----------------|----------------|
| 1 | Add broker event subscriptions to TeamService | **Wrong** - broker is server-side only, TeamService is client-side |
| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
| 3 | Simpler solution using ProjectService | Still gaps in implementation |
| 4 | Complete RPC stream solution | **Approved** |
**A naive agent would have shipped Round 1** - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing, and working out why would have cost hours of debugging.
## Installation
```bash
npm install @strikt/diligence
```
Or clone directly:
```bash
git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install
```
## Setup
### 1. Add MCP Server
In your project's `.mcp.json` or `.claude/settings.json`:
```json
{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}
```
### 2. Create Codebase Context
Create `.claude/CODEBASE_CONTEXT.md` in your project:
```markdown
# Architecture
[Describe your system architecture]
## Key Patterns
[Document patterns agents MUST follow]
## Common Pitfalls
[List things that break if done wrong]
## Events/Hooks
[List broker events, hooks, subscriptions that exist]
```
This context is loaded into both Worker and Reviewer briefs.
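As a rough illustration of what "loaded into both briefs" could look like, the sketch below assembles a brief from the task, the shared codebase context, and optional role-specific guidance. The function name `buildBrief`, its parameters, and the section headings are assumptions made for this example; they are not taken from `index.mjs`.

```javascript
// Hypothetical sketch: buildBrief and its field names are illustrative only.
function buildBrief({ role, task, codebaseContext, roleContext = "", feedback = [] }) {
  const sections = [
    `# ${role} Brief`,
    `## Task\n${task}`,
    // The same codebase context goes to both Worker and Reviewer.
    `## Codebase Context\n${codebaseContext}`,
  ];
  if (roleContext) {
    sections.push(`## ${role} Guidance\n${roleContext}`);
  }
  if (feedback.length > 0) {
    // Number the feedback items so a new Worker can address them one by one.
    const items = feedback.map((item, i) => `${i + 1}. ${item}`).join("\n");
    sections.push(`## Previous Feedback\n${items}`);
  }
  return sections.join("\n\n");
}

const codebaseContext = "# Architecture\nBroker is server-side only.";
const workerBrief = buildBrief({
  role: "Worker",
  task: "Fix permission cache invalidation",
  codebaseContext,
  feedback: ["Broker is not reachable from client-side code"],
});
const reviewerBrief = buildBrief({
  role: "Reviewer",
  task: "Fix permission cache invalidation",
  codebaseContext,
});
```

Both briefs carry the identical context text, so a rule such as "broker is server-side only" is visible to the agent proposing a fix and to the agent checking it.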
### 3. Add to CLAUDE.md (Recommended)
Add to your project's `CLAUDE.md`:
```markdown
## Code Changes
For any code changes, use the diligence workflow:
1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes
```
With this in place, Claude should pick up the diligence workflow on its own, without being told to use it for each task.
## How It Works
### Workflow Phases
```
conversation → researching → approved → implementing → conversation
                    ↑              │
                    └──────────────┘
               (NEEDS_WORK, max 5 rounds)
```
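The phase diagram can be read as a small state machine. The sketch below is one possible encoding of it, assuming that a `NEEDS_WORK` verdict in the final round resets the workflow to `conversation` and that `abort` is allowed from any phase. The `transition` function and the state shape are hypothetical and are not the code in `index.mjs`.

```javascript
// Hypothetical sketch of the phase transitions; names and state shape are assumptions.
const MAX_ROUNDS = 5;

function freshState() {
  return { phase: "conversation", round: 0 };
}

function transition(state, action) {
  const { phase, round } = state;
  if (action === "abort") return freshState();
  if (phase === "conversation" && action === "start") {
    return { phase: "researching", round: 1 };
  }
  if (phase === "researching" && action === "APPROVED") {
    return { phase: "approved", round };
  }
  if (phase === "researching" && action === "NEEDS_WORK") {
    // Another round if any remain; otherwise reset (assumed behavior).
    return round < MAX_ROUNDS ? { phase: "researching", round: round + 1 } : freshState();
  }
  if (phase === "approved" && action === "implement") {
    return { phase: "implementing", round };
  }
  if (phase === "implementing" && action === "complete") return freshState();
  // Anything else, such as "implement" before approval, is rejected.
  throw new Error(`"${action}" is not allowed in phase "${phase}"`);
}

let state = transition(freshState(), "start");
state = transition(state, "NEEDS_WORK"); // still researching, now round 2
state = transition(state, "APPROVED");
state = transition(state, "implement");
```

The final `throw` is what carries the enforcement: there is no path to `implementing` that skips `approved`.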
### MCP Tools
| Tool | Description |
|------|-------------|
| `start` | Begin workflow with a task description |
| `get_worker_brief` | Get full context for Worker sub-agent |
| `propose` | Worker submits proposal with citations |
| `get_reviewer_brief` | Get full context for Reviewer sub-agent |
| `review` | Reviewer submits APPROVED or NEEDS_WORK |
| `implement` | Begin implementation (requires approval) |
| `complete` | Mark done, reset to conversation |
| `status` | Check current workflow state |
| `abort` | Cancel and reset |
### Sub-Agent Pattern
The key insight: **Worker and Reviewer are separate sub-agents** with fresh context.
```
Main Claude Session
├─► Worker Sub-Agent (Explore)
│     - Receives: task + codebase context + previous feedback
│     - Does: searches, reads, analyzes
│     - Returns: proposal with file:line citations
└─► Reviewer Sub-Agent (Explore)
      - Receives: proposal + codebase context (NOT Worker's searches)
      - Does: independently verifies every claim
      - Returns: APPROVED or NEEDS_WORK with specifics
```
The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
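The orchestration described above might be sketched as the loop below. `runWorker` and `runReviewer` are hypothetical callbacks standing in for however the main session spawns sub-agents; they are not part of the diligence tool set, and the shape of their return values is an assumption.

```javascript
// Hypothetical orchestration sketch. runWorker and runReviewer stand in for
// spawning sub-agents; they are not part of the diligence API.
async function researchLoop({ task, context, runWorker, runReviewer, maxRounds = 5 }) {
  const history = [];
  let feedback = null;
  for (let round = 1; round <= maxRounds; round++) {
    // The Worker sees the task, the shared context, and the previous feedback.
    // Only the proposal is kept; any search results it returns are dropped here.
    const { proposal } = await runWorker({ task, context, feedback });
    // The Reviewer gets the proposal and context, never the Worker's searches.
    const review = await runReviewer({ task, context, proposal });
    history.push({ round, proposal, verdict: review.verdict, feedback: review.feedback });
    if (review.verdict === "APPROVED") {
      return { approved: true, proposal, history };
    }
    feedback = review.feedback;
  }
  // Ran out of rounds without an approval.
  return { approved: false, proposal: null, history };
}
```

Passing only `proposal` to the Reviewer is the design choice that matters: the Reviewer has nothing to lean on except the codebase itself, so each citation has to be checked again from scratch.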
## Project Structure
```
.claude/
├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
├── context/                 # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json    # Auto-generated workflow state
```
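The required/optional split in this layout could be handled along the lines below. This is a sketch under assumptions: `loadContext` is a hypothetical name, failing hard on a missing `CODEBASE_CONTEXT.md` is a guess at what "Required" means in practice, and file reading is injected as `readFileOrNull` so the example carries no I/O of its own.

```javascript
// Hypothetical sketch: readFileOrNull is injected so the example has no I/O.
// In practice it might wrap fs.readFileSync and return null for a missing file.
function loadContext(readFileOrNull, role) {
  const codebase = readFileOrNull(".claude/CODEBASE_CONTEXT.md");
  if (codebase === null) {
    // Assumed behavior: the required file must exist.
    throw new Error("Missing required .claude/CODEBASE_CONTEXT.md");
  }
  const roleFile =
    role === "worker" ? ".claude/WORKER_CONTEXT.md" : ".claude/REVIEWER_CONTEXT.md";
  // Optional files fall back to an empty string when absent.
  return { codebase, roleContext: readFileOrNull(roleFile) ?? "" };
}

// Usage with an in-memory stand-in for the file system.
const files = {
  ".claude/CODEBASE_CONTEXT.md": "# Architecture\n...",
  ".claude/REVIEWER_CONTEXT.md": "Verify every file:line citation.",
};
const readFileOrNull = (path) => (path in files ? files[path] : null);

const workerContext = loadContext(readFileOrNull, "worker");
const reviewerContext = loadContext(readFileOrNull, "reviewer");
```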
## Example Session
```
User: Fix the permission cache bug from the todo list
Claude: [reads todo, understands task]
Claude: [calls diligence.start]
Claude: [spawns Worker with Task tool]
Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]
Claude: [spawns Reviewer with Task tool]
Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]
Claude: [spawns new Worker with feedback]
Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]
Claude: [spawns Reviewer]
Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]
Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]
```
## Why This Works
1. **Enforcement** - Can't write code without approval
2. **Fresh Context** - Reviewer has no bias from Worker's reasoning
3. **Verification** - Claims are checked against actual codebase
4. **Iteration** - Multiple rounds refine understanding
5. **Architecture Awareness** - Both agents loaded with project context
6. **Audit Trail** - Full history of proposals and feedback
## Configuration
### Environment Variables
None required. Workflow state is stored in `.claude/.diligence-state.json`.
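Because the workflow state lives in a JSON file, it can survive across sessions. A minimal sketch of reading and writing such a state follows; the field names (`phase`, `round`, `history`) and the fallback to a fresh state on unreadable input are assumptions, and the real file format may differ.

```javascript
// Hypothetical state shape; the real .diligence-state.json may differ.
function serializeState(state) {
  return JSON.stringify(state, null, 2);
}

function parseState(text) {
  const fallback = { phase: "conversation", round: 0, history: [] };
  if (!text) return fallback;
  try {
    // Fill in any fields missing from an older or partial file.
    return { ...fallback, ...JSON.parse(text) };
  } catch {
    // Corrupt file: start over rather than crash (assumed behavior).
    return fallback;
  }
}

const saved = serializeState({ phase: "researching", round: 2, history: [] });
const restored = parseState(saved);
```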
### Constants (in index.mjs)
| Constant | Default | Description |
|----------|---------|-------------|
| `MAX_ROUNDS` | 5 | Maximum Worker-Reviewer iterations before reset |
## Testing
```bash
# Run workflow mechanics tests
npm run test:workflow
# Run mock scenario tests
npm run test:mock
# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice
```
## Limitations
- Requires Claude Code with MCP support
- Sub-agents consume context window, so large codebases may need a focused `CODEBASE_CONTEXT.md`
- Not a replacement for human code review - complements it
## License
MIT
## Contributing
Issues and PRs welcome at https://github.com/strikt/diligence