Initial release: MCP server enforcing Worker-Reviewer loop
Diligence prevents AI agents from shipping quick fixes that break things by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- State persisted across sessions

Validated on production codebase: caught architectural mistake (broker subscriptions on client-side code) that naive agent would have shipped.
274
README.md
Normal file
@@ -0,0 +1,274 @@
# @strikt/diligence

An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.

## The Problem

AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add `.clear()` somewhere and call it done. But real codebases have:

- Broker events that need subscribing
- Caches in multiple locations
- Patterns that must be followed
- Client/server architecture boundaries
- Edge cases that cause production fires

**A naive agent doesn't know what it doesn't know.** It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.

## The Solution

**Diligence** forces a research-verify loop before any code is written:

```
┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns NEEDS_WORK with specific feedback             │
└─────────────────────────────────────────────────────────────┘
                               │
                       ┌───────┴───────┐
                       ▼               ▼
                  NEEDS_WORK       APPROVED
                  (loop back)          │
                                       ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
```

## Real-World Validation

Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."

| Round | Worker Proposed | Reviewer Found |
|-------|-----------------|----------------|
| 1 | Add broker event subscriptions to TeamService | **Wrong** - broker is server-side only, TeamService is client-side |
| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
| 3 | Simpler solution using ProjectService | Still gaps in implementation |
| 4 | Complete RPC stream solution | **Approved** |

**A naive agent would have shipped Round 1** - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.

## Installation

```bash
npm install @strikt/diligence
```

Or clone directly:

```bash
git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install
```

## Setup

### 1. Add MCP Server

In your project's `.mcp.json` or `.claude/settings.json`:

```json
{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}
```

### 2. Create Codebase Context

Create `.claude/CODEBASE_CONTEXT.md` in your project:

```markdown
# Architecture

[Describe your system architecture]

## Key Patterns

[Document patterns agents MUST follow]

## Common Pitfalls

[List things that break if done wrong]

## Events/Hooks

[List broker events, hooks, subscriptions that exist]
```

This context is loaded into both Worker and Reviewer briefs.

### 3. Add to CLAUDE.md (Recommended)

Add to your project's `CLAUDE.md`:

```markdown
## Code Changes

For any code changes, use the diligence workflow:

1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes
```

This ensures Claude picks up diligence automatically without explicit instructions.

## How It Works

### Workflow Phases

```
conversation → researching → approved → implementing → conversation
                    ↑              │
                    └──────────────┘
               (NEEDS_WORK, max 5 rounds)
```

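The phase diagram can be read as a small state machine. The sketch below is illustrative only and is not the code in `index.mjs`: `createWorkflow`, `transition`, and the event shapes are hypothetical names, and it assumes that a `NEEDS_WORK` verdict on the final round resets the workflow to `conversation`.

```javascript
// Hypothetical sketch of the phase transitions; not the real server code.
const MAX_ROUNDS = 5;

// Fresh state: nothing in progress.
function createWorkflow() {
  return { phase: "conversation", round: 0, task: null };
}

// Returns the next state, or throws if the event is not allowed in this phase.
function transition(state, event) {
  if (event.type === "abort") return createWorkflow();
  switch (`${state.phase}:${event.type}`) {
    case "conversation:start":
      return { phase: "researching", round: 1, task: event.task };
    case "researching:APPROVED":
      return { ...state, phase: "approved" };
    case "researching:NEEDS_WORK":
      // Another round, or a reset once the round budget is spent (assumption).
      return state.round >= MAX_ROUNDS
        ? createWorkflow()
        : { ...state, round: state.round + 1 };
    case "approved:implement":
      return { ...state, phase: "implementing" };
    case "implementing:complete":
      return createWorkflow();
    default:
      throw new Error(`Cannot handle "${event.type}" while ${state.phase}`);
  }
}
```

Because `implement` is only accepted from the `approved` phase, a call made earlier throws instead of silently proceeding, which is the enforcement property the workflow relies on.
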
### MCP Tools

| Tool | Description |
|------|-------------|
| `start` | Begin workflow with a task description |
| `get_worker_brief` | Get full context for Worker sub-agent |
| `propose` | Worker submits proposal with citations |
| `get_reviewer_brief` | Get full context for Reviewer sub-agent |
| `review` | Reviewer submits APPROVED or NEEDS_WORK |
| `implement` | Begin implementation (requires approval) |
| `complete` | Mark done, reset to conversation |
| `status` | Check current workflow state |
| `abort` | Cancel and reset |

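Read top to bottom, the tools in the table form one loop. A minimal sketch of a driver follows, assuming a caller-supplied `callTool(name, args)` function plus `runWorker` and `runReviewer` stand-ins for spawning sub-agents. These names and every argument shape are hypothetical; only the tool names come from the table. For brevity the driver submits `propose` and `review` itself, whereas in a real session the sub-agents make those calls.

```javascript
// Hypothetical driver: callTool, runWorker and runReviewer are supplied by the
// caller. Argument shapes are assumptions, not the server's real schema.
async function runDiligence(task, { callTool, runWorker, runReviewer, maxRounds = 5 }) {
  await callTool("start", { task });

  for (let round = 1; round <= maxRounds; round++) {
    // Worker: research and propose.
    const workerBrief = await callTool("get_worker_brief", {});
    const proposal = await runWorker(workerBrief);
    await callTool("propose", { proposal });

    // Reviewer: verify independently and return a verdict.
    const reviewerBrief = await callTool("get_reviewer_brief", {});
    const review = await runReviewer(reviewerBrief);
    await callTool("review", review);

    if (review.verdict === "APPROVED") {
      // Code changes are only unlocked at this point.
      await callTool("implement", {});
      return { approved: true, rounds: round };
    }
  }
  return { approved: false, rounds: maxRounds };
}
```

After the code changes are made, the caller would finish with `complete`; `status` and `abort` are available at any point in between.
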
### Sub-Agent Pattern

The key insight: **Worker and Reviewer are separate sub-agents** with fresh context.

```
Main Claude Session
    │
    ├─► Worker Sub-Agent (Explore)
    │     - Receives: task + codebase context + previous feedback
    │     - Does: searches, reads, analyzes
    │     - Returns: proposal with file:line citations
    │
    └─► Reviewer Sub-Agent (Explore)
          - Receives: proposal + codebase context (NOT Worker's searches)
          - Does: independently verifies every claim
          - Returns: APPROVED or NEEDS_WORK with specifics
```

The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.

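One way to picture the separation is as two brief builders with different inputs. This is a sketch under assumptions: `buildWorkerBrief`, `buildReviewerBrief`, and the field names (`summary`, `citations`, `searchLog`) are hypothetical and do not describe the server's actual schema.

```javascript
// Hypothetical brief builders; field names are illustrative only.
function buildWorkerBrief({ task, codebaseContext, feedback = [] }) {
  // The Worker sees the task, the project context, and any earlier feedback.
  return { role: "worker", task, codebaseContext, previousFeedback: feedback };
}

function buildReviewerBrief({ codebaseContext, proposal }) {
  // Only the proposal text and its citations cross over. The Worker's
  // search log is deliberately dropped so every claim must be re-checked.
  const { summary, citations } = proposal;
  return { role: "reviewer", codebaseContext, proposal: { summary, citations } };
}
```

Copying an explicit list of fields, rather than passing the Worker's output through whole, keeps any extra data the Worker attaches out of the Reviewer's context by default.
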
## Project Structure

```
.claude/
├── CODEBASE_CONTEXT.md       # Required - architecture & patterns
├── WORKER_CONTEXT.md         # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md       # Optional - project-specific reviewer guidance
├── context/                  # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json     # Auto-generated workflow state
```

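To show how the required and optional files might combine, here is a rough sketch. It assumes the files have already been read into a map of path (relative to `.claude/`) to contents, that a missing `CODEBASE_CONTEXT.md` is an error, and that every file under `context/` is appended for both roles. `assembleContext` is a hypothetical helper, and the real loading rules in `index.mjs` may differ.

```javascript
// Hypothetical helper: `files` maps a path relative to .claude/ to its text.
function assembleContext(files, role) {
  const base = files["CODEBASE_CONTEXT.md"];
  if (base === undefined) {
    throw new Error("Missing required .claude/CODEBASE_CONTEXT.md");
  }

  // Role-specific guidance is optional.
  const roleFile = role === "worker" ? "WORKER_CONTEXT.md" : "REVIEWER_CONTEXT.md";

  // Extra files under context/ are appended in a stable (sorted) order.
  const extras = Object.keys(files)
    .filter((name) => name.startsWith("context/"))
    .sort();

  return [base, files[roleFile], ...extras.map((name) => files[name])]
    .filter((text) => typeof text === "string" && text.trim() !== "")
    .join("\n\n");
}
```

Sorting the extra files keeps the assembled context identical between rounds, so Worker and Reviewer briefs do not change merely because of directory listing order.
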
## Example Session

```
User: Fix the permission cache bug from the todo list

Claude: [reads todo, understands task]
Claude: [calls diligence.start]

Claude: [spawns Worker with Task tool]
Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]

Claude: [spawns Reviewer with Task tool]
Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]

Claude: [spawns new Worker with feedback]
Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]

Claude: [spawns Reviewer]
Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]

Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]
```

## Why This Works

1. **Enforcement** - Code can't be written without approval
2. **Fresh Context** - Reviewer has no bias from Worker's reasoning
3. **Verification** - Claims are checked against the actual codebase
4. **Iteration** - Multiple rounds refine understanding
5. **Architecture Awareness** - Both agents are loaded with project context
6. **Audit Trail** - Full history of proposals and feedback

## Configuration

### Environment Variables

None required. Workflow state is stored in `.claude/.diligence-state.json`.

### Constants (in index.mjs)

| Constant | Default | Description |
|----------|---------|-------------|
| `MAX_ROUNDS` | 5 | Maximum Worker-Reviewer iterations before reset |

## Testing

```bash
# Run workflow mechanics tests
npm run test:workflow

# Run mock scenario tests
npm run test:mock

# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice
```

## Limitations

- Requires Claude Code with MCP support
- Sub-agents consume context window, so large or complex codebases may need a focused `CODEBASE_CONTEXT.md`
- Complements human code review; it does not replace it

## License

MIT

## Contributing

Issues and PRs welcome at https://github.com/strikt/diligence