# @strikt/diligence

An MCP server that enforces a Worker-Reviewer loop before code changes, preventing quick fixes that break things.

## The Problem

AI coding agents are too eager. Given a bug like "permission cache doesn't invalidate," they'll add `.clear()` somewhere and call it done. But real codebases have:

- Broker events that need subscribing
- Caches in multiple locations
- Patterns that must be followed
- Client/server architecture boundaries
- Edge cases that cause production fires

**A naive agent doesn't know what it doesn't know.** It proposes a fix, you approve it, and three days later you're debugging why permissions are broken in a completely different feature.

## The Solution

**Diligence** forces a research-verify loop before any code is written:

```
┌─────────────────────────────────────────────────────────────┐
│  1. WORKER (sub-agent)                                      │
│     - Searches codebase thoroughly                          │
│     - Traces data flow from origin to all consumers         │
│     - Finds existing patterns for similar features          │
│     - Proposes fix with file:line citations                 │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  2. REVIEWER (separate sub-agent)                           │
│     - Gets fresh context (no Worker bias)                   │
│     - Verifies every claim by searching codebase            │
│     - Checks against architecture patterns                  │
│     - Returns APPROVED or NEEDS_WORK with specific feedback │
└─────────────────────────────────────────────────────────────┘
                            │
                    ┌───────┴───────┐
                    ▼               ▼
              NEEDS_WORK        APPROVED
              (loop back)           │
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│  3. IMPLEMENTATION                                          │
│     - Only now can code be changed                          │
│     - Follows the verified proposal                         │
└─────────────────────────────────────────────────────────────┘
```
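The loop in the diagram can be sketched in a few lines of JavaScript. This is an illustrative sketch only, not the code in `index.mjs`: `runDiligenceLoop`, the `worker` and `reviewer` callbacks, and the `verdict`/`feedback` field names are all hypothetical stand-ins for the sub-agents and their results.

```javascript
// Hypothetical orchestration sketch. `worker` and `reviewer` stand in for
// sub-agents; they are not functions exported by this package.
async function runDiligenceLoop(task, { worker, reviewer, maxRounds = 5 }) {
  const history = [];
  let feedback = null;

  for (let round = 1; round <= maxRounds; round++) {
    // The Worker sees the task plus any feedback from the previous round.
    const proposal = await worker({ task, feedback });

    // The Reviewer sees only the proposal, never the Worker's searches.
    const review = await reviewer({ proposal });
    history.push({ round, proposal, review });

    if (review.verdict === "APPROVED") {
      return { approved: true, proposal, history };
    }
    feedback = review.feedback;
  }

  // No approval within the round limit, so nothing may be implemented.
  return { approved: false, proposal: null, history };
}
```

The point of the shape is that implementation is only reachable through the `approved: true` return, and every rejected round feeds its feedback into the next Worker run.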

## Real-World Validation

Tested on a production codebase with a P0 bug: "Permission cache doesn't invalidate on role changes."

| Round | Worker Proposed | Reviewer Found |
|-------|-----------------|----------------|
| 1 | Add broker event subscriptions to TeamService | **Wrong** - broker is server-side only, TeamService is client-side |
| 2 | Revised RPC-based approach | Missing error handling, incomplete pattern match |
| 3 | Simpler solution using ProjectService | Still gaps in implementation |
| 4 | Complete RPC stream solution | **Approved** |

**A naive agent would have shipped Round 1** - adding broker subscriptions to client-side code that can't access the broker. That fix would have done nothing and wasted hours of debugging.

## Installation

```bash
npm install @strikt/diligence
```

Or clone directly:

```bash
git clone https://github.com/strikt/diligence ~/tools/diligence
cd ~/tools/diligence && npm install
```

## Setup

### 1. Add MCP Server

In your project's `.mcp.json` or `.claude/settings.json`:

```json
{
  "mcpServers": {
    "diligence": {
      "command": "node",
      "args": ["/path/to/diligence/index.mjs"]
    }
  }
}
```

### 2. Create Codebase Context

Create `.claude/CODEBASE_CONTEXT.md` in your project:

```markdown
# Architecture

[Describe your system architecture]

## Key Patterns

[Document patterns agents MUST follow]

## Common Pitfalls

[List things that break if done wrong]

## Events/Hooks

[List broker events, hooks, subscriptions that exist]
```

This context is loaded into both Worker and Reviewer briefs.

### 3. Add to CLAUDE.md (Recommended)

Add to your project's `CLAUDE.md`:

```markdown
## Code Changes

For any code changes, use the diligence workflow:

1. Call `mcp__diligence__start` with the task
2. Spawn a Worker sub-agent with `get_worker_brief` to research and propose
3. Spawn a Reviewer sub-agent with `get_reviewer_brief` to verify
4. Loop until APPROVED
5. Call `implement` and make changes
```

With this in place, Claude follows the diligence workflow by default, without being told to on each task.

## How It Works

### Workflow Phases

```
conversation → researching → approved → implementing → conversation
                   ↑              │
                   └──────────────┘
                    (NEEDS_WORK, max 5 rounds)
```
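One way to read the phase diagram is as a small state machine. The sketch below is an assumption about how such transitions could be modeled, not a copy of the server's logic: the event names, the state shape, and the choice to reset to `conversation` when the round limit is hit are all hypothetical.

```javascript
// Hypothetical sketch of the phase transitions; not the code in index.mjs.
const MAX_ROUNDS = 5; // default documented in this README

function createWorkflow() {
  return { phase: "conversation", round: 0, task: null };
}

// Returns the next state for an event, or throws on an illegal move.
function transition(state, event) {
  switch (`${state.phase}:${event.type}`) {
    case "conversation:start":
      return { phase: "researching", round: 1, task: event.task };
    case "researching:approved":
      return { ...state, phase: "approved" };
    case "researching:needs_work":
      // Assumption: exhausting the round limit resets the workflow.
      if (state.round >= MAX_ROUNDS) return createWorkflow();
      return { ...state, round: state.round + 1 };
    case "approved:implement":
      return { ...state, phase: "implementing" };
    case "implementing:complete":
      return createWorkflow();
    default:
      // Abort is allowed from any phase.
      if (event.type === "abort") return createWorkflow();
      throw new Error(
        `Cannot handle "${event.type}" while in phase "${state.phase}"`
      );
  }
}
```

Because `implement` is only accepted from the `approved` phase, an attempt to skip review fails loudly instead of silently proceeding.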

### MCP Tools

| Tool | Description |
|------|-------------|
| `start` | Begin workflow with a task description |
| `get_worker_brief` | Get full context for Worker sub-agent |
| `propose` | Worker submits proposal with citations |
| `get_reviewer_brief` | Get full context for Reviewer sub-agent |
| `review` | Reviewer submits APPROVED or NEEDS_WORK |
| `implement` | Begin implementation (requires approval) |
| `complete` | Mark done, reset to conversation |
| `status` | Check current workflow state |
| `abort` | Cancel and reset |

### Sub-Agent Pattern

The key insight: **Worker and Reviewer are separate sub-agents** with fresh context.

```
Main Claude Session
    │
    ├─► Worker Sub-Agent (Explore)
    │   - Receives: task + codebase context + previous feedback
    │   - Does: searches, reads, analyzes
    │   - Returns: proposal with file:line citations
    │
    └─► Reviewer Sub-Agent (Explore)
        - Receives: proposal + codebase context (NOT Worker's searches)
        - Does: independently verifies every claim
        - Returns: APPROVED or NEEDS_WORK with specifics
```

The Reviewer doesn't see the Worker's search results. It must verify claims by actually searching the codebase. This prevents rubber-stamping.
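The asymmetry between the two briefs can be illustrated with a pair of builder functions. This is a minimal sketch under assumptions: `buildWorkerBrief`, `buildReviewerBrief`, and their parameter names are hypothetical and do not describe the actual output of `get_worker_brief` or `get_reviewer_brief`.

```javascript
// Hypothetical brief builders; names and layout are illustrative only.
function buildWorkerBrief({ task, codebaseContext, previousFeedback = [] }) {
  const sections = [
    `# Task\n${task}`,
    `# Codebase context\n${codebaseContext}`,
  ];
  // Feedback from earlier rounds is included only when there is some.
  if (previousFeedback.length > 0) {
    const items = previousFeedback.map((f) => `- ${f}`).join("\n");
    sections.push(`# Reviewer feedback to address\n${items}`);
  }
  return sections.join("\n\n");
}

function buildReviewerBrief({ codebaseContext, proposal }) {
  // Deliberately ignores any search log: the Reviewer must re-derive
  // the evidence for each claim by searching the codebase itself.
  return [
    `# Codebase context\n${codebaseContext}`,
    `# Proposal to verify\n${proposal}`,
  ].join("\n\n");
}
```

Keeping the Worker's searches out of the Reviewer's input is what forces independent verification rather than a re-read of the Worker's own evidence.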

## Project Structure

```
.claude/
├── CODEBASE_CONTEXT.md      # Required - architecture & patterns
├── WORKER_CONTEXT.md        # Optional - project-specific worker guidance
├── REVIEWER_CONTEXT.md      # Optional - project-specific reviewer guidance
├── context/                 # Optional - additional context files
│   ├── voice.md
│   └── permissions.md
└── .diligence-state.json    # Auto-generated workflow state
```
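As a rough illustration of how these files might combine into a single context block, the sketch below takes file contents that the caller has already read from disk. The function name, the ordering, and the assumption that every file under `context/` goes to both roles are hypothetical and may not match what the server does.

```javascript
// Hypothetical sketch: `files` maps paths relative to .claude/ to contents
// that the caller has already read from disk.
function assembleContext(files, role) {
  const required = "CODEBASE_CONTEXT.md";
  if (!(required in files)) {
    throw new Error(`Missing required .claude/${required}`);
  }

  const roleFile =
    role === "worker" ? "WORKER_CONTEXT.md" : "REVIEWER_CONTEXT.md";

  // Assumption: all files under context/ are appended in sorted order.
  const extras = Object.keys(files)
    .filter((path) => path.startsWith("context/"))
    .sort();

  // Optional files are skipped when absent.
  return [required, roleFile, ...extras]
    .filter((path) => path in files)
    .map((path) => `<!-- ${path} -->\n${files[path]}`)
    .join("\n\n");
}
```

For example, `assembleContext(files, "reviewer")` would include `REVIEWER_CONTEXT.md` when present and leave out `WORKER_CONTEXT.md`.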

## Example Session

```
User: Fix the permission cache bug from the todo list

Claude: [reads todo, understands task]
Claude: [calls diligence.start]

Claude: [spawns Worker with Task tool]
Worker: [searches for permission cache, finds memoizedPermissions]
Worker: [searches for role events, finds BusTeamRoleChange]
Worker: [searches for patterns, finds how chat.service handles this]
Worker: [calls diligence.propose with full analysis]

Claude: [spawns Reviewer with Task tool]
Reviewer: [verifies claim about memoizedPermissions - confirmed at team.service.ts:45]
Reviewer: [verifies claim about events - finds Worker missed BusTeamMemberRoleChange]
Reviewer: [calls diligence.review with NEEDS_WORK]

Claude: [spawns new Worker with feedback]
Worker: [addresses feedback, adds missing event]
Worker: [calls diligence.propose]

Claude: [spawns Reviewer]
Reviewer: [all claims verified]
Reviewer: [calls diligence.review with APPROVED]

Claude: [calls diligence.implement]
Claude: [makes code changes following approved proposal]
Claude: [calls diligence.complete]
```

## Why This Works

1. **Enforcement** - Can't write code without approval
2. **Fresh Context** - Reviewer has no bias from Worker's reasoning
3. **Verification** - Claims are checked against actual codebase
4. **Iteration** - Multiple rounds refine understanding
5. **Architecture Awareness** - Both agents loaded with project context
6. **Audit Trail** - Full history of proposals and feedback

## Configuration

### Environment Variables

None required. Workflow state is stored in `.claude/.diligence-state.json`.

### Constants (in index.mjs)

| Constant | Default | Description |
|----------|---------|-------------|
| `MAX_ROUNDS` | 5 | Maximum Worker-Reviewer iterations before reset |

## Testing

```bash
# Run workflow mechanics tests
npm run test:workflow

# Run mock scenario tests
npm run test:mock

# Dry-run against a real project
node test/dry-run.mjs --project=/path/to/project --scenario=blocking-voice
```

## Limitations

- Requires Claude Code with MCP support
- Each sub-agent consumes context window, so large or complex codebases may need a focused `CODEBASE_CONTEXT.md`
- Not a replacement for human code review - complements it

## License

MIT

## Contributing

Issues and PRs welcome at https://github.com/strikt/diligence