Initial release: MCP server enforcing Worker-Reviewer loop
Diligence prevents AI agents from shipping quick fixes that break things by enforcing a research-propose-verify loop before any code changes.

Key features:
- Worker sub-agent researches and proposes with file:line citations
- Reviewer sub-agent independently verifies claims by searching codebase
- Iterates until approved (max 5 rounds)
- Loads project-specific context from .claude/CODEBASE_CONTEXT.md
- State persisted across sessions

Validated on production codebase: caught architectural mistake (broker subscriptions on client-side code) that naive agent would have shipped.
243
TESTING.md
Normal file
@@ -0,0 +1,243 @@
# Testing the Diligence MCP Server

## Test Suite Overview

The diligence project includes a comprehensive test suite that validates:

1. **Workflow mechanics** - State transitions, round limits, phase enforcement
2. **Mock scenarios** - Predefined Worker-Reviewer interactions with expected outcomes
3. **Dry-run mode** - Test against real projects without making changes

## Quick Start

```bash
# Run all workflow tests
node test/run-tests.mjs --workflow

# Run mock scenario tests
node test/run-tests.mjs --mock

# Run a specific scenario
node test/run-tests.mjs --mock --scenario=blocking-voice

# Dry run against nexus (no changes)
node test/dry-run.mjs --project=~/bude/codecharm/nexus --scenario=blocking-voice
```

## Test Structure

```
test/
├── mcp-client.mjs          # Programmatic MCP client
├── run-tests.mjs           # Main test runner
├── dry-run.mjs             # Dry-run against real projects
├── fixture/                # Mock codebase for testing
│   ├── src/
│   │   ├── broker/events.ts
│   │   ├── services/
│   │   └── controllers/
│   └── .claude/
│       └── CODEBASE_CONTEXT.md
└── scenarios/              # Test scenarios
    ├── index.json
    ├── blocking-voice.json
    └── permission-cache.json
```

## Test Modes

### 1. Workflow Tests (`--workflow`)

Tests the MCP server mechanics without AI:

- Phase transitions (conversation → researching → approved → implementing)
- Round increment on NEEDS_WORK
- Max rounds enforcement (resets after 5 rounds)
- Feedback accumulation
- Abort functionality

```bash
node test/run-tests.mjs --workflow
```
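
The round rules listed above can be pictured as a small state machine. The sketch below is a hypothetical model, not the server's code: the names `createState`, `start`, and `review` are invented for illustration, and it assumes that "resets" means returning to the conversation phase with the round counter and feedback cleared.

```javascript
// Hypothetical model of the round rules above; not the server's actual code.
const MAX_ROUNDS = 5;

function createState() {
  return { phase: "conversation", round: 0, feedback: [] };
}

// Starting a task moves the workflow into the researching phase, round 1.
function start(state) {
  return { ...state, phase: "researching", round: 1, feedback: [] };
}

// Apply a Reviewer decision to the current state and return the next state.
function review(state, decision, feedback = "") {
  if (state.phase !== "researching") {
    throw new Error(`review is not allowed in phase "${state.phase}"`);
  }
  if (decision === "APPROVED") {
    return { ...state, phase: "approved" };
  }
  if (decision !== "NEEDS_WORK") {
    throw new Error(`unknown decision "${decision}"`);
  }
  if (state.round >= MAX_ROUNDS) {
    // Assumption: hitting the limit clears everything and returns to conversation.
    return createState();
  }
  // NEEDS_WORK below the limit: bump the round and keep the feedback.
  return {
    ...state,
    round: state.round + 1,
    feedback: [...state.feedback, feedback],
  };
}
```

A workflow test in this style would drive the state through several NEEDS_WORK decisions and check the round counter, the accumulated feedback, and the reset at the limit.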

### 2. Mock Tests (`--mock`)

Tests complete Worker-Reviewer scenarios with predefined responses:

1. Scenario defines a task and expected naive/correct fixes
2. Test simulates Worker submitting naive proposal
3. Reviewer catches issues, sends NEEDS_WORK
4. Worker submits revised proposal with all fixes
5. Reviewer approves
6. Validates that proposal mentions all required elements

```bash
# All scenarios
node test/run-tests.mjs --mock

# Single scenario
node test/run-tests.mjs --mock --scenario=permission-cache
```
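
Step 6 can be sketched as a keyword check against a scenario's `validation_criteria`. This is an illustration only: `validateProposal` is a hypothetical helper, and case-insensitive substring matching is an assumption about how "mentions" might be decided, not a description of what `run-tests.mjs` does.

```javascript
// Hypothetical check of a proposal against a scenario's validation_criteria.
// Assumption: "mentions" means a case-insensitive substring match.
function validateProposal(proposalText, criteria) {
  const text = proposalText.toLowerCase();

  // Collect every required keyword that the proposal never mentions.
  const missing = (criteria.must_mention ?? []).filter(
    (keyword) => !text.includes(keyword.toLowerCase())
  );

  // The reference-pattern check is optional; it passes when no pattern is given.
  const pattern = criteria.should_reference_pattern;
  const referencesPattern = pattern ? text.includes(pattern.toLowerCase()) : true;

  return { passed: missing.length === 0 && referencesPattern, missing, referencesPattern };
}
```

With illustrative criteria such as `{ must_mention: ["answerDmCall", "declineDmCall"] }`, a proposal that only touches `answerDmCall` would come back with `declineDmCall` in `missing`, which is the kind of gap the Reviewer is expected to catch.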

### 3. Dry Run (`--project`)

Connects to MCP server in a real project directory:

```bash
# With predefined scenario
node test/dry-run.mjs --project=~/bude/codecharm/nexus --scenario=blocking-voice

# With custom task
node test/dry-run.mjs --project=/path/to/project --task="Fix the caching bug"
```

This:
- Starts the workflow with the task
- Shows the full Worker Brief (including real CODEBASE_CONTEXT.md)
- Does NOT make any code changes
- Aborts the workflow on exit

## Test Fixture

The fixture (`test/fixture/`) is a mini codebase that mirrors real-world patterns:

### Files

| File | Purpose | Known Bugs (for testing) |
|------|---------|-------------------------|
| `broker/events.ts` | Event bus definitions | Reference implementation |
| `services/user-block.service.ts` | Blocking logic | Missing voice cleanup |
| `services/voice-channel.service.ts` | Voice/DM calls | Missing blocking check on answerDmCall |
| `services/team.service.ts` | Permission cache | Doesn't subscribe to role events |
| `services/chat.service.ts` | **Correct pattern** | Shows permission vs action separation |
| `controllers/roles.controller.ts` | Role CRUD | Missing broker events on create/delete |

### Patterns Tested

1. **Broker event emission** - Every state change should emit events
2. **Cache invalidation** - Caches must subscribe to relevant events
3. **Permission vs Action** - Permissions control visibility, actions have separate checks
4. **Multi-location fixes** - If one place has a check, similar places need it too
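
Patterns 1 and 2 work together: a cache stays correct only if the code that changes state emits an event and the cache subscribes to it. The sketch below illustrates that pairing with an in-memory stand-in. `Broker` and `PermissionCache` are hypothetical classes, and the `{ teamId }` payload shape is an assumption; only the event name `BusTeamRoleChange` comes from the fixture's scenarios.

```javascript
// Minimal in-memory event bus, a stand-in for the fixture's broker.
class Broker {
  constructor() {
    this.handlers = new Map();
  }
  subscribe(event, handler) {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }
  emit(event, payload) {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}

// Hypothetical cache that invalidates itself when a role change is announced.
class PermissionCache {
  constructor(broker, loadPermissions) {
    this.cache = new Map();
    this.loadPermissions = loadPermissions;
    // Assumption: the event payload carries the affected team's id.
    broker.subscribe("BusTeamRoleChange", ({ teamId }) => this.cache.delete(teamId));
  }
  get(teamId) {
    if (!this.cache.has(teamId)) {
      this.cache.set(teamId, this.loadPermissions(teamId));
    }
    return this.cache.get(teamId);
  }
}
```

If the subscription is left out, the second `get` after a role change keeps returning the stale entry, which is the bug seeded in `services/team.service.ts`. If the emitting side is left out instead, the subscription never fires, which is the bug seeded in `controllers/roles.controller.ts`.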

## Test Scenarios

### blocking-voice

**Task:** Fix blocked users can still answer DM voice calls

**Naive fix:** Add blocking check to answerDmCall only

**Correct fix:**
- Add blocking check to answerDmCall
- Add blocking check to declineDmCall
- Filter notifications for blocked users
- Add voice cleanup to blockUser()
- Subscribe to BusUserBlockChange for mid-call kicks

### permission-cache

**Task:** Fix permission cache doesn't invalidate when roles change

**Naive fix:** Add .clear() somewhere

**Correct fix:**
- Subscribe to BusTeamRoleChange in team.service
- Subscribe to BusTeamMemberRoleChange in team.service
- Add broker event to createRole()
- Add broker event to deleteRole()

## Adding New Scenarios

Create a new JSON file in `test/scenarios/`:

```json
{
  "id": "my-scenario",
  "name": "Human-readable name",
  "description": "Brief description",

  "task": "The task description given to the Worker",

  "naive_fix": {
    "description": "What a quick-fix agent would do",
    "changes": [
      { "file": "path/to/file.ts", "change": "Quick fix description" }
    ],
    "issues": [
      "What the naive fix misses #1",
      "What the naive fix misses #2"
    ]
  },

  "correct_fix": {
    "description": "Complete fix description",
    "required_changes": [
      { "file": "path.ts", "function": "funcName", "change": "What to change" }
    ],
    "required_broker_subscriptions": [
      { "service": "x.service.ts", "event": "BusEventName", "action": "What to do" }
    ]
  },

  "validation_criteria": {
    "must_mention": ["keyword1", "keyword2"],
    "should_reference_pattern": "reference-file.ts"
  }
}
```

Add to `test/scenarios/index.json`:

```json
{
  "scenarios": [
    { "id": "my-scenario", "file": "my-scenario.json" }
  ]
}
```
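
Before running a new scenario, it can help to check that the file has the fields from the template and that it is registered in the index. The helper below is a hypothetical sketch: `checkScenario` and `REQUIRED_FIELDS` are invented names, and whether the real runner requires every one of these fields is an assumption based only on the template above.

```javascript
// Top-level fields taken from the scenario template above.
// Assumption: all of them are required; the real runner may be more lenient.
const REQUIRED_FIELDS = [
  "id",
  "name",
  "description",
  "task",
  "naive_fix",
  "correct_fix",
  "validation_criteria",
];

// Returns a list of human-readable problems; an empty list means the
// scenario looks complete and is listed in the index.
function checkScenario(scenario, index) {
  const problems = [];

  for (const field of REQUIRED_FIELDS) {
    if (!(field in scenario)) problems.push(`missing field: ${field}`);
  }

  const mustMention = scenario.validation_criteria?.must_mention;
  if (!Array.isArray(mustMention) || mustMention.length === 0) {
    problems.push("validation_criteria.must_mention must be a non-empty array");
  }

  const registered = (index.scenarios ?? []).some((entry) => entry.id === scenario.id);
  if (!registered) problems.push(`"${scenario.id}" is not listed in index.json`);

  return problems;
}
```

In practice the two arguments would be the parsed contents of the new scenario file and of `test/scenarios/index.json`.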

## Real-World Testing

For true validation, test with real Claude sub-agents:

```
Root Agent:
  1. Call mcp__diligence__start with task
  2. Spawn Worker agent (Task tool) with get_worker_brief
  3. Worker researches, submits proposal via mcp__diligence__propose
  4. Spawn Reviewer agent (Task tool) with get_reviewer_brief
  5. Reviewer verifies claims, submits via mcp__diligence__review
  6. If NEEDS_WORK, spawn new Worker with updated brief
  7. If APPROVED, proceed to implementation
```

**Why separate agents matter:**
- Fresh context = no bias from previous reasoning
- Reviewer doesn't know Worker's search results
- Forces genuine verification, not rubber-stamping
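
The root-agent loop above can be sketched with plain functions standing in for the sub-agents. Everything here is hypothetical: `runLoop`, `worker`, and `reviewer` are illustrative names, the stand-ins are synchronous for brevity, and no MCP tools are called. The point is the information flow, where each call receives only a brief and the Reviewer never sees how the Worker searched.

```javascript
// Hypothetical orchestration loop; worker and reviewer are plain functions
// standing in for freshly spawned sub-agents.
function runLoop(task, worker, reviewer, maxRounds = 5) {
  const feedback = [];

  for (let round = 1; round <= maxRounds; round++) {
    // A new Worker sees the task plus accumulated Reviewer feedback, nothing else.
    const proposal = worker({ task, round, feedback: [...feedback] });

    // The Reviewer sees only the task and the proposal, not the Worker's
    // search results, so it has to verify the claims on its own.
    const verdict = reviewer({ task, proposal });

    if (verdict.decision === "APPROVED") {
      return { approved: true, rounds: round, proposal };
    }
    feedback.push(verdict.feedback);
  }

  // No approval within the round limit.
  return { approved: false, rounds: maxRounds, proposal: null };
}
```

Passing a copy of `feedback` into each Worker call mirrors the "updated brief" in step 6: the new Worker learns what was rejected without inheriting the previous Worker's reasoning.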

## Success Criteria

1. **Reviewer catches issues** that Worker initially misses
2. **Multiple rounds** occur before approval
3. **Final proposal** is more complete than naive approach
4. **Validation criteria** are all met

## Environment Variables

| Variable | Purpose |
|----------|---------|
| `DEBUG=1` | Show MCP server stderr output |
| `ANTHROPIC_API_KEY` | Required for `--live` mode (future) |

## CI Integration

```bash
# Run in CI
npm test

# Or directly
node test/run-tests.mjs --workflow && node test/run-tests.mjs --mock
```

Exit codes: 0 = pass, 1 = fail
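
As a rough illustration of that convention, a runner can collapse its per-test results into a single exit code. `exitCodeFor` and the `{ name, passed }` result shape are hypothetical, not taken from `run-tests.mjs`.

```javascript
// Hypothetical helper: 0 when every result passed, 1 otherwise.
// Note that an empty result list yields 0, since there is nothing that failed.
function exitCodeFor(results) {
  return results.every((result) => result.passed) ? 0 : 1;
}

// In a Node script the value would typically be assigned like this:
//   process.exitCode = exitCodeFor(results);
```

Because the direct command chains the two runs with `&&`, a non-zero exit from the workflow tests stops the mock tests from running at all.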