# Testing the Diligence MCP Server

## Test Suite Overview

The diligence project includes a comprehensive test suite that validates:

1. **Workflow mechanics** - State transitions, round limits, phase enforcement
2. **Mock scenarios** - Predefined Worker-Reviewer interactions with expected outcomes
3. **Dry-run mode** - Test against real projects without making changes

## Quick Start

```bash
# Run all workflow tests
node test/run-tests.mjs --workflow

# Run mock scenario tests
node test/run-tests.mjs --mock

# Run a specific scenario
node test/run-tests.mjs --mock --scenario=blocking-voice

# Dry run against nexus (no changes)
node test/dry-run.mjs --project=~/bude/codecharm/nexus --scenario=blocking-voice
```

## Test Structure

```
test/
├── mcp-client.mjs          # Programmatic MCP client
├── run-tests.mjs           # Main test runner
├── dry-run.mjs             # Dry-run against real projects
├── fixture/                # Mock codebase for testing
│   ├── src/
│   │   ├── broker/events.ts
│   │   ├── services/
│   │   └── controllers/
│   └── .claude/
│       └── CODEBASE_CONTEXT.md
└── scenarios/              # Test scenarios
    ├── index.json
    ├── blocking-voice.json
    └── permission-cache.json
```

## Test Modes

### 1. Workflow Tests (`--workflow`)

Tests the MCP server mechanics without AI:

- Phase transitions (conversation → researching → approved → implementing)
- Round increment on NEEDS_WORK
- Max rounds enforcement (resets after 5 rounds)
- Feedback accumulation
- Abort functionality

```bash
node test/run-tests.mjs --workflow
```

### 2. Mock Tests (`--mock`)

Tests complete Worker-Reviewer scenarios with predefined responses:

1. Scenario defines a task and expected naive/correct fixes
2. Test simulates Worker submitting naive proposal
3. Reviewer catches issues, sends NEEDS_WORK
4. Worker submits revised proposal with all fixes
5. Reviewer approves
6. Validates that proposal mentions all required elements

```bash
# All scenarios
node test/run-tests.mjs --mock

# Single scenario
node test/run-tests.mjs --mock --scenario=permission-cache
```

### 3. Dry Run (`--project`)

Connects to MCP server in a real project directory:

```bash
# With predefined scenario
node test/dry-run.mjs --project=~/bude/codecharm/nexus --scenario=blocking-voice

# With custom task
node test/dry-run.mjs --project=/path/to/project --task="Fix the caching bug"
```

This:

- Starts the workflow with the task
- Shows the full Worker Brief (including real CODEBASE_CONTEXT.md)
- Does NOT make any code changes
- Aborts the workflow on exit

## Test Fixture

The fixture (`test/fixture/`) is a mini codebase that mirrors real-world patterns:

### Files

| File | Purpose | Known Bugs (for testing) |
|------|---------|-------------------------|
| `broker/events.ts` | Event bus definitions | Reference implementation |
| `services/user-block.service.ts` | Blocking logic | Missing voice cleanup |
| `services/voice-channel.service.ts` | Voice/DM calls | Missing blocking check on answerDmCall |
| `services/team.service.ts` | Permission cache | Doesn't subscribe to role events |
| `services/chat.service.ts` | **Correct pattern** | Shows permission vs action separation |
| `controllers/roles.controller.ts` | Role CRUD | Missing broker events on create/delete |

### Patterns Tested

1. **Broker event emission** - Every state change should emit events
2. **Cache invalidation** - Caches must subscribe to relevant events
3. **Permission vs Action** - Permissions control visibility, actions have separate checks
4. **Multi-location fixes** - If one place has a check, similar places need it too

## Test Scenarios

### blocking-voice

**Task:** Fix blocked users can still answer DM voice calls

**Naive fix:** Add blocking check to answerDmCall only

**Correct fix:**

- Add blocking check to answerDmCall
- Add blocking check to declineDmCall
- Filter notifications for blocked users
- Add voice cleanup to blockUser()
- Subscribe to BusUserBlockChange for mid-call kicks

### permission-cache

**Task:** Fix permission cache doesn't invalidate when roles change

**Naive fix:** Add .clear() somewhere

**Correct fix:**

- Subscribe to BusTeamRoleChange in team.service
- Subscribe to BusTeamMemberRoleChange in team.service
- Add broker event to createRole()
- Add broker event to deleteRole()

## Adding New Scenarios

Create a new JSON file in `test/scenarios/`:

```json
{
  "id": "my-scenario",
  "name": "Human-readable name",
  "description": "Brief description",
  "task": "The task description given to the Worker",
  "naive_fix": {
    "description": "What a quick-fix agent would do",
    "changes": [
      { "file": "path/to/file.ts", "change": "Quick fix description" }
    ],
    "issues": [
      "What the naive fix misses #1",
      "What the naive fix misses #2"
    ]
  },
  "correct_fix": {
    "description": "Complete fix description",
    "required_changes": [
      { "file": "path.ts", "function": "funcName", "change": "What to change" }
    ],
    "required_broker_subscriptions": [
      { "service": "x.service.ts", "event": "BusEventName", "action": "What to do" }
    ]
  },
  "validation_criteria": {
    "must_mention": ["keyword1", "keyword2"],
    "should_reference_pattern": "reference-file.ts"
  }
}
```

Add to `test/scenarios/index.json`:

```json
{
  "scenarios": [
    { "id": "my-scenario", "file": "my-scenario.json" }
  ]
}
```

## Real-World Testing

For true validation, test with real Claude sub-agents:

```
Root Agent:
  1. Call mcp__diligence__start with task
  2. Spawn Worker agent (Task tool) with get_worker_brief
  3. Worker researches, submits proposal via mcp__diligence__propose
  4. Spawn Reviewer agent (Task tool) with get_reviewer_brief
  5. Reviewer verifies claims, submits via mcp__diligence__review
  6. If NEEDS_WORK, spawn new Worker with updated brief
  7. If APPROVED, proceed to implementation
```

**Why separate agents matter:**

- Fresh context = no bias from previous reasoning
- Reviewer doesn't know Worker's search results
- Forces genuine verification, not rubber-stamping

## Success Criteria

1. **Reviewer catches issues** that Worker initially misses
2. **Multiple rounds** occur before approval
3. **Final proposal** is more complete than naive approach
4. **Validation criteria** are all met

## Environment Variables

| Variable | Purpose |
|----------|---------|
| `DEBUG=1` | Show MCP server stderr output |
| `ANTHROPIC_API_KEY` | Required for `--live` mode (future) |

## CI Integration

```bash
# Run in CI
npm test

# Or directly
node test/run-tests.mjs --workflow && node test/run-tests.mjs --mock
```

Exit codes: 0 = pass, 1 = fail
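When a mock run fails in CI, the likely cause is the last step of the mock tests: checking that the proposal mentions every entry in a scenario's `validation_criteria.must_mention`. This document does not specify how the runner matches keywords, so the following is only a minimal sketch. It assumes case-insensitive substring matching, and `findMissingMentions` and the sample keywords are hypothetical, not taken from `run-tests.mjs` or a real scenario file.

```javascript
// Hypothetical helper: not part of the diligence test runner.
// Assumption: a keyword counts as "mentioned" when it appears anywhere in the
// proposal text, ignoring case.
function findMissingMentions(proposalText, validationCriteria) {
  const haystack = proposalText.toLowerCase();
  const required = validationCriteria.must_mention ?? [];
  return required.filter((keyword) => !haystack.includes(keyword.toLowerCase()));
}

// Criteria in the shape used by test/scenarios/*.json.
// These keywords are illustrative, not copied from a real scenario file.
const criteria = {
  must_mention: ["answerDmCall", "declineDmCall", "BusUserBlockChange"],
};

const naiveProposal = "Add a blocking check to answerDmCall.";
const revisedProposal =
  "Add blocking checks to answerDmCall and declineDmCall, and subscribe to " +
  "BusUserBlockChange so blocked users are removed mid-call.";

const missingFromNaive = findMissingMentions(naiveProposal, criteria);
const missingFromRevised = findMissingMentions(revisedProposal, criteria);

console.log("naive proposal is missing:", missingFromNaive);
console.log("revised proposal is missing:", missingFromRevised);
```

Under these assumptions, a runner could treat a non-empty result as a failed scenario and report it through the exit code of 1 described above.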