# Diligence vs Naive Approach: Comparison Report

**Date:** 2026-01-22
**Test Bug:** B1 - Blocked users can answer DM voice calls
**Project:** nexus (~/bude/codecharm/nexus)

---

## Executive Summary

| Metric | Naive Approach | Diligence Approach |
|--------|----------------|--------------------|
| Bug verified exists? | ✅ Yes | ✅ Yes |
| Correct line numbers? | ✅ Yes (1050, 965) | ✅ Worker correct |
| Found declineDmCall gap? | ✅ Yes | ⚠️ Reviewer found it |
| Found notification filtering? | ✅ Yes | ⚠️ Reviewer found it |
| Found blockUser cleanup? | ✅ Yes | ⚠️ Reviewer found it |
| Reviewer caught errors? | N/A | ✅ Caught line number discrepancy* |

\*The Reviewer searched the wrong codebase (test fixture instead of nexus), but the PROCESS of verification worked.

---

## Bug Verification: CONFIRMED REAL

**Evidence from actual nexus code:**

```typescript
// startDmCall (lines 965-969) - HAS blocking check ✅
const blocked = await this.userBlockService.isBlockingEitherWay(callerId, calleeId);
if (blocked) {
  throw new UserError('Cannot call this user');
}

// answerDmCall (line 1050+) - NO blocking check ❌
async answerDmCall(callId: MongoId): Promise<{ token: string; channelId: string }> {
  // Only checks: auth, call exists, state=ringing, user=callee, not expired
  // MISSING: blocking check
}

// declineDmCall (line 1115+) - NO blocking check ❌
async declineDmCall(callId: MongoId): Promise<void> {
  // Only checks: auth, call exists, state=ringing, user=callee
  // MISSING: blocking check
}
```

**Conclusion:** Bug B1 is REAL. Both approaches correctly identified it.
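The Worker's proposed fix (detailed below) mirrors the startDmCall pattern. A minimal excerpt-style sketch of what that could look like, reusing the `userBlockService.isBlockingEitherWay` call and `UserError` shown above — the helper names and error message are illustrative assumptions, not the actual nexus implementation:

```typescript
// Sketch only: answerDmCall with the same guard startDmCall already uses.
// loadRingingCall/issueCallToken are hypothetical stand-ins for the existing
// auth / state=ringing / user=callee / expiry checks and token issuance.
async answerDmCall(callId: MongoId): Promise<{ token: string; channelId: string }> {
  const call = await this.loadRingingCall(callId);

  // The missing check: refuse to connect if either party blocks the other,
  // mirroring startDmCall (lines 965-969).
  const blocked = await this.userBlockService.isBlockingEitherWay(call.callerId, call.calleeId);
  if (blocked) {
    throw new UserError('Cannot answer this call');
  }

  return this.issueCallToken(call);
}
```

declineDmCall would need the same guard; per the findings below, notifyDmCall filtering and blockUser() voice cleanup are separate follow-ups.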
---

## Detailed Comparison

### Naive Approach Output

The naive agent (single Explore agent) produced:

- ✅ Root cause analysis
- ✅ Correct file identification (`voice-channel.rpc.ts`)
- ✅ Correct line numbers (965-969, 1050-1109)
- ✅ Compared startDmCall vs answerDmCall patterns
- ✅ Identified additional issues:
  - declineDmCall needs blocking check
  - notifyDmCall needs filtering
  - blockUser() needs voice cleanup
  - BusUserBlockChange subscription needed
- ✅ Implementation order recommendation
- ✅ Edge cases considered

**Quality:** Surprisingly thorough. Searched actual code, cited lines, found patterns.

### Diligence Approach Output

**Worker:**

- ✅ Verified the bug exists by searching the code
- ✅ Correct line numbers
- ✅ Cited exact file:line
- ✅ Proposed fix matching the startDmCall pattern

**Reviewer:**

- ✅ Attempted independent verification (correct process)
- ❌ Searched the wrong codebase (test fixture, 220 lines)
- ✅ Noticed the discrepancy ("file only 220 lines, Worker cited 1050")
- ✅ Found additional gaps (declineDmCall, notification filtering)
- ✅ Gave a NEEDS_WORK decision with specific issues

**Quality:** Process worked correctly. The Reviewer caught a "discrepancy" (even if due to searching the wrong place).

---

## Key Findings

### 1. Both approaches verified the bug exists

Neither approach blindly trusted the task description. Both:

- Searched for the answerDmCall implementation
- Compared it with the startDmCall pattern
- Verified the blocking check is actually missing

### 2. Naive approach was surprisingly thorough

The single agent produced analysis comparable to the Worker's. This suggests:

- For bugs with clear descriptions, the naive approach may suffice
- The value of diligence may lie in more ambiguous tasks

### 3. Reviewer process works, but needs correct context

The Reviewer:

- Did NOT rubber-stamp the Worker's proposal
- Actually searched and found discrepancies
- Caught additional issues the Worker missed
- BUT searched the wrong codebase due to the test setup

### 4. Test setup flaw identified

The Reviewer searched `/Users/marc/bude/strikt/diligence/test/fixture/` instead of `~/bude/codecharm/nexus`. This is because:

- Agents were spawned from the diligence project
- They defaulted to searching the current working directory

**Fix needed:** In real usage, the diligence MCP runs IN the target project, so this wouldn't happen.

---

## What Diligence Should Catch That Naive Might Miss

Based on this test, diligence adds value when:

1. **Worker makes incorrect claims** - Reviewer verifies by searching
2. **Worker misses related issues** - Reviewer's independent search finds them
3. **Task description is wrong** - Both should verify the bug exists, not assume it
4. **Patterns are misunderstood** - Reviewer checks against CODEBASE_CONTEXT.md

### This test showed:

| Scenario | Did Diligence Help? |
|----------|---------------------|
| Verify bug exists | Both approaches did this |
| Catch wrong line numbers | Reviewer caught discrepancy ✅ |
| Find additional gaps | Reviewer found more than Worker ✅ |
| Prevent hallucinated bugs | Would catch if Reviewer searched correctly |

---

## Recommendations

### 1. Run a real test in the nexus project

Start a Claude Code session IN nexus and test the full workflow there. This ensures:

- The MCP server runs in the correct project
- Agents search the right codebase
- Full context from CODEBASE_CONTEXT.md is loaded

### 2. Test with a more ambiguous bug

B1 is well-documented. Test with something like:

- "Voice seems laggy sometimes"
- "Users report weird permission issues"

These require more investigation to even determine whether there is a bug.

### 3. Test whether diligence catches non-bugs

Give it a task for a bug that doesn't exist. Does the workflow correctly report "no bug found"?

### 4. Add an explicit codebase path to Worker/Reviewer briefs

The briefs should specify: "Search in /path/to/project, not elsewhere" (a hypothetical sketch of this appears after the Conclusion).

---

## Conclusion

**Does diligence work?** Yes, the process is sound:

- Worker researches and proposes
- Reviewer independently verifies
- Discrepancies are caught
- Multiple rounds can iterate

**Is it better than naive?** For this test, similar results. But:

- The Reviewer caught additional issues the Worker missed
- The process would catch hallucinated bugs if the Reviewer searched correctly
- The real value may be in more complex/ambiguous tasks

**Next step:** Run a real test in a Claude Code session in nexus, with a more ambiguous task.
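To make recommendation 4 concrete: one way to pin the search path is to bake it into each brief before spawning the agents. Everything in this sketch (`buildAgentBrief`, its parameters, the wording) is a hypothetical illustration, not the actual diligence API:

```typescript
// Hypothetical sketch: prepend an explicit project root to each agent brief
// so Worker/Reviewer searches cannot fall back to the spawning directory.
interface BriefOptions {
  projectRoot: string;      // e.g. "~/bude/codecharm/nexus" (illustrative)
  taskDescription: string;  // the bug/task text handed to the agent
}

export function buildAgentBrief({ projectRoot, taskDescription }: BriefOptions): string {
  return [
    `Search ONLY in ${projectRoot}; do not search the current working directory.`,
    `Cite findings as file:line relative to ${projectRoot}.`,
    taskDescription,
  ].join('\n\n');
}
```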