# Diligence vs Naive Approach: Comparison Report

**Date:** 2026-01-22
**Test Bug:** B1 - Blocked users can answer DM voice calls
**Project:** nexus (~/bude/codecharm/nexus)

---

## Executive Summary

| Metric | Naive Approach | Diligence Approach |
|--------|----------------|--------------------|
| Bug verified exists? | ✅ Yes | ✅ Yes |
| Correct line numbers? | ✅ Yes (1050, 965) | ✅ Worker correct |
| Found declineDmCall gap? | ✅ Yes | ⚠️ Reviewer found it |
| Found notification filtering? | ✅ Yes | ⚠️ Reviewer found it |
| Found blockUser cleanup? | ✅ Yes | ⚠️ Reviewer found it |
| Reviewer caught errors? | N/A | ✅ Caught line number discrepancy* |

\*The Reviewer searched the wrong codebase (test fixture instead of nexus), but the PROCESS of verification worked.

---

## Bug Verification: CONFIRMED REAL

**Evidence from actual nexus code:**

```typescript
// startDmCall (lines 965-969) - HAS blocking check ✅
const blocked = await this.userBlockService.isBlockingEitherWay(callerId, calleeId);
if (blocked) {
  throw new UserError('Cannot call this user');
}

// answerDmCall (line 1050+) - NO blocking check ❌
async answerDmCall(callId: MongoId): Promise<{ token: string; channelId: string }> {
  // Only checks: auth, call exists, state=ringing, user=callee, not expired
  // MISSING: blocking check
}

// declineDmCall (line 1115+) - NO blocking check ❌
async declineDmCall(callId: MongoId): Promise<void> {
  // Only checks: auth, call exists, state=ringing, user=callee
  // MISSING: blocking check
}
```

**Conclusion:** Bug B1 is REAL. Both approaches correctly identified it.
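The Worker's proposed fix (detailed below) mirrors the startDmCall pattern. A minimal excerpt-style sketch of what that could look like, reusing the `userBlockService.isBlockingEitherWay` call and `UserError` shown above — the helper names and error message are illustrative assumptions, not the actual nexus implementation:

```typescript
// Sketch only: answerDmCall with the same guard startDmCall already uses.
// loadRingingCall/issueCallToken are hypothetical stand-ins for the existing
// auth / state=ringing / user=callee / expiry checks and token issuance.
async answerDmCall(callId: MongoId): Promise<{ token: string; channelId: string }> {
  const call = await this.loadRingingCall(callId);

  // The missing check: refuse to connect if either party blocks the other,
  // mirroring startDmCall (lines 965-969).
  const blocked = await this.userBlockService.isBlockingEitherWay(call.callerId, call.calleeId);
  if (blocked) {
    throw new UserError('Cannot answer this call');
  }

  return this.issueCallToken(call);
}
```

declineDmCall would need the same guard; per the findings below, notifyDmCall filtering and blockUser() voice cleanup are separate follow-ups.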
---

## Detailed Comparison

### Naive Approach Output

The naive agent (single Explore agent) produced:

- ✅ Root cause analysis
- ✅ Correct file identification (`voice-channel.rpc.ts`)
- ✅ Correct line numbers (965-969, 1050-1109)
- ✅ Compared startDmCall vs answerDmCall patterns
- ✅ Identified additional issues:
  - declineDmCall needs blocking check
  - notifyDmCall needs filtering
  - blockUser() needs voice cleanup
  - BusUserBlockChange subscription needed
- ✅ Implementation order recommendation
- ✅ Edge cases considered

**Quality:** Surprisingly thorough. Searched actual code, cited lines, found patterns.

### Diligence Approach Output

**Worker:**

- ✅ Verified the bug exists by searching the code
- ✅ Correct line numbers
- ✅ Cited exact file:line
- ✅ Proposed fix matching the startDmCall pattern

**Reviewer:**

- ✅ Attempted independent verification (correct process)
- ❌ Searched the wrong codebase (test fixture, 220 lines)
- ✅ Noticed the discrepancy ("file only 220 lines, Worker cited 1050")
- ✅ Found additional gaps (declineDmCall, notification filtering)
- ✅ Gave a NEEDS_WORK decision with specific issues

**Quality:** Process worked correctly. The Reviewer caught a "discrepancy" (even if due to searching the wrong place).

---

## Key Findings

### 1. Both approaches verified the bug exists

Neither approach blindly trusted the task description. Both:

- Searched for the answerDmCall implementation
- Compared it with the startDmCall pattern
- Verified the blocking check is actually missing

### 2. Naive approach was surprisingly thorough

The single agent produced analysis comparable to the Worker's. This suggests:

- For bugs with clear descriptions, the naive approach may suffice
- The value of diligence may lie in more ambiguous tasks

### 3. Reviewer process works, but needs correct context

The Reviewer:

- Did NOT rubber-stamp the Worker's proposal
- Actually searched and found discrepancies
- Caught additional issues the Worker missed
- BUT searched the wrong codebase due to the test setup

### 4. Test setup flaw identified

The Reviewer searched `/Users/marc/bude/strikt/diligence/test/fixture/` instead of `~/bude/codecharm/nexus`. This is because:

- Agents were spawned from the diligence project
- They defaulted to searching the current working directory

**Fix needed:** In real usage, the diligence MCP runs IN the target project, so this wouldn't happen.

---

## What Diligence Should Catch That Naive Might Miss

Based on this test, diligence adds value when:

1. **Worker makes incorrect claims** - Reviewer verifies by searching
2. **Worker misses related issues** - Reviewer's independent search finds them
3. **Task description is wrong** - Both should verify the bug exists, not assume it
4. **Patterns are misunderstood** - Reviewer checks against CODEBASE_CONTEXT.md

### This test showed:

| Scenario | Did Diligence Help? |
|----------|---------------------|
| Verify bug exists | Both approaches did this |
| Catch wrong line numbers | Reviewer caught discrepancy ✅ |
| Find additional gaps | Reviewer found more than Worker ✅ |
| Prevent hallucinated bugs | Would catch if Reviewer searched correctly |

---

## Recommendations

### 1. Run a real test in the nexus project

Start a Claude Code session IN nexus and test the full workflow there. This ensures:

- The MCP server runs in the correct project
- Agents search the right codebase
- Full context from CODEBASE_CONTEXT.md is loaded

### 2. Test with a more ambiguous bug

B1 is well-documented. Test with something like:

- "Voice seems laggy sometimes"
- "Users report weird permission issues"

These require more investigation to even determine whether there is a bug.

### 3. Test whether diligence catches non-bugs

Give it a task for a bug that doesn't exist. Does the workflow correctly report "no bug found"?

### 4. Add an explicit codebase path to Worker/Reviewer briefs

The briefs should specify: "Search in /path/to/project, not elsewhere" (a hypothetical sketch of this appears after the Conclusion).

---

## Conclusion

**Does diligence work?** Yes, the process is sound:

- Worker researches and proposes
- Reviewer independently verifies
- Discrepancies are caught
- Multiple rounds can iterate

**Is it better than naive?** For this test, similar results. But:

- The Reviewer caught additional issues the Worker missed
- The process would catch hallucinated bugs if the Reviewer searched correctly
- The real value may be in more complex/ambiguous tasks

**Next step:** Run a real test in a Claude Code session in nexus, with a more ambiguous task.
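To make recommendation 4 concrete: one way to pin the search path is to bake it into each brief before spawning the agents. Everything in this sketch (`buildAgentBrief`, its parameters, the wording) is a hypothetical illustration, not the actual diligence API:

```typescript
// Hypothetical sketch: prepend an explicit project root to each agent brief
// so Worker/Reviewer searches cannot fall back to the spawning directory.
interface BriefOptions {
  projectRoot: string;      // e.g. "~/bude/codecharm/nexus" (illustrative)
  taskDescription: string;  // the bug/task text handed to the agent
}

export function buildAgentBrief({ projectRoot, taskDescription }: BriefOptions): string {
  return [
    `Search ONLY in ${projectRoot}; do not search the current working directory.`,
    `Cite findings as file:line relative to ${projectRoot}.`,
    taskDescription,
  ].join('\n\n');
}
```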