- Scope: Whole codebase autonomous bug hunt
- Iterations: 20 (bounded)
- Severity threshold: medium+
- Mode: report-only
debug_score = bugs_found * 15
+ hypotheses_tested * 3
+ (files_investigated / files_in_scope) * 40
+ (techniques_used / 7) * 10
= 4 * 15 + 14 * 3 + (~25/318) * 40 + (5/7) * 10
= 60 + 42 + 3.1 + 7.1
= ~112
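The arithmetic above can be re-checked with a few lines of Python. This is only a sketch: the function and parameter names are illustrative, and 25 is the approximate investigated-file count quoted in the formula.

```python
def debug_score(bugs_found, hypotheses_tested, files_investigated,
                files_in_scope, techniques_used, techniques_total=7):
    """Recompute the score using the weights from the formula in this report."""
    return (bugs_found * 15
            + hypotheses_tested * 3
            + (files_investigated / files_in_scope) * 40
            + (techniques_used / techniques_total) * 10)


# Values taken from the report: 4 bugs, 14 hypotheses, ~25 of 318 files, 5 of 7 techniques.
score = debug_score(4, 14, 25, 318, 5)
print(round(score, 1))  # 112.3
```

The two fractional terms contribute about 3.1 and 7.1, so the rounded total agrees with the reported ~112.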
| # | Severity | Title | Location |
|---|---|---|---|
| 1 | HIGH | Acquisition SQS message deleted before async handler commits | workers/ap/acquisition.clj:121-139 |
| 2 | HIGH | Output dispatch jobs orphaned when app crashes between commit and SQS send (no sweeper) | app/http/documents/view/approval.clj:124-156, app/document_output.clj:75-87 |
| 3 | MEDIUM | No backoff in SQS polling loops on persistent error (5 workers) | workers/document_output.clj:236, workers/ap/acquisition.clj, workers/ap/ingestion.clj, workers/diagnostics_recompute.clj, workers/ap/processors/matching/worker.clj |
| 4 | MEDIUM | SSELooper leaks subscriptions when initial "connected" write fails | app/http/sse.clj:60-91 |
Full evidence and suggested fixes in findings.md.
Disproven hypotheses (10) in eliminated.md.
Per-iteration log in debug-results.tsv.
Three of the four bugs are at-least-once → at-most-once degradations in async pipelines:
The codebase already shows some awareness of this pattern: the matching worker correctly deletes on success at worker.clj:244, and the document-output processor uses with-completion-retry at engine.clj:171-189. The fix is therefore consistency rather than discovery: lift the existing patterns to the gaps.
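The delete-on-success pattern referred to here can be illustrated with a minimal, language-agnostic sketch in Python. Nothing below is taken from the codebase: `InMemoryQueue`, `process_batch`, and `handler` are hypothetical names, and the in-memory queue only imitates the relevant SQS behaviour (a message stays available for redelivery until it is explicitly deleted).

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryQueue:
    """Hypothetical stand-in for an SQS queue: messages stay until deleted."""

    def __init__(self, bodies):
        self.messages = {receipt: body for receipt, body in enumerate(bodies)}

    def receive(self):
        return list(self.messages.items())

    def delete(self, receipt):
        self.messages.pop(receipt, None)


def process_batch(queue, handler, executor):
    """Delete each message only after its handler has returned successfully."""
    futures = {executor.submit(handler, body): receipt
               for receipt, body in queue.receive()}
    for future, receipt in futures.items():
        try:
            future.result()        # wait for the async handler to finish
        except Exception:
            continue               # leave the message in place for redelivery
        queue.delete(receipt)      # acknowledge only after success


def handler(body):
    # Illustrative handler: one message fails to simulate a crash mid-task.
    if body == "bad":
        raise RuntimeError("simulated handler failure")


queue = InMemoryQueue(["ok-1", "bad", "ok-2"])
with ThreadPoolExecutor(max_workers=2) as pool:
    process_batch(queue, handler, pool)
print(queue.messages)  # {1: 'bad'}
```

Deleting before the handler finishes would empty the queue even though one message was never processed, which is the at-most-once degradation described in finding 1. Keeping the delete after the result preserves at-least-once delivery, at the cost of requiring handlers to tolerate redelivery.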
- Searched for `(str "SELECT/INSERT/UPDATE/...` and `[:raw (str "...")` to map SQL injection surface
- Traced `dispatch-job!` symptom upward to approval.clj → discovered missing sweeper

User indicated report-only. Suggested follow-ups in priority order:
1. Run /autoresearch:fix --from-debug (bounded ~5 iterations) to address the two HIGH severity items first. Both have concrete suggested fixes in findings.md.
2. Add a test in test/com/getorcha/workers/ap/acquisition_test.clj for the lost-message scenario before fixing; the bug should be reproducible by killing the executor mid-task.
3. Add a backoff helper to workers/util.clj and apply it to all 5 polling loops in one PR.
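
For the polling-loop backoff in finding 3, one possible shape for a shared helper is capped exponential backoff with jitter. The sketch below is in Python for illustration only and is not the suggested fix from findings.md; `backoff_delay`, `poll_loop`, and the default `base` and `cap` values are assumptions.

```python
import random


def backoff_delay(consecutive_errors, base=1.0, cap=60.0, rng=random.random):
    """Seconds to sleep after `consecutive_errors` failed polls in a row."""
    if consecutive_errors <= 0:
        return 0.0
    exponent = min(consecutive_errors - 1, 30)   # clamp to avoid huge powers
    ceiling = min(cap, base * 2 ** exponent)
    return ceiling * rng()                       # jitter: uniform in [0, ceiling)


def poll_loop(poll_once, sleep, max_iterations):
    """Run `poll_once` repeatedly, backing off while it keeps failing."""
    errors = 0
    for _ in range(max_iterations):
        try:
            poll_once()
            errors = 0                           # success resets the backoff
        except Exception:
            errors += 1
            sleep(backoff_delay(errors))


def always_fail():
    # Illustrative poller that simulates a persistent receive error.
    raise ConnectionError("simulated persistent polling error")


delays = []
poll_loop(always_fail, delays.append, 4)         # record delays instead of sleeping
print(len(delays))  # 4
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in a worker it would be the platform's sleep function. The jitter spreads retries from the five workers so they do not hit the queue in lockstep after a shared outage.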