
When AI Gives Garbage Output: A Systematic Troubleshooting Guide

intermediate 15 min 📋 Copy-paste ready
Sources verified Dec 28, 2025
debugging · troubleshooting · decision-tree · intermediate

Scenario

Context: AI output isn't working - tests fail, output is wrong, or behavior is unexpected

Goal: Systematically diagnose AI failure modes using a decision tree approach

Anti-pattern: Repeatedly prompting 'fix it' without understanding the root cause

Tools: Any AI coding assistant

Conversation

AI Failure Triage

When AI output is wrong, the problem falls into one of five categories. This guide helps you identify which one and route to the right fix.

Symptom                                            | Likely Cause                      | Section
Hallucinated APIs, fake URLs, non-existent methods | Hallucination                     | A
Security holes, obvious bugs in generated code     | Vibe Coding (insufficient review) | B
"Works" in dev, fails in prod                      | Silent Substitution               | C
Output quality degrades over long session          | Context/Review Fatigue            | D
Confidently wrong about logic or requirements      | Reasoning Limits                  | E

Identify your symptom, then jump to the relevant section.

Section A: Hallucination

Symptoms:

  • Method names that don't exist in the library
  • URLs that 404
  • Citations to papers/articles that can't be found
  • API parameters that aren't valid
  • Package names that don't exist

Quick Test:
Ask: "Can you show me the documentation link for [method]?"

If AI corrects itself or provides a broken link, it was hallucinating.

Deep Dive: See conversation: catching_hallucination

👤 You

The fs.watchDirectory() method you suggested doesn't exist

Calling out a specific hallucination directly.
🤖 AI

You're right, I apologize. There is no fs.watchDirectory() in Node.js. The correct method is fs.watch() for watching files/directories, or fs.watchFile() for polling-based watching of individual files.

Here's the corrected code:

fs.watch('./data', (eventType, filename) => {
  console.log(`${eventType}: ${filename}`);
});
AI corrects itself when confronted with specific evidence.

Hallucination Prevention:

  1. Verify method names in official docs before using
  2. For unfamiliar APIs, ask AI to provide documentation links
  3. Test in a REPL before integrating (see the sketch below)
  4. Use TypeScript/IDE autocomplete to catch non-existent methods
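
A REPL check (item 3) can be a one-liner. A minimal sketch in Node.js, using fs.watchDirectory, the hallucinated name from the conversation above:

// Quick sanity check in a Node.js REPL or scratch file:
// confirm the method actually exists before integrating it.
const fs = require('fs');

console.log(typeof fs.watchDirectory); // 'undefined' -> the method was hallucinated
console.log(typeof fs.watch);          // 'function'  -> the real API

// Listing the module's exports also surfaces the real names quickly:
console.log(Object.keys(fs).filter((name) => name.startsWith('watch')));
// e.g. [ 'watchFile', 'watch' ]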

Section B: Vibe Coding (Insufficient Review)

Symptoms:

  • Security vulnerabilities in generated code (SQL injection, XSS)
  • Obvious logic bugs that would be caught by reading the code
  • Hardcoded secrets or credentials
  • Missing error handling

Root Cause: Accepting AI changes without reading the diff.

Deep Dive: See conversation: vibe_coding_failures

👤 You ✗ Bad example

I accepted 200 lines of login code and now I'm seeing SQL injection warnings in my security scan

Classic vibe coding consequence - accepting large changes without review.
🤖 AI

Let me review the login code for SQL injection vulnerabilities.

I found the issue on line 47:

// VULNERABLE
db.query(`SELECT * FROM users WHERE email = '${email}'`)

This should use parameterized queries:

// FIXED
db.query('SELECT * FROM users WHERE email = $1', [email])

I also found 3 other security issues in this code:

  1. Password comparison using === (timing attack vulnerable)
  2. JWT secret hardcoded as 'secret123'
  3. No rate limiting on login attempts

Would you like me to fix all of these?

AI identifies the security issues, but this could have been prevented with upfront review.
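
A minimal sketch of what the hardened login path could look like, assuming node-postgres (pg) and bcrypt; the function and column names are illustrative, not taken from the original code:

// Sketch of a hardened login check, assuming node-postgres and bcrypt.
const { Pool } = require('pg');
const bcrypt = require('bcrypt');

const pool = new Pool(); // connection settings come from environment variables

async function verifyLogin(email, password) {
  // Parameterized query: the driver handles escaping, so no SQL injection.
  const { rows } = await pool.query(
    'SELECT id, password_hash FROM users WHERE email = $1',
    [email]
  );
  if (rows.length === 0) return null;

  // bcrypt.compare replaces the naive === comparison on password strings.
  const ok = await bcrypt.compare(password, rows[0].password_hash);
  return ok ? rows[0].id : null;
}

Rate limiting and moving the JWT secret into configuration are separate fixes and are not shown here.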

Vibe Coding Prevention:

  1. For security-sensitive code, always read the full diff
  2. Ask AI to review its own code for OWASP Top 10 before accepting
  3. Large diffs (100+ lines) need proportionally more review time
  4. Run security scanners (Snyk, Semgrep) on AI-generated code

Section C: Silent Substitution

Symptoms:

  • Tests pass locally but fail in production
  • AI says "I notice X isn't available, so I..."
  • Mocked dependencies that should be real
  • Environment-specific behavior

Root Cause: AI observes environment state and makes silent architectural decisions.

Deep Dive: See conversation: silent_substitution

👤 You ✗ Bad example

My tests pass but production crashes saying 'database connection failed'

Classic silent substitution - tests against mocks, prod needs real DB.
🤖 AI

Let me check your database connection code and test setup.

I found the issue. Your tests are using an in-memory mock:

// In tests
jest.mock('./db', () => ({
  query: jest.fn().mockResolvedValue({ rows: [] })
}));

But your production code expects a real PostgreSQL connection. The mock was added because the test environment doesn't have database credentials.

Options:

  1. Use a test database with Docker
  2. Create integration tests that run against a real DB
  3. Add a startup check that fails fast if DB is unavailable
AI diagnoses the mock/real discrepancy.
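
Option 3 (fail fast) is the cheapest to add. A minimal sketch, assuming node-postgres with connection settings supplied by environment variables:

// Fail-fast startup check: crash with a clear error at boot instead of
// discovering the missing database on the first real request.
const { Pool } = require('pg');

const pool = new Pool();

async function assertDatabaseReachable() {
  try {
    await pool.query('SELECT 1');
  } catch (err) {
    console.error('Startup check failed: database unreachable.', err.message);
    process.exit(1);
  }
}

assertDatabaseReachable();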

Silent Substitution Red Flag:

"When AI says 'I notice X isn't available, so I...' - STOP."

This phrase indicates the AI is making architectural decisions based on environment state. Always ask: "What did you substitute and why?"

Section D: Context/Review Fatigue

Symptoms:

  • Output quality was good at session start, now degrading
  • AI seems to forget earlier instructions
  • Repeated similar mistakes
  • You're rubber-stamping changes without reading them

Root Cause: Long sessions exhaust the context window, your attention as a reviewer, or both.

Deep Dive: See conversation: review_fatigue

👤 You ✗ Bad example

We've been working on this for 3 hours and now the AI keeps suggesting changes that break things we already fixed

Session fatigue affecting both context and human attention.
🤖 AI

After a long session, both AI context and human attention degrade. Let's reset:

  1. Summarize current state: What works, what's broken?
  2. Start fresh session: Clear context, provide summary
  3. Take a break: Your review quality matters too

Would you like me to generate a summary of our current progress that you can paste into a fresh session?

AI suggests context reset strategies.

Fatigue Prevention:

  1. Use the Document-Clear-Resume pattern for multi-day work
  2. Commit working checkpoints before long refactoring sessions
  3. Take breaks - your review quality is part of the system
  4. For critical changes, review in a fresh mental state

Section E: Reasoning Limits

Symptoms:

  • AI confidently explains incorrect logic
  • Off-by-one errors, boundary condition bugs
  • Misunderstanding of requirements
  • Plausible-sounding but wrong explanations

Root Cause: LLMs predict plausible text, not correct logic.

No deep dive link - this section covers the pattern.

👤 You

The AI insists this sorting algorithm is O(n log n) but I think it's O(n^2)

Challenging a confident but potentially wrong AI explanation.
🤖 AI

You're right to question this. Let me trace through the algorithm:

for i in range(n):           # O(n)
    for j in range(i, n):    # O(n) in worst case
        if arr[j] < arr[i]:
            arr[i], arr[j] = arr[j], arr[i]

This is a nested loop: the inner loop runs n - i times for iteration i of the outer loop, which sums to n(n + 1)/2 comparisons. That's O(n^2).

I apologize for the error. This is selection sort, not a divide-and-conquer algorithm. The O(n log n) claim was incorrect.

AI corrects itself when pushed with specific reasoning.

Reasoning Limits Prevention:

  1. For algorithms/math, trace through examples by hand (or measure, as in the sketch below)
  2. Ask AI to "show your work" step by step
  3. Be skeptical of confident explanations for complex logic
  4. Use tests to verify behavior, not AI explanations
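
A concrete way to settle the complexity dispute above: instrument the algorithm and watch how the comparison count grows when the input doubles. A sketch in plain Node.js; the JavaScript translation of the Python loop is mine:

// Count comparisons instead of trusting the explanation: O(n log n) grows by a
// factor of roughly 2.2 when n doubles, O(n^2) by a factor of roughly 4.
function sortWithCount(arr) {
  const a = [...arr];
  let comparisons = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = i; j < a.length; j++) {
      comparisons++;
      if (a[j] < a[i]) [a[i], a[j]] = [a[j], a[i]];
    }
  }
  return comparisons;
}

const randomArray = (n) => Array.from({ length: n }, () => Math.random());

const small = sortWithCount(randomArray(1000));
const large = sortWithCount(randomArray(2000));
console.log(`n=1000: ${small} comparisons; n=2000: ${large}; ratio ${(large / small).toFixed(2)}`);
// Prints a ratio near 4, confirming the O(n^2) analysis.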

Quick Checks Checklist

Before accepting AI output, run through this checklist:

  • Hallucination: Did I verify the API/method exists?
  • Security: Did I review for OWASP Top 10 issues?
  • Substitution: Did I confirm nothing was silently substituted (no "I notice X isn't available" moments)?
  • Freshness: Is my review quality still high?
  • Logic: Does the explanation match what the code does?

One "no" answer = investigate before accepting.

Key Takeaways

  • AI failures fall into 5 categories: hallucination, vibe coding, silent substitution, fatigue, and reasoning limits
  • Each category has distinct symptoms and prevention strategies
  • "When AI says 'I notice X isn't available, so I...' - STOP" is the key silent substitution heuristic
  • Repeated 'fix it' prompts without understanding indicate you're in a failure loop
  • Your review quality is part of the system - fatigue affects both AI and human

Try It Yourself

Prompt Template

Next time AI output doesn't work, identify which of the 5 categories it falls into before asking for a fix.

Variations to Try

  • For hallucinations: 'Show me the documentation link for [method]'
  • For security: 'Review this for OWASP Top 10 before I accept'
  • For substitution: 'What assumptions did you make about my environment?'
  • For reasoning: 'Trace through this step by step with example input'

Sources

Tempered AI Forged Through Practice, Not Hype
