When AI Gives Garbage Output: A Systematic Troubleshooting Guide
Scenario
Context: AI output isn't working - tests fail, output is wrong, or behavior is unexpected
Goal: Systematically diagnose AI failure modes using a decision tree approach
Anti-pattern: Repeatedly prompting 'fix it' without understanding the root cause
Conversation
AI Failure Triage
When AI output is wrong, the problem falls into one of five categories. This guide helps you identify which one and route to the right fix.
| Symptom | Likely Cause | Section |
|---|---|---|
| Hallucinated APIs, fake URLs, non-existent methods | Hallucination | A |
| Security holes, obvious bugs in generated code | Vibe Coding (insufficient review) | B |
| "Works" in dev, fails in prod | Silent Substitution | C |
| Output quality degrades over long session | Context/Review Fatigue | D |
| Confidently wrong about logic or requirements | Reasoning Limits | E |
Identify your symptom, then jump to the relevant section.
Section A: Hallucination
Symptoms:
- Method names that don't exist in the library
- URLs that 404
- Citations to papers/articles that can't be found
- API parameters that aren't valid
- Package names that don't exist
Quick Test:
Ask: "Can you show me the documentation link for [method]?"
If AI corrects itself or provides a broken link, it was hallucinating.
Deep Dive: See conversation: catching_hallucination
The fs.watchDirectory() method you suggested doesn't exist
You're right, I apologize. There is no fs.watchDirectory() in Node.js. The correct method is fs.watch() for watching files/directories, or fs.watchFile() for polling-based watching of individual files.
Here's the corrected code:
const fs = require('fs');

fs.watch('./data', (eventType, filename) => {
  console.log(`${eventType}: ${filename}`);
});
Hallucination Prevention:
- Verify method names in official docs before using
- For unfamiliar APIs, ask AI to provide documentation links
- Test in REPL before integrating (see the sketch after this list)
- Use TypeScript/IDE autocomplete to catch non-existent methods
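A minimal REPL sketch of that check, using the fs example from above; a hallucinated method shows up as 'undefined', a real one as 'function':
// Run in an interactive node session before integrating an unfamiliar API
const fs = require('fs');

console.log(typeof fs.watchDirectory); // 'undefined' -> the method was hallucinated
console.log(typeof fs.watch);          // 'function'  -> the method is real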
Section B: Vibe Coding (Insufficient Review)
Symptoms:
- Security vulnerabilities in generated code (SQL injection, XSS)
- Obvious logic bugs that would be caught by reading the code
- Hardcoded secrets or credentials
- Missing error handling
Root Cause: Accepting AI changes without reading the diff.
Deep Dive: See conversation: vibe_coding_failures
I accepted 200 lines of login code and now I'm seeing SQL injection warnings in my security scan
Let me review the login code for SQL injection vulnerabilities.
I found the issue on line 47:
// VULNERABLE
db.query(`SELECT * FROM users WHERE email = '${email}'`)
This should use parameterized queries:
// FIXED
db.query('SELECT * FROM users WHERE email = $1', [email])
I also found 3 other security issues in this code:
- Password comparison using === (timing-attack vulnerable)
- JWT secret hardcoded as 'secret123'
- No rate limiting on login attempts
Would you like me to fix all of these?
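A minimal sketch combining the parameterized query and a constant-time password check, assuming node-postgres (pg) and bcrypt; the helper name and the password_hash column are illustrative, and the JWT-secret and rate-limiting fixes are not shown:
// Illustrative sketch, not the original login code
const bcrypt = require('bcrypt');
const { Pool } = require('pg');
const db = new Pool(); // connection settings come from PG* environment variables

async function findUserByCredentials(email, password) {
  // Parameterized query: user input is never concatenated into the SQL string
  const result = await db.query('SELECT * FROM users WHERE email = $1', [email]);
  const user = result.rows[0];

  // bcrypt.compare does a constant-time hash comparison, unlike === on raw strings
  if (!user || !(await bcrypt.compare(password, user.password_hash))) {
    return null; // caller responds with 401; rate limiting is handled elsewhere
  }
  return user;
}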
Vibe Coding Prevention:
- For security-sensitive code, always read the full diff
- Ask AI to review its own code for OWASP Top 10 before accepting
- Large diffs (100+ lines) need proportionally more review time
- Run security scanners (Snyk, Semgrep) on AI-generated code
Section C: Silent Substitution
Symptoms:
- Tests pass locally but fail in production
- AI says "I notice X isn't available, so I..."
- Mocked dependencies that should be real
- Environment-specific behavior
Root Cause: AI observes environment state and makes silent architectural decisions.
Deep Dive: See conversation: silent_substitution
My tests pass but production crashes saying 'database connection failed'
Let me check your database connection code and test setup.
I found the issue. Your tests are using an in-memory mock:
// In tests
jest.mock('./db', () => ({
  query: jest.fn().mockResolvedValue({ rows: [] })
}));
But your production code expects a real PostgreSQL connection. The mock was added because the test environment doesn't have database credentials.
Options:
- Use a test database with Docker
- Create integration tests that run against a real DB
- Add a startup check that fails fast if DB is unavailable (sketch below)
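A minimal sketch of the fail-fast option, assuming node-postgres (pg); the exact client and startup hook depend on your stack:
// Fail-fast startup check: exit immediately if the real database is unreachable,
// instead of silently limping along on mocks and failing later in production
const { Pool } = require('pg');
const pool = new Pool(); // connection details come from PG* environment variables

async function assertDatabaseReachable() {
  try {
    await pool.query('SELECT 1');
    console.log('Database connection OK');
  } catch (err) {
    console.error('Database connection failed at startup:', err.message);
    process.exit(1);
  }
}

assertDatabaseReachable();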
Silent Substitution Red Flag:
"When AI says 'I notice X isn't available, so I...' - STOP."
This phrase indicates the AI is making architectural decisions based on environment state. Always ask: "What did you substitute and why?"
Section D: Context/Review Fatigue
Symptoms:
- Output quality was good at session start, now degrading
- AI seems to forget earlier instructions
- Repeated similar mistakes
- You're rubber-stamping changes without reading them
Root Cause: Long sessions exhaust context window OR human attention.
Deep Dive: See conversation: review_fatigue
We've been working on this for 3 hours and now the AI keeps suggesting changes that break things we already fixed
After a long session, both AI context and human attention degrade. Let's reset:
- Summarize current state: What works, what's broken?
- Start fresh session: Clear context, provide summary
- Take a break: Your review quality matters too
Would you like me to generate a summary of our current progress that you can paste into a fresh session?
Fatigue Prevention:
- Use the Document-Clear-Resume pattern for multi-day work
- Commit working checkpoints before long refactoring sessions
- Take breaks - your review quality is part of the system
- For critical changes, review in a fresh mental state
Section E: Reasoning Limits
Symptoms:
- AI confidently explains incorrect logic
- Off-by-one errors, boundary condition bugs
- Misunderstanding of requirements
- Plausible-sounding but wrong explanations
Root Cause: LLMs predict plausible text, not correct logic.
No deep dive link - this section covers the pattern.
The AI insists this sorting algorithm is O(n log n) but I think it's O(n^2)
You're right to question this. Let me trace through the algorithm:
for i in range(n):           # O(n)
    for j in range(i, n):    # O(n) in worst case
        if arr[j] < arr[i]:
            arr[i], arr[j] = arr[j], arr[i]
This is a nested loop: the inner loop runs n - i times on the i-th outer pass, so the total work is roughly n(n+1)/2 comparisons, which is O(n^2).
I apologize for the error. This is selection sort, not a divide-and-conquer algorithm. The O(n log n) claim was incorrect.
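That count is easy to verify empirically: doubling n should roughly quadruple the number of comparisons. A small sketch (in JavaScript, to match the other examples in this guide):
// Count the dominant operation instead of trusting the explanation
function countComparisons(n) {
  const arr = Array.from({ length: n }, () => Math.random());
  let comparisons = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i; j < n; j++) {
      comparisons++;
      if (arr[j] < arr[i]) {
        [arr[i], arr[j]] = [arr[j], arr[i]]; // same swap as the Python version
      }
    }
  }
  return comparisons;
}

console.log(countComparisons(100)); // 5050
console.log(countComparisons(200)); // 20100 -- about 4x, i.e. quadratic growth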
Reasoning Limits Prevention:
- For algorithms/math, trace through examples by hand
- Ask AI to "show your work" step by step
- Be skeptical of confident explanations for complex logic
- Use tests to verify behavior, not AI explanations
Quick Checks Checklist
Before accepting AI output, run through this checklist:
- Hallucination: Did I verify the API/method exists?
- Security: Did I review for OWASP Top 10 issues?
- Substitution: Did I confirm the AI made no silent substitutions (no "I notice X isn't available" moments)?
- Freshness: Is my review quality still high?
- Logic: Does the explanation match what the code does?
One "no" answer = investigate before accepting.
Key Takeaways
- AI failures fall into 5 categories: hallucination, vibe coding, silent substitution, fatigue, and reasoning limits
- Each category has distinct symptoms and prevention strategies
- "When AI says 'I notice X isn't available, so I...' - STOP" is the key silent substitution heuristic
- Repeated 'fix it' prompts without understanding indicate you're in a failure loop
- Your review quality is part of the system - fatigue affects both AI and human
Try It Yourself
Prompt Template
Next time AI output doesn't work, identify which of the 5 categories it falls into before asking for a fix.
Variations to Try
- For hallucinations: 'Show me the documentation link for [method]'
- For security: 'Review this for OWASP Top 10 before I accept'
- For substitution: 'What assumptions did you make about my environment?'
- For reasoning: 'Trace through this step by step with example input'
Sources
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck