My Vibe-Coded App Worked in Demo. It Broke in Production. Now What?

Your AI-built MVP looked fine in the demo. In production, it's a different story: performance collapses, edge cases crash the app, and the data migrations were never written. This isn't a model problem. It's an architecture problem.

You shipped fast. Lovable, Bolt, Cursor, Replit — pick your tool. The demo was clean. Users signed up. Then real traffic hit, real data arrived, real edge cases appeared, and everything fell apart.

You're not alone. A small Final Round AI survey of 18 CTOs in 2025 found 16 reported production disasters tied to vibe-coded systems. TechCrunch reported that a quarter of YC's Winter 2025 batch had codebases that were 95% AI-generated, based on comments from YC managing partner Jared Friedman.

This page explains exactly what breaks and why.

Why AI-Built Apps Break in Production

AI tools are optimized for demo success, not production resilience. The patterns that fail first are always the same:

No schema migration strategy. The AI created tables directly. When you change the schema, it breaks existing data.
No error handling at the edges. The happy path works. Unexpected input either crashes the app outright or fails silently.
Context window collapse. Past roughly 5,000 lines of code, the AI starts forgetting earlier decisions. It contradicts itself. Bugs compound.
No environment separation. Dev, staging, and production share state, so a change made in dev can break prod.
Auth bolted on, not designed in. Session management, token refresh, role-based access — all fragile.
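
As an illustration of handling errors at the edge, here is a minimal sketch of validating an untrusted payload before it reaches business logic. The `SignupRequest` shape, the `parse_signup` name, and the age limits are hypothetical, chosen only for the example; a real app would likely use its framework's validation layer.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when external input fails a boundary check."""


@dataclass(frozen=True)
class SignupRequest:
    # Hypothetical request shape, for illustration only.
    email: str
    age: int


def parse_signup(payload: object) -> SignupRequest:
    """Validate an untrusted payload and return a typed, normalized request."""
    if not isinstance(payload, dict):
        raise ValidationError("payload must be a JSON object")

    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email or len(email) > 254:
        raise ValidationError("email is missing or malformed")

    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 13 <= age <= 120:
        raise ValidationError("age must be an integer between 13 and 120")

    return SignupRequest(email=email.strip().lower(), age=age)
```

The point of the design is that bad input fails with one predictable exception type at the boundary, which the request handler can turn into a 4xx response, instead of surfacing as an unrelated crash deeper in the code.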

What "Production Ready" Actually Requires

Production readiness isn't a checkbox. It's a set of decisions that need to be made before users arrive:

Concern	What AI Generates	What Production Needs
Schema changes	Direct ALTER TABLE	Versioned migrations (Flyway, Alembic, or Active Record Migrations)
Error handling	Happy path only	Circuit breakers, retries, graceful degradation
Auth	Session cookies, basic JWT	Token rotation, refresh logic, RBAC
Logging	console.log	Structured logs, alerting, tracing
Load	Single-user tested	Load tested, connection pooling, caching
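
To make the error-handling row concrete, here is a minimal sketch of retries with exponential backoff followed by graceful degradation. The `call_with_retry` helper and its parameters are hypothetical, and the set of retryable exceptions is an assumption; a circuit breaker would add failure tracking across calls, which this sketch leaves out.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a flaky call with exponential backoff, then fall back."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                break  # out of retries; use the fallback below
            # Exponential backoff with jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Graceful degradation: return a degraded result instead of crashing.
    return fallback()
```

A caller might pass a function that hits a live pricing service as `operation` and one that returns the last cached value as `fallback`. Only transient network errors are retried here; other exceptions propagate immediately so real bugs are not masked.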

The Rescue Process: What We Actually Do

When a vibe-coded app comes to us broken, we follow a fixed triage sequence.

Week 1 — Audit and Stabilize

Map all database writes and find every place data can be corrupted.
Identify all unauthenticated endpoints.
Freeze new feature work.
Add error monitoring, such as Sentry or equivalent.
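
The endpoint audit in this list can be sketched as a simple check over the app's route table. Everything here is an assumption for illustration: the `Route` record, the `requires_auth` flag, and the allowlist contents are hypothetical, and how you extract the route table depends on your framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical route record; real frameworks expose this differently.
    method: str
    path: str
    requires_auth: bool


# Hypothetical allowlist: endpoints that are public on purpose.
INTENTIONALLY_PUBLIC = {("GET", "/health"), ("POST", "/login")}


def unauthenticated_routes(routes: list[Route]) -> list[Route]:
    """Return routes that skip auth and are not on the allowlist."""
    return [
        r for r in routes
        if not r.requires_auth and (r.method, r.path) not in INTENTIONALLY_PUBLIC
    ]
```

Keeping the allowlist explicit means every public endpoint is a recorded decision, and anything the function returns is a finding to review.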

Weeks 2–3 — Structural Repairs

Introduce migration tooling and rewrite schema changes as versioned migrations.
Separate dev, staging, and production environments.
Add input validation at every external boundary.
Replace any AI-generated auth with a managed provider like Clerk, Auth0, or Supabase Auth.
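
To show what "versioned migrations" means in practice, here is a minimal sketch of a migration runner using Python's built-in `sqlite3` module. It is not how Flyway or Alembic are implemented; the `MIGRATIONS` list, the `schema_migrations` table name, and the `users` schema are assumptions made for the example.

```python
import sqlite3

# Hypothetical migrations, ordered by version. An applied migration is never
# edited; every schema change becomes a new entry.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN display_name TEXT NOT NULL DEFAULT ''"),
]


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply pending migrations in order and return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, statement in sorted(MIGRATIONS):
        if version in applied:
            continue
        # Explicit transaction so the schema change and its version record
        # succeed or fail together.
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
        except Exception:
            conn.rollback()
            raise
        conn.commit()
        newly_applied.append(version)
    return newly_applied
```

Running `migrate` on every deploy is safe because already-applied versions are skipped, so dev, staging, and production converge on the same schema through the same recorded steps.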

Week 4 — Handoff

Document every decision the AI made implicitly.
Write the runbook that didn't exist.
Define what the AI should and should not touch going forward.

We've run this process across more than a dozen rescued codebases. The average time from "broken in production" to "stable and extensible" is 3–5 weeks, not months, provided you don't add features during the rescue.

The Real Cost of Waiting

Every week a broken production app runs, you're accumulating two debts simultaneously: user trust debt and codebase debt. Users who hit bugs churn. Code that gets patched on top of bad foundations gets worse, not better.

The founders who wait longest before calling for help are the ones who spend the most. The ones who call early spend $15k–$30k. The ones who wait until the data is corrupted or the security breach happens spend $80k–$200k and sometimes lose the company.

FAQ

How do I know if my app is "vibe-coded" in the risky sense?

If you can't explain why any three random functions in your codebase are structured the way they are — and neither can your team — you have comprehension debt. That's the defining characteristic of risky AI-generated code.

Can I just ask the AI to fix the production problems?

Sometimes. For isolated bugs, yes. For structural problems like schema management, auth design, and environment separation, no. The AI doesn't have the context to understand what it already broke, and adding more AI output on top of a fragile foundation makes it more fragile.

How long does a rescue take?

3–5 weeks for stabilization. 2–3 months for full structural remediation. The difference is whether you're fixing the acute problems or also building the foundation that prevents them from recurring.

Do I need to rewrite from scratch?

Rarely. In our experience, a full rewrite is only necessary when the data model is fundamentally wrong or the framework choice creates irresolvable performance ceilings. Most rescues are surgical, not wholesale.

What's the most dangerous thing in a vibe-coded codebase?

AI-generated authentication and authorization logic. Every other problem is recoverable. A security breach caused by auth holes can end the company.

Need a production rescue?

If your AI-built app is already in production and showing cracks, book a short call. We'll figure out whether it needs a surgical rescue or a deeper rebuild plan.

Apply for a 30-min intro call