When AI Writes Your SQL: A Data Engineer's 6-Month Reality Check

📖 You May Also Like:I Switched from ChatGPT to Dee · Cursor vs Claude Code vs Copil

In early 2026, I handed over my production SQL to AI. Six months later, my database is still alive, but my blood pressure isn't what it used to be.

This isn't a "5 Best AI SQL Tools" article. This is a war journal — every crash, every wrong number, every embarrassing meeting where I had to explain why the data didn't match. If you're using AI to write SQL, these stories will save you from repeating my mistakes.

Phase 1: "This Is Amazing" — The Two-Week Honeymoon

I was working on an e-commerce BI project. Dozens of tables, tangled business logic, endless report requests. Writing a single report query used to take half a day: understand the requirement, dig up table schemas, write the SQL, tune it, test it.

With AI, I'd describe the requirement to Cursor or ChatGPT and get a working query in 10 seconds. Paste it into DBeaver, hit run, and most of the time — it just worked.

My first month was glorious. I was writing reports 3x faster. My takeaway at the time: "I'll never write SQL manually again."

Phase 2: "Wait, What?" — The First Data Mismatch

The honeymoon ended in month two.

A stakeholder asked a simple question: "What was our user retention rate last month?" I let AI write the query, got the number, and sent it over.

The next day, they came back with a second dataset from another team. The numbers differed by 8 percentage points.

I started digging. Line by line through the AI-generated SQL. After 30 minutes, I found it:

-- AI-generated
SELECT 
    COUNT(DISTINCT user_id) AS active_users,
    COUNT(DISTINCT CASE WHEN login_date >= '2026-04-01' THEN user_id END) AS retained_users
FROM user_actions
WHERE action_date >= '2026-03-01'

Looks fine, right? But the AI only queried user_actions. It completely ignored users who registered but never performed any action. Those silent users — roughly 8% of the total — simply didn't exist in the query's universe.

That moment hit me hard: the SQL was syntactically perfect and logically broken. This kind of failure is far more dangerous than a syntax error — the query runs, the numbers look plausible, and you only catch the bug when someone cross-checks with ground truth.

Phase 3: A Catalog of Crashes

Over six months, I catalogued every way AI-generated SQL betrayed me. Here are the 13 most memorable:

Crash 1: COUNT(DISTINCT) Memory Blowout

AI wrote a retention analysis query using COUNT(DISTINCT user_id) across 70 million rows, joined with two other large tables. The query ran for 8 minutes and pinned the analytics database CPU at 100%.

The problem? AI never considers data volume. 100 rows and 100 million rows produce the same query structure.

Crash 2: Implicit Type Conversion Catastrophe

A monetary field was stored as VARCHAR. AI generated a SUM query that silently concatenated strings. Result: "12.5" + "8.30" = "12.58.30".

-- AI wrote this (wrong)
SELECT SUM(amount) FROM orders
-- amount was VARCHAR, SUM implicitly concatenated

Fix: CAST(amount AS DECIMAL(10,2)) before aggregation.

Crash 3: Lazy LEFT JOIN Conditions

When joining orders to refunds, AI wrote:

LEFT JOIN refunds r ON o.order_id = r.order_id

No r.status = 'completed' filter. In-progress refunds got counted as completed ones. Refund statistics were inflated by 30%.

Crash 4: Time Zone Amnesia

AI mixed up database dialects constantly. Writing DATE(created_at) for MySQL is fine. Using the same syntax in ClickHouse — crash. After switching to ClickHouse's toDate(created_at), it forgot timezone conversion entirely.

Crash 5: Window Function Partition Fail

When deduplicating with ROW_NUMBER(), AI routinely omitted PARTITION BY:

ROW_NUMBER() OVER (ORDER BY create_time DESC) AS rn
-- Should be: PARTITION BY user_id ORDER BY create_time DESC

Result: global ranking instead of per-user ranking. Every row was wrong.

Crash 6: NULL Blindness

For average order value, AI wrote total_amount / order_count. When order_count was 0, this threw a division-by-zero error. The pattern NULLIF(order_count, 0) only appeared when I explicitly asked for it.

Crash 7: JOIN Order Chaos

On 5-table JOINs, AI never considered join order or predicate pushdown. Sometimes the small table drove the large table; sometimes the opposite. On big data, this was the difference between 3 seconds and 3 minutes.

Crash 8: GROUP BY Field Omissions

AI occasionally added non-aggregated columns to SELECT that weren't in GROUP BY. In MySQL strict mode, this throws an error. In relaxed mode, it silently returns wrong data.

Crash 9: Subquery Inception

AI loves nested subqueries. I once saw 5 levels of nesting. EXPLAIN showed 5 full table scans. Every time, I had to manually rewrite it using CTEs or temp tables.

Crash 10: OFFSET Performance Trap

AI always generates LIMIT 10 OFFSET 20 for pagination. Fine for small datasets. On millions of rows, OFFSET gets exponentially slower. It never considers cursor-based pagination.

Crash 11: SELECT * — Still Alive

Despite being a well-known anti-pattern, AI loves SELECT *. On wide tables, this means unnecessary network transfer and memory. More insidiously, if the schema changes later, your application silently breaks.

Crash 12: Database Dialect Mixing

I maintain both MySQL and ClickHouse. AI regularly writes MySQL's IFNULL in ClickHouse queries (should be ifNull), or ClickHouse's arrayJoin in MySQL. Without specifying the database type in the prompt, dialect mixing happens about 30% of the time.

Crash 13: The Phantom WHERE Clause

The most insidious bug of all: AI once generated a DELETE query where the WHERE clause was syntactically present but semantically empty due to a logic error in a subquery. The result? Every row got deleted.

DELETE FROM users 
WHERE user_id IN (
    SELECT user_id FROM inactive_users -- This subquery returned ALL user_ids due to bug
)

We caught this in staging, not production. But it was a close call that changed how I review every DELETE/UPDATE statement.

7 Rules I Now Live By

After six months of scars, I developed a system:

Rule 1: Never Run AI SQL Unreviewed

I check three things before hitting execute:

JOIN conditions: Complete? Precise enough?
Aggregation logic: GROUP BY correct? Is DISTINCT truly necessary?
NULLs and edge cases: Division by zero? NULL concatenation? Type safety?

Rule 2: Your Prompt Is the Difference Between Garbage and Gold

My early prompts:

"Write a SQL query to get last month's order data"

My current prompts:

Database: MySQL 8.0
Table: orders(id, user_id, amount DECIMAL(10,2), status VARCHAR(20), created_at DATETIME)
Indexes: orders_created_at_idx
Row count: ~5 million
Requirement: Last month's total orders, total revenue, avg order value, daily trend
Note: amount may be NULL — handle appropriately. Only status='completed' orders.
Prefer CTEs over subqueries. No nested queries beyond 2 levels.

The more context, the better the SQL.

Rule 3: Build a Review Checklist

I keep a Markdown checklist pinned in my project wiki:
- [ ] All columns come from correct tables
- [ ] JOIN conditions are complete and precise
- [ ] GROUP BY includes all non-aggregated SELECT columns
- [ ] NULL handling is explicit
- [ ] Type conversions are correct
- [ ] Time ranges account for timezone and boundaries
- [ ] WHERE conditions can leverage indexes
- [ ] No SELECT *
- [ ] Large queries have LIMIT or pagination
- [ ] DELETE/UPDATE has a COUNT preview first

Rule 4: Make AI Review Its Own SQL

A trick I discovered recently: after AI generates SQL, ask it to review its own output.

"Please review the SQL you just generated:
1. Are there any performance issues?
2. Any logical gaps?
3. Can you write a better version?"

AI is surprisingly good at catching its own mistakes — because it "sees" the full text during review, whereas generation is token-by-token with limited look-ahead.

Rule 5: Test on a Sample First

For any query on production-sized data, I first run it with a small LIMIT or on a subset. If the sample looks right, I remove the limit and run for real. This catches 90% of logic errors before they burn CPU cycles.

Rule 6: Version Control Your SQL

AI-generated SQL goes through the same review process as hand-written code. Pull request, code review, merge. No direct-to-production queries, even for "simple" reports.

Rule 7: Know When Not to Trust AI

Scenario	AI Reliability	My Approach
Simple CRUD	★★★★★	Quick glance, run it
Single-table aggregation	★★★★☆	Check aggregation logic
Multi-table JOIN (2-3 tables)	★★★☆☆	Careful JOIN review
Complex analytical queries	★★☆☆☆	Validate each CTE, sample data first
Production DDL / data mutations	★☆☆☆☆	Write manually, AI only assists

Will AI Replace Data Engineers?

After all these crash stories, you might think I'm anti-AI. The opposite is true: 80% of my SQL is now AI-generated.

The key word is "generated," not "trusted."

AI is an incredibly efficient first-draft generator — but not yet a reliable final deliverable. It compressed my 8-hour workdays into 2 hours, but that remaining hour must be spent reviewing, testing, and optimizing.

AI writing SQL in 2026 is like a brilliant but green junior engineer — quick, creative, and completely unaware of which patterns cause production incidents.

Our job isn't to "let AI replace our SQL writing." It's to be the safety net beneath AI.

Six months in, my BI project hasn't crashed. The data hasn't had a major error. The team's efficiency has improved 2-3x. Every crash I've catalogued here was painful in the moment, but each one taught me something about the tool's limits — and my own blind spots.

AI's ability to write SQL will only improve. But for now, every time I'm about to press "Execute," I ask myself one question:

"Do I truly understand every line of this query?"

If yes — run it.

If no — read it again until I do.