How do you detect an implicit Cartesian join in AI-generated SQL?

The item presents a multi-table SQL query produced by an AI assistant and asks the candidate to identify whether it contains an implicit Cartesian join — a join with no predicate, or with a predicate that fails to constrain the relationship between the joined tables. The competence under test is the ability to read AI-generated SQL critically, recognize the row-explosion pattern before it ships to production, and articulate why the AI got it wrong. Implicit Cartesian joins are among the most expensive bugs in AI-suggested SQL because they often pass syntactic review, sometimes pass row-count smoke tests on small fixtures, and only manifest as a problem when a previously empty table acquires data.

What this question tests

The concept being tested is the distinction between a join that is syntactically well-formed and a join that semantically constrains the relationship between two tables. A query that lists multiple tables in the FROM clause without a connecting ON predicate, or whose ON predicate is always true (e.g., ON 1=1, or ON a.x = a.x), produces the Cartesian product — every row from one table paired with every row from the other. The result is correct under the SQL standard but almost never what the developer intended.

AI assistants generate implicit Cartesian joins in three recurring patterns: omitting the ON clause when joining a small dimension table that happens to have a single row in the test fixture; writing ON a.id = b.id when the foreign key actually lives on b.a_id; and using CROSS JOIN explicitly but forgetting to add the downstream WHERE predicate that constrains the relationship. The probe tests whether the candidate recognizes all three.
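A cheap first-line guardrail for the comma-join pattern is a static check in review tooling. A minimal sketch in Python — a regex heuristic only, not a real SQL parser, so it will false-positive on commas inside subqueries or table functions; the function name and approach are illustrative, not a specific linter's rule:

```python
import re

def flag_comma_join(sql: str) -> bool:
    """Naive heuristic: flag legacy comma-separated FROM clauses,
    which deserve a manual look for a missing join predicate.
    A real linter would parse the statement instead of pattern-matching."""
    # Grab the FROM clause up to the next major keyword (or end of statement).
    m = re.search(r"\bFROM\b(.*?)(?:\bWHERE\b|\bGROUP\b|\bORDER\b|;|$)",
                  sql, re.IGNORECASE | re.DOTALL)
    return bool(m) and "," in m.group(1)

# Legacy comma-join gets flagged; explicit JOIN syntax passes.
print(flag_comma_join("SELECT * FROM customers c, orders o WHERE c.region = 'NA'"))   # True
print(flag_comma_join("SELECT * FROM customers c JOIN orders o ON o.customer_id = c.customer_id"))  # False
```

A flag here is only a prompt for review — the comma syntax is legal and occasionally intentional, so the check should warn, not block.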

Why this is the right answer

Consider an AI-generated query that looks reasonable on first read:

-- AI-generated: missing the ON predicate entirely
SELECT
  c.customer_id,
  c.name,
  o.order_id,
  o.amount
FROM customers c, orders o
WHERE c.region = 'NA';

The comma-separated FROM clause is the legacy SQL-89 join syntax, which is still valid but produces the Cartesian product when no join predicate appears in the WHERE clause. With 10,000 customers in NA and 1,000,000 orders, this query returns 10 billion rows. The fix is to add the missing predicate:

-- Corrected: predicate connects the tables
SELECT
  c.customer_id,
  c.name,
  o.order_id,
  o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.region = 'NA';
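The row explosion is easy to reproduce on a toy fixture. A minimal Python sketch using sqlite3 — the table contents are hypothetical, chosen only to make the counts obvious:

```python
import sqlite3

# In-memory fixture (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'NA'), (2, 'Lin', 'NA'), (3, 'Bo', 'EU');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0), (13, 3, 9.0);
""")

# Buggy comma-join: nothing connects the two tables.
buggy = conn.execute(
    "SELECT c.customer_id, o.order_id FROM customers c, orders o "
    "WHERE c.region = 'NA'"
).fetchall()

# Corrected join: the predicate constrains the relationship.
fixed = conn.execute(
    "SELECT c.customer_id, o.order_id FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id "
    "WHERE c.region = 'NA'"
).fetchall()

# 2 NA customers x 4 orders = 8 rows from the Cartesian product,
# versus the 3 orders that actually belong to NA customers.
print(len(buggy), len(fixed))  # 8 3
```

The same two queries on the production row counts in the text would return 10 billion rows versus roughly one row per NA order — the fixture only shrinks the numbers, not the shape of the bug.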

The right answer recognizes the legacy comma-join syntax as a red flag and verifies that some predicate in the query constrains the join. A subtler variant uses explicit JOIN but with an always-true predicate:

-- AI-generated: ON clause is syntactically valid but
-- semantically vacuous
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON c.region = 'NA';

Here the ON clause filters customers by region but does not join the two tables. Each NA customer is paired with every order, regardless of whether the order belongs to that customer. The query runs, produces output, and may even pass a smoke test that only checks for non-zero rows — but the result is nonsense. The right answer recognizes that the ON predicate must reference columns from both joined tables to constrain the relationship.
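The vacuous-ON variant behaves the same way at runtime. A sqlite3 sketch on a small hypothetical dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'NA'), (2, 'Lin', 'NA'), (3, 'Bo', 'EU');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 3.0), (12, 3, 9.0);
""")

# Vacuous ON: it filters customers by region but never references
# the orders table, so each NA customer pairs with every order.
rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON c.region = 'NA'"
).fetchall()

print(len(rows))  # 6: 2 NA customers x 3 orders
```

The query runs without error and returns plausible-looking rows, which is exactly why the review-time check — does the ON predicate reference both tables? — matters.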

A third diagnostic pattern: the query references a join column that exists on neither table, but the AI has confidently hallucinated it. Reading the schema (or running EXPLAIN) catches this; trusting the AI’s output does not.
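Engines that resolve column names at prepare time reject the hallucinated column outright, which is why preparing the query (or running EXPLAIN) against a schema copy is a cheap check. A sqlite3 sketch with a hypothetical cust_ref column that exists on neither table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
""")

# The AI-written join references cust_ref, which no table defines.
err = None
try:
    conn.execute(
        "SELECT c.name FROM customers c "
        "JOIN orders o ON o.cust_ref = c.cust_ref"
    )
except sqlite3.OperationalError as e:
    err = str(e)

print(err)  # prints something like: no such column: o.cust_ref
```

The failure happens at prepare time, before any rows flow — far cheaper than discovering the hallucination in production output.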

What the wrong answers reveal

The three incorrect options each map to a common but mistaken mental model:

  • “The query is fine because it returns rows when run on a test fixture.” Empty tables, single-row fixtures, and permissive smoke tests all hide Cartesian joins. The Cartesian product of two empty tables is empty; the Cartesian product of a 1-row table and a 100-row table looks identical to a correct join in row count. Respondents picking this option have a smoke-test mental model and haven’t internalized that row-count checks don’t catch this class of bug.
  • “Cartesian joins are obvious — they always show up as CROSS JOIN in the AI output.” This is true for explicit Cartesian joins. The hard cases are implicit ones where the AI used comma-join syntax or wrote a vacuous ON clause. Respondents picking this option recognize the explicit form but miss the implicit forms that AI tools more commonly produce.
  • “Modern query optimizers reject Cartesian joins; if the query runs, it must be correctly joined.” Optimizers do warn about Cartesian products in some plan-display tools, but they do not reject the query — the result of a Cartesian product is well-defined and sometimes intentional (e.g., generating a date series cross-joined to a category list). Respondents picking this option conflate “the optimizer warns about it” with “the optimizer prevents it.”

The wrong-answer patterns cluster into three distinct misreadings of how Cartesian joins manifest in production code.

How the sample test scores you

In the AIEH 5-question AI-Augmented SQL sample, this item contributes one of five datapoints aggregated into a single ai_augmented_sql score via the W3.2 normalize-by-count threshold. Scoring is binary per item: 5 points for the correct option, 1 point for any of the three wrong options. With 5 binary items the average ranges from 1 to 5, and the level thresholds map avg ≤ 2 to low, 2 < avg ≤ 4 to mid, and avg > 4 to high.

Data Notice: Sample-test results are directional indicators only. A 5-question sample can’t reliably distinguish a candidate who reads AI SQL critically from one who got lucky on these specific items; for a verified Skills Passport credential, take the full 50-question assessment.

The full assessment probes Cartesian-join detection across multiple syntax variants (comma-join, vacuous ON, missing multi-column key, hallucinated column), tests the candidate’s ability to read EXPLAIN output, and includes adversarial items where the AI suggestion is correct and the candidate must resist over-rejecting. See the scoring methodology for how AI-Augmented SQL scores map onto the AIEH 300–850 Skills Passport scale.

Related patterns and edge cases

  • Composite-key joins. When the foreign-key relationship spans multiple columns (e.g., (tenant_id, user_id)), AI tools often join on only one of the columns, producing a partial Cartesian explosion within each tenant. The fix is to verify that the ON clause covers every column of the composite key, not just the most commonly referenced one.
  • USING clause as a Cartesian-prevention idiom. Writing JOIN orders USING (customer_id) requires both tables to have a column named customer_id; if the AI has hallucinated the column name on one side, the query fails at parse time rather than producing nonsense at runtime. Some teams standardize on USING specifically to make this class of bug fail loudly.
  • Outer-join asymmetry. A LEFT JOIN with a missing or vacuous predicate doesn’t produce a Cartesian product in the same way — it produces every row from the left table paired with every matching row from the right (or NULL when no match). The bug manifests as duplicated left-table rows rather than as an N×M explosion, which makes it harder to catch by row-count alone.
  • EXPLAIN plan inspection as the canonical fix. Most major engines surface a nested-loop or cross-join node with no join condition in the plan output for a Cartesian join. Reading the plan is the cheapest way to verify that an AI-suggested query joins the way the candidate expects.
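The composite-key case is worth seeing concretely. A sqlite3 sketch with a hypothetical (tenant_id, user_id) schema where the same user_id recurs across tenants:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (tenant_id INTEGER, user_id INTEGER, name TEXT);
CREATE TABLE logins (tenant_id INTEGER, user_id INTEGER, ts TEXT);
-- user_id 1 exists in two different tenants
INSERT INTO users  VALUES (1, 1, 'Ada'), (2, 1, 'Lin');
INSERT INTO logins VALUES (1, 1, 't1'), (2, 1, 't2');
""")

# Joining on only half of the composite key pairs each user with
# logins from every tenant that shares the user_id.
partial = conn.execute(
    "SELECT u.name, l.ts FROM users u "
    "JOIN logins l ON l.user_id = u.user_id"
).fetchall()

# Covering the full (tenant_id, user_id) key removes the explosion.
full = conn.execute(
    "SELECT u.name, l.ts FROM users u "
    "JOIN logins l ON l.tenant_id = u.tenant_id "
    "AND l.user_id = u.user_id"
).fetchall()

print(len(partial), len(full))  # 4 2
```

With realistic tenant counts the partial join multiplies rows by the number of tenants sharing each user_id — a quieter explosion than the full N×M product, and correspondingly harder to spot by eyeball.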

For the broader AI-Augmented SQL lineup, see the tests catalog. For how this skill fits into a hiring loop, see hiring loop design and interview question design.


Sources

  • ISO/IEC 9075-2:2023. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation). International Organization for Standardization. — Cross product semantics in clause 7.7 (joined table).
  • PostgreSQL Global Development Group. (2024). PostgreSQL Documentation: SELECT — FROM Clause. https://www.postgresql.org/docs/current/sql-select.html
  • Oracle Corporation. (2023). Oracle Database SQL Language Reference: Joins. https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Joins.html
  • Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450. — Reviews evidence that work-sample tests (of which AI-augmented code review is a contemporary form) show among the highest validity coefficients for predicting job performance in cognitively demanding roles.

Try the question yourself

This explainer covers what the item measures. To see how you score on the full AI-Augmented SQL family, take the free 5-question sample.

Take the AI-Augmented SQL sample