When does selecting a non-aggregated column with GROUP BY produce wrong results across dialects?

The item shows a query that selects a column not in the GROUP BY list and not wrapped in an aggregate, and asks how the query behaves across PostgreSQL, MySQL (with and without ONLY_FULL_GROUP_BY), and SQL Server. The competence under test is whether the candidate understands that the same SQL text produces an error on some engines, a deterministic result on others, and a non-deterministic result on still others — and whether they recognize this as one of the most insidious AI-generated SQL bugs because the AI’s training data conflates all three behaviors.

What this question tests

The concept being tested is the SQL standard’s “functionally dependent” rule for GROUP BY and the inconsistent way different engines enforce it. Under the standard (SQL:1999 and later), every column in the SELECT list of a query with GROUP BY must either appear in the GROUP BY clause, be wrapped in an aggregate function, or be functionally dependent on a column in the GROUP BY clause (typically because the grouped column is a primary key of the table the selected column comes from).

PostgreSQL implements the rule strictly and rejects queries that violate it at parse time. MySQL ships with ONLY_FULL_GROUP_BY enabled by default since 5.7.5 — also rejecting violations — but the setting can be turned off, and older codebases routinely run with it disabled. With the setting off, MySQL accepts the query and returns a value from some row in the group, with no guarantee about which row. SQL Server strictly enforces the rule and rejects the query. SQLite is permissive by default and returns a value from some row.

AI assistants generate queries that work on the engine referenced most heavily in their training data and silently break on others. The probe tests whether the candidate recognizes the portability failure mode.

Why this is the right answer

Consider an AI-generated query intended to find the highest-paid employee per department:

-- AI-generated: violates standard GROUP BY rules
SELECT
  department_id,
  employee_name,
  MAX(salary) AS top_salary
FROM employees
GROUP BY department_id;

The intent appears to be “show the name of the employee earning the maximum salary in each department.” The query does not actually do this — employee_name is not in the GROUP BY clause, not aggregated, and not functionally dependent on department_id. On PostgreSQL the query errors with column "employees.employee_name" must appear in the GROUP BY clause or be used in an aggregate function. On MySQL with ONLY_FULL_GROUP_BY enabled (the default since 5.7.5), the same error. On MySQL with the setting off, the query runs and returns some employee name from each department — which employee is implementation-defined and may change between runs. On SQL Server the query errors. On SQLite it runs and returns some row.

The right answer recognizes this and articulates the standard fix: a self-join, window function, or subquery that explicitly links the name to the maximum salary.

-- Window-function fix: deterministic, portable
SELECT department_id, employee_name, salary AS top_salary
FROM (
  SELECT
    department_id,
    employee_name,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department_id
      ORDER BY salary DESC
    ) AS rn
  FROM employees
) ranked
WHERE rn = 1;
-- Self-join fix: also portable, sometimes faster on small data
SELECT e.department_id, e.employee_name, e.salary
FROM employees e
JOIN (
  SELECT department_id, MAX(salary) AS max_salary
  FROM employees
  GROUP BY department_id
) m
  ON m.department_id = e.department_id
 AND m.max_salary = e.salary;

Both forms are explicit about how employee_name is selected, work the same way across dialects, and handle the tie case (window function picks one row arbitrarily; self-join returns all tied rows) deterministically.

What the wrong answers reveal

The three incorrect options each map to a common but mistaken mental model:

  • “The query is valid SQL and returns the name of the highest-paid employee per department.” This is the most damaging misreading because it appears to work on permissive MySQL configurations. Respondents picking this option have internalized a non-standard MySQL mental model and misattribute the “first row” result to the MAX(salary) aggregate, when in fact the engine returned an arbitrary row.
  • “The query is invalid SQL on every engine and always raises a parse error.” This is true under standard-strict configurations but not universally. MySQL with ONLY_FULL_GROUP_BY off, SQLite, and older MySQL versions all accept the query. Respondents picking this option know the standard but underestimate how often production code runs on permissive configurations.
  • “The query works correctly on every engine because the MAX aggregate forces the engine to pick the row with the maximum salary.” This is the conceptual error at the heart of the bug — MAX(salary) computes the maximum salary per group, but does not constrain which row’s employee_name is selected. The two columns are evaluated independently. Respondents picking this option have misunderstood how aggregates and non-aggregated columns interact.

The wrong-answer patterns cluster into three distinct misreadings: trusting permissive-MySQL behavior, over- generalizing strict-mode rejection, and conflating aggregate evaluation with row selection.

How the sample test scores you

In the AIEH 5-question AI-Augmented SQL sample, this item contributes one of five datapoints aggregated into a single ai_augmented_sql score via the W3.2 normalize-by-count threshold. Binary scoring per item: 5 for the correct option, 1 for any of the three wrong options. With 5 binary items, the average ranges 1–5 and the level threshold maps avg ≤ 2 to low, ≤ 4 to mid, > 4 to high.

Data Notice: Sample-test results are directional indicators only. A 5-question sample can’t reliably distinguish “knows standard GROUP BY rules” from “got lucky on these specific items”; for a verified Skills Passport credential, take the full 50-question assessment.

The full assessment probes GROUP BY semantics, functional dependency detection (PostgreSQL allows non-grouped columns that are functionally dependent on a primary-key grouping column), HAVING versus WHERE semantics, and grouping with ROLLUP, CUBE, and GROUPING SETS. See the scoring methodology for how AI-Augmented SQL scores map onto the AIEH 300–850 Skills Passport scale.

  • Functional dependency exemption. SQL:1999 specifies that a non-aggregated column is permitted in SELECT if it is functionally dependent on a column in GROUP BY. PostgreSQL implements this for primary-key dependencies: GROUP BY primary_key permits selecting any other column from the same table without aggregation. MySQL 5.7.5+ implements a similar but narrower rule. Recognizing when this exemption applies is part of intermediate SQL fluency.
  • ANY_VALUE as an explicit “I don’t care” idiom. MySQL and BigQuery provide ANY_VALUE(column) to explicitly tell the engine “return any value from the group; I accept the non-determinism.” This is the right way to write the permissive MySQL behavior intentionally, and it makes the intent visible in code review. AI tools rarely emit ANY_VALUE; reviewers can suggest it as the explicit form.
  • DISTINCT ON (PostgreSQL). PostgreSQL’s SELECT DISTINCT ON (department_id) employee_name, salary FROM employees ORDER BY department_id, salary DESC expresses “for each department, return the row with the highest salary” directly, without the window-function or self-join machinery. It’s a non-standard extension but often the clearest way to express the intent on PostgreSQL.
  • The GROUP BY rule extends to ORDER BY in some dialects. Several engines reject ORDER BY non_aggregated in a grouped query for the same reason. AI tools often emit this pattern when sorting aggregated results, producing another portability surprise.

For the broader AI-Augmented SQL lineup, see the tests catalog. For how SQL skill probes fit into a hiring loop, see skills-based hiring evidence and cognitive ability in hiring.


Sources

  • ISO/IEC 9075-2:2023. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation). International Organization for Standardization. — Functional dependency rules for GROUP BY defined in clause 7.12.
  • MySQL Documentation. (2024). MySQL Handling of GROUP BY: ONLY_FULL_GROUP_BY. https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html
  • PostgreSQL Global Development Group. (2024). PostgreSQL Documentation: GROUP BY Clause. https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY
  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. — Job-knowledge tests of the kind used to probe SQL fluency show validity coefficients in the .48–.51 range against job performance criteria.

Try the question yourself

This explainer covers what the item measures. To see how you score on the full ai augmented sql family, take the free 5-question sample.

Take the ai augmented sql sample