From the AI-Augmented Python sample test
What shape-mismatch bugs does AI-generated NumPy code typically introduce?
NumPy broadcasting is one of the most powerful and most
misused features in scientific Python. AI coding assistants
produce broadcasting bugs at a measurable rate because the
training corpus mixes idiomatic vectorized code with shape
mistakes that the assistant has no way to validate without
running the code. The dangerous case isn’t the bug that raises
ValueError: operands could not be broadcast together — that
one fails loudly. The dangerous case is the shape mismatch
that happens to broadcast, producing a silently wrong result
of the wrong shape. This item probes whether a candidate can
read AI-generated NumPy code, recognize the silent
broadcasting hazard, and apply a fix.
What this question tests
The concept is NumPy's broadcasting rules and the asymmetry between failures that raise and failures that silently succeed. NumPy aligns array shapes from the rightmost dimension. Two arrays broadcast if, for each aligned dimension, either the dimensions are equal or one of them is 1. If neither condition holds, broadcasting fails with a ValueError. The hazard is that a (10,) 1-D array and a (10, 1) 2-D array broadcast to (10, 10), not to (10,): a subtle mistake when one of the operands was supposed to be a column vector but was flattened upstream.
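A minimal REPL-style sketch of that hazard (the values are arbitrary; only the shapes matter):

import numpy as np

a = np.ones(10)          # shape: (10,)
col = np.ones((10, 1))   # shape: (10, 1)
print((a + a).shape)     # (10,)    -- elementwise, as intended
print((a + col).shape)   # (10, 10) -- broadcasts "successfully" into a matrix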
AI tools reproduce these bugs because broadcasting is shape-driven, and the assistant only sees code, not runtime shapes. When a prompt asks for “subtract the mean of each row from each row,” the AI may produce code that’s correct only if the input shapes match its assumption — and that assumption is often unstated in the prompt. The candidate’s job is to read the AI suggestion, check shapes mentally, and either fix the code or add an explicit reshape.
Why this is the right answer
The correct option identifies the silent shape-broadcasting
mismatch and proposes an explicit reshape (or a keepdims=True
on the upstream reduction) that constrains the broadcast to
the intended shape. Here’s the canonical AI-generated bug:
# AI-generated: row-mean subtraction, looks vectorized
import numpy as np

def center_rows(X):
    """Subtract row mean from each row of X (shape: n_rows x n_cols)."""
    row_means = X.mean(axis=1)  # shape: (n_rows,)
    return X - row_means        # shape: (n_rows, n_rows) !
The bug: X.mean(axis=1) returns a 1-D array of shape
(n_rows,). When subtracted from X of shape
(n_rows, n_cols), broadcasting aligns the rightmost dimension
— so (n_rows,) aligns to the n_cols axis of X. If
n_rows == n_cols (a square matrix), the operation broadcasts
“successfully” and returns a square matrix of completely wrong
values. If n_rows != n_cols, the operation raises
ValueError. So the bug surfaces or hides depending on input
shape.
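A short sketch of that behavior, assuming the buggy center_rows above is in scope:

import numpy as np

X_square = np.arange(9, dtype=float).reshape(3, 3)
print(center_rows(X_square).shape)  # (3, 3) -- runs, but the values are wrong

X_rect = np.arange(12, dtype=float).reshape(3, 4)
try:
    center_rows(X_rect)
except ValueError as e:
    print(e)  # operands could not be broadcast together ...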
The fix uses keepdims=True (or an explicit reshape) so the
mean has shape (n_rows, 1), which broadcasts correctly
against (n_rows, n_cols):
def center_rows(X):
    row_means = X.mean(axis=1, keepdims=True)  # shape: (n_rows, 1)
    return X - row_means                       # shape: (n_rows, n_cols)
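A quick check of the fixed version, assuming it is in scope: every centered row now has mean zero, and non-square input works too.

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
print(center_rows(X).shape)         # (2, 3)
print(center_rows(X).mean(axis=1))  # [0. 0.]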
keepdims=True is the modern NumPy idiom for “keep the reduced
axis as a length-1 dimension so subsequent broadcasting matches
the original layout.” The same pattern works for np.sum,
np.std, np.max, and most other reductions. Without
keepdims, you’d write row_means[:, np.newaxis] or
row_means.reshape(-1, 1) — both work, but keepdims=True is
cleaner and signals intent.
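A minimal sketch of the three equivalent spellings on a small 2-D array:

import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)
a = X.mean(axis=1, keepdims=True)  # (2, 1) -- intent stays on the reduction
b = X.mean(axis=1)[:, np.newaxis]  # (2, 1) -- reinsert the axis afterwards
c = X.mean(axis=1).reshape(-1, 1)  # (2, 1) -- reshape to an explicit column
assert np.array_equal(a, b) and np.array_equal(a, c)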
A related AI-generated bug is the implicit-flatten:
# AI-generated: image-mean subtraction, looks fine for grayscale
def normalize(images):
    """Normalize a stack of images (shape: batch x H x W)."""
    mean = images.mean(axis=0)             # shape: (H, W)
    return (images - mean) / images.std()  # full-array std, not per-image!
The bug: images.std() with no axis returns a scalar, not a
per-image std. The AI suggestion looks like a vectorized
normalization but actually normalizes by a scalar that may not
be what the prompt intended. The fix is explicit axis handling
on every reduction.
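One possible fix, assuming the prompt intended per-image statistics (a sketch, not the only reasonable reading; note it also switches the mean from across-batch to per-image):

import numpy as np

def normalize(images):
    """Normalize each image in a (batch, H, W) stack to zero mean, unit std."""
    mean = images.mean(axis=(1, 2), keepdims=True)  # shape: (batch, 1, 1)
    std = images.std(axis=(1, 2), keepdims=True)    # shape: (batch, 1, 1)
    return (images - mean) / std                    # broadcasts per image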
What the wrong answers reveal
The three incorrect options each map to a common gap:
- “The code is correct because numpy broadcasting handles shape alignment automatically.” Respondents picking this option are taking AI suggestions at face value and have not internalized that “broadcasting succeeds” doesn’t mean “broadcasting did the intended thing.” This is the highest-risk gap.
- “The fix is to wrap the entire function in np.atleast_2d to ensure 2-D inputs.” Respondents picking this option recognize a shape problem but reach for an upstream workaround that doesn’t actually fix the reduction-axis issue. np.atleast_2d ensures the input is 2-D but doesn’t change how .mean(axis=1) reduces.
- “The fix is to add a runtime assertion assert row_means.shape == X.shape[:1] to catch the mismatch.” Respondents picking this option recognize the shape problem and add an assertion, but the assertion would pass: the shapes do match what the buggy line produces. The respondent is debugging the wrong layer of the problem (see the sketch after this list).
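A minimal sketch of why that assertion cannot catch the bug, reusing the buggy reduction from above:

import numpy as np

X = np.arange(9, dtype=float).reshape(3, 3)
row_means = X.mean(axis=1)             # shape: (3,) -- the buggy line
assert row_means.shape == X.shape[:1]  # passes: (3,) == (3,)
result = X - row_means                 # plausible shape (3, 3), wrong values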
The first wrong-answer pattern is the costliest in production. A candidate who trusts AI numpy output without mentally tracing shapes will silently produce wrong-shape arrays that propagate downstream — including into ML models, where silently wrong shapes cause silently wrong loss values and silently wrong gradients.
How the sample test scores you
In the AIEH 5-question AI-Augmented Python sample, this item contributes one of five datapoints aggregated into a single ai_python_proficiency score via the W3.2 normalize-by-count threshold. Scoring per item is binary: 5 for the correct option, 1 for any of the three wrong options. With five binary items, the average ranges from 1 to 5, and the level thresholds map avg ≤ 2 to low, 2 < avg ≤ 4 to mid, and avg > 4 to high.
Data Notice: Sample-test results are directional. A 5-question sample can flag general NumPy shape-judgment skill but can’t distinguish a candidate who knows NumPy broadcasting deeply from one who recognized this specific pattern; for a verified Skills Passport credential, take the full AI-Augmented Python assessment.
The full assessment probes NumPy broadcasting, dtype promotion, memory layout, in-place vs out-of-place operations, and the specific failure modes AI coding assistants produce in numerical code. See the scoring methodology for how scores roll up to the AIEH 300–850 Skills Passport scale, and /assess/ for how teams use these scores in hiring.
Related concepts
- np.einsum as a shape-explicit alternative. When broadcasting gets ambiguous, np.einsum lets you write the reduction as an explicit index equation ('ij,j->i' etc.) that documents shape intent in the source. AI tools rarely reach for einsum even when it would be clearer (see the sketch after this list).
- PyTorch and JAX broadcasting parity. Both PyTorch and JAX inherit NumPy’s broadcasting rules almost exactly, so the same AI-generated shape bugs appear in deep-learning code with the same hazard profile. Skill transfers across the three libraries.
- Shape-checking tools and gradual typing. Libraries like jaxtyping, torchtyping, and beartype add runtime or static shape annotations that catch broadcasting bugs at function boundaries. AI tools rarely produce shape-annotated code unprompted.
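A minimal einsum sketch of the row-mean reduction; the index equation names the reduced axis in the source:

import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)
row_means = np.einsum('ij->i', X) / X.shape[1]  # sum over j, keep i: shape (2,)
centered = X - row_means[:, np.newaxis]         # column-reshape before broadcasting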
For deeper coverage of numerical-Python interview judgment, see ml engineering interview prep and algorithms data structures prep. Hiring teams can find verified AI-Augmented Python candidates at /hire/, and learners can prepare via /learn/ and skills-based hiring evidence. For broader context on the credential, start at cognitive ability in hiring.
Sources
- NumPy Developers. (2024). NumPy User Guide: Broadcasting. https://numpy.org/doc/stable/user/basics.broadcasting.html
- NumPy Developers. (2024). numpy.ndarray.mean and the keepdims parameter. https://numpy.org/doc/stable/reference/generated/numpy.mean.html
- Harris, C. R., et al. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2 — Foundational paper documenting numpy’s array semantics including broadcasting.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. — Establishes structured code-review work samples (AIEH’s item format) as a high-validity selection method.