From the AI-Augmented Python sample test
What shape-mismatch bugs does AI-generated NumPy code typically introduce?
NumPy broadcasting is one of the most powerful and most
misused features in scientific Python. AI coding assistants
produce broadcasting bugs at a measurable rate because the
training corpus mixes idiomatic vectorized code with shape
mistakes that the assistant has no way to validate without
running the code. The dangerous case isn’t the bug that raises
ValueError: operands could not be broadcast together — that
one fails loudly. The dangerous case is the shape mismatch
that happens to broadcast, producing a silently wrong result
of the wrong shape. This item probes whether a candidate can
read AI-generated NumPy code, recognize the silent
broadcasting hazard, and apply a fix.
What this question tests
The concept is NumPy's broadcasting rules and the asymmetry between failures that raise and failures that silently succeed. NumPy aligns array shapes from the rightmost dimension. Two arrays broadcast if, for each aligned dimension, either the dimensions are equal or one of them is 1. If neither condition holds, broadcasting fails with a ValueError. The hazard is that a (10,) 1-D array and a (10, 1) 2-D array broadcast to (10, 10), not to (10,): a subtle mistake when one of the operands was supposed to be a column vector but was flattened upstream.
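A minimal REPL-style sketch of that hazard (the values are arbitrary; only the shapes matter):

import numpy as np

a = np.ones(10)          # shape: (10,)
col = np.ones((10, 1))   # shape: (10, 1)
print((a + a).shape)     # (10,)    -- elementwise, as intended
print((a + col).shape)   # (10, 10) -- broadcasts "successfully" into a matrix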
AI tools reproduce these bugs because broadcasting is shape-driven, and the assistant only sees code, not runtime shapes. When a prompt asks for “subtract the mean of each row from each row,” the AI may produce code that’s correct only if the input shapes match its assumption — and that assumption is often unstated in the prompt. The candidate’s job is to read the AI suggestion, check shapes mentally, and either fix the code or add an explicit reshape.
Why this is the right answer
The correct option identifies the silent shape-broadcasting
mismatch and proposes an explicit reshape (or a keepdims=True
on the upstream reduction) that constrains the broadcast to
the intended shape. Here’s the canonical AI-generated bug:
# AI-generated: row-mean subtraction, looks vectorized
import numpy as np

def center_rows(X):
    """Subtract row mean from each row of X (shape: n_rows x n_cols)."""
    row_means = X.mean(axis=1)  # shape: (n_rows,)
    return X - row_means        # shape: (n_rows, n_rows) !
The bug: X.mean(axis=1) returns a 1-D array of shape
(n_rows,). When subtracted from X of shape
(n_rows, n_cols), broadcasting aligns the rightmost dimension
— so (n_rows,) aligns to the n_cols axis of X. If
n_rows == n_cols (a square matrix), the operation broadcasts
“successfully” and returns a square matrix of completely wrong
values. If n_rows != n_cols, the operation raises
ValueError. So the bug surfaces or hides depending on input
shape.
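A short sketch of that behavior, assuming the buggy center_rows above is in scope:

import numpy as np

X_square = np.arange(9, dtype=float).reshape(3, 3)
print(center_rows(X_square).shape)  # (3, 3) -- runs, but the values are wrong

X_rect = np.arange(12, dtype=float).reshape(3, 4)
try:
    center_rows(X_rect)
except ValueError as e:
    print(e)  # operands could not be broadcast together ...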
The fix uses keepdims=True (or an explicit reshape) so the
mean has shape (n_rows, 1), which broadcasts correctly
against (n_rows, n_cols):
def center_rows(X):
    row_means = X.mean(axis=1, keepdims=True)  # shape: (n_rows, 1)
    return X - row_means                       # shape: (n_rows, n_cols)
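A quick check of the fixed version, assuming it is in scope: every centered row now has mean zero, and non-square input works too.

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
print(center_rows(X).shape)         # (2, 3)
print(center_rows(X).mean(axis=1))  # [0. 0.]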
keepdims=True is the modern NumPy idiom for “keep the reduced
axis as a length-1 dimension so subsequent broadcasting matches
the original layout.” The same pattern works for np.sum,
np.std, np.max, and most other reductions. Without
keepdims, you’d write row_means[:, np.newaxis] or
row_means.reshape(-1, 1) — both work, but keepdims=True is
cleaner and signals intent.
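A minimal sketch of the three equivalent spellings on a small 2-D array:

import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)
a = X.mean(axis=1, keepdims=True)  # (2, 1) -- intent stays on the reduction
b = X.mean(axis=1)[:, np.newaxis]  # (2, 1) -- reinsert the axis afterwards
c = X.mean(axis=1).reshape(-1, 1)  # (2, 1) -- reshape to an explicit column
assert np.array_equal(a, b) and np.array_equal(a, c)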
A related AI-generated bug is the implicit-flatten:
# AI-generated: image-mean subtraction, looks fine for grayscale
def normalize(images):
    """Normalize a stack of images (shape: batch x H x W)."""
    mean = images.mean(axis=0)             # shape: (H, W)
    return (images - mean) / images.std()  # full-array std, not per-image!
The bug: images.std() with no axis returns a scalar, not a
per-image std. The AI suggestion looks like a vectorized
normalization but actually normalizes by a scalar that may not
be what the prompt intended. The fix is explicit axis handling
on every reduction.
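One possible fix, assuming the prompt intended per-image statistics (a sketch, not the only reasonable reading; note it also switches the mean from across-batch to per-image):

import numpy as np

def normalize(images):
    """Normalize each image in a (batch, H, W) stack to zero mean, unit std."""
    mean = images.mean(axis=(1, 2), keepdims=True)  # shape: (batch, 1, 1)
    std = images.std(axis=(1, 2), keepdims=True)    # shape: (batch, 1, 1)
    return (images - mean) / std                    # broadcasts per image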
What the wrong answers reveal
The three incorrect options each map to a common gap:
- “The code is correct because numpy broadcasting handles shape alignment automatically.” Respondents picking this option are taking AI suggestions at face value and have not internalized that “broadcasting succeeds” doesn’t mean “broadcasting did the intended thing.” This is the highest-risk gap.
- “The fix is to wrap the entire function in np.atleast_2d to ensure 2-D inputs.” Respondents picking this option recognize a shape problem but reach for an upstream workaround that doesn’t actually fix the reduction-axis issue. np.atleast_2d ensures the input is 2-D but doesn’t change how .mean(axis=1) reduces.
- “The fix is to add a runtime assertion assert row_means.shape == X.shape[:1] to catch the mismatch.” Respondents picking this option recognize the shape problem and add an assertion, but the assertion would pass: the shapes do match what the buggy line produces. The respondent is debugging the wrong layer of the problem (see the sketch after this list).
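A minimal sketch of why that assertion cannot catch the bug, reusing the buggy reduction from above:

import numpy as np

X = np.arange(9, dtype=float).reshape(3, 3)
row_means = X.mean(axis=1)             # shape: (3,) -- the buggy line
assert row_means.shape == X.shape[:1]  # passes: (3,) == (3,)
result = X - row_means                 # plausible shape (3, 3), wrong values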
The first wrong-answer pattern is the costliest in production. A candidate who trusts AI numpy output without mentally tracing shapes will silently produce wrong-shape arrays that propagate downstream — including into ML models, where silently wrong shapes cause silently wrong loss values and silently wrong gradients.
How the sample test scores you
In the AIEH 5-question AI-Augmented Python sample, this item contributes one of five datapoints aggregated into a single ai_python_proficiency score via the W3.2 normalize-by-count threshold. Scoring per item is binary: 5 for the correct option, 1 for any of the three wrong options. With five binary items, the average ranges from 1 to 5, and the level thresholds map avg ≤ 2 to low, 2 < avg ≤ 4 to mid, and avg > 4 to high.
Data Notice: Sample-test results are directional. A 5-question sample can flag general NumPy shape-judgment skill but can’t distinguish a candidate who knows NumPy broadcasting deeply from one who recognized this specific pattern; for a verified Skills Passport credential, take the full AI-Augmented Python assessment.
The full assessment probes NumPy broadcasting, dtype promotion, memory layout, in-place vs out-of-place operations, and the specific failure modes AI coding assistants produce in numerical code. See the scoring methodology for how scores roll up to the AIEH 300–850 Skills Passport scale, and /assess/ for how teams use these scores in hiring.
Related concepts
- np.einsum as a shape-explicit alternative. When broadcasting gets ambiguous, np.einsum lets you write the reduction as an explicit index equation ('ij,j->i' etc.) that documents shape intent in the source. AI tools rarely reach for einsum even when it would be clearer (see the sketch after this list).
- PyTorch and JAX broadcasting parity. Both PyTorch and JAX inherit NumPy’s broadcasting rules almost exactly, so the same AI-generated shape bugs appear in deep-learning code with the same hazard profile. Skill transfers across the three libraries.
- Shape-checking tools and gradual typing. Libraries like jaxtyping, torchtyping, and beartype add runtime or static shape annotations that catch broadcasting bugs at function boundaries. AI tools rarely produce shape-annotated code unprompted.
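A minimal einsum sketch of the row-mean reduction; the index equation names the reduced axis in the source:

import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)
row_means = np.einsum('ij->i', X) / X.shape[1]  # sum over j, keep i: shape (2,)
centered = X - row_means[:, np.newaxis]         # column-reshape before broadcasting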
For deeper coverage of numerical-Python interview judgment, see ml engineering interview prep and algorithms data structures prep. Hiring teams can find verified AI-Augmented Python candidates at /hire/, and learners can prepare via /learn/ and skills-based hiring evidence. For broader context on the credential, start at cognitive ability in hiring.
Sources
- NumPy Developers. (2024). NumPy User Guide: Broadcasting. https://numpy.org/doc/stable/user/basics.broadcasting.html
- NumPy Developers. (2024). numpy.ndarray.mean and the keepdims parameter. https://numpy.org/doc/stable/reference/generated/numpy.mean.html
- Harris, C. R., et al. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2 — Foundational paper documenting numpy’s array semantics including broadcasting.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. — Establishes structured code-review work samples (AIEH’s item format) as a high-validity selection method.