From the AI-Augmented Python sample test
Why does AI sometimes suggest mutating a list while iterating it, and what breaks?
A surprisingly common pattern in AI-suggested Python code is a loop that iterates a list while calling .remove(), .pop(), or del on the same list inside the loop body. The code looks plausible — even idiomatic — and frequently passes a hand-run test on a tiny example. Then it ships, runs against a realistic input, and silently skips half the elements. This item probes whether a candidate can read AI-generated code, recognize the iterator-invalidation pattern, and propose a correct replacement.
What this question tests
The concept is mutation of a sequence during iteration over that same sequence. Python’s list iterator advances by index internally; when an element is removed, every subsequent element shifts one position to the left, and the iterator’s next index now points past the element that took the removed element’s slot. The loop silently skips elements rather than raising — which is precisely what makes the bug dangerous.
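The mechanism is easy to demonstrate with plain integers (this toy example is illustrative, not part of the test item): when two targets sit next to each other, removing the first shifts the second into the freed slot, and the iterator walks right past it.

```python
# Minimal demonstration of silent skipping: two adjacent 1s,
# but only one gets removed.
nums = [1, 1, 2, 3]
for n in nums:
    if n == 1:
        nums.remove(n)   # shifts later elements left; iterator skips one
print(nums)  # [1, 2, 3] — the second 1 survived
```

No exception is raised at any point, which is exactly the danger: the loop completes and returns plausible-looking but wrong output.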
AI coding assistants reproduce this pattern frequently for two reasons. First, the training corpus contains a great deal of beginner-level Python tutorials where the buggy pattern is shown as a “natural” first attempt before being corrected. Second, the natural-language prompt “remove all items where X” maps onto the literal English sentence structure better than onto the correct list-comprehension or filter idiom. The candidate’s job is not to memorize that AI does this — it’s to read code critically and recognize the pattern regardless of who or what wrote it.
Why this is the right answer
The correct option identifies the iterator-invalidation problem and proposes a replacement that builds a new list rather than mutating the original. The canonical AI-generated bug looks like this:
```python
def remove_inactive(users):
    # AI-generated: looks fine, silently skips users
    for user in users:
        if not user.active:
            users.remove(user)
    return users
```
Run this on [A_inactive, B_inactive, C_active] and you get back [B_inactive, C_active] — the iterator skipped B because removing A shifted B into index 0 while the iterator advanced to index 1. The bug compounds with every consecutive inactive user, so test cases with isolated inactive users mask it; only realistic input surfaces the problem.
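Here is a runnable reproduction of that walkthrough; the User class and the names A, B, C are illustrative stand-ins, not part of the original item:

```python
# Reproduces the [A_inactive, B_inactive, C_active] scenario.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    active: bool

def remove_inactive(users):
    # The buggy AI-generated version under discussion
    for user in users:
        if not user.active:
            users.remove(user)
    return users

users = [User("A", False), User("B", False), User("C", True)]
result = remove_inactive(users)
print([u.name for u in result])  # ['B', 'C'] — B_inactive was skipped
```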
The correct fix builds a new list:
```python
def remove_inactive(users):
    return [user for user in users if user.active]
```
Or, if mutation is required for downstream identity reasons:
```python
def remove_inactive(users):
    users[:] = [user for user in users if user.active]
    return users
```
The list comprehension iterates the original, builds a new list of survivors, and either returns it or assigns it back to the original list’s slice. No iterator sees a list mutating underneath it.
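The identity point is worth seeing concretely. In this sketch (variable names are illustrative), slice assignment updates the list other code is already holding, while plain rebinding does not:

```python
# Why users[:] = ... matters when other code aliases the same list.
data = [1, 2, 3, 4]
alias = data                                  # downstream code holds this reference

data[:] = [x for x in data if x % 2 == 0]     # in-place replacement of contents
print(alias)   # [2, 4] — the alias sees the update

data = [x for x in data if x > 2]             # rebinding: a brand-new list object
print(alias)   # [2, 4] — the alias still points at the old object
```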
For dict and set, the equivalent bug raises RuntimeError: dictionary changed size during iteration rather than silently skipping — a kinder failure mode but the same underlying issue. The fix is the same shape: iterate a snapshot (list(d.items())) or build a new collection.
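Both fix shapes look like this for a dict (the example data is illustrative); deleting while iterating d directly would raise the RuntimeError mentioned above:

```python
# Fix 1: iterate a snapshot of the keys, then mutate safely.
d = {"a": 1, "b": 0, "c": 2}
for key in list(d):       # list(d) snapshots the keys up front
    if d[key] == 0:
        del d[key]
print(d)   # {'a': 1, 'c': 2}

# Fix 2: build a new dict, mirroring the list-comprehension fix.
d2 = {"a": 1, "b": 0, "c": 2}
d2 = {k: v for k, v in d2.items() if v != 0}
print(d2)  # {'a': 1, 'c': 2}
```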
What the wrong answers reveal
The three incorrect options each map to a common misunderstanding:
- “This code is correct; Python’s `for` loop handles mutation safely.” Respondents picking this option are taking the AI suggestion at face value and haven’t internalized iterator semantics. This is the most concerning gap because the candidate would ship the bug.
- “The code raises a `RuntimeError` because lists can’t be mutated during iteration.” Respondents picking this option are conflating dict/set behavior (which does raise) with list behavior (which silently skips). The mental model is half-correct but missing the asymmetry — and “raises” is a much friendlier failure mode than “silently produces wrong output,” so respondents holding this model would be less cautious in code review than they should be.
- “The fix is to iterate `range(len(users))` and decrement the index when removing.” Respondents picking this option recognize the iterator problem but propose a fragile manual-index workaround that often introduces off-by-one bugs of its own. It also misses the cleaner functional-style fix that the Python ecosystem strongly prefers.
The wrong-answer pattern that matters most is the first: the candidate who takes AI output at face value rather than reading it critically. That gap shows up in production as shipped bugs.
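For contrast, the manual-index approach from the third option can be made correct, but only with a while loop that deliberately holds the index in place after a deletion. A sketch of what that workaround actually demands (the data is illustrative):

```python
# Manual-index removal done correctly: do NOT advance the index
# after a deletion, because the next element slides into slot i.
nums = [1, 1, 2, 3]
i = 0
while i < len(nums):
    if nums[i] == 1:
        del nums[i]   # stay at i; a new element now occupies this slot
    else:
        i += 1
print(nums)  # [2, 3]
```

A for loop over range(len(nums)) cannot express "stay at i," which is exactly why this style breeds the off-by-one bugs the explanation warns about; the list comprehension avoids index bookkeeping entirely.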
How the sample test scores you
In the AIEH 5-question AI-Augmented Python sample, this item contributes one of five datapoints aggregated into a single ai_python_proficiency score via the W3.2 normalize-by-count threshold. Binary scoring per item: 5 for the correct option, 1 for any of the three wrong options. With 5 binary items, the average ranges 1–5 and the level threshold maps avg ≤ 2 to low, ≤ 4 to mid, > 4 to high.
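As a sketch of that arithmetic (the function name is hypothetical, not AIEH's actual implementation):

```python
# Illustrative sketch of the sample-test aggregation described above.
def level_from_items(item_scores):
    # Five binary items, each scored 5 (correct) or 1 (incorrect).
    avg = sum(item_scores) / len(item_scores)
    if avg <= 2:
        return "low"
    if avg <= 4:
        return "mid"
    return "high"

print(level_from_items([5, 5, 1, 5, 5]))  # avg 4.2 -> high
print(level_from_items([5, 1, 1, 1, 1]))  # avg 1.8 -> low
```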
Data Notice: Sample-test results are directional indicators only. A 5-question sample can’t reliably distinguish “knows AI-generated bug patterns” from “got lucky on these specific items”; for a verified Skills Passport credential, take the full assessment. See the scoring methodology for how AI-Augmented Python scores roll up to the AIEH 300–850 scale.
The full AI-Augmented Python assessment probes prompt-design hygiene, AI-output review, hallucinated-API detection, and the specific failure modes of common AI coding assistants at depth.
Related concepts
- Iterator invalidation in C++ and Java. The same family of bugs exists in C++ (`std::vector::erase` invalidates iterators past the erased element) and Java (`ConcurrentModificationException` from `ArrayList` mutation during iteration). Python’s silent-skip behavior is unusual; most other languages raise.
- `itertools.filterfalse` and the functional-style idiom. Modern Python style prefers building a new collection over mutating in place. The list comprehension `[x for x in xs if pred(x)]` and `itertools.filterfalse(neg_pred, xs)` are the canonical patterns.
- AI coding assistants as accelerated junior developers. Most AI-generated code is correct on simple inputs and wrong on adversarial inputs in roughly the same distribution as junior-developer code. The skill being tested is code review, not AI-prompt engineering. See AI fluency in hiring for how AIEH frames this skill.
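The two canonical idioms side by side, shown on plain integers (the predicate is illustrative). Note that filterfalse keeps elements where its predicate is false, so it takes the negated predicate:

```python
from itertools import filterfalse

xs = [1, 0, 2, 0, 3]
pred = lambda x: x != 0          # keep nonzero elements

kept = [x for x in xs if pred(x)]                    # comprehension
also_kept = list(filterfalse(lambda x: not pred(x), xs))  # negated predicate

print(kept)       # [1, 2, 3]
print(also_kept)  # [1, 2, 3]
```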
For the broader AI-Augmented Python lineup including the full assessment when it ships, see the tests catalog or explore skills-based hiring evidence for the research backing the credential. Employers comparing candidates can use /hire/ to filter for verified AI-Augmented Python proficiency, and learners can prepare via /learn/.
Sources
- Python Software Foundation. (2024). The Python Language Reference: The for statement. https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
- Python Software Foundation. (2024). The Python Tutorial: Looping Techniques. https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
- Slatkin, B. (2019). Effective Python: 90 Specific Ways to Write Better Python (2nd ed.). Addison-Wesley. — Item 13 (“Prefer Catch-All Unpacking Over Slicing”) and Item 27 (“Use Comprehensions Instead of `map` and `filter`”) cover the canonical replacements for in-place mutation patterns.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. — Establishes work-sample tests (which code-review items emulate) as a high-validity selection signal.