From the AI-Augmented Python sample test
Why does AI sometimes suggest mutating a list while iterating it, and what breaks?
A surprisingly common pattern in AI-suggested Python code is a loop that iterates a list while calling .remove(), .pop(), or del on the same list inside the loop body. The code looks plausible — even idiomatic — and frequently passes a hand-run test on a tiny example. Then it ships, runs against a realistic input, and silently skips half the elements. This item probes whether a candidate can read AI-generated code, recognize the iterator-invalidation pattern, and propose a correct replacement.
What this question tests
The concept is mutation of a sequence during iteration over that same sequence. Python’s list iterator advances by index internally; when an element is removed, every subsequent element shifts one position to the left, and the iterator’s next index now points past the element that took the removed element’s slot. The loop silently skips elements rather than raising — which is precisely what makes the bug dangerous.
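The mechanism is easy to demonstrate with plain integers (this toy example is illustrative, not part of the test item): when two targets sit next to each other, removing the first shifts the second into the freed slot, and the iterator walks right past it.

```python
# Minimal demonstration of silent skipping: two adjacent 1s,
# but only one gets removed.
nums = [1, 1, 2, 3]
for n in nums:
    if n == 1:
        nums.remove(n)   # shifts later elements left; iterator skips one
print(nums)  # [1, 2, 3] — the second 1 survived
```

No exception is raised at any point, which is exactly the danger: the loop completes and returns plausible-looking but wrong output.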
AI coding assistants reproduce this pattern frequently for two reasons. First, the training corpus contains a great deal of beginner-level Python tutorials where the buggy pattern is shown as a “natural” first attempt before being corrected. Second, the natural-language prompt “remove all items where X” maps onto the literal English sentence structure better than onto the correct list-comprehension or filter idiom. The candidate’s job is not to memorize that AI does this — it’s to read code critically and recognize the pattern regardless of who or what wrote it.
Why this is the right answer
The correct option identifies the iterator-invalidation problem and proposes a replacement that builds a new list rather than mutating the original. The canonical AI-generated bug looks like this:
```python
def remove_inactive(users):
    # AI-generated: looks fine, silently skips users
    for user in users:
        if not user.active:
            users.remove(user)
    return users
```
Run this on [A_inactive, B_inactive, C_active] and you get back [B_inactive, C_active] — the iterator skipped B because removing A shifted B into index 0 while the iterator advanced to index 1. The bug compounds with every consecutive inactive user, so test cases with isolated inactive users mask it; only realistic input surfaces the problem.
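Here is a runnable reproduction of that walkthrough; the User class and the names A, B, C are illustrative stand-ins, not part of the original item:

```python
# Reproduces the [A_inactive, B_inactive, C_active] scenario.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    active: bool

def remove_inactive(users):
    # The buggy AI-generated version under discussion
    for user in users:
        if not user.active:
            users.remove(user)
    return users

users = [User("A", False), User("B", False), User("C", True)]
result = remove_inactive(users)
print([u.name for u in result])  # ['B', 'C'] — B_inactive was skipped
```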
The correct fix builds a new list:
```python
def remove_inactive(users):
    return [user for user in users if user.active]
```
Or, if mutation is required for downstream identity reasons:
```python
def remove_inactive(users):
    users[:] = [user for user in users if user.active]
    return users
```
The list comprehension iterates the original, builds a new list of survivors, and either returns it or assigns it back to the original list’s slice. No iterator sees a list mutating underneath it.
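The identity point is worth seeing concretely. In this sketch (variable names are illustrative), slice assignment updates the list other code is already holding, while plain rebinding does not:

```python
# Why users[:] = ... matters when other code aliases the same list.
data = [1, 2, 3, 4]
alias = data                                  # downstream code holds this reference

data[:] = [x for x in data if x % 2 == 0]     # in-place replacement of contents
print(alias)   # [2, 4] — the alias sees the update

data = [x for x in data if x > 2]             # rebinding: a brand-new list object
print(alias)   # [2, 4] — the alias still points at the old object
```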
For dict and set, the equivalent bug raises RuntimeError: dictionary changed size during iteration rather than silently skipping — a kinder failure mode but the same underlying issue. The fix is the same shape: iterate a snapshot (list(d.items())) or build a new collection.
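Both fix shapes look like this for a dict (the example data is illustrative); deleting while iterating d directly would raise the RuntimeError mentioned above:

```python
# Fix 1: iterate a snapshot of the keys, then mutate safely.
d = {"a": 1, "b": 0, "c": 2}
for key in list(d):       # list(d) snapshots the keys up front
    if d[key] == 0:
        del d[key]
print(d)   # {'a': 1, 'c': 2}

# Fix 2: build a new dict, mirroring the list-comprehension fix.
d2 = {"a": 1, "b": 0, "c": 2}
d2 = {k: v for k, v in d2.items() if v != 0}
print(d2)  # {'a': 1, 'c': 2}
```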
What the wrong answers reveal
The three incorrect options each map to a common misunderstanding:
- “This code is correct; Python’s `for` loop handles mutation safely.” Respondents picking this option are taking the AI suggestion at face value and haven’t internalized iterator semantics. This is the most concerning gap because the candidate would ship the bug.
- “The code raises a `RuntimeError` because lists can’t be mutated during iteration.” Respondents picking this option are conflating dict/set behavior (which does raise) with list behavior (which silently skips). The mental model is half-correct but missing the asymmetry — and “raises” is a much friendlier failure mode than “silently produces wrong output,” so respondents holding this model would be less cautious in code review than they should be.
- “The fix is to iterate `range(len(users))` and decrement the index when removing.” Respondents picking this option recognize the iterator problem but propose a fragile manual-index workaround that often introduces off-by-one bugs of its own. It also misses the cleaner functional-style fix that the Python ecosystem strongly prefers.
The wrong-answer pattern that matters most is the first: the candidate who takes AI output at face value rather than reading it critically. That gap shows up in production as shipped bugs.
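For contrast, the manual-index approach from the third option can be made correct, but only with a while loop that deliberately holds the index in place after a deletion. A sketch of what that workaround actually demands (the data is illustrative):

```python
# Manual-index removal done correctly: do NOT advance the index
# after a deletion, because the next element slides into slot i.
nums = [1, 1, 2, 3]
i = 0
while i < len(nums):
    if nums[i] == 1:
        del nums[i]   # stay at i; a new element now occupies this slot
    else:
        i += 1
print(nums)  # [2, 3]
```

A for loop over range(len(nums)) cannot express "stay at i," which is exactly why this style breeds the off-by-one bugs the explanation warns about; the list comprehension avoids index bookkeeping entirely.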
How the sample test scores you
In the AIEH 5-question AI-Augmented Python sample, this item contributes one of five datapoints aggregated into a single ai_python_proficiency score via the W3.2 normalize-by-count threshold. Binary scoring per item: 5 for the correct option, 1 for any of the three wrong options. With 5 binary items, the average ranges 1–5 and the level threshold maps avg ≤ 2 to low, ≤ 4 to mid, > 4 to high.
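As a sketch of that arithmetic (the function name is hypothetical, not AIEH's actual implementation):

```python
# Illustrative sketch of the sample-test aggregation described above.
def level_from_items(item_scores):
    # Five binary items, each scored 5 (correct) or 1 (incorrect).
    avg = sum(item_scores) / len(item_scores)
    if avg <= 2:
        return "low"
    if avg <= 4:
        return "mid"
    return "high"

print(level_from_items([5, 5, 1, 5, 5]))  # avg 4.2 -> high
print(level_from_items([5, 1, 1, 1, 1]))  # avg 1.8 -> low
```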
Data Notice: Sample-test results are directional indicators only. A 5-question sample can’t reliably distinguish “knows AI-generated bug patterns” from “got lucky on these specific items”; for a verified Skills Passport credential, take the full assessment. See the scoring methodology for how AI-Augmented Python scores roll up to the AIEH 300–850 scale.
The full AI-Augmented Python assessment probes prompt-design hygiene, AI-output review, hallucinated-API detection, and the specific failure modes of common AI coding assistants at depth.
Related concepts
- Iterator invalidation in C++ and Java. The same family of bugs exists in C++ (`std::vector::erase` invalidates iterators past the erased element) and Java (`ConcurrentModificationException` from `ArrayList` mutation during iteration). Python’s silent-skip behavior is unusual; most other languages raise.
- `itertools.filterfalse` and the functional-style idiom. Modern Python style prefers building a new collection over mutating in place. The list comprehension `[x for x in xs if pred(x)]` and `itertools.filterfalse(neg_pred, xs)` are the canonical patterns.
- AI coding assistants as accelerated junior developers. Most AI-generated code is correct on simple inputs and wrong on adversarial inputs in roughly the same distribution as junior-developer code. The skill being tested is code review, not AI-prompt engineering. See AI fluency in hiring for how AIEH frames this skill.
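The two canonical idioms side by side, shown on plain integers (the predicate is illustrative). Note that filterfalse keeps elements where its predicate is false, so it takes the negated predicate:

```python
from itertools import filterfalse

xs = [1, 0, 2, 0, 3]
pred = lambda x: x != 0          # keep nonzero elements

kept = [x for x in xs if pred(x)]                    # comprehension
also_kept = list(filterfalse(lambda x: not pred(x), xs))  # negated predicate

print(kept)       # [1, 2, 3]
print(also_kept)  # [1, 2, 3]
```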
For the broader AI-Augmented Python lineup including the full assessment when it ships, see the tests catalog or explore skills-based hiring evidence for the research backing the credential. Employers comparing candidates can use /hire/ to filter for verified AI-Augmented Python proficiency, and learners can prepare via /learn/.
Sources
- Python Software Foundation. (2024). The Python Language Reference: The for statement. https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
- Python Software Foundation. (2024). The Python Tutorial: Looping Techniques. https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
- Slatkin, B. (2019). Effective Python: 90 Specific Ways to Write Better Python (2nd ed.). Addison-Wesley. — Item 13 (“Prefer Catch-All Unpacking Over Slicing”) and Item 27 (“Use Comprehensions Instead of `map` and `filter`”) cover the canonical replacements for in-place mutation patterns.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. — Establishes work-sample tests (which code-review items emulate) as a high-validity selection signal.