What does the "confounding vs causation" cognitive-reasoning question test?

Note on framing: This is the cog_sample_3 item-level explainer for the AIEH cognitive-reasoning sample-test family. The item draws on the canonical causal-inference literature including Pearl’s structural causal model framework and the directed-acyclic-graph (DAG) approach to confounder identification.

This item presents an observational-data finding — typically of the form “employees who use [tool X] have [outcome Y] at rate Z” — and asks the candidate to evaluate whether the data supports a causal claim that tool X causes outcome Y. The intuitive answer assumes correlation implies causation; the correct answer recognizes that the observational design admits at least one plausible confounder and that the causal claim is therefore unsupported until the confounder is controlled. The scenario probes confounder identification in observational data, one of the foundational skills in causal reasoning and one of the most-skipped steps in casual interpretation of business and scientific findings.

What this question tests

The item targets the ability to distinguish observational correlation from established causation, specifically by identifying a plausible confounder — a third variable that causes both the proposed cause and the proposed effect, producing the correlation without any direct causal link. The skill matters because observational data is the overwhelmingly dominant source of “evidence” in business contexts: dashboards, A/B test post-mortems run on non-random groups, retrospective hiring analyses, customer behavior cohorting, and most operational metrics work all produce correlational findings that respondents must evaluate without the benefit of randomized-controlled designs.

The construct sits at the deliberate-reasoning end of Stanovich’s rationality framework: untrained reasoners default to causal interpretation of correlations, and the trained-skill overlay is the discipline of pausing to enumerate plausible confounders before accepting the causal interpretation. Pearl’s structural causal model work formalized confounder identification using directed acyclic graphs (DAGs), but the underlying skill — recognizing when a third variable could explain the correlation — is older and is teachable independent of the formal apparatus. AIEH’s cognitive-reasoning items test the trained-skill overlay rather than the formal DAG-construction skill, because the applied-job-relevance target is everyday data interpretation rather than academic-grade causal-inference work.

The confounding-vs-causation skill correlates with on-the-job performance in roles that interpret observational data: product analysts, ops engineers, healthcare administrators, recruiting analysts, and AI-output evaluators all benefit from the trained habit of pausing on a correlational finding to ask “what else could produce this pattern?” Schmidt & Hunter’s 1998 meta-analytic evidence on cognitive-ability validity in hiring extends to this applied-skill component: the corrected validity of general mental ability for complex roles (~0.65) is partly mediated by the candidate’s ability to do this kind of causal-reasoning work on the job.

Why this is the right answer (concrete worked example)

The correct response is that the data does not support the causal claim: at least one plausible confounder explains the correlation, and the observational design cannot rule it out. The right answer requires both recognizing the limitation and naming a specific plausible confounder.

A worked example: the item presents the finding, “Employees who attended the optional management-skills workshop were promoted at twice the rate of employees who did not attend, over the following 18 months.” The intuitive interpretation — that the workshop caused the promotions — would justify scaling the workshop. The problem is that workshop attendance was self-selected, not assigned. Plausible confounders include:

  • Ambition / promotion-seeking disposition. Employees who actively seek promotion are both more likely to attend an optional skill-building workshop and more likely to be promoted. The workshop attendance and the promotion are both downstream of the underlying ambition, with no direct causal link between them.
  • Manager visibility. Employees with high-engagement managers may be more likely to be told about the workshop AND more likely to be considered for promotion; the manager-visibility variable causes both.
  • Tenure / seniority. Long-tenure employees may have both more flexibility to attend optional workshops AND more accumulated promotion-eligibility; tenure causes both.

Each of these is a plausible confounder that produces the observed correlation without any direct workshop-causes-promotion relationship. The candidate who identifies even one such confounder demonstrates that they have applied the trained habit of pausing on correlational findings. The candidate who accepts the causal interpretation without identifying confounders has skipped the step that distinguishes trained from untrained respondents.
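A minimal simulation makes the mechanism concrete. All numbers and variable names here are illustrative assumptions, not AIEH item data: a latent “ambition” trait drives both workshop attendance and promotion, and attendance has zero causal effect on promotion by construction — yet attendees end up promoted at roughly twice the rate of non-attendees.

```python
import random

random.seed(0)

# Hypothetical generative model (illustrative numbers, not item data):
# ambition -> attends, ambition -> promoted, and NO arrow attends -> promoted.
n = 100_000
attended_promoted = attended_total = 0
skipped_promoted = skipped_total = 0

for _ in range(n):
    ambitious = random.random() < 0.3  # 30% of employees are promotion-seeking
    attends = random.random() < (0.6 if ambitious else 0.1)
    promoted = random.random() < (0.4 if ambitious else 0.1)  # ignores `attends`
    if attends:
        attended_total += 1
        attended_promoted += promoted
    else:
        skipped_total += 1
        skipped_promoted += promoted

rate_a = attended_promoted / attended_total
rate_s = skipped_promoted / skipped_total
print(f"promotion rate, attendees:     {rate_a:.2f}")
print(f"promotion rate, non-attendees: {rate_s:.2f}")
```

The attendee pool is enriched for ambitious employees (self-selection), which alone manufactures an approximately 2x promotion-rate gap of the kind the item describes.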

The fix to the underlying study design is to randomize workshop attendance — assign half the eligible employees to attend and half not to, and compare promotion rates in the two groups. Randomization equalizes the confounder distribution across the two groups (in expectation), isolating the workshop’s direct effect. The item does not require the candidate to design the randomized study, but it does require recognition that the observational data alone does not support the causal claim.
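The randomized fix can be sketched with the same kind of hypothetical model (again, all parameters are illustrative): attendance is now assigned by coin flip rather than self-selected, so ambition is balanced across the two groups in expectation, and the promotion-rate gap collapses to the true direct effect — here, zero by construction.

```python
import random

random.seed(1)

# Same hypothetical world as the observational scenario, except attendance
# is ASSIGNED at random, breaking the ambition -> attendance arrow.
n = 100_000
promoted = {True: 0, False: 0}
total = {True: 0, False: 0}

for _ in range(n):
    ambitious = random.random() < 0.3
    attends = random.random() < 0.5  # randomized assignment, not self-selection
    p = random.random() < (0.4 if ambitious else 0.1)  # still no workshop effect
    total[attends] += 1
    promoted[attends] += p

rate_a = promoted[True] / total[True]
rate_s = promoted[False] / total[False]
print(f"promotion rate, assigned to attend: {rate_a:.2f}")
print(f"promotion rate, assigned to skip:   {rate_s:.2f}")
```

With randomization, any remaining gap between the groups would be attributable to the workshop itself (plus sampling noise), which is exactly the inference the observational design cannot license.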

What the wrong answers reveal

The distractors map to canonical causal-reasoning failure modes:

  • “Yes, the data supports the causal claim because the effect is large (2x rate).” This response represents the effect-size fallacy — treating large correlations as more likely to be causal than small correlations. Effect size and causal status are independent properties; large confound-driven correlations are common and easy to find.
  • “Yes, the data supports the causal claim because the sample size is large.” This response represents the sample-size fallacy — treating large samples as reducing the confounding problem. Large samples reduce random noise but do not address systematic confounding; a large-N observational study produces precise estimates of a confounded correlation, not unconfounded estimates.
  • “No, but only because correlation never proves causation.” This response signals partial competence: the respondent has internalized the slogan but not the underlying mechanism. The right answer requires identifying a specific plausible confounder, not just citing the slogan; randomized data CAN prove causation (within standard inferential limits), and a strong candidate distinguishes the conditions under which causal inference is and is not supported.
  • “Yes, the temporal sequence (workshop before promotion) establishes causation.” This response represents the post hoc fallacy — treating temporal precedence as sufficient for causation. Temporal precedence is necessary for causation but not sufficient; confounders that precede both variables can produce the observed temporal pattern.
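The sample-size fallacy in particular can be demonstrated with a small hypothetical simulation (all parameters illustrative): increasing N tightens the estimate around the same confounded gap rather than shrinking it toward the true causal effect of zero.

```python
import random

def confounded_gap(n: int, seed: int) -> float:
    """Promotion-rate gap in a hypothetical self-selected design where a
    latent 'ambition' trait drives both attendance and promotion and the
    true causal effect of attendance is zero."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # attends -> [promotions, total]
    for _ in range(n):
        ambitious = rng.random() < 0.3
        attends = rng.random() < (0.6 if ambitious else 0.1)
        promoted = rng.random() < (0.4 if ambitious else 0.1)
        counts[attends][0] += promoted
        counts[attends][1] += 1
    return counts[True][0] / counts[True][1] - counts[False][0] / counts[False][1]

# Larger samples converge on the SAME nonzero confounded gap (~0.17 under
# these parameters), just with less noise; the true effect is zero.
small = [confounded_gap(2_000, s) for s in range(5)]
large = [confounded_gap(200_000, s) for s in range(5)]
print("n=2,000 gaps:  ", [round(g, 3) for g in small])
print("n=200,000 gaps:", [round(g, 3) for g in large])
```

The large-N estimates cluster tightly — around the wrong answer. Precision and unconfoundedness are separate properties, which is the point the distractor misses.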

How the sample test scores you

In the AIEH 5-item cognitive-reasoning sample test, this item contributes one of five datapoints aggregated into the single cognitive_reasoning score via the W3.2 normalize-by-count threshold. Scoring is graded: full credit for both rejecting the causal claim AND naming a plausible confounder, partial credit for rejecting the claim without naming a confounder, and zero credit for accepting the causal claim regardless of justification.
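The graded rule can be sketched as a small function. This is a sketch of the rubric as described above, not AIEH's production scoring code; the 0.5 partial-credit weight and the function name are illustrative assumptions.

```python
def score_item(rejects_causal_claim: bool, names_confounder: bool) -> float:
    """Sketch of the graded rubric described above (weights are assumed,
    not AIEH's published W3.2 parameters)."""
    if not rejects_causal_claim:
        return 0.0  # accepting the causal claim scores zero regardless of why
    # Rejection alone earns partial credit; rejection plus a named
    # plausible confounder earns full credit.
    return 1.0 if names_confounder else 0.5
```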

Data Notice: Sample-test results are directional indicators only. Confounder-identification skill has high item-specific variance because trained respondents apply the same pausing habit to all observational findings while untrained respondents may catch some confounders and miss others. For a verified Skills Passport credential, take the full 50-item assessment.

See the scoring methodology for how cognitive-reasoning scores map onto the AIEH 300–850 Skills Passport scale.

Related concepts

  • Directed acyclic graphs (DAGs). Pearl’s formal apparatus for representing causal structure as a graph of variables and arrows. DAGs make confounders visually apparent: a confounder is a variable with arrows into both the proposed cause and the proposed effect. AIEH cognitive-reasoning items do not require DAG construction but the DAG framework is the underlying mental model.
  • Randomized controlled trials (RCTs). The gold-standard design for causal inference because randomization equalizes confounder distributions across treatment groups. When RCTs are infeasible, quasi-experimental designs (regression discontinuity, instrumental variables, difference-in-differences) approximate the randomization benefit under specific assumptions.
  • Selection bias. A specific confounder family in which the selection mechanism into the analyzed sample itself depends on both the proposed cause and the proposed effect. The optional-workshop scenario is a classic selection-bias case: who self-selected into attending is correlated with who would have been promoted anyway.
  • Schmidt & Hunter 1998 cognitive-ability validity. The meta-analytic evidence that cognitive-ability measures predict job performance, with the causal-reasoning component contributing meaningfully to validity in observational-data-interpretation roles.
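The DAG mental model can be sketched in a few lines of code. Treating a confounder as a common parent of the proposed cause and effect is a simplification of full back-door-path analysis, and the variable names are illustrative, not drawn from the AIEH item bank.

```python
# A DAG as a set of directed edges, each meaning "A causes B".
# Simplified view: a confounder of (x, y) is a common parent of both.
edges = {
    ("ambition", "attends_workshop"),
    ("ambition", "promoted"),
    ("tenure", "attends_workshop"),
    ("tenure", "promoted"),
    ("attends_workshop", "skills"),
}

def confounders(x: str, y: str, edges: set) -> set:
    """Variables with arrows into both x and y (common-parent sketch)."""
    parents = lambda v: {a for (a, b) in edges if b == v}
    return parents(x) & parents(y)

print(confounders("attends_workshop", "promoted", edges))
```

On this toy graph both ambition and tenure are flagged, mirroring the confounders enumerated in the worked example.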

For broader context on cognitive-ability validity in hiring, see the cognitive ability in hiring overview, the skills-based hiring evidence page, and the hiring loop design guide for how causal-reasoning items fit into a complete hiring loop.


Sources

  • Carroll, J. B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press.
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
  • Stanovich, K. E. (2009). What Intelligence Tests Miss: The Psychology of Rational Thought. Yale University Press.

Try the question yourself

This explainer covers what the item measures. To see how you score on the full cognitive reasoning family, take the free 5-question sample.

Take the cognitive reasoning sample