Integrity Signals Across Vendors — 2026 Methodology Comparison

No single integrity signal is sufficient on its own — all five categories yield meaningful evidence but also carry substantial false-positive rates. The defensible approach combines multiple signal categories with explicit adjudication thresholds, false-positive-aware review workflows, and adverse-impact monitoring on flag rates. Organizations that rely on any single signal in isolation generate wrongful disqualifications and incur adverse-impact risk; organizations that combine signals with disciplined adjudication reach defensible integrity decisions. The choice of which signals to combine depends on the assessment format, threat model, and candidate-experience constraints — not on which signals individual vendors emphasize in marketing.

— AIEH editorial verdict

Assessment-integrity vendors emphasize different signal categories — some focus on behavioral telemetry (copy-paste detection, browser-switching), some on biometric monitoring (eye-tracking, face presence), some on content comparison (plagiarism scoring, similarity to reference solutions), and some on psychometric anomaly detection (response patterns, timing distributions). The vendor-by-vendor variation in signal emphasis can make integrity decisions hard to compare across platforms; the underlying methodological question is which signals defensibly support which decisions.

This comparison is for assessment-program owners, hiring-loop designers, and integrity-vendor evaluators who need to understand the signal landscape independent of any specific vendor’s marketing positioning. The verdict is conditional; no single signal is sufficient, and signal combination matters more than any individual signal’s sophistication.

Data Notice: Reported false-positive rates and signal sensitivity vary substantially across implementations and populations; the descriptions reflect the dominant methodological patterns in the published literature.

What each approach is

Copy-paste detection monitors clipboard activity during the assessment session, flagging events where text is pasted into response fields rather than typed. Variants include keystroke-dynamic analysis (typing rhythm signatures) and paste-source tracking (whether pasted content came from another browser tab or an external application). The signal is most informative for free-response items where typed responses are expected; less informative for multiple-choice or selection items.
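As a rough illustration of the mechanism, a browser-side paste listener might look like the following sketch; the CSS selector, the evidence shape, and the reportEvent transport are hypothetical, and real implementations also capture keystroke dynamics and paste-source metadata where the browser exposes them.

```typescript
// Minimal sketch: record paste events into free-response fields.
// The CSS selector, evidence shape, and reporting transport are
// hypothetical; here we only log to the console.
interface PasteEvidence {
  fieldId: string;
  pastedLength: number;
  timestamp: number;
}

function reportEvent(evidence: PasteEvidence): void {
  // A real system would send this to a telemetry endpoint for
  // later adjudication; the transport is out of scope here.
  console.log("paste-event", evidence);
}

document
  .querySelectorAll<HTMLTextAreaElement>("textarea.free-response")
  .forEach((field) => {
    field.addEventListener("paste", (e: ClipboardEvent) => {
      const text = e.clipboardData?.getData("text") ?? "";
      // A paste is an anomaly, not proof of a violation: record it
      // for review rather than blocking the candidate.
      reportEvent({
        fieldId: field.id || "unknown",
        pastedLength: text.length,
        timestamp: Date.now(),
      });
    });
  });
```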

Browser-switch / window-focus monitoring detects when the candidate’s focus moves away from the assessment window — to other browser tabs, other applications, or loss of focus entirely. Variants include duration tracking (brief vs sustained focus loss) and pattern analysis (repeated short switches vs single long switches). The signal is sensitive to delivery environment and personal behavior patterns, producing notable false-positive rates in real-world deployments.
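A minimal sketch of focus-loss tracking, assuming a browser delivery environment; the 5-second brief-versus-sustained cutoff is illustrative only, not a recommended threshold.

```typescript
// Minimal sketch: track window-focus loss with duration so brief
// accidental switches can be distinguished from sustained ones
// downstream. The 5-second cutoff is illustrative only.
let focusLostAt: number | null = null;

window.addEventListener("blur", () => {
  focusLostAt = Date.now();
});

window.addEventListener("focus", () => {
  if (focusLostAt === null) return;
  const durationMs = Date.now() - focusLostAt;
  focusLostAt = null;
  console.log("focus-loss", {
    durationMs,
    kind: durationMs > 5_000 ? "sustained" : "brief",
  });
});

// visibilitychange reports the tab being hidden (tab switches,
// minimizing), complementing the blur/focus pair above.
document.addEventListener("visibilitychange", () => {
  console.log("visibility", { hidden: document.hidden, at: Date.now() });
});
```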

Eye-tracking / gaze analysis uses webcam input to estimate candidate gaze direction and detect off-screen looking, multiple faces, or absence of the registered candidate. Variants range from coarse face-presence detection to detailed gaze-direction tracking. Eye-tracking signals are sensitive to lighting, camera quality, and environmental factors that vary across socioeconomic contexts. Karim, Kaminsky, and Behrend (2014) documented substantial candidate-experience and false-positive concerns.

Plagiarism / similarity scoring compares candidate responses against reference corpora — known solutions, internet sources, prior-candidate response databases, and in some cases AI-detection models. Variants include n-gram-based similarity, semantic similarity (embedding-based), and structural similarity (for code submissions). The signal is informative for content theft but produces false positives on common idioms, standard solution patterns, and convergent reasoning.
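For the n-gram variant, a minimal similarity sketch might look like the following; the trigram size and example strings are arbitrary, and production systems add normalization, larger reference corpora, and embedding-based semantic comparison.

```typescript
// Minimal sketch: word-level trigram Jaccard similarity between a
// candidate response and one reference text. Production systems use
// larger corpora, normalization, and embedding-based comparison.
function nGrams(text: string, n = 3): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

function jaccardSimilarity(a: string, b: string, n = 3): number {
  const ga = nGrams(a, n);
  const gb = nGrams(b, n);
  if (ga.size === 0 || gb.size === 0) return 0;
  let intersection = 0;
  ga.forEach((g) => {
    if (gb.has(g)) intersection++;
  });
  return intersection / (ga.size + gb.size - intersection);
}

// Partial overlap from shared phrasing; a high score is evidence to
// adjudicate, not proof of copying (common idioms also overlap).
console.log(
  jaccardSimilarity(
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox jumped over the lazy dog",
  ),
); // ≈ 0.27
```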

Response-pattern anomaly detection uses psychometric or statistical models to flag response patterns that are unusual relative to the calibration population — anomalous timing distributions, item-response patterns inconsistent with the candidate’s overall ability estimate, or suspicious score patterns across item difficulty. The signal is sensitive to legitimate variation (test anxiety, strategic guessing) and requires careful threshold calibration. Sackett (2002) and adjacent work in counterproductive-work-behavior literature discuss the methodological foundations.
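A minimal sketch of the timing variant, assuming per-item calibration statistics are available; the calibration values and the |z| > 3 cutoff are illustrative, and real systems model full response-time distributions rather than a single z-score.

```typescript
// Minimal sketch: flag per-item response times that are extreme
// relative to calibration statistics. The calibration values and the
// |z| > 3 cutoff are illustrative, not recommended thresholds.
interface ItemNorm {
  meanSeconds: number;
  sdSeconds: number;
}

function timingZScore(observedSeconds: number, norm: ItemNorm): number {
  return (observedSeconds - norm.meanSeconds) / norm.sdSeconds;
}

const norm: ItemNorm = { meanSeconds: 90, sdSeconds: 25 }; // hypothetical calibration
const z = timingZScore(8, norm); // implausibly fast response for this item

if (Math.abs(z) > 3) {
  // Extreme timing is an anomaly, not evidence of intent: route it to
  // adjudication alongside the other signals for the session.
  console.log("timing-anomaly", { z: z.toFixed(2) });
}
```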

Where each one wins

Three signal-category patterns where each shines:

  • Free-response written assessments — copy-paste detection and plagiarism scoring. These signals directly probe the threat models most relevant to written responses (content theft, externally generated content). Eye-tracking and browser-switch signals are less informative for this format.
  • Multiple-choice or selection-based assessments — browser-switch monitoring and response-pattern anomaly detection. Copy-paste signals are less informative for selection items; behavioral and pattern signals are more informative. See the scoring methodology for the AIEH approach to multi-method composition.
  • Coding or technical assessments — copy-paste detection, browser-switch monitoring, and code-similarity scoring. The threat models for technical assessments span multiple categories; combining signals is particularly important. See skills-based hiring evidence for context on technical-assessment integrity.

The structural gap they share

Despite different mechanisms, all five signal categories share a structural gap: they detect anomalies, not intent. Every signal category produces false positives from legitimate behavior — environmental factors, candidate-individual variation, network conditions, and benign multitasking all produce signals that look indistinguishable from cheating signals at the individual-event level. Defensible integrity decisions require adjudication processes that distinguish anomaly from violation, not signal-only automated decisions.
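One way to express the anomaly-versus-violation distinction operationally is to route flagged sessions to human review rather than to automated disqualification. The signal names and the two-signal escalation rule below are hypothetical illustrations of the pattern, not recommended thresholds.

```typescript
// Minimal sketch: combine signal flags into an adjudication decision
// that never auto-disqualifies. The signal names and the two-signal
// escalation rule are hypothetical illustrations of the pattern.
type Signal = "paste" | "focus-loss" | "similarity" | "timing";

interface SessionFlags {
  sessionId: string;
  firedSignals: Signal[];
}

type Adjudication = "no-action" | "human-review";

function adjudicate(flags: SessionFlags): Adjudication {
  // Single-signal events are logged but not escalated; two or more
  // independent signals route the session to a trained reviewer.
  return flags.firedSignals.length >= 2 ? "human-review" : "no-action";
}

console.log(
  adjudicate({ sessionId: "s-123", firedSignals: ["paste", "focus-loss"] }),
); // "human-review": a reviewer, not the system, decides the outcome
```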

The complementary relationship: AIEH’s portable credentials combine multiple integrity signals with explicit adjudication thresholds, human-review workflows for ambiguous events, and adverse-impact monitoring on flag rates to produce integrity decisions that are both defensible and equitable. The assessment infrastructure treats integrity signal combination as one component of overall assessment defensibility.

Common pitfalls

Five patterns recurring at organizations evaluating integrity signals:

  • Single-signal decisions. Treating any individual signal as sufficient evidence for disqualification produces wrongful disqualifications at the signal’s false-positive rate. Even sophisticated signals (eye-tracking, response-pattern anomaly) produce non-trivial false-positive rates on realistic candidate populations.
  • Threshold opacity. Many integrity vendors produce flag scores or risk scores without documenting the underlying threshold calibration. Programs deploying opaque thresholds cannot defend the resulting decisions and cannot monitor adverse impact on flag rates.
  • Ignoring adverse impact on flags. Multiple signal categories show documented disparate flag rates across demographic groups, environmental contexts, or device categories. Programs that monitor adverse impact only on score outcomes miss the upstream filtering effect of biased flag rates (a flag-rate impact-ratio check is sketched after this list). See hiring bias mitigation.
  • No appeal mechanism. Integrity decisions affect candidate outcomes; defensible programs provide candidate-appeal mechanisms with human adjudication of contested flags. Programs without appeal mechanisms face legal and ethical exposure when wrongful disqualifications occur.
  • Treating AI-generated-content detection as reliable. AI-content-detection models (a common recent addition to plagiarism scoring) have documented reliability problems — both false positives on human-written content and false negatives on AI-generated content. Programs treating these tools as reliable produce defensibility problems.
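On the adverse-impact pitfall above, a minimal flag-rate check in the style of the four-fifths rule might look like the following sketch; the group labels and counts are hypothetical, and real monitoring spans all relevant groups, devices, and delivery environments.

```typescript
// Minimal sketch: four-fifths-rule style check on flag rates across
// two groups. Group labels and counts are hypothetical; real monitoring
// covers all relevant groups, devices, and delivery environments.
interface GroupFlagStats {
  flagged: number;
  total: number;
}

function flagRateImpactRatio(a: GroupFlagStats, b: GroupFlagStats): number {
  const rateA = a.flagged / a.total;
  const rateB = b.flagged / b.total;
  const higher = Math.max(rateA, rateB);
  const lower = Math.min(rateA, rateB);
  return higher === 0 ? 1 : lower / higher;
}

const ratio = flagRateImpactRatio(
  { flagged: 48, total: 400 }, // hypothetical group A: 12% flag rate
  { flagged: 30, total: 500 }, // hypothetical group B: 6% flag rate
);
// 0.06 / 0.12 = 0.5, well below the conventional 0.8 screening
// threshold, so the flag pipeline itself warrants investigation.
console.log(ratio.toFixed(2));
```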

Practitioner workflow: how to evaluate

Three practical questions for organizations choosing integrity signal combinations:

  • What’s the threat model for this assessment format? The threat models for written, selection, and code-based assessments are different; the signal combination should match the threat model. Programs that adopt vendor-default signal combinations without threat-model analysis often misallocate signal investment.
  • What false-positive rate is acceptable? Each signal category has empirical false-positive rates on realistic populations; the acceptable rate depends on the cost of wrongful disqualification (legal, reputational, candidate-experience) versus the cost of missed cheating. Programs without explicit false-positive thresholds cannot calibrate signal combinations defensibly. A back-of-envelope volume calculation is sketched after this list.
  • What adjudication capacity exists? Multi-signal integrity systems produce adjudication workload that scales with assessment volume and flag rate. Programs without adjudication capacity cannot defensibly deploy multi-signal integrity systems. See the hiring loop design for context on integrating adjudication into the loop.
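On the false-positive question above, a back-of-envelope calculation makes the adjudication and appeal workload concrete; the volume and rate below are hypothetical planning numbers.

```typescript
// Minimal sketch: back-of-envelope expected wrongful flags per year,
// given assessment volume and a per-candidate false-positive rate.
// Both inputs are hypothetical planning numbers.
function expectedWrongfulFlags(
  candidatesPerYear: number,
  falsePositiveRate: number,
): number {
  return candidatesPerYear * falsePositiveRate;
}

// 20,000 candidates at a 2% per-candidate false-positive rate implies
// roughly 400 innocent candidates flagged per year, a volume the
// adjudication and appeal process has to absorb.
console.log(expectedWrongfulFlags(20_000, 0.02)); // 400
```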

Operational considerations specific to integrity signals

Beyond the signal combination, several operational considerations affect integrity-signal deployment:

  • Privacy and regulatory compliance. Several signal categories (eye-tracking, biometric monitoring, content comparison against candidate-response databases) involve substantial personal-data collection that triggers GDPR, BIPA, and international privacy requirements.
  • Candidate consent and disclosure. Defensible integrity-signal deployment requires informed candidate consent — clear disclosure of which signals are collected, how they’re used, and what decisions they inform. Loops that deploy integrity signals without explicit consent face legal and ethical exposure.
  • Data retention. Integrity-signal data (recordings, telemetry) creates substantial data-retention obligations. Programs must establish retention policies that balance defensibility needs (data for adjudication and appeal) against privacy obligations.
  • Audit-trail quality. Defensible integrity decisions require audit trails that document which signals fired, what thresholds were applied, and what adjudication outcome resulted. Programs with weak audit trails cannot defend contested decisions. A vendor-neutral record sketch follows this list.
  • Vendor heterogeneity. Integrity vendors vary substantially in signal-category coverage, reported false-positive rates, and adjudication-workflow support. Programs should evaluate vendor capabilities against the signal-combination they need rather than adopting vendor-default configurations.
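On the audit-trail point above, a vendor-neutral event record might look like the following sketch; the field names are hypothetical, and the point is simply that the signal, the threshold in force, and the adjudication outcome are captured together.

```typescript
// Minimal sketch: a vendor-neutral audit record for one integrity
// event. Field names are hypothetical; the point is that the signal,
// the threshold in force, and the adjudication outcome travel together.
interface IntegrityAuditRecord {
  sessionId: string;
  signal: string;           // e.g. "focus-loss"
  rawValue: number;         // e.g. focus-loss duration in milliseconds
  thresholdApplied: number; // threshold in force when the flag fired
  firedAt: string;          // ISO-8601 timestamp
  adjudication: {
    outcome: "dismissed" | "upheld" | "pending";
    reviewerId?: string;
    decidedAt?: string;
    rationale?: string;
  };
}

const example: IntegrityAuditRecord = {
  sessionId: "s-123",
  signal: "focus-loss",
  rawValue: 42_000,
  thresholdApplied: 30_000,
  firedAt: "2026-01-15T14:02:11Z",
  adjudication: {
    outcome: "dismissed",
    reviewerId: "r-7",
    decidedAt: "2026-01-16T09:30:00Z",
    rationale: "Candidate reported an OS notification; no other signals fired.",
  },
};

console.log(JSON.stringify(example, null, 2));
```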

Migration / adoption considerations

Organizations adopting integrity signals (or moving between vendors) face substantial operational work:

  • Threat-model analysis. Adopting integrity signals without explicit threat-model analysis often produces signal combinations that don’t match the actual threats the assessment faces.
  • Adjudication-process design. Multi-signal integrity systems require human-review workflows; the workflow design (reviewer training, adjudication criteria, escalation procedures, candidate-appeal mechanisms) affects the defensibility of the resulting decisions.
  • Adverse-impact baselining. Integrity deployments need adverse-impact baselines established before deployment to enable monitoring of post-deployment effects on flag rates and downstream outcomes.
  • Communication to candidates. Integrity-signal deployment changes the candidate experience; communication updates (consent forms, technical-requirements documentation, FAQ) affect drop-out rates and candidate-experience scores.

The migration cost is substantial enough that integrity-signal changes are infrequent within established programs — typically tied to specific defensibility concerns or to major platform changes. Programs that anticipate eventual signal changes typically maintain vendor optionality through neutral data architecture — recording integrity events in a vendor-independent format that could be re-analyzed under alternative signal combinations rather than locking in a vendor-specific signal taxonomy.

Takeaway

Integrity signals operationalize different points on the detection-vs-false-positive-vs-privacy design space: each of the five signal categories produces meaningful evidence under appropriate conditions and substantial false positives under inappropriate conditions. No single signal is sufficient for defensible integrity decisions; signal combination, explicit threshold calibration, human adjudication of ambiguous events, and ongoing adverse-impact monitoring on flag rates are the design elements that distinguish defensible from indefensible integrity programs. The choice of which signals to combine depends on the assessment format, threat model, candidate-experience constraints, and regulatory environment — not on which signals individual vendors emphasize in marketing. Migration costs are substantial enough that programs should make first-time signal selection deliberately rather than treating it as a vendor-default.

For broader treatments, see assessment infrastructure, scoring methodology, hiring loop design, hiring bias mitigation, and candidate experience evidence.


Sources

  • Sackett, P. R. (2002). The structure of counterproductive work behaviors: Dimensionality and relationships with facets of job performance. International Journal of Selection and Assessment, 10(1-2), 5-11.
  • Karim, M. N., Kaminsky, S. E., & Behrend, T. S. (2014). Cheating, reactions, and performance in remotely proctored testing: An exploratory experimental study. Journal of Business and Psychology, 29(4), 555-572.
  • Cluskey, G. R., Ehlen, C. R., & Raiborn, M. H. (2011). Thwarting online exam cheating without proctor supervision. Journal of Academic and Business Ethics, 4, 1-7.
  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.
  • Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419-450.

Looking for a candidate-owned alternative?

AIEH bundles validated assessments with a Skills Passport that travels with the candidate across employers — no proprietary lock-in, no per-seat enterprise pricing.

Browse AIEH assessments