Do AI interview assessments work? What the validity evidence shows
By UnchartedCareer
Partly, and less well than the sales page implies. The strongest independent research has found no reliable link between AI scoring of your personality and how you later perform on the job, and even the best number the industry can show, HireVue's own reported validity of r-bar = .24, trails a human-run structured interview at r-bar = .32. An AI interview assessment can pick up some signal from what you say, much less from how you come across, and none of it predicts your future performance cleanly enough to trust on its own. So the scorer you might want to game cannot reliably tell whether you would do the job, which makes being genuinely good on camera the only move that travels.
Last updated: July 2026
Are AI interview assessments even common?
Common enough to prepare for, not universal. Recruiting is the single HR function with the highest AI adoption, at 27 percent of US organizations in an SHRM survey fielded December 2025 (SHRM, 2026). On the candidate side the tooling spread even faster, with 39 percent of job seekers saying they used AI somewhere in the application process in a Gartner survey from late 2024 (Gartner, 2024). So you may well face a recorded interview that an algorithm scores first, or a human who leans on an AI summary, and you often will not be told which on the day. That uncertainty is exactly why whether these tools work matters to you.
Do AI interview assessments actually predict job performance?
Mostly not, and the field says so in its own journals. In a 2024 study by Stevenor and colleagues in the International Journal of Selection and Assessment, AI-scored interview personality assessments had weak, statistically nonsignificant relations with later supervisor ratings of job performance. The revealing detail is the sample. That result came from a Study 2 group of just 25 people (Stevenor et al., 2024), with models trained on verbal data and interviewer ratings from low-stakes interviews and then applied to high-stakes ones. With a sample that small, only a large correlation could reach statistical significance, so the finding is best read as no demonstrated link to on-the-job performance rather than proof that no link exists.
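To put a rough number on how much a 25-person sample can show, here is a minimal sketch using the standard Fisher z approximation for a correlation. It is an illustration under stated assumptions, not the analysis Stevenor and colleagues ran: the function name, the two-sided .05 threshold, and the simple-random-sample assumption are all ours.

```python
import math


def min_detectable_r(n: int, z_crit: float = 1.96) -> float:
    """Approximate smallest |r| that reaches two-sided p < .05 for sample size n.

    Uses the Fisher z transformation, where the standard error of
    atanh(r) is about 1 / sqrt(n - 3). Hypothetical helper for illustration.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    # n = 25 is the Study 2 sample size reported in the article.
    print(f"n = 25: |r| must exceed about {min_detectable_r(25):.2f}")
```

Under these assumptions, an observed correlation would need to be roughly .40 before a 25-person study flagged it as significant, which is well above the validities discussed elsewhere in this article.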
The most-cited prior work did not even test that link. Hickman and colleagues in 2022, the automated video personality study summarized by SIOP, the main US body for industrial and organizational psychology, never examined whether its assessments predict job performance at all (Hickman et al., 2022, via SIOP). It found validity was better when models were trained on interviewer observations than on candidate self-reports, which showed little reliability or validity. SIOP's read is that organizations should proceed cautiously with AI personality assessment given mixed findings, and that AI hiring assessments must meet the same validity standard as traditional tests, including showing that scores relate to future job performance (SIOP, 2023).
What does HireVue's own research show?
The best defensible number comes from the vendor, and it is modest. HireVue's own researchers report that automated video competency assessments hit an uncorrected, sample-weighted validity of r-bar = .24 with job performance across five US samples totaling 1,124 people, in a peer-reviewed 2024 paper whose authors were mostly HireVue employees (Liff et al., HireVue, 2024). That validity ranged from .20 for maintenance workers to .27 for call-center workers. The same authors place it below, though comparable to, human-rated structured interviews at an uncorrected r-bar = .32.
Read that honestly and it cuts both ways. Competency scoring of what you actually say has some real signal, which is the concession that makes the rest of this credible. But the number is uncorrected, it comes from the company selling the tool, and even at its best it trails a trained human asking you structured questions. A sample-weighted validity of r-bar = .24 that peaks at .27 in the best-performing sample (HireVue, 2024) is not a lie detector for talent. It is a weak-to-moderate correlation.
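One common, simplified way to read a correlation of this size is to square it, which gives the share of variance in job-performance ratings that the score accounts for. The sketch below applies that arithmetic to the two uncorrected figures quoted above. The labels and the helper name are ours, and squaring r is only one of several ways selection researchers interpret validity.

```python
def variance_explained(r: float) -> float:
    """Return r squared, the proportion of variance shared by two measures."""
    return r * r


# Uncorrected validities as quoted in the article (Liff et al., 2024).
validities = {
    "automated competency scoring": 0.24,
    "human-rated structured interview": 0.32,
}

for label, r in validities.items():
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f}, variance explained = {share:.1%}")
```

On this reading, the automated score accounts for about 6 percent of the variation in rated performance and the human structured interview for about 10 percent, so both leave most of the variation unexplained.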
Competency scoring or personality scoring: which is which?
The two get sold as one product and perform very differently. Competency scoring rates the substance of your answers, what you did and how you solved a problem, and that is where the defensible r-bar = .24 signal sits (HireVue, 2024). Personality scoring rates how you come across, your traits inferred from tone and delivery, and that is where the independent evidence goes cold, with nonsignificant links to performance in Stevenor et al. (2024) and no performance test at all in Hickman et al. (2022). If a system claims to read your character from a two-minute clip and predict your work, the research does not back it yet.
So should you try to game the AI scorer?
No, because there is no stable model to game. A 2025 editorial in the International Journal of Selection and Assessment flags automated-scoring efficacy as an open research question rather than settled science, and concludes that these video interviews still need more work on validity evidence and equitable access (International Journal of Selection and Assessment, 2025). You cannot reverse-engineer a scoring model that the field itself calls unsettled, and trying to perform for one usually reads worse to the human who watches next.
The durable play is the boring one. Answer in structured specifics, the situation you faced and what you actually did about it, in your own words from memory, and hold that framing when the follow-up lands. That reads as competence to a human reviewer and gives a model the cleanest signal it can score. Confidence is the trainable part, and it trains through reps, not through tricks.
How should you prepare instead?
Be defensibly good on camera, and build it the slow way. Open your laptop camera and answer one hard question out loud with no notes, something like "tell me about a time you owned something that broke and what you did." Then watch it back and find the three tells that survive into any recorded interview: where you reached for filler, where your pace ran away, and the moment your eyes left the lens. Redo it until the follow-up stops rattling you. That is the substance a human rewards and the cleanest signal an algorithm can score.
A recording will show you the tells, but it will not ask the question you did not see coming, and the unseen follow-up is where most answers fall apart. When you want that follow-up thrown at you on demand, scored and repeatable, AI interview practice is built to do it, after you run the manual drill yourself.