In 1979, the US Army conducted one of the largest studies of job performance prediction ever done. Researchers tested thousands of recruits across dozens of assessment methods (structured tests, supervisor ratings, interviews, work samples) and tracked actual job performance over years. The conclusions, along with decades of subsequent industrial-organizational psychology research, point to a finding that the hiring industry has spent the forty years since quietly ignoring:

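The unstructured job interview, the kind most companies still use most of the time, predicts job performance with a validity coefficient of roughly 0.2 to 0.4, depending on the meta-analysis, on a scale where 0 is no predictive power and 1.0 is perfect prediction. To put that differently: because variance explained is the square of the coefficient, even the higher figure accounts for less than 15 percent of the differences in job performance between hires.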

What the Research Actually Shows

The landmark meta-analysis here is Schmidt and Hunter's 1998 paper in Psychological Bulletin, which synthesized eighty-five years of research on selection methods. Their findings on predictive validity for various methods:

Work sample tests: 0.54 validity
Cognitive ability tests: 0.51 validity
Structured interviews: 0.51 validity
Job knowledge tests: 0.48 validity
Unstructured interviews: 0.38 validity
Reference checks: 0.26 validity
Years of experience: 0.18 validity
Years of education: 0.10 validity
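
To make the coefficients above more concrete, here is a minimal Python sketch that converts each one into two easier-to-read quantities: the share of performance variance it accounts for (the coefficient squared), and the chance that the higher-scoring of two randomly chosen candidates turns out to be the better performer. The second figure rests on an assumption that is not in the research summary above: that assessment scores and job performance follow a bivariate normal distribution, in which case the chance is 1/2 + arcsin(r)/pi. The function names are illustrative.

```python
import math

# Validity coefficients as listed above (Schmidt and Hunter, 1998).
VALIDITY = {
    "Work sample tests": 0.54,
    "Cognitive ability tests": 0.51,
    "Structured interviews": 0.51,
    "Job knowledge tests": 0.48,
    "Unstructured interviews": 0.38,
    "Reference checks": 0.26,
    "Years of experience": 0.18,
    "Years of education": 0.10,
}


def variance_explained(r: float) -> float:
    """Share of variance in performance accounted for by the predictor."""
    return r * r


def pairwise_hit_rate(r: float) -> float:
    """Chance that the higher-scoring of two random candidates is also the
    better performer.

    ASSUMPTION: scores and performance are bivariate normal with
    correlation r, which gives 1/2 + arcsin(r) / pi.
    """
    return 0.5 + math.asin(r) / math.pi


def summarize(validity: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (method, variance explained, pairwise hit rate) per method."""
    return [
        (name, variance_explained(r), pairwise_hit_rate(r))
        for name, r in validity.items()
    ]


if __name__ == "__main__":
    for name, r2, hit in summarize(VALIDITY):
        print(f"{name:<26} variance explained={r2:.2f}  hit rate={hit:.2f}")
```

Under that normality assumption, a validity of 0.38 corresponds to picking the better of two candidates roughly 62 percent of the time, and 0.54 to roughly 68 percent. A coefficient of 0 would be the coin flip.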

The hiring processes that most companies actually use — unstructured conversations, resume reviews focused on credentials and job titles, gut-feel assessments based on interview performance — are among the worst predictors of actual job performance in the research. The methods that predict best — work samples, structured interviews with defined scoring criteria — are used far less.

Why Gut Instinct Is So Unreliable

The psychological mechanisms behind interview failure are well-documented:

The halo effect. A positive impression formed on one dimension (confident presenter, went to a top college, has an impressive title) spreads to other dimensions regardless of evidence. We rate people highly across the board when we like one thing about them, and poorly across the board when we don't.

Affinity bias. Interviewers give higher ratings to candidates who resemble them: same educational background, communication style, professional history, in-group markers. This is not a deliberate discrimination strategy; it happens automatically and below conscious awareness.

The "like me" problem compounds over time. Teams that hire primarily through cultural "fit" — often code for "comfortable for the existing team" — become progressively more homogeneous. The homogeneity feels like evidence that the culture is working. It's often evidence that the selection mechanism is narrowing.

Snap judgments. Nalini Ambady's work on "thin slices" of behavior showed that first impressions formed in less than 30 seconds are remarkably persistent and difficult to update with subsequent information. An interviewer who forms a negative first impression will tend to ask questions that confirm it, and to interpret ambiguous answers negatively. One who forms a positive first impression does the reverse.

We use interviews primarily to confirm the impressions we formed before the interview started. This is not interviewing — it's validation theater.

The Specific Failure Mode for Technical Roles

For technical hiring, the disconnect is particularly sharp.

The standard software engineering interview circuit — LeetCode algorithmic problems, whiteboard coding, abstract data structures questions — tests for a specific kind of mathematical reasoning that is genuinely important for some roles and largely irrelevant for most. A staff engineer maintaining a complex financial system needs deep systems thinking, strong code review judgment, ability to mentor, and understanding of production operations. How quickly they can reverse a linked list on a whiteboard correlates weakly with any of these.

FAANG companies developed this interview culture when they were primarily building novel algorithmic products. It spread to companies that aren't building novel algorithmic products, partly by mimicry and partly because the interview format lets companies feel like they're being rigorous.

Research from Triplebyte (now defunct but with published data) found that performance on common technical interview questions was a poor predictor of performance on actual work simulations for engineers — precisely because the interviews test narrow algorithmic skill while the work tests entirely different things.

What Actually Predicts Performance

The evidence converges on a few findings:

Structured interviews with behaviorally anchored questions, which ask for examples of past behavior rather than hypothetical responses, significantly outperform unstructured conversations. The question "tell me about a time you had to deliver difficult feedback to a peer" generates more predictive information than "how would you handle a difficult colleague?"

Work sample tests are the single most predictive method available in most contexts. Show someone a representative piece of actual work, have them do it, evaluate the output. This is obvious in retrospect, rarely practiced, and almost always results in better hires when implemented.

Structured scoring with defined criteria, evaluated independently before discussion. Group debriefs where the most senior person speaks first contaminate individual assessments. Independent scoring, then aggregation, produces better calibration.
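
The independent-scoring step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed process: the 1 to 5 scale, the criterion names, the `aggregate` function, and the disagreement threshold are all hypothetical. Each interviewer submits scores before any discussion; the sketch averages them per criterion and flags criteria where interviewers diverge enough to be worth talking through in the debrief.

```python
from statistics import mean, pstdev

# Hypothetical shape: interviewer -> criterion -> score on a 1-5 scale,
# submitted BEFORE any group discussion.
Scorecard = dict[str, dict[str, int]]


def aggregate(scorecard: Scorecard, disagreement_threshold: float = 1.0) -> dict:
    """Average independent scores per criterion and flag large disagreements.

    disagreement_threshold is an illustrative cutoff on the population
    standard deviation of the scores, not a researched value.
    """
    criteria = sorted({c for scores in scorecard.values() for c in scores})
    summary = {}
    for criterion in criteria:
        values = [s[criterion] for s in scorecard.values() if criterion in s]
        spread = pstdev(values) if len(values) > 1 else 0.0
        summary[criterion] = {
            "mean": mean(values),
            "spread": spread,
            "discuss": spread >= disagreement_threshold,
        }
    return summary


if __name__ == "__main__":
    submitted = {
        "interviewer_a": {"code_review": 4, "systems_thinking": 3},
        "interviewer_b": {"code_review": 4, "systems_thinking": 5},
        "interviewer_c": {"code_review": 5, "systems_thinking": 1},
    }
    for criterion, stats in aggregate(submitted).items():
        print(criterion, stats)
```

Collecting the scores first and computing the summary afterward is the point of the design: nobody's number can be anchored by what the most senior person said in the room.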

Trials and paid projects for roles where a short-term test is possible. A meaningful portion of the knowledge work required in most roles can be simulated in a few hours. The information generated is far more predictive than hours of conversational interviewing.

Why Companies Don't Change

The persistence of bad hiring practices, in the face of abundant evidence that they're bad, is itself a psychological phenomenon worth understanding.

The unstructured interview feels more like information gathering because it's open-ended. Hiring managers believe they're seeing something real about the candidate — and they are seeing something, just not something that predicts performance.

Implementing structured interviews requires upfront work (defining criteria, writing questions, training interviewers to score consistently) that feels like overhead. It also reduces the hiring manager's sense of control — which people resist even when the outcome is objectively better.

The most direct path to better hiring is also the most uncomfortable one: accepting that your instincts about candidates are frequently wrong, and building a process that doesn't rely on them.

---

Written by the HireMinds Team