Who’s Really Making That Underwriting Decision? Wharton Research Reveals AI’s Invisible Takeover of Human Judgment

By James W. Moore | InsuranceIndustry.AI

Executive Summary

New research from the Wharton School introduces “cognitive surrender,” a phenomenon where decision-makers adopt AI-generated outputs without critical evaluation, even when those outputs are wrong. Across three experiments with nearly 10,000 individual reasoning trials, participants followed faulty AI recommendations 80% of the time and reported higher confidence in their answers after consulting AI, regardless of whether the AI was accurate. For an industry where every bind, every reserve, and every claim payment carries legal and financial consequences, this isn’t an academic curiosity. It’s an operational risk hiding in plain sight.


The Research Your Competitors Haven’t Read Yet

In January 2026, Wharton researchers Steven D. Shaw and Gideon Nave published a working paper that should be required reading for every insurance executive deploying AI. Titled Thinking — Fast, Slow, and Artificial, the paper extends Nobel laureate Daniel Kahneman’s famous dual-process model of cognition (System 1 for fast, intuitive thinking; System 2 for slow, deliberative reasoning) by adding a third system: System 3, artificial cognition performed by AI.

The framework they call “Tri-System Theory” treats AI not as a passive tool but as a functional cognitive agent that can supplement human reasoning, suppress it, or replace it entirely. The most troubling finding? When AI is available, people don’t just use it. They stop thinking.

Shaw and Nave call this “cognitive surrender”: the behavioral tendency to defer judgment, effort, and responsibility to AI output, particularly when that output arrives fluently, confidently, and with minimal friction. It is distinct from “cognitive offloading,” where a professional strategically delegates a specific task to an external tool (using a calculator, for example) while retaining active oversight. Cognitive surrender is what happens when the oversight itself disappears.

The Numbers That Should Keep You Up at Night

The experimental design was elegant and devastating. Participants solved reasoning problems with optional access to a ChatGPT-based AI assistant. Crucially, the researchers manipulated the AI’s accuracy using hidden prompts, so that on some trials the AI gave correct answers and on others it confidently delivered wrong ones.

The results across three studies (N = 1,372 participants):

  • Participants consulted the AI on more than 50% of trials when it was available.
  • When they consulted it, they followed its recommendation roughly 80% of the time, even when the AI was wrong.
  • Accuracy rose 25 percentage points when the AI was correct and dropped 15 points when it was faulty, relative to participants working without AI.
  • Confidence increased by nearly 12 percentage points when AI was available, regardless of whether the AI was giving them right or wrong answers.
  • The people most susceptible to cognitive surrender were those with higher trust in AI and lower analytical engagement. The people most resistant were those with higher fluid intelligence and a stronger intrinsic motivation to think carefully.

Perhaps most striking: even when researchers gave participants financial incentives for correct answers and immediate feedback after each question, cognitive surrender persisted. Participants got better at overriding bad AI advice under those conditions, but the fundamental pattern held. The AI’s accuracy, not the human’s judgment, remained the dominant predictor of outcomes.
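
To see how those effect sizes compound at the level of individual decisions, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the paper: the consult rate, follow rate, baseline human error rate, and AI error rates below are illustrative assumptions, chosen only to show how a high follow rate lets the AI's accuracy dominate the outcome.

```python
# Back-of-the-envelope sketch (not from the Wharton paper): how an 80% follow
# rate propagates an AI system's error rate into decision-level error.
# Every parameter value below is an illustrative assumption.

def expected_error_rate(ai_error_rate: float,
                        consult_rate: float = 0.5,     # assumed share of decisions where AI is consulted
                        follow_rate: float = 0.8,      # assumed share of consulted decisions where AI is followed
                        human_error_rate: float = 0.2  # assumed error rate of unaided human judgment
                        ) -> float:
    """Expected share of wrong decisions for a single decision type."""
    unaided = (1 - consult_rate) * human_error_rate                    # AI never consulted
    followed = consult_rate * follow_rate * ai_error_rate              # AI consulted and followed
    overridden = consult_rate * (1 - follow_rate) * human_error_rate   # AI consulted, human overrides
    return unaided + followed + overridden

for ai_err in (0.05, 0.15, 0.30):
    print(f"assumed AI error rate {ai_err:.0%} -> expected decision error {expected_error_rate(ai_err):.1%}")
```

With these assumed inputs, expected decision error is linear in the AI's error rate, with slope equal to the consult rate times the follow rate, which mirrors the paper's finding that the AI's accuracy, not the human's judgment, ends up as the dominant predictor of outcomes.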

Why This Matters for Insurance

Now translate those findings to the decisions that define your business.

An underwriter reviews an AI-generated risk assessment on a $3 million commercial property account. The model flags it as favorable. The underwriter glances at the recommendation, confirms it aligns with general expectations, and binds the risk. That underwriter just engaged System 3, and if the Wharton data generalizes (there is no reason to think insurance professionals are immune), a wrong recommendation gets followed anyway roughly four times out of five.

A claims adjuster uses an AI triage tool that recommends a reserve amount based on historical patterns and document analysis. The reserve looks reasonable. The adjuster approves it. Three months later, the claim develops in ways the model didn’t anticipate, and the reserve proves inadequate by a factor of three. Did the adjuster make that reserve decision, or did they merely ratify it?

This is precisely the concern that Upendra Belhe, Ph.D., and Marty Ellingsworth raised in their recent Insurance Innovation Reporter article, “Why AI Breaks in Insurance Production.” Belhe, a former Chief Analytics Officer at Gen Re and Chubb, and Ellingsworth, a veteran analytics executive with experience at USAA and Verisk, argue that governance becomes real at the point of decision, not in the framework. Their core thesis: once AI shifts from “analysis” to “influence,” the burden of proof shifts with it. Organizations invest heavily in data integrity and platform modernization but leave the harder problems unaddressed: unclear decision rights, ad hoc escalation, and weak evidence capture. In insurance, where underwriting and claims decisions are routinely contested months or years later, that mismatch is where AI credibility breaks.

The Wharton cognitive surrender data gives Belhe and Ellingsworth’s operational argument a behavioral science foundation. They warned that prior decision-support systems assumed advice, not action; tools surfaced options, and humans decided. Today’s AI systems don’t just inform judgment. They shape it, and at machine speed. If the human at the point of bind, pay, or reserve is cognitively surrendered to the AI’s recommendation, then the entire accountability framework that insurance regulation is built on starts to erode.

The Deskilling Parallel Is Already Playing Out in Medicine

If this sounds theoretical, consider that the deskilling effect Shaw and Nave describe has already been documented in clinical medicine. A 2025 Lancet Gastroenterology & Hepatology study found that physicians who routinely used AI-assisted colonoscopy tools showed a statistically significant decline in their ability to detect precancerous growths without AI. Their adenoma detection rate dropped from 28.4% to 22.4% after three months of AI exposure. The researchers called it the first real-world clinical evidence that AI use can erode professional competence.

Insurance isn’t colonoscopy, but the cognitive mechanism is identical. When professionals repeatedly defer to AI recommendations, the analytical muscles they’ve spent years developing begin to atrophy. An underwriter who stops critically evaluating risk because the model “usually gets it right” is on the same trajectory as the endoscopist who stops looking as carefully because the algorithm usually spots the polyp.

The Regulatory Collision Course

Here’s where this gets operationally urgent. The NAIC’s Model Bulletin on AI use, now adopted by more than half of U.S. states, explicitly requires that AI-assisted decisions comply with existing legal standards, including unfair trade practice and unfair claims settlement laws. The bulletin’s governance framework demands human oversight, documentation of AI-related decision processes, and internal controls for AI system outputs. A draft model law on third-party AI oversight is anticipated in 2026, potentially including licensing requirements for AI vendors serving insurers.

These regulatory frameworks assume that a human is actually exercising judgment. They assume the “human in the loop” is genuinely in the loop, not cognitively surrendered to System 3. Shaw and Nave’s research suggests that assumption may be wrong for a significant percentage of AI-assisted decisions. If a regulator examines an underwriting file and finds that the human decision-maker simply ratified an AI recommendation without independent analysis, the “human oversight” that the carrier documented in its AI governance program starts to look like a compliance fiction.

Colorado’s AI Act, New York’s DFS Insurance Circular Letter No. 7 (2024), and the NAIC’s AI Systems Evaluation Tool (expected to be deployed in market conduct examinations in 2026) all presuppose meaningful human review. Cognitive surrender is the gap between what those frameworks require and what actually happens when a busy professional is presented with a confident, well-formatted AI recommendation at 4:30 on a Friday afternoon.

Strategic Implications

For Carriers: The Shaw and Nave research identifies specific, measurable risk factors for cognitive surrender: high trust in AI, low analytical engagement, and time pressure. These are exactly the conditions present in high-volume underwriting and claims operations. Carriers need to move beyond generic “human in the loop” policies and design operational workflows that actively counteract surrender. The Wharton data shows that incentives plus feedback significantly improved override rates for bad AI recommendations. Translated to insurance operations, this means structured second-look protocols, performance metrics that reward independent judgment (not just throughput), and systematic audit programs that test whether humans are genuinely evaluating AI output or merely confirming it.
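
As one illustration of what a systematic audit program could actually measure, the sketch below computes crude indicators from an AI-assisted decision log: how often the final decision matches the AI recommendation, and how often it matches with no documented independent rationale. The record schema, field names, and metrics are hypothetical assumptions for illustration, not the Wharton authors' method or a regulatory standard.

```python
# Hypothetical audit sketch: measure how often humans ratify AI output without
# documented independent analysis. Schema and metric names are assumptions.
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    ai_recommendation: str       # e.g., "bind", "decline", "refer"
    human_decision: str          # what the underwriter or adjuster actually did
    independent_rationale: bool  # did the reviewer document their own analysis?

def surrender_indicators(log: list[DecisionRecord]) -> dict[str, float]:
    """Crude portfolio-level indicators of possible cognitive surrender."""
    agreed = [r for r in log if r.human_decision == r.ai_recommendation]
    rubber_stamped = [r for r in agreed if not r.independent_rationale]
    return {
        "agreement_rate": len(agreed) / len(log),
        "rubber_stamp_rate": len(rubber_stamped) / len(log),
        "override_rate": 1 - len(agreed) / len(log),
    }

sample = [
    DecisionRecord("bind", "bind", independent_rationale=False),
    DecisionRecord("bind", "bind", independent_rationale=True),
    DecisionRecord("decline", "bind", independent_rationale=True),
]
print(surrender_indicators(sample))  # agreement 0.67, rubber stamp 0.33, override 0.33
```

The reason to separate agreement from rubber-stamping is that agreement alone is ambiguous; agreement with no documented analysis is the measurable proxy worth triggering a second look on.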

For Regulators: The cognitive surrender research provides an empirical basis for what regulators have likely suspected but couldn’t prove: that “human oversight” of AI decisions may be illusory in practice. As the NAIC moves from model bulletins toward enforceable model laws, the question of what constitutes meaningful human review deserves sharper definition. Simply requiring a human to click “approve” does not constitute oversight if the Wharton data is correct about how humans actually interact with AI recommendations.

For Agencies and Wholesalers: Cognitive surrender doesn’t require enterprise-grade AI. It happens with ChatGPT. If your producers or CSRs are using general-purpose AI tools to draft coverage analyses, evaluate submissions, or summarize policy language, the same dynamics apply. The question isn’t whether your people are using AI. It’s whether they’re still thinking after they use it.

What To Do Now

Within 30 days: Audit your current AI-assisted decision workflows. Identify the specific points where a human is supposed to exercise independent judgment. Then ask honestly: is that actually happening, or are your people confirming AI output rather than evaluating it?

Within 90 days: Implement structured override protocols for high-stakes AI-assisted decisions. The Wharton research shows that feedback loops and accountability mechanisms measurably reduce cognitive surrender. Build them into your underwriting and claims processes.
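
A minimal sketch of what such an override protocol might look like in practice, assuming a hypothetical decision record and an illustrative dollar threshold; nothing here is prescribed by the NAIC bulletin or by the Wharton paper.

```python
# Hypothetical second-look rule for high-stakes AI-assisted decisions.
# The threshold, field names, and escalation logic are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AiAssistedDecision:
    decision_type: str           # "bind", "reserve", "claim_payment"
    amount: float                # exposure, reserve, or payment in dollars
    human_agreed_with_ai: bool
    independent_rationale: bool  # reviewer documented their own analysis

HIGH_STAKES_THRESHOLD = 1_000_000  # illustrative, not a regulatory figure

def requires_second_look(d: AiAssistedDecision) -> bool:
    """Escalate when a high-stakes decision ratifies the AI with no documented rationale."""
    high_stakes = d.amount >= HIGH_STAKES_THRESHOLD
    rubber_stamp = d.human_agreed_with_ai and not d.independent_rationale
    return high_stakes and rubber_stamp

# Example: the $3 million bind from earlier, approved without independent analysis
print(requires_second_look(AiAssistedDecision("bind", 3_000_000, True, False)))  # True
```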

Within 6 months: Align your AI governance documentation with regulatory expectations that will increasingly define “human oversight” with operational specificity. The carriers that get ahead of this will have a defensible position when examiners start using the NAIC’s AI evaluation tools. The ones that wait will be explaining why their “human in the loop” was just a rubber stamp.

The Bottom Line

Shaw and Nave’s paper ends with a question that should resonate with every insurance executive: “What happens when our judgments are shaped by minds not our own?”

For insurance, the stakes of that question are measured in binding authority, reserve adequacy, claims outcomes, regulatory compliance, and E&O exposure. Cognitive surrender isn’t a future risk. It’s a present condition, operating in your underwriting units and claims departments right now, and the Wharton data tells us it’s happening to roughly four out of five people who consult AI and get a wrong answer.

The carriers that treat this as a governance priority, not just a technology issue, will be the ones that capture AI’s genuine benefits without inheriting its hidden liabilities. The ones that don’t will discover the hard way that “the AI recommended it” is not a defense that regulators, courts, or policyholders are prepared to accept.


Sources:

  • Shaw, S. D., & Nave, G. (2026). Thinking — Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender. Working Paper, The Wharton School, University of Pennsylvania. Available at SSRN.

  • Budzyń, K., et al. (2025). Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study. The Lancet Gastroenterology & Hepatology, 10(10), 896–903.

  • Belhe, U., & Ellingsworth, M. (2025). “Why AI Breaks in Insurance Production.” Insurance Innovation Reporter.

  • NAIC Model Bulletin: Use of Artificial Intelligence Systems by Insurers (December 2023).

  • NAIC Big Data and Artificial Intelligence (H) Working Group — AI Systems Evaluation Tool and Model Law Discussion (2025–2026).

  • Fenwick & West LLP. Tracking the Evolution of AI Insurance Regulation (December 2025).

  • Knowledge at Wharton. “How AI Is Reshaping Human Intuition and Reasoning” — Interview with Gideon Nave and Steven Shaw (February 2026).


Stay ahead of the curve. Subscribe to the InsuranceIndustry.AI newsletter for weekly analysis of how AI is reshaping insurance decision-making — before the rest of the industry catches up.

AI Disclaimer: This blog post was created with assistance from artificial intelligence technology. While the content is based on factual information from the source material, readers should verify all details, pricing, and features directly with the respective AI tool providers before making business decisions. AI-generated content may not reflect the most current information, and individual results may vary. Always conduct your own research and due diligence before relying on information contained on this site.