Every claims override, underwriting exception, and appeal reversal is a feedback signal. Most carriers have no idea what those signals are teaching their models.

By James W. Moore | InsuranceIndustry.AI


Executive Summary / Key Takeaways

  • Any insurance carrier that has connected a large language model to operational decision-making is already generating AI training signals through normal daily activity, whether or not it has a formal program to manage them.
  • The people whose judgment should shape those signals are senior underwriters, experienced adjusters, and compliance staff with deep institutional knowledge. They are the same people currently retiring in record numbers.
  • Even well-designed feedback programs mathematically encode unintended patterns alongside intended ones, according to recent preliminary research. Carriers cannot fully control what models learn from human feedback.
  • None of this is an argument against AI. Human decision-making carries the same structural noise at far greater scale, and with no mechanism to detect or correct it.
  • The strategic question is not whether to train your AI on operational feedback. That is already happening. The question is whether you are managing it deliberately.

You Are Already Doing This

Here is a scenario familiar to anyone who has run a claims operation.

A senior examiner reviews an AI-generated denial recommendation, decides the policy language actually supports a partial payment, and overrides the recommendation. The correction gets logged. The workflow moves on. Nobody calls it training.

But if a large language model is embedded in that workflow, that override is a training signal. The correction is feedback. The model is learning what the senior examiner would have done, and adjusting accordingly.

This process has a technical name: Reinforcement Learning from Human Feedback, or RLHF. In an AI research lab, it is used to align a model with broad human preferences. In an insurance company, it becomes something more specific: a continuous loop where institutional judgment, captured through operational decisions, gets absorbed into how the model behaves.

Most carriers running AI systems in claims, underwriting, or customer service are doing a version of this whether they have designed it or not. Call it shadow RLHF. Every adjuster override, every underwriting exception, every compliance escalation carries a signal about what the organization considers correct. The model is listening. The question is whether anyone is paying attention to what it hears.
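
To make the mechanism concrete, here is a minimal sketch of what capturing one of these signals could look like. The schema, field names, and `log_feedback_event` helper are hypothetical, not a description of any particular vendor's pipeline; the point is that an override naturally decomposes into a rejected model output and a preferred human alternative, which is exactly the kind of preference data that RLHF-style training consumes.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    """One operational decision captured as a training signal (hypothetical schema)."""
    case_id: str
    workflow: str          # e.g. "claims", "underwriting", "compliance"
    model_output: str      # what the AI recommended
    human_decision: str    # what the reviewer actually did
    overridden: bool       # True when the reviewer rejected the recommendation
    reviewer_role: str     # whose judgment is being encoded
    reviewed_at: str

def log_feedback_event(event: FeedbackEvent, path: str = "feedback_log.jsonl") -> None:
    """Append the event to a JSONL log that a later fine-tuning job could consume."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# The claims scenario above, expressed as a preference pair:
log_feedback_event(FeedbackEvent(
    case_id="CLM-2025-0187",
    workflow="claims",
    model_output="Deny claim: exclusion 4(b) applies.",
    human_decision="Issue partial payment: policy language supports coverage of the loss.",
    overridden=True,
    reviewer_role="senior_examiner",
    reviewed_at=datetime.now(timezone.utc).isoformat(),
))
```

Whether or not that log ever feeds a formal fine-tuning run, it is a record of what the organization's judgment looks like to a model, and its quality depends entirely on who is doing the reviewing.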

The Talent Problem Inside the Technical One

If RLHF is fundamentally about encoding institutional judgment, then the people generating the feedback matter as much as the technology consuming it.

This is where the timing becomes difficult. The institutional knowledge most worth encoding is concentrated exactly where the industry is losing it fastest.

According to research from The Jonus Group, the insurance industry will lose roughly 361,000 workers to retirement in the next five to ten years, with another million in the 55-to-64 age bracket following soon after. The Bureau of Labor Statistics has documented similar trends, projecting the loss of around 400,000 workers through attrition by 2026. One in four underwriters is currently over 50. The median age across insurance carriers now sits at 44, compared to 42.2 for the broader U.S. workforce, and the gap is widening.

The deeper issue is what retires with them. A study at Convex, a London-based specialty carrier, put it plainly: when researchers observed experienced underwriters evaluating risk, what those underwriters described as five steps turned out to be fifteen nuanced processes. The judgment is largely subconscious, built over decades, and not written down anywhere.

That tacit knowledge is exactly what a model connected to operational decisions will absorb, for better or worse, from whoever is reviewing its outputs. The carriers whose senior underwriters are active participants in AI feedback loops are encoding hard-won expertise. The carriers whose senior underwriters are two years from retirement, with no structured knowledge transfer in place, are encoding something else: the decisions of whoever is left to review the queue.

What most carriers already recognize as a workforce planning problem is also an AI governance problem they may not have considered yet. Who is qualified to train your model? Are you developing a bench of those people? And if the answer to both questions is unclear, what is the model currently learning from the feedback it is receiving?

What Gets Encoded That You Didn’t Choose

There is a harder layer still. Even carriers that get the people right, with the right experts deliberately providing feedback, cannot fully control what the model learns.

A preprint published on arXiv in April 2026 by researcher Vishal Rajput at KU Leuven argues, from a mathematical standpoint, that supervised fine-tuning, and by extension feedback-based methods like RLHF, necessarily encodes spurious correlations alongside intended signals. The paper is preliminary, has not yet completed peer review, and should be weighed accordingly. But its core argument is consistent with what practitioners have observed empirically: models learn patterns from human feedback that the humans themselves would not consciously choose to teach.

In insurance terms, that means the model is learning not just a senior underwriter’s risk judgment, but also time-of-day patterns in how that underwriter reviews files. It is learning subtle preferences for account types that have nothing to do with risk. It is learning the effects of a heavy caseload on Friday afternoon. None of those patterns are in the training specification. All of them are in the feedback.
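
A carrier cannot prevent this entirely, but it can look for it. The sketch below, which assumes a feedback log shaped like the hypothetical one above, simply asks whether override decisions correlate with variables that should be irrelevant to risk, such as the hour or day of review; caseload, account type, and reviewer identity could be checked the same way.

```python
import pandas as pd

# Assumes a feedback log shaped like the hypothetical one above, one row per reviewed decision.
df = pd.read_json("feedback_log.jsonl", lines=True)
df["reviewed_at"] = pd.to_datetime(df["reviewed_at"])

# Override rate sliced by variables that should have nothing to do with risk.
# A flat profile is reassuring; a Friday-afternoon spike is a spurious signal
# the model will happily learn from.
by_hour = df.groupby(df["reviewed_at"].dt.hour)["overridden"].mean()
by_day = df.groupby(df["reviewed_at"].dt.day_name())["overridden"].mean()

print("Override rate by hour of review:")
print(by_hour.round(3))
print("Override rate by day of week:")
print(by_day.round(3))
```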

Vendors pitching RLHF as the path to perfect alignment with carrier-specific standards are offering something more limited than they are describing. Alignment is real and achievable. Complete control over what gets encoded is not. That distinction belongs in every conversation about AI deployment.

The Argument You Are Already Having

At this point, a reasonable executive might be thinking: if AI training inevitably absorbs patterns we didn’t intend, perhaps the more familiar risks of human judgment are preferable after all.

That argument doesn’t hold, and it is worth being direct about why.

In the previous article in this series, "The Governance Problem AI Didn't Create (But Might Actually Fix)," we examined the Nobel laureate Daniel Kahneman's noise audit of a large insurance company. Forty-eight experienced underwriters were given identical case files and asked to set premiums. Management expected the typical difference between two quotes for the same case to be around 10%. The actual median difference was 55%. A second carrier came in near 60%.

Those underwriters were doing the same thing an AI system does when it encodes spurious correlations. They were absorbing irrelevant signals (weather, mood, fatigue, the last three accounts they handled) and allowing those signals to influence decisions. The difference is that nobody was measuring it, nobody could audit it, and there was no mechanism to correct it at scale.
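
For readers who want to see what those percentages mean, here is a small worked illustration using invented premiums rather than the study's data. The metric behind the 10%-versus-55% comparison is the difference between two quotes for the same case, expressed as a share of their average; the code computes the median of those pairwise differences.

```python
from itertools import combinations
from statistics import median

# Invented premiums (USD) quoted by five underwriters for the same case file.
quotes = [9_000, 12_000, 15_500, 19_000, 24_000]

def relative_difference(a: float, b: float) -> float:
    """Absolute difference between two quotes as a fraction of their average."""
    return abs(a - b) / ((a + b) / 2)

pairwise = [relative_difference(a, b) for a, b in combinations(quotes, 2)]
# Prints roughly 44% for these invented numbers; the audit's actual median was 55%.
print(f"Median pairwise difference: {median(pairwise):.0%}")
```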

A LinkedIn comment circulating in industry circles recently made this point with some bluntness: people complain about AI needing training and making mistakes, but plenty of humans are nearly untrainable, carry their own unexamined assumptions, and make decisions that would not survive scrutiny. The commenter had a point. The baseline for comparison is not perfect human judgment. It is the deeply imperfect, largely invisible human judgment that currently runs the book.

AI’s training flaws are measurable. Drift in underwriting patterns can be detected before it becomes systemic. Bias in claims outcomes can be flagged in real time. Feedback signals can be adjusted when problems emerge. None of that is true of the senior underwriter who has been making decisions a certain way for thirty years, and whose reasoning has never been audited.
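
As a concrete illustration of what "measurable" means here, the sketch below compares the override rate in the current month against a trailing baseline and flags a statistically unlikely shift. The counts, the threshold, and the use of a simple two-proportion z-test are illustrative assumptions; a production monitor would segment by line of business, reviewer cohort, and claim type.

```python
from math import erf, sqrt

def drift_p_value(overrides_now: int, total_now: int, overrides_base: int, total_base: int) -> float:
    """Two-sided p-value for a shift in override rate between two periods (two-proportion z-test)."""
    p_now, p_base = overrides_now / total_now, overrides_base / total_base
    pooled = (overrides_now + overrides_base) / (total_now + total_base)
    se = sqrt(pooled * (1 - pooled) * (1 / total_now + 1 / total_base))
    z = abs(p_now - p_base) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided normal tail probability

# Illustrative counts: 6.0% override rate last quarter vs. 8.6% this month.
p = drift_p_value(overrides_now=95, total_now=1_100, overrides_base=180, total_base=3_000)
if p < 0.01:
    print(f"Override rate has shifted (p = {p:.4f}); investigate what changed in the feedback loop.")
else:
    print(f"No significant drift detected (p = {p:.4f}).")
```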

The question is not which system is free of flawed pattern encoding. Neither is. The question is which system gives you visibility into the patterns and the ability to act on what you find.

What Deliberate Looks Like

The carriers that navigate this well are not necessarily the ones with the most sophisticated AI infrastructure. They are the ones who recognize an uncomfortable reality early: connecting a large language model to operational decisions creates a learning system. Managing that system is a governance obligation, not an IT project.

Regulators are beginning to see it the same way. The NAIC’s AI Systems Evaluation Tool pilot is currently running across 12 states, designed to help examiners assess insurer AI governance programs during market conduct examinations. The results will inform whether a formal model law follows. Carriers that cannot explain what their operational feedback is teaching their AI systems will be poorly positioned when those examiners arrive.

The foundational work is not technical. It is organizational. Which staff are currently providing feedback to AI systems? What institutional knowledge are they bringing to that role? What happens to that feedback quality as senior talent retires? And is there a structured effort to capture and systematize expertise before it walks out the door?

Most carriers have not framed the workforce challenge in these terms. Whether they should is a question worth sitting with.


Action Items for Insurance Leaders

  1. Map your operational feedback loops. Identify every point in claims, underwriting, and compliance where staff are reviewing, overriding, or approving AI-generated outputs. Each is a training signal. Know what you are generating. (A minimal inventory sketch follows this list.)
  2. Assess the expertise behind your feedback. The quality of what your AI learns is only as good as the judgment of the people correcting it. Evaluate whether the right people are in those roles, and whether that will still be true in three years.
  3. Separate productivity from training. A staff member using AI to draft a coverage letter is using a productivity tool. A staff member reviewing and correcting AI-generated decisions is participating in model training. The two require different governance.
  4. Build knowledge transfer into your AI strategy. If your most experienced underwriters and adjusters retire without their judgment being systematized, the feedback quality gap will compound over time. Succession planning and AI training are now the same problem.
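
For the first and third items, the mapping exercise can begin with something as modest as the structure below. Every name and field in it is hypothetical; the value is in forcing an explicit, recorded answer, for each workflow step, to whether the interaction corrects the model and whether that correction is captured anywhere the model can learn from.

```python
from dataclasses import dataclass

@dataclass
class FeedbackTouchpoint:
    """One place where staff interact with AI output (hypothetical inventory record)."""
    workflow: str          # "claims", "underwriting", "compliance", ...
    step: str              # where in the process the interaction happens
    reviewer_role: str     # who is supplying the judgment
    corrects_model: bool   # does the interaction change or overrule the AI's output?
    feeds_training: bool   # is that correction captured anywhere the model can learn from?

    @property
    def governance_track(self) -> str:
        """Action item 3: training interactions need different oversight than productivity use."""
        if self.corrects_model and self.feeds_training:
            return "model-training governance"
        return "productivity-tool governance"

inventory = [
    FeedbackTouchpoint("claims", "denial recommendation review", "senior_examiner", True, True),
    FeedbackTouchpoint("underwriting", "referral triage", "underwriting_assistant", True, False),
    FeedbackTouchpoint("customer service", "coverage letter drafting", "service_rep", False, False),
]

for t in inventory:
    print(f"{t.workflow:<18} {t.step:<32} {t.governance_track}")
```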

Sources

  1. Kahneman, D., Sibony, O., & Sunstein, C.R. (2021). Noise: A Flaw in Human Judgment. Insurance Thought Leadership review.
  2. Quillette. “Noise: A Flaw in Human Judgment — A Review.” April 2022.
  3. The Jonus Group. “Insurance Talent: Why 1.4 Million Retirements Will Reshape the Industry.” October 2025.
  4. InsuranceNewsNet. “Insurance industry retirement exodus creating a talent gap.” July 2025.
  5. Slayton Search. “The Insurance Industry Retirement Crisis.” February 2026.
  6. RSM US. “Skills gap in insurance industry’s aging workforce is a growing concern.” October 2023.
  7. Rajput, V. “Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair.” arXiv preprint, April 2026 (not yet peer-reviewed).
  8. Fenwick & West LLP. “Tracking the Evolution of AI Insurance Regulation.” February 2026.
  9. WaterStreet Company. “What the NAIC Model Bulletin Means for Insurance AI.” April 2026.
  10. National Association of Insurance Commissioners. “Model Bulletin: Use of Artificial Intelligence Systems by Insurers.”

Check out our three-part series on AI in Insurance Governance


 

AI Disclaimer: This blog post was created with assistance from artificial intelligence technology. While the content is based on factual information from the source material, readers should verify all details, pricing, and features directly with the respective AI tool providers before making business decisions. AI-generated content may not reflect the most current information, and individual results may vary. Always conduct your own research and due diligence before relying on information contained on this site.