
Human Factor Podcast Season 2 Episode 026: The Readiness Illusion: Why the AI Agent Era’s Loudest Claims Outrun the Evidence


The Agent Era Is Real. We Are Not Yet Ready for It. Both of Those Statements Can Be True at the Same Time.


Host: Kevin Novak


Duration: 25 minutes


Available: June 4, 2026


Episodes are available in both video and audio formats across all major podcast platforms, including Spotify, YouTube, Pandora, Apple Podcasts, and via RSS, among others.


Episode Overview


Season 2 | Solo Episode

Earlier this week, Kevin sat down to work with an AI model on what should have been a straightforward task. By the fourth round of corrections, he had given the model more context than he would have given a competent junior teammate, and the result was acceptable rather than excellent. That experience is the honest texture of using these tools, and it is almost completely missing from the conversation about AI agents.

In this episode, Kevin names the gap between what the technology actually does and what the language around it suggests it does: The Readiness Illusion. It is the assumption, encouraged by vendor stories and amplified by professional anxiety, that capability and readiness are the same thing. They are not. The distance between them is filled with trust, governance, judgment, error recovery, accountability, and the slow and expensive work of building the human and organizational capacity to know what to hand off, what to check, and what to keep close.

The evidence is sobering. MIT’s NANDA initiative reported that approximately 95 percent of enterprise generative AI pilots had not produced measurable returns. Gartner forecasts that at least 30 percent of generative AI projects will be abandoned after proof of concept, citing poor data quality, escalating costs, unclear business value, and inadequate risk controls. Apple’s 2025 research found that reasoning models often sound more confident as their accuracy drops. Decades of automation bias research show that humans overtrust confident systems, with checking that worsens as the system appears more competent.

Kevin examines where organizational trust actually lives: shared accountability, the ability to explain, and recovery. He names the measurement gap, where we track adoption rates and time saved while measuring consequence and capability with almost nothing at all. And he raises the apprenticeship problem, a foreseeable consequence of agents absorbing the routine work through which junior practitioners have always built professional judgment.

The choice is not between aggressive adoption and standing still. It is between aggressive adoption with shallow readiness and deliberate adoption with deeper readiness. The first looks faster in the first quarter. The second looks faster across the first three years.

The companion article, The Readiness Illusion, appears in the Ideas and Innovations Newsletter.

Resources:

Learn more about the Human Factor Podcast

Subscribe to the Ideas and Innovations Newsletter (it’s free)

Key Takeaways

1. Capability and Readiness Are Not the Same Thing

2. We Are Asking Whether to Trust Ourselves to Know When to Trust the Agents

3. Learn Why We Measure Consequence and Capability with Almost Nothing at All

Season 2, Episode 26 Transcript


Kevin Novak (00:05)

I want to start this episode with a small confession. Earlier this week, I sat down to work with an AI model on what should have been a straightforward task. I gave it context. It produced an answer that looked confident and was just so wrong. I corrected it. The next answer was better and was still wrong. I corrected it again. By the fourth round, I had given the model more context than I would have given a competent junior teammate, and the result was acceptable rather than the quality I was after. That experience is not unusual. It is the texture of using these tools honestly. And it is almost completely missing from the conversation we are having about AI agents.

That is what this episode is about. The gap between what the technology actually does in the hands of individuals and organizations, and what the language around it suggests it does. I want to call that gap something specific. I am calling it The Readiness Illusion. And I want to spend the next half hour walking through what it is, why it matters, what the research actually says, and what serious adoption of AI agents looks like when we stop confusing announcement with arrival.

I’m Kevin Novak, CEO of 2040 Digital, Professor at the University of Maryland, and author of The Truth About Transformation: Leading in the Age of AI, Uncertainty, and Human Complexity, along with the Ideas and Innovations weekly newsletter.

Welcome to the Human Factor Podcast, the show that explores the intersection of humanity, technology, and transformation, along with the psychology behind transformation success. This is Season 2.

Setting the Stage

A few weeks ago, HubSpot published its vision for an open ecosystem for the agent era. It is a serious piece of corporate strategy, and I think it deserves to be taken seriously. The framing they offer is that agents should be able to run on HubSpot, meaning any agent can plug into HubSpot’s data and capabilities, and that agents should be able to run HubSpot, meaning agents can operate the platform end to end through APIs, MCP, and CLI. For the non-technical audience, those are communication protocols and connections. The remote MCP server is now generally available with read access to campaigns, landing pages, website pages, and blog posts, and write access to the core CRM objects. HubSpot updated its Developer Terms on May 4, 2026, clarified that agent and MCP-based access falls under existing terms, stated explicitly that customer data belongs to the customer, and added a restriction on using API-accessed data to train AI models. IDC has placed a $36 billion estimate on the agent-driven shift inside HubSpot’s ecosystem.

That is a real announcement. It signals real capability and real strategy. And the same week it was published, I was on my fourth round of corrections with a model that should have produced acceptable output on the first.

I want you to sit with that contrast for a moment, because it is the contrast at the center of this episode. The infrastructure for agent-driven work is being built quickly and credibly. The actual readiness of the technology, and the readiness of the humans expected to govern it, is not where the language suggests. Both of those statements are true at the same time. Holding them both is hard. The loudest voices of the moment do not want us to hold them both. They want us to choose between enthusiasm and skepticism. I just don’t think the choice is between those two things. What I do think is that the choice is between honesty and convenience. And to be clear, before I go further, this is not pessimism. By nature, I am an optimistic person. What follows is a clear-eyed read of the evidence to date.

Naming the Pattern

Let me name the pattern, because naming it makes it easier to think about. The Readiness Illusion is the assumption, encouraged by vendor stories and amplified by professional anxiety, that capability and readiness are the same thing. They aren’t. A system that can do something in a controlled demonstration is not the same as a system that should do something in a real environment, on real customer data, with real consequences. The distance between those two states is filled with trust, governance, judgment, error recovery, accountability, and the slow and expensive work of building the human and organizational capacity to know what to hand off, what to check, and what to keep close.

We are confusing capability with readiness. And the cost of that confusion is going to land on the people inside organizations who are being asked to absorb a class of tools that act, not just tools that suggest, while the conditions for absorbing them well are still being built.

What the Research Actually Says

Before I go further, I want to ground this in evidence, because I don’t want to make claims I can’t defend. Here is what the research has been telling us.

MIT’s NANDA initiative reported in 2025 that approximately 95 percent of generative AI pilots inside enterprises had not yet produced measurable returns at the time of the study. Gartner forecast that at least 30 percent of generative AI projects would be abandoned after proof of concept by the end of 2025, and the cited reasons are poor data quality, escalating costs, unclear business value, and inadequate risk controls.

McKinsey’s State of AI reporting has consistently found that the organizations capturing meaningful value from AI are the ones that redesign their workflows around it, and the workflow redesign is overwhelmingly a human factor problem rather than a technical one. Apple published a paper in June 2025 examining the limits of large reasoning models, and what they found was that even systems explicitly designed for complex thought and reasoning collapse on problems above a certain level of complexity, and they often sound more confident as their accuracy drops. Anthropic, the company building one of the most capable models on the market, has published its own work on how to test what AI is actually doing, and on the tendency of models to agree with users or to make subtle errors with conviction. The people building these systems do not consider the trust questions resolved.

That is not the picture of a technology ready to operate on its own on work that matters. It is the picture of a technology that is genuinely capable in narrow situations where the request is clear, the guidance is good, and there is human oversight, and that performs unevenly when those conditions are not in place.

If you have used these tools as a practitioner rather than described them as a presenter, none of that should surprise you. And yet the loudest voices of the moment are making a different claim. Some want to ride the wave. Some want to be the voice in the crowd, telling us where we are heading. Their motives are often more about achieving notoriety and being seen as an early adopter and thought leader than about helping the rest of us understand what is actually happening. The problem is not that they are wrong about the direction. The problem is that the way they are speaking suggests we are already there. We simply aren’t. The technology isn’t. The humans tasked with adopting it, governing it, and answering for its outputs aren’t. The organizations that employ those humans aren’t.

The Honest Texture of Daily Use

I want to talk for a minute about the texture of using these tools, because if I don’t, the rest of this conversation will sound too abstract. Let me describe what I actually experience.

When I work with a model, the quality of the output is highly sensitive to the construction of my prompt. That is sometimes treated as a user error. I want to challenge that framing. If a tool’s effectiveness depends on the user mastering a skill that has only existed in any disciplined form for two or three years, the tool is not yet ready to be characterized as autonomous. It is collaborative. And the more we describe it as autonomous, the less we invest in helping users develop the collaborative skills that actually unlock its value.

When I get an output, my next step is almost always to evaluate it. The evaluation finds errors. Sometimes those errors are factual. Sometimes they are logical. Sometimes the model has done what I asked but not what I meant. Sometimes the model has missed nuances I assumed it would catch and, when I name the nuances explicitly, it produces a much better answer the second, third or fourth time.

The pattern I see is that the model doesn’t engage in what a human would call deep thinking. It does not sit with the ambiguity of a request. It does not ask the clarifying question that a competent colleague would ask, although I often wish it would. It produces something. I correct. It produces something better. I correct again. Across enough rounds we arrive somewhere useful. The total time invested is real. The marketing materials never include that time in their efficiency claims.

I want to say this carefully because I am not trying to overstate the case. These tools are powerful. They produce work I could not produce alone in the same time. They expand what is possible for an individual and a small team. The point isn’t that they are useless. The point is that the work they produce requires substantial human supervision, correction, and judgment to become trustworthy, and the path toward genuine autonomy is longer than the loudest voices suggest. Anyone telling you otherwise is selling something or has not used the tools enough to know.

Where Trust Actually Lives

The word trust is doing a lot of heavy lifting right now in vendor materials, and I want to frame where I think the trust question actually lives, because the way it is being used doesn’t match the way it functions inside organizations.

Trust in this context is not primarily a question about whether the model is accurate enough. Accuracy is necessary, and it isn’t sufficient. Trust in an organizational context is built through three things the model can’t supply on its own.

The first is shared accountability. When the agent acts, and the action is wrong, who is responsible? Not in policy. In practice. Whose name is on the consequence? Most organizations have not answered this clearly, and the absence of an answer creates a kind of accountability vacuum where everyone assumes someone else is supervising and no one actually is.

The second is the ability to explain. After the agent acts, can a human explain why it did what it did, in terms specific enough to audit, learn from, and correct? We are already seeing this play out in the workplace, where users share work created by AI but cannot speak in any depth about the reasoning or the substance behind the output. If the answer is no, then the organization cannot improve. It can only react to outcomes after the fact. That is a thin foundation on which to build autonomous operations.

The third is recovery. When the agent gets it wrong, and at scale it will, can the organization undo, contain, or correct the consequences without producing damage that exceeds the value of the actions that worked? Very few organizations have done the operational planning here. Most have governance documents that describe recovery in the abstract and have never tested it in practice.

If you don’t have shared accountability, the ability to explain, and recovery, you don’t have trust. You have hope. And hope is not a change, adoption or transformation strategy.

Trusting Ourselves to Know When to Trust

There is a more uncomfortable version of the trust question that the data forces us to name. We are not only asking whether to trust the agent. We are asking whether to trust ourselves to know when to trust the agent. And humans are, on average, not very good at that.

There is a substantial body of research on automation bias going back to Mosier and Skitka in the 1990s. The pattern is consistent. Humans tend to over-trust automated outputs, especially under time pressure, and to under-check the output even when they have been told to. The more competent the system appears, the worse the checking becomes. This has been replicated in aviation, in clinical decision support, and now in software contexts. It isn’t a character flaw. It is a feature of how human cognition interacts with automation that appears competent.

That has profound implications for how organizations deploy agents. If you put a confident-sounding agent in front of a busy professional, the professional will, on average, accept its output more readily than they should. If you don’t design the surrounding human work to make verification natural and rewarded, you will get less verification than your policy and governance documents assume. The tooling decision is, again, also a human design decision. They can’t be separated.

The Human Factor Frame

This is where the work we do at 2040 Digital and the research behind the Human Factor Method connect directly to this moment. Most of the agent era conversation treats deployment as a tooling decision. It isn’t. It is a redesign of who decides what, who is accountable for what, how people see their own role, and how the organization remembers what it has learned.

When an agent drafts the email, the question is not whether the email is good. The question is what role the human now plays in the act of communication, and how the organization knows whether that role is being performed well. When an agent updates a deal funnel stage in the CRM, the question is not whether the update is correct. The question is how the salesperson’s professional judgment is now being formed, exercised, and developed, and what that means for the organization’s ability to learn over time.

These are not new questions invented for AI. They are the same questions that have always lived under change and transformation work. The difference is that the pace of the technology has run ahead of the maturity of the institutional and societal response.

What We Are Not Measuring

We measure the wrong things, as we usually do. We measure adoption rates, prompts per user, time saved per task, and percentage of workflows automated. These are the easy measures. They are not the measures that will tell us whether the agent era is actually working.

The measures that matter, and that almost no one is publishing, are the rate at which agent or model outputs require correction, the time spent on correction relative to the time saved by automation, the rate at which agent-driven actions produce customer impact that requires recovery, the change in employee judgment and capability over time as agents handle more of the routine work, and the change in how quickly the organization learns. We measure activity with precision. We measure consequence and capability with almost nothing at all.
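
As a rough illustration that is not part of the episode itself, the first three measures named above could be computed from a simple per-task log. The sketch below assumes an organization records, for each agent-handled task, the estimated time saved, the human time spent correcting, and whether correction or customer recovery was needed. Every name (`AgentTask`, `readiness_metrics`, the field names) and every number is hypothetical, and the sketch does not attempt the harder measures of employee judgment or organizational learning.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """One agent-handled task. All field names are hypothetical."""
    minutes_saved: float       # estimated time saved versus doing the task manually
    minutes_correcting: float  # human time spent reviewing and correcting the output
    needed_correction: bool    # True if a human had to fix the output
    needed_recovery: bool      # True if the action caused customer impact needing recovery


def readiness_metrics(tasks: list[AgentTask]) -> dict[str, float]:
    """Summarize consequence-oriented measures over a batch of logged tasks."""
    if not tasks:
        raise ValueError("need at least one task")
    n = len(tasks)
    saved = sum(t.minutes_saved for t in tasks)
    correcting = sum(t.minutes_correcting for t in tasks)
    return {
        # Share of outputs that required human correction.
        "correction_rate": sum(t.needed_correction for t in tasks) / n,
        # Share of actions that produced customer impact requiring recovery.
        "recovery_rate": sum(t.needed_recovery for t in tasks) / n,
        # Time spent on correction relative to time saved by automation.
        "correction_to_savings_ratio": correcting / saved if saved else float("inf"),
        # What is left of the claimed savings once correction time is subtracted.
        "net_minutes_saved": saved - correcting,
    }


# Made-up sample data, for illustration only.
tasks = [
    AgentTask(30, 5, False, False),
    AgentTask(30, 25, True, False),
    AgentTask(30, 40, True, True),
    AgentTask(30, 10, True, False),
]
metrics = readiness_metrics(tasks)
```

With this made-up sample, a dashboard that tracks only time saved would report 120 minutes, while the net figure after correction is 40 minutes, and three of the four outputs needed a human fix.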

If you are responsible for an agent deployment in your organization, that is the measurement gap I want you to look at this week. What are you measuring? What are you not measuring? What would you have to change about your reporting before you could honestly say whether your deployment is working for your people and your customers, not just for your dashboard?

The Apprenticeship Problem

There is one more dimension I want to name because it has received almost no serious treatment in the agent era conversation to my knowledge. As agents take on more of the routine work, the pathways by which humans develop judgment in their fields begin to change.

Junior practitioners in marketing, sales, support, finance, and operations have historically built their professional instincts by doing the routine work, making small mistakes, receiving feedback, and gradually internalizing the patterns that distinguish good work from average work. If agents now perform that routine work, the apprenticeship that produced the next generation of seasoned practitioners is interrupted. Organizations that do not redesign their development paths around this shift will discover, five years on, that they have a layer of senior managers, a layer of agents, and very little capability in between.

That is a foreseeable consequence. Addressing it is a leadership decision, not a technology decision. The organizations that take it seriously now will have professional capacity in the next decade. The ones that do not will be in trouble in ways that will take them years to even recognize.

The Counterargument and a Fair Hearing

I want to give the counterargument a fair hearing because the discipline of this show is to take the other side seriously. The counterargument is that organizations that wait will be left behind. That AI adoption follows an experience curve. That the only way to learn is to do. That hesitation is itself a form of risk.

All of that is true to some degree. The two thousand plus apps in the HubSpot ecosystem and the opportunity IDC has sized are not imaginary. The cost of being late to a significant shift in infrastructure is real. It would be careless of me to dismiss the potential and, really, the strong call to action for adoption.

But the choice we actually face is not between aggressive adoption and standing still. The choice is between aggressive adoption with shallow readiness and deliberate adoption with deeper readiness. The first looks faster in the first quarter, and given our default toward short-term thinking, many are attempting to adapt and adopt quickly. The second looks faster across the first three years, because it doesn’t spend most of that period recovering from the consequences of the first. It spends most of that period learning, adapting, and finding the appropriate applications of the technology. The research on innovation, change, and transformation tells us this consistently. The pattern is not new. The technology is.

The Reframed Question

Here is the closing reframe I want to offer you today, and it is the question I am sitting with in my own work and would invite you to sit with in yours.

We have spent the last several years asking whether AI is ready for us. That is the wrong question, or at least it is only half of the question. The other half, the one that determines whether the agent era produces value or just activity, is whether we are ready for AI.

Are our organizations structured to absorb a class of tools that act, not just tools that suggest? Are our leaders prepared to redefine accountability around outputs they did not directly produce? Are our measurement systems honest enough to surface the cost of correction alongside the benefit of automation? Are our people invested in deeply enough that their judgment continues to develop rather than fade as agents take more of the routine work? Are we, as practitioners, willing to slow down enough to do the boring, expensive, unglamorous work of building the trust foundations that make autonomy safe?

The technology will keep moving. The vendors will keep announcing. The Readiness Illusion will keep tempting us to confuse the announcement with the arrival. The work, as it almost always is, will be to stay inside the difficulty, to refuse the easy story, and to build the human capacity that any serious technology eventually demands of the people who use it.

The agent era is real. We aren’t yet ready for it. Both of those statements can be true at the same time. Holding them both is the beginning of doing the work well.

Closing

If you found today’s episode valuable, subscribe to the Human Factor Podcast wherever you watch or listen to podcasts. Leave a rating and a comment, and share this episode with your leadership team. Subscribe to the Ideas and Innovations Newsletter at 2040digital.com for weekly frameworks and research on why change succeeds or fails. The companion piece for this episode is in this week’s newsletter under the title The Readiness Illusion. Connect with me on LinkedIn where I post regularly about the psychology of transformation.

Until then, remember, transformation does not fail because of technology, strategy, or market conditions. It fails because of people. And the more deeply you understand the human factor, the more likely your transformation is to succeed. I’m Kevin Novak. Thanks for watching or listening.

END OF EPISODE

Available Everywhere

The Human Factor Podcast is available on all major platforms: Apple Podcasts, Spotify, Google Music, Amazon Music, YouTube, Pandora, iHeartRadio, and via RSS feed, or wherever you get your podcasts.

New episodes every Thursday

Upcoming Episodes

Upcoming: Episode 027: Transformation IN Practice Series Episode 7 of the Series 

Season 2, Part 2 began May 1, 2026

 




© 2025 Kevin Novak. All rights reserved. Based on analysis of 100+ transformation projects • Proven methodology

Kevin Novak is the Founder & CEO of 2040 Digital, a professor of digital strategy and organizational transformation, and author of The Truth About Transformation. He is the creator of the Human Factor Method™, a framework that integrates psychology, identity, and behavior into how organizations navigate change. Kevin publishes the long-running Ideas & Innovations newsletter, hosts the Human Factor Podcast, and advises executives, associations, and global organizations on strategy, transformation, and the human dynamics that determine success or failure.