The Readiness Illusion – Why the AI Agent Era’s Loudest Claims Outrun the Evidence
Issue 265, May 21, 2026
I have been working with large language models long enough now to have formed a view that runs against the way most of the market is talking about this right now. I sat recently with a request that should have been straightforward. The model produced an output that looked confident and was wrong in ways that mattered. I corrected it. The next response was better and still wrong. I corrected it again. By the fourth round, I had given the model more context than I would have given a competent junior teammate, and the result was just acceptable rather than the quality I was after. This was not a failure of the model. It was a faithful reflection of what these systems can and cannot yet do, and of how much of the work the human is still doing while the marketing language describes us as supervisors rather than the ones doing the heavy lifting.
I share that experience not as a complaint but as a calibration. The conversation about AI agents has moved at a pace that the underlying reality has not matched, and the gap between the two is now wide enough that it deserves a name. I will call it The Readiness Illusion. The Readiness Illusion is the assumption, encouraged by vendor stories and amplified by professional anxiety, that being capable and being ready are the same thing. They are not. A system that can do something in a controlled demonstration is not the same as a system that should do something in a real environment, on real customer data, with real consequences. The distance between those two states is filled with trust, governance, judgment, error recovery, accountability, and the slow and expensive work of building the human and organizational capacity to know what to hand off, what to check, and what to keep close. However capable we believe ourselves to be, finding that balance is going to take time.
I want to be transparent about something before I generalize from my own use. My pattern of working with these tools is not representative. I have spent decades inside emerging technologies, innovation, reinvention, and change and transformation work. What I have learned about myself over these many years is that I have a habit of seeking out patterns, which helps me see what these systems do well and where they fall short. I am also more willing than the average user to push back, reframe, and insist on a better answer or outcome. That makes my experience, in some ways, an outlier toward the productive end of the curve. If my outcome after four rounds of correction is acceptable rather than the quality I was looking for, the typical user’s outcome after one round is something measurably less than that. The discipline I bring to the interaction is itself the variable that produces the value, and most users have not yet built that discipline. I want to keep that calibration in view throughout, because the gap between what skilled users currently produce with these tools and what typical users currently produce is itself part of the story. Any honest account of the technology has to hold both ends at the same time.
The HubSpot Signal
HubSpot’s recent post on building an open ecosystem for the agent era is, in my opinion, a serious and well-constructed piece of corporate strategy, and the strategic frame it offers is worth taking on its own terms. The vision the company describes, that agents should be able to run on HubSpot and that agents should be able to run HubSpot, captures the shift accurately. The remote Model Context Protocol, or MCP, server is now generally available, with read access to campaigns, landing pages, website pages, and blog posts, and write access to the core CRM objects. The Developer Terms were updated on May 4, 2026, to clarify that agent and MCP-based access falls under existing terms, to state explicitly that customer data belongs to the customer, and to add a restriction on using API-accessed data to train AI models. IDC has placed a $36 billion estimate on the agent-driven shift inside HubSpot’s ecosystem alone. None of that is hype. It is real forward-looking potential, and it matters.
What I want to do in this piece, and what I want to ask you to do as you read it, is hold two things in mind at the same time. The infrastructure for agent-driven work is being built quickly and credibly, faster than most people realize. The readiness of both the technology and the humans expected to govern it is not where the hype and the current conversation suggest. That second observation is not pessimism. By nature, I am an optimistic person. It is a claim grounded in evidence, and the research supports it more than the marketing does.
What the Research Actually Says
Consider the evidence. MIT’s NANDA initiative reported in 2025 that approximately 95 percent of generative AI pilots inside enterprises had not yet produced measurable returns at the time of the study. Gartner’s analysis through 2024 and 2025 forecast that at least 30 percent of generative AI projects would be abandoned after a proof of concept by the end of 2025, citing poor data quality, escalating costs, unclear business value, and inadequate risk controls. McKinsey’s State of AI reporting has consistently found that organizations adopting AI without redesigning their workflows around it capture only a small fraction of the value realized by those that do redesign them, and that workflow redesign is overwhelmingly a human factor problem rather than a technical one. Apple’s June 2025 paper on the limits of large reasoning models found that even systems explicitly designed for complex thought and reasoning collapse on problems above a certain level of complexity, and that they often sound more confident as their accuracy drops. Anthropic’s own published work on how to test what AI is actually doing, and on the tendency of models to agree with users or to make subtle errors with conviction, makes clear that the people building these systems do not consider the trust questions resolved.
This is not the picture of a technology that is ready to operate on its own on work that matters. It is the picture of a technology that is genuinely capable in narrow situations where the request is clear, the guidance is good, and there is human oversight, and that performs unevenly when those conditions are not in place. None of that should surprise anyone who has spent time using these tools rather than describing them. Yet the loudest voices of the moment are making a different claim. Some want to ride the wave. Some want to be the voice in the crowd, telling us where we are heading. Their motives are often more about achieving notoriety and being seen as an early adopter and thought leader than about helping the rest of us understand what is actually happening. The problem is that the way they are speaking suggests we are already there. We are not. The technology is not. The humans tasked with adopting it, governing it, and answering for its outputs and outcomes are not. The organizations and the leaders inside them are not either.
A Longer Pattern of Over-Promise
It is worth situating this gap inside a longer pattern, because the over-promise dynamic is not unique to AI. It is a chronic feature of how new capabilities enter a market. Organizations pursuing first-mover advantage, market footprint, and revenue momentum have always had an incentive to describe early capability as if it were mature performance. ERP implementations promised seamless enterprise integration and produced years of integration debt, custom development, and process workarounds before the value emerged, if it emerged at all. Mergers promised synergies and produced cultural collision and prolonged executive attention to internal politics. Digital transformation promised reinvention and produced a great deal of activity that did not change the underlying business models. Consumer products routinely promise outcomes they do not deliver and rely on aggregate marketing claims to outpace any one customer’s disappointment. The pattern is built into the way capability is sold, not an accident of any one product.
Markets reward the announcement, and the work of integration, accommodation, human absorption, and human-driven correction happens later, on the customer’s time and at the customer’s expense. The dismissal or under-consideration of human dynamics is not a flaw of any single market. It is the default condition of how capability is sold, and it has been the default for decades, perhaps centuries or even longer. Treating the AI version of this pattern as if it were uniquely a property of the technology lets the broader system off the hook. The technology is participating in a pattern that organizations and markets have run many times before. That observation does not diminish the AI-specific concerns. It places them inside a frame that leaders already recognize, which makes them easier to act on rather than easier to dismiss.
The hardest part of this is that the gap between promise and practice is not a fault to be assigned to anyone. It is what happens when a new technology arrives faster than society and the organizations meant to use it can adjust. The models can do remarkable things and still fabricate a citation. The user can write a careful prompt and still produce an output that requires four rounds of correction to become usable. The platform can declare that customer data belongs to the customer and still build an architecture in which the practical control of that data depends on configuration choices most customers do not yet know how to make. To return to the HubSpot example, a majority of the platform’s clients are small and mid-size businesses, and most of the people inside those businesses who use HubSpot are marketers. HubSpot’s new direction assumes that those core users are familiar with how AI agents work and can apply that familiarity to the platform. That is a meaningful assumption, and it is going to take real time and real investment for the average user to catch up to it.
None of these failures, perceptions, or omissions invalidates the underlying direction of where we and AI are heading. They simply mean that the time horizon for trustworthy autonomy is longer than the loudest voices want to admit.
Where Trust Actually Lives
I want to frame where I think the trust question actually lives, because the way the word is used in vendor materials does not match the way it functions inside organizations, or really across society today. Trust in this context is not primarily a question of whether the model is accurate enough. Accuracy is necessary, and it is not sufficient. Trust in an organizational context is built through three things that the model cannot supply on its own. The first is shared accountability, meaning a clear answer to the question of who is responsible when the agent acts and the action is wrong. The second is the ability to explain, after the fact, why the agent did what it did, in terms a human can audit and act on. We are already seeing this play out in the workplace, where users share work created by AI but cannot speak in any depth about the reasoning or the substance behind the output. The third is recovery, meaning the ability to undo, contain, or correct the consequences of a wrong action without producing damage that exceeds the value of the right ones. Most organizations preparing to deploy agents have given some thought to the first, almost no rigorous thought to the second, and very little operational planning to the third. That is the readiness gap. That is where the next two or more years of serious, thoughtful, and directional work need to happen.
There is also a more uncomfortable version of the trust question, and the available research data forces us to name it. We are not only asking whether to trust the agent. We are asking whether to trust ourselves to know when to trust the agent. That is something humans are, on average, not very good at. Behavioral research on automation bias, going back to the work of Mosier and Skitka in the 1990s and continuing through more recent studies in clinical, aviation, and now software settings, shows that humans tend to over-trust automated outputs, especially under time pressure, and to under-check the output even when they have been told to. Why? Because checking takes both energy and attention that most people don’t want to spend. The more competent the system appears, the worse the checking becomes. This is a well-documented pattern. It does not mean we should not deploy agents. It does mean that the design of the human work around the agent has to be a primary focus of safety and value, not an afterthought.
Not a Tooling Decision
This is where the Human Factor Method has something specific and useful to say about the moment. Most of the agent era conversation treats deployment as a tooling decision. It is not. It is a redesign of who decides what, who is accountable for what, how people see their own role, and how the organization remembers what it has learned. When an agent drafts the email, the question is not whether the email is good. The question is what role the human now plays in the act of communication, and how the organization knows whether that role is being performed well. When an agent updates a deal funnel stage in the CRM, the question is not whether the update is correct. The question is how the salesperson’s professional judgment is now being formed, exercised, and developed, and what that means for the organization’s ability to learn over time. These are not new questions invented for the AI era. They are the same questions that have always lived under the change and transformation work. The difference is that the pace of the technology has run ahead of the maturity of the institutional and societal response, both historically and right now. There is a real fear that stems from these questions remaining unaddressed, mostly around what a human’s role is in a professional identity sense. When we have not defined the relationship and the path, the fear of loss overrides any excitement about the change.
We Measure the Wrong Things
We measure the wrong things, as we often do. We measure adoption rates, prompts per user, time saved per task, and percentage of workflows automated. These are the easy measures, and they let an individual, a team, or an entire organization feel a sense of accomplishment. They are not the measures that will tell us whether the agent era is actually working for the people inside the organization or for the customers on the other side. The measures that matter, and that almost no one is publishing, are the rate at which agent or model outputs require correction, the time spent on correction relative to the time saved by automation, the rate at which agent-driven actions produce customer impact that requires recovery, the change in employee judgment and capability over time as agents handle more of the routine work, and the change in how quickly the organization learns. We measure activity with precision. We measure consequence and capability with almost nothing at all. The gap between those two is exactly where the value of AI adoption and human adaptation is either created or quietly destroyed.
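To make the first three of those consequence measures concrete, here is a minimal sketch of how they might be computed, assuming an organization logs each agent-handled task with an estimate of time saved, the human time spent correcting it, and whether it triggered a customer recovery. The AgentTask record, its field names, and the sample numbers are hypothetical and purely illustrative. They are not drawn from any real deployment or from any vendor's reporting.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """One agent-handled task as a reviewer might log it (hypothetical schema)."""
    minutes_saved: float       # estimated time the automation saved
    minutes_correcting: float  # human time spent fixing the output (0 if none)
    needed_recovery: bool      # True if the action caused customer impact needing recovery


def consequence_measures(tasks: list[AgentTask]) -> dict[str, float]:
    """Summarize consequence, not activity, for a batch of logged tasks."""
    if not tasks:
        raise ValueError("need at least one logged task")
    total = len(tasks)
    corrected = sum(1 for t in tasks if t.minutes_correcting > 0)
    recovered = sum(1 for t in tasks if t.needed_recovery)
    saved = sum(t.minutes_saved for t in tasks)
    correcting = sum(t.minutes_correcting for t in tasks)
    return {
        # share of outputs that needed any human correction
        "correction_rate": corrected / total,
        # share of actions that required customer-facing recovery
        "recovery_rate": recovered / total,
        # correction time relative to time saved (above 1.0 means a net loss)
        "correction_to_savings_ratio": correcting / saved if saved else float("inf"),
        "net_minutes_saved": saved - correcting,
    }


# Illustrative values only, not real data.
sample = [
    AgentTask(minutes_saved=20, minutes_correcting=0, needed_recovery=False),
    AgentTask(minutes_saved=20, minutes_correcting=30, needed_recovery=False),
    AgentTask(minutes_saved=20, minutes_correcting=10, needed_recovery=True),
    AgentTask(minutes_saved=20, minutes_correcting=0, needed_recovery=False),
]
measures = consequence_measures(sample)
print(measures)
```

In this made-up sample, a dashboard that reported only time saved would show 80 minutes of benefit. Counting correction time cuts that in half, and one task in four still ended in a customer recovery. The remaining two measures, the change in employee judgment and in how quickly the organization learns, do not reduce to a log field and would need a different kind of assessment.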
I want to be careful here, because the failure mode of this kind of article is to move from a clear-eyed assessment into reflexive caution. That is not the argument I am making. The argument is that the appropriate response to a serious and credible technology shift is serious and credible adoption, and that serious adoption looks different from the current dominant narrative. It looks like making deliberate choices about where and how to use the technology. It looks like investing in the human capability to supervise rather than assuming supervision will emerge from policy memos. It looks like measurement systems that distinguish between activity and consequence. It looks like governance that names how the organization will recover before the agent is deployed, not after the first incident. It looks like resisting the suggestion, embedded in many vendor narratives, that not adopting at the maximum available pace is a failure of nerve and guts. It is not. It is a failure of nerve and guts to confuse motion with progress, which we so often do, particularly in an organizational setting. It is wisdom to insist that capability and readiness be evaluated separately and that an organization commit only to what it can supervise.
Aggressive Adoption Versus Deliberate Adoption
There is a reasonable counterargument, and I want to give it its due because it is an important one. The counterargument is that organizations that wait will be left behind, that AI adoption follows an experience curve, that the only way to learn is to do, and that hesitation is itself a form of risk. All of that is true to some degree. The 2,000-plus apps in the HubSpot ecosystem and the IDC-sized opportunity are not imaginary. The cost of being late to a significant shift in infrastructure is real. But the choice is not between aggressive adoption and standing still. It is between aggressive adoption with shallow readiness and deliberate adoption with deeper readiness. The first looks faster in the first quarter, and given our default toward short-term thinking, many are attempting to adapt and adopt quickly. The second looks faster across the first three years, because it does not spend most of that period recovering from the consequences of the first. It spends most of that period learning, adapting, and finding the appropriate applications of the technology. The research on innovation, change, and transformation tells us this consistently. The pattern is not new. The technology is.
The Apprenticeship Question
There is one more dimension worth naming, because it sits directly inside the human factor and has received almost no serious treatment in the agent era conversation. As agents take on more of the routine work, the pathways by which humans develop judgment in their fields begin to change. Junior practitioners in marketing, sales, support, finance, and operations have historically built their professional instincts by doing the routine work, making small mistakes, receiving feedback, and gradually internalizing the patterns that distinguish good work from average work. If agents now perform that routine work, the apprenticeship that produced the next generation of seasoned practitioners is interrupted. Organizations that do not redesign their development paths around this shift will discover, five years on, that they have a layer of senior managers, a layer of agents, and very little capability in between. That is a foreseeable consequence, and addressing it is a leadership decision, not a technology decision.
Are We Ready for AI
The closing reframe I want to offer is simple, and it is the question I am sitting with in my own work and would invite you to sit with in yours. We have spent the last several years asking whether AI is ready for us. That is the wrong question, or at least it is only half of the question. The other half, the one that determines whether the agent era produces value or just activity, is whether we are ready for AI. Are our organizations structured to absorb a class of tools that act, not just tools that suggest? Are our leaders prepared to redefine accountability around outputs they did not directly produce? Are our measurement systems honest enough to surface the cost of correction alongside the benefit of automation? Are our people invested in deeply enough that their judgment continues to develop rather than fade as agents take more of the routine work? Are we, as practitioners, willing to slow down enough to do the boring, expensive, unglamorous work of building the trust foundations that make autonomy safe?
The technology will keep moving. The vendors will keep announcing. The Readiness Illusion will keep tempting us to confuse the announcement with the arrival. The work, as it almost always is, will be to stay inside the difficulty, to refuse the easy story, and to build the human capacity that any serious technology eventually demands of the people who use it. The agent era is real. We are not yet ready for it. Both of those statements can be true at the same time. Holding them both is the beginning of doing the work well.
Sources and References
HubSpot, “Our Vision for Building an Open Ecosystem for the Agent Era,” 2026.
HubSpot Developers, “Our Ecosystem Vision for the Agent Era and Updated Developer Terms,” May 4, 2026.
HubSpot, “The $30B Opportunity: How AI and Unified Data Will Define the Next Era of HubSpot’s Ecosystem,” 2026.
HubSpot and IDC, “Agentic Shift Reshapes $36B HubSpot Ecosystem Opportunity,” 2026.
MIT NANDA initiative, study reporting that approximately 95 percent of generative AI pilots in enterprises had not produced measurable returns, 2025.
Gartner, forecasts on generative AI pilot abandonment and agentic AI deployment trajectory, 2024 to 2025.
McKinsey and Company, State of AI reports, 2024 and 2025, on workflow redesign as a precondition for AI value capture.
Apple Machine Learning Research, “The Illusion of Thinking” examining limits of large reasoning models, June 2025.
Anthropic, public research on automated alignment auditing, sycophancy, and the limits of model trustworthiness.
Mosier, K. L. and Skitka, L. J., research on automation bias, originating in the 1990s and continuing through subsequent literature in aviation, clinical, and software contexts.
Kevin Novak
Kevin Novak is the Founder & CEO of 2040 Digital, a professor of digital strategy and organizational transformation, and author of The Truth About Transformation. He is the creator of the Human Factor Method™, a framework that integrates psychology, identity, and behavior into how organizations navigate change. Kevin publishes the long-running Ideas & Innovations newsletter, hosts the Human Factor Podcast, and advises executives, associations, and global organizations on strategy, transformation, and the human dynamics that determine success or failure.
