Why most company AI projects stall

Companies have spent tens of billions on enterprise AI, and most of it has returned nothing. The reason is not the models. It is that companies bolt agents onto an operating layer built for humans, with no shared memory, no house voice, no inbox, and no rules. The operating substrate is what separates the AI projects that work from the ninety-five percent that stall.

The bottleneck.

Companies had spent somewhere between thirty and forty billion dollars on enterprise generative AI by 2025, and a widely cited MIT study that year found that ninety-five percent of those initiatives returned nothing measurable. The other surveys rhyme with it. IDC found that eighty-eight percent of AI proofs of concept never reach production: only four of every thirty-three, about twelve percent, make the cut. Gartner predicts that more than forty percent of agentic-AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The numbers are large enough that the reflex is to blame the technology. The models are not the problem. They are the most capable they have ever been, and they keep getting better on a schedule.

The problem is that companies are bolting agents onto an operating layer that was built for humans, not for agents. A demo works because the person running it supplies all the missing context by hand: who the company is, what the project is, where the file goes, what the right tone is, what the agent is allowed to touch. Production is the same task without that person standing there. The agent that looked brilliant in the demo is, in production, a stranger asked to do skilled work with no memory, no house style, no inbox, and no rules. It stalls, and the pilot gets quietly shelved.

The fix is not a better model or a bigger pilot budget. It is an operating substrate: the layer underneath operations that agents read from and write to. The companies in the surviving five percent have one, whether they call it that or not. The ones in the ninety-five percent are trying to run agents on top of nothing.

The failure is structural, not technological.

Read the post-mortems and the same shape appears. The MIT work attributes most of the failure to a gap between the tool and the workflow: the successful deployments integrate into how the company actually operates, and the failed ones sit beside it, demanding that humans feed and babysit them. A 2025 industry report on agent pilots put it more bluntly: the pilots fail because they lack an operating system to manage memory, input and output, and permissions. An analysis the same year made the data-side version of the argument: agentic AI breaks when the data it reads was built for humans and carries no machine-legible context, no record of when something was true, and no statement of what the agent is allowed to do with it.
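The data-side argument is easier to see with a concrete record in hand. What follows is a minimal sketch, not anything taken from the reports cited: the `ContextRecord` type, its field names, the `usable_by` helper, and the sample billing rate are all invented for illustration. It shows one fact carrying the three things human-built data tends to lack: what it is about, when it was true, and what an agent may do with it.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ContextRecord:
    """A fact plus the context an agent needs to use it safely (hypothetical shape)."""
    subject: str                      # machine-legible context: what the fact is about
    value: str                        # the fact itself
    valid_from: date                  # when the fact became true
    valid_until: Optional[date]       # None means "still true as far as we know"
    allowed_actions: FrozenSet[str]   # what an agent may do with this record

    def is_current(self, on: date) -> bool:
        """True if the fact holds on the given day (both bounds inclusive)."""
        if on < self.valid_from:
            return False
        return self.valid_until is None or on <= self.valid_until

    def usable_by(self, action: str, on: date) -> bool:
        """An agent may act only on a current fact, and only in a permitted way."""
        return self.is_current(on) and action in self.allowed_actions


# Made-up example data: a billing rate that expires at the end of 2025.
rate = ContextRecord(
    subject="client:example/billing-rate",
    value="185 USD/hour",
    valid_from=date(2025, 1, 1),
    valid_until=date(2025, 12, 31),
    allowed_actions=frozenset({"read", "quote"}),
)
```

With a record like this, an agent asked to quote the rate in June 2025 can proceed, while the same request in January 2026, or a request to send an invoice at any time, is refused by the data itself rather than by a person watching over the demo.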

None of this is a model-capability problem. It is an infrastructure problem wearing a model's clothes. The agent can reason; it has nothing reliable to reason over and nowhere disciplined to put the result. Every one of those failures traces back to a missing layer that a human was silently standing in for during the demo.

What a substrate actually is.

Operations is the layer most people picture when they think about running a company: finance, customer success, hiring, the workflows a team executes. The substrate is the layer underneath, the one that makes those workflows legible to a machine. It is the company's memory, voice, accounting, and rules expressed in a form an agent can read on its own, without a person in the loop translating.

A company that builds operations on top of an absent or accidental substrate builds them twice. Once for the version that lived in the founders' heads and got the demo working, and again for the version the agents and the next ten people can actually use. A company that installs the substrate first builds them once. The substrate is not glamorous and it does not demo well, which is exactly why most companies skip it and end up in the ninety-five percent. There are five layers, and the leverage is in installing them as one composable system rather than as five disconnected initiatives.

The five layers.

Shared memory. Every agent conversation begins with context: who the company is, what it is working on, what the recent decisions were, who the clients are. Without a memory layer the operator re-establishes that context at the top of every session, and by the third month the time spent re-explaining the company exceeds the time the agent saves, so the agent quietly falls out of use. The fix is a written, layered context the agent reads automatically: one description at the company level, one per active project, one per local concern. This is the layer the new interoperability standards are built to serve. Anthropic's Model Context Protocol, an open standard for connecting agents to a company's data and tools, has moved from one vendor's idea to shared infrastructure. It has been donated to the Linux Foundation's Agentic AI Foundation, and Google's agent-to-agent protocol (A2A) has emerged alongside it.
Voice and output discipline. An ungoverned model drafts in marketing English: validation lines, hedges, the house style of nobody. For internal notes that is harmless; for client correspondence it quietly erodes trust, one off-key email at a time. The fix is not a style guide that nobody reads. It is a guardrail in the path of the output that refuses to let off-voice text through, checked at the moment the draft is produced rather than caught later in review. The discipline has to live in the system, not in a document, because the system is what the agent actually consults.
Capture. The real record of a company lives in email, chat, and calls. If none of that lands in the substrate in a structured form, the agent is blind to what actually happened and can only act on what someone remembers to paste in. The fix is a capture pipeline that routes inbound communication to the right place automatically, tags who it is from and what it concerns, and files the attachments, so the substrate reflects reality instead of a partial transcript of it.
Time and money attribution. Agents cost money per task and increasingly do work a company bills for. Untracked, both numbers are guesses: the operator cannot price the work, defend an invoice, or know which client is consuming the AI budget. The fix is attribution that runs continuously, tying agent cost and agent work to the client and the project as it happens, not reconstructed from memory at the end of the month.
Skills and governance. Agent capabilities arrive as a stream of discrete tools, and an operator who chases each one learns three and forgets five. Worse, an agent with no boundaries reaches into systems it should never touch. The fix is two-sided: an institutional baseline of capabilities the whole company inherits, and hard credential boundaries that define what each agent can and cannot reach. Governance is not bureaucracy here. It is the difference between an agent that is useful and an agent that is a liability the first time it touches production with permissions it was never meant to have.
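The five layers are easier to weigh as one composable system when they sit side by side in code. The sketch below is a deliberately small illustration under stated assumptions, not the firm's implementation or any standard's API: every name in it (`build_context`, `off_voice_phrases`, `file_message`, `Ledger`, `AgentScope`, `run_task`) is hypothetical, the phrase list is a placeholder, and `draft_fn` stands in for whatever model call a company actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

# Hypothetical phrase list; a real guard would encode the company's own voice rules.
OFF_VOICE_PHRASES = ("great question", "i hope this finds you well")


def build_context(company: str, project: str, local: str) -> str:
    """Shared memory: cascade company, project, and local notes, skipping empty layers."""
    return "\n\n".join(layer for layer in (company, project, local) if layer)


def off_voice_phrases(draft: str) -> List[str]:
    """Voice: return every listed phrase found in the draft (case-insensitive)."""
    lowered = draft.lower()
    return [phrase for phrase in OFF_VOICE_PHRASES if phrase in lowered]


def file_message(subject: str, project_keywords: Dict[str, str]) -> str:
    """Capture: route a message to the first project whose keyword is in the subject."""
    lowered = subject.lower()
    for keyword, project in project_keywords.items():
        if keyword in lowered:
            return project
    return "unfiled"


@dataclass
class Ledger:
    """Attribution: running agent cost per client, in integer cents."""
    cents_by_client: Dict[str, int] = field(default_factory=dict)

    def record(self, client: str, cents: int) -> None:
        self.cents_by_client[client] = self.cents_by_client.get(client, 0) + cents


@dataclass(frozen=True)
class AgentScope:
    """Governance: the systems one agent is allowed to reach."""
    name: str
    allowed_systems: FrozenSet[str]

    def require(self, system: str) -> None:
        if system not in self.allowed_systems:
            raise PermissionError(f"{self.name} may not touch {system}")


def run_task(scope: AgentScope, system: str, ledger: Ledger, client: str,
             context: str, draft_fn: Callable[[str], str], cents: int) -> str:
    """Compose the layers around one agent task; draft_fn stands in for the model call."""
    scope.require(system)            # refuse before any work is done
    draft = draft_fn(context)        # the agent reads the cascaded context
    ledger.record(client, cents)     # cost is attributed even if the draft is rejected
    found = off_voice_phrases(draft)
    if found:
        raise ValueError(f"off-voice phrases: {found}")
    return draft
```

The design choice worth noticing is the order inside `run_task`: the permission check runs before any work, the cost is recorded as the work happens, and the voice check sits in the output path, so an off-voice draft raises an error instead of reaching a client. Each piece is trivial alone; the leverage comes from every task passing through all of them.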

Why now.

The substrate used to be a bespoke build, which is part of why so few companies bothered. That is changing. The interoperability layer is consolidating into open standards: the Model Context Protocol for connecting agents to data and tools, agent-to-agent protocols for letting them coordinate, and a neutral foundation now stewarding both. The practical consequence for an operating company is that the memory and tooling layers no longer have to be invented from scratch; they can be installed against standards the major model vendors are converging on.

The window is the same one every infrastructure shift opens. The companies that install the substrate while the standards are consolidating compound on it for years. The companies that wait for the dust to settle spend that time in the ninety-five percent, running pilots that stall for reasons they keep misattributing to the model.

How Rarefied Earth thinks about this work.

The firm builds the operating substrate for companies that run on AI, and it runs that substrate on itself first. The five layers above are not a slide; they are the company's own operating floor, debugged in production before any of it is deployed for a client. The memory cascade, the voice guard, the capture pipeline, the time and cost attribution, the credential boundaries: each one earns its place by being used daily, which is the only test that separates infrastructure that works from infrastructure that demos.

The engineering credentials behind the firm (structural engineering, an MSCE, a peer-reviewed publication, FDOT bridge work) matter most in heavy civil and construction, the vertical where a buyer needs to trust that the person building their systems understands physical infrastructure and field operations. The substrate itself is vertical-neutral. The thing a law firm, a contractor, and a newly formed startup have in common is that an agent without memory, voice, capture, accounting, and rules will stall in all three, and the fix is the same in all three.

The posture is the one that runs through everything the firm publishes: structure the work around the questions an operator actually asks, not around a vendor demo. The model is not the hard part anymore. The substrate is. The companies that figure that out are the five percent.

Sources and further reading.

Public references

The GenAI Divide: State of AI in Business 2025 (MIT) · Source of the 95%-zero-return finding across enterprise GenAI initiatives, reported widely in August 2025. Fortune coverage · report deck
IDC (with Lenovo) on pilots reaching production · The finding that 88% of AI proofs of concept never reach production (only four of every thirty-three do), attributed by IDC Group VP Ashish Nadkarni to low organizational readiness in data and process. CIO coverage of the IDC research
Gartner: agentic-AI cancellations · Press release, 25 June 2025, projecting that over 40% of agentic-AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. Gartner newsroom
Model Context Protocol (Anthropic) · The open standard for connecting AI systems to data and tools. Introduction · donation to the Linux Foundation Agentic AI Foundation
Open standards for agentic AI · Coverage of the move toward shared protocols (MCP, Google's A2A) for agent interoperability. CIO Dive

Related work.

This is the broad version of the argument the firm makes in its more vertical pieces. The field guide on AI takeoffs for general contractors and the guide to the free Florida bid channels that beat ConstructConnect are the same posture applied to construction, the vertical where the credentials carry the most weight.

Discussion

Disagree, or running into this at your company? Reply by email: joseph.scott@rarefied.earth.