From Copilots to Autonomous Companies: The Shift to AI-Native Operations

Why the bottleneck to enterprise AI is no longer the model. It is the context.

The quiet end of the copilot era

In 2023, “AI in the enterprise” meant a person typing into a sidebar. GitHub Copilot suggested code, Microsoft 365 Copilot summarized meetings, Salesforce Einstein drafted replies, and every SaaS vendor stapled a chat icon to their product. Microsoft now reports that its Copilot family supports tens of millions of active users across GitHub Copilot, Microsoft 365 Copilot, and consumer Copilot, and is consolidating them into a single “super app” by the end of summer 2026. Copilots became table stakes faster than any enterprise technology in recent memory.

And yet, the boards funding all of this are starting to ask the same question. Where is the money?

The data is not subtle. McKinsey’s November 2025 State of AI survey found that 88 percent of organizations now use AI in at least one function, but only 39 percent report any enterprise-wide EBIT impact. MIT’s NANDA initiative, in its “GenAI Divide: State of AI in Business 2025” study, found that roughly 95 percent of enterprise generative AI pilots produced no measurable P&L lift. RAND Corporation’s 2024 report “The Root Causes of Failure for Artificial Intelligence Projects” concluded that by some estimates more than 80 percent of AI projects fail, twice the rate of failure for IT projects that do not involve AI. Gartner predicts that more than 40 percent of agentic AI projects will be canceled by end of 2027 because of escalating costs, unclear value, or inadequate risk controls.

[Figure: The enterprise AI reality. 88% use AI in at least one function; 39% report any EBIT impact; 95% of GenAI pilots show no P&L lift; more than 80% of AI projects fail; more than 40% of agentic work cut by 2027. Sources: McKinsey 2025, MIT NANDA, RAND 2024, Gartner 2025.]
Fig 1 - The spend is real; the return is not. The gap is a context problem, not a model problem.

This is not a story about bad models. Frontier model capability has, if anything, raced ahead of enterprise readiness. As Tom Blomfield, the Monzo founder and now Y Combinator general partner, put it in YC’s Summer 2026 Request for Startups: the biggest blocker to AI automation of companies is no longer the models, they just got so good so quickly. Now the blocker is the domain knowledge.

That single sentence is the inflection point. Copilots assist humans inside a single application. The next era, AI-native operations, will have agents executing business processes end to end, across data and systems, in production, without a human keystroke per step. The companies that win that era will be what YC and Sequoia have started calling autonomous companies or AI-native services companies: organizations where the unit of work is shipped by an agent, governed by humans, and measured in outcomes rather than seats.

The bottleneck is not the model. The bottleneck is the context. And the missing infrastructure is what we at Colrows call the Autonomous Semantic Layer.

Copilots were step one. They are not the destination.

Copilots solved a real problem. They lowered the activation energy for AI inside the enterprise by sitting next to a human in a tool that human already used. That gave organizations a safe sandbox. It also exposed the ceiling. Three structural limits have become impossible to ignore.

| | Copilot era | AI-native ops |
|---|---|---|
| Unit of work | a draft or query | an executed process |
| Who acts | human, AI assists | agent, human governs |
| Scope | one application | across systems |
| Errors | fuzziness tolerated | must be exact |
| Measured in | seats | outcomes |
Fig 2 - The jump from copilots to AI-native operations is a jump from tolerable fuzziness to required precision.

They are personal, not operational. A copilot drafts a paragraph or a query. It does not run a forecast cycle, reconcile a ledger, route a procurement decision, or close a quarter. McKinsey’s 2025 survey shows that most of the value from copilots accrues at the use-case level, not the enterprise level. That gap is not a transition phase; it is a structural property of the human-in-the-loop pattern.

They retrieve, they do not execute. As Blomfield put it, a Company Brain is not a company-wide search or a chatbot over documents. It is a living map of how a company works. The distinction between retrieval and execution is the technical moat for the next wave. Retrieval can tolerate fuzziness. Execution cannot.

They hallucinate at the boundary that matters most: numbers. Retrieval-augmented generation has not solved this. Stanford RegLab’s 2025 study in the Journal of Empirical Legal Studies found that production legal RAG systems still hallucinated on 17 to 33 percent of queries. A separate 2025 JMIR Cancer study at Japan’s National Cancer Center found that GPT-3.5-based RAG chatbots drawing on general search produced a 35 percent hallucination rate on questions not covered by the curated knowledge base. On the structured side, the Spider 2.0 benchmark presented at ICLR 2025 showed GPT-4o solving only 10.1 percent of real enterprise text-to-SQL tasks compared to 86.6 percent on the academic Spider 1.0. Drop a frontier model into a real warehouse with thousands of columns and scattered tribal knowledge, and it gets the answer right less than one time in five.

For a copilot helping a human draft an email, this is acceptable. For an agent executing a refund, recognizing revenue, or pricing an insurance policy, it is disqualifying.

The shift: AI-native operations and the autonomous company

Across the most opinionated investors in the market, the framing is converging.

Sequoia’s Julien Bek, in his March 2026 essay “Services: The New Software,” argued that the next wave is not copilots that help professionals work, it is autopilots that sell the work itself. His central statistic: for every dollar spent on software, six are spent on services. He maps the playbook from Crosby in legal NDAs to Rillet in accounting to WithCoverage in insurance brokerage, where AI-native services companies are collapsing that ratio.

a16z’s Jason Cui and Jennifer Li, in their March 2026 piece “Your Data Agents Need Context,” put the diagnosis in one line: over the past year, the market has realized that data and analytics agents are essentially useless without the right context. They cannot tease apart vague questions, decipher business definitions, or reason across disparate data effectively.

And YC, in its Summer 2026 RFS, called the missing primitive by a name founders will recognize for years: the Company Brain. Blomfield describes it as a new primitive that pulls knowledge out of fragmented sources, structures it, keeps it current, and turns it into an executable skills file for AI. The company brain, he writes, becomes the missing layer between raw company data and reliable AI automation, and every company in the world is going to need one.

YC is asking founders to build it. Gartner is forecasting demand for it: 40 percent of enterprise applications integrated with task-specific AI agents by end of 2026, up from less than 5 percent in 2025, and at least 15 percent of day-to-day work decisions made autonomously by 2028. The category is real. The question is what it must actually do to function.

This is what we mean by AI-native operations: an operating model designed on the assumption that AI agents participate in work alongside humans, applications, and data, with structured semantics rather than vibes as the substrate.

Why context, not model capability, is the real bottleneck

Three patterns explain why every enterprise data leader feels stuck right now.

Context fragmentation

A typical Fortune 500 runs a few dozen systems of record (Salesforce, Workday, ServiceNow, SAP, NetSuite, Snowflake, Databricks, plus a long tail of vertical SaaS), a few hundred dashboards, and a tribal layer of definitions that live in Confluence pages, dbt YAML files written by someone who left in 2023, Slack threads, and the heads of three analysts. Salesforce’s 2024 Connectivity Benchmark found 72 percent of IT leaders describe their infrastructure as overly interdependent, and 80 percent say data silos hinder digital transformation.

For a human, this is annoying. For an autonomous agent, this is fatal. There is no fallback. If an agent does not know whether “revenue” is run-rate ARR, GAAP-recognized revenue, or billings net of refunds, it does not pick the safe interpretation. It picks one, and it picks confidently. We call this condition context fragmentation, and it is the single biggest reason enterprise agents fail in production.

[Figure: An AI agent asks “What is revenue?” Sales/CRM answers run-rate ARR, Finance answers GAAP-recognized revenue, and Billing answers billings net of refunds. The agent picks one, confidently. There is no safe default.]
Fig 3 - When “revenue” means three different things, an agent does not hedge. It commits to one interpretation and acts on it.
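To make the ambiguity concrete, here is a toy sketch with made-up numbers. The field names, the three accounts, and the mapping of definitions to owning teams are all hypothetical; the only point is that the same records produce three different figures for "revenue" depending on which definition is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # All fields and values are hypothetical, for illustration only.
    monthly_recurring: float  # current monthly contract value
    recognized_ytd: float     # revenue recognized so far under GAAP
    billed: float             # invoices issued
    refunded: float           # refunds issued


ACCOUNTS = [
    Account(1_000.0, 7_000.0, 12_000.0, 500.0),
    Account(2_500.0, 20_000.0, 30_000.0, 0.0),
    Account(400.0, 4_800.0, 4_800.0, 1_200.0),
]


def run_rate_arr(accounts):
    # Sales view: annualize the current monthly run rate.
    return sum(a.monthly_recurring * 12 for a in accounts)


def gaap_recognized(accounts):
    # Finance view: only what has been recognized.
    return sum(a.recognized_ytd for a in accounts)


def billings_net_of_refunds(accounts):
    # Billing view: invoices issued minus refunds.
    return sum(a.billed - a.refunded for a in accounts)


REVENUE_DEFINITIONS = {
    "sales_crm": run_rate_arr,
    "finance": gaap_recognized,
    "billing": billings_net_of_refunds,
}


def answers(accounts):
    """Evaluate every competing definition of 'revenue' on the same data."""
    return {owner: fn(accounts) for owner, fn in REVENUE_DEFINITIONS.items()}


if __name__ == "__main__":
    for owner, value in answers(ACCOUNTS).items():
        print(f"{owner}: {value:,.2f}")
```

An agent that is handed only the word "revenue" has no principled way to choose among these three functions, which is the failure the figure describes.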

Semantic gravity

Data has gravity. There is a parallel force we call semantic gravity: business meaning accumulates around the systems where decisions are repeatedly made, and once meaning lives in one place, every other tool reaches for it. In the dashboard era, semantic gravity sat inside Looker, Tableau, or a stack of dbt models. In the agent era, every agent, MCP client, vertical app, and downstream automation needs to reach for that same authoritative meaning, in real time, at machine speed.

If meaning does not live in a deterministic, shared, governed layer, every consumer of data invents its own version. That is exactly what the Open Semantic Interchange initiative was created to address.

The model is not the moat. The semantics are.

Anthropic’s Model Context Protocol, launched November 2024 and donated to the Linux Foundation under the newly formed Agentic AI Foundation in December 2025, solved the plumbing problem. It standardized how an agent connects to a tool or data source: a universal, open standard that replaces fragmented integrations with a single protocol. OpenAI, Google, Microsoft, and AWS adopted it within a year.

But MCP is the USB-C, not the disk. It does not tell the agent what “active customer” means. It does not tell the agent which join path is valid. It does not tell the agent that the marketing CDP defines a session differently than the product analytics warehouse. The wire is now solved. The payload, the meaning that travels over the wire, is not.
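A small sketch of what actually travels over that wire may help. MCP tool invocations are JSON-RPC 2.0 messages with the method `tools/call`; the tool name `query_metric` and its arguments below are hypothetical, since each server defines its own tools.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request, which is a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = build_tool_call(
    1,
    "query_metric",
    {"metric": "active_customers", "period": "2026-Q1"},
)

# This is everything the protocol carries for the call: a tool name and
# some arguments. Nothing here says what "active_customers" means, which
# tables it joins, or at what grain it is counted.
wire_bytes = json.dumps(request).encode("utf-8")
```

The message is well-formed and portable, yet the string "active_customers" is only a label. Its meaning has to be supplied by whatever sits behind the tool.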

[Figure: MCP standardizes the connection, not the meaning. Between the AI agent (MCP client) and enterprise data and systems, the wire (MCP, the USB-C) is solved; the payload (meaning) is not: what “active customer” means, which join is valid, which grain.]
Fig 4 - MCP solved the wire. The meaning that travels over the wire is still up for grabs - which is where the semantic layer lives.

The Colrows thesis

This is why we say at Colrows: fix the context, not the model.

Why existing approaches are structurally behind

Almost every category that pitches itself as the answer to enterprise AI agents was designed before agents existed. That matters.

Hyperscaler data platforms (Snowflake, Databricks, Microsoft Fabric). All three are racing to add semantic layers on top of their warehouses. Snowflake launched Semantic Views and Cortex Analyst. Databricks added Unity Catalog Metric Views and Genie. The bet is data gravity. The limit is that each remains a single-warehouse view of the world, and enterprises do not run on a single warehouse. A semantic layer locked to one engine cannot be the cross-enterprise brain.

BI semantic layers (Looker/LookML, dbt Semantic Layer with MetricFlow, AtScale, Cube). These were built to serve dashboards. They define metrics in YAML and govern the retrieval step of a BI query. As a16z lays out, they are usually hand-constructed by data teams using very specific syntax and wired to a single BI tool. They cover specific metric definitions, but not canonical entities, identity resolution, join-path proofs, grain validation, or the live evolution of all of the above. As Tellius put it in 2025, a traditional semantic layer governs roughly 20 percent of what an agent actually needs.

Metadata catalogs (Collibra, Alation, Atlan, Informatica). Catalogs are valuable. They document. They do not execute. Telling an agent “here is a glossary” is not the same as giving it a deterministic compiler from intent to query. Catalogs sit beside the query path, not on it.

RAG systems. RAG remains the default for unstructured Q&A, but it is a probabilistic retrieval pattern. Stanford RegLab put production legal RAG hallucination rates at 17 to 33 percent. For numbers and operational decisions, this is the wrong primitive.

Text-to-SQL. On Spider 2.0, the enterprise-grade benchmark at ICLR 2025, GPT-4o scored 10.1 percent and o1-preview 17.1 percent, against 86.6 percent on the academic Spider 1.0. Text-to-SQL collapses the moment you point it at a 3,000-column real warehouse. The right answer is not better SQL generation. It is removing the need for the model to write SQL at all.

Knowledge graphs (Palantir, Neo4j-based stacks, Stardog). Powerful where they fit. Expensive to model, brittle under schema drift, operationally heavy. Most enterprises that tried to build one in-house never finished.

Agent orchestration frameworks (LangChain, LangGraph, AutoGen, CrewAI). These are control planes. They schedule tool calls. They do not own the meaning of the data flowing through those calls. An orchestrator over an ungoverned semantic substrate is a faster way to be confidently wrong.

Vertical AI copilots inside SaaS (Salesforce Agentforce, ServiceNow Now Assist). Each is excellent inside its own system. None can be the cross-system brain. Salesforce cannot govern semantics inside Workday. ServiceNow cannot reconcile a marketing definition of churn with a finance one.

The pattern is consistent. Every incumbent has shipped a partial answer that is structurally bound to the layer it already owned. None of them was architected for the actual workload: deterministic, cross-platform, reproducible, agent-first semantics.

What a true Autonomous Semantic Layer requires

This is the architectural argument. An Autonomous Semantic Layer is not a metrics dictionary. It is the deterministic execution layer between agents and enterprise data, a shift we trace in From Metric Stores to Knowledge Machines. It is what makes an AI-native operation safe to put in production. Eight properties matter.

  1. A deterministic semantic compiler. Same question in, same query out, same answer back, every time. Not LLM-written SQL, but a semantic execution layer that compiles a business intent into a verified, executable plan. Probabilistic generation is fine for content. It is not acceptable for the numbers a CFO signs.
  2. Join-path proof. Before a query runs, the layer proves the join path is semantically valid: entities share a defined relationship at a defined grain, and no fan-out silently doubles a metric. This is the most common cause of results that are wrong by 2x, and it is invisible to text-to-SQL.
  3. Grain validation. A metric defined at order-line grain cannot be aggregated at customer grain without an explicit, governed rollup. Grain validation enforces this at compile time, not at “the dashboard looks weird” time.
  4. Point-in-time reproducibility. The agent that ran the revenue query yesterday must reproduce yesterday’s answer today, even if a definition changed overnight. Without it, audit trails are theater.
  5. Semantic drift detection. Definitions, source schemas, and business meaning change. The layer detects drift continuously and surfaces it before an agent acts on a stale definition. This is closer to application performance monitoring (APM) for semantics than to a static catalog.
  6. Persona scope. A field rep, a finance controller, and an external partner ask the same question and need different views of the truth, bounded by what they are allowed to see. Row-level and metric-level governance fused with the semantic layer, not bolted on at the dashboard.
  7. MCP-native and OSI-aligned. MCP is now the connective tissue for agentic AI, governed by the Linux Foundation. The Open Semantic Interchange initiative, launched September 2025 by Snowflake with Salesforce, dbt Labs, BlackRock, RelationalAI, Atlan, Cube, ThoughtSpot, Mistral AI and others, gives the industry an open, vendor-neutral spec for semantic metadata. An Autonomous Semantic Layer must speak MCP outbound and use OSI as its interchange format. Anything proprietary at that boundary is a future migration tax.
  8. Auto-crawl, multi-vector embeddings, and a Vector Intelligence Store. Hand-modeling worked at ten metrics. It does not scale to thousands of entities across dozens of systems. An Auto-crawl Engine continuously discovers candidate entities, joins, and definitions. Multi-vector embeddings let an LLM Orchestration Runtime reason over them. A Vector Intelligence Store keeps the semantic graph queryable in real time. The result is a Consensus Semantic Layer: not one analyst’s opinion of “revenue,” but the version every stakeholder, agent, and tool resolves against, with provenance.
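Properties 1 through 3 can be sketched in a few dozen lines. This is a toy under stated assumptions, not a description of how Colrows or any other product implements them: the tables, metrics, and the `compile_query` function are hypothetical, and grain validation is simplified to a single rule, namely that a metric may only be rolled up along declared many-to-one relationships. Column names are interpolated without validation, which a real compiler would not do.

```python
from dataclasses import dataclass


class SemanticError(Exception):
    """Raised at compile time, before any query reaches the warehouse."""


@dataclass(frozen=True)
class Metric:
    name: str
    table: str       # table the measure lives on; this is the metric's grain
    expression: str  # aggregate SQL expression


# Hypothetical model. Each (child, parent) edge is many-to-one: every child
# row joins to exactly one parent row, so following it cannot duplicate rows.
MANY_TO_ONE = {
    ("order_lines", "orders"): "order_lines.order_id = orders.id",
    ("orders", "customers"): "orders.customer_id = customers.id",
}

METRICS = {
    "net_sales": Metric("net_sales", "order_lines", "SUM(order_lines.amount)"),
    "customer_count": Metric("customer_count", "customers", "COUNT(*)"),
}


def prove_join_path(start, target):
    """Breadth-first search over many-to-one edges only.

    Any path found here is fan-out free. If none exists, compilation fails
    instead of silently producing an inflated number.
    """
    frontier = [(start, [])]
    seen = {start}
    while frontier:
        table, joins = frontier.pop(0)
        if table == target:
            return joins
        # Sorted iteration keeps the chosen path, and so the output, stable.
        for (child, parent), condition in sorted(MANY_TO_ONE.items()):
            if child == table and parent not in seen:
                seen.add(parent)
                frontier.append((parent, joins + [(parent, condition)]))
    raise SemanticError(f"no fan-out-free join path from {start} to {target}")


def compile_query(metric_name, group_by_table, group_by_column):
    """Compile a business intent into SQL text; same input, same output."""
    metric = METRICS.get(metric_name)
    if metric is None:
        raise SemanticError(f"unknown metric: {metric_name}")
    joins = prove_join_path(metric.table, group_by_table)
    lines = [
        f"SELECT {group_by_table}.{group_by_column}, "
        f"{metric.expression} AS {metric.name}",
        f"FROM {metric.table}",
    ]
    lines += [f"JOIN {table} ON {condition}" for table, condition in joins]
    lines.append(f"GROUP BY {group_by_table}.{group_by_column}")
    return "\n".join(lines)
```

In this sketch, `compile_query("net_sales", "customers", "region")` compiles because order lines roll up to customers through two many-to-one hops. `compile_query("customer_count", "order_lines", "sku")` raises `SemanticError`, because joining customers down to order lines would repeat each customer once per line, which is exactly the silent doubling that property 2 is meant to prevent.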

Put together, this is what Colrows is building as the Autonomous Semantic Layer, exposed to agents and applications through a Semantic API. The shorthand is one we use deliberately. Fix the context. Not the model.
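Point-in-time reproducibility (property 4) is easiest to see with a versioned definition store. The sketch below is one possible shape, assuming an append-only history per metric; the class name, the SQL expressions, and the dates are hypothetical. It covers only the definition side: reproducing yesterday's number also requires reading yesterday's data, for example through warehouse snapshots, which is out of scope here.

```python
import hashlib
from bisect import bisect_right
from datetime import datetime, timezone


class DefinitionHistory:
    """Append-only history of one metric's definition (hypothetical)."""

    def __init__(self):
        self._effective = []    # sorted datetimes
        self._expressions = []  # expression in force from the matching datetime

    def publish(self, effective_from, expression):
        if self._effective and effective_from <= self._effective[-1]:
            raise ValueError("versions must be published in increasing time order")
        self._effective.append(effective_from)
        self._expressions.append(expression)

    def as_of(self, moment):
        """Return the expression that was in force at `moment`."""
        index = bisect_right(self._effective, moment) - 1
        if index < 0:
            raise LookupError("no definition was in force at that time")
        return self._expressions[index]


def fingerprint(expression):
    """Short stable hash to store in the audit log next to each answer."""
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()[:12]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


revenue = DefinitionHistory()
revenue.publish(utc(2026, 1, 1), "SUM(invoices.amount)")
revenue.publish(utc(2026, 3, 1), "SUM(invoices.amount) - SUM(refunds.amount)")

# A query that ran on 28 February replays against the January definition,
# even though the definition changed on 1 March.
replayed = revenue.as_of(utc(2026, 2, 28))
current = revenue.as_of(utc(2026, 3, 2))
```

Storing the fingerprint with each answer gives an auditor a cheap way to confirm which version of a definition produced a given number.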

[Figure: The Autonomous Semantic Layer, MCP-native and OSI-aligned. Consumers (agents, MCP clients, vertical apps, BI and humans) enter through the Semantic API, a single governed entry point. The Deterministic Semantic Compiler provides determinism, join-path proof, grain validation, point-in-time reproducibility, drift detection, and persona scope. The autonomous engine comprises the Auto-crawl Engine, multi-vector embeddings, the LLM Orchestration Runtime, and the Vector Intelligence Store, producing the Consensus Semantic Layer: one version of meaning, with provenance. Enterprise data: Snowflake, Databricks, Salesforce, Workday, SAP, and the long tail.]
Fig 5 - The Autonomous Semantic Layer sits between every consumer and every source: one Semantic API, a deterministic compiler with built-in guarantees, an autonomous engine that keeps meaning current, MCP-native and OSI-aligned.

The road to autonomous companies, and what leaders should do now

The honest read of the market: the autonomous company is not a 2026 reality. Gartner’s 2026 Hype Cycle for Agentic AI places the category at the Peak of Inflated Expectations. Only 17 percent of organizations have deployed AI agents to date, yet more than 60 percent expect to within two years, the most aggressive adoption curve among all emerging technologies Gartner measures. Gartner separately forecasts that 40 percent of enterprises will demote or decommission autonomous agents by 2027 because of governance gaps discovered only after production incidents.

That timing is the opportunity. The organizations that build the semantic substrate now will be the ones that can safely scale agents in 2027 and 2028, while their competitors are still in pilot purgatory.

Recommendations

[Figure: From 90 days to ongoing, a staged roadmap. Stage 1 (90 days): map context fragmentation; count definitions per metric. Stage 2 (2 quarters): stand up the semantic layer; more than 60% of queries via the layer. Stage 3 (1 year): wire agents to the Semantic API; no raw agent SQL. Stage 4 (ongoing): govern by drift, not policy review; drift is on-call.]
Fig 6 - Do not boil the ocean. Map the fragmentation, govern the top metrics, wire agents to the API, then run drift like an on-call rotation.

Stage 1, next 90 days. Map your context fragmentation honestly. List every system that holds a definition of revenue, customer, churn, pipeline, and inventory. Count how many places each is defined. If the answer is more than one, you are not ready for autonomous agents. The threshold that should change your strategy: if more than three of your top ten KPIs have inconsistent definitions across systems, stop scaling copilots and start fixing the substrate.
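The Stage 1 audit can start as a spreadsheet, but the counting itself is simple enough to script. The sketch below assumes you have collected an inventory of (KPI, system, definition) rows by hand; the rows shown are invented, and comparing definitions by normalized text is a crude stand-in for the human review that real definitions need.

```python
from collections import defaultdict

# Hypothetical inventory rows: (kpi, system, definition as written there).
INVENTORY = [
    ("revenue", "Salesforce", "run-rate ARR"),
    ("revenue", "NetSuite", "GAAP-recognized revenue"),
    ("revenue", "Billing", "billings net of refunds"),
    ("churn", "Product analytics", "no login in 30 days"),
    ("churn", "Finance", "contract not renewed"),
    ("pipeline", "Salesforce", "open opportunities, stage >= 2"),
    ("inventory", "SAP", "units on hand"),
    ("inventory", "Warehouse app", "Units on hand"),
]


def normalize(text):
    # Crude comparison: ignore case and extra whitespace only.
    return " ".join(text.lower().split())


def distinct_definitions(inventory):
    """Count distinct definitions per KPI across all systems."""
    found = defaultdict(set)
    for kpi, _system, definition in inventory:
        found[kpi].add(normalize(definition))
    return {kpi: len(definitions) for kpi, definitions in found.items()}


def inconsistent_kpis(inventory):
    """KPIs that carry more than one distinct definition."""
    counts = distinct_definitions(inventory)
    return sorted(kpi for kpi, n in counts.items() if n > 1)


def should_pause_copilot_scaling(inventory, threshold=3):
    """Apply the rule from the text: more than three inconsistent KPIs."""
    return len(inconsistent_kpis(inventory)) > threshold
```

On this invented inventory, revenue and churn come back as inconsistent while pipeline and inventory do not, so the threshold is not crossed. The value of the exercise is the list itself, which tells you which definitions to govern first in Stage 2.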

Stage 2, next two quarters. Stand up a deterministic semantic layer for your top 20 metrics and their join paths. Do not start with a hundred. Pick the 20 that drive board reporting, pricing, and operational decisions. Implement join-path proof and grain validation as compile-time checks. Expose them through MCP and align with OSI so you are not locked into a vendor. Track one metric: percentage of agent or BI queries that resolve through the governed layer versus ad hoc SQL. Target above 60 percent.
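The Stage 2 tracking metric is a plain ratio. A minimal sketch, assuming your query log can tag each query with whether it resolved through the governed layer (the `via_semantic_layer` field and the sample log are hypothetical):

```python
def governed_share(query_log):
    """Percentage of logged queries that resolved through the governed layer."""
    if not query_log:
        raise ValueError("empty query log")
    governed = sum(1 for entry in query_log if entry["via_semantic_layer"])
    return 100.0 * governed / len(query_log)


# Hypothetical log entries, for illustration only.
QUERY_LOG = [
    {"source": "pricing_agent", "via_semantic_layer": True},
    {"source": "board_dashboard", "via_semantic_layer": True},
    {"source": "analyst_notebook", "via_semantic_layer": False},
    {"source": "forecast_agent", "via_semantic_layer": True},
    {"source": "adhoc_sql", "via_semantic_layer": False},
]

share = governed_share(QUERY_LOG)
meets_target = share > 60.0  # the target in the text is "above 60 percent"
```

Three governed queries out of five is exactly 60 percent, which does not clear a target of above 60.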

Stage 3, next year. Wire agents to the semantic layer, not the warehouse. Agents should never write raw SQL against production data. They should call a Semantic API. This is the pattern that separates the 5 percent of enterprises MIT identified as capturing real GenAI P&L from the 95 percent stuck in pilot. The success criterion is operational: agent-driven decisions that pass audit, reproduce across time, and survive a definition change without silently breaking.

Stage 4, ongoing. Govern by drift, not by policy review. Semantic drift detection should be a monitored production signal with the same on-call rigor as a database outage. If a definition changes upstream, every dependent agent should be notified, gated, or rerouted. This addresses the failure mode Gartner warns about when uniform governance is applied across all AI agents.
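One way to picture drift as a production signal is a monitor that compares the live upstream definition against the last approved one and gates dependent agents on any mismatch. This is a sketch under simple assumptions: definitions are compared as exact text via a hash, and the class, agent names, and definitions are hypothetical.

```python
import hashlib
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    GATE = "gate"  # hold agent actions until a human approves the change


def fingerprint(definition):
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


class DriftMonitor:
    def __init__(self, approved):
        # metric name -> fingerprint of the last approved definition
        self._approved = {name: fingerprint(text) for name, text in approved.items()}
        self._dependents = {}  # metric name -> set of agent names

    def register(self, agent, metric):
        """Record that `agent` makes decisions based on `metric`."""
        self._dependents.setdefault(metric, set()).add(agent)

    def check(self, metric, live_definition):
        """Return the action to take and the agents it applies to."""
        if metric not in self._approved:
            raise KeyError(metric)
        if fingerprint(live_definition) == self._approved[metric]:
            return Action.PROCEED, []
        return Action.GATE, sorted(self._dependents.get(metric, ()))


monitor = DriftMonitor({"churn": "no login in 30 days"})
monitor.register("retention_agent", "churn")
monitor.register("billing_agent", "churn")

unchanged = monitor.check("churn", "no login in 30 days")
drifted = monitor.check("churn", "no login in 45 days")
```

Here `unchanged` is a proceed with no affected agents, while `drifted` gates both registered agents. In production the gate result would page whoever is on call for the semantic layer rather than just return a value.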

Caveats

First, autonomous companies are not a current reality. The category sits at the peak of the hype cycle for a reason. Today’s frontier models still struggle with long-horizon reasoning, the regulatory environment is tightening fast, and even leading research reports that most organizations deploying AI communications agents had to roll them back. Build the substrate, but do not promise the board fully autonomous operations in 2026.

Second, an autonomous semantic layer is necessary but not sufficient. A decision enforcement layer, on top of context, is also required for high-stakes actions. Context tells the agent what is true. Enforcement tells it what is allowed. Both matter.

Third, the OSI and MCP standards are still young. Build to them, but expect the specs to evolve and demand portability from whatever vendor you choose.

Fourth, semantic governance is real work. The single most reliable predictor of GenAI failure is the assumption that this work can be skipped. It cannot.

The bottom line

The narrative that enterprise AI is bottlenecked on better models is comfortable for vendors and useless for operators. The data, from McKinsey to MIT to RAND to Gartner, points the other way. Models are not the gating function. Context is.

Copilots were a useful first step. They taught the enterprise to trust an AI in the seat next to a human. They did not, and were never going to, run the company. The next era, AI-native operations, requires something the copilot era never built: a deterministic, governed, MCP-native, OSI-aligned semantic execution layer that agents can reason and act through, with join-path proof, grain validation, point-in-time reproducibility, semantic drift detection, and persona scope.

That is what an Autonomous Semantic Layer is. It is the missing infrastructure between raw enterprise data and the autonomous company. YC has named the category. Sequoia has named the business model. a16z has named the architectural gap. The Open Semantic Interchange has named the standard. Anthropic has named the wire. What is left is to build it, and to be honest about what it has to do.

The companies that get this right in the next twenty-four months will not just have better AI. They will have a different operating model. The ones that keep buying copilots and hoping the model gets smarter will find, in 2027, that their competitors are simply running.

Fix the context. Not the model.
· · ·

TL;DR

The short version

Copilots solved adoption, not operations. 88% of enterprises now use AI, but only 39% see EBIT impact (McKinsey, Nov 2025), 95% of GenAI pilots show no P&L lift (MIT NANDA), and over 80% of AI projects fail (RAND, 2024). The bottleneck is enterprise context, not model capability.

The autonomous company is the destination, and YC, Sequoia, and a16z have all named the missing piece. YC’s “Company Brain” RFS, Sequoia’s “Services: The New Software,” and a16z’s “Your Data Agents Need Context” converge on one diagnosis: agents need a deterministic, governed, cross-system semantic substrate to execute work safely.

An Autonomous Semantic Layer is the infrastructure. Deterministic semantic compiler, join-path proof, grain validation, point-in-time reproducibility, semantic drift detection, persona scope, MCP-native, OSI-aligned. Build it now, in stages, and stay out of the more than 40% of agentic AI projects Gartner expects to be canceled by the end of 2027.

Sources

  1. Y Combinator, Requests for Startups (Summer 2026), “Company Brain,” Tom Blomfield. ycombinator.com/rfs
  2. McKinsey, “The state of AI in 2025: Agents, innovation, and transformation,” Nov 2025. mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  3. MIT NANDA, “The GenAI Divide: State of AI in Business 2025” (via Fortune, Aug 2025). fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo
  4. RAND Corporation, “The Root Causes of Failure for AI Projects,” RR-A2680-1, 2024. rand.org/pubs/research_reports/RRA2680-1.html
  5. Gartner, “Over 40% of Agentic AI Projects Will Be Canceled by End of 2027,” Jun 2025. gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
  6. Gartner, “40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026,” Aug 2025. gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026
  7. Gartner, “Uniform Governance Across AI Agents Will Lead to Failure,” May 2026. gartner.com/en/newsroom/press-releases/2026-05-26-gartner-says-applying-uniform-governance-across-ai-agents-will-lead-to-enterprise-ai-agent-failure
  8. Sequoia Capital, Julien Bek, “Services: The New Software,” Mar 2026. sequoiacap.com/article/services-the-new-software
  9. Andreessen Horowitz, Cui & Li, “Your Data Agents Need Context,” Mar 2026. a16z.com/your-data-agents-need-context
  10. Anthropic, “Introducing the Model Context Protocol,” Nov 2024. anthropic.com/news/model-context-protocol
  11. Anthropic, “Donating MCP and establishing the Agentic AI Foundation,” Dec 2025. anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
  12. Snowflake et al., “Open Semantic Interchange Initiative,” Sep 2025. snowflake.com/en/news/press-releases/snowflake-salesforce-dbt-labs-and-more-revolutionize-data-readiness-for-ai-with-open-semantic-interchange-initiative
  13. Spider 2.0 benchmark, ICLR 2025 (Lei et al.). spider2-sql.github.io
  14. Stanford RegLab, “Assessing the Reliability of Leading AI Legal Research Tools,” JELS 2025. dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf
  15. Tellius, “Why Your Semantic Layer Isn’t Ready for AI Agents,” 2025. tellius.com/resources/blog/why-your-semantic-layer-isnt-ready-for-ai-agents-and-what-to-do-about-it
