The Build vs. Buy Decision for Enterprise Semantic Layers: What Teams Get Wrong

A practical framework for calculating the real cost of building your own semantic layer, and the tipping points that should change your mind.

The Temptation to Build

Every data team, at some point, convinces itself it can build a semantic layer in-house. The logic sounds bulletproof. We already run dbt. We know our schema better than any vendor. We will define metrics in YAML, version-control them, and ship a governed abstraction layer in a quarter.

The pitch lands well in engineering reviews. It flatters technical capability. And for a narrow set of conditions, it is the right call.

But for most enterprises, especially those scaling AI agents across their data estate, building a semantic layer from scratch is one of the most expensive engineering decisions a team will make. Not because the initial build is hard. Because the maintenance never ends.

This post breaks down where the build-vs-buy line actually falls, what most teams miscalculate, and a framework for computing the true cost of each path. If you have already read our post on the hidden cost of building your own data access layer, treat this as the companion piece: it puts a three-year TCO number on the iceberg.

How Teams Build Semantic Layers In-House

The most common approach today is metrics-as-code. Teams define metrics in version-controlled YAML or SQL so that changes go through Git, code review, CI/CD, and testing. The dominant toolchains include:

dbt Semantic Layer (MetricFlow): Define semantic models, entities, measures, dimensions, and metrics in YAML. MetricFlow generates platform-specific SQL at query time. It supports simple, ratio, cumulative, derived, and conversion metrics, dynamic join handling, and GraphQL/JDBC APIs. The dbt Semantic Layer, powered by MetricFlow, reached general availability in October 2023. MetricFlow runs locally for free on dbt Core, but exposing metrics dynamically to downstream tools requires a paid dbt Cloud plan.

LookML (Looker): A 12-plus-year-old, code-based semantic modeling language with Git version control. Proven and mature, but locked to the Looker ecosystem.

Cube: Open-source, API-first semantic layer with roughly 19,000 GitHub stars. Popular for embedded analytics.

Custom SQL views and internal ontology projects: Teams build abstraction layers using database views, Power BI DAX measures, Tableau calculated fields, or bespoke YAML-to-SQL compilers.

Platform-native options: Snowflake Semantic Views (with Cortex Analyst) and Databricks Metric Views (with Unity Catalog), both maturing through 2025 and 2026. For why neither replaces a cross-estate layer, see our note on why Snowflake and Databricks can’t be your enterprise semantic layer.
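To make the "bespoke YAML-to-SQL compiler" route concrete, here is a minimal sketch of the core that teams typically hand-roll. It assumes metric definitions have already been parsed from YAML into Python objects (to stay dependency-free); the `MetricDef` fields (`model`, `expression`, `filters`) and the `compile_metric` function are hypothetical and are not the schema of MetricFlow, LookML, or Cube.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricDef:
    """Hypothetical in-house metric definition, as it might look after parsing YAML."""
    name: str
    model: str                 # source table or view
    expression: str            # aggregate SQL expression
    filters: tuple = field(default_factory=tuple)


def compile_metric(metric: MetricDef, dimensions: list[str]) -> str:
    """Compile one metric plus grouping dimensions into a single SQL string."""
    select_cols = list(dimensions) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.model}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(f"({f})" for f in metric.filters)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


revenue = MetricDef(
    name="revenue",
    model="orders",
    expression="SUM(amount)",
    filters=("status = 'completed'",),
)
print(compile_metric(revenue, ["region"]))
# SELECT region, SUM(amount) AS revenue FROM orders WHERE (status = 'completed') GROUP BY region
```

The toy compiles in a dozen lines. Everything it leaves out, such as join resolution across models, time grains, caching, and access control, is where the real build effort goes.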

Despite these options, roughly 35% of teams still define metrics ad hoc in BI tools or custom code rather than in any centralized layer. The problem is not a shortage of tooling. The problem is that centralization is an organizational challenge dressed up as a technical one.

Where In-House Semantic Layers Break Down

Homegrown semantic layers rarely fail on technology. They fail on ownership, drift, and maintenance. Here are the patterns that repeat across organizations:

Metric Drift Across Tools

“Revenue” gets defined in a Tableau calculated field, a Power BI DAX measure, a dbt model, and a SQL view. Four definitions. None agree. dbt Labs has documented that semantic layer failures like this usually stem from design decisions made in the first two weeks of a project, not from bad technology.
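One way to surface this drift before it reaches a board deck is to inventory each tool's definition of a metric and flag any metric with more than one distinct definition. A minimal sketch, assuming the definitions have already been exported as plain strings; the inventory is invented for illustration, and the lowercase-and-collapse-whitespace `normalize` step is a deliberately crude stand-in for real SQL equivalence checking.

```python
import re
from collections import defaultdict


def normalize(expr: str) -> str:
    """Crude normalization: lowercase and collapse whitespace.
    Real SQL equivalence checking is far harder than this."""
    return re.sub(r"\s+", " ", expr.strip().lower())


def find_drift(inventory: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return metric -> {normalized definition: [tools]} for every metric
    that has more than one distinct definition across tools."""
    drifted = {}
    for metric, by_tool in inventory.items():
        groups = defaultdict(list)
        for tool, expr in by_tool.items():
            groups[normalize(expr)].append(tool)
        if len(groups) > 1:
            drifted[metric] = dict(groups)
    return drifted


# Hypothetical inventory: four tools, four takes on "revenue".
inventory = {
    "revenue": {
        "tableau": "SUM(amount)",
        "power_bi": "SUM(amount) - SUM(refunds)",
        "dbt": "sum(amount) filter (where status = 'completed')",
        "sql_view": "SUM(net_amount)",
    },
    "active_users": {
        "tableau": "COUNT(DISTINCT user_id)",
        "dbt": "count(distinct  user_id)",  # same logic, different formatting
    },
}
for metric, groups in find_drift(inventory).items():
    print(metric, "has", len(groups), "definitions")  # revenue has 4 definitions
```

Only `revenue` is flagged: the two `active_users` definitions differ in case and spacing but normalize to the same expression.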

Forrester analyst Boris Evelson has reported that 61% of organizations use four or more BI platforms, and 25% use ten or more. A semantic layer that lives inside one tool (LookML in Looker, DAX in Power BI) cannot govern the others. The moment your second or third BI tool enters the picture, your single-tool semantic layer becomes a partial solution.

Semantic Leakage

Even when teams build a centralized layer, business logic escapes. It leaks upstream into dbt models and derived tables. It leaks downstream into dashboard formulas, spreadsheets, and analyst memory. Over time, the canonical definitions fork, and the layer loses authority. In an AI context, this is especially dangerous: once the layer can no longer answer a question authoritatively, agents fall back to raw text-to-SQL, reintroducing the hallucination risk the layer was built to eliminate.

Scalability Cliffs

A layer that works for a small team can fail when rolled out organization-wide. Dynamic joins across many tables, poor caching strategies, and missing query optimization drive up warehouse compute costs. One analysis found a 20% drop in downstream compute after deploying a properly governed semantic layer, which suggests the inverse: ungoverned or poorly governed layers inflate compute.
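The inverse is worth stating precisely, because percentage changes do not invert symmetrically. If governing a layer cuts compute by 20%, the ungoverned baseline was 1 / 0.8 = 1.25 times the governed cost, a 25% premium. A tiny helper (the name is hypothetical) makes the conversion explicit:

```python
def premium_from_reduction(reduction: float) -> float:
    """Given the fractional compute reduction achieved by governing a layer,
    return the fractional premium the ungoverned layer was paying."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction) - 1


print(f"{premium_from_reduction(0.20):.0%}")  # 25%
```

A "20% drop after governing" and a "20% premium before governing" are therefore different claims; the first implies the second is closer to 25%.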

Maintenance Bottlenecks

LookML “balloons into a tangled web of dependencies” as the data estate grows. Small changes have ripple effects. The central modeling team becomes a bottleneck, which produces shadow BI as business teams route around the governed layer. With rapidly changing schemas, data engineers can spend 40% of their time fixing field-not-found errors, and what started as a side project becomes a full-time job. This is the failure mode we explore in detail in our post on knowledge drift and semantic decay.

AtScale, whose team spent years building their first universal semantic layer, puts it bluntly: the margin for error is “literally zero percent,” and a universal semantic layer “is never really finished, because it will always need to support new technologies and integrations.”

Wrong Layer of the Stack

Preset (the commercial company behind Apache Superset) argues that earlier semantic layers failed because they “were owned by the wrong layer of the stack, locked inside BI tools, tightly coupled to vendors, and moving at the wrong velocity.” A semantic layer that lives inside your BI tool inherits that tool’s update cadence, its access model, and its limitations. That was tolerable when dashboards were the only consumer. It is not tolerable when AI agents, copilots, and automated workflows all need consistent definitions.

Where homegrown layers break:
  • Metric drift across tools (four BI tools, four definitions of revenue)
  • Semantic leakage upstream into dbt and downstream into dashboards
  • Scalability cliffs: ungoverned layers inflate warehouse compute (about 20% saved once governed)
  • Maintenance bottlenecks: up to 40% of engineering time spent on field-not-found errors
  • Wrong layer of the stack: BI-tool-bound, vendor-coupled, wrong velocity
Fig 1 - Homegrown semantic layers rarely fail on technology. They fail on five organizational and architectural patterns that compound every quarter.

When Building In-House Actually Makes Sense

Build is not always wrong. For a specific, narrow set of conditions, it is the right call:

  • You run a single warehouse (Snowflake, BigQuery, or Redshift), and you intend to stay there.
  • Your metric definitions are stable. You are not adding new KPIs monthly.
  • The only consumers are internal BI users, fewer than 20 people.
  • No AI agents, copilots, or automated workflows will query your semantic layer.
  • You have spare engineering bandwidth and are willing to staff ongoing maintenance.
  • You are early-stage with nascent data processes and a small, simple data estate.

If all six conditions hold, building with dbt/MetricFlow is a defensible choice. You get version-controlled metrics alongside your transformations, a familiar developer workflow, and no vendor dependency.

But here is the part teams underestimate: these conditions erode. Companies add a second warehouse. They adopt a fourth BI tool. They deploy an AI copilot. Each of those events invalidates the build decision. And by the time it becomes obvious, the team has already sunk 12 to 18 months of engineering effort into a layer that cannot support the new requirements.

When Buying Makes Sense (And Why Most Companies Cross This Line Faster Than Expected)

The buy decision is driven by complexity thresholds. Any one of these should trigger a serious evaluation of a dedicated semantic layer:

  • Multiple warehouses or multi-cloud. A universal layer is the only vendor-neutral path here, and cross-cloud latency rules out live queries through a single tool.
  • Four or more BI tools (61% of organizations, per Forrester).
  • AI agent or copilot integration. Any autonomous system querying your data.
  • Cross-team consistency mandates. Regulatory or audit requirements for metric lineage.
  • Time-to-value pressure. You need core metrics live in weeks, not quarters.

CIO magazine put it directly: “If enterprise data is fragmented, unpredictable, or poorly governed, internally built agents will struggle. Buying a platform that supplies the semantic backbone may be the only viable path.”

The critical insight is that most companies cross the buy threshold faster than they expect. The typical data estate is not shrinking. It is getting more distributed, more heterogeneous, and more agent-connected every quarter.

When build stays right vs. when buy kicks in:
Build (all six must hold):
  • Single warehouse, staying there
  • Stable metric definitions
  • Fewer than 20 internal BI consumers
  • No AI agents or copilots
  • Spare engineering bandwidth
  • Simple, small data estate
If any one erodes, the build decision is invalidated.
Buy (any one triggers):
  • Multiple warehouses or multi-cloud
  • 4+ BI tools (61% of orgs, per Forrester)
  • AI agent or copilot integration
  • Audit or regulatory lineage mandates
  • Weeks-not-quarters time to value
Most enterprises cross this line in 2026.
Fig 2 - Build needs six conditions to hold simultaneously. Buy only needs one of five triggers to fire. The math is asymmetric, and so is how fast it changes.
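The asymmetry in Fig 2 is easy to express in code: build survives only while every condition holds (a logical AND), while buy fires the moment any single trigger is true (a logical OR). A minimal sketch; the `DataEstate` fields are hypothetical labels for the checklist items, not a standard schema, and the "intend to stay on one warehouse" part of the first condition is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DataEstate:
    # Hypothetical fields mirroring the two checklists.
    warehouses: int
    bi_tools: int
    internal_bi_users: int
    ai_agents: bool
    metrics_stable: bool
    spare_eng_bandwidth: bool
    small_simple_estate: bool
    lineage_mandate: bool
    needs_weeks_not_quarters: bool


def build_conditions_hold(e: DataEstate) -> bool:
    """Build stays defensible only if ALL six conditions hold."""
    return all([
        e.warehouses == 1,
        e.metrics_stable,
        e.internal_bi_users < 20,
        not e.ai_agents,
        e.spare_eng_bandwidth,
        e.small_simple_estate,
    ])


def buy_triggers(e: DataEstate) -> list[str]:
    """Buy needs only ONE of five triggers; return the ones that fired."""
    checks = {
        "multiple warehouses": e.warehouses > 1,
        "4+ BI tools": e.bi_tools >= 4,
        "AI agents or copilots": e.ai_agents,
        "lineage / audit mandate": e.lineage_mandate,
        "weeks-not-quarters time to value": e.needs_weeks_not_quarters,
    }
    return [name for name, fired in checks.items() if fired]


estate = DataEstate(warehouses=1, bi_tools=2, internal_bi_users=12,
                    ai_agents=True, metrics_stable=True,
                    spare_eng_bandwidth=True, small_simple_estate=True,
                    lineage_mandate=False, needs_weeks_not_quarters=False)
print(build_conditions_hold(estate), buy_triggers(estate))
# False ['AI agents or copilots']
```

Deploying a single copilot flips the result even though the other five build conditions still hold, which is exactly the erosion described above.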

The AI Agent Dimension: Why This Changes Everything

The biggest shift in the build-vs-buy calculus is not a new BI tool or a second warehouse. It is the arrival of AI agents as first-class data consumers.

When a human analyst sees a suspicious number, they pause. They check the definition. They ask a colleague. LLMs do none of that. They generate SQL from scratch, execute it with confidence, and present the result as fact. Without a governed semantic layer, they produce confident, wrong answers at scale.

The benchmarks tell the story. AtScale ran a TPC-DS benchmark and found 92.5% text-to-SQL accuracy with a semantic layer versus 20% for a system given only schemas and primary/foreign keys. dbt Labs reported similar patterns: with modern models, semantic-layer accuracy approaches or hits 100% for covered queries because SQL generation becomes deterministic. Raw text-to-SQL, even with the best models, consistently lags behind. Gartner projects that organizations prioritizing semantics will increase GenAI accuracy by up to 80% and cut associated costs by up to 60%.

Text-to-SQL accuracy, TPC-DS benchmark: 20% with raw schema plus primary/foreign keys; 92.5% when routed through a governed semantic layer. Source: AtScale TPC-DS benchmark, August 2024. dbt Labs reports a similar pattern (84.1% → 100%) on covered queries.
Fig 3 - The accuracy gap is not a model problem. It is a context problem. The same model, given governed semantics, becomes deterministic on covered queries.
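The "deterministic on covered queries" point can be sketched as a routing rule: if a question maps to a governed metric, the SQL comes from the stored definition and is identical every time; only uncovered questions fall back to model-generated SQL. The `GOVERNED_METRICS` registry, the `resolve` function, and the naive substring matching below are hypothetical simplifications; a real system would use far richer intent parsing.

```python
from typing import Optional

# Hypothetical governed definitions, keyed by canonical metric name.
GOVERNED_METRICS = {
    "revenue": "SELECT SUM(amount) AS revenue FROM orders WHERE status = 'completed'",
    "active_users": "SELECT COUNT(DISTINCT user_id) AS active_users FROM events",
}


def resolve(question: str) -> tuple[str, Optional[str]]:
    """Route a question: ('governed', sql) if it names a governed metric,
    otherwise ('fallback', None), meaning raw text-to-SQL would be needed."""
    normalized = question.lower().replace(" ", "_")
    for name, sql in GOVERNED_METRICS.items():
        if name in normalized:
            return "governed", sql
    return "fallback", None


for q in ["What was revenue last quarter?",
          "How many active users do we have?",
          "What is churn by cohort?"]:
    route, sql = resolve(q)
    print(f"{route:9} {q}")
```

Logging how often questions hit the fallback branch also gives a direct measure of semantic coverage as agents come online.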

Gartner goes further. In a June 2025 prediction, they forecast that by 2028, 60% of agentic analytics projects relying solely on MCP (Model Context Protocol) will fail for lack of a consistent semantic layer underneath. MCP gives agents a way to call tools. But without a shared semantic foundation, every agent rebuilds context from scratch, and their answers diverge.

A note on vendor benchmarks: the 92.5% vs. 20% figure comes from AtScale’s own test, not independent peer-reviewed research. Treat it as directional. The pattern, that governed semantics dramatically improve LLM accuracy over raw schema access, is consistent across multiple vendors and independent analyses. But the exact magnitude will vary by dataset, model, and query complexity.

This is the core architectural insight. A semantic layer is no longer just a BI convenience. It is AI infrastructure. And building AI infrastructure from scratch is a fundamentally different commitment than building a metrics store for dashboards.

The cost of building is not the build. It is the maintenance, the drift, the reconciliation tax, and the opportunity cost.

A Framework for Calculating the True Cost

Most teams undercount the cost of building because they model the launch, not the lifecycle. Here is a framework for computing the three-year total cost of ownership (TCO) of a homegrown semantic layer versus a bought one.

The Cost Categories

Seven cost categories · three-year TCO comparison, build (in-house) vs. buy (vendor)

  • Initial build / setup. Build: 2-4 FTEs × 6-12 months, or 1 to 4 FTE-years of effort. At $180K loaded comp per FTE-year, that is $180K-$720K before the first metric goes live. Buy: license fee plus onboarding, typically $18K-$180K/year depending on scale, with onboarding in weeks rather than months.
  • Ongoing maintenance. Build: 1-3 permanent FTEs handling schema changes, new metrics, new tool integrations, and access control updates. At $180K/FTE, that is $180K-$540K/year. Buy: included in the license; the vendor handles infrastructure, upgrades, and new integrations.
  • Warehouse compute. Build: ungoverned layers inflate compute through redundant queries, bad joins, and missing caching. One analysis found a 20% drop in compute after moving to a governed layer, which means the ungoverned baseline was running about 25% higher. Buy: governed query generation reduces compute; semantic caching and query optimization are vendor features.
  • Reconciliation tax. Build: time spent arguing about whose number is right. Strategy models this at $450K/year for a 10-person team (30% of time at $150K average comp). Buy: one canonical definition per metric, with near-zero reconciliation overhead.
  • Technical debt. Build: Chainguard’s 2026 survey found engineers spend 84% of their time on maintenance and toil, and homegrown layers compound this. The debt’s “interest” can be estimated as maintenance hours multiplied by the percentage of those hours attributable to debt. Buy: the vendor absorbs the tech debt; upgrades and new integrations are their problem, not yours.
  • Integration cost per new tool or agent. Build: each new BI tool, notebook, or AI agent requires custom integration. This cost scales linearly with the number of consumers and is what makes homegrown layers unsustainable. Buy: the vendor provides pre-built connectors and API/MCP endpoints, so the marginal cost per new consumer is near zero.
  • Opportunity cost. Build: every engineer maintaining the layer is not shipping the product and revenue work that pays the bills, and the roadmap delay compounds for as long as the layer needs staffing. Buy: engineering effort goes where it creates differentiated value.
Fig 4 - Seven categories, three-year horizon. Most teams model only the first row. The compounding starts in rows two through seven.

The Decision Heuristic

One question eliminates roughly 70% of the decision complexity: Where does your data live today, and where will it live in three years?

  • One warehouse, staying that way: native or build.
  • Multiple warehouses or uncertain trajectory: buy a portable/universal layer.
  • AI agents in the picture at all: buy, and make sure the layer supports governed API/MCP access.
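The heuristic is small enough to write down directly. A sketch, assuming two honest inputs: warehouse count now and expected in three years (`None` standing for "unknown"), and whether any AI agent will query the data. The `recommend` function and its return strings are hypothetical.

```python
from typing import Optional


def recommend(warehouses_now: int, warehouses_in_3y: Optional[int], ai_agents: bool) -> str:
    """Apply the three-branch heuristic; agents take precedence over warehouse count."""
    if ai_agents:
        return "buy, with governed API/MCP access"
    if warehouses_now == 1 and warehouses_in_3y == 1:
        return "native or build"
    # Multiple warehouses, or an uncertain (None) trajectory.
    return "buy a portable/universal layer"


print(recommend(1, 1, ai_agents=False))     # native or build
print(recommend(1, None, ai_agents=False))  # buy a portable/universal layer
print(recommend(1, 1, ai_agents=True))      # buy, with governed API/MCP access
```

Checking agents first encodes "at all": a single-warehouse estate with one copilot still lands on buy.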

The Salary Math

For calibration, here is what the relevant roles cost in the US market in 2025-2026, aggregated across Glassdoor, Built In, and Salary.com:

US loaded comp · 2025-2026 aggregated (base salary; total comp including benefits and equity)
  • Senior Data Engineer: base $140K-$220K; total comp $170K-$250K+
  • Data Architect: base $145K-$190K; total comp $170K-$230K
  • Analytics Engineer: base $118K-$180K; total comp $140K-$210K

When you model a homegrown semantic layer requiring two FTEs for initial build (six months) and 1.5 FTEs for ongoing maintenance, the three-year engineering cost alone is $810K to $1.35M. That does not include compute overhead, reconciliation tax, opportunity cost, or technical debt accumulation.
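The exact inputs behind that range are not spelled out, so here is one reconstruction under stated assumptions, mainly useful for plugging in your own numbers. It assumes a six-month build by two FTEs, 1.5 maintenance FTEs staffed either for the remaining 30 months or for all 36 months, and loaded comp between $170K and $250K per FTE-year (the senior data engineer total-comp band listed above). Under those assumptions the range comes out to roughly $808K to $1.38M, close to the quoted figure.

```python
def three_year_engineering_cost(build_ftes: float, build_months: int,
                                maint_ftes: float, maint_months: int,
                                loaded_comp_per_year: float) -> float:
    """Engineering cost = total FTE-months of effort x monthly loaded comp."""
    fte_months = build_ftes * build_months + maint_ftes * maint_months
    return fte_months * loaded_comp_per_year / 12


# Low end: maintenance starts after the 6-month build (30 of 36 months), $170K loaded.
low = three_year_engineering_cost(2, 6, 1.5, 30, 170_000)
# High end: maintenance staffed for all 36 months, $250K loaded.
high = three_year_engineering_cost(2, 6, 1.5, 36, 250_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $807,500 to $1,375,000
```

At the flat $180K loaded figure used in the TCO categories, the same 57 to 66 FTE-months come to $855K to $990K.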

What Most Teams Miss: The Autonomous Maintenance Question

The deepest miscalculation in the build decision is not about initial build effort. It is about what happens after launch.

Enterprise schemas change. Tables get added. Column names shift. Business definitions evolve. A semantic layer that was accurate on launch day drifts within weeks unless someone is actively maintaining it.

In a homegrown layer, that “someone” is your engineering team. They watch for schema changes, update metric definitions, fix broken joins, and re-test. This is the work that consumes the 1.5 FTEs in the TCO model above. And it never stops.
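Part of that watching can be reduced to a check that runs on every schema change: compare the columns each metric depends on against the warehouse's current catalog and fail loudly before a dashboard or agent hits a field-not-found error. A minimal sketch; it assumes column dependencies are declared per metric and that a catalog snapshot is available as a dict (for example, built from `information_schema.columns`). The structures and the `broken_metrics` name are hypothetical.

```python
def broken_metrics(metric_deps: dict[str, set[tuple[str, str]]],
                   catalog: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return metric -> sorted 'table.column' references that no longer
    exist in the current catalog snapshot."""
    broken = {}
    for metric, deps in metric_deps.items():
        missing = sorted(
            f"{table}.{column}"
            for table, column in deps
            if column not in catalog.get(table, set())
        )
        if missing:
            broken[metric] = missing
    return broken


# Hypothetical: upstream renamed orders.amount to orders.gross_amount.
metric_deps = {
    "revenue": {("orders", "amount"), ("orders", "status")},
    "active_users": {("events", "user_id")},
}
catalog = {
    "orders": {"order_id", "gross_amount", "status"},
    "events": {"event_id", "user_id", "ts"},
}
print(broken_metrics(metric_deps, catalog))  # {'revenue': ['orders.amount']}
```

Detecting the break is the easy part. Deciding whether `gross_amount` is the right replacement, and re-testing everything downstream, is the judgment work that consumes those maintenance FTEs.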

The alternative is an autonomous semantic layer, one that continuously crawls enterprise sources (databases, data catalogs, documentation, usage patterns) and updates its semantic graph without manual intervention. This is a fundamentally different architecture from a static YAML-based metrics store. It treats the semantic layer as a living system rather than a configuration file.

This is the approach Colrows takes. Rather than requiring engineers to hand-maintain every metric definition, Colrows auto-crawls your data estate, builds a context-rich semantic graph (covering not just metrics but entities, relationships, events, and business rules), detects semantic drift using statistical fingerprinting, and exposes governed definitions via API so that BI tools, notebooks, copilots, and AI agents all resolve meaning from the same source. The maintenance burden shifts from your team to the platform.
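This post does not describe how that fingerprinting works, so the sketch below is only a generic illustration of the idea, not Colrows's implementation. It summarizes a column into a few statistics, stores that as a baseline, and flags the column when a new snapshot moves beyond tolerance. The chosen statistics (null rate, distinct ratio, mean) and the tolerances are assumptions. The example shows drift that a schema-level check would miss: the column name and type are unchanged, but the values switched from dollars to cents.

```python
import statistics
from typing import Optional


def fingerprint(values: list[Optional[float]]) -> dict[str, float]:
    """Summarize a numeric column: null rate, distinct ratio, and mean of non-nulls."""
    non_null = [v for v in values if v is not None]
    n = len(values)
    return {
        "null_rate": (n - len(non_null)) / n if n else 0.0,
        "distinct_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
        "mean": statistics.fmean(non_null) if non_null else 0.0,
    }


def drifted(baseline: dict[str, float], current: dict[str, float],
            rel_tol: float = 0.10, abs_tol: float = 0.05) -> list[str]:
    """Return the statistics that moved beyond tolerance.
    abs_tol covers statistics whose baseline is near zero (e.g. null_rate)."""
    out = []
    for key, base in baseline.items():
        if abs(current[key] - base) > max(abs_tol, rel_tol * abs(base)):
            out.append(key)
    return out


baseline_vals = [10.0, 12.0, 11.0, 13.0, None]        # amounts in dollars
current_vals = [1000.0, 1200.0, 1100.0, 1300.0, None]  # same column, now in cents
print(drifted(fingerprint(baseline_vals), fingerprint(current_vals)))  # ['mean']
```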

The distinction matters most in the AI context. When an AI agent queries your data, it needs more than a list of metrics. It needs entity relationships, business rules, persona-level access controls, and point-in-time reproducibility. Building that from YAML is theoretically possible. Maintaining it autonomously across a growing, changing data estate is where homegrown layers break.

What the Market Is Telling Us

Gartner elevated the universal semantic layer to essential infrastructure in its 2025 Hype Cycle for BI and Analytics. In its “Top Predictions for Data and Analytics in 2026” (published March 2026), Gartner stated: “By 2030, universal semantic layers will be treated as critical infrastructure, alongside data platforms and cybersecurity.” According to their data, 44% of data and analytics leaders have already implemented semantic layers, with an additional 48% planning to do so by 2027.

The semantic layer market itself is growing at roughly 23% CAGR, with estimates ranging from $2.7B to $4.9B by 2030 depending on how narrowly you define the space (MarketsandMarkets and Mordor Intelligence, respectively).

On the standards front, the Open Semantic Interchange (OSI) specification, a vendor-neutral interoperability standard finalized in January 2026, has partners including Snowflake, Salesforce, dbt Labs, Atlan, and ThoughtSpot. This matters because it means the “vendor lock-in” objection to buying is weakening. If your semantic layer supports OSI, your definitions are portable.

Meanwhile, Gartner predicts AI agents will replace 30% of SaaS UIs by 2030. Every one of those agents will need governed data access. The semantic layer is the infrastructure that makes that access trustworthy. For the deeper architectural argument, see our post on moving from copilots to autonomous companies.

A Practical Decision Framework

Step 1: Diagnose (Week 1)

Answer three questions. How many warehouses and BI tools do you run today, and expect to run in three years? Will AI agents or copilots query your data? How stable are your top metric definitions? If you are single-warehouse, two or fewer BI tools, no agents, stable metrics, and have spare engineering bandwidth: build with dbt/MetricFlow. If you cross any of multi-warehouse, four-plus BI tools, or AI-agent integration: plan to buy.

Step 2: Start Narrow (Weeks 2 to 8)

Regardless of build or buy, apply the “5-metric rule.” Standardize the five metrics your executives argue about most (revenue, active users, churn, CAC, whatever matters in your business) in one canonical, version-controlled definition. Validate parity across at least two consumer surfaces. Twenty trusted definitions beat 500 unvalidated ones.
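Validating parity can be as simple as computing each canonical metric on every surface for the same period and comparing the results. A sketch with invented numbers; the 0.5% relative tolerance is an assumption (exact equality is often too strict once surfaces round or cache differently), and the surface names are placeholders.

```python
import math


def parity_report(values_by_surface: dict[str, dict[str, float]],
                  rel_tol: float = 0.005) -> dict[str, bool]:
    """For each metric present on every surface, True if all surfaces agree
    within rel_tol. Metrics missing from any surface are skipped here; a
    stricter version would flag them."""
    surfaces = list(values_by_surface)
    metrics = set.intersection(*(set(v) for v in values_by_surface.values()))
    report = {}
    for m in sorted(metrics):
        vals = [values_by_surface[s][m] for s in surfaces]
        report[m] = all(math.isclose(vals[0], v, rel_tol=rel_tol) for v in vals[1:])
    return report


# Hypothetical Q3 values from two consumer surfaces.
observed = {
    "bi_dashboard": {"revenue": 1_204_500.0, "active_users": 48_210, "churn": 0.031},
    "notebook":     {"revenue": 1_204_500.0, "active_users": 51_002, "churn": 0.031},
}
print(parity_report(observed))
# {'active_users': False, 'churn': True, 'revenue': True}
```

A `False` here is the signal to reconcile the definitions now, while there are only five of them, rather than after fifty more have been built on top.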

Step 3: Model the Three-Year TCO

Use the framework above. Do not model just the launch cost. Model maintenance FTEs (at $180K loaded each), warehouse compute delta, integration cost per new tool/agent, opportunity cost of delayed product roadmap, and the reconciliation tax. If build TCO materially exceeds buy, the decision is clear.
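A spreadsheet is the usual home for this model, but the structure is clearer in code. A minimal sketch of a three-year comparison; the defaults are either figures quoted in this post ($180K per loaded FTE-year, Strategy's $450K/year reconciliation example, the top of the $18K-$180K license range) or placeholders to replace. The compute, integration, and opportunity-cost inputs default to zero only because they are the hardest to estimate. All names are hypothetical.

```python
from dataclasses import dataclass

YEARS = 3
FTE_COST = 180_000  # loaded cost per FTE-year, the figure used in this post


@dataclass
class BuildInputs:
    build_fte_years: float = 1.0          # e.g. 2 FTEs x 6 months
    maint_ftes: float = 1.5               # permanent maintenance staffing
    compute_delta_per_year: float = 0.0   # extra warehouse spend vs. a governed layer
    reconciliation_per_year: float = 0.0  # time spent arguing about numbers
    integration_per_year: float = 0.0     # custom work for new tools and agents
    opportunity_per_year: float = 0.0     # delayed roadmap; hardest to estimate


@dataclass
class BuyInputs:
    license_per_year: float = 0.0
    onboarding_one_time: float = 0.0
    residual_per_year: float = 0.0        # whatever compute/reconciliation cost remains


def build_tco(b: BuildInputs) -> float:
    """One-time build effort plus three years of recurring costs.
    Simplification: maintenance is staffed for the full horizon."""
    one_time = b.build_fte_years * FTE_COST
    recurring = (b.maint_ftes * FTE_COST + b.compute_delta_per_year
                 + b.reconciliation_per_year + b.integration_per_year
                 + b.opportunity_per_year)
    return one_time + recurring * YEARS


def buy_tco(v: BuyInputs) -> float:
    """Onboarding plus three years of license and residual costs."""
    return v.onboarding_one_time + (v.license_per_year + v.residual_per_year) * YEARS


build = BuildInputs(reconciliation_per_year=450_000)  # 10 people x 30% x $150K
buy = BuyInputs(license_per_year=180_000)             # top of the quoted range
print(f"build ${build_tco(build):,.0f} vs buy ${buy_tco(buy):,.0f}")
# build $2,340,000 vs buy $540,000
```

The reconciliation tax dominates this example. If your team's number is far smaller, the gap narrows accordingly, which is why the inputs need to be your own.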

Step 4: Cross the Build-to-Buy Threshold Consciously

Three events should trigger a re-evaluation: adopting a second warehouse, adding a fourth BI tool, or deploying your first production AI agent. Do not wait until the homegrown layer is visibly failing. By then, you have already accumulated technical debt that takes months to unwind.

Step 5: Govern for Agents Now

Before deploying any AI agent that touches your data, require it to source metric definitions from a semantic layer via API, not raw SQL. Track agent accuracy rates. If you are evaluating platforms, look specifically for autonomous maintenance (so the layer stays current without manual effort), MCP-native exposure (so agents can query governed definitions natively), persona-level access controls (so a customer-facing agent cannot expose internal data), and point-in-time reproducibility (so you can re-execute a historical query and prove it used the correct definitions at that moment). For a deeper twelve-point evaluation, work through the Semantic Layer Buyer’s Guide for 2026.
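Point-in-time reproducibility implies that definitions are versioned with effective dates, so a historical query can be re-resolved against the definition that was live when it ran. A minimal sketch using an in-memory, date-sorted list of versions; the `MetricVersion` structure and `definition_as_of` lookup are hypothetical, not any vendor's API.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricVersion:
    effective_from: datetime
    sql: str


# Hypothetical history for "revenue": refunds started being netted out in March.
REVENUE_HISTORY = [
    MetricVersion(datetime(2025, 1, 1), "SUM(amount)"),
    MetricVersion(datetime(2025, 3, 1), "SUM(amount) - SUM(refunds)"),
]


def definition_as_of(history: list[MetricVersion], ts: datetime) -> MetricVersion:
    """Return the version in effect at ts (history must be sorted by effective_from)."""
    starts = [v.effective_from for v in history]
    i = bisect.bisect_right(starts, ts) - 1
    if i < 0:
        raise LookupError(f"no definition in effect at {ts:%Y-%m-%d}")
    return history[i]


print(definition_as_of(REVENUE_HISTORY, datetime(2025, 2, 14)).sql)  # SUM(amount)
print(definition_as_of(REVENUE_HISTORY, datetime(2025, 3, 1)).sql)   # SUM(amount) - SUM(refunds)
```

An audit can then answer "which definition did this agent use on February 14?" from nothing more than the metric name and the query timestamp logged alongside it.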

The five-step decision framework:
  1. Diagnose: warehouses, BI tools, agents, metric stability
  2. Start narrow: the five-metric rule across two consumer surfaces
  3. Model the three-year TCO across all seven cost categories
  4. Cross the threshold consciously: 2nd warehouse, 4th BI tool, 1st agent
  5. Govern for agents now: autonomy, MCP, personas, point-in-time
Fig 5 - The framework in five steps. Steps one to three are diagnostics. Steps four and five are the commitments that decide whether your AI ever ships to production.

The Bottom Line

The build-vs-buy question for semantic layers is not really about technology. dbt, MetricFlow, LookML, Cube, and custom SQL views are all capable tools. The question is about organizational sustainability: can your team maintain a governed, consistent, cross-tool, agent-ready semantic layer indefinitely, while also shipping the product and revenue work that pays the bills?

For a small team with a stable, single-warehouse data estate and no AI ambitions, building is fine. For everyone else, and that is most enterprises in 2026, the math points to buying a dedicated semantic layer and investing engineering effort where it creates differentiated value.

The cost of building is not the build. It is the maintenance, the drift, the reconciliation tax, and the opportunity cost of engineers who could be doing something else. The cost of buying is a line item in the budget. The cost of building is hidden across your entire engineering organization, and it compounds every quarter.

The Colrows thesis

Colrows builds the Autonomous Semantic Layer for enterprise AI.

Fix the Context. Not the Model.

Start the conversation: engage@colrows.com · colrows.com

· · ·

Sources and References

  1. Gartner, “Top Predictions for Data and Analytics, 2026,” March 11, 2026.
  2. Forrester, Boris Evelson, BI platform usage surveys (informal multi-year series; corroborated by Forrester client interactions).
  3. dbt Labs, “Seven Mistakes to Avoid with the dbt Semantic Layer,” dbt Developer Blog.
  4. AtScale, Jeff Curran, “Text-to-SQL Accuracy Benchmark (TPC-DS),” August 2024.
  5. dbt Labs, MetricFlow documentation and 2026 re-benchmark results.
  6. Chainguard, “2026 Engineering Reality Report” (1,200 respondents, August 2025).
  7. Strategy (formerly MicroStrategy), reconciliation tax analysis and UserEvidence ROI study.
  8. MarketsandMarkets, semantic web market sizing, 2025-2030.
  9. Mordor Intelligence, semantic layer and knowledge graph for agentic AI market sizing, 2025-2030.
  10. Holistics, “Build vs. Buy in Embedded Analytics: A 3-Year TCO Breakdown,” 2025.
  11. GigaOm, “2025 Semantic Layer Radar” report.
  12. Open Semantic Interchange (OSI) specification, finalized January 2026.
  13. Glassdoor, Built In, Salary.com, and Indeed: salary data for data engineering roles, 2025-2026.
  14. Gartner, 2025 Hype Cycle for BI and Analytics.
  15. CIO Magazine, build-vs-buy synthesis for enterprise AI infrastructure.

Stop hand-maintaining context. Start compiling it.