Core concepts

Building autonomous AI requires a new vocabulary. You cannot describe a deterministic, agent-ready data pipeline using legacy BI terminology. These are the fundamental concepts that power the Colrows autonomous semantic compiler.

Concept Area | Core Primitive | Why It Matters for AI
Logic | Metric Resolution | Eliminates manual metric drift
Security | Compile-time Authorization | Prevents unauthorized inference
Accuracy | Deterministic SQL | Removes model hallucination risk
Resilience | Signal-based Monitoring | Provides self-healing schema support

Compile-then-execute

Colrows treats every request - whether it arrives as natural language, SQL, or an agent tool call - as a program to be compiled, not as a string to be templated. The compiler runs four deterministic stages:

[Figure: Four stages of the Colrows pipeline. A request becomes a proof, then a plan, then dialect-perfect SQL.]

Each stage is isolated. Parsing produces an AST, never SQL. The semantic control plane resolves that AST against the graph and emits a logical plan only after every binding, every constraint, and every policy has been validated. The SQL engine then performs cost-based physical planning. The dialect layer is the last thing to touch the request.

The benefit of this strict staging is straightforward: correctness is proven before execution. A request that resolves ambiguously fails at compile time, with an explainable error - not at the warehouse, after a 4-minute scan.
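The staging described above can be sketched in a few lines. This is a minimal illustration, not the Colrows implementation: the names `GRAPH`, `Ast`, `LogicalPlan`, `CompileError`, and `compile_request` are all hypothetical, the physical-planning stage is a placeholder, and the "graph" is a plain dictionary.

```python
from dataclasses import dataclass


class CompileError(Exception):
    """Raised before any SQL exists, so nothing reaches the warehouse."""


@dataclass(frozen=True)
class Ast:
    metric: str
    group_by: tuple[str, ...]


@dataclass(frozen=True)
class LogicalPlan:
    source: str
    expression: str
    group_by: tuple[str, ...]


# Hypothetical semantic graph: metric name -> (dataset, aggregate expression).
GRAPH = {"net_revenue": ("orders", "SUM(amount - refund)")}


def parse(request: dict) -> Ast:
    # Stage 1: produce an AST only; no SQL text is built here.
    return Ast(request["metric"], tuple(request.get("group_by", ())))


def resolve(ast: Ast) -> LogicalPlan:
    # Stage 2: bind the AST against the graph; unknown names fail at compile time.
    if ast.metric not in GRAPH:
        raise CompileError(f"unresolved metric: {ast.metric!r}")
    source, expression = GRAPH[ast.metric]
    return LogicalPlan(source, expression, ast.group_by)


def plan_physical(plan: LogicalPlan) -> LogicalPlan:
    # Stage 3: placeholder for cost-based physical planning (identity here).
    return plan


def emit_sql(plan: LogicalPlan, dialect: str = "ansi") -> str:
    # Stage 4: the dialect layer is the only stage that writes SQL text.
    quote = "`" if dialect == "mysql" else '"'
    cols = [f"{quote}{c}{quote}" for c in plan.group_by]
    select = ", ".join(cols + [f"{plan.expression} AS value"])
    sql = f"SELECT {select} FROM {quote}{plan.source}{quote}"
    if cols:
        sql += " GROUP BY " + ", ".join(cols)
    return sql


def compile_request(request: dict, dialect: str = "ansi") -> str:
    return emit_sql(plan_physical(resolve(parse(request))), dialect)
```

A request for an unknown metric raises `CompileError` inside `resolve`, before `emit_sql` is ever called, which is the property the staging is meant to guarantee.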

The semantic graph

The Consensus semantic layer is implemented as a typed, versioned, directed graph. Every meaningful business primitive - entities, metrics, events, concepts, definitions, examples, dimensions, datasets, columns, constraints, policies, personas, scopes - is a first-class node. Edges encode semantic relationships such as defined_by, derived_from, triggers, constrained_by, and governed_by.

[Figure: A graph that connects customer, order, refund, net revenue, time-window constraint, region policy, and analyst persona around a central Consensus semantic graph. A simplified slice of a real semantic graph. Concepts, metrics, events, and policies all share the same structural substrate.]

Three properties of the graph are load-bearing:

  • Typed - every node has a known kind, and reasoning is structural, not string-based.
  • Versioned - changes never overwrite prior definitions. Each change creates a new semantic state, which makes point-in-time reproducibility a free property of the system.
  • Multi-scope - the same graph holds global definitions, datastore-specific ones, persona overrides, and user-personalized context, with explicit precedence rules.
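To make the typed and multi-scope properties concrete, here is a small sketch. The source states only that precedence rules are explicit, so the ordering in `PRECEDENCE` (most specific scope wins) is an assumption, as are the names `Kind`, `Node`, and `resolve`.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ENTITY = "entity"
    METRIC = "metric"
    POLICY = "policy"


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind        # typed: lookups match on kind, not on string patterns
    scope: str        # "global", "datastore", "persona", or "user"
    definition: str


# Assumed precedence for illustration: the most specific scope wins.
PRECEDENCE = ["user", "persona", "datastore", "global"]


def resolve(nodes: list[Node], name: str, kind: Kind, visible: set[str]) -> Node:
    """Return the highest-precedence node of the given kind and name."""
    matches = [
        n for n in nodes
        if n.name == name and n.kind is kind and n.scope in visible
    ]
    if not matches:
        raise LookupError(f"no {kind.value} named {name!r} in {sorted(visible)}")
    return min(matches, key=lambda n: PRECEDENCE.index(n.scope))
```

With a global and a persona-level definition of the same metric, a request that can see both scopes resolves to the persona override, while a request limited to the global scope gets the global definition.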

Metrics as state, not queries

In most platforms a metric is a SQL fragment: a SUM(...) FROM ... WHERE ... stored under a friendly name. Colrows treats a metric as derived semantic state - a continuously interpretable representation of business reality that any agent or query can reason over.

A metric like Net Revenue doesn't merely encode how to compute a number. It encodes:

  • Business meaning - what Net Revenue is, and how it differs from Gross Revenue or Bookings.
  • Valid grain - the level at which the metric is well-defined (per order, per invoice, per customer per day).
  • Dependencies - which entities, events, and other metrics contribute to its value.
  • Constraints - rules that govern how the metric can be filtered, grouped, or compared.
  • Downstream impact - which dashboards, agents, or signals rely on it.

The practical effect: when an agent observes that Net Revenue dropped, it can reason semantically - distinguishing volume-driven decline from refund-driven erosion - because those relationships are explicit in the metric's state, not buried in a CTE.
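One way to picture a metric as state rather than a query is a record that carries the five facets listed above. The field names, the `NET_REVENUE` example values, and the `explain_change` helper are hypothetical; the helper assumes a simple additive metric and just reports which dependency moved the most.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricState:
    name: str
    meaning: str                        # business meaning
    grain: tuple[str, ...]              # level at which the metric is well-defined
    dependencies: tuple[str, ...]       # contributing metrics or events
    allowed_dimensions: frozenset[str]  # constraint on grouping and filtering
    downstream: tuple[str, ...] = ()    # dashboards, agents, signals


NET_REVENUE = MetricState(
    name="net_revenue",
    meaning="Gross revenue minus refunds",
    grain=("order_id",),
    dependencies=("gross_revenue", "refunds"),
    allowed_dimensions=frozenset({"region", "order_date"}),
    downstream=("revenue_dashboard",),
)


def can_group_by(metric: MetricState, dimension: str) -> bool:
    # Constraints travel with the metric, so callers can check them up front.
    return dimension in metric.allowed_dimensions


def explain_change(metric: MetricState, before: dict, after: dict) -> str:
    """Name the dependency with the largest absolute movement."""
    deltas = {dep: after[dep] - before[dep] for dep in metric.dependencies}
    return max(deltas, key=lambda dep: abs(deltas[dep]))
```

Because the dependencies are explicit fields, an agent can ask whether a drop came mostly from `gross_revenue` or from `refunds` without parsing SQL.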

Join path proof

When a metric references entities across multiple datasets, Colrows must prove - not guess - that a deterministic join path exists. Joins are solved as a constrained graph traversal over the semantic graph, with three kinds of pruning:

  1. Paths that violate declared grain are discarded.
  2. Paths that introduce cardinality expansion beyond allowed thresholds are pruned.
  3. Cycles are eliminated using visited-state tracking with relationship-type awareness.

If multiple valid paths exist, a deterministic ranking heuristic prioritizes minimal hop count, declared canonical relationships, and explicit anchor definitions. Ambiguity causes compilation to fail. No silent guessing - ever.
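The traversal can be sketched as a depth-first enumeration with pruning, followed by a ranking that refuses to break ties. This is a simplified sketch under stated assumptions: `Rel`, `fan_out`, and `prove_join` are hypothetical names, grain checks and anchor definitions are omitted, and cycle detection tracks visited datasets only, without the relationship-type awareness described above.

```python
from dataclasses import dataclass


class AmbiguousJoinError(Exception):
    """Two or more join paths tie for best; compilation must stop."""


@dataclass(frozen=True)
class Rel:
    src: str
    dst: str
    fan_out: int = 1          # assumed max dst rows per src row
    canonical: bool = False   # declared canonical relationship


def find_paths(rels, start, goal, max_fan_out=1):
    paths = []

    def walk(node, path, visited):
        if node == goal:
            paths.append(tuple(path))
            return
        for rel in rels:
            if rel.src != node or rel.fan_out > max_fan_out:
                continue  # not an outgoing edge, or cardinality expansion
            if rel.dst in visited:
                continue  # would revisit a dataset: cycle
            walk(rel.dst, path + [rel], visited | {rel.dst})

    walk(start, [], {start})
    return paths


def score(path):
    # Fewer hops first, then more canonical relationships.
    return (len(path), -sum(rel.canonical for rel in path))


def prove_join(rels, start, goal, max_fan_out=1):
    ranked = sorted(find_paths(rels, start, goal, max_fan_out), key=score)
    if not ranked:
        raise LookupError(f"no join path from {start} to {goal}")
    if len(ranked) > 1 and score(ranked[0]) == score(ranked[1]):
        raise AmbiguousJoinError(f"{start} -> {goal} has tied join paths")
    return ranked[0]
```

The important design choice is the tie check in `prove_join`: when two candidate paths score the same, the function raises instead of picking one arbitrarily.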

Why this matters.

The single largest source of bad numbers in enterprise BI is the silent join - the warehouse cheerfully runs a query against a relationship the analyst didn't intend. Compile-time proof is the only way to make that class of error unreachable.

Multi-vector embeddings

Colrows does not represent a concept with a single embedding. Every concept carries up to three vectors:

  • Definition vector - derived from the canonical, governed definition.
  • Usage vector - derived from how the concept is used in real queries, alerts, and dashboards over time.
  • Combined vector - a weighted blend that improves recall when natural language drifts (e.g., "lapse" vs. "churn") while still grounding to canonical meaning.

Vectors are used for candidate identification; structural reasoning makes the final call. Embeddings are never the source of truth.
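As a rough illustration of the combined vector, the sketch below blends a definition vector and a usage vector and uses cosine similarity to shortlist candidates. The blending formula and the `usage_weight` default are assumptions, not the documented Colrows weighting, and the function names are hypothetical.

```python
import math


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def combine(definition, usage, usage_weight=0.3):
    """Weighted blend of unit vectors; usage_weight is an assumed parameter."""
    d, u = normalize(definition), normalize(usage)
    blended = [(1 - usage_weight) * a + usage_weight * b for a, b in zip(d, u)]
    return normalize(blended)


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def candidates(query, concepts, top_k=3):
    """Shortlist concept names by similarity; this is recall, not a decision."""
    scored = sorted(
        ((cosine(query, vec), name) for name, vec in concepts.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]
```

The output of `candidates` is only a shortlist. In line with the text above, a structural check against the graph would still decide which candidate, if any, the request binds to.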

Compile-time governance

Most data platforms enforce policy after a query has been generated - by masking columns, filtering rows, or denying results. Colrows enforces policy at compile time by shaping the allowed subgraph for each persona before any plan is produced.

[Figure: Post-hoc masking versus Colrows' compile-time approach, where policy shapes the semantic subgraph before planning. Policy shapes the subgraph. Unauthorized plans are never generated.]

If a metric depends on a node outside the persona's allowed scope, resolution fails - not at the warehouse, but during compilation. There is no way to "smuggle" a column past the planner. Audit becomes a side effect of normal execution: every node visited, every edge traversed, and every constraint applied is captured in a structured trace that is retained permanently.
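A minimal sketch of this behavior: resolution walks a metric's dependencies and fails as soon as one falls outside the persona's allowed set, recording each visited node as it goes. The `DEPENDS_ON` map, the node names, and `PolicyError` are hypothetical stand-ins for the real graph and policy model.

```python
class PolicyError(Exception):
    """Raised during resolution, before any plan or SQL is produced."""


# Hypothetical dependency edges: node -> nodes it is derived from.
DEPENDS_ON = {
    "net_revenue": ["gross_revenue", "refunds"],
    "gross_revenue": ["orders.amount"],
    "refunds": ["refunds.amount"],
    "customer_ltv": ["net_revenue", "customers.email"],
}


def resolve(node, allowed, trace=None):
    """Depth-first resolution restricted to the persona's allowed subgraph."""
    trace = [] if trace is None else trace
    if node not in allowed:
        raise PolicyError(f"{node!r} is outside the allowed scope")
    trace.append(node)  # the audit trace is built as a side effect
    for dependency in DEPENDS_ON.get(node, []):
        resolve(dependency, allowed, trace)
    return trace
```

An analyst persona that may see revenue nodes but not `customers.email` can resolve `net_revenue`, yet `customer_ltv` fails during resolution because one of its dependencies is out of scope.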

Autonomous maintenance

The semantic graph is maintained by a coordinated set of background agents:

  • Discovery agents ingest schemas, metadata, and documentation, identifying candidate entities, events, metrics, and relationships.
  • Architecture agents validate grain, dependencies, and constraints - refusing to publish definitions that violate business logic.
  • Learning agents observe how humans and AI systems use the graph in practice and refine definitions, examples, and synonyms accordingly.
  • Monitoring agents detect semantic drift using statistical fingerprinting of column distributions, structural diffing of dataset nodes, and hybrid vector/structural equivalence analysis.
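Two of the drift checks named above, statistical fingerprinting and structural diffing, can be approximated with the standard library. The fingerprint fields, the thresholds `z_threshold` and `null_delta`, and the function names are illustrative assumptions; the sketch also assumes each column sample has at least one non-null value.

```python
import statistics


def fingerprint(values):
    """Summarize a column sample; assumes at least one non-null value."""
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values),
        "mean": statistics.fmean(present),
        "stdev": statistics.pstdev(present),
    }


def has_drifted(baseline, current, z_threshold=3.0, null_delta=0.1):
    # Mean shift measured in baseline standard deviations (1.0 if constant).
    spread = baseline["stdev"] or 1.0
    shift = abs(current["mean"] - baseline["mean"]) / spread
    null_change = abs(current["null_rate"] - baseline["null_rate"])
    return shift > z_threshold or null_change > null_delta


def schema_diff(old, new):
    """Structural diff of two {column: type} mappings."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }
```

A column whose values jump from around 10 to around 100 trips the mean-shift check, and a renamed or retyped column shows up in `schema_diff`.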

Point-in-time reproducibility

Because the graph is versioned and execution traces capture the exact semantic state used at compile time, any historical query can be re-executed with the definitions, policies, and join paths that were active at that moment. This is non-negotiable for regulators, but it's also useful for engineers debugging a number that "moved" between Monday and Wednesday.
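Point-in-time lookup follows naturally from an append-only history. The sketch below is an assumption about how such a store could work, using integer version numbers and the hypothetical class `VersionedDefinitions`; it is not the Colrows storage model.

```python
import bisect


class VersionedDefinitions:
    """Append-only store: publishing never overwrites an earlier version."""

    def __init__(self):
        self._history = {}  # name -> (sorted versions, matching definitions)

    def publish(self, name, version, definition):
        versions, definitions = self._history.setdefault(name, ([], []))
        if versions and version <= versions[-1]:
            raise ValueError("versions must increase; history is append-only")
        versions.append(version)
        definitions.append(definition)

    def as_of(self, name, version):
        """Return the definition that was active at the given version."""
        versions, definitions = self._history[name]
        index = bisect.bisect_right(versions, version)
        if index == 0:
            raise LookupError(f"{name!r} did not exist at version {version}")
        return definitions[index - 1]
```

If a trace records the version that was current at compile time, calling `as_of` with that version returns the definition the original run used, even after later changes.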

Vocabulary cheat sheet

Term | What it means in Colrows
Concept | A typed business primitive - entity, metric, event, definition.
Anchor | A binding from a concept to a physical column or expression.
Scope | The slice of the graph a request is allowed to traverse - global / datastore / persona / user.
Persona | A first-class graph node representing a role with its own scope and policy set.
Constraint | A formal predicate attached to a node - grain, time window, cardinality, RBAC, ABAC.
Plan | The dialect-agnostic logical tree produced after semantic resolution.
Trace | The structured audit record of a single compile-then-execute run.

Terminology defines strategy.

If you call it a "dashboarding tool," you will build a dashboard. If you call it a "semantic compiler," you will build an autonomous AI backbone. Fix the Context, Not the Model.

See these concepts in practice

The primitives above aren't abstract - they power real autonomous systems. Explore: