What does the warehouse capture, and what does it lose?
Open any modern data warehouse and you will find tables of events: orders, price_changes, signups, support_tickets, refunds, retention_offers. Each row is a snapshot of a single moment - what happened, at what time, with what value. Aggregation queries against those tables are fast and accurate. Counts are correct. Sums are correct.
What is lost is the relationship between rows. Why this order, in this region, on this day? Was it the price change two weeks earlier? The retention email three days earlier? The payment retry yesterday? In the warehouse, every event sits next to every other event, and the cause column is NULL. The data is there. The logic is not.
Why is causality the part that matters now?
Because every interesting business question is a causal one. The dashboard tells you that churn went up. The CFO does not ask did churn go up - they ask what made churn go up. The product manager does not ask did NPS drop - they ask which release made NPS drop. The CRO does not ask was the campaign opened - they ask did the campaign cause the deal.
These are relationship-driven questions. The schema is not their natural unit; the chain is. Without explicit cause-and-effect modeling, every causal answer becomes a hand-built join over guessed time windows, owned by a single analyst, never reproducible the same way twice.
What does it look like to model events causally?
An event becomes a typed entity in the semantic graph - first-class, like a Customer or an Order. It has a name (price_change, churn_risk_flag), an owner (the team that emitted it), a payload (typed fields), and crucially, typed relationships to other events:
- caused_by - this churn flag was emitted because of which upstream event?
- triggers - what does this event fire downstream when it occurs?
- part_of - which higher-level workflow or campaign does this event belong to?
- contradicts - which prior event does this one supersede or invalidate?
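As a minimal sketch of that entity model - the names (Event, Relation, link) are illustrative, not any particular product's API - an event with typed relationships might look like:

```python
from dataclasses import dataclass, field
from enum import Enum

class Relation(Enum):
    CAUSED_BY = "caused_by"      # upstream event that produced this one
    TRIGGERS = "triggers"        # downstream event this one fires
    PART_OF = "part_of"          # parent workflow or campaign
    CONTRADICTS = "contradicts"  # prior event this one supersedes

@dataclass
class Event:
    name: str      # e.g. "price_change", "churn_risk_flag"
    owner: str     # team that emitted the event
    payload: dict  # typed fields in a real system; a dict here for brevity
    edges: list = field(default_factory=list)  # (Relation, Event) pairs

    def link(self, relation: Relation, target: "Event") -> None:
        """Attach a typed relationship to another event."""
        self.edges.append((relation, target))

price_change = Event("price_change", "pricing", {"delta_pct": 8})
churn_flag = Event("churn_risk_flag", "retention", {"account": "acct-42"})
churn_flag.link(Relation.CAUSED_BY, price_change)
```

The point of the enum is that the edge vocabulary is closed and typed: a planner can enumerate the relation kinds it knows how to traverse, rather than parsing free-text annotations.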
Once those edges exist as graph relationships, the planner can traverse them. "Show me churn-flagged accounts where the upstream event was a price change in the last 60 days" stops being a 70-line SQL query - it becomes a typed graph traversal that compiles into one.
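That traversal can be sketched in a few lines - assuming, hypothetically, event records that carry a declared caused_by pointer; in a real system this logic would compile to SQL rather than run in application code:

```python
from datetime import datetime, timedelta

# Minimal event records; caused_by holds the declared upstream event, or None.
price_change = {"name": "price_change", "ts": datetime(2024, 5, 1), "caused_by": None}
churn_flag = {"name": "churn_risk_flag", "ts": datetime(2024, 5, 20),
              "account": "acct-42", "caused_by": price_change}
other_flag = {"name": "churn_risk_flag", "ts": datetime(2024, 6, 1),
              "account": "acct-99", "caused_by": None}

def churn_flags_caused_by_price_change(events, window_days=60):
    """Accounts whose churn flag has a price_change upstream within the window."""
    hits = []
    for e in events:
        if e["name"] != "churn_risk_flag":
            continue
        cause = e["caused_by"]
        if (cause is not None and cause["name"] == "price_change"
                and e["ts"] - cause["ts"] <= timedelta(days=window_days)):
            hits.append(e["account"])
    return hits

print(churn_flags_caused_by_price_change([churn_flag, other_flag]))  # ['acct-42']
```

Note what the function does not do: it never guesses a time window to join on. The window here only bounds an edge that was already declared.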
Where does the cause column come from in the first place?
Three sources, each contributing different fidelity:
- Producer-declared. The team that emits the event also emits the upstream that triggered it - the retention service emits caused_by: churn_risk_flag(account=...). This is the strongest signal and the one most worth investing in.
- Workflow-derived. When events come out of a known workflow (campaign, automation, scheduled job), the graph already knows the parent and inherits the relationships from the workflow definition.
- Inferred. When two sources do not declare causality, the autonomous maintenance agents look for high-confidence statistical patterns and propose edges that humans approve. Inferred edges are clearly typed as inferred, never silently elevated to declared.
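One way to keep those fidelity levels honest is to carry the source on the edge itself. A hedged sketch (the EdgeSource and CausalEdge names are invented for illustration):

```python
from dataclasses import dataclass
from enum import Enum

class EdgeSource(Enum):
    PRODUCER_DECLARED = "producer_declared"
    WORKFLOW_DERIVED = "workflow_derived"
    INFERRED = "inferred"

@dataclass
class CausalEdge:
    upstream: str
    downstream: str
    source: EdgeSource
    approved: bool = False   # only meaningful for inferred edges
    confidence: float = 1.0  # statistical confidence, for inferred edges

    def traversable(self) -> bool:
        # Declared and workflow-derived edges are always usable;
        # inferred edges must be human-approved before the planner walks them.
        if self.source is EdgeSource.INFERRED:
            return self.approved
        return True
```

Because the source never leaves the edge, an inferred relationship cannot be silently elevated: it stays typed as inferred even after approval, and the planner can always report which fidelity level an answer rests on.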
The graph does not pretend to know things it does not know. The point is to let the things that are known be queried as causality, not as a series of timestamps that an analyst lines up by hand.
What changes for analysts, agents, and the audit team?
For the analyst, ad-hoc causal questions stop being multi-hour exercises. They write the chain - price_change -> demand_drop -> churn_risk -> retention_offer -> save - and the planner walks it.
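Walking such a chain is mechanical once every event carries its declared upstream. A minimal sketch, assuming events that point at their cause:

```python
# Each event points at its declared upstream cause (None at the root).
price_change = {"name": "price_change", "caused_by": None}
demand_drop = {"name": "demand_drop", "caused_by": price_change}
churn_risk = {"name": "churn_risk", "caused_by": demand_drop}
retention_offer = {"name": "retention_offer", "caused_by": churn_risk}
save = {"name": "save", "caused_by": retention_offer}

def walk_chain(event):
    """Follow caused_by edges upstream and return the chain, root first."""
    chain = []
    while event is not None:
        chain.append(event["name"])
        event = event["caused_by"]
    return list(reversed(chain))

print(walk_chain(save))
# ['price_change', 'demand_drop', 'churn_risk', 'retention_offer', 'save']
```

The analyst's multi-hour exercise collapses into exactly this loop: start at the outcome, follow the typed edges back to the root.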
For the AI agent, the chain becomes a first-class object the agent can reason over. Instead of guessing which of seventeen tables holds the upstream cause, it asks the graph for the typed caused_by edge. The answer is structured, not invented.
For the audit team, every causal claim carries provenance. Every edge has a source - producer-declared, workflow-derived, or inferred-and-approved - and a graph version. "This number went up because of that campaign" stops being an opinion and becomes a query against typed edges.
Why is this the unlock for AI on enterprise data?
Because LLMs are causal-language machines. Every interesting prompt is some variant of "why" or "what would happen if". Without explicit causal structure, the model has to guess at causality from the surrounding text, which is exactly where hallucination begins. With explicit causal edges in the semantic graph, the model is no longer asked to invent causality - it is asked to traverse it, and the planner refuses traversals that are not supported by typed edges.
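The refusal behavior is the important part, and it is small enough to sketch. Assuming a set of typed edges, a planner's traversal check might look like this (illustrative names, not a real planner's API):

```python
typed_edges = {("price_change", "demand_drop"),
               ("demand_drop", "churn_risk"),
               ("churn_risk", "retention_offer"),
               ("retention_offer", "save")}

def traverse(chain, edges):
    """Walk a causal chain hop by hop; refuse any hop with no typed edge."""
    for a, b in zip(chain, chain[1:]):
        if (a, b) not in edges:
            raise LookupError(f"no typed causal edge {a} -> {b}; refusing to guess")
    return chain

chain = ["price_change", "demand_drop", "churn_risk", "retention_offer", "save"]
print(traverse(chain, typed_edges))  # every hop is backed by a typed edge
```

Asked for price_change -> save directly, the same planner raises instead of inventing the missing hops - which is precisely the behavior that keeps the model from hallucinating causality.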
Warehouses are state systems. Semantics are relationship systems. The hidden logic of the enterprise has been sitting between the rows the whole time. Modeling events as graph entities, with typed causal edges, is the first thing that makes that logic queryable instead of tribal.
