Auditable SQL for Regulated Industries: Conversational Analytics in BFSI

In a bank, an analytics answer is not information - it is a regulatory artifact. The number that goes into a provisioning decision, a SARFAESI enforcement, or a board pack must survive a question no demo ever rehearses: show me exactly how this was produced, and produce it again. Conversational analytics will live or die in BFSI on that question. This pillar lays out what the regulators actually require - RBI's FREE-AI framework, the Fed's new SR 26-2, the EU AI Act, BCBS 239 - what unauditable data governance has already cost banks, why LLM-generated SQL fails the reproducibility test by physics, and the architecture that clears the bar.

The regulator has already spoken - and surveyed the gap

India's clearest statement is the RBI's FREE-AI committee report - the Framework for Responsible and Ethical Enablement of Artificial Intelligence, released 13 August 2025 by the committee chaired by Dr. Pushpak Bhattacharyya. Its seven guiding principles include two that define this article's subject: AI systems should be understandable by design to the entities deploying them, and entities remain accountable for the decisions of AI systems regardless of the level of autonomy. Its 26 recommendations include a board-approved AI policy for every regulated entity, AI inventories, sector-wide incident reporting, disclosures in annual reports - and, centrally, a risk-based AI audit framework with internal and third-party audits of AI systems.

The report's own survey explains the urgency. As summarized by analysts of the report: roughly a fifth of surveyed entities deploy AI today and two-thirds are exploring at least one GenAI use case - but only 18% of AI-using entities maintained audit logs, only about a third had board-level AI oversight, and adoption in customer-facing finance remains limited over concerns about "data sensitivity, explainability, and bias." Most striking for one segment: asset reconstruction companies reported zero AI adoption - the most reporting-burdened class of regulated entity is the least AI-enabled. And among the model risks the committee's report catalogues, summaries highlight one in precisely the vocabulary of this site: non-deterministic outputs lead to legal and reputational risks.

None of this arrived from nowhere. The RBI's Master Direction on IT Governance, Risk, Controls and Assurance Practices (November 2023, effective April 2024) already requires that systems touching critical or sensitive information keep audit trails "detailed enough to facilitate the conduct of an audit, serve as forensic evidence when required and assist in dispute resolution, including for non-repudiation purposes" - with regular monitoring of those trails. FREE-AI proposes extending exactly this doctrine to AI systems. The trajectory is unambiguous: in Indian financial services, auditability is the admission ticket for AI, not a nice-to-have.

Not just India: the converging global bar

  • United States. In April 2026 the Fed, OCC, and FDIC issued SR 26-2, superseding the fifteen-year-old SR 11-7 on model risk management. It retains the core machinery - independent validation, "effective challenge" by objective parties - and contains the most telling sentence in this section: it explicitly excludes generative and agentic AI from its scope as "novel and rapidly evolving." US banks now have a refreshed framework for everything except the systems this article is about - a governance vacuum that pushes the burden of proof onto the architecture you deploy.
  • European Union. The AI Act's Annex III classifies as high-risk "AI systems intended to be used to evaluate the creditworthiness of natural persons or establish their credit score" - bringing technical-documentation duties (Article 11), automatically generated logs (Article 19), and human oversight (Article 14). The compliance deadline for Annex III systems has been provisionally pushed (the May 2026 Digital Omnibus agreement points to December 2027, pending formal adoption) - but note what moved: the date, not the classification. Credit AI in Europe will be high-risk AI; only the countdown changed.
  • United Kingdom. The PRA's SS1/23 model-risk principles (effective May 2024, updated April 2026) expressly cover "the risks associated with the use of artificial intelligence in modelling techniques such as machine learning," with independent model validation among their five pillars.

Three jurisdictions, one assumption: that you can document, validate, and reproduce what your models do. Keep that assumption in mind for the physics section below.

What unauditable data governance already costs

This is not theoretical exposure. The Basel Committee's risk-data-aggregation principles (BCBS 239) turned thirteen this year, and the Committee's November 2023 progress report says, verbatim: "Nearly a decade after the initial publication of the Principles and seven years after the expected date of compliance, banks are at different stages in terms of alignment. Additional work is required at all banks to attain or sustain full compliance." The ECB's February 2025 supervisory newsletter is blunter - "long-standing deficiencies," targeted reviews that "revealed significant gaps in meeting supervisory expectations," banks "too reliant on weakly controlled manual workarounds" - and names the escalation tools, up to "periodic penalty payments as potential enforcement measures."

The enforcement record has names and numbers. In July 2024, Citigroup paid a combined ~$135.6 million (a $60,625,620 Federal Reserve civil money penalty plus $75,000,000 from the OCC) for failing to remediate "ongoing deficiencies in data quality management" under its 2020 consent orders. In India, the RBI in April 2024 barred Kotak Mahindra Bank from onboarding customers through digital channels and issuing new credit cards, citing "serious deficiencies and non-compliances" across IT inventory, patch and change management, user access management, and data-leak prevention - restrictions that stayed in place for months. The lesson generalizes uncomfortably well to AI analytics: regulators have demonstrated they will charge real money, and real growth, for data plumbing that cannot be evidenced.

Why GenAI analytics stalls in banking - in the banks' own words

The adoption surveys agree on where the blocker sits. McKinsey's survey of senior credit-risk executives (24 institutions, including nine of the top ten US banks) found "the most significant barriers, highlighted by 75 percent of our respondents, concern risk and governance," with data quality the top concern (79%) followed by "model risk issues (mentioned by 58 percent), such as transparency, auditability, fairness, and explainability." EY-Parthenon's 2025 survey of 100 banks found adoption surging (77% had launched or soft-launched GenAI applications) and the regret instructive: 79% said they would prioritize improving governance if they could restart their GenAI implementation - more than any other factor - while 71% of larger banks flagged regulatory compliance as a concern for agentic AI specifically.

Read those together with RBI's 18%-audit-logs finding and the diagnosis writes itself: the industry did not stall on ambition or model quality. It stalled on the gap between what generative systems do and what regulated institutions must be able to prove.

The physics: why generated SQL resists audit

It is tempting to treat auditability as a logging problem - keep the queries, done. Two facts say otherwise.

First, reproducibility fails at the hardware level. Peer-reviewed work published this January (arXiv:2601.06118) opens with the finding: "The execution of Large Language Models (LLMs) has been shown to produce nondeterministic results when run on Graphics Processing Units (GPUs), even when they are configured to produce deterministic results." Temperature zero is not a reproducibility guarantee. An audit's core operation - re-run yesterday's question, obtain yesterday's answer - is not reliably available from regeneration, full stop.
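One commonly cited contributor to this kind of run-to-run variation is that floating-point addition is not associative, so the order in which parallel hardware accumulates partial sums can change the result. The sketch below shows only that general arithmetic fact in plain Python. It is an illustration under that assumption, not a reproduction of the cited paper's experiments or a claim about its stated mechanism.

```python
# Floating-point addition is not associative: grouping changes the result.
# The three values are arbitrary illustrations, not data from any model.
a, b, c = 0.1, 0.2, 0.3

left_first = (a + b) + c    # one possible accumulation order
right_first = a + (b + c)   # another possible accumulation order

# If a reduction can run in either order, bitwise-identical output
# is not guaranteed from one run to the next.
orders_agree = left_first == right_first
print(left_first, right_first, orders_agree)
```

A difference in the last bit is harmless in a sum, but inside a model it can tip a near-tie between two candidate tokens, after which the generated text, and therefore the generated SQL, may diverge.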

Second, a logged query without a versioned definition is evidence of what ran, not of why it was right. The NIST AI Risk Management Framework states the chain plainly: "Transparency reflects the extent to which information about an AI system and its outputs is available" and "accountability presupposes transparency." A generated query's "why" lives in a prompt and a model snapshot - neither reviewable the way a metric definition is. Even vendors benchmarking their own semantic-layer category concede the asymmetry in audit terms; dbt Labs' 2026 benchmark put it in exactly the right register: "With text-to-SQL, failure looks like a plausible but incorrect answer. With the Semantic Layer, failure looks like an error message. For anything going to a board deck, an auditor, or a company KPI dashboard, that difference is everything." The full benchmark evidence is in Deterministic vs Probabilistic Text-to-SQL.
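To make the distinction concrete, here is a minimal sketch of an audit record that ties the executed SQL to the definition version it came from, and chains records by hash so later edits are detectable. Every name in it (`AuditRecord`, `record_hash`, `chain_head`, the field names, the sample SQL) is hypothetical and chosen for illustration. It is not any regulator's prescribed format or any vendor's schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import List, Optional

GENESIS = "0" * 64  # assumed sentinel "previous digest" for the first record


@dataclass(frozen=True)
class AuditRecord:
    """Evidence for one answer. All field names are hypothetical."""
    question: str
    metric: str
    definition_version: str  # which reviewed definition produced the SQL
    executed_sql: str        # the exact text that ran
    user: str
    asked_at: str            # ISO-8601 timestamp supplied by the caller
    prev_hash: str           # digest of the previous record in the log


def record_hash(record: AuditRecord) -> str:
    """SHA-256 over a canonical JSON form, so equal records hash equally."""
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def chain_head(records: List[AuditRecord]) -> Optional[str]:
    """Return the final digest if every link is intact, else None."""
    expected_prev = GENESIS
    for rec in records:
        if rec.prev_hash != expected_prev:
            return None
        expected_prev = record_hash(rec)
    return expected_prev


first = AuditRecord(
    question="What was NPA coverage at March end?",
    metric="npa_coverage",
    definition_version="v1",
    executed_sql="SELECT SUM(provisions) * 1.0 / SUM(gross_npa) FROM loan_book",
    user="credit.officer",
    asked_at="2026-04-01T09:00:00Z",
    prev_hash=GENESIS,
)
second = replace(first, asked_at="2026-04-02T09:00:00Z",
                 prev_hash=record_hash(first))

intact_head = chain_head([first, second])

# Rewriting history breaks the link that the next record holds.
tampered = replace(first, executed_sql="SELECT 1")
tampered_head = chain_head([tampered, second])
```

A hash chain of this kind gives tamper evidence only if the latest digest is also stored somewhere the log's writer cannot alter. Non-repudiation in the stricter sense would additionally need signatures, which this sketch leaves out.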

What "auditable SQL" actually requires

Pulling the regulatory threads together, the bar for conversational analytics in BFSI has five planks - each mapped to the requirement it satisfies:

| Requirement | Regulatory anchor | What the architecture must provide |
| --- | --- | --- |
| Audit trail per answer | RBI MD IT Governance; EU AI Act Art. 19 (logs) | The exact SQL executed, automatically retained - forensic-grade, non-repudiable |
| Reviewable definitions | FREE-AI "understandable by design"; NIST transparency | Versioned semantic definitions behind every answer - what "NPA coverage" meant, when |
| Reproducibility | SR 26-2 / SS1/23 validation; audit practice itself | Point-in-time re-execution yielding the identical answer |
| Enforced access | RBI MD user-access controls; FREE-AI accountability | Per-user policy applied provably, before data is touched |
| Change vigilance | Model monitoring expectations across frameworks | Detection when definitions and reality drift apart |

Notice what the table implies: the planks are properties of a compiler, not of a model. Deterministic compilation through versioned definitions gives you the trail, the definitions, and the reproducibility for free, because they are how compilation works. Generation must approximate each one with bolted-on machinery, which the regulator then asks you to validate too.
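A minimal sketch of that compiler property, assuming a dictionary of versioned definitions and a snapshot-dated table: the same metric, version, and as-of date always compile to the same SQL text, and re-executing it against the retained snapshot returns the same value. The names `DEFINITIONS`, `compile_query`, `answer`, the `loan_book` table, and the two formulas for `npa_coverage` are all hypothetical. This is not a description of how any particular product compiles queries.

```python
import sqlite3
from typing import Tuple

# Hypothetical versioned definitions: (metric, version) -> SQL expression.
DEFINITIONS = {
    ("npa_coverage", "v1"): "SUM(provisions) * 1.0 / SUM(gross_npa)",
    ("npa_coverage", "v2"): "SUM(provisions + write_offs) * 1.0 / SUM(gross_npa)",
}


def compile_query(metric: str, version: str, as_of: str) -> Tuple[str, Tuple[str]]:
    """Pure function: identical inputs always yield identical SQL text."""
    try:
        expr = DEFINITIONS[(metric, version)]
    except KeyError:
        # Failure is an explicit error, not a plausible-looking guess.
        raise LookupError(f"no definition {metric}@{version}") from None
    sql = f"SELECT {expr} AS {metric} FROM loan_book WHERE snapshot_date = ?"
    return sql, (as_of,)


# Assumed storage model: each reporting date is retained as a snapshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_book ("
    "snapshot_date TEXT, gross_npa REAL, provisions REAL, write_offs REAL)"
)
conn.executemany(
    "INSERT INTO loan_book VALUES (?, ?, ?, ?)",
    [
        ("2026-03-31", 100.0, 60.0, 10.0),
        ("2026-03-31", 300.0, 140.0, 30.0),
        ("2026-04-30", 500.0, 400.0, 0.0),
    ],
)


def answer(metric: str, version: str, as_of: str) -> Tuple[str, float]:
    sql, params = compile_query(metric, version, as_of)
    return sql, conn.execute(sql, params).fetchone()[0]


# "Yesterday's question" asked twice: same SQL text, same value.
sql_a, value_a = answer("npa_coverage", "v1", "2026-03-31")
sql_b, value_b = answer("npa_coverage", "v1", "2026-03-31")

# A different definition version is a visible, reviewable change.
sql_v2, value_v2 = answer("npa_coverage", "v2", "2026-03-31")
```

The design choice worth noting is that the definition version is an explicit input. An auditor can ask which version produced a number and diff `v1` against `v2`, which has no equivalent for a prompt and a model snapshot.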

The architecture that clears the bar

This is the workload Colrows was built for. Every question - from a credit officer, a board pack, or an AI agent - runs the same pipeline: intent → context resolution → constrained planning → governed execution. Meaning resolves against the semantic graph - versioned, typed, multi-scope - so "provisioning coverage" is the governed definition, not a model's best guess. Join paths are proven before any SQL is emitted. RBAC, ABAC, and row/column-level predicates are injected at compile time, so an unauthorized question fails compilation and never touches data. Every answer ships with dialect-perfect SQL, full lineage, and a point-in-time reproducible audit trail. Autonomous maintenance with drift detection covers the fifth plank: when schemas or definitions drift, the graph flags it instead of silently misdescribing reality. Compile-time governance. Not after the fact.
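The compile-time enforcement idea can be sketched in a few lines. This is an illustration under stated assumptions, not Colrows' implementation or API: `User`, `COLUMN_ROLES`, `compile_select`, `CompilationError`, the `loan_book` table, and the role and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


class CompilationError(Exception):
    """Raised before any SQL exists, so nothing reaches the database."""


@dataclass(frozen=True)
class User:
    name: str
    roles: FrozenSet[str]
    branch: str  # attribute used for the row-level predicate


# Hypothetical column-level policy: which roles may read which column.
COLUMN_ROLES = {
    "loan_id": {"analyst", "credit_officer"},
    "outstanding": {"analyst", "credit_officer"},
    "borrower_pan": {"credit_officer"},
}


def compile_select(user: User, columns: List[str]) -> Tuple[str, Tuple[str, ...]]:
    """Check policy first; emit SQL only if every column is permitted."""
    for col in columns:
        allowed = COLUMN_ROLES.get(col)
        if allowed is None:
            raise CompilationError(f"unknown column: {col}")
        if not (user.roles & allowed):
            raise CompilationError(f"{user.name} may not read {col}")
    # Columns come from the allow-list above, so interpolating them is safe.
    # The row-level predicate is part of the compiled text, not a later filter.
    sql = f"SELECT {', '.join(columns)} FROM loan_book WHERE branch = ?"
    return sql, (user.branch,)


analyst = User("a.rao", frozenset({"analyst"}), "MUM-01")

# Permitted question: SQL is emitted with the branch predicate built in.
sql, params = compile_select(analyst, ["loan_id", "outstanding"])

# Unauthorized question: compilation fails and no SQL is produced.
try:
    compile_select(analyst, ["borrower_pan"])
    denied = False
except CompilationError:
    denied = True
```

Because the denial happens before a query string exists, the evidence for "this user could not have seen that column" is the compiler's behaviour itself rather than a downstream log filter.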

The ARC case makes it concrete. Asset reconstruction companies sit at the sharp end of Indian financial regulation: the SARFAESI Act, 2002 lets secured creditors enforce security interests without court intervention - possession of secured assets after a 60-day demand notice, takeover of borrower businesses - with aggrieved borrowers appealing to Debts Recovery Tribunals (DRTs) within 45 days. RBI's Master Direction for ARCs (April 2024) layers on the reporting discipline: supervisory returns, audited financials to RBI within a month of the AGM, every management takeover reported, possessed-asset disclosures updated monthly, quarterly wilful-defaulter reporting to credit bureaus. Every portfolio-evaluation answer an ARC produces lives inside that lattice - which is why the segment's zero AI adoption (per FREE-AI's survey) is less conservatism than a correct read of what unauditable AI would cost them.

It is also why the segment has the most to gain from the compiled approach. In our Confidential ARC deployment, distressed-asset portfolio evaluation - questions spanning loans, securities, recovery proceedings, and regulatory thresholds - moved from analyst-weeks to compiled, governed answers: evaluation cycle time down more than 95%, with 100% regulatory coverage across RBI SARFAESI and DRT requirements. The same pattern extends across BFSI: NBFCs under the IT Governance Direction, insurers with solvency reporting, AMCs with mandate compliance - anywhere the answer must carry its own evidence.

The regulators' frameworks differ in vocabulary and timeline, but they ask one question in unison: can you prove it? Generation answers with confidence. Compilation answers with a trail. Prove the query. Then run it.

Frequently asked questions

Can banks use conversational analytics under RBI's framework?

Yes - FREE-AI's posture is enablement with accountability: board AI policies, inventories, incident reporting, risk-based AI audits, disclosures. The bar is architectural: entities stay accountable regardless of AI autonomy, and systems should be understandable by design. Deterministic, logged, reproducible query pipelines can clear it; unauditable generation struggles.

Why is LLM-generated SQL hard to audit?

Architecturally, there is no versioned definition behind a generated query - only a prompt and a model snapshot. Physically, GPU execution is nondeterministic "even when configured to produce deterministic results" (arXiv:2601.06118), so re-running yesterday's question does not reliably reproduce yesterday's answer.

What is auditable SQL?

An answer carrying the executed SQL, the versioned definitions it compiled from, the proven join path, the policies applied and for whom, and point-in-time reproducibility - the difference between an answer and a defensible one.

What do global regulators require?

SR 26-2 (US, April 2026) keeps independent validation and effective challenge while excluding GenAI from scope - a vacuum, not a pass; the EU AI Act makes credit AI high-risk with documentation and automatic logging duties (timeline provisionally moved to December 2027; classification unchanged); UK SS1/23 covers AI/ML expressly. All assume documentation, validation, and reproducibility.

How do ARCs use governed conversational analytics?

Under SARFAESI enforcement powers, DRT appeal timelines, and the 2024 Master Direction's reporting duties, ARC portfolio evaluation is a cross-entity, regulator-facing workload. Our Confidential ARC deployment cut evaluation cycle time by more than 95% with 100% regulatory coverage across RBI SARFAESI and DRT requirements.

A note on the claims

Regulatory quotes were verified against the Federal Reserve, Bank of England, BIS, ECB, and EU AI Act texts linked inline as of 12 June 2026. RBI documents are closed to automated access; FREE-AI survey figures (the 18% audit-log statistic, ARC adoption) and the Master Directions' wording are reported via the cited professional summaries (KPMG, Khaitan & Co, Dvara Research) and labelled accordingly - verify against the RBI originals before relying on exact wording. Survey statistics are the publishers' claims about their own samples. The EU timeline reflects a provisional political agreement not yet in the Official Journal. This page is reviewed quarterly.

The regulator will ask how. Answer with a trail, not a guess.