Datasources

Colrows connects to your warehouses, lakehouses, and operational databases through a uniform connector layer. The semantic graph is warehouse-agnostic - one definition, many backends, dialect-perfect SQL at the edge.

Supported datasources

Category                  | Engines                                                      | Auth modes
Cloud warehouses          | Snowflake, Databricks SQL, Google BigQuery, Amazon Redshift | Username/password, key-pair, OAuth, IAM
Lakehouse / query engines | Trino, Starburst, Presto, Athena, Dremio                    | JDBC + LDAP/Kerberos/IAM
OLAP                      | ClickHouse, Druid, Pinot, Exasol                             | JDBC, native auth
RDBMS                     | PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, IBM Db2     | JDBC, AD/Kerberos, certificate
Cloud-native              | AlloyDB, Aurora Postgres, Cloud SQL, Azure Database         | IAM, password
NoSQL / Search            | MongoDB, Elasticsearch (read-only)                          | Native auth, X.509

Onboarding a datasource

  1. Open Datasources → Add datasource

    Pick the connector. Each connector exposes only the fields it actually needs - there is no generic "advanced JDBC URL" trapdoor.

  2. Provide connection details

    Host, port, database, and warehouse / catalog / schema as applicable, plus credentials. Colrows runs a probe in your network and reports the round-trip latency.

    # Snowflake
    host        = acme.snowflakecomputing.com
    warehouse   = COMPUTE_WH
    database    = ANALYTICS
    role        = COLROWS_READER
    auth        = key-pair    # or password

  3. Choose how Colrows reaches the database

    Direct, SSH tunnel, or SSL with a custom CA. The Network options section below covers the full set.

  4. Initial crawl

    Colrows scans the schema, registers candidate datasets and columns into the semantic graph, and runs distribution fingerprinting to bootstrap drift detection. You can scope the crawl to specific catalogs / schemas - new objects can be promoted later.
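
    For intuition, the discovery and profiling queries issued during a crawl have roughly this shape - illustrative only, with hypothetical table and column names; Colrows' actual fingerprinting queries are internal:

    -- Enumerate candidate datasets and columns (Snowflake syntax)
    SHOW TABLES IN SCHEMA ANALYTICS.PUBLIC;
    DESCRIBE TABLE ANALYTICS.PUBLIC.ORDERS;

    -- Per-column profile used to seed a distribution fingerprint
    SELECT COUNT(*)                    AS row_count,
           COUNT(DISTINCT customer_id) AS distinct_customers,
           MIN(order_total)            AS min_total,
           MAX(order_total)            AS max_total
    FROM ANALYTICS.PUBLIC.ORDERS;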

  5. Bind concepts

    Open Consensus and bind your business concepts to anchors in the new datasource. Once bound, every query - analyst, dashboard, or AI - runs through governance.

Driver setup

Colrows ships with built-in drivers for every supported engine. For self-hosted deployments, drivers live in /opt/colrows/drivers and are version-pinned per release. Custom dialects can be registered through the SQL Engine SDK - contact support for access.

Network options

  • Direct connection - Colrows Cloud reaches your database from a fixed set of egress IPs (provided in Datasources → Network). Simplest setup; works for any internet-reachable database.
  • SSH tunnel - Colrows opens a tunnel through a bastion you control. Useful when the database has no public endpoint.
  • SSL / mTLS - upload your CA bundle and (optionally) client cert. Required by some regulated deployments.
  • Private link - AWS PrivateLink, Azure Private Endpoint, GCP Private Service Connect. Available on Enterprise plans.
  • Self-hosted runner - for fully air-gapped environments, deploy the Colrows runner inside your VPC and let it call out to the control plane over HTTPS.

Use a read-only role

Colrows needs SELECT, SHOW, and DESCRIBE on the catalogs you want governed. It never asks for write privileges. Granting more than necessary is a control failure waiting to happen.
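
A minimal grant set for PostgreSQL looks like the following - role, schema, and password are placeholders, and other engines have equivalent syntax (the Snowflake snippet above uses a COLROWS_READER role for the same purpose):

    CREATE ROLE colrows_reader LOGIN PASSWORD '...';           -- placeholder credentials
    GRANT USAGE ON SCHEMA analytics TO colrows_reader;
    GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO colrows_reader;
    ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
      GRANT SELECT ON TABLES TO colrows_reader;                -- future tables stay readable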

Cross-source semantics

Colrows can compile queries that span multiple datasources where a valid join path exists. The planner pushes down per dialect and only materializes intermediate results when no pushdown is possible. This is how the same metric definition can be served from Snowflake for analytics and from Postgres for operational queries - without duplicating logic.
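
For intuition, here is how one "median order value" metric might be rendered for each backend - the schema, table, and column names are hypothetical, but the dialect difference is real:

    -- Snowflake (analytics)
    SELECT MEDIAN(order_total) AS median_order_value
    FROM ANALYTICS.PUBLIC.ORDERS;

    -- PostgreSQL (operational)
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_total)
           AS median_order_value
    FROM public.orders;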