Overview
As the name implies, any system that stores data, whether temporarily or permanently, can serve as a datasource in Colrows. This includes relational and non-relational databases, in-memory caches, data streaming systems, and files. Datasources are the starting point of a user's data interaction in Colrows, so configuring one is designed to be straightforward.
Supported Datasources
Colrows is primarily oriented towards SQL processing, yet its adapter framework expands its capabilities to encompass non-SQL datasources as well. This framework translates SQL-based relational operations into the native query language of each specific datasource. However, it's important to acknowledge that not all SQL query operations can be seamlessly translated into an appropriate native query language for every datasource.
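To make the idea of translation concrete, here is a minimal sketch of how a few SQL-style predicates could be mapped to an Elasticsearch Query DSL body. It is not the Colrows adapter framework: the function names `to_elasticsearch_filter` and `translate_where` and the tuple-based predicate format are assumptions made for illustration. Only the `term`, `range`, and `bool`/`filter` clauses are standard Elasticsearch constructs.

```python
from typing import Any

# Assumed mapping from SQL comparison operators to Elasticsearch range keys.
_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def to_elasticsearch_filter(column: str, op: str, value: Any) -> dict:
    """Translate one SQL-style predicate into a single Query DSL clause."""
    if op == "=":
        return {"term": {column: value}}
    if op in _RANGE_OPS:
        return {"range": {column: {_RANGE_OPS[op]: value}}}
    # This toy translator gives up on anything else, mirroring the point
    # that not every SQL operation has a native equivalent.
    raise NotImplementedError(f"operator {op!r} is not translated in this sketch")


def translate_where(predicates: list[tuple[str, str, Any]]) -> dict:
    """Combine AND-ed predicates into a bool/filter query body."""
    clauses = [to_elasticsearch_filter(c, o, v) for c, o, v in predicates]
    return {"query": {"bool": {"filter": clauses}}}


# Roughly: WHERE status = 'active' AND age >= 30
query_body = translate_where([("status", "=", "active"), ("age", ">=", 30)])
```

An operator outside the small supported set, such as `LIKE`, raises `NotImplementedError` in this sketch. A real adapter would have to handle such cases in some other way.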
Currently, Colrows supports the following datasources:
- MySQL
- Oracle
- Microsoft SQL Server
- PostgreSQL
- IBM DB2
- Vertica
- ClickHouse
- Timescale
- MariaDB
- Orient
- Elasticsearch
- Snowflake
- Apache Ignite
- Delimited Files
We continually add new connectors to expand our supported datasource list.
Datasource Onboarding
Setting up a user datasource in Colrows through its UI is straightforward and intuitive. During the datasource configuration process, users can define access restrictions, set up SSL, manage connections, and configure SQL parser settings.
Colrows supports the use of different versions of the same database drivers across instances of the same database product. To achieve this, it's essential to configure the correct driver version and driver class during the datasource setup.
The process of onboarding a datasource involves two main steps: configuring the datasource specifics and transferring the necessary driver classes (JAR files along with their dependent JARs) to Colrows servers.
Details outlining the setup process for the datasource drivers are provided below.
JDBC Datasource
The complete list of JDBC datasource configuration attributes is documented below.
| Attribute | Mandatory | Description |
|---|---|---|
| Name | Yes | A short string that identifies this datasource. |
| Product | Yes | The datasource product, chosen from a predefined list of supported datasources. The other configuration options differ depending on the selected product. If a particular datasource is not in the list, Colrows does not currently have a connector for it. You can submit a feature request, and we will be happy to evaluate it. |
| Restricted Access | Yes | A flag indicating whether access to this datasource is restricted. If set to true, access is blocked for all users unless explicitly permitted through Data Access Policies. |
| Connection String | Yes | The connection string, which includes details such as the scheme, host, and port. For example, a MySQL database uses a connection string of the form `jdbc:mysql://[host]:[port]/[database]`. Including the database is optional; if it is omitted, the complete database layout is pulled. |
| Driver Version | No | Colrows supports multiple drivers for the same datasource product, enabling connections to different versions of the product. If the driver version is not specified, the system uses the latest available driver. For more details, refer to the JDBC driver section. |
| Driver Class | No | The driver class name, which is part of the JDBC driver configuration. |
| Principal | Yes | The username used to connect to the datasource. |
| Secret | Yes | The password for the given Principal. |
| Idle Time | No | The idle time for connections, in milliseconds. A connection that remains inactive for this duration is destroyed. |
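As an illustration of the attributes above, the following sketch assembles a MySQL connection string in the documented `jdbc:mysql://[host]:[port]/[database]` form and checks that a configuration contains every mandatory attribute. The dictionary layout, the helper names `mysql_jdbc_url` and `missing_mandatory`, and the sample host and credentials are assumptions made for this example. Colrows itself is configured through its UI, and its internal representation is not described here.

```python
from typing import Optional

# Attributes marked "Yes" in the Mandatory column of the table above.
MANDATORY = (
    "Name",
    "Product",
    "Restricted Access",
    "Connection String",
    "Principal",
    "Secret",
)


def mysql_jdbc_url(host: str, port: int, database: Optional[str] = None) -> str:
    """Build jdbc:mysql://host:port/[database]; the database part is optional."""
    return f"jdbc:mysql://{host}:{port}/{database or ''}"


def missing_mandatory(config: dict) -> list:
    """Return the mandatory attributes that are absent or empty."""
    return [k for k in MANDATORY if k not in config or config[k] in ("", None)]


# Hypothetical values, for illustration only.
config = {
    "Name": "orders-db",
    "Product": "MySQL",
    "Restricted Access": True,
    "Connection String": mysql_jdbc_url("db.example.internal", 3306, "orders"),
    "Principal": "report_user",
    "Secret": "change-me",
    "Idle Time": 60000,  # milliseconds
}
```

Calling `missing_mandatory(config)` on the sample above returns an empty list. Leaving out `database` yields a URL that ends in a trailing slash, which corresponds to the case where the complete database layout is pulled.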
File Datasources
A file datasource is essentially a directory located on a local system or on a network drive mounted on Colrows servers. It offers a range of filters that let you specify which files are included in the datasource. Each file that meets the filter criteria becomes a table within this datasource.
Below is a list of attributes for configuring a file datasource.
| Attribute | Mandatory | Description |
|---|---|---|
| Name | Yes | A short string that identifies this datasource. |
| Product | Yes | The datasource product, chosen from a predefined list of supported datasources. To configure a file datasource, choose 'FILE' from the list. |
| Restricted Access | Yes | A flag indicating whether access to this datasource is restricted. If set to true, access is blocked for all users unless explicitly permitted through Data Access Policies. |
| Base Directory | Yes | The base directory (either local or network-based) that serves as the root for this datasource. The directory must be accessible to Colrows nodes. |
| File Type | Yes | The content type of the files, chosen from a list of supported types. Colrows currently supports DELIMITED and JSON files, with support for PARQUET files coming soon. |
| File Type Properties | Yes | A JSON configuration describing properties of the files, which depend on the content type selected in the 'File Type' attribute. For example, for delimited files it specifies whether the first line of each file is a header; if no header exists, one can be supplied. |
| Recursive | Yes | A flag indicating whether the contents of directories within the base directory are included in the scope of this datasource. If set to false, only the files in the base directory itself are in scope, and files within subdirectories are excluded. |
| File Name Patterns | No | A comma-separated list of regular expressions to match against file names. When specified, only files whose names match any of the regular expressions are part of the datasource. |
| Extensions | No | Another file filter. When specified, only files with the given extensions are in the scope of this datasource. |
| Create Time Range | No | A time range filter based on the creation time of the files. It limits the number of files in the datasource's scope. |
| Last Modified Time Range | No | A time range filter based on the last modified time of the files. |
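The way the Recursive, File Name Patterns, and Extensions attributes could combine to decide which files become tables can be sketched as follows. This is only an approximation under stated assumptions: the function `select_files` is hypothetical, patterns are assumed to match the whole file name, and splitting the pattern list on commas would not work for expressions that themselves contain commas. How Colrows actually evaluates these filters may differ.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Optional


def select_files(
    base_dir: str,
    recursive: bool,
    name_patterns: Optional[str] = None,
    extensions: Optional[Iterable[str]] = None,
) -> Iterator[Path]:
    """Yield the files under base_dir that pass every configured filter."""
    base = Path(base_dir)
    # Recursive=True descends into subdirectories; False stays at the top level.
    candidates = base.rglob("*") if recursive else base.glob("*")
    regexes = (
        [re.compile(p.strip()) for p in name_patterns.split(",")]
        if name_patterns
        else []
    )
    exts = {e.lower().lstrip(".") for e in extensions} if extensions else set()

    for path in sorted(candidates):
        if not path.is_file():
            continue
        # Assumption: a file passes if its full name matches any one pattern.
        if regexes and not any(r.fullmatch(path.name) for r in regexes):
            continue
        if exts and path.suffix.lower().lstrip(".") not in exts:
            continue
        yield path
```

For example, `select_files("/data/sales", False, r"sales_\d+\.csv", ["csv"])` would consider only files directly inside the hypothetical directory `/data/sales` whose names look like `sales_2023.csv`. The time range filters are omitted here to keep the sketch short.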
S3 Datasources
S3 is a widely used object storage protocol with implementations from various vendors, including GCP, Azure, AWS, MinIO, Pure S3, Alibaba S3, Wasabi S3, and others. Colrows supports nearly all S3 implementations.
Below is a list of attributes for configuring an S3 datasource in Colrows.
| Attribute | Mandatory | Description |
|---|---|---|
| Name | Yes | A short string that identifies this datasource. |
| Product | Yes | The datasource product, chosen from a predefined list of supported datasources. To configure a file datasource, choose 'FILE' from the list. |
| Restricted Access | Yes | A flag indicating whether access to this datasource is restricted. If set to true, access is blocked for all users unless explicitly permitted through Data Access Policies. |
| Base Directory | Yes | The base directory (either local or network-based) that serves as the root for this datasource. The directory must be accessible to Colrows nodes. |
| File Type | Yes | The content type of the files, chosen from a list of supported types. Colrows currently supports DELIMITED and JSON files, with support for PARQUET files coming soon. |
| File Type Properties | Yes | A JSON configuration describing properties of the files, which depend on the content type selected in the 'File Type' attribute. For example, for delimited files it specifies whether the first line of each file is a header; if no header exists, one can be supplied. |
| Recursive | Yes | A flag indicating whether the contents of directories within the base directory are included in the scope of this datasource. If set to false, only the files in the base directory itself are in scope, and files within subdirectories are excluded. |
| File Name Patterns | No | A comma-separated list of regular expressions to match against file names. When specified, only files whose names match any of the regular expressions are part of the datasource. |
| Extensions | No | Another file filter. When specified, only files with the given extensions are in the scope of this datasource. |
| Create Time Range | No | A time range filter based on the creation time of the files. It limits the number of files in the datasource's scope. |
| Last Modified Time Range | No | A time range filter based on the last modified time of the files. |
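The File Type Properties attribute is a JSON document whose exact schema is not given here. As a rough sketch of the header behaviour described for delimited files, the following example reads delimited text using either the first line or a supplied header. The property names `hasHeader`, `header`, and `delimiter` are invented for this illustration and should not be taken as the keys Colrows expects.

```python
import csv
import io
import json


def read_delimited(text: str, properties_json: str) -> list:
    """Parse delimited text into row dictionaries using JSON-encoded properties."""
    props = json.loads(properties_json)
    delimiter = props.get("delimiter", ",")  # hypothetical property name
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if props.get("hasHeader", True):  # hypothetical property name
        # The first line of the file supplies the column names.
        header, data = rows[0], rows[1:]
    else:
        # No header in the file, so one is supplied in the properties.
        header, data = props["header"], rows
    return [dict(zip(header, row)) for row in data]
```

With `'{"hasHeader": false, "header": ["id", "name"], "delimiter": "|"}'` as the properties, a file containing `1|Ada` would be read as a single row with columns `id` and `name`.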