Data Source Configuration

This page covers the configuration options for the various types of Data Sources.

Page Contents

Overview
Configuration Options

Overview

The available configuration options for a Data Source depend on the type of the Data Source. Supported Data Source types include JDBC, MongoDB, Elasticsearch, Cassandra, and SPARQL.

Configuration options are stored in a Java properties file.

A typical properties file for a JDBC-type Data Source (in this case, MySQL) looks like:

jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=MyPassword
jdbc.driver=com.mysql.jdbc.Driver

Configuration Options

Options Common to All Data Source Types

The following option applies to Data Sources of all types. The options apply to all Virtual Graphs for a given Data Source.

`unique.key.sets`

Default
Required	No
Description	For data sources that do not express unique constraints in their metadata, either because unique constraints are not supported or because the data source did not include some or all of the valid constraints for reasons such as performance concerns, this property is used to define additional constraints manually. The property value is a comma-separated list of keys that define unique rows in a table. Each key is itself a comma-separated list of schema-qualified columns, enclosed in parentheses. For example, if table `APP.CUSTOMERS` has an `ID` column that serves as a primary key and a pair of columns, `FNAME` and `LNAME`, that together are a unique key, the value to express that is: `(APP.CUSTOMERS.ID),(APP.CUSTOMERS.FNAME,APP.CUSTOMERS.LNAME)`

`batch.size`

Default	`100`
Required	Yes
Description	The `batch.size` option determines how many solutions will be evaluated at a Virtual Graph per request of the `ServiceJoin` operator. A larger batch size can reduce the number of requests to a data source but can make the evaluation of each request more expensive. By default, the batch size is set to `100`. The value can be adjusted in the range `[0, 10000]`. A value of `0` will disable batch requests to the data source.

JDBC Options

The following properties are used for all relational data sources.

`jdbc.url`

Default
Required	Yes
Description	The URL of the JDBC connection.

`jdbc.username`

Default
Required	No
Description	The username used to make the JDBC connection.

`jdbc.password`

Default
Required	No
Description	The password used to make the JDBC connection.

`jdbc.driver`

Default
Required	No
Description	The driver class name used to make the JDBC connection.

`sql.dialect`

Default	Inferred from supported JDBC drivers. `ORACLE` for unsupported drivers.
Required	No
Description	When using an unsupported JDBC driver, this option can be used to specify the format of the generated SQL. The options supported are `ATHENA`, `BIGQUERY`, `DB2`, `DERBY`, `EXASOL`, `H2`, `HANA`, `HIVE`, `IMPALA`, `JIRA`, `MSSQL`, `MYSQL`, `ORACLE`, `POSTGRESQL`, `REDSHIFT`, `REST`, `SALESFORCE`, `SPARKSQL`, `SPLUNK`, `SYBASE`, `TERADATA`

`sql.default.catalog`

Default	(taken from JDBC driver)
Required	No
Description	Overrides the default catalog for the connected user. Schemas in the default catalog may be referenced without qualification (`myschema` rather than `mycatalog.myschema`). For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas

`sql.catalogs`

Default
Required	No
Description	A comma-separated list of catalogs. This list plus the default catalog define the set of catalogs represented by a `.` wildcard for the `sql.schemas` option. An asterisk (`*`) can be used as a wildcard for this, the `sql.catalogs` option, meaning all catalogs accessible to the JDBC connection. For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas

`sql.default.schema`

Default	(taken from JDBC driver)
Required	No
Description	Overrides the default schema for the connected user. Tables in the default schema may be referenced without qualification (`mytable` or `"mytable"` rather than `myschema.mytable`, `"myschema"."mytable"`, or, for Databricks, `mycatalog.myschema.mytable`). See also: `sql.default.catalog` and Databases and Schemas

`sql.schemas`

Default
Required	No
Description	A comma-separated list of schemas to append to the schema search path. This option allows SQL queries from the virtual graph mappings to reference tables that are outside of the default schema for the connected user. An asterisk can be used as a wildcard. A lone asterisk (``) means all schemas in the default catalog, `mycatalog.` means all schemas in the `mycatalog` catalog, and `.` means all schemas in all the accessible catalogs. See also: `sql.default.catalog` and Databases and Schemas

`sql.skip.validation`

Default	`false`
Required	No
Description	Set this option to `true` to bypass the validation of catalog, schema, table, and column references against metadata from the backing database. These checks are made in order to catch configuration problems early. Should not be necessary with any supported database and driver but can be used with a JDBC driver or database that does not support returning schema metadata.

Tomcat Connection Pool Options

Additionally, connection pool properties for the built-in Tomcat connection pool are allowed. This set of additional allowed properties is listed in the Tomcat JDBC Connection Pool documentation. Stardog sets these connection pool defaults:

initialSize=3
testWhileIdle=true

# 4 hours
timeBetweenEvictionRunsMillis=14400000

validationQueryTimeout=10

To disable connection pooling, one can set the following connection pool properties:

testOnBorrow=true
timeBetweenEvictionRunsMillis=0
maxIdle=0
minIdle=0

This may be useful for troubleshooting some configurations.

Passing Options Directly to the JDBC Driver

Any options with the prefix ext. will be passed directly to the JDBC Driver. Tomcat connection pool properties do not need to be prefixed with ext.

Any unknown options will be ignored.

MongoDB Options

`mongodb.uri`

Default
Required	Yes
Description	The URI for the MongoDB connection. Examples: `mongodb://localhost/mydb` or `mongodb+srv://myUserName:myP4ssw0rd@cluster0-kgprod.company.com/mydb`

Elasticsearch Options

`elasticsearch.rest.urls`

Default
Required	Yes
Description	Whitespace-delimited list of connection host/port values for Elasticsearch. Example: `server1:9200 server2:9200 server3:9200`

`elasticsearch.indexes`

Default	All accessible indexes available at `elasticsearch.rest.urls`
Required	No
Description	A comma-delimited list of indexes to make available from this data source.

`elasticsearch.username`

Default
Required	No
Description	Username for Elasticsearch connections.

`elasticsearch.password`

Default
Required	No
Description	Password for Elasticsearch connections.

Cassandra Options

`cassandra.contact.point`

Default
Required	Yes
Description	The address of the Cassandra node(s) that the driver uses to discover the cluster topology. Example: `cassandra.abc.com`

`cassandra.port`

Default	9042
Required	No
Description	The port to use to connect to the Cassandra host.

`cassandra.keyspace`

Default
Required	Yes
Description	The Cassandra keyspace to use for this session.

`cassandra.username`

Default
Required	No
Description	The username for the Cassandra cluster.

`cassandra.password`

Default
Required	No
Description	The password for the Cassandra cluster.

`cassandra.allow.filtering`

Default	`false`
Required	No
Description	Whether to include the `ALLOW FILTERING` clause at the end of Cassandra CQL queries. Not recommended for production use.

SPARQL Engine/Service Options

`sparql.url`

Default
Required	Yes
Description	SPARQL query endpoint/connection string with database specified, i.e. `http://myhost:26023/testdb/query`

`sparql.username`

Default
Required	No
Description	The username to access the SPARQL endpoint.

`sparql.password`

Default
Required	No
Description	The password to access the SPARQL endpoint.

`sparql.graphname`

Default
Required	Yes
Description	The graph name on the SPARQL endpoint to be mapped as virtual graph.

`sparql.statsbasedoptimization`

Default	`true`
Required	No
Description	Whether to enable statistics-based optimization while accessing the SPARQL endpoint.

Overview
Configuration Options