Link Search Menu Expand Document
Start for Free

Data Source Configuration

This page covers the configuration options for the various types of Data Sources.

Page Contents
  1. Overview
  2. Configuration Options
    1. Options Common to All Data Source Types
    2. JDBC Options
      1. Tomcat Connection Pool Options
      2. Passing Options Directly to the JDBC Driver
    3. MongoDB Options
    4. Elasticsearch Options
    5. Cassandra Options
    6. SPARQL Engine/Service Options

Overview

The available configuration options for a Data Source depend on the type of the Data Source. Supported Data Source types include JDBC, MongoDB, Elasticsearch, Cassandra, and SPARQL.

Configuration options are stored in a Java properties file.

A typical properties file for a JDBC-type Data Source (in this case, MySQL) looks like:

jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=MyPassword
jdbc.driver=com.mysql.jdbc.Driver

Configuration Options

Options Common to All Data Source Types

The following option applies to Data Sources of all types. The options apply to all Virtual Graphs for a given Data Source.

unique.key.sets

Default  
Required No
Description For data sources that do not express unique constraints in their metadata, either because unique constraints are not supported or because the data source did not include some or all of the valid constraints for reasons such as performance concerns, this property is used to define additional constraints manually. The property value is a comma-separated list of keys that define unique rows in a table. Each key is itself a comma-separated list of schema-qualified columns, enclosed in parentheses. For example, if table APP.CUSTOMERS has an ID column that serves as a primary key and a pair of columns, FNAME and LNAME, that together are a unique key, the value to express that is: (APP.CUSTOMERS.ID),(APP.CUSTOMERS.FNAME,APP.CUSTOMERS.LNAME)

batch.size

Default 100
Required Yes
Description The batch.size option determines how many solutions will be evaluated at a Virtual Graph per request of the ServiceJoin operator. A larger batch size can reduce the number of requests to a data source but can make the evaluation of each request more expensive. By default, the batch size is set to 100. The value can be adjusted in the range [0, 10000]. A value of 0 will disable batch requests to the data source.

JDBC Options

The following properties are used for all relational data sources.

jdbc.url

Default  
Required Yes
Description The URL of the JDBC connection.

jdbc.username

Default  
Required No
Description The username used to make the JDBC connection.

jdbc.password

Default  
Required No
Description The password used to make the JDBC connection.

jdbc.driver

Default  
Required No
Description The driver class name used to make the JDBC connection.

sql.dialect

Default Inferred from supported JDBC drivers. ORACLE for unsupported drivers.
Required No
Description When using an unsupported JDBC driver, this option can be used to specify the format of the generated SQL. The options supported are ATHENA, BIGQUERY, DB2, DERBY, EXASOL, H2, HANA, HIVE, IMPALA, JIRA, MSSQL, MYSQL, ORACLE, POSTGRESQL, REDSHIFT, REST, SALESFORCE, SPARKSQL, SPLUNK, SYBASE, TERADATA

sql.default.catalog

Default (taken from JDBC driver)
Required No
Description Overrides the default catalog for the connected user. Schemas in the default catalog may be referenced without qualification (myschema rather than mycatalog.myschema). For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas

sql.catalogs

Default  
Required No
Description A comma-separated list of catalogs. This list plus the default catalog define the set of catalogs represented by a *.* wildcard for the sql.schemas option. An asterisk (*) can be used as a wildcard for this, the sql.catalogs option, meaning all catalogs accessible to the JDBC connection. For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas

sql.default.schema

Default (taken from JDBC driver)
Required No
Description Overrides the default schema for the connected user. Tables in the default schema may be referenced without qualification (mytable or "mytable" rather than myschema.mytable, "myschema"."mytable", or, for Databricks, mycatalog.myschema.mytable). See also: sql.default.catalog and Databases and Schemas

sql.schemas

Default  
Required No
Description A comma-separated list of schemas to append to the schema search path. This option allows SQL queries from the virtual graph mappings to reference tables that are outside of the default schema for the connected user. An asterisk can be used as a wildcard. A lone asterisk (*) means all schemas in the default catalog, mycatalog.* means all schemas in the mycatalog catalog, and *.* means all schemas in all the accessible catalogs. See also: sql.default.catalog and Databases and Schemas

sql.skip.validation

Default false
Required No
Description Set this option to true to bypass the validation of catalog, schema, table, and column references against metadata from the backing database. These checks are made in order to catch configuration problems early. Should not be necessary with any supported database and driver but can be used with a JDBC driver or database that does not support returning schema metadata.

Tomcat Connection Pool Options

Additionally, connection pool properties for the built-in Tomcat connection pool are allowed. This set of additional allowed properties is listed in the Tomcat JDBC Connection Pool documentation. Stardog sets these connection pool defaults:

initialSize=3
testWhileIdle=true

# 4 hours
timeBetweenEvictionRunsMillis=14400000

validationQueryTimeout=10

To disable connection pooling, one can set the following connection pool properties:

testOnBorrow=true
timeBetweenEvictionRunsMillis=0
maxIdle=0
minIdle=0

This may be useful for troubleshooting some configurations.

Passing Options Directly to the JDBC Driver

Any options with the prefix ext. will be passed directly to the JDBC Driver. Tomcat connection pool properties do not need to be prefixed with ext.

Any unknown options will be ignored.


MongoDB Options

mongodb.uri

Default  
Required Yes
Description The URI for the MongoDB connection. Examples: mongodb://localhost/mydb or mongodb+srv://myUserName:myP4ssw0rd@cluster0-kgprod.company.com/mydb

Elasticsearch Options

elasticsearch.rest.urls

Default  
Required Yes
Description Whitespace-delimited list of connection host/port values for Elasticsearch. Example: server1:9200 server2:9200 server3:9200

elasticsearch.indexes

Default All accessible indexes available at elasticsearch.rest.urls
Required No
Description A comma-delimited list of indexes to make available from this data source.

elasticsearch.username

Default  
Required No
Description Username for Elasticsearch connections.

elasticsearch.password

Default  
Required No
Description Password for Elasticsearch connections.

Cassandra Options

cassandra.contact.point

Default  
Required Yes
Description The address of the Cassandra node(s) that the driver uses to discover the cluster topology. Example: cassandra.abc.com

cassandra.port

Default 9042
Required No
Description The port to use to connect to the Cassandra host.

cassandra.keyspace

Default  
Required Yes
Description The Cassandra keyspace to use for this session.

cassandra.username

Default  
Required No
Description The username for the Cassandra cluster.

cassandra.password

Default  
Required No
Description The password for the Cassandra cluster.

cassandra.allow.filtering

Default false
Required No
Description Whether to include the ALLOW FILTERING clause at the end of Cassandra CQL queries. Not recommended for production use.

SPARQL Engine/Service Options

sparql.url

Default  
Required Yes
Description SPARQL query endpoint/connection string with database specified, i.e. http://myhost:26023/testdb/query

sparql.username

Default  
Required No
Description The username to access the SPARQL endpoint.

sparql.password

Default  
Required No
Description The password to access the SPARQL endpoint.

sparql.graphname

Default  
Required Yes
Description The graph name on the SPARQL endpoint to be mapped as virtual graph.

sparql.statsbasedoptimization

Default true
Required No
Description Whether to enable statistics-based optimization while accessing the SPARQL endpoint.