Virtual Sources

This page discusses the supported data sources for Virtual Graphs and some considerations to take into account when using them.

Page Contents

Supported Data Sources
Supported Client Drivers
Specific Data Source Considerations

Supported Data Sources

Stardog currently supports all the data sources below. Please inquire if you need support for another.

Sources with a * are not included in any Stardog trial offering and are only available upon request.

Relational Databases

Apache Hive
Apache/Cloudera Impala
AWS Athena (additional info)
AWS Aurora
AWS Redshift
Derby
Exasol
Google BigQuery (additional info)
H2
IBM DB2
MariaDB
Microsoft SQL Server
MySQL
Odata
* This source is not included in any Stardog trial offering and is only available upon request
Oracle
PostgreSQL
SAP Business One DI
* This source is not included in any Stardog trial offering and is only available upon request
SAP HANA
Sybase ASE
Teradata

NoSQL Databases

Apache Cassandra (additional info)
Cosmos DB
DataStax
Elasticsearch (additional info)
MongoDB (additional info)

SPARQL Engine/Service

Stardog additional info

Cloud Service

Active Directory
* This source is not included in any Stardog trial offering and is only available upon request
AWS Management
* This source is not included in any Stardog trial offering and is only available upon request
Azure Management
* This source is not included in any Stardog trial offering and is only available upon request
Facebook
* This source is not included in any Stardog trial offering and is only available upon request
Hubspot
* This source is not included in any Stardog trial offering and is only available upon request
Instagram
* This source is not included in any Stardog trial offering and is only available upon request
Jira
* This source is not included in any Stardog trial offering and is only available upon request
LDAP
* This source is not included in any Stardog trial offering and is only available upon request
Linkedin
* This source is not included in any Stardog trial offering and is only available upon request
Marketo
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft Teams
* This source is not included in any Stardog trial offering and is only available upon request
Oracle Eloqua
* This source is not included in any Stardog trial offering and is only available upon request
Oracle SalesCloud
* This source is not included in any Stardog trial offering and is only available upon request
Salesforce Chatter
* This source is not included in any Stardog trial offering and is only available upon request
Salesforce Marketing
* This source is not included in any Stardog trial offering and is only available upon request
Salesforce Pardot
* This source is not included in any Stardog trial offering and is only available upon request
SAP
* This source is not included in any Stardog trial offering and is only available upon request
SAP SuccessFactors
* This source is not included in any Stardog trial offering and is only available upon request
ServiceNow
* This source is not included in any Stardog trial offering and is only available upon request
Slack
* This source is not included in any Stardog trial offering and is only available upon request
Splunk
* This source is not included in any Stardog trial offering and is only available upon request
Slack
* This source is not included in any Stardog trial offering and is only available upon request
Twilio
* This source is not included in any Stardog trial offering and is only available upon request
Veeva
* This source is not included in any Stardog trial offering and is only available upon request
Zendesk
* This source is not included in any Stardog trial offering and is only available upon request

Files/Unstructured Data

Box
* This source is not included in any Stardog trial offering and is only available upon request
CSV (additional info)
Dropbox
* This source is not included in any Stardog trial offering and is only available upon request
Excel
* This source is not included in any Stardog trial offering and is only available upon request
Excel Online
* This source is not included in any Stardog trial offering and is only available upon request
Excel Services
* This source is not included in any Stardog trial offering and is only available upon request
Gmail
* This source is not included in any Stardog trial offering and is only available upon request
Google Calendar
* This source is not included in any Stardog trial offering and is only available upon request
Google Contacts
* This source is not included in any Stardog trial offering and is only available upon request
Google Drive
* This source is not included in any Stardog trial offering and is only available upon request
Google Sheets
* This source is not included in any Stardog trial offering and is only available upon request
JSON (additional info)
Microsoft CDS
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft Exchange
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft OneDrive
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft OneNote
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft Planner
* This source is not included in any Stardog trial offering and is only available upon request
Microsoft Project
* This source is not included in any Stardog trial offering and is only available upon request
Office365
* This source is not included in any Stardog trial offering and is only available upon request
Parquet
* This source is not included in any Stardog trial offering and is only available upon request
REST
* This source is not included in any Stardog trial offering and is only available upon request
Sharepoint
* This source is not included in any Stardog trial offering and is only available upon request

CRM

Dynamics 365 Sales
* This source is not included in any Stardog trial offering and is only available upon request
Dynamics CRM
* This source is not included in any Stardog trial offering and is only available upon request
Netsuite
* This source is not included in any Stardog trial offering and is only available upon request
Odoo
* This source is not included in any Stardog trial offering and is only available upon request
Salesforce (via Simba Salesforce.com JDBC Connector)
Salesforce Einstein
* This source is not included in any Stardog trial offering and is only available upon request
SAP ByDesign
* This source is not included in any Stardog trial offering and is only available upon request
SAP Netweaver Gateway
* This source is not included in any Stardog trial offering and is only available upon request
Sugar CRM
* This source is not included in any Stardog trial offering and is only available upon request
Veeva CRM
* This source is not included in any Stardog trial offering and is only available upon request

Data Analytics

Adobe Analytics
* This source is not included in any Stardog trial offering and is only available upon request

Supported Client Drivers

To connect to your data sources, Stardog requires you to supply Stardog with the appropriate Client Driver. You need to manually copy the JAR file containing the driver to the <stardog_installation_directory>/server/dbms/ directory (or to the location pointed to by the STARDOG_EXT environment variable if you have set one) so that it will be available to the Stardog server.

Stardog is tested against all supported databases using the following drivers. Stardog requires a JDBC 4.2 compatible client driver. While other drivers may work, your mileage may vary. For best results, set the sql.dialect virtual graph option when using unsupported drivers.

Database	Driver
Apache Hive	jdbc/hive-jdbc-3.1.2-standalone.jar
Apache/Cloudera Impala	https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-42.html
AWS Athena	AthenaJDBC42_2.0.7.jar
AWS Aurora	https://dev.mysql.com/downloads/connector/j/5.1.html
AWS Redshift	RedshiftJDBC42-no-awssdk-1.2.20.1043.jar
Cassandra	cassandra-driver-core-3.6.0-shaded.jar
Derby	https://db.apache.org/derby/releases/release-10.14.2.0.cgi
Elasticsearch	Three jars are required: - elasticsearch-rest-client-6.7.1.jar - httpasyncclient-4.1.2.jar - httpcore-nio-4.4.5.jar
Google BigQuery	The following jars from here - GoogleBigQueryJDBC42.jar - google-api-client-1.28.0.jar - google-auth-library-oauth2-http-0.13.0.jar - gax-1.42.0.jar - google-api-services-bigquery-v2-rev426-1.25.0.jar - avro-1.9.0.jar
H2	h2-1.4.196.jar
IBM DB2	10.x
Microsoft SQL Server	mssql-jdbc-6.2.1.jre8.jar
MongoDB	mongo-java-driver-3.8.2.jar
MySQL & MariaDB	https://dev.mysql.com/downloads/connector/j/5.1.html
Oracle	https://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html
PostgreSQL	postgresql-42.2.5.jar
SAP HANA	ngdbc-2.3.56.jar
Sybase ASE	https://www.sap.com/developer/trials-downloads/additional-downloads/sdk-for-adaptive-server-enterprise-16-0-sp03-pl04-15001.html
Teradata	version 16.20

Specific Data Source Considerations

AWS Athena Virtual Graph Considerations

To connect to an Athena database, first download the driver for JDBC version 4.2 here. Then follow the instructions here.

The Athena JDBC driver does not expose the list of accessible databases. To save having to qualify every table name with the database name in your mappings, provide the default database in your connection URL using the Schema parameter. For example:

jdbc.url=jdbc:awsathena://athena.us-west.amazonaws.com:443;S3OutputLocation=s3://mybucket/output;Schema=mydb

Athena does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets virtual graph option to define unique columns manually.

Google BigQuery Virtual Graph Considerations

To connect to a BigQuery database, first download the driver for JDBC version 4.2 here. Then follow the instructions here. You’ll need the following jar files:

GoogleBigQueryJDBC42.jar
google-api-client-1.28.0.jar
google-auth-library-oauth2-http-0.13.0.jar
gax-1.42.0.jar
google-api-services-bigquery-v2-rev426-1.25.0.jar
avro-1.9.0.jar

The connection string requires a number of parameters. For example:

jdbc.url=jdbc:bigquery://https://bigquery.googleapis.com/bigquery/v2;ProjectId=myproject;DefaultDataset=mydataset;OAuthType=0;OAuthServiceAcctEmail=vgtestaccount@myacct.iam.gserviceaccount.com;OAuthPvtKeyPath=/path/creds/bq_pk.json;Timeout=60
#jdbc.username=
#jdbc.password=

Note the jdbc.username and jdbc.password properties are not set. An example /path/creds/bq_pk.json file looks like this:

{
  "type": "service_account",
  "project_id": "myproject",
  "private_key_id": "363287712aaedb3cd20ebdc11213d43a15760768",
  "private_key": "-----BEGIN PRIVATE KEY-----\nBRIngMEMybROWnpANTsb4THenEXTbaTTLe/or\Ill8mYlUNC5+Din2tH...3er45/2ed\53fg\g+d##\n-----END PRIVATE KEY-----\n",
  "client_email": "myaccount@myid.iam.gserviceaccount.com",
  "client_id": "12345678987654321239",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/myaccount%40myid.iam.gserviceaccount.com"
}

See this for details. Note that alternative means of providing credentials may require additional jar files.

BigQuery does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets virtual graph option to define unique columns manually.

MongoDB Virtual Graph Considerations

To connect to a MongoDB database, first download the client jar. The client jar for MongoDB version “x.y.z” can be obtained from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/x.y.z/mongo-java-driver-x.y.z.jar Then follow the instructions here.

MongoDB has one Date type. It is stored as a 64-bit integer that represents the number of milliseconds since Jan 1, 1970, Universal Time Coordinated (UTC). This Date type can be mapped to xsd:date, xsd:dateTime or xsd:dateTimeStamp data types. (The xsd:dateTimeStamp data type is the same as xsd:dateTime except instead of having an optional timezone the timezone is required.) When a Date is mapped to either xsd:date or xsd:dateTimeStamp, it will be represented in the UTC timezone. When a Date field is mapped to an xsd:dateTime, the Date will be converted to the local timezone of the Stardog server and the label will include the timezone.

Setting Unique Keys Manually

The unique.key.sets option can be used with MongoDB, but the format is MongoDB-specific. In place of schema and table names, the unique keys must be specified in terms of the collection and a list of nested arrays.

For example, take an accounts virtual graph with this SMS2 mapping:

prefix : <http://example.com/>

MAPPING <urn:accounts>
FROM JSON {
  "accounts" : {
    "_id" : "?id",
    "acct" : "?acctNum",
    "customerName" : [ "?name" ],
    "card" : [ {
      "number" : "?ccNumber",
      "expiration" : "?ccExpr" }
    ]
  }
}
TO {
  ?holder :hasAcct ?acct .
  ?holder :hasName ?name .
}
WHERE {
  BIND (template("http://example.com/acct/{acctNum}") AS ?acct)
  BIND (template("http://example.com/holder/{ccNumber}_{name}") AS ?holder)
}

And this query:

SELECT * {
  graph <virtual://accounts> {
    ?holder :hasAcct ?acct .
    ?holder :hasName ?name .
  }
}

When Stardog translates this query, it creates a flattened view of the collection (using the $unwind stage) giving a relational view. In this example both the customerName and card arrays will be flattened because both are referenced in the template for the ?holder variable.

The plan for the example query will include a join because Stardog has no way of knowing that the card.number/customerName pair is unique. If we know that this pair of fields is indeed unique in this collection, we can make the query more efficient by adding the pair as a unique key to the unique.key.sets property:

unique.key.sets=(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

It is required that the flattened arrays are listed in alphabetical order, are separated by semicolons, and are enclosed in square brackets. For nested arrays, use periods to delimit the names (level1.level2).

Multiple key sets can be separated with commas. For example, if we also know that the acct field is unique, the property value becomes:

unique.key.sets=(accounts.[].acct),(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

Elasticsearch Virtual Graph Considerations

To create an Elasticsearch virtual graph, you need to download the Elasticsearch client jar along with two supporting jars. The client jar for Elasticsearch version “x.y.z” can be obtained from https://repo1.maven.org/maven2/org/elasticsearch/client/elasticsearch-rest-client/x.y.z/elasticsearch-rest-client-x.y.z.jar

Two supporting jars are also required:

Supported Field Types

Stardog supports the following Elasticsearch field types:

keyword
long
integer
short
byte
double
float
half_float
scaled_float
date

Note that only the keyword data type is supported for strings. Strings indexed as text cannot be mapped for the purpose of SPARQL/GraphQL querying.

Virtual Graph Mappings for Elasticsearch

To create virtual graph mappings for Elasticsearch, use SMS2 mapping syntax with the FROM JSON clause.

There are two types of mappings being discussed here: Stardog Virtual Graph mappings that describe how Elasticsearch fields are mapped to RDF and Elasticsearch mappings that define a schema for an Elasticsearch index.

For an index named simple with a single Elasticsearch mapping type named _doc and an Elasticsearch mapping like:

{
  "simple" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "StudentId" : {
            "type" : "integer"
          },
          "StudentName" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

an example Stardog mapping could look like this:

prefix ex: <http://example.com/>

MAPPING <urn:example>
FROM JSON {
  "simple._doc":{
    "_id": "?docId",
    "StudentId": "?id",
    "StudentName.keyword": "?name"
  }
}
TO {
  ?studentIri a ex:Student ;
    rdfs:label ?name ;
    ex:esId ?docId .
}
WHERE {
  bind(template("http://example.com/{id}") as ?studentIri)
}

The top-level key in the FROM JSON (simple._doc) is formed by joining the index name and the Elasticsearch mapping type with a period. This is similar to the schemaName.tableName convention that is used for SQL databases. As a shorthand, for indexes with only one mapping type, the mapping type can be omitted. In this example, simple._doc can be replaced with simple assuming _doc is the only mapping type.

For Elasticsearch versions 6 and later, indexes are allowed only one mapping type, where the name of the mapping type defaults to _doc. For version 5 it is possible for an index to have more than one mapping type. See Removal of mapping types in the Elasticsearch online documentation for details.

Notice in the above example that the built-in _id field is mapped. Stardog knows that the _id field is unique across all documents in the index and it uses this information to simplify the queries it generates. Stardog is not able to determine the uniqueness of any other fields but if you know certain fields (or combinations of fields) are unique you can indicate which field sets are unique in the configuration options.

For example, suppose we know that StudentId is in fact a unique field. We can tell Stardog so by setting the unique.key.sets configuration option:

unique.key.sets=(simple._doc.StudentId)

or if the simple index has only the one mapping type:

unique.key.sets=(simple.StudentId)

Automatically Generating Mappings

Elasticsearch indexes have well-defined schemas. Stardog can use that schema information to automatically generate virtual graph mappings to RDF. By default, the generated templates for the IRIs will be based on the Elasticsearch mapping type names, which are _doc for all indexes on recent versions of Elasticsearch. This makes the IRIs difficult to distinguish. To address this, Stardog defaults the schema.in.generated.mappings to true when generating virtual graph mappings for Elasticsearch.

Apache Cassandra Virtual Graph Considerations

To create a Cassandra virtual graph, you’ll need to first download the shaded Cassandra client jar. The client jar for Cassandra version “x.y.z” can be obtained from http://central.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/x.y.z/cassandra-driver-core-x.y.z-shaded.jar

Cassandra is special in the way it attempts to prevent users from distributing queries over a large number of server nodes. If you have experience with CQL queries, you have no doubt seen the ubiquitous error message:

Cannot execute this query as it might involve data filtering and thus may have 
unpredictable performance. If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING

This reflects the Cassandra modeling principle that favors writing the same data to multiple tables (perhaps through the use of Materialized Views), where each table is optimized for answering different queries.

In order to support as many queries as possible, we recommend creating mappings to each of these tables and letting Stardog choose which mappings apply for each query. It is possible that no mappings can support a particular query. In such cases, Stardog will write an entry to the log file and return no results.

This is the default behavior, which can be changed by setting the cassandra.allow.filtering option to true. When set, Stardog will include the ALLOW FILTERING clause at the end of each CQL query. Please note that the use of this option is highly discouraged in large-scale production environments.

Cassandra is also special for how SQL-like its query language is (for a NoSQL database). As this is the case, Stardog supports the use of SQL queries in the mappings files for Cassandra virtual graphs. That is, you can use the rr:sqlQuery predicate for R2RML mappings, the sm:query predicate for Stardog Mapping Syntax, or the FROM SQL clause for Stardog Mapping Syntax 2. In all cases, you can supply a SQL query to describe a view to use for a virtual graph mapping, however, the SQL query can only contain operators that are supported in CQL - no joins, subqueries, SQL functions, etc. are allowed.

Stardog/SPARQL Virtual Graph Considerations

To create Stardog Engine or SPARQL service virtual graph, the properties file should specify values for the following properties:

Property	Description
`sparql.url`	SPARQL endpoint with database specified, for example, `http://myhost:26023/testdb/query`
`sparql.username`	username to access the SPARQL endpoint
`sparql.password`	password to access the SPARQL endpoint
`sparql.graphname`	boolean value for specifying whether or not the input file has a header line at the beginning
`sparql.statsbasedoptimization`	boolean value to enable/disable statistics based optimization, only valid for Virtual Graphs to other Stardog servers

Virtual graph mappings are not supported.

Importing Text Files

The same Virtual Graph commands and mappings that are used for creating virtual graphs can be used to import data from delimited (CSV or TSV) and JSON files. It is not truly virtual, but is part of our Virtual Graph APIs and docs because it shares the same mappings syntax.

The mappings files for importing text files must be expressed in SMS2.

Unlike all other Virtual Graph data sources, the WHERE clause in SMS2 mappings for text files supports any SPARQL query function when BIND-ing transformed values to new variables. This includes UNNEST.

Importing CSV Files

To import a CSV file, provide the file as the last argument to the import command:

$ stardog-admin virtual import myDB cars.sms cars.csv

If the input file is using different kind of separators, e.g. tab character, a properties file can be provided:

$ stardog-admin virtual import myDB cars.properties cars.sms cars.tsv

The properties file for CSVs can specify values for the following virtual graph properties:

Property	Description
`csv.separator`	character for separating fields
`csv.quote`	used for strings that contain field separators
`csv.escape`	character for escaping special characters
`csv.header`	boolean value for specifying whether or not the input file has a header line at the beginning
`csv.hash.function`	the hash function to use for fields prefixed with a `#`

The csv.escape character is used as an alternative to the csv.quote character. To escape a csv.quote character within a string that is enclosed in csv.quote characters, use two consecutive csv.quote characters. Do not set csv.escape to the csv.quote character.

Note that whitespace characters in the Java properties file format need to be escaped so if you want to import tab-separated value files set csv.separator=\t in the properties file.

In addition to directly referencing columns by name, CSV mappings may include a special ?_ROW_NUMBER_ variable used to obtain the current line number.

The mappings for delimited files can be automatically generated given a couple additional properties:

Property	Description
`csv.class`	indicate the class, or `rdf:type`, to use for the subjects of each row
`unique.key.sets`	the set of columns that uniquely identify each row

To import with automatically generated mappings, omit the command line argument for the mappings file:

$ stardog-admin virtual import myDB cars.properties cars.csv

There is a complete example available in our examples repo.

Importing JSON Files

To import a JSON file, provide the file name as the final argument to the import command:

$ stardog-admin virtual import myDB bitcoin.sms bitcoin.json

Here is an example JSON file:

{
  "hash": "00000000000000000028484e3ba77273ebd245f944e574e1d4038d9247a7ff8e",
  "time": 1569266867591,
  "block_index": 1762564,
  "height": 575144,
  "txIndexes": [
    445123952,
    445058113,
    445054577,
    445061250
  ]
}

and a corresponding SMS2 mapping:

PREFIX : <http://example.com/>

mapping
from json {
  {
    "hash" : "?hash",
    "time" : "?time",
    "block_index" : "?block_index",
    "height" : "?height",
    "txIndexes" : [ "?txIndex" ]
  }
}
to {
  ?block a :Block ;
    :hash ?hash ;
    :time ?dateTime ;
    :height ?height ;
    :includesTx ?tx .
  ?tx a :Tx ;
    :index ?txIndex .
}
where {
  bind(xsd:dateTime(?time) as ?dateTime)
  bind(template("http://example.com/tx/{txIndex}") as ?tx)
  bind(template("http://example.com/block/{hash}") as ?block)
}

Note how unlike from json when used with MongoDB, with a JSON file SMS2 mapping there is no MongoDB collection name serving as the key name for the top-level object.

Supported Data Sources
Supported Client Drivers
Specific Data Source Considerations