Virtual Sources
This page discusses the supported data sources for Virtual Graphs and some considerations to take into account when using them.
Page Contents
- Supported Data Sources
- Supported Client Drivers
- Specific Data Source Considerations
Supported Data Sources
Stardog currently supports all the data sources below. Please inquire if you need support for another.
Sources with a *
are not included in any Stardog trial offering and are only available upon request.
Relational Databases
- Apache Hive
- Apache/Cloudera Impala
- AWS Athena (additional info)
- AWS Aurora
- AWS Redshift
- Derby
- Exasol
- Google BigQuery (additional info)
- H2
- IBM DB2
- MariaDB
- Microsoft SQL Server
- MySQL
- Odata * This source is not included in any Stardog trial offering and is only available upon request
- Oracle
- PostgreSQL
- SAP Business One DI * This source is not included in any Stardog trial offering and is only available upon request
- SAP HANA
- Sybase ASE
- Teradata
NoSQL Databases
- Apache Cassandra (additional info)
- Cosmos DB
- DataStax
- Elasticsearch (additional info)
- MongoDB (additional info)
SPARQL Engine/Service
- Stardog additional info
Cloud Service
- Active Directory * This source is not included in any Stardog trial offering and is only available upon request
- AWS Management * This source is not included in any Stardog trial offering and is only available upon request
- Azure Management * This source is not included in any Stardog trial offering and is only available upon request
- Facebook * This source is not included in any Stardog trial offering and is only available upon request
- Hubspot * This source is not included in any Stardog trial offering and is only available upon request
- Instagram * This source is not included in any Stardog trial offering and is only available upon request
- Jira * This source is not included in any Stardog trial offering and is only available upon request
- LDAP * This source is not included in any Stardog trial offering and is only available upon request
- Linkedin * This source is not included in any Stardog trial offering and is only available upon request
- Marketo * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft Teams * This source is not included in any Stardog trial offering and is only available upon request
- Oracle Eloqua * This source is not included in any Stardog trial offering and is only available upon request
- Oracle SalesCloud * This source is not included in any Stardog trial offering and is only available upon request
- Salesforce Chatter * This source is not included in any Stardog trial offering and is only available upon request
- Salesforce Marketing * This source is not included in any Stardog trial offering and is only available upon request
- Salesforce Pardot * This source is not included in any Stardog trial offering and is only available upon request
- SAP * This source is not included in any Stardog trial offering and is only available upon request
- SAP SuccessFactors * This source is not included in any Stardog trial offering and is only available upon request
- ServiceNow * This source is not included in any Stardog trial offering and is only available upon request
- Slack * This source is not included in any Stardog trial offering and is only available upon request
- Splunk * This source is not included in any Stardog trial offering and is only available upon request
- Slack * This source is not included in any Stardog trial offering and is only available upon request
- Twilio * This source is not included in any Stardog trial offering and is only available upon request
- Veeva * This source is not included in any Stardog trial offering and is only available upon request
- Zendesk * This source is not included in any Stardog trial offering and is only available upon request
Files/Unstructured Data
- Box * This source is not included in any Stardog trial offering and is only available upon request
- CSV (additional info)
- Dropbox * This source is not included in any Stardog trial offering and is only available upon request
- Excel * This source is not included in any Stardog trial offering and is only available upon request
- Excel Online * This source is not included in any Stardog trial offering and is only available upon request
- Excel Services * This source is not included in any Stardog trial offering and is only available upon request
- Gmail * This source is not included in any Stardog trial offering and is only available upon request
- Google Calendar * This source is not included in any Stardog trial offering and is only available upon request
- Google Contacts * This source is not included in any Stardog trial offering and is only available upon request
- Google Drive * This source is not included in any Stardog trial offering and is only available upon request
- Google Sheets * This source is not included in any Stardog trial offering and is only available upon request
- JSON (additional info)
- Microsoft CDS * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft Exchange * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft OneDrive * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft OneNote * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft Planner * This source is not included in any Stardog trial offering and is only available upon request
- Microsoft Project * This source is not included in any Stardog trial offering and is only available upon request
- Office365 * This source is not included in any Stardog trial offering and is only available upon request
- Parquet * This source is not included in any Stardog trial offering and is only available upon request
- REST * This source is not included in any Stardog trial offering and is only available upon request
- Sharepoint * This source is not included in any Stardog trial offering and is only available upon request
CRM
- Dynamics 365 Sales * This source is not included in any Stardog trial offering and is only available upon request
- Dynamics CRM * This source is not included in any Stardog trial offering and is only available upon request
- Netsuite * This source is not included in any Stardog trial offering and is only available upon request
- Odoo * This source is not included in any Stardog trial offering and is only available upon request
- Salesforce (via Simba Salesforce.com JDBC Connector)
- Salesforce Einstein * This source is not included in any Stardog trial offering and is only available upon request
- SAP ByDesign * This source is not included in any Stardog trial offering and is only available upon request
- SAP Netweaver Gateway * This source is not included in any Stardog trial offering and is only available upon request
- Sugar CRM * This source is not included in any Stardog trial offering and is only available upon request
- Veeva CRM * This source is not included in any Stardog trial offering and is only available upon request
Data Analytics
- Adobe Analytics * This source is not included in any Stardog trial offering and is only available upon request
Supported Client Drivers
To connect to your data sources, Stardog requires you to supply Stardog with the appropriate Client Driver. You need to manually copy the JAR file containing the driver to the <stardog_installation_directory>/server/dbms/
directory (or to the location pointed to by the STARDOG_EXT
environment variable if you have set one) so that it will be available to the Stardog server.
Stardog is tested against all supported databases using the following drivers. Stardog requires a JDBC 4.2 compatible client driver. While other drivers may work, your mileage may vary. For best results, set the sql.dialect
virtual graph option when using unsupported drivers.
Specific Data Source Considerations
AWS Athena Virtual Graph Considerations
To connect to an Athena database, first download the driver for JDBC version 4.2 here. Then follow the instructions here.
The Athena JDBC driver does not expose the list of accessible databases. To save having to qualify every table name with the database name in your mappings, provide the default database in your connection URL using the Schema
parameter. For example:
jdbc.url=jdbc:awsathena://athena.us-west.amazonaws.com:443;S3OutputLocation=s3://mybucket/output;Schema=mydb
Athena does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets
virtual graph option to define unique columns manually.
Google BigQuery Virtual Graph Considerations
To connect to a BigQuery database, first download the driver for JDBC version 4.2 here. Then follow the instructions here. You’ll need the following jar files:
GoogleBigQueryJDBC42.jar
google-api-client-1.28.0.jar
google-auth-library-oauth2-http-0.13.0.jar
gax-1.42.0.jar
google-api-services-bigquery-v2-rev426-1.25.0.jar
avro-1.9.0.jar
The connection string requires a number of parameters. For example:
jdbc.url=jdbc:bigquery://https://bigquery.googleapis.com/bigquery/v2;ProjectId=myproject;DefaultDataset=mydataset;OAuthType=0;OAuthServiceAcctEmail=vgtestaccount@myacct.iam.gserviceaccount.com;OAuthPvtKeyPath=/path/creds/bq_pk.json;Timeout=60
#jdbc.username=
#jdbc.password=
Note the jdbc.username
and jdbc.password
properties are not set. An example /path/creds/bq_pk.json
file looks like this:
{
"type": "service_account",
"project_id": "myproject",
"private_key_id": "363287712aaedb3cd20ebdc11213d43a15760768",
"private_key": "-----BEGIN PRIVATE KEY-----\nBRIngMEMybROWnpANTsb4THenEXTbaTTLe/or\Ill8mYlUNC5+Din2tH...3er45/2ed\53fg\g+d##\n-----END PRIVATE KEY-----\n",
"client_email": "myaccount@myid.iam.gserviceaccount.com",
"client_id": "12345678987654321239",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/myaccount%40myid.iam.gserviceaccount.com"
}
See this for details. Note that alternative means of providing credentials may require additional jar files.
BigQuery does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets
virtual graph option to define unique columns manually.
MongoDB Virtual Graph Considerations
To connect to a MongoDB database, first download the client jar. The client jar for MongoDB version “x.y.z” can be obtained from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/x.y.z/mongo-java-driver-x.y.z.jar
Then follow the instructions here.
MongoDB has one Date type. It is stored as a 64-bit integer that represents the number of milliseconds since Jan 1, 1970, Universal Time Coordinated (UTC). This Date type can be mapped to xsd:date
, xsd:dateTime
or xsd:dateTimeStamp
data types. (The xsd:dateTimeStamp
data type is the same as xsd:dateTime
except instead of having an optional timezone the timezone is required.) When a Date is mapped to either xsd:date
or xsd:dateTimeStamp
, it will be represented in the UTC timezone. When a Date field is mapped to an xsd:dateTime
, the Date will be converted to the local timezone of the Stardog server and the label will include the timezone.
Setting Unique Keys Manually
The unique.key.sets
option can be used with MongoDB, but the format is MongoDB-specific. In place of schema and table names, the unique keys must be specified in terms of the collection and a list of nested arrays.
For example, take an accounts
virtual graph with this SMS2 mapping:
prefix : <http://example.com/>
MAPPING <urn:accounts>
FROM JSON {
"accounts" : {
"_id" : "?id",
"acct" : "?acctNum",
"customerName" : [ "?name" ],
"card" : [ {
"number" : "?ccNumber",
"expiration" : "?ccExpr" }
]
}
}
TO {
?holder :hasAcct ?acct .
?holder :hasName ?name .
}
WHERE {
BIND (template("http://example.com/acct/{acctNum}") AS ?acct)
BIND (template("http://example.com/holder/{ccNumber}_{name}") AS ?holder)
}
And this query:
SELECT * {
graph <virtual://accounts> {
?holder :hasAcct ?acct .
?holder :hasName ?name .
}
}
When Stardog translates this query, it creates a flattened view of the collection (using the $unwind stage) giving a relational view. In this example both the customerName
and card
arrays will be flattened because both are referenced in the template for the ?holder
variable.
The plan for the example query will include a join because Stardog has no way of knowing that the card.number
/customerName
pair is unique. If we know that this pair of fields is indeed unique in this collection, we can make the query more efficient by adding the pair as a unique key to the unique.key.sets
property:
unique.key.sets=(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)
It is required that the flattened arrays are listed in alphabetical order, are separated by semicolons, and are enclosed in square brackets. For nested arrays, use periods to delimit the names (level1.level2
).
Multiple key sets can be separated with commas. For example, if we also know that the acct
field is unique, the property value becomes:
unique.key.sets=(accounts.[].acct),(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)
Read more about Native MongoDB Support
Elasticsearch Virtual Graph Considerations
To create an Elasticsearch virtual graph, you need to download the Elasticsearch client jar along with two supporting jars. The client jar for Elasticsearch version “x.y.z” can be obtained from https://repo1.maven.org/maven2/org/elasticsearch/client/elasticsearch-rest-client/x.y.z/elasticsearch-rest-client-x.y.z.jar
Two supporting jars are also required:
Supported Field Types
Stardog supports the following Elasticsearch field types:
keyword
long
integer
short
byte
double
float
half_float
scaled_float
date
Note that only the keyword
data type is supported for strings. Strings indexed as text
cannot be mapped for the purpose of SPARQL/GraphQL querying.
Virtual Graph Mappings for Elasticsearch
To create virtual graph mappings for Elasticsearch, use SMS2 mapping syntax with the FROM JSON
clause.
There are two types of mappings being discussed here: Stardog Virtual Graph mappings that describe how Elasticsearch fields are mapped to RDF and Elasticsearch mappings that define a schema for an Elasticsearch index.
For an index named simple
with a single Elasticsearch mapping type named _doc
and an Elasticsearch mapping like:
{
"simple" : {
"mappings" : {
"_doc" : {
"properties" : {
"StudentId" : {
"type" : "integer"
},
"StudentName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
an example Stardog mapping could look like this:
prefix ex: <http://example.com/>
MAPPING <urn:example>
FROM JSON {
"simple._doc":{
"_id": "?docId",
"StudentId": "?id",
"StudentName.keyword": "?name"
}
}
TO {
?studentIri a ex:Student ;
rdfs:label ?name ;
ex:esId ?docId .
}
WHERE {
bind(template("http://example.com/{id}") as ?studentIri)
}
The top-level key in the FROM JSON
(simple._doc
) is formed by joining the index name and the Elasticsearch mapping type with a period. This is similar to the schemaName.tableName
convention that is used for SQL databases. As a shorthand, for indexes with only one mapping type, the mapping type can be omitted. In this example, simple._doc
can be replaced with simple
assuming _doc
is the only mapping type.
For Elasticsearch versions 6 and later, indexes are allowed only one mapping type, where the name of the mapping type defaults to _doc
. For version 5 it is possible for an index to have more than one mapping type. See Removal of mapping types in the Elasticsearch online documentation for details.
Notice in the above example that the built-in _id
field is mapped. Stardog knows that the _id
field is unique across all documents in the index and it uses this information to simplify the queries it generates. Stardog is not able to determine the uniqueness of any other fields but if you know certain fields (or combinations of fields) are unique you can indicate which field sets are unique in the configuration options.
For example, suppose we know that StudentId
is in fact a unique field. We can tell Stardog so by setting the unique.key.sets
configuration option:
unique.key.sets=(simple._doc.StudentId)
or if the simple
index has only the one mapping type:
unique.key.sets=(simple.StudentId)
Automatically Generating Mappings
Elasticsearch indexes have well-defined schemas. Stardog can use that schema information to automatically generate virtual graph mappings to RDF. By default, the generated templates for the IRIs will be based on the Elasticsearch mapping type names, which are _doc
for all indexes on recent versions of Elasticsearch. This makes the IRIs difficult to distinguish. To address this, Stardog defaults the schema.in.generated.mappings
to true
when generating virtual graph mappings for Elasticsearch.
Apache Cassandra Virtual Graph Considerations
To create a Cassandra virtual graph, you’ll need to first download the shaded Cassandra client jar. The client jar for Cassandra version “x.y.z” can be obtained from http://central.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/x.y.z/cassandra-driver-core-x.y.z-shaded.jar
Cassandra is special in the way it attempts to prevent users from distributing queries over a large number of server nodes. If you have experience with CQL queries, you have no doubt seen the ubiquitous error message:
Cannot execute this query as it might involve data filtering and thus may have
unpredictable performance. If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING
This reflects the Cassandra modeling principle that favors writing the same data to multiple tables (perhaps through the use of Materialized Views), where each table is optimized for answering different queries.
In order to support as many queries as possible, we recommend creating mappings to each of these tables and letting Stardog choose which mappings apply for each query. It is possible that no mappings can support a particular query. In such cases, Stardog will write an entry to the log file and return no results.
This is the default behavior, which can be changed by setting the cassandra.allow.filtering
option to true
. When set, Stardog will include the ALLOW FILTERING
clause at the end of each CQL query. Please note that the use of this option is highly discouraged in large-scale production environments.
Cassandra is also special for how SQL-like its query language is (for a NoSQL database). As this is the case, Stardog supports the use of SQL queries in the mappings files for Cassandra virtual graphs. That is, you can use the rr:sqlQuery
predicate for R2RML mappings, the sm:query
predicate for Stardog Mapping Syntax, or the FROM SQL
clause for Stardog Mapping Syntax 2. In all cases, you can supply a SQL query to describe a view to use for a virtual graph mapping, however, the SQL query can only contain operators that are supported in CQL - no joins, subqueries, SQL functions, etc. are allowed.
Stardog/SPARQL Virtual Graph Considerations
To create Stardog Engine or SPARQL service virtual graph, the properties file should specify values for the following properties:
Property | Description |
---|---|
sparql.url | SPARQL endpoint with database specified, for example, http://myhost:26023/testdb/query |
sparql.username | username to access the SPARQL endpoint |
sparql.password | password to access the SPARQL endpoint |
sparql.graphname | boolean value for specifying whether or not the input file has a header line at the beginning |
sparql.statsbasedoptimization | boolean value to enable/disable statistics based optimization, only valid for Virtual Graphs to other Stardog servers |
Virtual graph mappings are not supported.
Importing Text Files
The same Virtual Graph commands and mappings that are used for creating virtual graphs can be used to import data from delimited (CSV or TSV) and JSON files. It is not truly virtual, but is part of our Virtual Graph APIs and docs because it shares the same mappings syntax.
The mappings files for importing text files must be expressed in SMS2.
Unlike all other Virtual Graph data sources, the WHERE
clause in SMS2 mappings for text files supports any SPARQL query function when BIND
-ing transformed values to new variables. This includes UNNEST
.
Importing CSV Files
To import a CSV file, provide the file as the last argument to the import command:
$ stardog-admin virtual import myDB cars.sms cars.csv
If the input file is using different kind of separators, e.g. tab character, a properties file can be provided:
$ stardog-admin virtual import myDB cars.properties cars.sms cars.tsv
The properties file for CSVs can specify values for the following virtual graph properties:
Property | Description |
---|---|
csv.separator | character for separating fields |
csv.quote | used for strings that contain field separators |
csv.escape | character for escaping special characters |
csv.header | boolean value for specifying whether or not the input file has a header line at the beginning |
csv.hash.function | the hash function to use for fields prefixed with a # |
The csv.escape
character is used as an alternative to the csv.quote
character. To escape a csv.quote
character within a string that is enclosed in csv.quote
characters, use two consecutive csv.quote
characters. Do not set csv.escape
to the csv.quote
character.
Note that whitespace characters in the Java properties file format need to be escaped so if you want to import tab-separated value files set csv.separator=\t
in the properties file.
In addition to directly referencing columns by name, CSV mappings may include a special ?_ROW_NUMBER_
variable used to obtain the current line number.
The mappings for delimited files can be automatically generated given a couple additional properties:
Property | Description |
---|---|
csv.class | indicate the class, or rdf:type , to use for the subjects of each row |
unique.key.sets | the set of columns that uniquely identify each row |
To import with automatically generated mappings, omit the command line argument for the mappings file:
$ stardog-admin virtual import myDB cars.properties cars.csv
There is a complete example available in our examples repo.
Importing JSON Files
To import a JSON file, provide the file name as the final argument to the import command:
$ stardog-admin virtual import myDB bitcoin.sms bitcoin.json
Here is an example JSON file:
{
"hash": "00000000000000000028484e3ba77273ebd245f944e574e1d4038d9247a7ff8e",
"time": 1569266867591,
"block_index": 1762564,
"height": 575144,
"txIndexes": [
445123952,
445058113,
445054577,
445061250
]
}
and a corresponding SMS2 mapping:
PREFIX : <http://example.com/>
mapping
from json {
{
"hash" : "?hash",
"time" : "?time",
"block_index" : "?block_index",
"height" : "?height",
"txIndexes" : [ "?txIndex" ]
}
}
to {
?block a :Block ;
:hash ?hash ;
:time ?dateTime ;
:height ?height ;
:includesTx ?tx .
?tx a :Tx ;
:index ?txIndex .
}
where {
bind(xsd:dateTime(?time) as ?dateTime)
bind(template("http://example.com/tx/{txIndex}") as ?tx)
bind(template("http://example.com/block/{hash}") as ?block)
}
Note how unlike from json
when used with MongoDB, with a JSON file SMS2 mapping there is no MongoDB collection name serving as the key name for the top-level object.