Link Search Menu Expand Document
Start for Free

Virtual Sources

This page discusses the supported data sources for Virtual Graphs and some considerations to take into account when using them.

Page Contents
  1. Supported Data Sources
    1. Relational Databases
    2. NoSQL Databases
    3. SPARQL Engine/Service
    4. Cloud Service
    5. Files/Unstructured Data
    6. CRM
    7. Data Analytics
  2. Supported Client Drivers
  3. Specific Data Source Considerations
    1. AWS Athena Virtual Graph Considerations
    2. Google BigQuery Virtual Graph Considerations
    3. MongoDB Virtual Graph Considerations
      1. Setting Unique Keys Manually
    4. Elasticsearch Virtual Graph Considerations
      1. Supported Field Types
      2. Virtual Graph Mappings for Elasticsearch
      3. Automatically Generating Mappings
    5. Apache Cassandra Virtual Graph Considerations
    6. Stardog/SPARQL Virtual Graph Considerations
    7. Importing Text Files
      1. Importing CSV Files
      2. Importing JSON Files

Supported Data Sources

Stardog currently supports all the data sources below. Please inquire if you need support for another.

Sources with a * are not included in any Stardog trial offering and are only available upon request.

Relational Databases

  • Apache Hive
  • Apache/Cloudera Impala
  • AWS Athena (additional info)
  • AWS Aurora
  • AWS Redshift
  • Derby
  • Exasol
  • Google BigQuery (additional info)
  • H2
  • IBM DB2
  • MariaDB
  • Microsoft SQL Server
  • MySQL
  • Odata
    * This source is not included in any Stardog trial offering and is only available upon request
  • Oracle
  • PostgreSQL
  • SAP Business One DI
    * This source is not included in any Stardog trial offering and is only available upon request
  • SAP HANA
  • Sybase ASE
  • Teradata

NoSQL Databases

SPARQL Engine/Service

Cloud Service

  • Active Directory
    * This source is not included in any Stardog trial offering and is only available upon request
  • AWS Management
    * This source is not included in any Stardog trial offering and is only available upon request
  • Azure Management
    * This source is not included in any Stardog trial offering and is only available upon request
  • Facebook
    * This source is not included in any Stardog trial offering and is only available upon request
  • Hubspot
    * This source is not included in any Stardog trial offering and is only available upon request
  • Instagram
    * This source is not included in any Stardog trial offering and is only available upon request
  • Jira
    * This source is not included in any Stardog trial offering and is only available upon request
  • LDAP
    * This source is not included in any Stardog trial offering and is only available upon request
  • Linkedin
    * This source is not included in any Stardog trial offering and is only available upon request
  • Marketo
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft Teams
    * This source is not included in any Stardog trial offering and is only available upon request
  • Oracle Eloqua
    * This source is not included in any Stardog trial offering and is only available upon request
  • Oracle SalesCloud
    * This source is not included in any Stardog trial offering and is only available upon request
  • Salesforce Chatter
    * This source is not included in any Stardog trial offering and is only available upon request
  • Salesforce Marketing
    * This source is not included in any Stardog trial offering and is only available upon request
  • Salesforce Pardot
    * This source is not included in any Stardog trial offering and is only available upon request
  • SAP
    * This source is not included in any Stardog trial offering and is only available upon request
  • SAP SuccessFactors
    * This source is not included in any Stardog trial offering and is only available upon request
  • ServiceNow
    * This source is not included in any Stardog trial offering and is only available upon request
  • Slack
    * This source is not included in any Stardog trial offering and is only available upon request
  • Splunk
    * This source is not included in any Stardog trial offering and is only available upon request
  • Slack
    * This source is not included in any Stardog trial offering and is only available upon request
  • Twilio
    * This source is not included in any Stardog trial offering and is only available upon request
  • Veeva
    * This source is not included in any Stardog trial offering and is only available upon request
  • Zendesk
    * This source is not included in any Stardog trial offering and is only available upon request

Files/Unstructured Data

  • Box
    * This source is not included in any Stardog trial offering and is only available upon request
  • CSV (additional info)
  • Dropbox
    * This source is not included in any Stardog trial offering and is only available upon request
  • Excel
    * This source is not included in any Stardog trial offering and is only available upon request
  • Excel Online
    * This source is not included in any Stardog trial offering and is only available upon request
  • Excel Services
    * This source is not included in any Stardog trial offering and is only available upon request
  • Gmail
    * This source is not included in any Stardog trial offering and is only available upon request
  • Google Calendar
    * This source is not included in any Stardog trial offering and is only available upon request
  • Google Contacts
    * This source is not included in any Stardog trial offering and is only available upon request
  • Google Drive
    * This source is not included in any Stardog trial offering and is only available upon request
  • Google Sheets
    * This source is not included in any Stardog trial offering and is only available upon request
  • JSON (additional info)
  • Microsoft CDS
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft Exchange
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft OneDrive
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft OneNote
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft Planner
    * This source is not included in any Stardog trial offering and is only available upon request
  • Microsoft Project
    * This source is not included in any Stardog trial offering and is only available upon request
  • Office365
    * This source is not included in any Stardog trial offering and is only available upon request
  • Parquet
    * This source is not included in any Stardog trial offering and is only available upon request
  • REST
    * This source is not included in any Stardog trial offering and is only available upon request
  • Sharepoint
    * This source is not included in any Stardog trial offering and is only available upon request

CRM

  • Dynamics 365 Sales
    * This source is not included in any Stardog trial offering and is only available upon request
  • Dynamics CRM
    * This source is not included in any Stardog trial offering and is only available upon request
  • Netsuite
    * This source is not included in any Stardog trial offering and is only available upon request
  • Odoo
    * This source is not included in any Stardog trial offering and is only available upon request
  • Salesforce (via Simba Salesforce.com JDBC Connector)
  • Salesforce Einstein
    * This source is not included in any Stardog trial offering and is only available upon request
  • SAP ByDesign
    * This source is not included in any Stardog trial offering and is only available upon request
  • SAP Netweaver Gateway
    * This source is not included in any Stardog trial offering and is only available upon request
  • Sugar CRM
    * This source is not included in any Stardog trial offering and is only available upon request
  • Veeva CRM
    * This source is not included in any Stardog trial offering and is only available upon request

Data Analytics

  • Adobe Analytics
    * This source is not included in any Stardog trial offering and is only available upon request

Supported Client Drivers

To connect to your data sources, Stardog requires you to supply Stardog with the appropriate Client Driver. You need to manually copy the JAR file containing the driver to the <stardog_installation_directory>/server/dbms/ directory (or to the location pointed to by the STARDOG_EXT environment variable if you have set one) so that it will be available to the Stardog server.

Stardog is tested against all supported databases using the following drivers. Stardog requires a JDBC 4.2 compatible client driver. While other drivers may work, your mileage may vary. For best results, set the sql.dialect virtual graph option when using unsupported drivers.

Specific Data Source Considerations

AWS Athena Virtual Graph Considerations

To connect to an Athena database, first download the driver for JDBC version 4.2 here. Then follow the instructions here.

The Athena JDBC driver does not expose the list of accessible databases. To save having to qualify every table name with the database name in your mappings, provide the default database in your connection URL using the Schema parameter. For example:

jdbc.url=jdbc:awsathena://athena.us-west.amazonaws.com:443;S3OutputLocation=s3://mybucket/output;Schema=mydb

Athena does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets virtual graph option to define unique columns manually.


Google BigQuery Virtual Graph Considerations

To connect to a BigQuery database, first download the driver for JDBC version 4.2 here. Then follow the instructions here. You’ll need the following jar files:

  • GoogleBigQueryJDBC42.jar
  • google-api-client-1.28.0.jar
  • google-auth-library-oauth2-http-0.13.0.jar
  • gax-1.42.0.jar
  • google-api-services-bigquery-v2-rev426-1.25.0.jar
  • avro-1.9.0.jar

The connection string requires a number of parameters. For example:

jdbc.url=jdbc:bigquery://https://bigquery.googleapis.com/bigquery/v2;ProjectId=myproject;DefaultDataset=mydataset;OAuthType=0;OAuthServiceAcctEmail=vgtestaccount@myacct.iam.gserviceaccount.com;OAuthPvtKeyPath=/path/creds/bq_pk.json;Timeout=60
#jdbc.username=
#jdbc.password=

Note the jdbc.username and jdbc.password properties are not set. An example /path/creds/bq_pk.json file looks like this:

{
  "type": "service_account",
  "project_id": "myproject",
  "private_key_id": "363287712aaedb3cd20ebdc11213d43a15760768",
  "private_key": "-----BEGIN PRIVATE KEY-----\nBRIngMEMybROWnpANTsb4THenEXTbaTTLe/or\Ill8mYlUNC5+Din2tH...3er45/2ed\53fg\g+d##\n-----END PRIVATE KEY-----\n",
  "client_email": "myaccount@myid.iam.gserviceaccount.com",
  "client_id": "12345678987654321239",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/myaccount%40myid.iam.gserviceaccount.com"
}

See this for details. Note that alternative means of providing credentials may require additional jar files.

BigQuery does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets virtual graph option to define unique columns manually.

MongoDB Virtual Graph Considerations

To connect to a MongoDB database, first download the client jar. The client jar for MongoDB version “x.y.z” can be obtained from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/x.y.z/mongo-java-driver-x.y.z.jar Then follow the instructions here.

MongoDB has one Date type. It is stored as a 64-bit integer that represents the number of milliseconds since Jan 1, 1970, Universal Time Coordinated (UTC). This Date type can be mapped to xsd:date, xsd:dateTime or xsd:dateTimeStamp data types. (The xsd:dateTimeStamp data type is the same as xsd:dateTime except instead of having an optional timezone the timezone is required.) When a Date is mapped to either xsd:date or xsd:dateTimeStamp, it will be represented in the UTC timezone. When a Date field is mapped to an xsd:dateTime, the Date will be converted to the local timezone of the Stardog server and the label will include the timezone.

Setting Unique Keys Manually

The unique.key.sets option can be used with MongoDB, but the format is MongoDB-specific. In place of schema and table names, the unique keys must be specified in terms of the collection and a list of nested arrays.

For example, take an accounts virtual graph with this SMS2 mapping:

prefix : <http://example.com/>

MAPPING <urn:accounts>
FROM JSON {
  "accounts" : {
    "_id" : "?id",
    "acct" : "?acctNum",
    "customerName" : [ "?name" ],
    "card" : [ {
      "number" : "?ccNumber",
      "expiration" : "?ccExpr" }
    ]
  }
}
TO {
  ?holder :hasAcct ?acct .
  ?holder :hasName ?name .
}
WHERE {
  BIND (template("http://example.com/acct/{acctNum}") AS ?acct)
  BIND (template("http://example.com/holder/{ccNumber}_{name}") AS ?holder)
}

And this query:

SELECT * {
  graph <virtual://accounts> {
    ?holder :hasAcct ?acct .
    ?holder :hasName ?name .
  }
}

When Stardog translates this query, it creates a flattened view of the collection (using the $unwind stage) giving a relational view. In this example both the customerName and card arrays will be flattened because both are referenced in the template for the ?holder variable.

The plan for the example query will include a join because Stardog has no way of knowing that the card.number/customerName pair is unique. If we know that this pair of fields is indeed unique in this collection, we can make the query more efficient by adding the pair as a unique key to the unique.key.sets property:

unique.key.sets=(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

It is required that the flattened arrays are listed in alphabetical order, are separated by semicolons, and are enclosed in square brackets. For nested arrays, use periods to delimit the names (level1.level2).

Multiple key sets can be separated with commas. For example, if we also know that the acct field is unique, the property value becomes:

unique.key.sets=(accounts.[].acct),(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

Read more about Native MongoDB Support


Elasticsearch Virtual Graph Considerations

To create an Elasticsearch virtual graph, you need to download the Elasticsearch client jar along with two supporting jars. The client jar for Elasticsearch version “x.y.z” can be obtained from https://repo1.maven.org/maven2/org/elasticsearch/client/elasticsearch-rest-client/x.y.z/elasticsearch-rest-client-x.y.z.jar

Two supporting jars are also required:

  1. http async client
  2. http core nio

Supported Field Types

Stardog supports the following Elasticsearch field types:

  • keyword
  • long
  • integer
  • short
  • byte
  • double
  • float
  • half_float
  • scaled_float
  • date

Note that only the keyword data type is supported for strings. Strings indexed as text cannot be mapped for the purpose of SPARQL/GraphQL querying.

Virtual Graph Mappings for Elasticsearch

To create virtual graph mappings for Elasticsearch, use SMS2 mapping syntax with the FROM JSON clause.

There are two types of mappings being discussed here: Stardog Virtual Graph mappings that describe how Elasticsearch fields are mapped to RDF and Elasticsearch mappings that define a schema for an Elasticsearch index.

For an index named simple with a single Elasticsearch mapping type named _doc and an Elasticsearch mapping like:

{
  "simple" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "StudentId" : {
            "type" : "integer"
          },
          "StudentName" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

an example Stardog mapping could look like this:

prefix ex: <http://example.com/>

MAPPING <urn:example>
FROM JSON {
  "simple._doc":{
    "_id": "?docId",
    "StudentId": "?id",
    "StudentName.keyword": "?name"
  }
}
TO {
  ?studentIri a ex:Student ;
    rdfs:label ?name ;
    ex:esId ?docId .
}
WHERE {
  bind(template("http://example.com/{id}") as ?studentIri)
}

The top-level key in the FROM JSON (simple._doc) is formed by joining the index name and the Elasticsearch mapping type with a period. This is similar to the schemaName.tableName convention that is used for SQL databases. As a shorthand, for indexes with only one mapping type, the mapping type can be omitted. In this example, simple._doc can be replaced with simple assuming _doc is the only mapping type.

For Elasticsearch versions 6 and later, indexes are allowed only one mapping type, where the name of the mapping type defaults to _doc. For version 5 it is possible for an index to have more than one mapping type. See Removal of mapping types in the Elasticsearch online documentation for details.

Notice in the above example that the built-in _id field is mapped. Stardog knows that the _id field is unique across all documents in the index and it uses this information to simplify the queries it generates. Stardog is not able to determine the uniqueness of any other fields but if you know certain fields (or combinations of fields) are unique you can indicate which field sets are unique in the configuration options.

For example, suppose we know that StudentId is in fact a unique field. We can tell Stardog so by setting the unique.key.sets configuration option:

unique.key.sets=(simple._doc.StudentId)

or if the simple index has only the one mapping type:

unique.key.sets=(simple.StudentId)

Automatically Generating Mappings

Elasticsearch indexes have well-defined schemas. Stardog can use that schema information to automatically generate virtual graph mappings to RDF. By default, the generated templates for the IRIs will be based on the Elasticsearch mapping type names, which are _doc for all indexes on recent versions of Elasticsearch. This makes the IRIs difficult to distinguish. To address this, Stardog defaults the schema.in.generated.mappings to true when generating virtual graph mappings for Elasticsearch.


Apache Cassandra Virtual Graph Considerations

To create a Cassandra virtual graph, you’ll need to first download the shaded Cassandra client jar. The client jar for Cassandra version “x.y.z” can be obtained from http://central.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/x.y.z/cassandra-driver-core-x.y.z-shaded.jar

Cassandra is special in the way it attempts to prevent users from distributing queries over a large number of server nodes. If you have experience with CQL queries, you have no doubt seen the ubiquitous error message:

Cannot execute this query as it might involve data filtering and thus may have 
unpredictable performance. If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING

This reflects the Cassandra modeling principle that favors writing the same data to multiple tables (perhaps through the use of Materialized Views), where each table is optimized for answering different queries.

In order to support as many queries as possible, we recommend creating mappings to each of these tables and letting Stardog choose which mappings apply for each query. It is possible that no mappings can support a particular query. In such cases, Stardog will write an entry to the log file and return no results.

This is the default behavior, which can be changed by setting the cassandra.allow.filtering option to true. When set, Stardog will include the ALLOW FILTERING clause at the end of each CQL query. Please note that the use of this option is highly discouraged in large-scale production environments.

Cassandra is also special for how SQL-like its query language is (for a NoSQL database). As this is the case, Stardog supports the use of SQL queries in the mappings files for Cassandra virtual graphs. That is, you can use the rr:sqlQuery predicate for R2RML mappings, the sm:query predicate for Stardog Mapping Syntax, or the FROM SQL clause for Stardog Mapping Syntax 2. In all cases, you can supply a SQL query to describe a view to use for a virtual graph mapping, however, the SQL query can only contain operators that are supported in CQL - no joins, subqueries, SQL functions, etc. are allowed.


Stardog/SPARQL Virtual Graph Considerations

To create Stardog Engine or SPARQL service virtual graph, the properties file should specify values for the following properties:

Property Description
sparql.url SPARQL endpoint with database specified, for example, http://myhost:26023/testdb/query
sparql.username username to access the SPARQL endpoint
sparql.password password to access the SPARQL endpoint
sparql.graphname boolean value for specifying whether or not the input file has a header line at the beginning
sparql.statsbasedoptimization boolean value to enable/disable statistics based optimization, only valid for Virtual Graphs to other Stardog servers

Virtual graph mappings are not supported.


Importing Text Files

The same Virtual Graph commands and mappings that are used for creating virtual graphs can be used to import data from delimited (CSV or TSV) and JSON files. It is not truly virtual, but is part of our Virtual Graph APIs and docs because it shares the same mappings syntax.

The mappings files for importing text files must be expressed in SMS2.

Unlike all other Virtual Graph data sources, the WHERE clause in SMS2 mappings for text files supports any SPARQL query function when BIND-ing transformed values to new variables. This includes UNNEST.

Importing CSV Files

To import a CSV file, provide the file as the last argument to the import command:

$ stardog-admin virtual import myDB cars.sms cars.csv

If the input file is using different kind of separators, e.g. tab character, a properties file can be provided:

$ stardog-admin virtual import myDB cars.properties cars.sms cars.tsv

The properties file for CSVs can specify values for the following virtual graph properties:

Property Description
csv.separator character for separating fields
csv.quote used for strings that contain field separators
csv.escape character for escaping special characters
csv.header boolean value for specifying whether or not the input file has a header line at the beginning
csv.hash.function the hash function to use for fields prefixed with a #

The csv.escape character is used as an alternative to the csv.quote character. To escape a csv.quote character within a string that is enclosed in csv.quote characters, use two consecutive csv.quote characters. Do not set csv.escape to the csv.quote character.

Note that whitespace characters in the Java properties file format need to be escaped so if you want to import tab-separated value files set csv.separator=\t in the properties file.

In addition to directly referencing columns by name, CSV mappings may include a special ?_ROW_NUMBER_ variable used to obtain the current line number.

The mappings for delimited files can be automatically generated given a couple additional properties:

Property Description
csv.class indicate the class, or rdf:type, to use for the subjects of each row
unique.key.sets the set of columns that uniquely identify each row

To import with automatically generated mappings, omit the command line argument for the mappings file:

$ stardog-admin virtual import myDB cars.properties cars.csv

There is a complete example available in our examples repo.

Importing JSON Files

To import a JSON file, provide the file name as the final argument to the import command:

$ stardog-admin virtual import myDB bitcoin.sms bitcoin.json

Here is an example JSON file:

{
  "hash": "00000000000000000028484e3ba77273ebd245f944e574e1d4038d9247a7ff8e",
  "time": 1569266867591,
  "block_index": 1762564,
  "height": 575144,
  "txIndexes": [
    445123952,
    445058113,
    445054577,
    445061250
  ]
}

and a corresponding SMS2 mapping:

PREFIX : <http://example.com/>

mapping
from json {
  {
    "hash" : "?hash",
    "time" : "?time",
    "block_index" : "?block_index",
    "height" : "?height",
    "txIndexes" : [ "?txIndex" ]
  }
}
to {
  ?block a :Block ;
    :hash ?hash ;
    :time ?dateTime ;
    :height ?height ;
    :includesTx ?tx .
  ?tx a :Tx ;
    :index ?txIndex .
}
where {
  bind(xsd:dateTime(?time) as ?dateTime)
  bind(template("http://example.com/tx/{txIndex}") as ?tx)
  bind(template("http://example.com/block/{hash}") as ?block)
}

Note how unlike from json when used with MongoDB, with a JSON file SMS2 mapping there is no MongoDB collection name serving as the key name for the top-level object.