Specific Data Source Considerations

This page covers special considerations to take into account when using certain types of data sources.

Page Contents
  1. AWS Athena Virtual Graph Considerations
  2. Databricks SQL/Spark SQL Virtual Graph Considerations
  3. Snowflake Virtual Graph Considerations
  4. Microsoft Synapse Virtual Graph Considerations
  5. Google BigQuery Virtual Graph Considerations
  6. MongoDB Virtual Graph Considerations
    1. Supported Data Types
    2. Example Mapping
    3. Specifying MongoDB Data Types
    4. Setting Unique Keys Manually
  7. MySQL Virtual Graph Considerations
  8. Elasticsearch Virtual Graph Considerations
    1. Supported Field Types
    2. Virtual Graph Mappings for Elasticsearch
    3. Automatically Generating Mappings
  9. Apache Cassandra Virtual Graph Considerations
  10. PostgreSQL Virtual Graph Considerations
  11. Stardog/SPARQL Virtual Graph Considerations

AWS Athena Virtual Graph Considerations

To connect to an Athena database, first download the driver for JDBC version 4.2 here. Then follow the instructions here.

The Athena JDBC driver does not expose the list of accessible databases. To avoid having to qualify every table name with the database name in your mappings, provide the default database in your connection URL using the Schema parameter. For example:

jdbc.url=jdbc:awsathena://athena.us-west.amazonaws.com:443;S3OutputLocation=s3://mybucket/output;Schema=mydb

Athena does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets virtual graph option to define unique columns manually.
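
For example, if your Athena database mydb contains a hypothetical orders table whose order_id column is unique, and the (customer_id, order_date) pair is also unique, the option could be set as follows:

unique.key.sets=(mydb.orders.order_id),(mydb.orders.customer_id,mydb.orders.order_date)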

Note the following from the driver's migration guide:

Result Set Streaming Support

Starting with version 2.0.5, the connector uses the result set streaming API to improve performance when fetching query results. To take advantage of this feature, you must do the following:

  • Include and allow the athena:GetQueryResultsStream action in your IAM policy statement. For details on managing Athena IAM policies, see Identity and access management in Athena.
  • If you are connecting to Athena through a proxy server, make sure that the proxy server does not block port 444. The result set streaming API uses port 444 on the Athena server for outbound communications.

Databricks SQL/Spark SQL Virtual Graph Considerations

To connect to a Spark SQL database, first download the driver for JDBC version 4.2 here. (No driver download is necessary for Databricks.) Then follow the instructions here.

The pooled JDBC connections can become stale if unused for a long period. Further, the default check for a healthy connection can return false positives. To mitigate these problems, the following data source options are recommended:

testOnBorrow=true
validationQuery=SELECT 1

For some environments, the JDBC driver is not able to correctly detect the default schema. This can manifest itself by successfully creating a virtual graph but not generating any mappings. If this occurs, you can manually add the schemas you wish to access using the sql.schemas data source configuration option (e.g. sql.schemas=default).

Databricks Partner Connect requires the user to select the schema they are granting access to through Databricks. It is not possible to edit schemas in Databricks after they have been set. The schema is not passed through to Stardog, so by default sql.schemas is set to *.*. Additional access restrictions to Databricks schemas can be set on the Stardog side, but must be entered manually.
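
For example, to restrict Stardog to two hypothetical schemas in the main catalog, replace the default with an explicit list:

sql.schemas=main.sales,main.hr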

Prior to Stardog version 7.8, generated mappings for Spark SQL data sources could use the wrong identifier quote character (double quote instead of backtick). A workaround was to add the sql.dialect=HIVE data source option. This is no longer necessary, so that option should be removed.

The currently supported version of the JDBC driver (2.6.22) can translate certain queries with LIMIT and OFFSET into invalid queries that reference generated fields that do not exist in the query. This issue can be fixed by including UseNativeQuery=1; in the jdbc.url JDBC connection string.
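
For example (the workspace host and httpPath below are placeholders):

jdbc.url=jdbc:spark://example.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0123-456789-abcdef01;UseNativeQuery=1;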

Stardog uses row-count estimates from Databricks (obtained with a query like show table extended like '<table_name>') to drive query optimization. These estimates can be set by running an ANALYZE TABLE command such as the following:

ANALYZE TABLE table_name COMPUTE STATISTICS;

Stardog can take advantage of the Delta storage provider. To load a delimited CSV file and convert it to Delta format, you can use a set of commands like the following:

CREATE TABLE temp_csv
USING csv OPTIONS (path "/FileStore/departments.csv", header "true");
DROP TABLE IF EXISTS departments;
CREATE TABLE departments USING DELTA AS SELECT * FROM temp_csv;
ALTER TABLE departments CHANGE COLUMN dept_no SET NOT NULL;
ALTER TABLE departments CHANGE COLUMN dept_name SET NOT NULL;
ANALYZE TABLE departments COMPUTE STATISTICS;
DROP TABLE IF EXISTS temp_csv;

The ALTER TABLE commands should only be used for columns where it is known that none of the rows have a missing value for that column. This metadata will allow Stardog to elide NOT NULL filters from its generated SQL queries when querying that column.


Snowflake Virtual Graph Considerations

To connect to a Snowflake database, first download the driver for JDBC version 4.2 here. A typical properties file can look like this:

jdbc.url=jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
jdbc.username=<user>
jdbc.password=******
jdbc.driver=net.snowflake.client.jdbc.SnowflakeDriver

Options for account_identifier are described here.

For example, if you use the account locator in a region form of the identifier, connect to your console with a URL like https://ABC12345.us-east-1.snowflakecomputing.com, and intend to connect to a database named EXAMPLE, then a basic connection string would be:

jdbc.url=jdbc:snowflake://abc12345.us-east-1.snowflakecomputing.com/?db=EXAMPLE

Follow the remaining instructions here.


Microsoft Synapse Virtual Graph Considerations

Synapse Serverless SQL Pools and Dedicated SQL Pools are both supported, but they are supported as two distinct SQL dialects, as their feature support has significant differences. Stardog will detect which type of pool it is connecting to and select the appropriate dialect. This selection can be overridden by setting the sql.dialect data source option to either sql.dialect=SYNAPSE_SERVERLESS or sql.dialect=SYNAPSE_DEDICATED, though this should not be necessary.
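
For example, to force the serverless dialect on a data source (with a hypothetical workspace name):

jdbc.url=jdbc:sqlserver://myworkspace-ondemand.sql.azuresynapse.net:1433;database=mydb
sql.dialect=SYNAPSE_SERVERLESS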


Google BigQuery Virtual Graph Considerations

To connect to a BigQuery database, first download the driver for JDBC version 4.2 here. Then follow the instructions here. You’ll need the following jar files:

  • avro-1.11.1.jar
  • GoogleBigQueryJDBC42.jar
  • google-api-services-bigquery-v2-rev20220730-2.0.0.jar
  • google-cloud-bigquerystorage-2.19.0.jar
  • proto-google-cloud-bigquerystorage-v1-2.19.0.jar
  • grpc-google-cloud-bigquerystorage-v1-2.19.0.jar

NOTE: Stardog versions prior to 9.0.0 used these jars (from here), which should be removed as part of the upgrade:

  • avro-1.9.0.jar
  • gax-1.42.0.jar
  • GoogleBigQueryJDBC42.jar
  • google-api-client-1.28.0.jar
  • google-api-services-bigquery-v2-rev426-1.25.0.jar
  • google-auth-library-oauth2-http-0.13.0.jar

The connection string requires a number of parameters. For example:

jdbc.url=jdbc:bigquery://https://bigquery.googleapis.com/bigquery/v2;ProjectId=myproject;DefaultDataset=mydataset;OAuthType=0;OAuthServiceAcctEmail=vgtestaccount@myacct.iam.gserviceaccount.com;OAuthPvtKeyPath=/path/creds/bq_pk.json;Timeout=60
#jdbc.username=
#jdbc.password=

Note the jdbc.username and jdbc.password properties are not set. An example /path/creds/bq_pk.json file looks like this:

{
  "type": "service_account",
  "project_id": "myproject",
  "private_key_id": "363287712aaedb3cd20ebdc11213d43a15760768",
  "private_key": "-----BEGIN PRIVATE KEY-----\nBRIngMEMybROWnpANTsb4THenEXTbaTTLe/or\Ill8mYlUNC5+Din2tH...3er45/2ed\53fg\g+d##\n-----END PRIVATE KEY-----\n",
  "client_email": "myaccount@myid.iam.gserviceaccount.com",
  "client_id": "12345678987654321239",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/myaccount%40myid.iam.gserviceaccount.com"
}

Note that the contents of the JSON file may be included in the JDBC connection string using the OAuthPvtKey connection property. The data source properties for the above example would be:

jdbc.url=jdbc:bigquery://https://bigquery.googleapis.com/bigquery/v2;ProjectId=myproject;DefaultDataset=mydataset;OAuthType=0;OAuthServiceAcctEmail=vgtestaccount@myacct.iam.gserviceaccount.com;Timeout=60;OAuthPvtKey={"type":"service_account","project_id":"myproject","private_key_id":"363287712aaedb3cd20ebdc11213d43a15760768","private_key":"-----BEGIN PRIVATE KEY-----\nBRIngMEMybROWnpANTsb4THenEXTbaTTLe/or\Ill8mYlUNC5+Din2tH...3er45/2ed\53fg\g+d##\n-----END PRIVATE KEY-----\n","client_email":"myaccount@myid.iam.gserviceaccount.com","client_id":"12345678987654321239","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/myaccount%40myid.iam.gserviceaccount.com"}
#jdbc.username=
#jdbc.password=

See Configuring Authentication in the configuration guide for details. Note that alternative means of providing credentials may require additional jar files.

BigQuery does not support primary (or unique) keys. This can negatively impact Stardog’s ability to create optimal queries. Use the unique.key.sets data source option to define unique columns manually.
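
For example, assuming a hypothetical mydataset.orders table with a unique order_id column:

unique.key.sets=(mydataset.orders.order_id)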

MongoDB Virtual Graph Considerations

To connect to a MongoDB database, first download the client jar, which can be obtained from mongo-java-driver-3.8.2.jar. Then follow the instructions here.
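
A minimal data source properties file might look like the following sketch (assuming the mongodb.uri data source option; all values are placeholders for your deployment):

mongodb.uri=mongodb://user:pass@localhost:27017/mydb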

Supported Data Types

Stardog supports mapping to the following MongoDB data types:

  • double
  • string
  • objectId
  • bool
  • date
  • int
  • long
  • decimal

The MongoDB Date type is stored as a 64-bit integer representing the number of milliseconds since Jan 1, 1970, Coordinated Universal Time (UTC). This Date type can be mapped to the xsd:date, xsd:dateTime, or xsd:dateTimeStamp data types. (The xsd:dateTimeStamp data type is the same as xsd:dateTime except that the timezone, which is optional in xsd:dateTime, is required.) When a Date is mapped to either xsd:date or xsd:dateTimeStamp, it will be represented in the UTC timezone. When a Date field is mapped to xsd:dateTime, the Date will be converted to the local timezone of the Stardog server, and the label will include the timezone.

Example Mapping

Below you will find the example from the page on mapping data sources, but in MongoDB's Extended JSON v2 syntax. Note how datePublished is assigned so that it is stored as an ISODate object within MongoDB upon import.

First, we have the Extended JSON that contains the data:

{
	"_id": "unforgiven",
	"name": "Unforgiven",
	"datePublished":
	{
		"$date": "1992-08-07T00:00:00Z"
	},
	"genre": ["Drama", "Western"],
	"boxOffice": 101157447,
	"description": "Retired gunslinger reluctantly takes on one last job.",
	"director": [{
		"director": "clintEastwood",
		"name": "Clint Eastwood"
	}],
	"actor": [
		{
			"actor": "morganFreeman",
			"name": "Morgan Freeman"
		},
		{
			"actor": "clintEastwood",
			"name": "Clint Eastwood"
		},
		{
			"actor": "geneHackman",
			"name": "Gene Hackman"
		}
	]
}
{
	"_id": "noWayOut",
	"name": "No Way Out",
	"datePublished":
	{
		"$date": "1987-08-14T00:00:00.000Z"
	},
	"genre": ["Action", "Mystery", "Drama", "Thriller"],
	"boxOffice": 35509515,
	"description": "A coverup and witchhunt occur after a politician accidentally kills his mistress.",
	"director": [{
		"director": "rogerDonaldson",
		"name": "Roger Donaldson"
	}],
	"actor": [
		{
			"actor": "geneHackman",
			"name": "Gene Hackman"
		},
		{
			"actor": "kevinCostner",
			"name": "Kevin Costner"
		}
	]
}

We will then create a mapping that will represent this data as the following RDF:

@prefix : <http://stardog.com/movies/> .

:Title_noWayOut a :Movie ;
    :name "No Way Out" ;
    :datePublished "1987-08-14"^^xsd:date ;
    :boxOffice 35509515 ;
    :description "A coverup and witchhunt occur after a politician accidentally kills his mistress." ;
    :genre "Action", "Mystery", "Drama", "Thriller" ;
    :directed :Job_noWayOut_rogerDonaldson ;
    :actedIn :Job_noWayOut_geneHackman, :Job_noWayOut_kevinCostner .

:Title_unforgiven a :Movie ;
    :name "Unforgiven" ;
    :datePublished "1992-08-07"^^xsd:date ;
    :boxOffice 101157447 ;
    :description "Retired gunslinger reluctantly takes on one last job." ;
    :genre "Drama", "Western" ;
    :directed :Job_unforgiven_clintEastwood ;
    :actedIn :Job_unforgiven_morganFreeman, :Job_unforgiven_clintEastwood, :Job_unforgiven_geneHackman .

:Job_noWayOut_rogerDonaldson a :DirectedMovie ;
    :name "Roger Donaldson" ;
    :director :Name_rogerDonaldson .

:Name_rogerDonaldson a :Person .

:Job_unforgiven_clintEastwood a :DirectedMovie ;
    :name "Clint Eastwood" ;
    :director :Name_clintEastwood .

:Job_unforgiven_clintEastwood a :ActedInMovie ;
    :name "Clint Eastwood" ;
    :actor :Name_clintEastwood .

:Name_clintEastwood a :Person .

:Job_noWayOut_geneHackman a :ActedInMovie ;
    :name "Gene Hackman" ;
    :actor :Name_geneHackman .

:Job_unforgiven_geneHackman a :ActedInMovie ;
    :name "Gene Hackman" ;
    :actor :Name_geneHackman .

:Name_geneHackman a :Person .

:Job_noWayOut_kevinCostner a :ActedInMovie ;
    :name "Kevin Costner" ;
    :actor :Name_kevinCostner .

:Name_kevinCostner a :Person .

:Job_unforgiven_morganFreeman a :ActedInMovie ;
    :name "Morgan Freeman" ;
    :actor :Name_morganFreeman .

:Name_morganFreeman a :Person .

When mapping MongoDB to a virtual graph, the outermost key in the FROM JSON clause must be the name of the MongoDB collection being mapped; in this case, it is the "movie" key. (This also applies to Cosmos DB.) The SMS2 mapping for this data is the following:

FROM JSON {
    "movie":{ // this movie key is for the collection name, it is not in each document
      "_id":"?movieId",
      "name":"?name",
      "datePublished": {
          "$date": "?datePublished"
      },
      "genre":["?genre"],
      "boxOffice":"?boxOffice",
      "description":"?description",
      "director":[ {
          "director":"?directorId",
          "name":"?directorName"
        }
      ],
      "actor":[ {
          "actor":"?actorId",
          "name":"?actorName"
        }
      ]
    }
}
TO {
  ?movie a :Movie ;
    :name ?name ;
    :datePublished ?xsdDatePublished ;
    :genre ?genre ;
    :boxOffice ?boxOfficeNum ;
    :description ?description ;
    :directed ?directedMovie ;
    :actedIn ?actedInMovie .

  ?directedMovie a :DirectedMovie ;
    :director ?director ;
    :name ?directorName .

  ?director a :Person .

  ?actedInMovie a :ActedInMovie ;
    :actor ?actor ;
    :name ?actorName .

  ?actor a :Person .
}
WHERE {
  BIND (template("http://stardog.com/movies/Job_{movieId}_{directorId}") AS ?directedMovie)
  BIND (template("http://stardog.com/movies/Job_{movieId}_{actorId}") AS ?actedInMovie)
  BIND (template("http://stardog.com/movies/Title_{movieId}") AS ?movie)
  BIND (template("http://stardog.com/movies/Name_{directorId}") AS ?director)
  BIND (template("http://stardog.com/movies/Name_{actorId}") AS ?actor)
  BIND (xsd:date(?datePublished) AS ?xsdDatePublished)
  BIND (xsd:integer(?boxOffice) as ?boxOfficeNum)
}

Specifying MongoDB Data Types

MongoDB is a schemaless database in the sense that there is no requirement that fields in two documents in the same collection have the same data type. The lack of native type information can make it difficult for Stardog to generate the correct query. For example, consider a MongoDB document for a real estate property:

{
  "_id" : {
    "$oid": "6177259280bfc133ee850abd"
  },
  "lotSize" : {
    "$numberDecimal": "0.2317"
  }
}

With this mapping:

PREFIX : <http://example.com/>
PREFIX prop: <http://example.com/property/>

MAPPING
FROM JSON {
  "properties" : {
    "_id" : "?id",
    "lotSize" : "?size",
  }
}
TO {
  ?property a :Property .
  ?property :lotSize ?doubleSize .
}
WHERE {
  BIND (template("http://example.com/property/{id}") AS ?property)
  BIND (strDt(?size, xsd:double) AS ?doubleSize)
}

This will map to the following RDF:

@prefix : <http://example.com/> .
@prefix prop: <http://example.com/property/> .

prop:6177259280bfc133ee850abd a :Property .
prop:6177259280bfc133ee850abd :lotSize "0.2317"^^xsd:double

The SPARQL query for finding this property’s lot size is:

PREFIX : <http://example.com/>
PREFIX prop: <http://example.com/property/>

SELECT * FROM <virtual://properties> {
  prop:6177259280bfc133ee850abd :lotSize ?size .
}

The MongoDB aggregation pipeline for that query will be something like:

{
  $match: {
    "_id": {
      $eq: "6177259280bfc133ee850abd"
    }
  }
}

This query is wrong because it attempts to match an ObjectId using a string.

The correct query should cast the id to an ObjectId:

{
  $match: {
    "_id": {
      $eq : ObjectId("6177259280bfc133ee850abd")
    }
  }
}

In order for Stardog to generate the query with the proper data types, we must indicate these types in the FROM JSON section of the SMS2 mappings file. Fields are assigned types by appending the type name to the variable name, separated by a colon. Supported type names are double, string, objectId, bool, date, int, long, and decimal.

The mapping for our real estate example becomes:

PREFIX : <http://example.com/>
PREFIX prop: <http://example.com/property/>

MAPPING
FROM JSON {
  "properties" : {
    "_id" : "?id:objectId",
    "lotSize" : "?size:decimal",
  }
}
TO {
  ?property a :Property .
  ?property :lotSize ?doubleSize .
}
WHERE {
  BIND (template("http://example.com/property/{id}") AS ?property)
  BIND (strDt(?size, xsd:double) AS ?doubleSize)
}

The function of the types in the FROM JSON clause is different from that of the strDt functions (or casts) in the WHERE clause. The former tell Stardog the types of the fields in the native data source, and the latter indicate what types to use when representing those fields as RDF values.

Setting Unique Keys Manually

The unique.key.sets option can be used with MongoDB, but the format is MongoDB-specific. In place of schema and table names, the unique keys must be specified in terms of the collection and a list of (zero or more) nested arrays.

For example, take an accounts virtual graph with this SMS2 mapping:

prefix : <http://example.com/>

MAPPING <urn:accounts>
FROM JSON {
  "accounts" : {
    "_id" : "?id",
    "acct" : "?acctNum",
    "customerName" : [ "?name" ],
    "card" : [ {
      "number" : "?ccNumber",
      "expiration" : "?ccExpr" }
    ]
  }
}
TO {
  ?holder :hasAcct ?acct .
  ?holder :hasName ?name .
}
WHERE {
  BIND (template("http://example.com/acct/{acctNum}") AS ?acct)
  BIND (template("http://example.com/holder/{ccNumber}_{name}") AS ?holder)
}

And this query:

SELECT * {
  graph <virtual://accounts> {
    ?holder :hasAcct ?acct .
    ?holder :hasName ?name .
  }
}

When Stardog translates this query, it creates a flattened view of the collection (using the $unwind stage), giving a relational view. In this example, both the customerName and card arrays will be flattened because both are referenced in the template for the ?holder variable.

The plan for the example query will include a join because Stardog has no way of knowing that the card.number/customerName pair is unique. If we know that this pair of fields is indeed unique in this collection, we can make the query more efficient by adding the pair as a unique key to the unique.key.sets property:

unique.key.sets=(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

The flattened arrays (if any) must be listed in alphabetical order, separated by semicolons, and enclosed in square brackets. For nested arrays, use periods to delimit the names (level1.level2). For keys that reference no arrays (top-level fields only), use an empty []. Multiple key sets can be separated with commas. For example, if we also know that the acct field is unique, the property value becomes:

unique.key.sets=(accounts.[].acct),(accounts.[card;customerName].customerName,accounts.[card;customerName].card.number)

Read more about Native MongoDB Support.


MySQL Virtual Graph Considerations

Though MySQL does not require it, the JDBC connection should include the name of the database:

jdbc.url=jdbc:mysql://localhost/dbname

As of Java 1.8 update 291, TLS 1.0 and 1.1 are disabled. Using the MySQL connector with this Java version or later requires setting the parameter enabledTLSProtocols=TLSv1.2 on your JDBC connection string. For example:

jdbc.url=jdbc:mysql://localhost/dept?enabledTLSProtocols=TLSv1.2

Alternatively, if you do not wish to use SSL to encrypt traffic between Stardog and MySQL, you can set useSSL=false:

jdbc.url=jdbc:mysql://localhost/dept?useSSL=false

See the Java 8u291 release notes for more details.


Elasticsearch Virtual Graph Considerations

To create an Elasticsearch virtual graph, you need to download the Elasticsearch client jar, which can be obtained from elasticsearch-rest-client-6.7.1.jar.

Two supporting jars are also required:

  1. http async client
  2. http core nio

Supported Field Types

Stardog supports the following Elasticsearch field types:

  • keyword
  • long
  • integer
  • short
  • byte
  • double
  • float
  • half_float
  • scaled_float
  • date

Note that only the keyword data type is supported for strings. Strings indexed as text cannot be mapped for the purpose of SPARQL/GraphQL querying.

Virtual Graph Mappings for Elasticsearch

To create virtual graph mappings for Elasticsearch, use SMS2 mapping syntax with the FROM JSON clause.

There are two types of mappings being discussed here: Stardog Virtual Graph mappings that describe how Elasticsearch fields are mapped to RDF, and Elasticsearch mappings that define a schema for an Elasticsearch index.

For an index named simple with a single Elasticsearch mapping type named _doc and an Elasticsearch mapping like:

{
  "simple" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "StudentId" : {
            "type" : "integer"
          },
          "StudentName" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

an example Stardog mapping could look like this:

prefix ex: <http://example.com/>

MAPPING <urn:example>
FROM JSON {
  "simple._doc":{
    "_id": "?docId",
    "StudentId": "?id",
    "StudentName.keyword": "?name"
  }
}
TO {
  ?studentIri a ex:Student ;
    rdfs:label ?name ;
    ex:esId ?docId .
}
WHERE {
  bind(template("http://example.com/{id}") as ?studentIri)
}

The top-level key in the FROM JSON (simple._doc) is formed by joining the index name and the Elasticsearch mapping type with a period. This is similar to the schemaName.tableName convention that is used for SQL databases. As a shorthand, for indexes with only one mapping type, the mapping type can be omitted. In this example, simple._doc can be replaced with simple, assuming _doc is the only mapping type.

For Elasticsearch versions 6 and later, indexes are allowed only one mapping type, where the name of the mapping type defaults to _doc. For version 5, it is possible for an index to have more than one mapping type. See Removal of mapping types in the Elasticsearch online documentation for details.

Notice in the above example that the built-in _id field is mapped. Stardog knows that the _id field is unique across all documents in the index, and it uses this information to simplify the queries it generates. Stardog is not able to determine the uniqueness of any other fields, but if you know certain fields (or combinations of fields) are unique, you can indicate which field sets are unique in the configuration options.

For example, suppose we know that StudentId is in fact a unique field. We can tell Stardog so by setting the unique.key.sets configuration option:

unique.key.sets=(simple._doc.StudentId)

or if the simple index has only the one mapping type:

unique.key.sets=(simple.StudentId)

Automatically Generating Mappings

Elasticsearch indexes have well-defined schemas. Stardog can use that schema information to automatically generate virtual graph mappings to RDF. By default, the generated IRI templates would be based on the Elasticsearch mapping type names, which are _doc for all indexes on recent versions of Elasticsearch, making IRIs from different indexes difficult to distinguish. To address this, Stardog defaults the schema.in.generated.mappings option to true when generating virtual graph mappings for Elasticsearch.
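
To revert to the shorter, type-based templates, the option can be set explicitly when generating the mappings:

schema.in.generated.mappings=false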


Apache Cassandra Virtual Graph Considerations

To create a Cassandra virtual graph, first download the shaded Cassandra client jar, which can be obtained from cassandra-driver-core-3.6.0-shaded.jar.

Cassandra is special in the way it attempts to prevent users from distributing queries over a large number of server nodes. If you have experience with CQL queries, you have no doubt seen the ubiquitous error message:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

This reflects the Cassandra modeling principle that favors writing the same data to multiple tables (perhaps through the use of Materialized Views), where each table is optimized for answering different queries.

In order to support as many queries as possible, we recommend creating mappings to each of these tables and letting Stardog choose which mappings apply for each query. It is possible that no mappings can support a particular query. In such cases, Stardog will write an entry to the log file and return no results.

This is the default behavior; it can be changed by setting the cassandra.allow.filtering virtual graph option to true. When this option is set, Stardog will include the ALLOW FILTERING clause at the end of each CQL query. Please note that the use of this option is highly discouraged in large-scale production environments.
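
If you accept that risk, the option is set like any other virtual graph option:

cassandra.allow.filtering=true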

Cassandra is also special for how SQL-like its query language is (for a NoSQL database). Because of this, Stardog supports the use of SQL queries in the mappings files for Cassandra virtual graphs. That is, you can use the rr:sqlQuery predicate for R2RML mappings, the sm:query predicate for Stardog Mapping Syntax, or the FROM SQL clause for Stardog Mapping Syntax 2. In all cases, you can supply a SQL query to describe a view to use for a virtual graph mapping; however, the SQL query can only contain operators that are supported in CQL: no joins, subqueries, SQL functions, etc. are allowed.
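
For example, a minimal SMS2 sketch (with a hypothetical keyspace ks and table users_by_email) that stays within CQL-supported operators:

prefix : <http://example.com/>

MAPPING
FROM SQL {
  SELECT email, name FROM ks.users_by_email
}
TO {
  ?user a :Person ;
    :name ?name .
}
WHERE {
  BIND (template("http://example.com/user/{email}") AS ?user)
}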


PostgreSQL Virtual Graph Considerations

If you’re connecting to a PostgreSQL data source over SSL and your data source is hosted as a flexible server in Azure, you will need to adhere to the following guidelines:

  1. If your Postgres server uses Private access, Stardog needs to be in the same VNet as your Postgres server. If your Postgres server uses Public access, you can add the IP address of the machine on which Stardog is running to the list of firewall rules for your Postgres server.
  2. You need to add the following external parameters when creating your data source in Stardog:
    1. ext.sslmode=verify-full
    2. ext.sslrootcert=/path/to/cert-bundle.crt
      1. You can see how to get this root certificate here.

You can read more about this here.
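
Putting these together, a data source properties file for an Azure flexible server might look like this sketch (the hostname, database, and credentials are placeholders):

jdbc.url=jdbc:postgresql://myserver.postgres.database.azure.com:5432/mydb
jdbc.username=myuser
jdbc.password=******
jdbc.driver=org.postgresql.Driver
ext.sslmode=verify-full
ext.sslrootcert=/path/to/cert-bundle.crt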


Stardog/SPARQL Virtual Graph Considerations

To create a Stardog Engine or SPARQL service virtual graph, the properties file should specify values for the following properties:

  • sparql.url: the SPARQL endpoint, with the database specified; for example, http://myhost:26023/testdb/query
  • sparql.username: the username used to access the SPARQL endpoint
  • sparql.password: the password used to access the SPARQL endpoint
  • sparql.graphname: the default graph to run the queries against
  • sparql.statsbasedoptimization: a boolean that enables/disables statistics-based optimization; only valid for virtual graphs to other Stardog servers
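
For example, a minimal properties file using placeholder values:

sparql.url=http://myhost:26023/testdb/query
sparql.username=admin
sparql.password=******
sparql.graphname=http://example.com/graph
sparql.statsbasedoptimization=true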

Virtual graph mappings are not supported.