Virtual Graphs

This chapter discusses Virtual Graphs, one of Stardog’s primary features for unifying enterprise data. This page covers the basics of Virtual Graphs; see Chapter Contents for a short description of the other pages in this chapter. Virtual Graph security is covered in the Security chapter rather than here.

Page Contents

Overview
How it Works
Importing data from a Virtual Graph
- Permissions
List Registered Virtual Graphs
Inspect a Virtual Graph’s Mappings
Inspect a Virtual Graph’s Properties
Remove a Virtual Graph
Chapter Contents

Overview

Stardog supports a set of techniques for unifying structured enterprise data – chiefly, Virtual Graphs, which let you declaratively map data into a Stardog knowledge graph and query it via Stardog in situ.

How it Works

Stardog intelligently rewrites (parts of) SPARQL queries against Stardog into native query syntaxes like SQL, issues the native queries to remote data sources, and then translates the native results into SPARQL results. Virtual Graphs can be used to map both tabular (relational) data (from RDBMSs and CSVs) and semi-structured hierarchical data (from NoSQL sources such as MongoDB, Elasticsearch, Cassandra, and JSON) to RDF.
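As a rough illustration of the three steps described above (rewrite, issue, translate), the following Python sketch maps one fixed SPARQL pattern onto SQL. It is a toy model, not Stardog's translator: the EMP table, its empno and ename columns, the example.com subject template, and the MAPPING dictionary are all assumptions made for this example, and an in-memory SQLite database stands in for the remote data source.

```python
import sqlite3

# Hypothetical mapping: one table, one subject template, one property-to-column map.
MAPPING = {
    "table": "EMP",
    "key_column": "empno",
    "subject_template": "http://example.com/emp/{empno}",
    "properties": {"emp:name": "ename"},
}


def translate(property_name, literal):
    """Step 1: rewrite the pattern `?person <property> "<literal>"` into SQL."""
    column = MAPPING["properties"][property_name]
    sql = f'SELECT {MAPPING["key_column"]} FROM {MAPPING["table"]} WHERE {column} = ?'
    return sql, (literal,)


def run(conn, property_name, literal):
    """Steps 2 and 3: issue the SQL, then turn each row into a SPARQL-style binding."""
    sql, params = translate(property_name, literal)
    rows = conn.execute(sql, params).fetchall()
    return [{"person": MAPPING["subject_template"].format(empno=row[0])} for row in rows]


# An in-memory SQLite database plays the role of the remote data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)", [(7369, "SMITH"), (7499, "ALLEN")])

results = run(conn, "emp:name", "SMITH")
print(results)
```

The point of the sketch is only the shape of the pipeline: the filter on the name is pushed down into the SQL WHERE clause, so the remote source does the selective work and only matching rows are turned back into bindings.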

A Virtual Graph has four components:

a unique name
a data source
a properties file specifying configuration options
a data mapping file (which can be omitted and automatically generated for most sources)

Connecting to a Virtual Graph

Before you can query a non-materialized Virtual Graph, you must register it with Stardog. Register a new Virtual Graph with the virtual add CLI command:

$ stardog-admin virtual add dept.properties dept.ttl

When adding a Virtual Graph, Stardog will create a Data Source and establish a connection through it to verify the provided configuration and mappings.

Virtual Graph Properties File

The properties file (dept.properties in this example) contains the configuration for both the JDBC data source and the virtual graph. It must be in the Java properties file format.

A minimal example (in this case, for MySQL) looks like this:

jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=MyPassword
jdbc.driver=com.mysql.jdbc.Driver

The virtual graph takes its name from the properties file name without the extension, so the virtual graph in this example is named dept. You can override this name using the --name option.

Stardog does not ship with client drivers. You must add drivers for each data source you want to connect to. See Supported Client Drivers for more information.

By default, the credentials for the JDBC connection are provided in plain text in the properties file. Alternatively, you can use the password file mechanism: store the credentials in a password file called services.sdpass located in the STARDOG_HOME directory. Password file entries have the format hostname:port:database:username:password, so the example above needs an entry like this:

localhost:*:dept:MySqlUserName:MyPassword

Then the credentials in the properties file can be omitted.
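To make the entry format concrete, here is a small Python sketch that parses entries in the hostname:port:database:username:password format and looks up a password for a connection. The matching rules are assumptions inferred from the example entry, not a description of Stardog's implementation: a * in a field is treated as a wildcard, and the first matching entry wins. PassEntry, parse_entry, and find_password are hypothetical names.

```python
from typing import NamedTuple, Optional


class PassEntry(NamedTuple):
    hostname: str
    port: str
    database: str
    username: str
    password: str


def parse_entry(line: str) -> PassEntry:
    """Split on the first four colons so the password itself may contain colons."""
    return PassEntry(*line.strip().split(":", 4))


def find_password(entries, hostname, port, database, username) -> Optional[str]:
    """Return the password of the first entry whose four key fields match.

    Assumption: "*" in an entry field matches any value for that field.
    """
    wanted = (hostname, str(port), database, username)
    for entry in entries:
        if all(pattern in ("*", value) for pattern, value in zip(entry[:4], wanted)):
            return entry.password
    return None


entries = [parse_entry("localhost:*:dept:MySqlUserName:MyPassword")]
# 3306 is the default MySQL port; the "*" in the entry matches it.
password = find_password(entries, "localhost", 3306, "dept", "MySqlUserName")
print(password)
```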

The properties file can also contain an option called base to specify a base URI for resolving relative URIs generated by the mappings (if any). If no value is provided, the base URI will be virtual://myGraph, where myGraph is the name of the virtual graph.
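The naming rule and the default base URI described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page: parse_properties handles just simple key=value lines rather than the full Java properties format, and the function names and the http://example.com/dept/ base value are hypothetical.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def virtual_graph_name(properties_path, name_override=None):
    """File name without its extension, unless a name (as with --name) is given."""
    return name_override or Path(properties_path).stem


def base_uri(props, graph_name):
    """The `base` option if present, otherwise virtual://<graph name>."""
    return props.get("base", f"virtual://{graph_name}")


props = parse_properties(
    "jdbc.url=jdbc:mysql://localhost/dept\n"
    "jdbc.username=MySqlUserName\n"
)
name = virtual_graph_name("dept.properties")
print(name, base_uri(props, name))
```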

There are many more available configuration options for Virtual Graphs. They are described in the Virtual Graph Configuration section.

Creating a Shared Data Source

In the last example, we created the Virtual Graph without supplying the name of an existing Data Source. In that case, Stardog automatically creates a “private” Data Source with the same name as the Virtual Graph (in this case <data-source://dept>) for the dedicated use of this Virtual Graph. To use a “shared” Data Source, we first create the Data Source with the data-source add CLI command:

$ stardog-admin data-source add dept.properties

Once this Data Source is created we can use it to create a Virtual Graph:

$ stardog-admin virtual add --name dept --data-source dept dept.ttl

Because all the options in this example apply to the Data Source, the virtual add command does not need a properties file. Without a properties file to derive the name from, however, the --name option is required.

See Data Sources for more information.

Mapping File

The mapping file (dept.ttl in this example) contains the mapping from the virtual data source into RDF. The mapping can be in one of three formats:

SMS, which is the default for the virtual add CLI command.
Standard R2RML, which is indicated using --format r2rml in the virtual add CLI command.
SMS2 (Stardog Mapping Syntax 2), a syntax that better supports hierarchical datasources like JSON and MongoDB. This is indicated using --format sms2 in the virtual add CLI command.

A mapping file is required for data sources without a built-in schema, e.g. some NoSQL databases like MongoDB.

A mapping file is not required if your data has a built-in schema, e.g. MySQL or other relational databases. In this case, you can omit the mapping file, and the virtual graph will be automatically mapped using R2RML direct mapping. Omitting the mapping file is most commonly combined with one or both of the virtual graph options default.mapping.include.tables and sql.schemas, which indicate the specific tables to include.

See the detailed documentation about how to create mappings for your data source.

Querying Virtual Graphs

To query a Virtual Graph, use the GRAPH clause with a special graph URI of the form virtual://myGraph, where myGraph is the name of the Virtual Graph.

The following example shows how to query dept:

SELECT * {
   GRAPH <virtual://dept> {
      ?person a emp:Employee ;
           emp:name "SMITH"
   }
}
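Applications usually send such queries over HTTP. The Python sketch below wraps a graph pattern in a GRAPH clause for a named virtual graph and builds (but does not send) a SPARQL 1.1 Protocol POST request. The helper names, the endpoint URL http://localhost:5820/myDb/query, and the database name myDb are assumptions for illustration; authentication is omitted, and the emp: prefix is assumed to be defined in the database's namespaces.

```python
from urllib.parse import urlencode
from urllib.request import Request


def virtual_graph_query(graph_name, pattern):
    """Wrap a graph pattern in a GRAPH clause that targets virtual://<graph_name>."""
    return (
        "SELECT * {\n"
        f"   GRAPH <virtual://{graph_name}> {{\n"
        f"      {pattern}\n"
        "   }\n"
        "}"
    )


def build_request(endpoint, query):
    """Build a form-encoded POST as defined by the SPARQL 1.1 Protocol."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


query = virtual_graph_query("dept", '?person a emp:Employee ; emp:name "SMITH"')
# Assumed endpoint; adjust host, port, and database name for your server.
request = build_request("http://localhost:5820/myDb/query", query)
print(query)
```

Passing the request to urllib.request.urlopen would send it; that step is left out here because it needs a running server and credentials.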

Virtual graphs can be defined globally in Stardog Server, which is the default, or they can be linked to a specific database when they are created. If a virtual graph is linked to a specific database, it can only be accessed from that database. Attempts to access a linked virtual graph from some other database will result in no data being returned from that virtual graph.

Once a virtual graph is registered, it can be accessed as allowed by access rules.

We can query the local Stardog database and a virtual graph’s remote data in a single query. Suppose we have the dept virtual graph, defined as above, that contains employee and department information, and the Stardog database contains data about the interests of people. We can use the following query to combine the information from both sources:

SELECT * {
   GRAPH <virtual://dept> {
      ?person a emp:Employee ;
           emp:name "SMITH" .
   }
   ?person foaf:interest ?interest
}

Or, with Virtual Transparency enabled, the following query will include remote data from the virtual graph as well as from the default graph.

SELECT * {
   ?person a emp:Employee ;
        emp:name "SMITH" .
   ?person foaf:interest ?interest
}

Query performance will be best if the GRAPH clause for Virtual Graphs is as selective as possible.

Virtual Graph queries are implemented by executing a query against the remote data source. This is a powerful feature, and care must be taken to ensure peak performance. SPARQL and SQL don’t have feature parity, especially given the varying capabilities of SQL implementations. Stardog’s query translator supports most of the salient features of SPARQL, including:

Arbitrarily nested subqueries (including solution modifiers)
Aggregation
FILTER (including most SPARQL functions)
OPTIONAL, UNION, BIND

That said, there are also limitations on translated queries. These include:

Duplicate solutions can be returned
SPARQL MINUS is not currently translated to SQL
Comparisons between objects with different datatypes don’t always follow XML Schema semantics
Named graphs in R2RML are not supported

Importing data from a Virtual Graph

In some cases, you need to materialize the information stored in the RDBMS directly into RDF. For example, a combination of high network latency, slow-changing data, and strict query performance requirements can make materialization a good fit.

The CLI command virtual import can be used to import the contents of the RDBMS into Stardog. The command can be used as follows:

$ stardog-admin virtual import myDb dept.properties dept.ttl

This command adds all the mapped triples from the RDBMS into the default graph. Similar to virtual add, this command assumes SMS by default and can accept R2RML mappings using the --format r2rml option or SMS2 mappings using the --format sms2 option.

It is also possible to specify a target named graph by using the -g/--namedGraph option:

$ stardog-admin virtual import -g http://example.com/targetGraph myDb dept.properties dept.ttl

This virtual import command is equivalent to the following SPARQL update query:

ADD <virtual://dept> TO <http://example.com/targetGraph>

If the RDBMS contents change over time and we need to update the materialization results in the future, we can clear the named graph contents and rematerialize. This can be done by using the --remove-all option in virtual import or with the following SPARQL update, which clears the target graph before copying:

COPY <virtual://dept> TO <http://example.com/targetGraph>

Query performance over materialized graphs will be better, as the data will be indexed locally by Stardog, but materialization may not be practical in cases where frequency of change is very high.
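The difference between the ADD and COPY operations used above matters when rematerializing. The following Python sketch models graphs as sets of triples to show the SPARQL 1.1 Update semantics: ADD keeps whatever is already in the target, while COPY clears the target first. The triples and the rename from SMITH to SMYTHE are made-up data, and sparql_add and sparql_copy are hypothetical helper names.

```python
def sparql_add(graphs, source, target):
    """ADD: insert the source triples into the target, keeping existing triples."""
    graphs[target] = graphs.get(target, set()) | graphs[source]


def sparql_copy(graphs, source, target):
    """COPY: remove everything from the target, then insert the source triples."""
    graphs[target] = set(graphs[source])


SOURCE = "virtual://dept"
TARGET = "http://example.com/targetGraph"

# Initial materialization of the virtual graph into the target graph.
graphs = {SOURCE: {("emp:7369", "emp:name", "SMITH")}}
sparql_add(graphs, SOURCE, TARGET)

# The remote data changes (hypothetical rename).
changed_source = {("emp:7369", "emp:name", "SMYTHE")}

# Rematerializing with ADD leaves the stale triple behind.
with_add = {SOURCE: changed_source, TARGET: set(graphs[TARGET])}
sparql_add(with_add, SOURCE, TARGET)

# Rematerializing with COPY replaces the old contents.
with_copy = {SOURCE: changed_source, TARGET: set(graphs[TARGET])}
sparql_copy(with_copy, SOURCE, TARGET)

print(sorted(with_add[TARGET]))
print(sorted(with_copy[TARGET]))
```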

Permissions

A user requires WRITE permission on a database in order to import data into it. If Named Graph Security is enabled, they will also require WRITE permission on the named graph into which they want to import data. If they are using COPY or ADD to import data, they will also need READ permission on the source virtual graph.
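The rules above can be summarized as a small decision function. This Python sketch simply restates them; it does not call any Stardog API, and the function name, parameter names, and the tuple representation of a permission are hypothetical.

```python
def required_permissions(database, target_graph=None,
                         named_graph_security=False, source_virtual_graph=None):
    """List the (action, resource type, resource) permissions an import needs.

    source_virtual_graph is set only when the import uses COPY or ADD.
    """
    permissions = [("WRITE", "database", database)]
    if named_graph_security and target_graph is not None:
        permissions.append(("WRITE", "named graph", target_graph))
    if source_virtual_graph is not None:
        permissions.append(("READ", "virtual graph", source_virtual_graph))
    return permissions


needed = required_permissions(
    "myDb",
    target_graph="http://example.com/targetGraph",
    named_graph_security=True,
    source_virtual_graph="virtual://dept",
)
for permission in needed:
    print(permission)
```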

List Registered Virtual Graphs

Registered virtual graphs can be listed using the virtual list CLI command:

$ stardog-admin virtual list
+----------------|----------|--------+
| Virtual Graphs | Database | Online |
+----------------|----------|--------+
| virtual://dept | *        | true   |
+----------------|----------|--------+

1 virtual graphs

Notice the * in the Database column of the output of the virtual list command. This indicates that the dept virtual graph can be used with any database. To associate a virtual graph with a specific database, use the -d <db> or --database <db> command-line option with the virtual add command.

If a virtual graph fails to load during startup, it will be listed as offline (Online = false). Use the virtual online command to retry loading an offline virtual graph.
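For scripting, the table printed by virtual list can be parsed to find offline graphs or to check which databases a graph can be used from. The Python sketch below assumes the output layout shown above; the second row (virtual://hr, linked to myDb and offline) is a made-up example, and parse_virtual_list and usable_from are hypothetical helpers.

```python
LIST_OUTPUT = """\
+----------------|----------|--------+
| Virtual Graphs | Database | Online |
+----------------|----------|--------+
| virtual://dept | *        | true   |
| virtual://hr   | myDb     | false  |
+----------------|----------|--------+
"""


def parse_virtual_list(output):
    """Turn the table rows into dictionaries, skipping borders and the header."""
    rows = []
    for line in output.splitlines():
        if not line.startswith("|"):
            continue  # border lines start with "+"
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if cells[0] == "Virtual Graphs":
            continue  # header row
        rows.append({"graph": cells[0], "database": cells[1], "online": cells[2] == "true"})
    return rows


def usable_from(row, database):
    """A graph is usable from any database if marked "*", else only from its own."""
    return row["database"] in ("*", database)


rows = parse_virtual_list(LIST_OUTPUT)
offline = [row["graph"] for row in rows if not row["online"]]
print(offline)
```

Each graph in the offline list is a candidate for the virtual online command.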

Inspect a Virtual Graph’s Mappings

The CLI command virtual mappings can be used to retrieve the mappings associated with a virtual graph.

The following example prints the mappings of a registered virtual graph in Stardog Mapping Syntax 2:

$ stardog-admin virtual mappings --format sms2 myGraph

Inspect a Virtual Graph’s Properties

The CLI command virtual options can be used to retrieve the properties associated with a virtual graph:

$ stardog-admin virtual options myGraph

Remove a Virtual Graph

Registered virtual graphs can be removed using the virtual remove CLI command:

$ stardog-admin virtual remove myGraph

Chapter Contents

Virtual Graph Configuration - discusses configuring virtual graphs
Data Sources - data source management
Mapping Data Sources - how to create virtual graph mappings
Virtual Transparency - discusses Virtual Transparency, a virtual graph facility to query all virtual graphs over the default graph or set of named graphs
Importing JSON and CSV Files - discusses how to import JSON and CSV files into Stardog
Optimization - tips for optimizing virtual graphs
Troubleshooting - tips for troubleshooting virtual graphs
