Virtual Transparency
This page discusses Virtual Transparency, a database option that adds the set of accessible virtual graphs to the query dataset.
Overview
Virtual graphs provide a facility for accessing external data sources by mapping them to individual named graphs. The example queries shown previously all specify the source of the data using the virtual graph name. This fine-grained declaration can be useful in some circumstances, but it’s also desirable to query over the set of all graphs without enumerating them individually. Virtual Transparency is a feature that, when enabled, will include results from virtual graphs in queries over the set of all graphs.
Configuration
How does it work? First, set the virtual.transparency database option to true.
When this is enabled, queries are evaluated not only over local graphs, but also over accessible virtual graphs. The set of accessible virtual graphs is determined by the virtual graph access rules. It may differ by database and user.
Effect on the Query Dataset
The example queries shown previously use explicit graph blocks to name the source graph of the data, e.g.:
SELECT * {
GRAPH <virtual://dept> {
?person a emp:Employee ;
emp:name "SMITH"
}
}
In contrast, with Virtual Transparency disabled, a query whose graph block uses a variable for the graph name returns data only from the local named graphs:
SELECT * {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
That query uses the default dataset (no FROM or FROM NAMED clause is included after the SELECT), whose named scope is the set of local named graphs. It is equivalent to:
SELECT *
FROM <tag:stardog:api:context:default>
FROM NAMED <tag:stardog:api:context:named> {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
See the SPARQL spec for an explanation of datasets, as well as the Special Named Graph section of this doc for an explanation of the tag:stardog:api:context:* special named graphs, along with some examples.
If Virtual Transparency is enabled, the named scope of the dataset will include all accessible virtual graphs, referred to by the tag:stardog:api:context:virtual special named graph. The effective query becomes:
SELECT *
FROM <tag:stardog:api:context:default>
FROM NAMED <tag:stardog:api:context:named>
FROM NAMED <tag:stardog:api:context:virtual> {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
Relationship to query.all.graphs
The query.all.graphs server configuration or database option adds the named scope of the dataset to the default scope. When Virtual Transparency is also enabled, the default scope consists of not only the default graph (tag:stardog:api:context:default) and the local named graphs (tag:stardog:api:context:named), as it would with Virtual Transparency off, but also the accessible virtual graphs (tag:stardog:api:context:virtual).
With both Virtual Transparency and query.all.graphs set to true, our sample query becomes:
SELECT *
FROM <tag:stardog:api:context:default>
FROM <tag:stardog:api:context:named>
FROM <tag:stardog:api:context:virtual>
FROM NAMED <tag:stardog:api:context:named>
FROM NAMED <tag:stardog:api:context:virtual> {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
For completeness, with query.all.graphs on and Virtual Transparency off, the query becomes:
SELECT *
FROM <tag:stardog:api:context:default>
FROM <tag:stardog:api:context:named>
FROM NAMED <tag:stardog:api:context:named> {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
Note that no virtual graphs appear in either the default or the named scope of the dataset.
With Virtual Transparency, the key difference between including or omitting a graph block comes from how triple patterns are joined together. Just as a graph block over a set of local named graphs limits BGP (basic graph pattern) matches to a single named graph, a graph block with Virtual Transparency limits BGP matches to a single local or virtual graph. To illustrate, consider the query with a graph block:
SELECT * {
GRAPH ?g {
?person a emp:Employee ;
emp:name "SMITH"
}
}
If the set of employees is stored in a different virtual graph than the employee names, this query will return an empty result because the entire BGP will not match any set of triples in any individual graph. However, if we remove the graph block, each individual triple pattern will match triples from different graphs, and these results will be joined together. The result is similar to what we would obtain by specifying the sources manually:
SELECT * {
GRAPH <virtual://employees> {
?person a emp:Employee
}
GRAPH <virtual://names> {
?person emp:name "SMITH"
}
}
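For comparison, the variant without a graph block that the paragraph above describes could look like the following sketch. It assumes Virtual Transparency and query.all.graphs are both enabled, so that the default scope includes the accessible virtual graphs. The emp: namespace IRI is a placeholder chosen for illustration and is not taken from this page.

```sparql
# Placeholder namespace, assumed for illustration only
PREFIX emp: <http://example.com/emp/>

SELECT * {
  # No GRAPH block: each triple pattern is matched against the whole
  # default scope, so the two patterns may be satisfied by triples
  # from different (local or virtual) graphs and then joined on ?person.
  ?person a emp:Employee ;
          emp:name "SMITH"
}
```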
Options and Limitations
Virtual Transparency is compatible with all SPARQL operators with the exception of “zero or more” and “one or more” property paths. These constructs are supported on some DBMS platforms when placed inside the graph block specifying the virtual graph source.
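As a rough illustration of the workaround mentioned above, a "one or more" property path can be placed inside a graph block that names the virtual graph explicitly. The emp:manager property and the emp: namespace IRI are hypothetical, and whether the path is supported still depends on the underlying DBMS platform.

```sparql
# Placeholder namespace, assumed for illustration only
PREFIX emp: <http://example.com/emp/>

SELECT ?employee ?boss {
  GRAPH <virtual://dept> {
    # emp:manager is a hypothetical property; the "+" path is evaluated
    # against the explicitly named virtual graph rather than through
    # Virtual Transparency.
    ?employee emp:manager+ ?boss
  }
}
```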
A query hint is provided to disable Virtual Transparency for all or part of a query. Placing the hint #pragma virtual.transparency off in a SPARQL block will disable consideration of virtual graphs for that block.
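A minimal sketch of the hint applied to part of a query follows. The namespace IRI is a placeholder, and the assumption here is that the hint's effect is limited to the nested block in which it appears, as the description above suggests.

```sparql
# Placeholder namespace, assumed for illustration only
PREFIX emp: <http://example.com/emp/>

SELECT * {
  {
    # Virtual graphs are not considered for this nested block
    #pragma virtual.transparency off
    ?person a emp:Employee
  }
  # Outside the nested block, Virtual Transparency (if enabled on the
  # database) still applies to this pattern.
  ?person emp:name ?name
}
```

To disable Virtual Transparency for the whole query, the same hint would presumably go in the outermost block instead.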
Queries with edge properties are not supported when using Virtual Transparency. Specifying the virtual graph in a graph block will bypass this limitation.
When using Virtual Transparency and querying data residing in both local databases and virtual graphs, setting the database configuration options local.iri.template.includes and local.iri.template.excludes can help to improve query performance.
Legacy Support for Virtual Graphs Not in the Dataset
Ever since Stardog started supporting virtual graphs, it has supported GRAPH blocks with explicit virtual graph names, even if the virtual graph was not included in the query dataset. For example, this query works even if both Virtual Transparency and query.all.graphs are disabled:
SELECT * {
GRAPH <virtual://dept> {
?person a emp:Employee ;
emp:name "SMITH"
}
}
This query uses the default dataset, which has tag:stardog:api:context:named for its named scope (which does not include virtual graphs). However, the virtual graph is still accessible. Stardog supports this for backward compatibility.
Differences in versions prior to 7.7.0
As explained above, virtual.transparency adds all accessible virtual graphs to the named scope of the default query dataset [1]. It also adds those virtual graphs to the default scope of the default query dataset if the query.all.graphs option is enabled.
Before version 7.7, if virtual graphs were included in the named scope of the dataset (via FROM NAMED or API), they would be substituted into GRAPH ?g-type variables only if virtual.transparency was enabled. With version 7.7, virtual graphs can be added to the named scope of the dataset (again, via FROM NAMED or API) with or without Virtual Transparency.
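As a sketch of the 7.7 behavior described above, the query below adds a virtual graph to the named scope explicitly. Under that description, ?g should be able to bind to the virtual graph whether or not virtual.transparency is enabled. The namespace IRI is a placeholder.

```sparql
# Placeholder namespace, assumed for illustration only
PREFIX emp: <http://example.com/emp/>

SELECT *
FROM NAMED <virtual://dept> {
  # The dataset is specified explicitly, so the named scope is exactly
  # the virtual graph listed in FROM NAMED.
  GRAPH ?g {
    ?person a emp:Employee ;
            emp:name "SMITH"
  }
}
```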
Similarly, prior to version 7.7, if virtual graphs were included in the default scope of the query dataset (via FROM or API), they would not be included in the default graph unless virtual.transparency was enabled. With 7.7, they will be included with or without Virtual Transparency.
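The default-scope case can be sketched the same way. Here a virtual graph and the local default graph are both listed with FROM, so under the 7.7 behavior described above the patterns outside any graph block should be matched against the union of the two, independent of the virtual.transparency setting. The namespace IRI is a placeholder.

```sparql
# Placeholder namespace, assumed for illustration only
PREFIX emp: <http://example.com/emp/>

SELECT *
FROM <tag:stardog:api:context:default>
FROM <virtual://dept> {
  # No GRAPH block: patterns are evaluated over the default scope,
  # which here is the local default graph plus the virtual graph.
  ?person a emp:Employee ;
          emp:name "SMITH"
}
```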
There was one exception to the pre-7.7 behavior: if the default scope of the dataset contained exactly one virtual graph and no other graphs, and the named scope was empty, pre-7.7 versions would execute the query over that one virtual graph. With 7.7, there is no special handling for the single-virtual-graph case.
In summary, the virtual.transparency option has an effect when the dataset is not specified (either in the query with FROM or FROM NAMED, or through the API). When virtual.transparency is off, the dataset will default to tag:stardog:api:context:default for the default scope and tag:stardog:api:context:named for the named scope. When virtual.transparency is on, the dataset will default to tag:stardog:api:context:default for the default scope and the union of tag:stardog:api:context:named and tag:stardog:api:context:virtual for the named scope.
1. The default dataset is the dataset used when no FROM or FROM NAMED clause is included in the query and the dataset is not specified via the query API.