Query Stardog
This chapter covers the different ways to query your Stardog knowledge graph. This page describes the basics; see the Chapter Contents for the other topics in this chapter.
Overview
Stardog supports the SPARQL 1.1 query language along with OWL and rule reasoning. Below we discuss some of the basics.
Executing Queries
To execute a SPARQL query against a Stardog database with the CLI, use the query execute subcommand with a query string, a query file, or the name of a stored query.
$ stardog query execute myDb "select * where { ?s ?p ?o }"
Any SPARQL query type (SELECT, CONSTRUCT, DESCRIBE, PATHS, ASK, or any update query type) can be executed using this command.
Reasoning can be enabled by using the --reasoning flag (or -r for short):
$ stardog query execute --reasoning myDb "select * where { ?sub rdfs:subClassOf ?super }"
By default, all Stardog CLI commands assume the server is running on the same machine as the client, using port 5820. You can interact with a server running on another machine using a full connection string:
$ stardog query execute http://myHost:9090/myDb "select * where { ?s ?p ?o }"
Detailed information on using the query execute command in Stardog can be found in the query execute man page.
Path Queries
Stardog extends SPARQL for path queries to find paths between two nodes in a graph. Path queries are similar to SPARQL property paths, which recursively traverse a graph and find two nodes connected via a complex path of edges. But SPARQL property paths only return the start and end nodes of a path. Stardog path queries return all intermediate nodes on the path and allow arbitrary SPARQL patterns to be used in the query.
Here’s a simple path query to find how Alice and Charlie are connected to each other:
$ stardog query execute exampleDB "PATHS START ?x = :Alice END ?y = :Charlie VIA ?p"
+----------+------------+----------+
| x | p | y |
+----------+------------+----------+
| :Alice | :knows | :Bob |
| :Bob | :worksWith | :Charlie |
| | | |
| :Alice | :worksWith | :Carol |
| :Carol | :knows | :Charlie |
+----------+------------+----------+
Query returned 2 paths in 00:00:00.056
Each row of the result table shows one edge. Adjacent edges are printed on subsequent rows of the table. Multiple paths in the results are separated by an empty row.
By default, path queries return only the shortest paths. See the Path Queries section for details about finding different kinds of paths, e.g., all paths (not just the shortest ones), paths between all nodes, and cyclic paths.
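To make the shape of these results concrete, here is a minimal Python sketch that enumerates the shortest paths, edge by edge, over the four edges shown in the result table. It is only a model of the behavior described above, not Stardog's implementation; the function name `shortest_paths` and the in-memory edge list are assumptions made for illustration.

```python
from collections import deque

# Edges taken from the example result table: (subject, predicate, object).
EDGES = [
    (":Alice", ":knows", ":Bob"),
    (":Bob", ":worksWith", ":Charlie"),
    (":Alice", ":worksWith", ":Carol"),
    (":Carol", ":knows", ":Charlie"),
]

def shortest_paths(edges, start, end):
    """Return every shortest directed path from start to end as a list of edges."""
    queue = deque([(start, [])])
    found, best = [], None
    while queue:
        node, path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS order: remaining candidates are longer than the shortest
        if node == end and path:
            found.append(path)
            best = len(path)
            continue
        visited = {start} | {o for _, _, o in path}
        for s, p, o in edges:
            if s == node and o not in visited:
                queue.append((o, path + [(s, p, o)]))
    return found

for path in shortest_paths(EDGES, ":Alice", ":Charlie"):
    for edge in path:
        print(*edge)
    print()  # an empty line separates paths, like the empty table row
```

Each returned path is a list of edges, which mirrors how the CLI prints one edge per row and separates paths with an empty row.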
DESCRIBE Queries
SPARQL provides a DESCRIBE query type that returns a subgraph containing information about a resource:
DESCRIBE <theResource>
SPARQL’s DESCRIBE keyword is deliberately underspecified. In Stardog, by default, a DESCRIBE query retrieves all the triples for which <theResource> is the subject. There are, of course, numerous other ways to implement DESCRIBE. We provide two additional describe strategies out of the box. The desired describe strategy can be selected with a query hint. For example, the following query will return all the triples where <theResource> is either the subject or the object:
#pragma describe.strategy bidirectional
DESCRIBE <theResource>
The other built-in describe strategy returns the CBD - Concise Bounded Description of the given resource:
#pragma describe.strategy cbd
DESCRIBE <theResource>
The default describe strategy can be changed by setting the query.describe.strategy database configuration option. Finally, it is possible to implement a custom describe strategy by implementing a simple Java interface. An example can be found in the Stardog examples repo.
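The difference between the default and bidirectional strategies can be illustrated with a small Python model over a list of triples. This is a sketch under stated assumptions, not Stardog's code: the `describe` function, the sample triples, and the `"default"` label for the out-of-the-box behavior are all hypothetical, and the CBD strategy is left out because it also follows blank nodes.

```python
def describe(triples, resource, strategy="default"):
    """Model two describe strategies over (subject, predicate, object) tuples."""
    if strategy == "default":
        # Triples where the resource is the subject.
        return [t for t in triples if t[0] == resource]
    if strategy == "bidirectional":
        # Triples where the resource is the subject or the object.
        return [t for t in triples if t[0] == resource or t[2] == resource]
    raise ValueError(f"strategy not modeled in this sketch: {strategy}")

# Hypothetical sample data.
TRIPLES = [
    (":theResource", ":knows", ":Bob"),
    (":Carol", ":knows", ":theResource"),
    (":Bob", ":knows", ":Carol"),
]

print(describe(TRIPLES, ":theResource"))
print(describe(TRIPLES, ":theResource", "bidirectional"))
```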
Federated Queries
Stardog supports the SERVICE keyword, which allows users to query distributed RDF via SPARQL-compliant data sources. You can use this to federate queries between several Stardog databases or Stardog and other public endpoints.
You can also use service variables in your queries to dynamically select the endpoints for federated queries. For example:
{
?service a :MyService .
SERVICE ?service { ... }
}
Stardog ships with a default Service implementation which uses the SPARQL Protocol to send the service fragment to the remote endpoint and retrieve the results. Any endpoint that conforms to the SPARQL protocol can be used.
The Stardog SPARQL endpoint is http://<server>:<port>/{db}/query.
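As a rough illustration of the endpoint pattern above, the following Python sketch builds a SPARQL Protocol GET URL for a query. The host, port, and database name are placeholder values, and `sparql_query_url` is a hypothetical helper; only the URL layout and the standard `query` parameter of the SPARQL Protocol are taken as given.

```python
from urllib.parse import urlencode

def sparql_query_url(server, port, db, query):
    """Build a SPARQL Protocol GET URL for http://<server>:<port>/{db}/query."""
    endpoint = f"http://{server}:{port}/{db}/query"
    # The SPARQL Protocol carries the query text in the "query" parameter.
    return f"{endpoint}?{urlencode({'query': query})}"

# Placeholder server, port, and database name.
print(sparql_query_url("localhost", 5820, "myDb", "select * where { ?s ?p ?o }"))
```

A URL of this form is what a SERVICE clause pointing at another Stardog database would address, subject to the authentication requirements described next.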
HTTP Authentication
Stardog requires authentication. If the endpoint you’re referencing with the SERVICE keyword requires HTTP authentication, credentials are stored in a password file called services.sdpass located in the STARDOG_HOME directory. The default Service implementation assumes HTTP BASIC authentication; for services that use DIGEST auth, or a different authentication mechanism altogether, you’ll need to implement a custom Service.
HTTP Credentials Passthrough
In addition to authenticating with a password file, Stardog can reuse the credentials of the current user via a passthrough mechanism. This needs to be explicitly enabled with the service.sparql.credentials.passthrough.regex database option, and the service endpoint must match the specified regex.
The following will enable passthrough for all services:
$ stardog-admin metadata set service.sparql.credentials.passthrough.regex='.*' my_db_name
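To see how a narrower pattern would behave, here is a hedged Python sketch of the matching step. It assumes the whole endpoint string must match the regex and uses Python's `re` module, whose syntax may differ in details from the regex dialect Stardog uses; the helper name and the example hosts are hypothetical.

```python
import re

def passthrough_enabled(endpoint, pattern):
    """Return True if the service endpoint matches the configured regex.

    Assumption: the entire endpoint must match (re.fullmatch).
    """
    return re.fullmatch(pattern, endpoint) is not None

# '.*' matches every endpoint, as in the command above.
print(passthrough_enabled("http://other:5820/db/query", ".*"))

# A hypothetical, more restrictive pattern limited to one trusted host.
trusted = r"http://trusted\.example\.com:5820/.*"
print(passthrough_enabled("http://trusted.example.com:5820/db/query", trusted))
print(passthrough_enabled("http://other:5820/db/query", trusted))
```

Restricting the pattern to known hosts avoids forwarding the current user's credentials to arbitrary endpoints.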
Querying Local Databases
Stardog contains a specialized service implementation that lets users query other databases stored in the server without going through HTTP. The user executing the query will still be authenticated, just via Stardog authentication. In other words, the user executing the query must have proper permissions to read from the database they are attempting to query. The URI to follow the SERVICE keyword must begin with db://, followed by the database name.
Here’s an example querying a database named “books”:
SELECT * {
SERVICE <db://books> {
?s ?p ?o
}
}
Namespaces
Stardog allows users to store and manage custom namespace prefix bindings for each database. These stored namespaces allow users to omit prefix declarations in Turtle files and SPARQL queries. The Database Administration section describes how to manage these namespace prefixes in detail.
Stored namespaces allow one to use Stardog without declaring a single namespace prefix. Stardog will use its default namespace (http://api.stardog.com/) behind the scenes so that everything will still be valid RDF, but users won’t need to deal with namespaces manually. Stardog will act as if there are no namespaces, which in some cases is exactly what you want!
For example, let’s assume we have some data that does not contain any namespace declarations:
:Alice a :Person ;
:knows :Bob .
We can create a database using this file directly:
$ stardog-admin db create -n myDb data.ttl
We can also add this file to the database after it is created. After the data is loaded, we can execute SPARQL queries without prefix declarations:
$ stardog query execute myDb "SELECT * { ?person a :Person }"
+--------+
| person |
+--------+
| :Alice |
+--------+
Query returned 1 results in 00:00:00.111
Once we export the data from this database, the default (i.e., built-in) prefix declarations will be printed, but otherwise we will get the same serialization as in the original data file:
$ stardog data export myDb
@prefix : <http://api.stardog.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix stardog: <tag:stardog:api:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:Alice a :Person ;
:knows :Bob .
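The effect of the default namespace can be sketched in Python as a simple prefix expansion. This is an illustrative model only: the `expand` helper is hypothetical, and the only values taken from the text are the default namespace and the rdfs prefix shown in the export above.

```python
DEFAULT_NAMESPACE = "http://api.stardog.com/"

def expand(prefixed_name, namespaces=None):
    """Expand a prefixed name such as ':Alice' to a full IRI."""
    # The empty prefix maps to the default namespace unless overridden.
    namespaces = {"": DEFAULT_NAMESPACE, **(namespaces or {})}
    prefix, _, local = prefixed_name.partition(":")
    return f"<{namespaces[prefix]}{local}>"

print(expand(":Alice"))
print(expand("rdfs:label", {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}))
```

So a name like :Alice written without any declaration still denotes a full IRI in the default namespace, which is why the data remains valid RDF.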
Query Functions
Stardog supports all of the functions from the SPARQL spec, as well as some others from XPath and SWRL. See SPARQL Query Functions for a complete list of built-in functions supported.
Any of the supported functions can be used in queries or rules. Note that some functions appear in multiple namespaces, but using any of the namespaces will work. Namespaces can be omitted when calling functions, too.
If the same name is used for different functions in different namespaces, the precedence is given to the standard functions. It is best practice to use the explicit namespace for such functions to avoid ambiguity.
XPath comparison and arithmetic operators on duration, date, and time values are supported by overloading the corresponding SPARQL operators such as =, >, +, and -.
In addition to the built-in functions, new functions can be defined by assigning a new name to a SPARQL expression. These function definitions can either be defined in-line in a query or stored in the system and used in any query or rule. Finally, custom function implementations can be implemented in a JVM-compatible language and registered in the system. See the query functions section for more details.
RDF List Functions
Stardog supports length and membership operations over RDF lists.
Assuming we have a database with the list defined like this:
@prefix stardog: <tag:stardog:api:> .
:literalList rdf:first "one" ;
rdf:rest [
rdf:first "two" ;
rdf:rest [
rdf:first "three" ;
rdf:rest [
rdf:first "four" ;
rdf:rest [
rdf:first "two" ;
rdf:rest rdf:nil
]
]
]
] .
We can retrieve the length of this list:
$ stardog query mydb "SELECT ?length { :literalList stardog:list:length ?length }"
+---------------+
| length |
+---------------+
| "5"^^xsd:long |
+---------------+
Query returned 1 results in 00:00:00.135
And verify that it contains an element:
$ stardog query mydb 'ASK { :literalList stardog:list:member "two" }'
Result: true
Or find out where in the list this element occurs:
$ stardog query mydb 'SELECT ?index { :literalList stardog:list:member ("two" ?index) }'
+---------------+
| index |
+---------------+
| "1"^^xsd:long |
| "4"^^xsd:long |
+---------------+
Query returned 2 results in 00:00:00.128
We can also fetch an element at a given index (note the zero-based indexing):
$ stardog query mydb "SELECT ?element { :literalList stardog:list:member (?element 0) }"
+---------+
| element |
+---------+
| "one" |
+---------+
Query returned 1 results in 00:00:00.151
Or all elements along with their indices:
$ stardog query mydb "SELECT ?element ?index { :literalList stardog:list:member (?element ?index) }"
+---------+---------------+
| element | index |
+---------+---------------+
| "one" | "0"^^xsd:long |
| "two" | "1"^^xsd:long |
| "three" | "2"^^xsd:long |
| "four" | "3"^^xsd:long |
| "two" | "4"^^xsd:long |
+---------+---------------+
Query returned 5 results in 00:00:00.129
Or just elements:
$ stardog query mydb "SELECT ?element { :literalList stardog:list:member ?element }"
+---------+
| element |
+---------+
| "one" |
| "two" |
| "three" |
| "four" |
| "two" |
+---------+
Query returned 5 results in 00:00:00.132
List functions are not supported in combination with certain other features such as reasoning.
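The length and membership results above follow from walking the list cell by cell. The Python sketch below models that traversal for the example list; it is not how Stardog evaluates these functions, and the blank node labels `b1` to `b4` and the helper names are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first value, rdf:rest node); blank nodes are labeled b1..b4 here.
CELLS = {
    ":literalList": ("one", "b1"),
    "b1": ("two", "b2"),
    "b2": ("three", "b3"),
    "b3": ("four", "b4"),
    "b4": ("two", RDF_NIL),
}

def members(cells, head):
    """Yield (element, index) pairs by following rdf:rest until rdf:nil."""
    index = 0
    while head != RDF_NIL:
        first, rest = cells[head]
        yield first, index  # zero-based, as in the query results above
        head, index = rest, index + 1

def length(cells, head):
    """Number of elements in the list starting at head."""
    return sum(1 for _ in members(cells, head))

def indices_of(cells, head, value):
    """All positions at which value occurs."""
    return [i for element, i in members(cells, head) if element == value]

print(length(CELLS, ":literalList"))
print(indices_of(CELLS, ":literalList", "two"))
```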
Special Named Graphs
Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are used when specifying the dataset for a query, that is, in the FROM and FROM NAMED clauses of a SPARQL query, or via the SPARQL Protocol, or CLI, etc. These graphs are read-only and cannot be updated. Following is a list of special named graph IRIs.
Named Graph IRI | Refers to |
---|---|
tag:stardog:api:context:default | the default (no) context graph |
tag:stardog:api:context:named | all local named graphs, excluding the default graph |
tag:stardog:api:context:local | all local graphs - the default graph and named graphs |
tag:stardog:api:context:virtual | all virtual graphs |
tag:stardog:api:context:all | all local and virtual graphs |
Examples
Given a database with these three triples:
:d :from :default .
GRAPH :g1 {
:n1 :from :g1 .
}
GRAPH :g2 {
:n2 :from :g2 .
}
These are some queries that use the FROM clause to set the default graph, and their results:
SELECT * FROM <tag:stardog:api:context:default> {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
SELECT * FROM :g1 {
?s ?p ?o
}
s | p | o |
---|---|---|
:n1 | :from | :g1 |
SELECT * FROM <tag:stardog:api:context:named> {
?s ?p ?o
}
s | p | o |
---|---|---|
:n1 | :from | :g1 |
:n2 | :from | :g2 |
SELECT * FROM <tag:stardog:api:context:local> {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
:n1 | :from | :g1 |
:n2 | :from | :g2 |
Likewise, these queries use the FROM NAMED clause to set the named graphs of the query dataset:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
SELECT * FROM NAMED <tag:stardog:api:context:default> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
SELECT * FROM NAMED :g1 {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
SELECT * FROM NAMED <tag:stardog:api:context:named> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
SELECT * FROM NAMED <tag:stardog:api:context:local> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
Multiple FROM and FROM NAMED clauses, as well as combinations of the two, are supported:
SELECT *
FROM <tag:stardog:api:context:default>
FROM :g1
FROM NAMED :g2 {
{ ?s ?p ?o }
UNION
{ GRAPH ?g { ?s ?p ?o } }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
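The FROM examples above can be reproduced with a small Python model in which each special IRI expands to a list of graphs. This is a simplified sketch, not Stardog's implementation: virtual graphs are left out, `None` stands for the default graph, and the helper names are hypothetical.

```python
DEFAULT = "tag:stardog:api:context:default"
NAMED = "tag:stardog:api:context:named"
LOCAL = "tag:stardog:api:context:local"

# graph name -> triples; None stands for the default graph.
DATA = {
    None: [(":d", ":from", ":default")],
    ":g1": [(":n1", ":from", ":g1")],
    ":g2": [(":n2", ":from", ":g2")],
}

def expand(iri, data):
    """Expand a FROM / FROM NAMED IRI to the list of graphs it refers to."""
    named = [g for g in data if g is not None]
    if iri == DEFAULT:
        return [None]
    if iri == NAMED:
        return named
    if iri == LOCAL:
        return [None] + named
    return [iri]

def select_from(data, *iris):
    """Rows for SELECT * FROM <iri> ... { ?s ?p ?o }.

    Simplification: graphs are concatenated; a real RDF merge would also
    drop duplicate triples.
    """
    return [t for iri in iris for g in expand(iri, data) for t in data.get(g, [])]

print(select_from(DATA, NAMED))
print(select_from(DATA, DEFAULT, ":g1"))
```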
Relationship to query.all.graphs
When no dataset is provided, the default dataset depends on the setting of the query.all.graphs server configuration or database option. Note that when virtual transparency is enabled, the set of virtual graphs (tag:stardog:api:context:virtual) is included in all graphs.
When query.all.graphs is false, the default scope of the query dataset is the default graph:
SELECT * {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
and the named scope is the named graphs:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
When query.all.graphs is true, the default scope of the query dataset is the default graph plus the named graphs (together known as the local graphs):
SELECT * {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
:n1 | :from | :g1 |
:n2 | :from | :g2 |
and the named scope remains unchanged as the set of named graphs:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
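The two scopes described in this section can be summarized with a short Python model. It is a sketch under simplifying assumptions: virtual transparency is not modeled, `None` stands for the default graph, and the function names are hypothetical.

```python
# graph name -> triples; None stands for the default graph.
DATA = {
    None: [(":d", ":from", ":default")],
    ":g1": [(":n1", ":from", ":g1")],
    ":g2": [(":n2", ":from", ":g2")],
}

def default_scope(data, query_all_graphs):
    """Graphs matched by a pattern outside GRAPH when no dataset is given."""
    named = [g for g in data if g is not None]
    return [None] + named if query_all_graphs else [None]

def named_scope(data):
    """Graphs matched by GRAPH ?g { ... }; identical for both settings."""
    return [g for g in data if g is not None]

def match_all(data, graphs):
    """Evaluate { ?s ?p ?o } over the given graphs."""
    return [t for g in graphs for t in data[g]]

print(match_all(DATA, default_scope(DATA, query_all_graphs=False)))
print(match_all(DATA, default_scope(DATA, query_all_graphs=True)))
print(named_scope(DATA))
```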
Named Graph Aliases
Stardog enables users to create aliases for named graph IRIs appearing in the data. The aliases can be used in SPARQL queries and provide a layer of abstraction between the queries or applications and the data. In particular, queries can run against different graphs — local or virtual — when the alias definitions are changed. Importantly, neither the queries themselves nor the relevant HTTP parameters defining the query dataset need to change. That helps make data changes transparent to consumers (applications). This is best illustrated by an example.
A common data cleansing scenario involves data being imported into a staging graph (call it :staging), preprocessed (for example, validated using SHACL, cleaned, augmented, etc.), and then moved to a graph visible to currently deployed applications (call it :production). When the data is ready, it needs to be made available to applications. Without named graph aliases, this can be done in two ways:
- by moving it from :staging to :production via SPARQL Update
- by changing all query requests from :production to :staging
Both approaches have obvious shortcomings: the first moves data, and the second forces changes to every query or application.
Named graph aliases rectify the problem by allowing users to declare :production as an alias which can be pointed to :staging as soon as the data is ready. That requires neither data movement nor changes on the query or application level.
To use named graph aliases, one must first set the graph.aliases database property to true. It can be done at database creation time or later.
Querying Aliases
Named graph aliases are IRIs which can appear after FROM and FROM NAMED keywords in read queries, as well as after USING and USING NAMED keywords in DELETE/INSERT/WHERE queries, for example:
select ?person ?name from :graph {
?person foaf:name ?name
}
or
insert { ?person a :Person } using :graph where {
?person foaf:name ?name
}
Assuming :graph is an alias for :g, Stardog will replace :graph with :g before processing the query. Although this is already quite powerful, named graph aliases are not restricted to this simple use case and generalize in two ways. First, an IRI can be an alias for a set of graphs in the data, not just one graph. Second, special graphs, as well as virtual graphs, can be used in the alias definition just as regular graphs are.
Adding and Updating Aliases
Named graph aliases are defined on a per-database basis, and the definitions are stored in the data as triples. The schema consists of a single predicate (<tag:stardog:api:graph:alias>) whose domain is the aliases and the range is actual graphs in the data. Alias definitions must be asserted in the special named graph <tag:stardog:api:graph:aliases>, as in the following TriG snippet:
<tag:stardog:api:graph:aliases> {
:graph <tag:stardog:api:graph:alias> :g1, :g2 .
}
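The replacement step described earlier can be modeled in Python as a lookup over the alias triples. This is an illustrative sketch, not Stardog's implementation: the `resolve_dataset` helper is hypothetical, and aliases are resolved one level only, since aliases of aliases are not allowed.

```python
ALIAS_PREDICATE = "tag:stardog:api:graph:alias"

# Triples asserted in <tag:stardog:api:graph:aliases>, as in the TriG snippet.
ALIAS_TRIPLES = [
    (":graph", ALIAS_PREDICATE, ":g1"),
    (":graph", ALIAS_PREDICATE, ":g2"),
]

def resolve_dataset(graph_iris, alias_triples):
    """Replace each alias in a FROM / FROM NAMED list with the graphs it points to."""
    targets = {}
    for alias, predicate, graph in alias_triples:
        if predicate == ALIAS_PREDICATE:
            targets.setdefault(alias, []).append(graph)
    resolved = []
    for iri in graph_iris:
        # Non-alias IRIs are kept as they are.
        for graph in targets.get(iri, [iri]):
            if graph not in resolved:
                resolved.append(graph)
    return resolved

print(resolve_dataset([":graph"], ALIAS_TRIPLES))
```

Changing the alias triples changes what the same query dataset resolves to, without touching the query itself.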
Using Java
Stardog’s Java API provides a convenient mechanism for retrieving and updating graph aliases based on the com.complexible.stardog.query.GraphAliases interface available from the database’s Connection object. Under the hood, it simply fetches and updates data in <tag:stardog:api:graph:aliases>.
Every Connection gets its own snapshot of aliases that will not be affected by concurrent transactions that update aliases in the database.
Integration with Other Features
Named graph aliases interact with several Stardog features, in particular Named Graph Security and virtual graphs. The latter is straightforward: one can define an alias for any combination of local and virtual graphs, and the FROM or FROM NAMED statements for that alias will be replaced by those with the corresponding local and virtual graph IRIs. That happens before the query engine starts any VG-specific processing of the query, such as applying mappings or establishing a connection to the remote data source.
As far as Named Graph Security is concerned, aliases behave like regular graphs. It is possible to define read and write permissions for an alias. If a user is allowed to read the graph :g, and :g happens to be an alias for :g1 union :g2, a query using :g in its dataset will be allowed to read :g1 and :g2. If a query uses :g1 or :g2 in its dataset directly, the user will need explicit permissions to access those graphs.
Limitations of Aliases
Named graph aliases have the following limitations:
- They cannot be used in GRAPH keywords, CONSTRUCT, INSERT or UPDATE templates, or ADD/DROP/CLEAR/COPY/MOVE queries.
- Aliases cannot be defined for other aliases.
Some of these restrictions may be lifted in the future.
Excluding Graphs from Query’s Dataset
SPARQL 1.1 only supports adding graphs to the query dataset but not excluding them. This gets inconvenient when the query should match data in a large set of graphs but not touch a small set of graphs. In the extreme case, it could be all graphs except for one. One common example is security where sensitive information can be stored in specific graphs that should be hidden from most queries executed by regular users. In such cases, the user has to explicitly list all graphs that the query should access even if the number of such graphs greatly exceeds those that the query should not access.
Stardog 10.1 lets users exclude specific graphs from the dataset by explicitly enumerating them while executing the query. Conceptually, the feature is the direct opposite of FROM [NAMED]: any excluded graph is removed from the query dataset, i.e. the query is executed as if the graph does not exist. The only difference is that there is only a single list of excluded graphs (rather than one list for the FROM part and another list for the FROM NAMED part).
Currently, the exclusion feature is only supported through the Java API (using the Dataset.excludedGraphs() method) and the HTTP API (using the exclude-graph-uri HTTP parameter). There is no explicit SPARQL syntax for it yet.
All kinds of graphs supported in Stardog can be excluded: regular graph IRIs, special graphs, aliases, and virtual graphs.
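As a hedged illustration of the HTTP route, the Python sketch below builds a query URL that carries the exclude-graph-uri parameter. It assumes the parameter is repeated once per excluded graph, in the same way the SPARQL Protocol repeats default-graph-uri; that repetition, the helper name, the endpoint, and the graph IRIs are assumptions, not confirmed Stardog behavior.

```python
from urllib.parse import urlencode

def query_url_with_exclusions(endpoint, query, excluded_graphs):
    """Build a query URL with one exclude-graph-uri parameter per excluded graph."""
    params = [("query", query)]
    # Assumption: the parameter may be repeated, one value per graph.
    params += [("exclude-graph-uri", graph) for graph in excluded_graphs]
    return f"{endpoint}?{urlencode(params)}"

# Placeholder endpoint and graph IRIs.
url = query_url_with_exclusions(
    "http://localhost:5820/myDb/query",
    "select * { ?s ?p ?o }",
    ["urn:secret1", "urn:secret2"],
)
print(url)
```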
UNNEST Operator and Arrays
Stardog includes an UNNEST operator as a SPARQL extension. Similar to the BIND operator, UNNEST introduces new variable bindings as a result of evaluating an expression. The key difference is that UNNEST may produce more than one binding for each input solution. This is useful when dealing with arrays.
Arrays can be created with the set and split operators. The UNNEST operator allows transforming an array into a set of solutions. For example, consider the following query:
select ?person ?name {
?person :names ?csvNameString
UNNEST(split(?csvNameString, ",") as ?name)
}
If we match a triple which binds ?person to <urn:John> and ?csvNameString to "John,Johnny", the following solutions will be returned for the query:
?person | ?name |
---|---|
<urn:John> | "John" |
<urn:John> | "Johnny" |
If the array has no elements or evaluation of the source expression produces an error, the target variable will be unbound.
In addition to elements, UNNEST can bind array indices with the ARRAY_INDEX argument:
select ?idx ?person ?name {
?person :names ?csvNameString
UNNEST(split(?csvNameString, ",") as ?name, array_index as ?idx)
}
With the same data as above, this query will produce the following solutions:
?idx | ?person | ?name |
---|---|---|
1 | <urn:John> | "John" |
2 | <urn:John> | "Johnny" |
Note that indexing is one-based, which is similar to how ROW_NUMBER() behaves in some SQL systems.
UNNEST is governed by the same scope principles as BIND. Variables used in the expression must precede the UNNEST operator syntactically. References to the variables which are being assigned must occur syntactically after the UNNEST operator.
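The behavior of UNNEST over split can be modeled in a few lines of Python. This is a sketch of the semantics described above, not Stardog's evaluator: solutions are plain dictionaries, an unbound source variable stands in for an evaluation error, and the function name `unnest_split` is hypothetical.

```python
def unnest_split(solutions, source_var, separator, target_var, index_var=None):
    """Model UNNEST(split(?source, sep) AS ?target [, array_index AS ?index])."""
    results = []
    for solution in solutions:
        value = solution.get(source_var)
        # An unbound or non-string source is treated as an evaluation error.
        elements = value.split(separator) if isinstance(value, str) else []
        if not elements:
            # Empty array or error: the solution is kept, target stays unbound.
            results.append(dict(solution))
            continue
        for position, element in enumerate(elements, start=1):  # one-based
            row = {**solution, target_var: element}
            if index_var is not None:
                row[index_var] = position
            results.append(row)
    return results

rows = unnest_split(
    [{"person": "<urn:John>", "csvNameString": "John,Johnny"}],
    "csvNameString", ",", "name", index_var="idx",
)
for row in rows:
    print(row["idx"], row["person"], row["name"])
```

One input solution yields two output solutions here, which is the key difference from BIND.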
Plan Queries
Query plans returned from the query explain command can be executed with the query execute command in the same manner as SPARQL queries.
The plan needs to be in a verbose format, which can be achieved with the --verbose flag:
$ stardog query explain --verbose myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
This will produce an output similar to this:
QueryPlan
Slice(offset=0, limit=10)
`─ Distinct
`─ Projection(?s)
`─ Scan[S](?s, ?p, ?o)
Assuming the output is saved in a file named query.plan, the following is equivalent to running the original query:
$ stardog query execute myDB query.plan
Editing this plan directly can help with performance debugging and fine-tuning - see Query Plan Syntax for details on query plan format.
Plan queries are currently supported in CLI, Java API, and HTTP API.
Chapter Contents
- Path Queries - query Stardog to find paths between nodes in an RDF graph
- Full-text Search - query Stardog for RDF literals present in Stardog
- Geospatial Queries - discusses Stardog's support for geospatial queries
- GraphQL Queries - discusses Stardog's support for querying data using GraphQL
- Edge Properties - discusses Stardog's support for edge properties, bridging the gap between the RDF data model and the Property Graph data model.
- Testing Queries - Test the correctness and performance of your queries automatically
- BI Tools and SQL Queries - query Stardog using business intelligence tools like Tableau
- Query Functions - learn about query functions supported by Stardog
- Stored Query Service - use stored queries as building blocks for larger queries
- Obfuscating Data - discusses obfuscating datasets and queries in Stardog.
- Query Hints - contains all query hints available in Stardog
- Sampling Service - use sampling service to sample over a triple pattern
- Sequence Service - use the sequence service to generate numerical sequences
- Statistics Service - use the statistics service to gain insights into your data
- Label Service - use the label service to look up entities by labels and vice versa.
- BARQ Execution Engine - query Stardog using BARQ