Query Stardog
This chapter covers the different ways to query your Stardog knowledge graph. This page describes the basics; see the Chapter Contents for the other topics in this chapter.
Overview
Stardog supports the SPARQL 1.1 query language along with OWL and rule reasoning. The basics are discussed below.
Executing Queries
To execute a SPARQL query against a Stardog database with the CLI, use the query execute subcommand with a query string, a query file, or the name of a stored query.
$ stardog query execute myDb "select * where { ?s ?p ?o }"
Any SPARQL query type (SELECT, CONSTRUCT, DESCRIBE, PATHS, ASK, or any update query type) can be executed using this command.
Reasoning can be enabled by using the --reasoning flag (or -r for short):
$ stardog query execute --reasoning myDb "select * where { ?sub rdfs:subClassOf ?super }"
By default, all Stardog CLI commands assume the server is running on the same machine as the client using port 5820. But you can interact with a server running on another machine using a full connection string:
$ stardog query execute http://myHost:9090/myDb "select * where { ?s ?p ?o }"
Detailed information on the query execute command can be found in the query execute man page.
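When the CLI is driven from a script, the command line described above can be assembled programmatically. The following is a minimal sketch, not part of Stardog: the helper name build_query_command and its parameters are hypothetical, and it only uses the subcommand, the --reasoning flag, and the connection string form shown in this section.

```python
import shlex


def build_query_command(database, query, reasoning=False, server=None):
    """Return the argv list for a `stardog query execute` call.

    `server` is an optional base URL such as "http://myHost:9090" (hypothetical
    parameter); when given, the database is addressed with a full connection
    string instead of a bare database name.
    """
    target = f"{server.rstrip('/')}/{database}" if server else database
    argv = ["stardog", "query", "execute"]
    if reasoning:
        argv.append("--reasoning")
    argv += [target, query]
    return argv


cmd = build_query_command(
    "myDb", "select * where { ?s ?p ?o }", server="http://myHost:9090"
)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where the stardog CLI is installed.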
Path Queries
Stardog extends SPARQL for path queries which can be used to find paths between two nodes in a graph. Path queries are similar to SPARQL property paths that recursively traverse a graph and find two nodes connected via a complex path of edges. But SPARQL property paths only return the start and end nodes of a path. Stardog path queries return all the intermediate nodes on the path and allow arbitrary SPARQL patterns to be used in the query.
Here’s a simple path query to find how Alice and Charlie are connected to each other:
$ stardog query execute exampleDB "PATHS START ?x = :Alice END ?y = :Charlie VIA ?p"
+----------+------------+----------+
| x | p | y |
+----------+------------+----------+
| :Alice | :knows | :Bob |
| :Bob | :worksWith | :Charlie |
| | | |
| :Alice | :worksWith | :Carol |
| :Carol | :knows | :Charlie |
+----------+------------+----------+
Query returned 2 paths in 00:00:00.056
Each row of the result table shows one edge. Adjacent edges are printed on subsequent rows of the table. Multiple paths in the results are separated by an empty row.
Path queries by default return only the shortest paths. See the Path Queries section for details about finding different kinds of paths, e.g. all paths (not just shortest ones), paths between all nodes, and cyclic paths.
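To make the difference from property paths concrete, the sketch below computes all shortest paths over a small in-memory edge list and returns every edge on each path, not just the end points. It is a conceptual illustration only, assuming directed edges and a plain breadth-first search; it says nothing about how Stardog evaluates PATHS queries, and the names EDGES and shortest_paths are made up for the example.

```python
from collections import deque

# The edges from the example result above, as (subject, predicate, object).
EDGES = [
    (":Alice", ":knows", ":Bob"),
    (":Bob", ":worksWith", ":Charlie"),
    (":Alice", ":worksWith", ":Carol"),
    (":Carol", ":knows", ":Charlie"),
]


def shortest_paths(edges, start, end):
    """Return every shortest path from start to end as a list of edges."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((s, p, o))

    paths, best = [], None
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if best is not None and len(path) > best:
            break  # breadth-first order: all remaining paths are longer
        if node == end and path:
            best = len(path)
            paths.append(path)
            continue
        # Never revisit a node already on this path, so cycles are skipped.
        visited = {start} | {edge[2] for edge in path}
        for edge in outgoing.get(node, []):
            if edge[2] not in visited:
                queue.append((edge[2], path + [edge]))
    return paths


for path in shortest_paths(EDGES, ":Alice", ":Charlie"):
    for edge in path:
        print(*edge)
    print()  # blank line between paths, like the empty table row
```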
DESCRIBE Queries
SPARQL provides a DESCRIBE query type that returns a subgraph containing information about a resource:
DESCRIBE <theResource>
SPARQL’s DESCRIBE keyword is deliberately underspecified. In Stardog, by default, a DESCRIBE query retrieves all the triples for which <theResource> is the subject. There are, of course, about seventeen thousand other ways to implement DESCRIBE. Starting with Stardog 5.3, two additional describe strategies are provided out of the box. The desired describe strategy can be selected with a special query hint. For example, the following query returns all the triples where <theResource> is either the subject or the object:
#pragma describe.strategy bidirectional
DESCRIBE <theResource>
The other built-in describe strategy returns the Concise Bounded Description (CBD) of the given resource:
#pragma describe.strategy cbd
DESCRIBE <theResource>
The default describe strategy can be changed by setting the query.describe.strategy database configuration option. Finally, it is also possible to implement a custom describe strategy by implementing a simple Java interface. An example can be found in the stardog examples repo.
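The difference between the default and the bidirectional strategy can be seen on a handful of in-memory triples. This is an illustrative sketch only: the sample data and the describe function are invented for the example, and the CBD strategy is deliberately left out because it also follows blank nodes.

```python
# Hypothetical sample data as (subject, predicate, object) triples.
TRIPLES = [
    (":theResource", ":name", '"Example"'),
    (":theResource", ":partOf", ":collection"),
    (":other", ":references", ":theResource"),
    (":other", ":name", '"Other"'),
]


def describe(triples, resource, strategy="default"):
    """Select the triples a DESCRIBE would return under the given strategy."""
    if strategy == "default":
        # Only triples with the resource as subject.
        return [t for t in triples if t[0] == resource]
    if strategy == "bidirectional":
        # Triples with the resource as subject or as object.
        return [t for t in triples if resource in (t[0], t[2])]
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


print(len(describe(TRIPLES, ":theResource")))
print(len(describe(TRIPLES, ":theResource", "bidirectional")))
```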
Federated Queries
Stardog supports the SERVICE keyword which allows users to query distributed RDF via SPARQL-compliant data sources. You can use this to federate queries between several Stardog databases or Stardog and other public endpoints.
You can also use service variables in your queries to dynamically select the endpoints for federated queries, for example:
{
?service a :MyService .
SERVICE ?service { ... }
}
Stardog ships with a default Service implementation which uses SPARQL Protocol to send the service fragment to the remote endpoint and retrieve the results. Any endpoint that conforms to the SPARQL protocol can be used.
The Stardog SPARQL endpoint is http://<server>:<port>/{db}/query.
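As a small illustration of the endpoint pattern above, the following sketch builds the endpoint URL for a database and wraps a graph pattern in a SERVICE clause. The helper names, the host localhost, and the database name books are assumptions made for the example.

```python
def stardog_sparql_endpoint(server, port, db):
    """Build the SPARQL endpoint URL following http://<server>:<port>/{db}/query."""
    return f"http://{server}:{port}/{db}/query"


def federated_query(endpoint, pattern):
    """Wrap a graph pattern in a SERVICE clause that targets the endpoint."""
    return (
        "SELECT * {\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    {pattern}\n"
        "  }\n"
        "}"
    )


endpoint = stardog_sparql_endpoint("localhost", 5820, "books")
print(federated_query(endpoint, "?s ?p ?o"))
```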
HTTP Authentication
Stardog requires authentication. If the endpoint you’re referencing with the SERVICE keyword requires HTTP authentication, credentials are stored in a password file called services.sdpass located in the STARDOG_HOME directory. The default Service implementation assumes HTTP BASIC authentication; for services that use DIGEST auth, or a different authentication mechanism altogether, you’ll need to implement a custom Service.
HTTP Credentials Passthrough
In addition to authenticating with a password file, Stardog can reuse the credentials of the current user via a passthrough mechanism. This needs to be explicitly enabled with the service.sparql.credentials.passthrough.regex database option, and the service endpoint must match the specified regex.
The following will enable passthrough for all services:
$ stardog-admin metadata set service.sparql.credentials.passthrough.regex='.*' my_db_name
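Because '.*' forwards credentials to every service, a narrower pattern is often preferable, and it is worth checking such a pattern against sample endpoints before setting it. The sketch below does that with Python's re module. It rests on two assumptions that should be verified against Stardog itself: that the whole endpoint URL must match the pattern, and that the pattern behaves the same in Stardog's regex dialect as in Python's. The host names are made up.

```python
import re


def passthrough_enabled(pattern, endpoint):
    """Return True if the endpoint matches the passthrough pattern.

    Assumption: the entire endpoint URL has to match (full-match semantics).
    """
    return re.fullmatch(pattern, endpoint) is not None


# Hypothetical pattern that limits passthrough to one internal host.
INTERNAL_ONLY = r"https?://internal\.example\.com(:\d+)?/.*"

print(passthrough_enabled(INTERNAL_ONLY, "http://internal.example.com:5820/books/query"))
print(passthrough_enabled(INTERNAL_ONLY, "http://public.example.org/sparql"))
```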
Querying Local Databases
Stardog contains a specialized service implementation that lets users query other databases stored in the server without going through HTTP. The user executing the query will still be authenticated, just via Stardog authentication. In other words, the user executing the query must have proper permissions to read from the database they are attempting to query. The URI following the SERVICE keyword must begin with db:// followed by the database name.
Here’s an example querying a database named “books”.
SELECT * {
SERVICE <db://books> {
?s ?p ?o
}
}
Namespaces
Stardog allows users to store and manage custom namespace prefix bindings for each database. These stored namespaces allow users to omit prefix declarations in Turtle files and SPARQL queries. The Database Administration section describes how to manage these namespace prefixes in detail.
Stored namespaces allow one to use Stardog without declaring a single namespace prefix. Stardog will use its default namespace (http://api.stardog.com/) behind the scenes so that everything will still be valid RDF, but users won’t need to deal with namespaces manually. Stardog will act as if there are no namespaces, which in some cases is exactly what you want!
For example, let’s assume we have some data that does not contain any namespace declarations:
:Alice a :Person ;
:knows :Bob .
We can create a database using this file directly:
$ stardog-admin db create -n myDb data.ttl
We can also add this file to the database after it is created. After the data is loaded, we can then execute SPARQL queries without prefix declarations:
$ stardog query execute myDb "SELECT * { ?person a :Person }"
+--------+
| person |
+--------+
| :Alice |
+--------+
Query returned 1 results in 00:00:00.111
Once we export the data from this database, the default (i.e., in-built) prefix declarations will be printed, but otherwise we will get the same serialization as in the original data file:
$ stardog data export myDb
@prefix : <http://api.stardog.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix stardog: <tag:stardog:api:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:Alice a :Person ;
:knows :Bob .
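What happens "behind the scenes" can be pictured as a simple prefix expansion. The sketch below expands prefixed names using the prefix bindings printed in the export above; the expand function is a hypothetical helper, not a Stardog API.

```python
# Prefix bindings as printed in the export above; "" is the default prefix.
NAMESPACES = {
    "": "http://api.stardog.com/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "stardog": "tag:stardog:api:",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand a prefixed name such as ':Alice' or 'rdf:type' to a full IRI."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"no namespace stored for {prefixed_name!r}")
    return namespaces[prefix] + local


print(expand(":Alice"))
print(expand("rdf:type"))
```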
Query Functions
Stardog supports all of the functions from the SPARQL spec, as well as some others from XPath and SWRL. See SPARQL Query Functions for a complete list of built-in functions supported.
Any of the supported functions can be used in queries or rules. Note that some functions appear in multiple namespaces, but using any of the namespaces will work. Namespaces can also be omitted when calling functions.
If the same name is used for different functions in different namespaces then the precedence is given to the standard functions. It is best practice to use the explicit namespace for such functions to avoid ambiguity.
XPath comparison and arithmetic operators on duration, date and time values are supported by overloading the corresponding SPARQL operators such as =, >, +, -, etc.
In addition to the built-in functions, new functions can be defined by assigning a new name to a SPARQL expression. These function definitions can either be defined inline in a query or stored in the system and used in any query or rule. Finally, custom functions can be implemented in a JVM-compatible language and registered in the system. See the query functions section for more details.
RDF List Functions
Stardog supports length and membership operations over RDF lists.
Assuming we have a database with the list defined like this:
@prefix stardog: <tag:stardog:api:> .
:literalList rdf:first "one" ;
    rdf:rest [
        rdf:first "two" ;
        rdf:rest [
            rdf:first "three" ;
            rdf:rest [
                rdf:first "four" ;
                rdf:rest [
                    rdf:first "two" ;
                    rdf:rest rdf:nil
                ]
            ]
        ]
    ] .
We can retrieve the length of this list:
$ stardog query mydb "SELECT ?length { :literalList stardog:list:length ?length }"
+---------------+
| length |
+---------------+
| "5"^^xsd:long |
+---------------+
Query returned 1 results in 00:00:00.135
And verify that it contains an element:
$ stardog query mydb 'ASK { :literalList stardog:list:member "two" }'
Result: true
Or find out where in the list this element occurs:
$ stardog query mydb 'SELECT ?index { :literalList stardog:list:member ("two" ?index) }'
+---------------+
| index |
+---------------+
| "1"^^xsd:long |
| "4"^^xsd:long |
+---------------+
Query returned 2 results in 00:00:00.128
We can also fetch an element at given index (note zero-based indexing):
$ stardog query mydb "SELECT ?element { :literalList stardog:list:member (?element 0) }"
+---------+
| element |
+---------+
| "one" |
+---------+
Query returned 1 results in 00:00:00.151
Or all elements along with their indices:
$ stardog query mydb "SELECT ?element ?index { :literalList stardog:list:member (?element ?index) }"
+---------+---------------+
| element | index |
+---------+---------------+
| "one" | "0"^^xsd:long |
| "two" | "1"^^xsd:long |
| "three" | "2"^^xsd:long |
| "four" | "3"^^xsd:long |
| "two" | "4"^^xsd:long |
+---------+---------------+
Query returned 5 results in 00:00:00.129
Or just elements:
$ stardog query mydb "SELECT ?element { :literalList stardog:list:member ?element }"
+---------+
| element |
+---------+
| "one" |
| "two" |
| "three" |
| "four" |
| "two" |
+---------+
Query returned 5 results in 00:00:00.132
List functions are not supported in combination with certain other features such as reasoning.
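The results above follow directly from how an RDF list is encoded: each node carries one rdf:first value and an rdf:rest link to the next node, ending at rdf:nil. The sketch below walks that structure in plain Python to reproduce the length and the zero-based member indices. It is only an illustration of the data structure, with invented helper names and blank node labels, and does not reflect how Stardog evaluates the list functions.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def build_list(head, items):
    """Return triples that encode items as an RDF list starting at head."""
    triples, node = [], head
    for i, item in enumerate(items):
        # The last node points to rdf:nil; others point to a fresh blank node.
        nxt = RDF_NIL if i == len(items) - 1 else f"_:b{i + 1}"
        triples += [(node, RDF_FIRST, item), (node, RDF_REST, nxt)]
        node = nxt
    return triples


def list_members(triples, head):
    """Yield (element, zero-based index) pairs by following rdf:rest links."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    node, index = head, 0
    while node in first:  # stops at rdf:nil, which has no rdf:first
        yield first[node], index
        node = rest[node]
        index += 1


triples = build_list(":literalList", ["one", "two", "three", "four", "two"])
members = list(list_members(triples, ":literalList"))
print(len(members))                               # list length
print([i for e, i in members if e == "two"])      # positions of "two"
```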
Special Named Graphs
Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are used when specifying the dataset for a query, that is, in the FROM and FROM NAMED clauses of a SPARQL query, or via the SPARQL Protocol, or CLI, etc. These graphs are read-only and cannot be updated. Following is a list of special named graph IRIs.
Named Graph IRI | Refers to |
---|---|
tag:stardog:api:context:default | the default (no) context graph |
tag:stardog:api:context:named | all local named graphs, excluding the default graph |
tag:stardog:api:context:local | all local graphs - the default graph and named graphs |
tag:stardog:api:context:virtual | all virtual graphs |
tag:stardog:api:context:all | all local and virtual graphs |
Examples
Given a database with these three triples:
:d :from :default .
GRAPH :g1 {
:n1 :from :g1 .
}
GRAPH :g2 {
:n2 :from :g2 .
}
These are some queries that use the FROM clause to set the default graph, and their results:
SELECT * FROM <tag:stardog:api:context:default> {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
SELECT * FROM :g1 {
?s ?p ?o
}
s | p | o |
---|---|---|
:n1 | :from | :g1 |
SELECT * FROM <tag:stardog:api:context:named> {
?s ?p ?o
}
s | p | o |
---|---|---|
:n1 | :from | :g1 |
:n2 | :from | :g2 |
SELECT * FROM <tag:stardog:api:context:local> {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
:n1 | :from | :g1 |
:n2 | :from | :g2 |
Likewise, these queries use the FROM NAMED clause to set the named graphs of the query dataset:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
SELECT * FROM NAMED <tag:stardog:api:context:default> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
SELECT * FROM NAMED :g1 {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
SELECT * FROM NAMED <tag:stardog:api:context:named> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
SELECT * FROM NAMED <tag:stardog:api:context:local> {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
Multiple FROM and FROM NAMED clauses, as well as combinations of the two, are supported:
SELECT *
FROM <tag:stardog:api:context:default>
FROM :g1
FROM NAMED :g2 {
{ ?s ?p ?o }
UNION
{ GRAPH ?g { ?s ?p ?o } }
}
g | s | p | o |
---|---|---|---|
 | :d | :from | :default |
 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
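The way the special IRIs expand into sets of local graphs can be sketched over the three example triples. This is a simplified model with invented function names; virtual graphs (the virtual and all aliases) are not modelled.

```python
# The example data as (graph, subject, predicate, object); None marks the
# default graph.
QUADS = [
    (None, ":d", ":from", ":default"),
    (":g1", ":n1", ":from", ":g1"),
    (":g2", ":n2", ":from", ":g2"),
]
CTX = "tag:stardog:api:context:"


def resolve(graph, quads):
    """Expand one FROM / FROM NAMED IRI into the set of local graphs it covers."""
    named = {g for g, *_ in quads if g is not None}
    special = {
        CTX + "default": {None},
        CTX + "named": named,
        CTX + "local": named | {None},
    }
    # Any other IRI is treated as an ordinary named graph.
    return special.get(graph, {graph})


def default_scope(from_graphs, quads):
    """Triples visible to a pattern outside GRAPH, given the FROM clauses."""
    graphs = set().union(*(resolve(g, quads) for g in from_graphs))
    return [(s, p, o) for g, s, p, o in quads if g in graphs]


print(default_scope([CTX + "named"], QUADS))
print(default_scope([CTX + "default", ":g1"], QUADS))
```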
Relationship to query.all.graphs
When no dataset is provided, the default dataset depends on the setting of the query.all.graphs server configuration or database option. Note that when virtual transparency is enabled, the set of virtual graphs (tag:stardog:api:context:virtual) is included in all graphs.
When query.all.graphs is false the default scope of the query dataset is the default graph:
SELECT * {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
and the named scope is the named graphs:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
When query.all.graphs is true the default scope of the query dataset is the default graph plus the named graphs (together known as the local graphs):
SELECT * {
?s ?p ?o
}
s | p | o |
---|---|---|
:d | :from | :default |
:n1 | :from | :g1 |
:n2 | :from | :g2 |
and the named scope remains unchanged as the set of named graphs:
SELECT * {
GRAPH ?g { ?s ?p ?o }
}
g | s | p | o |
---|---|---|---|
:g1 | :n1 | :from | :g1 |
:g2 | :n2 | :from | :g2 |
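The effect of query.all.graphs on a query without a dataset can be summarised in a few lines. The sketch below models only local graphs (no virtual transparency), and the function names are invented for the illustration.

```python
# The example data as (graph, subject, predicate, object); None marks the
# default graph.
QUADS = [
    (None, ":d", ":from", ":default"),
    (":g1", ":n1", ":from", ":g1"),
    (":g2", ":n2", ":from", ":g2"),
]


def default_scope(quads, query_all_graphs):
    """Triples matched by { ?s ?p ?o } when the query names no dataset."""
    return [(s, p, o) for g, s, p, o in quads if query_all_graphs or g is None]


def named_scope(quads):
    """Quads matched by { GRAPH ?g { ?s ?p ?o } }; unaffected by the option."""
    return [(g, s, p, o) for g, s, p, o in quads if g is not None]


print(default_scope(QUADS, query_all_graphs=False))
print(default_scope(QUADS, query_all_graphs=True))
print(named_scope(QUADS))
```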
Named Graph Aliases
As of version 7.4.5, Stardog enables users to create aliases for named graph IRIs appearing in the data. The aliases can be used in SPARQL queries and provide a layer of abstraction between the queries or applications and the data. In particular, queries can run against different graphs (local or virtual) when the alias definitions are changed. Importantly, neither the queries themselves nor the relevant HTTP parameters defining the query dataset need to change. That helps make data changes transparent to consumers (applications). This is best illustrated by an example.
A common data cleansing scenario involves data being imported into a staging graph (call it :staging), preprocessed (for example, validated using SHACL, cleaned, augmented, etc.), and then moved to a graph visible to currently deployed applications (call it :production). When the data is ready, it needs to be made available to applications. Prior to 7.4.5 this could be done in two ways:
- by moving it from :staging to :production via SPARQL Update
- by changing all query requests from :production to :staging.
Both approaches have rather obvious shortcomings.
Named graph aliases rectify the problem by allowing users to declare :production as an alias which can be pointed to :staging as soon as the data is ready. That requires neither data movement nor changes on the query or application level.
To use named graph aliases one must first set the graph.aliases database property to true. It can be done at database creation time or later.
Querying Aliases
Named graph aliases are IRIs which currently can appear after FROM and FROM NAMED keywords in read queries as well as after USING and USING NAMED keywords in DELETE/INSERT/WHERE queries, for example:
select ?person ?name from :graph {
?person foaf:name ?name
}
or
insert { ?person a :Person } using :graph where {
?person foaf:name ?name
}
Assuming :graph is an alias for :g, Stardog will replace :graph by :g before processing the query. Although this is already quite powerful, named graph aliases are not restricted to this simple use case and generalize it in two ways. First, an IRI can be an alias for a set of graphs in the data, not just one graph. Second, special graphs as well as virtual graphs can be used in the alias definition just as regular graphs.
Adding and Updating Aliases
Named graph aliases are defined on a per-database basis and the definitions are stored in the data as triples. The schema consists of a single predicate <tag:stardog:api:graph:alias> whose domain is the aliases and whose range is the actual graphs in the data. Alias definitions must be asserted in the special named graph <tag:stardog:api:graph:aliases>, as in the following TriG snippet:
<tag:stardog:api:graph:aliases> {
:graph <tag:stardog:api:graph:alias> :g1, :g2 .
}
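The substitution described above (an alias in the dataset is replaced by the graphs it points to) can be modelled as a lookup over the alias triples. The following is a conceptual sketch with invented function names, not the Stardog Java API; it performs a single, non-recursive substitution step.

```python
ALIAS_PREDICATE = "tag:stardog:api:graph:alias"


def load_aliases(alias_triples):
    """Collect alias -> [graphs] from triples asserted in the aliases graph."""
    aliases = {}
    for alias, predicate, graph in alias_triples:
        if predicate == ALIAS_PREDICATE:
            aliases.setdefault(alias, []).append(graph)
    return aliases


def resolve_dataset(graphs, aliases):
    """Replace each alias in a FROM / FROM NAMED list by the graphs it names."""
    resolved = []
    for graph in graphs:
        # Non-alias IRIs pass through unchanged; duplicates are dropped.
        for target in aliases.get(graph, [graph]):
            if target not in resolved:
                resolved.append(target)
    return resolved


# The definition from the TriG snippet above: :graph points to :g1 and :g2.
aliases = load_aliases([
    (":graph", ALIAS_PREDICATE, ":g1"),
    (":graph", ALIAS_PREDICATE, ":g2"),
])
print(resolve_dataset([":graph", ":g3"], aliases))
```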
Using Java
The Stardog Java API provides a convenient mechanism for retrieving and updating graph aliases based on the com.complexible.stardog.query.GraphAliases interface available from the database’s Connection object. Under the hood it simply fetches and updates data in <tag:stardog:api:graph:aliases>.
Every Connection gets its own snapshot of aliases which will not be affected by concurrent transactions updating aliases in the database.
Integration with Other Features
Named graph aliases interact with several Stardog features, particularly Named Graph Security and virtual graphs. The latter is pretty straightforward: one can define an alias for any combination of local and virtual graphs and the FROM or FROM NAMED statements for that alias will be replaced by those with the corresponding local and virtual graph IRIs. That will happen before the query engine starts any VG-specific processing of the query, like applying mappings, establishing a connection to the remote data source, etc.
As far as Named Graph Security is concerned, aliases behave like regular graphs. It is possible to define read and write permissions for an alias. If a user is allowed to read the graph :g, and :g happens to be an alias for :g1 union :g2, a query using :g in its dataset will be allowed to read :g1 and :g2. If a query uses :g1 or :g2 in its dataset directly, the user will need explicit permissions to access those graphs.
Limitation of Aliases
Named graph aliases have the following limitations:
- They cannot be used with the GRAPH keyword, in CONSTRUCT, INSERT or UPDATE templates, or in ADD/DROP/CLEAR/COPY/MOVE queries.
- Aliases cannot be defined for other aliases.
Some of these restrictions may be lifted in the future.
UNNEST Operator and Arrays
Stardog includes an UNNEST operator as a SPARQL extension. Similar to the BIND operator, UNNEST introduces new variable bindings as a result of evaluating an expression. The key difference is that UNNEST may produce more than one binding for each input solution. This is useful when dealing with arrays.
Arrays can be created with the set and split functions. The UNNEST operator allows transforming an array into a set of solutions. For example, consider the following query:
select ?person ?name {
?person :names ?csvNameString
UNNEST(split(?csvNameString, ",") as ?name)
}
If we match a triple which binds ?person to <urn:John> and ?csvNameString to "John,Johnny", the following solutions will be returned for the query:
?person | ?name |
---|---|
<urn:John> | "John" |
<urn:John> | "Johnny" |
If the array has no elements or evaluation of the source expression produces an error, the target variable will be unbound.
In addition to elements, UNNEST can bind array indices with the ARRAY_INDEX argument:
select ?idx ?person ?name {
?person :names ?csvNameString
UNNEST(split(?csvNameString, ",") as ?name, array_index as ?idx)
}
With the same data as above this query will produce the following solutions:
?idx | ?person | ?name |
---|---|---|
1 | <urn:John> | "John" |
2 | <urn:John> | "Johnny" |
Note that indexing is one-based, which is similar to how ROW_NUMBER() behaves in some SQL systems.
UNNEST is governed by the same scope principles as BIND. Variables used in the expression must precede the UNNEST operator syntactically. References to the variables which are being assigned must occur syntactically after the UNNEST operator.
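The behaviour of UNNEST over split can be emulated on plain dictionaries to make the one-row-per-element expansion and the one-based index explicit. This is a sketch of the semantics described above, with an invented function name; it treats a missing source value as the error case and leaves the target unbound.

```python
def unnest_split(solutions, source, separator, target, index=None):
    """Emulate UNNEST(split(?source, sep) as ?target[, array_index as ?index])."""
    for solution in solutions:
        value = solution.get(source)
        parts = value.split(separator) if isinstance(value, str) else []
        if not parts:
            # No elements or an evaluation error: the target stays unbound.
            yield dict(solution)
            continue
        for position, part in enumerate(parts, start=1):  # one-based index
            row = dict(solution)
            row[target] = part
            if index:
                row[index] = position
            yield row


solutions = [{"person": "urn:John", "csvNameString": "John,Johnny"}]
for row in unnest_split(solutions, "csvNameString", ",", "name", index="idx"):
    print(row["idx"], row["person"], row["name"])
```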
Plan Queries
A query plan returned from the query explain command can be executed with the query execute command in the same manner as SPARQL queries.
The plan needs to be in a verbose format, which can be achieved with the --verbose flag:
$ stardog query explain --verbose myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
This will produce an output similar to this:
QueryPlan
Slice(offset=0, limit=10)
`─ Distinct
`─ Projection(?s)
`─ Scan[S](?s, ?p, ?o)
Assuming the output is saved in a file named query.plan, the following is equivalent to running the original query:
$ stardog query execute myDB query.plan
Editing this plan directly can help with performance debugging and fine-tuning; see Query Plan Syntax for details on the query plan format.
Plan queries are currently supported in the CLI, Java API, and HTTP API.
Chapter Contents
- Path Queries - query Stardog to find paths between nodes in an RDF graph
- Full-text Search - query Stardog for RDF literals present in Stardog
- Geospatial Queries - discusses Stardog's support for geospatial queries
- GraphQL Queries - discusses Stardog's support for querying data using GraphQL
- Edge Properties - discusses Stardog's support for edge properties, bridging the gap between the RDF data model and the Property Graph data model.
- Testing Queries - Test the correctness and performance of your queries automatically
- BI Tools and SQL Queries - query Stardog using business intelligence tools like Tableau
- Query Functions - learn about query functions supported by Stardog
- Stored Query Service - use stored queries as building blocks for larger queries
- Obfuscating Data - discusses obfuscating datasets and queries in Stardog.
- Query Hints - contains all query hints available in Stardog
- Sampling Service - use sampling service to sample over a triple pattern
- Sequence Service - use the sequence service to generate numerical sequences