Query Stardog

This chapter covers the ways you can query your Stardog knowledge graph. This page describes the basics of querying Stardog; see the Chapter Contents for the other topics in this chapter.

Page Contents

Overview
Executing Queries
Path Queries
DESCRIBE Queries
Federated Queries
Namespaces
Query Functions
RDF List Functions
Special Named Graphs
  Examples
  Relationship to query.all.graphs
Named Graph Aliases
Excluding Graphs from Query’s Dataset
UNNEST Operator and Arrays
Plan Queries
Chapter Contents

Overview

Stardog supports the SPARQL 1.1 query language along with OWL and rule reasoning. The sections below cover the basics.

Executing Queries

To execute a SPARQL query against a Stardog database with the CLI, use the query execute subcommand with a query string, a query file, or the name of a stored query.

$ stardog query execute myDb "select * where { ?s ?p ?o }"

Any SPARQL query type (SELECT, CONSTRUCT, DESCRIBE, PATHS, ASK or any update query type) can be executed using this command.

Reasoning can be enabled by using the --reasoning flag (or -r for short):

$ stardog query execute --reasoning myDb "select * where { ?sub rdfs:subClassOf ?super }"

By default, all Stardog CLI commands assume the server is running on the same machine as the client, using port 5820. You can interact with a server running on another machine using a full connection string:

$ stardog query execute http://myHost:9090/myDb "select * where { ?s ?p ?o }"

Detailed information on using the query execute command in Stardog can be found in the query execute man page.

Path Queries

Stardog extends SPARQL for path queries to find paths between two nodes in a graph. Path queries are similar to SPARQL property paths, which recursively traverse a graph and find two nodes connected via a complex path of edges. But SPARQL property paths only return the start and end nodes of a path. Stardog path queries return all intermediate nodes on the path and allow arbitrary SPARQL patterns to be used in the query.

Here’s a simple path query to find how Alice and Charlie are connected to each other:

$ stardog query execute exampleDB "PATHS START ?x = :Alice END ?y = :Charlie VIA ?p"
+----------+------------+----------+
|    x     |     p      |    y     |
+----------+------------+----------+
| :Alice   | :knows     | :Bob     |
| :Bob     | :worksWith | :Charlie |
|          |            |          |
| :Alice   | :worksWith | :Carol   |
| :Carol   | :knows     | :Charlie |
+----------+------------+----------+

Query returned 2 paths in 00:00:00.056

Each row of the result table shows one edge. Adjacent edges are printed on subsequent rows of the table. Multiple paths in the results are separated by an empty row.

By default, path queries return only the shortest paths. See the Path Queries section for details about finding different kinds of paths, e.g., all paths (not just the shortest ones), paths between all nodes, and cyclic paths.
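The table layout described above (one edge per row, an empty row between paths) can be post-processed mechanically. The following is a minimal Python sketch, not part of Stardog, that groups the rows of the sample output into separate paths. The function name `parse_paths` is hypothetical, and the parser assumes that cell values contain no `|` characters.

```python
def parse_paths(table_text):
    """Group the edge rows of a PATHS result table into separate paths."""
    paths, current, header_seen = [], [], False
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip +---+ borders and any surrounding text
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if not header_seen:
            header_seen = True  # the first | row holds the variable names
            continue
        if all(cell == "" for cell in cells):
            # an empty row separates two paths
            if current:
                paths.append(current)
                current = []
        else:
            current.append(tuple(cells))
    if current:
        paths.append(current)
    return paths


SAMPLE = """\
+----------+------------+----------+
|    x     |     p      |    y     |
+----------+------------+----------+
| :Alice   | :knows     | :Bob     |
| :Bob     | :worksWith | :Charlie |
|          |            |          |
| :Alice   | :worksWith | :Carol   |
| :Carol   | :knows     | :Charlie |
+----------+------------+----------+
"""

paths = parse_paths(SAMPLE)
for path in paths:
    # print the start node followed by the end node of every edge
    print(" -> ".join([path[0][0]] + [edge[2] for edge in path]))
```

For programmatic use, a structured result format is likely a better fit than parsing the text table; the sketch only illustrates how the rows map to paths.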

DESCRIBE Queries

SPARQL provides a DESCRIBE query type that returns a subgraph containing information about a resource:

DESCRIBE <theResource>

SPARQL’s DESCRIBE keyword is deliberately underspecified. By default, a DESCRIBE query in Stardog retrieves all the triples for which <theResource> is the subject. There are other reasonable ways to implement DESCRIBE, so Stardog provides two additional describe strategies out of the box. The desired describe strategy can be selected with a query hint. For example, the following query will return all the triples where <theResource> is either the subject or the object:

#pragma describe.strategy bidirectional

DESCRIBE <theResource>

The other built-in describe strategy returns the Concise Bounded Description (CBD) of the given resource:

#pragma describe.strategy cbd

DESCRIBE <theResource>

The default describe strategy can be changed by setting the query.describe.strategy database configuration option. Finally, it is possible to implement a custom describe strategy by implementing a simple Java interface. An example can be found in the stardog examples repo.
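When queries are generated by an application, the hint can be prepended as plain text. The following Python sketch assumes only the pragma syntax shown above; the helper name `describe_query` and the IRI `urn:example:resource` are hypothetical.

```python
def describe_query(resource_iri, strategy=None):
    """Build a DESCRIBE query, optionally with a describe.strategy hint.

    With strategy=None no hint is emitted, so the database default applies.
    """
    query = f"DESCRIBE <{resource_iri}>"
    if strategy is None:
        return query
    # the hint is a comment line placed before the query text
    return f"#pragma describe.strategy {strategy}\n\n{query}"


print(describe_query("urn:example:resource", "cbd"))
```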

Federated Queries

Stardog supports the SERVICE keyword, which allows users to query distributed RDF via SPARQL-compliant data sources. You can use this to federate queries between several Stardog databases or Stardog and other public endpoints.

You can also use service variables in your queries to dynamically select the endpoints for federated queries. For example:

{
  ?service a :MyService .

  SERVICE ?service { ... }
}

Stardog ships with a default Service implementation which uses the SPARQL Protocol to send the service fragment to the remote endpoint and retrieve the results. Any endpoint that conforms to the SPARQL protocol can be used.

The Stardog SPARQL endpoint is http://<server>:<port>/{db}/query.
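As an illustration of the endpoint pattern, the following Python sketch builds (but does not send) a SPARQL Protocol POST request using only the standard library. The host, port, and database name are placeholders, the helper name `build_query_request` is hypothetical, and authentication is left out on purpose; a real request would need credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(server, port, db, sparql):
    """Build a SPARQL Protocol request for http://<server>:<port>/{db}/query."""
    url = f"http://{server}:{port}/{db}/query"
    # the SPARQL Protocol accepts the query as a URL-encoded form parameter
    body = urlencode({"query": sparql}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_query_request("localhost", 5820, "myDb", "select * where { ?s ?p ?o }")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a running server.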

HTTP Authentication

Stardog requires authentication. If the endpoint you’re referencing with the SERVICE keyword requires HTTP authentication, credentials are stored in a password file called services.sdpass located in the STARDOG_HOME directory. The default Service implementation assumes HTTP BASIC authentication; for services that use DIGEST auth, or a different authentication mechanism altogether, you’ll need to implement a custom Service.

HTTP Credentials Passthrough

In addition to authenticating with a password file, Stardog can reuse the credentials of the current user via a passthrough mechanism. This needs to be explicitly enabled with the service.sparql.credentials.passthrough.regex database option, and the service endpoint must match the specified regex.

The following will enable passthrough for all services:

$ stardog-admin metadata set service.sparql.credentials.passthrough.regex='.*' my_db_name

Querying Local Databases

Stardog contains a specialized service implementation that lets users query other databases stored in the server without going through HTTP. The user executing the query will still be authenticated, just via Stardog authentication. In other words, the user executing the query must have proper permissions to read from the database they are attempting to query. The URI to follow the SERVICE keyword must begin with db://, followed by the database name.

Here’s an example querying a database named “books”:

SELECT * {
    SERVICE <db://books> {
        ?s ?p ?o
    }
}

Namespaces

Stardog allows users to store and manage custom namespace prefix bindings for each database. These stored namespaces allow users to omit prefix declarations in Turtle files and SPARQL queries. The Database Administration section describes how to manage these namespace prefixes in detail.

Stored namespaces allow one to use Stardog without declaring a single namespace prefix. Stardog will use its default namespace (http://api.stardog.com/) behind the scenes so that everything will still be valid RDF, but users won’t need to deal with namespaces manually. Stardog will act as if there are no namespaces, which in some cases is exactly what you want!

For example, let’s assume we have some data that does not contain any namespace declarations:

:Alice a :Person ;
       :knows :Bob .

We can create a database using this file directly:

$ stardog-admin db create -n myDb data.ttl

We can also add this file to the database after it is created. After the data is loaded, we can execute SPARQL queries without prefix declarations:

$ stardog query execute myDb "SELECT * { ?person a :Person }"
+--------+
| person |
+--------+
| :Alice |
+--------+

Query returned 1 results in 00:00:00.111

Once we export the data from this database, the default (i.e., built-in) prefix declarations will be printed, but otherwise we will get the same serialization as in the original data file:

$ stardog data export myDb

@prefix : <http://api.stardog.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix stardog: <tag:stardog:api:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:Alice a :Person ;
       :knows :Bob .
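To make the effect of the stored namespaces concrete, the following Python sketch expands prefixed names into full IRIs using the prefix declarations from the export above. It is an illustration of the mapping only, not of Stardog's implementation; the function name `expand` is hypothetical.

```python
# prefix -> namespace, copied from the exported prefix declarations
STORED_NAMESPACES = {
    "": "http://api.stardog.com/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "stardog": "tag:stardog:api:",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, namespaces=STORED_NAMESPACES):
    """Expand a prefixed name such as ':Alice' or 'rdf:type' to a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return namespaces[prefix] + local


print(expand(":Alice"))
print(expand("rdf:type"))
```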

Query Functions

Stardog supports all of the functions from the SPARQL spec, as well as some others from XPath and SWRL. See SPARQL Query Functions for a complete list of built-in functions supported.

Any of the supported functions can be used in queries or rules. Note that some functions appear in multiple namespaces, but using any of the namespaces will work. Namespaces can be omitted when calling functions, too.

If the same name is used for different functions in different namespaces, the precedence is given to the standard functions. It is best practice to use the explicit namespace for such functions to avoid ambiguity.

XPath comparison and arithmetic operators on duration, date, and time values are supported by overloading the corresponding SPARQL operators such as =, >, +, -, etc.

In addition to the built-in functions, new functions can be defined by assigning a new name to a SPARQL expression. These function definitions can either be defined in-line in a query or stored in the system and used in any query or rule. Finally, custom function implementations can be implemented in a JVM-compatible language and registered in the system. See the query functions section for more details.

RDF List Functions

Stardog supports length and membership operations over RDF lists.

Assuming we have a database with the list defined like this:

@prefix stardog: <tag:stardog:api:> .

:literalList rdf:first "one" ;
             rdf:rest [
                rdf:first "two" ;
                rdf:rest [
                    rdf:first "three" ;
                    rdf:rest [
                        rdf:first "four" ;
                        rdf:rest [
                            rdf:first "two" ;
                            rdf:rest rdf:nil
                        ]
                    ]
                ]
            ] .

We can retrieve the length of this list:

$ stardog query mydb "SELECT ?length { :literalList stardog:list:length ?length }"
+---------------+
|    length     |
+---------------+
| "5"^^xsd:long |
+---------------+

Query returned 1 results in 00:00:00.135

And verify that it contains an element:

$ stardog query mydb 'ASK { :literalList stardog:list:member "two" }'
Result: true

Or find out where in the list this element occurs:

$ stardog query mydb 'SELECT ?index { :literalList stardog:list:member ("two" ?index) }'
+---------------+
|     index     |
+---------------+
| "1"^^xsd:long |
| "4"^^xsd:long |
+---------------+

Query returned 2 results in 00:00:00.128

We can also fetch an element at a given index (note zero-based indexing):

$ stardog query mydb "SELECT ?element { :literalList stardog:list:member (?element 0) }"
+---------+
| element |
+---------+
| "one"   |
+---------+

Query returned 1 results in 00:00:00.151

Or all elements along with their indices:

$ stardog query mydb "SELECT ?element ?index { :literalList stardog:list:member (?element ?index) }"
+---------+---------------+
| element |     index     |
+---------+---------------+
| "one"   | "0"^^xsd:long |
| "two"   | "1"^^xsd:long |
| "three" | "2"^^xsd:long |
| "four"  | "3"^^xsd:long |
| "two"   | "4"^^xsd:long |
+---------+---------------+

Query returned 5 results in 00:00:00.129

Or just elements:

$ stardog query mydb "SELECT ?element { :literalList stardog:list:member ?element }"
+---------+
| element |
+---------+
| "one"   |
| "two"   |
| "three" |
| "four"  |
| "two"   |
+---------+

Query returned 5 results in 00:00:00.132

List functions are not supported in combination with certain other features such as reasoning.
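The results above follow from walking the rdf:first/rdf:rest chain of the list. The following Python sketch models the example list as a dictionary and reproduces the length, membership, and index results. It is a simplified model rather than Stardog's implementation, and the blank node labels (`_:b1` and so on) are made up for illustration.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first value, rdf:rest node) for the example list
LIST_NODES = {
    ":literalList": ("one", "_:b1"),
    "_:b1": ("two", "_:b2"),
    "_:b2": ("three", "_:b3"),
    "_:b3": ("four", "_:b4"),
    "_:b4": ("two", RDF_NIL),
}


def list_members(head, nodes=LIST_NODES):
    """Yield (element, index) pairs by following rdf:rest until rdf:nil."""
    index, node = 0, head
    while node != RDF_NIL:
        first, rest = nodes[node]
        yield first, index  # indices are zero-based, as in the queries above
        index += 1
        node = rest


members = list(list_members(":literalList"))
length = len(members)
indices_of_two = [index for element, index in members if element == "two"]
print(length, indices_of_two, members[0][0])
```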

Special Named Graphs

Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are used when specifying the dataset for a query, that is, in the FROM and FROM NAMED clauses of a SPARQL query, or via the SPARQL Protocol, the CLI, etc. These graphs are read-only and cannot be updated. The following table lists the special named graph IRIs.

Named Graph IRI	Refers to
`tag:stardog:api:context:default`	the default (no) context graph
`tag:stardog:api:context:named`	all local named graphs, excluding the default graph
`tag:stardog:api:context:local`	all local graphs - the default graph and named graphs
`tag:stardog:api:context:virtual`	all virtual graphs
`tag:stardog:api:context:all`	all local and virtual graphs

Examples

Given a database with these three triples:

:d :from :default .
GRAPH :g1 {
  :n1 :from :g1 .
}
GRAPH :g2 {
  :n2 :from :g2 .
}

These are some queries that use the FROM clause to set the default graph, and their results:

SELECT * FROM <tag:stardog:api:context:default> {
  ?s ?p ?o
}

s	p	o
:d	:from	:default

SELECT * FROM :g1 {
  ?s ?p ?o
}

s	p	o
:n1	:from	:g1

SELECT * FROM <tag:stardog:api:context:named> {
  ?s ?p ?o
}

s	p	o
:n1	:from	:g1
:n2	:from	:g2

SELECT * FROM <tag:stardog:api:context:local> {
  ?s ?p ?o
}

s	p	o
:d	:from	:default
:n1	:from	:g1
:n2	:from	:g2

Likewise, these queries use the FROM NAMED clause to set the named graphs of the query dataset:

SELECT * {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default

SELECT * FROM NAMED <tag:stardog:api:context:default> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default

SELECT * FROM NAMED :g1 {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
:g1	:n1	:from	:g1

SELECT * FROM NAMED <tag:stardog:api:context:named> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2

SELECT * FROM NAMED <tag:stardog:api:context:local> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2

Multiple FROM and FROM NAMED clauses, as well as combinations of the two, are supported:

SELECT *
FROM <tag:stardog:api:context:default> 
FROM :g1
FROM NAMED :g2 {
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}

g	s	p	o
	:d	:from	:default
	:n1	:from	:g1
:g2	:n2	:from	:g2
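The FROM examples above can be reproduced with a small in-memory model. The following Python sketch resolves the special graph IRIs for local data to concrete graphs and collects the triples visible to the pattern { ?s ?p ?o }. It is a simplified illustration, not Stardog's implementation: virtual graphs are left out, and the names `resolve` and `select_from` are hypothetical.

```python
DEFAULT = "tag:stardog:api:context:default"
NAMED = "tag:stardog:api:context:named"
LOCAL = "tag:stardog:api:context:local"

# graph IRI -> triples; the default graph is stored under the DEFAULT key
DATA = {
    DEFAULT: [(":d", ":from", ":default")],
    ":g1": [(":n1", ":from", ":g1")],
    ":g2": [(":n2", ":from", ":g2")],
}


def resolve(graph_iri, data=DATA):
    """Map a (possibly special) graph IRI to the concrete graphs it denotes."""
    named = [graph for graph in data if graph != DEFAULT]
    if graph_iri == NAMED:
        return named
    if graph_iri == LOCAL:
        return [DEFAULT] + named
    return [graph_iri]


def select_from(from_iris, data=DATA):
    """Triples matched by { ?s ?p ?o } when FROM clauses list from_iris."""
    triples = []
    for iri in from_iris:
        for graph in resolve(iri, data):
            triples.extend(data.get(graph, []))
    # the default graph of the query is a merge, so drop duplicate triples
    return list(dict.fromkeys(triples))


print(select_from([NAMED]))
print(select_from([DEFAULT, ":g1"]))
```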

Relationship to `query.all.graphs`

When no dataset is provided, the default dataset depends on the setting of the query.all.graphs server configuration or database option. Note that when virtual transparency is enabled, the set of virtual graphs (tag:stardog:api:context:virtual) is included in all graphs.

When query.all.graphs is false the default scope of the query dataset is the default graph:

SELECT * {
  ?s ?p ?o
}

s	p	o
:d	:from	:default

and the named scope is the named graphs:

SELECT * {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2

When query.all.graphs is true the default scope of the query dataset is the default graph plus the named graphs (together known as the local graphs):

SELECT * {
  ?s ?p ?o
}

s	p	o
:d	:from	:default
:n1	:from	:g1
:n2	:from	:g2

and the named scope remains unchanged as the set of named graphs:

SELECT * {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2
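The effect of the option on a query without FROM clauses can be summarized in a few lines. The following Python sketch is a simplified model that ignores virtual transparency; the function name `query_scopes` is hypothetical.

```python
DEFAULT_GRAPH = "tag:stardog:api:context:default"


def query_scopes(query_all_graphs, named_graphs):
    """Return (default scope, named scope) for a query without FROM clauses."""
    named_scope = list(named_graphs)  # not affected by the option
    default_scope = [DEFAULT_GRAPH]
    if query_all_graphs:
        # default graph plus named graphs, i.e. the local graphs
        default_scope += named_scope
    return default_scope, named_scope


print(query_scopes(False, [":g1", ":g2"]))
print(query_scopes(True, [":g1", ":g2"]))
```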

Named Graph Aliases

Stardog enables users to create aliases for named graph IRIs appearing in the data. The aliases can be used in SPARQL queries and provide a layer of abstraction between the queries or applications and the data. In particular, queries can run against different graphs (local or virtual) when the alias definitions are changed. Importantly, neither the queries themselves nor the relevant HTTP parameters defining the query dataset need to change. That helps make data changes transparent to consumers (applications). This is best illustrated by an example.

A common data cleansing scenario involves data being imported into a staging graph (call it :staging), preprocessed (for example, validated using SHACL, cleaned, augmented, etc.), and then moved to a graph visible to currently deployed applications (call it :production). When the data is ready, it needs to be made available to applications. Without named graph aliases, this can be done in two ways:

by moving it from :staging to :production via SPARQL Update
by changing all query requests from :production to :staging

Both approaches have rather obvious shortcomings.

Named graph aliases rectify the problem by allowing users to declare :production as an alias which can be pointed to :staging as soon as the data is ready. That requires neither data movement nor changes on the query or application level.

To use named graph aliases, one must first set the graph.aliases database property to true. It can be done at database creation time or later.

Querying Aliases

Named graph aliases are IRIs which can appear after FROM and FROM NAMED keywords in read queries, as well as after USING and USING NAMED keywords in DELETE/INSERT/WHERE queries, for example:

select ?person ?name from :graph {
  ?person foaf:name ?name
}

insert { ?person a :Person } using :graph where {
  ?person foaf:name ?name
}

Assuming :graph is an alias for :g, Stardog will replace :graph with :g before processing the query. Although this is already quite powerful, named graph aliases are not restricted to this simple use case and generalize in two ways. First, an IRI can be an alias for a set of graphs in the data, not just one graph. Second, special graphs, as well as virtual graphs, can be used in the alias definition just as regular graphs are.

Adding and Updating Aliases

Named graph aliases are defined on a per-database basis, and the definitions are stored in the data as triples. The schema consists of a single predicate (<tag:stardog:api:graph:alias>) whose domain is the aliases and the range is actual graphs in the data. Alias definitions must be asserted in the special named graph <tag:stardog:api:graph:aliases>, as in the following TriG snippet:

<tag:stardog:api:graph:aliases> {
  :graph <tag:stardog:api:graph:alias> :g1, :g2 .
}
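The substitution that these definitions drive can be sketched in Python. The example below builds an alias map from triples like the ones in the snippet and expands a list of dataset IRIs before query processing. It is an illustrative model only; the function names and the graph :g3 are hypothetical.

```python
ALIAS_PREDICATE = "tag:stardog:api:graph:alias"

# (alias, predicate, graph) triples as asserted in the aliases graph
ALIAS_TRIPLES = [
    (":graph", ALIAS_PREDICATE, ":g1"),
    (":graph", ALIAS_PREDICATE, ":g2"),
]


def alias_map(triples):
    """Collect alias -> [graphs] from the alias definition triples."""
    mapping = {}
    for alias, predicate, graph in triples:
        if predicate == ALIAS_PREDICATE:
            mapping.setdefault(alias, []).append(graph)
    return mapping


def expand_dataset(graph_iris, aliases):
    """Replace each alias in a FROM / FROM NAMED list by its target graphs."""
    expanded = []
    for iri in graph_iris:
        # an IRI that is not an alias stands for itself
        for graph in aliases.get(iri, [iri]):
            if graph not in expanded:
                expanded.append(graph)
    return expanded


aliases = alias_map(ALIAS_TRIPLES)
print(expand_dataset([":graph", ":g3"], aliases))
```

Because aliases cannot be defined for other aliases, a single expansion pass is enough in this model.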

Using Java

Stardog’s Java API provides a convenient mechanism for retrieving and updating graph aliases based on the com.complexible.stardog.query.GraphAliases interface available from the database’s Connection object. Under the hood, it simply fetches and updates data in <tag:stardog:api:graph:aliases>.

Every Connection gets its own snapshot of aliases that will not be affected by concurrent transactions that update aliases in the database.

Integration with Other Features

Named graph aliases interact with several Stardog features, particularly Named Graph Security and virtual graphs. The latter is straightforward: one can define an alias for any combination of local and virtual graphs, and the FROM or FROM NAMED statements for that alias will be replaced by those with the corresponding local and virtual graph IRIs. That will happen before the query engine starts any VG-specific processing of the query like applying mappings, establishing a connection to the remote data source, etc.

As far as Named Graph Security is concerned, aliases behave like regular graphs. It is possible to define read and write permissions for an alias. If a user is allowed to read the graph :g, and :g happens to be an alias for :g1 union :g2, a query using :g in its dataset will be allowed to read :g1 and :g2. If a query uses :g1 or :g2 in its dataset directly, the user will need explicit permissions to access those graphs.

Limitations of Aliases

Named graph aliases have the following limitations:

They cannot be used in GRAPH keywords, CONSTRUCT, INSERT or UPDATE templates, or ADD/DROP/CLEAR/COPY/MOVE queries.
Aliases cannot be defined for other aliases.

Some of these restrictions may be lifted in the future.

Excluding Graphs from Query’s Dataset

SPARQL 1.1 only supports adding graphs to the query dataset but not excluding them. This gets inconvenient when the query should match data in a large set of graphs but not touch a small set of graphs. In the extreme case, it could be all graphs except for one. One common example is security where sensitive information can be stored in specific graphs that should be hidden from most queries executed by regular users. In such cases, the user has to explicitly list all graphs that the query should access even if the number of such graphs greatly exceeds those that the query should not access.

Stardog 10.1 lets users exclude specific graphs from the dataset by explicitly enumerating them while executing the query. Conceptually, the feature is the direct opposite of FROM [NAMED]: any excluded graph is removed from the query dataset, i.e. the query is executed as if the graph does not exist. The only difference is that there is only a single list of excluded graphs (rather than one list for the FROM part and another list for the FROM NAMED part).

Currently, the exclusion feature is only supported through the Java API (using the Dataset.excludedGraphs() method) and the HTTP API (using the exclude-graph-uri HTTP parameter). There is no explicit SPARQL syntax support for it, for now.

All kinds of graphs supported in Stardog can be excluded: regular graph IRIs, special graphs, aliases, and virtual graphs.
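For the HTTP API, the excluded graphs travel as request parameters next to the query. The following Python sketch encodes such a parameter list with the standard library. It assumes that exclude-graph-uri is repeated once per excluded graph, in the same way the SPARQL Protocol repeats default-graph-uri; that repetition, the helper name `query_params`, and the graph IRI `urn:sensitive` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def query_params(sparql, excluded_graphs=()):
    """URL-encode a query plus one exclude-graph-uri entry per excluded graph."""
    params = [("query", sparql)]
    # assumption: the parameter is repeated for every graph to exclude
    params += [("exclude-graph-uri", iri) for iri in excluded_graphs]
    return urlencode(params)


encoded = query_params("select * { ?s ?p ?o }", ["urn:sensitive"])
# decode again to show what the server side would receive
print(parse_qs(encoded))
```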

UNNEST Operator and Arrays

Stardog includes an UNNEST operator as a SPARQL extension. Similar to the BIND operator, UNNEST introduces new variable bindings as a result of evaluating an expression. The key difference is that UNNEST may produce more than one binding for each input solution. This is useful when dealing with arrays.

Arrays can be created with the set and split operators. The UNNEST operator allows transforming an array into a set of solutions. For example, consider the following query:

select ?person ?name {
  ?person :names ?csvNameString
  UNNEST(split(?csvNameString, ",") as ?name)
}

If we match a triple which binds ?person to <urn:John> and ?csvNameString to "John,Johnny", the following solutions will be returned for the query:

?person	?name
`<urn:John>`	`"John"`
`<urn:John>`	`"Johnny"`

If the array has no elements or evaluation of the source expression produces an error, the target variable will be unbound.

In addition to elements, UNNEST can bind array indices with the ARRAY_INDEX argument:

select ?idx ?person ?name {
  ?person :names ?csvNameString
  UNNEST(split(?csvNameString, ",") as ?name, array_index as ?idx)
}

With the same data as above, this query will produce the following solutions:

?idx	?person	?name
1	`<urn:John>`	`"John"`
2	`<urn:John>`	`"Johnny"`

Note that indexing is one-based, which is similar to how ROW_NUMBER() behaves in some SQL systems.

UNNEST is governed by the same scope principles as BIND. Variables used in the expression must precede the UNNEST operator syntactically. References to the variables which are being assigned must occur syntactically after the UNNEST operator.
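The behavior described in this section can be modeled in a few lines of Python: each input solution yields one output solution per array element, indices start at one, and an empty array or a failing expression leaves the target unbound. This is a sketch of the semantics as described above, not of Stardog's implementation; the function name `unnest` and the person `<urn:Jane>` are hypothetical.

```python
def unnest(solutions, array_of, target_var, index_var=None):
    """Expand each solution into one solution per element of array_of(solution)."""
    for solution in solutions:
        try:
            array = array_of(solution)
        except Exception:
            array = []  # an evaluation error is treated like an empty array
        if not array:
            # keep the solution, leaving the target variable unbound
            yield dict(solution)
            continue
        for index, element in enumerate(array, start=1):  # one-based indexing
            row = dict(solution)
            row[target_var] = element
            if index_var is not None:
                row[index_var] = index
            yield row


def split_names(solution):
    """Stand-in for split(?csvNameString, ",")."""
    return solution["csvNameString"].split(",")


solutions = [{"person": "<urn:John>", "csvNameString": "John,Johnny"}]
rows = list(unnest(solutions, split_names, "name", "idx"))
for row in rows:
    print(row["idx"], row["person"], row["name"])

# a solution without ?csvNameString makes the expression fail
unbound = list(unnest([{"person": "<urn:Jane>"}], split_names, "name"))
print(unbound)
```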

Plan Queries

Query plans returned from the query explain command can be executed with the query execute command in the same manner as SPARQL queries.

The plan needs to be in a verbose format, which can be achieved with the --verbose flag:

$ stardog query explain --verbose myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"

This will produce an output similar to this:

QueryPlan
Slice(offset=0, limit=10)
`─ Distinct
   `─ Projection(?s)
      `─ Scan[S](?s, ?p, ?o)

Assuming the output is saved in a file named query.plan, the following is equivalent to running the original query:

$ stardog query execute myDB query.plan

Editing this plan directly can help with performance debugging and fine-tuning - see Query Plan Syntax for details on query plan format.

Plan queries are currently supported in CLI, Java API, and HTTP API.

Chapter Contents

Path Queries - query Stardog to find paths between nodes in an RDF graph
Full-text Search - query Stardog for RDF literals present in Stardog
Geospatial Queries - discusses Stardog's support for geospatial queries
GraphQL Queries - discusses Stardog's support for querying data using GraphQL
BARQ Execution Engine - query Stardog using BARQ
Edge Properties - discusses Stardog's support for edge properties, bridging the gap between the RDF data model and the Property Graph data model.
Testing Queries - Test the correctness and performance of your queries automatically
BI Tools and SQL Queries - query Stardog using business intelligence tools like Tableau
Query Functions - learn about query functions supported by Stardog
Stored Query Service - use stored queries as building blocks for larger queries
Obfuscating Data - discusses obfuscating datasets and queries in Stardog.
Query Hints - contains all query hints available in Stardog
Sampling Service - use sampling service to sample over a triple pattern
Sequence Service - use the sequence service to generate numerical sequences
Statistics Service - use the statistics service to gain insights into your data
Label Service - use the label service to look up entities by labels and vice versa.