Query Stardog

This chapter covers the different ways to query your Stardog knowledge graph. This page describes the basics of querying Stardog; see the Chapter Contents for the other topics in this chapter.

Page Contents

Overview
Executing Queries
Path Queries
DESCRIBE Queries
Federated Queries
1. HTTP Authentication
2. Querying Local Databases
Namespaces
Query Functions
Special Named Graphs
1. Examples
Named Graph Aliases
Obfuscating
UNNEST Operator and Arrays
Plan Queries
Chapter Contents

Overview

Stardog supports the SPARQL 1.1 query language along with OWL and rule reasoning. Below we discuss some of the basics.

Executing Queries

To execute a SPARQL query against a Stardog database with the CLI, use the query execute subcommand with a query string, a query file, or the name of a stored query.

$ stardog query execute myDb "select * where { ?s ?p ?o }"

Any SPARQL query type (SELECT, CONSTRUCT, DESCRIBE, PATHS, ASK or any update query type) can be executed using this command.

Reasoning can be enabled by using the --reasoning flag (or -r for short):

$ stardog query execute --reasoning myDb "select * where { ?sub rdfs:subClassOf ?super }"

By default, all Stardog CLI commands assume the server is running on the same machine as the client using port 5820. But you can interact with a server running on another machine using a full connection string:

$ stardog query execute http://myHost:9090/myDb "select * where { ?s ?p ?o }"

Detailed information on using the query execute command in Stardog can be found in the query execute man page.

Path Queries

Stardog extends SPARQL with path queries, which find paths between two nodes in a graph. Path queries are similar to SPARQL property paths, which recursively traverse a graph to find two nodes connected via a complex path of edges. But SPARQL property paths return only the start and end nodes of a path, whereas Stardog path queries return all the intermediate nodes on the path and allow arbitrary SPARQL patterns to be used in the query.

Here’s a simple path query to find how Alice and Charlie are connected to each other:

$ stardog query execute exampleDB "PATHS START ?x = :Alice END ?y = :Charlie VIA ?p"
+----------+------------+----------+
|    x     |     p      |    y     |
+----------+------------+----------+
| :Alice   | :knows     | :Bob     |
| :Bob     | :worksWith | :Charlie |
|          |            |          |
| :Alice   | :worksWith | :Carol   |
| :Carol   | :knows     | :Charlie |
+----------+------------+----------+

Query returned 2 paths in 00:00:00.056

Each row of the result table shows one edge. Adjacent edges are printed on subsequent rows of the table. Multiple paths in the results are separated by an empty row.

Path queries by default return only the shortest paths. See the Path Queries section for details about finding different kinds of paths, e.g. all paths (not just shortest ones), paths between all nodes, and cyclic paths.
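To make the shortest-path behavior concrete, here is a minimal Python sketch that models it with a breadth-first search over a toy edge list. The edge list is an assumption reconstructed from the example results above, and the function name shortest_paths is hypothetical. This only illustrates the semantics (every shortest path, with all intermediate edges); it is not how Stardog evaluates path queries.

```python
from collections import deque

# Toy edge list mirroring the example results above (assumed data).
EDGES = [
    (":Alice", ":knows", ":Bob"),
    (":Bob", ":worksWith", ":Charlie"),
    (":Alice", ":worksWith", ":Carol"),
    (":Carol", ":knows", ":Charlie"),
]


def shortest_paths(edges, start, end):
    """Return every shortest directed path from start to end as a list of edges."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((s, p, o))

    found = []
    best = None  # length of the shortest path found so far
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        # BFS yields paths in nondecreasing length, so stop once they get longer.
        if best is not None and len(path) > best:
            break
        if node == end and path:
            best = len(path)
            found.append(path)
            continue
        # Never revisit a node already on this path (no cycles).
        visited = {start} | {o for _, _, o in path}
        for edge in outgoing.get(node, []):
            if edge[2] not in visited:
                queue.append((edge[2], path + [edge]))
    return found


# Print one edge per row, with an empty row between paths, like the CLI table.
for index, path in enumerate(shortest_paths(EDGES, ":Alice", ":Charlie")):
    if index:
        print()
    for x, p, y in path:
        print(x, p, y)
```

Each inner list corresponds to one path in the result table, and each tuple to one row (edge) of that path.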

DESCRIBE Queries

SPARQL provides a DESCRIBE query type that returns a subgraph containing information about a resource:

DESCRIBE <theResource>

SPARQL’s DESCRIBE keyword is deliberately underspecified. In Stardog, by default, a DESCRIBE query retrieves all the triples for which <theResource> is the subject. There are, of course, about seventeen thousand other ways to implement DESCRIBE. Starting with Stardog 5.3, two additional describe strategies are provided out of the box. The desired describe strategy can be selected by using a special query hint. For example, the following query will return all the triples where <theResource> is either the subject or the object:

#pragma describe.strategy bidirectional

DESCRIBE <theResource>

The other built-in describe strategy returns the CBD - Concise Bounded Description of the given resource:

#pragma describe.strategy cbd

DESCRIBE <theResource>

The default describe strategy can be changed by setting the query.describe.strategy database configuration option. Finally, it is also possible to implement a custom describe strategy by implementing a simple Java interface. An example can be found in the stardog examples repo.
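The difference between the default and the bidirectional strategies can be sketched in a few lines of Python. The triples and the describe function below are hypothetical and model only which triples are selected; the CBD strategy is left out because it also depends on blank-node closure.

```python
# Hypothetical triples used only to illustrate the two strategies.
TRIPLES = [
    (":theResource", ":knows", ":Bob"),
    (":Alice", ":knows", ":theResource"),
    (":Alice", ":knows", ":Bob"),
]


def describe(triples, resource, strategy="default"):
    """Select the triples a DESCRIBE of `resource` would return in this model."""
    if strategy == "default":
        # Default: triples where the resource is the subject.
        return [t for t in triples if t[0] == resource]
    if strategy == "bidirectional":
        # Bidirectional: triples where the resource is the subject or the object.
        return [t for t in triples if resource in (t[0], t[2])]
    raise ValueError(f"strategy not modeled in this sketch: {strategy}")


print(describe(TRIPLES, ":theResource"))
print(describe(TRIPLES, ":theResource", strategy="bidirectional"))
```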

Federated Queries

Stardog supports the SERVICE keyword which allows users to query distributed RDF via SPARQL-compliant data sources. You can use this to federate queries between several Stardog databases or Stardog and other public endpoints.

You can also use service variables in your queries to dynamically select the endpoints for federated queries, for example:

{
  ?service a :MyService .

  SERVICE ?service { ... }
}

Stardog ships with a default Service implementation which uses SPARQL Protocol to send the service fragment to the remote endpoint and retrieve the results. Any endpoint that conforms to the SPARQL protocol can be used.

The Stardog SPARQL endpoint is http://<server>:<port>/{db}/query.
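As a rough illustration of what a SPARQL Protocol client sends to this endpoint, the following Python sketch builds (but does not send) a GET request using only the standard library. The function name build_query_request is hypothetical, the host and database names are placeholders, and port 5820 is the default mentioned earlier. Authentication is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(server, port, db, sparql):
    """Build a SPARQL Protocol GET request for http://<server>:<port>/{db}/query."""
    url = f"http://{server}:{port}/{db}/query?" + urlencode({"query": sparql})
    # Ask for the standard SPARQL JSON results media type.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder host and database; nothing is sent over the network here.
request = build_query_request("localhost", 5820, "myDb", "select * where { ?s ?p ?o }")
print(request.full_url)
```

Actually sending the request (for example with urllib.request.urlopen) would also require credentials, since Stardog requires authentication.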

HTTP Authentication

Stardog requires authentication. If the endpoint you’re referencing with the SERVICE keyword requires HTTP authentication, the credentials are stored in a password file called services.sdpass located in the STARDOG_HOME directory. The default Service implementation assumes HTTP BASIC authentication; for services that use DIGEST auth, or a different authentication mechanism altogether, you’ll need to implement a custom Service.

Querying Local Databases

Stardog contains a specialized service implementation that lets users query other databases stored in the server without going through HTTP. The user executing the query will still be authenticated, just via Stardog authentication. In other words, the user executing the query must have proper permissions to read from the database they are attempting to query. The URI following the SERVICE keyword must begin with db:// followed by the database name.

Here’s an example querying a database named “books”.

SELECT * {
    SERVICE <db://books> {
        ?s ?p ?o
    }
}

Namespaces

Stardog allows users to store and manage custom namespace prefix bindings for each database. These stored namespaces allow users to omit prefix declarations in Turtle files and SPARQL queries. The Database Administration section describes how to manage these namespace prefixes in detail.

Stored namespaces allow one to use Stardog without declaring a single namespace prefix. Stardog will use its default namespace (http://api.stardog.com/) behind the scenes so that everything will still be valid RDF, but users won’t need to deal with namespaces manually. Stardog will act as if there are no namespaces, which in some cases is exactly what you want!

For example, let’s assume we have some data that does not contain any namespace declarations:

:Alice a :Person ;
       :knows :Bob .

We can create a database using this file directly:

$ stardog-admin db create -n myDb data.ttl

We can also add this file to the database after it is created. After the data is loaded, we can then execute SPARQL queries without prefix declarations:

$ stardog query execute myDb "SELECT * { ?person a :Person }"
+--------+
| person |
+--------+
| :Alice |
+--------+

Query returned 1 results in 00:00:00.111

Once we export the data from this database, the default (i.e., in-built) prefix declarations will be printed, but otherwise we will get the same serialization as in the original data file:

$ stardog data export myDb

@prefix : <http://api.stardog.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix stardog: <tag:stardog:api:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:Alice a :Person ;
       :knows :Bob .
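What happens "behind the scenes" is ordinary prefix expansion. The Python sketch below uses the prefix table from the export output above to turn a prefixed name into a full IRI. The expand helper is hypothetical and ignores details such as escaping; it only illustrates the idea.

```python
# Prefix table copied from the export output above; "" is the default namespace.
NAMESPACES = {
    "": "http://api.stardog.com/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "stardog": "tag:stardog:api:",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand a prefixed name such as ':Alice' or 'rdf:type' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    # Raises KeyError for a prefix that is not stored.
    return namespaces[prefix] + local


print(expand(":Alice"))
print(expand("rdf:type"))
```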

Query Functions

Stardog supports all of the functions from the SPARQL spec, as well as some others from XPath and SWRL. See SPARQL Query Functions for a complete list of built-in functions supported.

Any of the supported functions can be used in queries or rules. Note that some functions appear in multiple namespaces, but using any of the namespaces will work. Namespaces can be omitted when calling functions too.

If the same name is used for different functions in different namespaces, precedence is given to the standard functions. It is best practice to use the explicit namespace for such functions to avoid ambiguity.

XPath comparison and arithmetic operators on duration, date and time values are supported by overloading the corresponding SPARQL operators such as =, >, +, -, etc.

In addition to the built-in functions, new functions can be defined by assigning a new name to a SPARQL expression. These function definitions can either be defined inline in a query or stored in the system and used in any query or rule. Finally, custom functions can be implemented in a JVM-compatible language and registered in the system. See the query functions section for more details.

Special Named Graphs

Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are used when specifying the dataset for a query, that is, in the FROM and FROM NAMED clauses of a SPARQL query, or via the SPARQL Protocol, or CLI, etc. These graphs are read-only and cannot be updated. Following is a list of special named graph IRIs.

Named Graph IRI	Refers to
tag:stardog:api:context:default	the default (no) context graph
tag:stardog:api:context:named	all local named graphs, excluding the default graph
tag:stardog:api:context:local	all local graphs - the default graph and named graphs
tag:stardog:api:context:virtual	all virtual graphs (applicable when used with Virtual Transparency)
tag:stardog:api:context:all	all local graphs. If Virtual Transparency is enabled, all virtual graphs as well.

Examples

Given a database with these three triples:

:d :from :default .
GRAPH :g1 {
  :n1 :from :g1 .
}
GRAPH :g2 {
  :n2 :from :g2 .
}

These are some queries that use the FROM clause to set the default graph, and their results:

SELECT * {
  ?s ?p ?o
}

s	p	o
:d	:from	:default

SELECT * FROM <tag:stardog:api:context:default> {
  ?s ?p ?o
}

s	p	o
:d	:from	:default

SELECT * FROM <tag:stardog:api:context:named> {
  ?s ?p ?o
}

s	p	o
:n1	:from	:g1
:n2	:from	:g2

SELECT * FROM <tag:stardog:api:context:local> {
  ?s ?p ?o
}

s	p	o
:d	:from	:default
:n1	:from	:g1
:n2	:from	:g2

Likewise, these queries use the FROM NAMED clause to set the named graphs of the query dataset:

SELECT * {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default

SELECT * FROM NAMED <tag:stardog:api:context:default> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default

SELECT * FROM NAMED <tag:stardog:api:context:named> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2

SELECT * FROM NAMED <tag:stardog:api:context:local> {
  GRAPH ?g { ?s ?p ?o }
}

g	s	p	o
	:d	:from	:default
:g1	:n1	:from	:g1
:g2	:n2	:from	:g2

Multiple FROM and FROM NAMED clauses, as well as combinations of the two, are supported:

SELECT *
FROM <tag:stardog:api:context:default> 
FROM :g1
FROM NAMED :g2 {
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}

g	s	p	o
	:d	:from	:default
	:n1	:from	:g1
:g2	:n2	:from	:g2
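The FROM examples above can be modeled with a small Python sketch that resolves each special graph IRI to a set of local graphs and collects their triples. The dictionary layout and the function names resolve and from_clause are assumptions made for illustration; the virtual and all aliases are omitted because they depend on Virtual Transparency.

```python
DEFAULT = "tag:stardog:api:context:default"
NAMED = "tag:stardog:api:context:named"
LOCAL = "tag:stardog:api:context:local"

# Graph name -> triples for the example database; None stands for the default graph.
DATABASE = {
    None: [(":d", ":from", ":default")],
    ":g1": [(":n1", ":from", ":g1")],
    ":g2": [(":n2", ":from", ":g2")],
}


def resolve(graph_iri, database):
    """Map a (possibly special) graph IRI to the local graphs it refers to."""
    if graph_iri == DEFAULT:
        return [None]
    if graph_iri == NAMED:
        return [g for g in database if g is not None]
    if graph_iri == LOCAL:
        return list(database)
    return [graph_iri]


def from_clause(graph_iris, database):
    """Triples visible in the query's default graph for the given FROM IRIs."""
    triples = []
    for iri in graph_iris:
        for graph in resolve(iri, database):
            triples.extend(database.get(graph, []))
    return triples


print(from_clause([NAMED], DATABASE))
print(from_clause([DEFAULT, ":g1"], DATABASE))
```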

Named Graph Aliases

As of version 7.4.5, Stardog enables users to create aliases for named graph IRIs appearing in the data. The aliases can be used in SPARQL queries and provide a layer of abstraction between the queries or applications and the data. In particular, queries can run against different graphs (local or virtual) when the alias definitions are changed. Importantly, neither the queries themselves nor the relevant HTTP parameters defining the query dataset need to change. That helps make data changes transparent to consumers (applications). This is best illustrated by an example.

A common data cleansing scenario involves data being imported into a staging graph (call it :staging), preprocessed (for example, validated using SHACL, cleaned, augmented, etc.), and then moved to a graph visible to currently deployed applications (call it :production). When the data is ready, it needs to be made available to applications. Prior to 7.4.5 this could be done in two ways:

by moving it from :staging to :production via SPARQL Update
by changing all query requests from :production to :staging.

Both approaches have rather obvious shortcomings.

Named graph aliases rectify the problem by allowing users to declare :production as an alias which can be pointed to :staging as soon as the data is ready. That requires neither data movement nor changes on the query or application level.

To use named graph aliases one must first set the graph.aliases database property to true. It can be done at database creation time or later.

Querying Aliases

Named graph aliases are IRIs which currently can appear after FROM and FROM NAMED keywords in read queries as well as after USING and USING NAMED keywords in DELETE/INSERT/WHERE queries, for example:

select ?person ?name from :graph {
  ?person foaf:name ?name
}

insert { ?person a :Person } using :graph where {
  ?person foaf:name ?name
}

Assuming :graph is an alias for :g, Stardog will replace :graph by :g before processing the query. Although this is already quite powerful, named graph aliases are not restricted to this simple use case and generalize it in two ways. First, an IRI can be an alias for a set of graphs in the data, not just one graph. Second, special graphs as well as virtual graphs can be used in the alias definition just as regular graphs.

Adding and Updating Aliases

Named graph aliases are defined on a per-database basis and the definitions are stored in the data as triples. The schema consists of a single predicate <tag:stardog:api:graph:alias> whose domain is the aliases and whose range is the actual graphs in the data. Alias definitions must be asserted in the special named graph <tag:stardog:api:graph:aliases>, as in the following TriG snippet:

<tag:stardog:api:graph:aliases> {
  :graph <tag:stardog:api:graph:alias> :g1, :g2 .
}
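With a definition like the one above, alias substitution amounts to rewriting the dataset before the query is processed. Here is a minimal Python sketch of that rewriting step. The ALIASES mapping mirrors the TriG snippet, and expand_dataset is a hypothetical name; the sketch is a model of the substitution, not Stardog's implementation.

```python
# Alias -> actual graphs, mirroring the TriG snippet above.
ALIASES = {":graph": [":g1", ":g2"]}


def expand_dataset(graph_iris, aliases):
    """Replace each alias in a FROM / FROM NAMED list with the graphs it points to."""
    expanded = []
    for iri in graph_iris:
        # Non-alias IRIs are kept as they are; expansion is a single step because
        # aliases cannot be defined for other aliases.
        for target in aliases.get(iri, [iri]):
            if target not in expanded:
                expanded.append(target)
    return expanded


print(expand_dataset([":graph"], ALIASES))
print(expand_dataset([":g3", ":graph", ":g1"], ALIASES))
```

Pointing the alias at a different graph only changes the ALIASES mapping; the list passed in by the query stays the same, which is the abstraction the aliases are meant to provide.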

Using Java

The Stardog Java API provides a convenient mechanism for retrieving and updating graph aliases based on the com.complexible.stardog.query.GraphAliases interface available from the database’s Connection object. Under the hood it simply fetches and updates data in <tag:stardog:api:graph:aliases>.

Every Connection gets its own snapshot of aliases which will not be affected by concurrent transactions updating aliases in the database.

Integration with Other Features

Named graph aliases interact with several Stardog features, particularly Named Graph Security and virtual graphs. The latter is straightforward: one can define an alias for any combination of local and virtual graphs, and the FROM or FROM NAMED clauses for that alias will be replaced by those with the corresponding local and virtual graph IRIs. That will happen before the query engine starts any VG-specific processing of the query, like applying mappings, establishing a connection to the remote data source, etc.

As far as Named Graph Security is concerned, aliases behave like regular graphs. It is possible to define read and write permissions for an alias. If a user is allowed to read the graph :g, and :g happens to be an alias for :g1 union :g2, a query using :g in its dataset will be allowed to read :g1 and :g2. If a query uses :g1 or :g2 in its dataset directly, the user will need explicit permissions to access those graphs.

Limitations of Aliases

Named graph aliases have the following limitations:

They cannot be used in GRAPH keywords, CONSTRUCT, INSERT or UPDATE templates or ADD/DROP/CLEAR/COPY/MOVE queries.
Aliases cannot be defined for other aliases.

Some of these restrictions may be lifted in the future.

Obfuscating

When sharing sensitive RDF data with others, you might want to (selectively) obfuscate it so that sensitive bits are not present, but non-sensitive bits remain. For example, this feature can be used to submit Stardog bug reports using sensitive data.

Data obfuscation works much the same way as the data export command and supports the same set of arguments:

$ stardog data obfuscate myDatabase obfDatabase.ttl

By default, all URIs, bnodes, and string literals in the database will be obfuscated using the SHA256 message digest algorithm. Non-string typed literals (numbers, dates, etc.) are left unchanged as well as URIs from built-in namespaces (RDF, RDFS, and OWL). It’s possible to customize obfuscation by providing a configuration file.

$ stardog data obfuscate --config obfConfig.ttl myDatabase obfDatabase.ttl

The configuration specifies which URIs and strings will be obfuscated by defining inclusion and exclusion filters. See the example configuration file in the stardog-examples GitHub repo.
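The default behavior described above (hash everything except URIs from the built-in namespaces) can be sketched in Python with hashlib. This is only an illustration of the idea: the text does not state how Stardog formats obfuscated values, so the hypothetical obfuscate_iri helper simply returns the bare hexadecimal SHA-256 digest.

```python
import hashlib

# Built-in namespaces (RDF, RDFS, and OWL) whose URIs are left unchanged.
BUILT_IN_NAMESPACES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
)


def obfuscate_iri(iri):
    """Hash an IRI with SHA-256 unless it belongs to a built-in namespace."""
    if iri.startswith(BUILT_IN_NAMESPACES):
        return iri
    # Assumption: the output form is a bare hex digest; Stardog's may differ.
    return hashlib.sha256(iri.encode("utf-8")).hexdigest()


print(obfuscate_iri("http://www.w3.org/2000/01/rdf-schema#label"))
print(obfuscate_iri("http://api.stardog.com/Alice"))
```

Because the digest is deterministic, the same transformation applied to a query maps each IRI to the same obfuscated value as in the data, which is why query obfuscation has to use the same configuration.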

Once the data is obfuscated, queries written against the original data will no longer work. Stardog provides query obfuscation capability, too, so that queries can be executed against the obfuscated data. If a custom configuration file is used to obfuscate the data, then the same configuration should be used for obfuscating the queries as well:

$ stardog query obfuscate --config obfConfig.ttl myDatabase myQuery.sparql > obfQuery.ttl

UNNEST Operator and Arrays

Stardog includes an UNNEST operator as a SPARQL extension. Similar to the BIND operator, UNNEST introduces new variable bindings as a result of evaluating an expression. The key difference is that UNNEST may produce more than one binding for each input solution. This is useful when dealing with arrays.

Arrays can be created with the set and split functions. The UNNEST operator allows transforming an array into a set of solutions. For example, consider the following query:

select ?person ?name {
  ?person :names ?csvNameString
  UNNEST(split(?csvNameString, ",") as ?name)
}

If we match a triple which binds ?person to <urn:John> and ?csvNameString to "John,Johnny", the following solutions will be returned for the query:

?person	?name
<urn:John>	"John"
<urn:John>	"Johnny"

If the array has no elements or evaluation of the source expression produces an error, the target variable will be unbound.

UNNEST is governed by the same scope principles as BIND. Variables used in the expression must precede the UNNEST operator syntactically. References to the variable which is being assigned must occur syntactically after the UNNEST operator.
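The one-to-many behavior of UNNEST can be modeled in Python as a function over a list of solutions (dictionaries of variable bindings). The unnest function below is a hypothetical sketch of the semantics described above, with None standing in for an unbound variable; it is not Stardog's implementation.

```python
def unnest(solutions, source, target):
    """Emit one solution per array element produced by `source` for each input.

    `source` maps a solution to a list; `target` is the new variable name.
    """
    results = []
    for solution in solutions:
        try:
            values = source(solution)
        except Exception:
            # An evaluation error leaves the target unbound.
            values = []
        if not values:
            # Empty array: keep the solution with the target unbound (None).
            results.append({**solution, target: None})
        else:
            for value in values:
                results.append({**solution, target: value})
    return results


solutions = [{"person": "<urn:John>", "csvNameString": "John,Johnny"}]
rows = unnest(solutions, lambda s: s["csvNameString"].split(","), "name")
for row in rows:
    print(row["person"], row["name"])
```

Unlike BIND, which adds exactly one binding per input solution, the loop here can emit several output solutions for a single input.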

Plan Queries

A query plan returned by the query explain command can be executed with the query execute command in the same manner as a SPARQL query.

The plan needs to be in the verbose format, which is produced with the --verbose flag:

$ stardog query explain --verbose myDB "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"

This will produce an output similar to this:

QueryPlan
Slice(offset=0, limit=10)
`─ Distinct
   `─ Projection(?s)
      `─ Scan[S](?s, ?p, ?o)

Assuming the output is saved in a file named query.plan, the following is equivalent to running the original query:

$ stardog query execute myDB query.plan

Editing this plan directly can help with performance debugging and fine-tuning - see Query Plan Syntax for details on query plan format.

Plan queries are currently supported in the CLI, the Java API, and the HTTP API.

Chapter Contents

Path Queries - query Stardog to find paths between nodes in an RDF graph
Full-text Search - query Stardog for RDF literals present in Stardog
Geospatial Queries - discusses Stardog's support for geospatial queries
GraphQL Queries - discusses Stardog's support for querying data using GraphQL
Edge Properties - discusses Stardog's support for edge properties, bridging the gap between the RDF data model and the Property Graph data model.
Testing Queries - Test the correctness and performance of your queries automatically
BI Tools and SQL Queries - query Stardog using business intelligence tools like Tableau
Query Functions - learn about query functions supported by Stardog
Stored Query Service - use stored queries as building blocks for larger queries