Link Search Menu Expand Document
Start for Free

Getting Started Part 4: Learn SPARQL in Studio

Put Knowledge Graph concepts in action using SPARQL

Page Contents
  1. Introduction
  2. Getting Started
    1. Understanding the Schema
    2. A relational aside …
  3. Using Studio
  4. Introduction to SPARQL
    1. SELECT Queries
    2. Triple Patterns
    3. Basic Graph Patterns
    4. Ordering Results
    5. Slicing Results
    6. Filtering Results
    7. Binding Values
    8. Removing Duplicates
    9. Aggregation
    10. Joins
    11. Grouping
    12. Subqueries
    13. Alternatives
    14. Optional Matches
    15. Negation
    16. Property Paths
      1. Inverse Path
      2. Sequence Path
      3. Recursive Paths
      4. Optional Paths
      5. Alternative Paths
  5. Shortest Path Queries
  6. SPARQL Query Forms
    1. ASK Queries
    2. DESCRIBE Queries
    3. CONSTRUCT Queries
  7. Update Queries
  8. What’s next?

Introduction

This is part of the Stardog Getting Started series, which puts Knowledge Graph concepts in action. This tutorial introduces the SPARQL query language. We’ll start briefly in Explorer, but work primarily in Studio.

Before you dive in, the introduction to the Getting Started series, Getting Started: Part 1 is a good pre-read.

This tutorial corresponds to Stardog’s SPARQL Tutorial knowledge kit, and we recommend you install this knowledge kit so you can follow along.

Getting Started

To view the data model for this Kit in Explorer, select a connection and install it in the section “Install this Knowledge Kit?”, on the top of this page. Then, click the Visualize button. You should see something like this:

Music Schema

Understanding the Schema

In this guide we’ll use the term “schema”, but you may hear terms like “data model” and “ontology” that all mean approximately the same thing - what kind of information is represented in the data and how is it related.

The simplest elements of the schema are Classes and Relationships. Classes are the distinct concepts that are represented. Relationships are how those classes are related. There are also Datatype Properties, or Attributes, the basic information or descriptors about an specific instance of a class (e.g. age or serial number).

In this example, you’ll see our basic concepts, such as Band and Album, and their relations to other concepts, e.g. Bands are the artist for Albums and those Albums have via track a Track list. The attributes, such as an Album’s release year are not pictured in Explorer, you will see those in Designer, which is our tool for building Knowledge Graphs.

A relational aside …

For those familiar with relational databases, we can already see a benefit of using the graph-based model. In a relational system, it is straightforward to model people, bands, songs, and that people are in bands. The part of the schema would look something like this:

relational example

But how do we augment this to allow both a person and a band to sing songs?

  • We could make separate bandSong and personSong tables and then have a view across them.
  • We could create a concept of a performers table which has both a performerID value and a column for performerType (band or person).
  • We could have a songPerformers table that has songID, performerID, and then typeID

All of these are reasonable and depend on what we want to do now and expect down the road, but we have to choose one. With our graph database, we don’t have to make this choice, and our schema much more closely matches our intuition and what we would draw on the whiteboard.

Using Studio

Now that we’ve explored this data a bit in Explorer to learn what it looks like, it’s time to learn SPARQL. For this you will need to jump over to Studio, head into the workspace, and select the database you installed this kit into.

Then, select the graph stardog-tutorial:music:music_data to ensure all of the queries you run are against the correct graph. It should look like this:

You can now copy any query from this tutorial, paste it into Studio, and click on “Run” to visualize your results.

Please note that you will need to ensure the SPARQL type (bottom right) is selected for your Editor Tab.

Introduction to SPARQL

Stardog Knowledge Graph supports the SPARQL query language, a W3C standard for querying RDF graphs. In the RDF Graph Data Model tutorial we looked at the details of RDF graphs and in this tutorial we will learn how to query them.

SELECT Queries

The main query form in SPARQL is a SELECT query which, by design, looks a bit like a SQL query. A SELECT query has two main components: a list of selected variables and a WHERE clause for specifying the graph patterns to match:

SELECT <variables>
WHERE {
   <graph-pattern>
}

The result of a SELECT query is a table where there will be one column for each selected variable and one row for each pattern match.

Triple Patterns

The basic building block for SPARQL queries is triple patterns. A triple pattern is just like an RDF graph triple, but you can use a variable in any one of the three positions. We use triple patterns to find the matching triples in a graph and variables act like wildcards that match any node.

For example,

# 01a-albums.sparql
# dataset: beatles
prefix : <http://stardog.com/tutorial/>

SELECT ?album
WHERE {
   ?album rdf:type :Album .
}

Here we see a simple SELECT query with a single triple pattern: ?album rdf:type :Album.

Stardog stores namespaces in database metadata so we do not need to include the prefix declarations for every query.

This triple pattern will match all the triples in the graph that have rdf:type as the predicate and :Album as the object. There are three matching triples in our graph so the query result will look like this:

album
:Please_Please_Me
:McCartney
:Imagine

There are several syntactic simplifications we can do to this query:

  1. the WHERE keyword is optional for SELECT queries and can be omitted
  2. if all the variables are being selected, then * can be used instead of enumerating them explicitly
  3. just like in RDF, the keyword a can be used instead of rdf:type
  4. we can omit the trailing . for the last triple pattern

If we apply all these simplifications we get the following equivalent query, which we show on one line here to emphasize its concision:

# 01b-albums.sparql
# dataset: beatles
prefix : <http://stardog.com/tutorial/>

SELECT * 
{ 
    ?album a :Album 
}

SPARQL keywords are case-insensitive so one can use lowercase keywords like select instead of SELECT. We recommend consistency.

Basic Graph Patterns

When one or more triple patterns are used together they form what is known as a Basic Graph Pattern (BGP). Let’s add one more triple pattern to our previous query to retrieve the artist for each album:

# 02-albums-artists.sparql
# dataset: beatles
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album .
   ?album :artist ?artist .
}

The second triple pattern in this query will match the triples with :artist predicate and we will get a result table with two columns:

album artist
:Please_Please_Me :The_Beatles
:McCartney :Paul_McCartney
:Imagine :John_Lennon

Now let’s add a third triple pattern to require that the returned artists should be of the SoloArtist type:

# 03-albums-solo-artists.sparql
# dataset: beatles
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album .
   ?album :artist ?artist .
   ?artist a :SoloArtist .
}

The third pattern matches four triples in our graph (one match for each member of The Beatles), but only two of those matches are compatible with the matches of the previous two patterns so only two results will be returned:

album artist
:McCartney :Paul_McCartney
:Imagine :John_Lennon

Ordering Results

At this point we look at the larger music dataset to explore other interesting features of SPARQL.

If we run the following query against the larger music dataset we will get more than a thousand results:

# 04b-albums-dates.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date .
}

In this query, we used ; to separate triple patterns that share the same subject.

Your results might look different than the following table because there is no built-in ordering for query results in SPARQL:

album artist date
:A_Date_with_Elvis :Elvis_Presley “1959-07-24”^^xsd:date
:A_Momentary_Lapse_of_Reason :Pink_Floyd “1987-09-07”^^xsd:date
:Achtung_Baby :U2 “1991-11-18”^^xsd:date

If we want the results to be ordered based on a sorting condition we can add an ORDER BY:

# 05-albums-dates-sorted.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date .
}
ORDER BY ?date

Now albums will be returned ordered by their release dates:

album artist date
:Elvis_Presley_(album) :Elvis_Presley “1956-03-23”^^xsd:date
:Elvis_(1956_album) :Elvis_Presley “1956-10-19”^^xsd:date
:Loving_You_(album) :Elvis_Presley “1957-07-01”^^xsd:date

It is possible to have multiple sorting conditions by specifying multiple variables (or even function calls) in ORDER BY; and we can also sort the results in descending order by encapsulating the sort condition with the DESC keyword, like this: DESC(?date).

Slicing Results

When a query returns too many results we can limit the results by using the LIMIT keyword:

# 06-albums-dates-limited.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
}
ORDER BY desc(?date)
LIMIT 2

In this query we changed the dates to be sorted in reverse chronological order and limited the query to return only two results:

album artist date
:Hardwired…_to_Self-Destruct :Metallica “2016-11-18”^^xsd:date
:24K_Magic_(album) :Bruno_Mars “2016-11-18”^^xsd:date

We can skip the first N results by adding an OFFSET N clause at the end of the query where N is a positive integer. A paging capability can be supported by using LIMIT and OFFSET.

Filtering Results

We can filter the results returned by a query using a FILTER expression. SPARQL supports many built-in functions for writing such expressions:

  1. comparison operators: (=, !=, <, <=, >, >=)
  2. logical operators (&&, ||, !)
  3. mathematical operators (+, -, /, *)

Plus many others. Stardog also provides an extensive number of additional functions.

If we want to find the albums released in 1970 or later we can do this with the following filter expression:

# 07a-albums-dates-filtered.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
   FILTER (?date >= "1970-01-01"^^xsd:date)
}
ORDER BY ?date

All otherwise matching results not satisfying the filter condition will be excluded from the results:

album artist date
:This_Girl\’s_in_Love_with_You :Aretha_Franklin “1970-01-15”^^xsd:date
:Chicago_(album) :Chicago_(band) “1970-01-26”^^xsd:date
:Morrison_Hotel :The_Doors “1970-02-09”^^xsd:date

We can use any SPARQL function in the FILTER expressions. For example, the year function applied to a date value will return the year component as an integer value. So the following query will return the exact same results as the previous query but the filter is written in a slightly different way:

# 07b-albums-dates-filtered.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
      FILTER (year(?date) >= 1970)
}
ORDER BY ?date

Binding Values

We can assign the output of a function to a variable using the BIND keyword. This might be useful if we want to reuse the function result in different parts of the query or if we want to increase readability when we have a lot of nested function calls.

We can rewrite the previous query by binding the output of the year(?date) expression to a new variable ?year first and using the variable in the filter expresion:

# 07c-albums-dates-filtered.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
   BIND (year(?date) AS ?year)
   FILTER (?year >= 1970)
}
ORDER BY ?date

Removing Duplicates

Our music dataset is not complete by any means and we have about a thousand albums. Suppose we want to find out the years in which these albums were released. One attempt would be to take the previous query, remove the filter, and only select the ?year variable.

# 08a-albums-years-duplicates.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?year
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
   BIND (year(?date) AS ?year)
}
ORDER BY ?date

But we will quickly discover that this query will still return many and the year values will be repeated:

year
1956
1956
1957
1957
1957

We can see that changing just the selected variables has no effect on the number of results returned by a query. We will still get one result for each matching pattern so the number of rows in the result table won’t change; only the number of columns will change.

In order to get rid of duplicates we need to use the DISTINCT keyword right after SELECT:

# 08b-albums-years-distinct.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT DISTINCT ?year
{
   ?album a :Album ;
          :artist ?artist ;
          :date ?date
   BIND (year(?date) AS ?year)
}
ORDER BY ?year

The results won’t have duplicates anymore:

year
1956
1957
1959
1960
1961

Aggregation

Aggregation is applying a function to a list of values rather than to a single value. Unlike regular functions, aggregate functions can only be used in SELECT expressions. Built-in aggregates provided in SPARQL are COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE.

We can find the earliest and the latest release dates of albums in our dataset by using the MIN and MAX aggregates:

# 09-albums-dates.minmax.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT (min(?date) as ?minDate) (max(?date) as ?maxDate)
{
    ?album a :Album ;
           :date ?date
}

We will get a single result with two columns:

minDate maxDate
“1956-03-23”^^xsd:date “2016-11-18”^^xsd:date

The WHERE clause in this query would return a table with two columns and many rows if we didn’t use the aggregate functions. The MIN (respectively, MAX) function looks at the values in the specified column of the results table and returns the single smallest (respectively, largest) value found.

We can use the COUNT function to return the number of rows in the result table. The query to find the number of albums in our dataset is this:

# 10a-albums-count.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT (count(?album) as ?count)
{
    ?album a :Album
}
count
1037

You can also count the relationships and how many times each type of relationship appears. Like with our classes, there will be some things that are likely less familiar like rdf:type - gloss over them for now and look at things like :sings and :hasName.

# 10b-predicate-count.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?predicate (COUNT(?predicate) as ?predicateCount)
{
    ?subject ?predicate ?object .
}
GROUP BY ?predicate
ORDER BY DESC(?predicateCount)

Another aggregation is sample data:

# 10c-sample-data.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
    ?s ?p ?o .
}
LIMIT 100

This shows us 100 sample triples from the data, each that is of the form [Concept] → [Relationship] → [Concept]. So the first line we have shows that [David Bowie] → [rdf:type (aka is of type)] → [Person]. Your 100 rows may be different, and you may have to look down a few rows to see some actual people, bands, or songs.

This is common type of query to say “just give me some sample data”, which looks like this (and is often said as “Select S P O”). Make sure to include a LIMIT when you use it.

Joins

Let’s find every album that has both an artist and a producer.

# 10d-artist-producer.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
    ?album :artist ?artist .
    ?album :producer ?producer .
}

It is important that the ?album variable in the first triple pattern is the same as the ?album in the second triple pattern. So what this is saying is “Find me an album with an artist. Also, make sure that same album has a producer”. Note that you could reverse the order of the lines to get the same results:

# 10e-producer-artist.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT *
{
    ?album :producer ?producer .
    ?album :artist ?artist .
}

Grouping

The previous aggregation examples worked over a single result table and returned a single row as the final result. We can also group the results based on the values of one or more variables and apply the aggregation functions to each group separately.

Suppose we want to find the number of albums released each year. We can group the albums based on their release year and use the COUNT aggregate for each group:

# 11a-albums-dates-grouped.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?year (count(distinct ?album) AS ?count)
{
    ?album a :Album ;
            :date ?date ;
    BIND (year(?date) AS ?year)
}
GROUP BY ?year
ORDER BY desc(?count)

We will get one result for each distinct year value:

year count
1997 28
1975 28
1976 27

You might notice that we used the DISTINCT keyword inside the count aggregate. This is because some of the albums in our date have duplicate release dates. For example, the album “A Hard Day’s Night” has both 1964-06-26 and 1964-07-10 as release dates. This is due to the imperfection of our dataset and using the DISTINCT keyword ensures we count the album only once for that year.

This is not a perfect solution since it means we’ll double count albums if their multiple release dates are in different years. It’s better to clean up the data. And we can use the aggregates to find which albums have multiple release dates:

# 11b-albums-duplicate-dates.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?album (group_concat(?date) AS ?dates)
{
    ?album a :Album ;
            :date ?date
}
GROUP BY ?album
HAVING (count(?date) > 1)

The HAVING keyword we used at the end acts like an overall filter on the query results. Since the aggregates can only be used in SELECT expressions we cannot use a regular FILTER (without introducing a subquery), so the HAVING keyword provides an easy way to define such filters.

Subqueries

If we want to find the average number of albums released in a year we need to use an aggregation function over the results of the previous query. This can be achieved by subqueries where we simply put a SELECT query inside another one:

# 12-albums-dates-subselect.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT (avg(?count) AS ?avgCount)
{
      SELECT ?year (count(?album) AS ?count)
      {
            ?album a :Album ;
                  :date ?date ;
            BIND (year(?date) AS ?year)
      }
      GROUP BY ?year
}

The result of this query will be a single value:

avgCount
18.55

Of course subqueries don’t have to use aggregation; it would be fine to use any kind of SELECT query as a subquery. If the outer WHERE clause contains additional patterns, then the subquery should be surrounded with {}.

Alternatives

In our data we have artists separated into two types: bands and solo artists. If we want to retrieve all artists along with their names, then we can use the UNION operator to combine the matches from two different patterns:

# 13-artists-union.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?name
{
    { ?artist a :SoloArtist }
    UNION
    { ?artist a :Band }
    ?artist :name ?name
}

The results will contain artists matching either pattern:

name
“David Bowie”
“Brian May”
“Metallica”

If the same artists matched both patterns then we would get a duplicate result and would need DISTINCT to get unique results.

If we load the music schema to our database and enable reasoning in our queries (e.g. using the reasoning toggle button in Stardog Studio), then we can write a much simpler query–for ?artist a :Artist – and we would automatically get the instances of its subclasses :Band and :SoloArtist:

To learn more about reasoning, see our Inference Engine tutorial.

Optional Matches

The following query returns the songs and their lengths:

# 14a-songs-length.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT * 
{
    ?song a :Song .
    ?song :length ?length .
}

However, when we look at the results we see that this query returns 3,640 results whereas the query without the second pattern returns 3,749 songs. This means there are 109 songs in our dataset that do not have any length information.

We can use OPTIONAL blocks to match patterns that may exist for some nodes but not for others:

# 14b-songs-optional-length.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?song ?length 
{
    ?song a :Song .
    OPTIONAL {
        ?song :length ?length .
    }
}

This query will return 3,749 results where 109 rows will not have a value for the length. If we only want to see those rows where length is missing then we can add a filter to our query:

# 14c-songs-unbound-length.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?song ?length 
{
    ?song a :Song .
    OPTIONAL {
        ?song :length ?length .
    }
    FILTER(!bound(?length))
}

Negation

The last example shows a somewhat indirect way to find patterns that does not exist in the dataset by using a combination of OPTIONAL and FILTER expressions. But SPARQL provides a special kind of filter for this purpose: NOT EXISTS. The following query will return the same 109 results as the previous query:

# 14d-songs-no-length.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

SELECT ?song 
{
    ?song a :Song .
    FILTER NOT EXISTS {
        ?song :length ?length .
    }
}

Any SPARQL construct can be used inside a NOT EXISTS block.

Property Paths

The triple patterns match triples in the dataset so they can only be used to find nodes that are directly connected. We can use property paths to match nodes that are connected via arbitrary-length paths. More generally, a property path is a regular expression describing the possible route between two nodes in a graph. Property paths can also be used to express some graph patterns more concisely.

To explore what we can do with property paths we will start with this query that uses two ordinary triple patterns to find pairs of people who wrote songs together:

# 15a-cowriters.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select distinct ?artist ?cowriter
{
    ?song :writer ?artist .
    ?song :writer ?cowriter
    FILTER (?artist != ?cowriter)
}

We need DISTINCT in this query because the same pair might have cowritten multiple songs together. We need the FILTER because otherwise the query would match the songs with a single writer and bind the two variables ?artist and ?cowriter to the same person. Using a different variable does not ensure that the triple patterns match different triples in the data. This query returns each pair twice; we leave it as an exercise to the reader to come up with a different filter expression to return every pair only once.

Inverse Path

Adding the symbol ^ in front of a predicate (or a property path expression) makes it an inverse path expression. An inverse path expression simply flips the direction of the match: the subject of the triple pattern will match the object of the triple in the data and the object of the triple pattern will match the subject. So an equivalent way to write the previous query is as follows:

# 15b-cowriters-inverse-path.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select distinct ?artist ?cowriter
{
    ?artist ^:writer ?song .
    ?song :writer ?cowriter
    FILTER (?artist != ?cowriter)
}

By itself an inverse property path expression is not that useful, but in combination with other property path expressions, it can be (as we will see next).

Sequence Path

When the object of one triple pattern is the same as the subject of another triple pattern, and we are not interested in the binding of the variable, we can combine the two patterns using a sequence path. A sequence path means the subject is connected to the object via the path of property expressions specified in the sequence. The next query returns the same results as the previous query:

# 15c-cowriters-property-path.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select distinct ?artist ?cowriter
{
    ?artist ^:writer/:writer ?cowriter
    FILTER (?artist != ?cowriter)
}

There can be more than two expressions in a path if necessary. We can also use constants for the subject or the object or both. The next query returns cowriters of Paul McCartney:

# 15d-cowriters-mccartney.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select distinct ?cowriter
{
    :Paul_McCartney ^:writer/:writer ?cowriter
    FILTER (?cowriter != :Paul_McCartney)
}
order by ?cowriter

Recursive Paths

Suppose we want not only to find cowriters of Paul McCartney, but also find the cowriters of his cowriters, and continue finding cowriters recursively. We can use the recursive path operator + to follow a property path one or more times.

# 15e-cowriters-recursive.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select distinct ?cowriter
{
    :Paul_McCartney (^:writer/:writer)+ ?cowriter
    FILTER (?cowriter != :Paul_McCartney)
}
order by ?cowriter

The other recursive operator * is used to follow a path zero or more times. Following a path zero times means we don’t traverse any edges and simply return the same node as the starting node. This makes most sense when used in a sequence path as in rdf:type/rdfs:subClassOf*. This property path returns the type(s) of a node and all its superclasses.

Optional Paths

In our dataset we have both the solo albums released by Paul McCartney and the albums released by The Beatles. The next query would return both kinds of album:

# 15f-songs-optional-path.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select ?album 
{
  ?album :artist/:member? :Paul_McCartney
}

The ? suffix means we should follow a path zero or one times. The property path expression :artist/:member? would start with an album and first find all the nodes connected via the :artist predicate and return those nodes (since we would end up on those nodes when we follow the :member edge zero times). Then if any of those nodes have a :member edge, it will follow those edges and return the new nodes we reach as well.

Alternative Paths

Suppose we want to find all the songs related to Paul McCartney: songs released in either his solo albums or The Beatles’ albums, along with the songs he wrote that were recorded by other artists. We need to find three alternate paths from songs to Paul McCartney. The previous property path expression already (partially) encodes two of these paths, and the third alternate path can be introduced using the | path operator:

# 15g-songs-alternative-path.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

select ?song 
{
  ?song (^:track/:artist/:member?)|:writer :Paul_McCartney
}

Shortest Path Queries

As we have just seen, SPARQL property paths can be used to find pairs of nodes connected via a complex path of edges. But these property paths only return the start and end nodes of a path and not the intermediate nodes within the path.

In the Recursive Paths example we have seen how we can recursively find all the nodes connected to Paul McCartney via cowriter relationships. When we run that query we can see in the results that Kanye West is connected to Paul McCartney, but there isn’t a song they cowrote together, and we can’t tell by looking at the results how they are connected. We can use path queries to get the answers.

Stardog path queries are an extension to SPARQL, and we can use them to find not only (shortest or all) paths between two nodes defined, but also to find all the intermediate nodes on that path. The connection between nodes can be a single property or a more complex SPARQL pattern.

The path query equivalent to the property path expression in 15e-cowriters-recursive.sparql is as follows:

# 16a-cowriters-paths.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

paths     start ?artist = :Paul_McCartney
    end ?cowriter
via {
    ?artist ^:writer/:writer ?cowriter
}
order by ?cowriter

The result of the query will be a list of paths:

(:Paul_McCartney)->(:Michael_Jackson)->(:Will.i.am)->(:Flo_Rida)->(:Avicii)

(:Paul_McCartney)->(:Michael_Jackson)->(:Tupac_Shakur)->(:Elton_John)->(:Tim_Rice)->(:Benny_Andersson)

...

The VIA clause in a path query is similar to a WHERE clause in a SELECT query but, instead of returning the matches for this pattern, a path query traverses this pattern in the graph recursively starting with the start node until reaching the end node.

The start and end nodes in a path query can be completely unrestricted to find the shortest paths between any node in the graph but such queries typically are intractable due to the very high number of such paths.

Now, if we want to see the paths between Paul McCartney and Kanye West, we can run the following query:

# 16b-cowriters-paths.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

paths     start ?artist = :Paul_McCartney
    end ?cowriter = :Kanye_West

via {
    ?song :writer ?artist .
    ?song :writer ?cowriter
}

order by ?cowriter

Note that in this version of the query, we are using an explicit variable for the song, so we can also see at each step of the path what song the cowriters collaborated on. One example path returned by this query looks as follows:

(:Paul_McCartney)-[song=:Say_Say_Say]->(:Michael_Jackson)-[song=:Is_It_Scary]->(:James_Harris_III)-[song=\(Always_Be_My\)_Sunshine]->(:Jay-Z)-[song=:Run_This_Town]->(:Kanye_West)

Path queries are a powerful addition to SPARQL and you can find more details about how to use them in the Stardog documentation.

SPARQL Query Forms

There are SPARQL query forms other than the SELECT queries. The WHERE clause in these queries can use any of the constructs we described above, but the query result will not be a table.

ASK Queries

Ask queries return a boolean result indicating if the pattern specified in the WHERE clause matched any result or not. The next query asks if there is any band credited as the writer of a song:

# 17-bands-writers-ask.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

ASK 
{
    ?band a :Band .
    ?song :writer ?band .
}

The result is simply true without any detail about which song or band matches this pattern:

Result: true.

Since the ASK queries don’t return any bindings they can be more efficient than executing a SELECT query.

DESCRIBE Queries

The DESCRIBE query returns an RDF graph. What triples are returned for a node is not prescribed in the SPARQL specification and is in fact system-dependent. The default DESCRIBE implementation in Stardog returns all the outgoing edges of the node:

# 18a-beatles-describe.sparql
# dataset: music

PREFIX : <http://stardog.com/tutorial/>

DESCRIBE :The_Beatles

The resulting triples has :The_Beatles as the subject:

:The_Beatles a :Band ;
   :member :George_Harrison , :John_Lennon , :Ringo_Starr , :Paul_McCartney ;
   :name "The Beatles" ;
   :description "..." .

The DESCRIBE query is most useful when we don’t know anything about the RDF graph we are querying and want to quickly see the terms used in the triples. The query SELECT * { :The_Beatles ?p ?o } would serve the same purpose as the above query.

Stardog provides a special query hint to change the DESCRIBE strategy being used. The following query returns the incoming edges too:

#pragma describe.strategy bidirectional
DESCRIBE :The_Beatles

It is possible to specify a WHERE clause in a DESCRIBE query to describe the values bound to one or more variables. The following query will describe every band with a name starting with the word “The”:

# 18b-bands-describe.sparql
# dataset: music

PREFIX : <http://stardog.com/tutorial/>

DESCRIBE ?band
{
    ?band a :Band ;
         :name ?name
    FILTER(contains(?name, "The"))
}

CONSTRUCT Queries

A CONSTRUCT query matches patterns in a WHERE clause and then returns an RDF graph as a result. If the WHERE clause only contains triple patterns we can use the short-hand syntax; for example, to retrieve the bands and their members:

# 19a-bands-construct.sparql
# dataset: music

PREFIX : <http://stardog.com/tutorial/>

CONSTRUCT 
{
    ?band a :Band ;
          :member ?member
}

The result will look something like this:

:ABBA a :Band ;
   :member :Frida_Lyngstad , :Benny_Andersson , :Björn_Ulvaeus , :Agnetha_Fältskog .
:Led_Zeppelin a :Band ;
   :member :John_Paul_Jones_\(musician\) , :Robert_Plant , :Jimmy_Page , :John_Bonham .
:Linkin_Park a :Band ;
   :member :Chester_Bennington , :Mike_Shinoda , :Joe_Hahn , :Brad_Delson .
...

We can specify a CONSTRUCT template that will return a different set of triples than the triples we matched in the WHERE clause:

# 19b-bands-members-construct.sparql
# dataset: music

PREFIX : <http://stardog.com/tutorial/>

CONSTRUCT {
    ?member a :BandMember
}
{
    ?band a :Band ;
          :member ?member
}

The triples returned for this query do not exist in our graph:

:Frida_Lyngstad a :BandMember .
:Benny_Andersson a :BandMember .
:Björn_Ulvaeus a :BandMember .
:Agnetha_Fältskog a :BandMember .
...

CONSTRUCT queries are read-only just like the other query forms we have seen so far and do not modify the input RDF graph. If we want to modify the RDF graph we need to use SPARQL update queries that we discuss next.

Update Queries

If we want to insert triples into the database based on the results of a query, we can use an INSERT query. The following query is similar to the above CONSTRUCT query but simply inserts the resulting triples into the database:

# 20-bands-members-insert.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

INSERT {
    ?member a :BandMember
}
{
    ?band a :Band ;
          :member ?member
}

We can also use the WHERE clause results to delete triples from the database.

In our graph, we used an integer range for the :length property to specify the length of a song in seconds. This is simple and good enough in most cases, but using the XML Scheme datatype xsd:dayTimeDuration is better, as it avoids the ambiguity of the time unit associated with the length. So if we want to replace all :length values in our data to use duration values, we can execute the following update query that will delete the existing triples with integer values and insert new triples with duration values:

# 21-songs-length-delete.sparql
# dataset: music
prefix : <http://stardog.com/tutorial/>

DELETE {
    ?song :length ?seconds
}
INSERT {
    ?song :length ?duration
}
{
    ?song a :Song ;
         :length ?seconds
    BIND(?seconds * "PT1S"^^xsd:dayTimeDuration AS ?duration)
}

What’s next?

That’s it for Getting Started: Part 4. To continue on with the Getting Started series, head on over to Getting Started: Part 5.