Reasoning & Inference

This chapter discusses what Stardog’s reasoning capabilities are and how to use them. This page provides an overview of the reasoning capabilities.

Page Contents

Overview
What is reasoning?
How does reasoning work?
1. Why Query Rewriting
Stardog Reasoners
1. Blackout Reasoner
2. Stride Reasoner (Alpha)
Query Answering with Reasoning
Reasoning Schemas

Overview

In this chapter we describe how to use Stardog’s reasoning capabilities; we address some common problems and known issues. We also describe Stardog’s approach to query answering with reasoning in some detail, as well as a set of guidelines that contribute to efficient query answering with reasoning. Throughout this chapter the terms reasoning and inference are used interchangeably to mean the same capability; that is, the ability to infer implicit knowledge from explicit data. Similarly reasoner and inference engine are used interchangeably to refer to the Stardog component that implements this capability.

Stardog performs reasoning in a lazy and late-binding fashion: it does not materialize inferences; instead, reasoning is performed at query time. This means inferences are visible in query results but are not explicitly stored in the Stardog database, which is also how Stardog can reason over virtual graphs. We start with a high-level introduction to reasoning with some examples, then explain the details of query time reasoning and how to use reasoning for queries.

What is reasoning?

At the most basic level, reasoning is the process of inferring new types and relationships from existing data given a schema. A schema, sometimes called a “data model”, “ontology”, or “TBox”, is a set of RDFS or OWL axioms plus user-defined rules. The schema contains the information the reasoner needs to compute inferences. As a result, queries with reasoning can return additional results compared to queries that do not use reasoning.

Let’s start with the following simple example where we define a class Person and its two subclasses Employee and Customer in our schema. We also have our instance data where we see one instance for each subclass.

:Person a owl:Class .
:Customer a owl:Class ;
   rdfs:subClassOf :Person .
:Employee a owl:Class ;
   rdfs:subClassOf :Person .
   
:Alice a :Employee .
:Bob a :Customer .

The following query retrieves all Person instances:

SELECT ?person {
   ?person a :Person
}

By default this query returns no results, since the data contains no explicit :Person type triples. With reasoning enabled, the same query returns the following results:

person
:Alice
:Bob

The following example shows a user-defined rule that infers a :coworker relationship between two people if they work for the same organization:

IF {
   ?person1 :worksFor ?organization .
   ?person2 :worksFor ?organization .
   FILTER (?person1 != ?person2)
}
THEN {
   ?person1 :coworker ?person2 .
}
   
:Alice :worksFor :ACME .
:Charlie :worksFor :ACME .

If we enable reasoning we can execute any of the following queries

SELECT ?person { ?person :coworker :Alice }
SELECT ?person { :Alice :coworker ?person }
SELECT ?person ?coworker { ?person :coworker ?coworker }

and get Alice and Charlie as coworkers in the results.
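
To make the effect of the rule concrete, here is a minimal Python sketch that applies the same IF/THEN logic to the two example triples. It is a toy illustration only, not how Stardog evaluates rules; the function name `infer_coworkers` and the tuple representation of triples are assumptions made for this example.

```python
# Toy illustration of the :coworker rule; NOT Stardog's rule engine.
from itertools import permutations

# Explicit data from the example, as (subject, predicate, object) tuples.
triples = {
    (":Alice", ":worksFor", ":ACME"),
    (":Charlie", ":worksFor", ":ACME"),
}


def infer_coworkers(data):
    """Return the :coworker triples implied by the rule body."""
    # Group people by the organization they work for.
    people_by_org = {}
    for subject, predicate, obj in data:
        if predicate == ":worksFor":
            people_by_org.setdefault(obj, set()).add(subject)

    inferred = set()
    for people in people_by_org.values():
        # permutations(..., 2) yields ordered pairs of distinct people,
        # which mirrors FILTER (?person1 != ?person2).
        for person1, person2 in permutations(sorted(people), 2):
            inferred.add((person1, ":coworker", person2))
    return inferred


coworkers = infer_coworkers(triples)
for triple in sorted(coworkers):
    print(triple)
```

The rule produces the relationship in both directions, which is why all three queries above return results.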

How does reasoning work?

Stardog computes inferences as needed, on-the-fly using a query rewriting approach: Stardog rewrites the user’s query with respect to a schema, and then executes the resulting expanded query against the data in the normal way. This process is completely automated and requires no intervention from the user.

If we consider the example schema above, the input query

SELECT ?person {
   ?person a :Person
}

would be rewritten to a query equivalent to

SELECT DISTINCT ?person {
    { ?person a :Person } UNION
    { ?person a :Customer } UNION
    { ?person a :Employee }
}

The query rewriting process involves various optimizations that simplify the rewritten query. For example, it is very common for some classes to have no explicit instances in the data (think about abstract superclasses). In our very simple dataset above there are no instances with an explicit Person type. The reasoner detects such cases and, in this example, does not include the pattern ?person a :Person in the expanded query.

The expanded queries created by Stardog are not directly expressed in SPARQL syntax, but you can see the effect of reasoning in the query plans. The query plan for the above example would look like this:

Distinct [#2]
`─ Projection(?person) [#2]
   `─ Union [#2]
      +─ Scan[POSC](?person, rdf:type, :Customer) [#1]
      `─ Scan[POSC](?person, rdf:type, :Employee) [#1]

Reasoning in Stardog is primarily founded in the Datalog formalism. RDFS and OWL axioms along with user-defined rules are (basically) Datalog rules over a graph. Stardog will translate the schemas into an internal Datalog representation and perform the query rewriting process.
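
The rewriting and pruning steps described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it handles only rdfs:subClassOf, represents the schema as (child, parent) pairs, and uses hypothetical function names. Stardog's actual rewriting works on an internal Datalog representation and covers far more than this.

```python
# Toy sketch of query rewriting for a type query; NOT Stardog's algorithm.

# Schema: (subclass, superclass) pairs from the example.
SUBCLASS_OF = {(":Customer", ":Person"), (":Employee", ":Person")}

# Instance data: explicit type triples only.
DATA = {
    (":Alice", "rdf:type", ":Employee"),
    (":Bob", "rdf:type", ":Customer"),
}


def subclasses_inclusive(cls, subclass_of):
    """Return cls together with all of its direct and indirect subclasses."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in subclass_of:
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def rewrite_type_query(cls, subclass_of, data):
    """Return the classes that remain as UNION branches after pruning."""
    # Classes with no explicit instances cannot contribute results.
    instantiated = {o for (_, p, o) in data if p == "rdf:type"}
    return sorted(subclasses_inclusive(cls, subclass_of) & instantiated)


def answer_type_query(cls, subclass_of, data):
    """Evaluate the expanded query against the explicit data."""
    branches = set(rewrite_type_query(cls, subclass_of, data))
    return sorted({s for (s, p, o) in data if p == "rdf:type" and o in branches})


branches = rewrite_type_query(":Person", SUBCLASS_OF, DATA)
people = answer_type_query(":Person", SUBCLASS_OF, DATA)
print(branches)
print(people)
```

As in the query plan above, the :Person branch is dropped because no instance carries that type explicitly, leaving one scan per instantiated subclass.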

Why Query Rewriting

Query rewriting has several advantages over materialization. The query rewriting approach allows for maximum flexibility while maintaining excellent performance: you pay only for the reasoning that you use, no more and no less. In materialization, on the other hand, the data is expanded with respect to the schema, not with respect to any actual query. And it is all of the data that gets expanded, whether or not any actual query subsequently requires reasoning. The schema is used to generate new triples, typically when data is added to or removed from the system. Materialization introduces several thorny issues:

data freshness. Materialization has to be performed every time the data or the schema changes. This is particularly unsuitable for applications where the data changes frequently or where data is stored externally and accessed via a virtualization layer.
data size. Depending on the schema, materialization can increase the size of the data significantly, sometimes dramatically. The cost of this blowup may be paid by every query in the form of increased I/O.
fixed schema. Materialization is computed based on a fixed schema. If different applications require different kinds of inference rules, there is no flexibility to switch between schemas.
resources. Depending on the size of the original data and the complexity of the schema, materialization may be computationally expensive. Truth maintenance, which materialization requires, is always computationally expensive, especially after deletions.
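
The data size and data freshness issues can be illustrated with a small Python sketch. It materializes the subclass inferences for the earlier Person example and then deletes one explicit triple without recomputing. Everything here is a toy model with hypothetical names; it does not describe any particular materializing system.

```python
# Toy model of materialization; for illustration only.

SUBCLASS_OF = {(":Customer", ":Person"), (":Employee", ":Person")}


def materialize(data, subclass_of):
    """Return data plus all type triples implied by the subclass axioms."""
    expanded = set(data)
    changed = True
    while changed:  # iterate to a fixpoint to cover multi-level hierarchies
        changed = False
        for subject, predicate, obj in list(expanded):
            if predicate != "rdf:type":
                continue
            for child, parent in subclass_of:
                implied = (subject, "rdf:type", parent)
                if obj == child and implied not in expanded:
                    expanded.add(implied)
                    changed = True
    return expanded


data = {
    (":Alice", "rdf:type", ":Employee"),
    (":Bob", "rdf:type", ":Customer"),
}

# Data size: the stored triple set grows once inferences are written out.
stored = materialize(data, SUBCLASS_OF)
print(len(data), "explicit triples,", len(stored), "stored triples")

# Data freshness: delete Bob's explicit triple without truth maintenance.
bob_customer = (":Bob", "rdf:type", ":Customer")
data.discard(bob_customer)
stored.discard(bob_customer)
stale = (":Bob", "rdf:type", ":Person") in stored
print("stale inference still stored:", stale)

# Only a fresh materialization pass removes the unsupported inference.
fresh = materialize(data, SUBCLASS_OF)
print("after recomputation:", (":Bob", "rdf:type", ":Person") in fresh)
```

With query rewriting there is no stored inference to invalidate, so the deletion takes effect on the next query without any extra work.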

Stardog Reasoners

As of version 9.0, Stardog comes with two reasoner implementations. Both provide query time reasoning, with some differences:

Blackout is the more mature reasoner implementation that supports more of the RDFS and OWL specifications but has limitations with respect to user-defined rules.
Stride (alpha) is the next generation reasoner implementation that supports more expressive user-defined rules including negation and aggregation but a smaller subset of RDFS and OWL.

Users can switch between the two reasoner implementations by setting the database configuration option reasoning.stride to true or false. The default value for this option is false, which means the Blackout reasoner is used. No other changes are required after setting this option; the corresponding reasoner is used automatically behind the scenes.

Blackout Reasoner

The Blackout reasoner supports the expressivity of the OWL 2 profiles, which means schemas can contain complex OWL axioms. Furthermore, OWL axioms can be filtered automatically by setting the reasoning.type database option. The default value of reasoning.type is SL, and for the most part users do not need to worry about which reasoning type is necessary, since SL covers all of the OWL 2 profiles as well as user-defined rules. The option may be set to one of the following values:

RDFS for RDF Schema, mainly subclass, subproperty, domain, and range axioms
QL for the OWL2 QL axioms
RL for the OWL2 RL axioms
EL for the OWL2 EL axioms
SL for a combination of RDFS, QL, RL, and EL axioms, plus SWRL rules.

Any axiom outside the selected type will be ignored by the reasoner.

The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Blackout:

Rule Features	Limitations
Triple patterns	No variables in the predicate position, and no variables in the object position when the predicate is `rdf:type`. No property path operators `*`, `+` or `?`.
FILTER	`EXISTS`, `NOT EXISTS` or non-deterministic functions, e.g. `RAND`, cannot be used in filters
BIND	`EXISTS`, `NOT EXISTS` or non-deterministic functions, e.g. `RAND`, cannot be used in bind expressions
UNION	No limitations

In addition to the above restrictions, Blackout supports only limited forms of recursive rules: a recursive rule is supported only if it can be translated to a SPARQL property path.

Stride Reasoner (Alpha)

The Stride reasoner was introduced in Stardog 9.0 and is currently in alpha state. It is designed to support more expressive rules and exhibit more robust performance, but it is not currently recommended for production use.

Stride only supports the following RDFS and OWL constructs and ignores any other axiom regardless of the reasoning.type option value:

Terms	Description
`rdfs:subClassOf`, `owl:equivalentClass`	Class hierarchies and inheritance between named classes
`rdfs:subPropertyOf`, `owl:equivalentProperty`	Property hierarchies and inheritance between properties
`owl:inverseOf`, `owl:SymmetricProperty`	Inverse and symmetric properties
`owl:TransitiveProperty`	Transitive properties

The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Stride:

Rule Features	Limitations
Triple patterns	No variables in the predicate position, and no variables in the object position when the predicate is `rdf:type`. No property path operators `*`, `+` or `?`.
FILTER	Non-deterministic functions, e.g. `RAND`, cannot be used in filters
BIND	Non-deterministic functions, e.g. `RAND`, cannot be used in bind expressions
UNION	No limitations
VALUES	No `UNDEF` values
GROUP BY	No cyclic dependencies between rules involving GROUP BY

Stride behaves differently from Blackout if there is an invalid rule in the schema. Blackout logs such problematic rules or axioms and performs reasoning with the valid rules and axioms. This might cause subtle issues, as errors in the Stardog log can easily go unnoticed. Stride, on the other hand, refuses to do any reasoning if there is an invalid rule or axiom, requiring the user to fix the issue first. Note that if multiple schemas are being used, errors in one schema will not affect reasoning with other schemas. Rules causing problems can be moved to named graphs outside the schema graphs so they can be fixed without preventing reasoning with other rules.

In its alpha status, the Stride reasoner does not support reasoning for triple patterns that have a variable in the predicate position, or that have rdf:type in the predicate position and a variable in the object position. Such triple patterns are answered without reasoning, as if the #pragma reasoning off hint had been used for that triple pattern. This limitation will be lifted in a future release.

Query Answering with Reasoning

As explained above, Stardog uses a query time reasoning approach, so you do not need to do anything up front when you create your database or add data to it if you want to use reasoning. You only need to enable reasoning for your queries. All of Stardog’s interfaces (API, network, and CLI) support reasoning during query evaluation. All types of queries (that is, SELECT, ASK, CONSTRUCT, PATHS, DESCRIBE, VALIDATE) can be evaluated with reasoning. When reasoning is enabled, it applies to all query patterns in WHERE and VIA blocks.

When reasoning is enabled, the query execution will take into account the axioms and rules in the schema. There is one default schema associated with a database but in addition there can be multiple named schemas as explained in the next section. Reasoning queries will use the default schema by default but a different reasoning schema can be selected for queries.

When reasoning is enabled for a query, it is possible to selectively disable reasoning for certain parts of the query using the #pragma reasoning hint. See Reasoning Query Hints.

CLI

In order to evaluate queries in Stardog using reasoning via the command line, we use the --reasoning flag in the query execute command:

$ stardog query execute --reasoning myDB "SELECT ?s { ?s a :Employee }"

This will use the default reasoning schema for the database. A named schema can be specified using the --schema option:

$ stardog query execute --schema schema-1.0 myDB "SELECT ?s { ?s a :Employee }"

HTTP

For HTTP, the reasoning flag is specified either with the other HTTP request parameters:

$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query?reasoning=true&query=..."

or, as a segment in the URL:

$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query/reasoning?query=..."

See the HTTP API for a detailed look at the API to perform a SPARQL query with reasoning enabled.
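
When building such a request from code, the SPARQL text has to be URL-encoded before it is placed in the query parameter. The following Python sketch constructs the first URL form shown above using only the standard library and does not send any request. The helper name `reasoning_query_url` is hypothetical, and the host, port, and database name are simply the ones from the curl examples.

```python
# Sketch: build the URL for a reasoning query; no request is sent.
from urllib.parse import parse_qs, urlencode, urlsplit


def reasoning_query_url(base, database, sparql):
    """Return a query URL with reasoning=true and the encoded SPARQL text."""
    # urlencode percent-encodes characters such as '?', '{' and ':'.
    params = urlencode({"reasoning": "true", "query": sparql})
    return f"{base}/{database}/query?{params}"


sparql = "SELECT ?s { ?s a :Employee }"
url = reasoning_query_url("http://localhost:5820", "myDB", sparql)
print(url)

# Round-trip check: decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["reasoning"], decoded["query"])
```

The resulting URL can be passed to curl or to any HTTP client together with the credentials, as in the examples above.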

Programmatically

See the chapter on Programming for the details of how to use reasoning from the various programming languages Stardog supports.

Reasoning Schemas

A reasoning schema is simply one or more named graphs that contain RDFS/OWL axioms and user-defined rules. The schema elements stored in the corresponding named graphs are automatically identified and extracted by Stardog. There is a default schema associated with each database which is configured by the reasoning.schema.graphs database configuration option. The default value for this option is the special named graph tag:stardog:api:context:local which is the wildcard for all local graphs including the default graph.

It is best practice to store your reasoning schema in specific named graphs and specify the named graphs explicitly in database configuration. This makes management of schemas easier and allows Stardog to extract schema elements more efficiently.

No additional operations are needed when the named graphs of a schema are updated. Stardog automatically detects when schemas are updated and uses the new versions going forward. Since schemas are represented as RDF triples, loading schemas into and unloading them from Stardog is done by following the regular instructions for adding and removing data.

There are certain use cases where one might need to use different schemas to answer different queries. Some example use cases are as follows:

There are two different versions of a schema that evolved over time and older legacy applications need to use the previous version of the schema whereas the newer applications need to use the newer version.
Different applications require different rules and business logic, e.g. threshold for a concept like Low or High might change based on the context.
There could be a very large number of axioms and rules in the domain that can be partitioned into smaller schema subsets for performance reasons.

Starting with version 7.0, Stardog supports schema multi-tenancy: reasoning with multiple schemas and specifying a schema to be used for answering a query. Each schema has a name and a set of named graphs associated with it. When the schema is selected for answering a query the axioms and the rules stored in the associated graphs will be taken into account. A named schema can be selected for a query using the --schema parameter in the query execute command:

$ stardog query execute --schema employeeSchema myDB "SELECT ?s { ?s a :Employee }"

When the --schema parameter is used, the --reasoning parameter does not need to be specified and has no effect. Using the --reasoning flag without a --schema parameter is equivalent to specifying --schema default.
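
The interaction between the two options can be summarized as a small decision function. The Python sketch below only restates the behavior described in the preceding paragraph; the function `resolve_schema` is hypothetical and is not part of any Stardog API.

```python
# Sketch of how --reasoning and --schema combine, per the text above.


def resolve_schema(reasoning=False, schema=None):
    """Return the schema name a query would use, or None without reasoning."""
    if schema is not None:
        # --schema implies reasoning; --reasoning has no additional effect.
        return schema
    if reasoning:
        # --reasoning alone is equivalent to --schema default.
        return "default"
    return None


print(resolve_schema(schema="employeeSchema"))
print(resolve_schema(reasoning=True, schema="employeeSchema"))
print(resolve_schema(reasoning=True))
print(resolve_schema())
```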

The named schemas are defined via the reasoning.schemas configuration option that is a set of schema name and graph IRI pairs. There is convenience functionality provided in the CLI and Java API to manage schemas. The named graphs for a new or an existing schema can be set as follows using stored namespaces or full IRIs:

$ stardog reasoning schema --add employeeSchema --graphs :employeeGraph :personGraph -- myDB

The schemas can be removed using the reasoning schema command with the --remove flag. The --list option will list all the defined schemas and their named graphs:

$ stardog reasoning schema --list myDB
+----------------+---------------------------------+
|     Schema     |             Graphs              |
+----------------+---------------------------------+
| default        | <tag:stardog:api:context:local> |
| employeeSchema | :personGraph, :employeeGraph    |
| customerSchema | :personGraph, :customerGraph    |
| personSchema   | :personGraph                    |
+----------------+---------------------------------+

Stardog does not follow ontology owl:imports statements automatically. Any schema information that is relevant for reasoning should be loaded into Stardog explicitly.
