Reasoning & Inference
This chapter describes Stardog’s reasoning capabilities and how to use them. This page provides an overview.
Overview
In this chapter, we describe how to use Stardog’s reasoning capabilities; we address some common problems and known issues. We also describe Stardog’s approach to query answering with reasoning in some detail, as well as a set of guidelines that contribute to efficient query answering with reasoning. Throughout this chapter, the terms “reasoning” and “inference” are used interchangeably to mean the same capability; that is, the ability to infer implicit knowledge from explicit data. Similarly, “reasoner” and “inference engine” are used interchangeably to refer to the Stardog component that implements this capability.
Stardog performs reasoning in a lazy and late-binding fashion: it does not materialize inferences; rather, reasoning is performed at query time. This means inferences are visible in query results, but they are not explicitly stored within the Stardog database. This is how Stardog can do reasoning over virtual graphs. We start with a high-level introduction to reasoning with some examples and explain the details of query-time reasoning and how to use reasoning for queries.
What is reasoning?
At the most basic level, reasoning is the process of inferring new types and relationships from existing data, given a schema. A schema, sometimes called a “data model”, “ontology”, or “TBox”, is a set of RDFS or OWL axioms plus user-defined rules. Schemas contain the information the reasoner needs to compute inferences. As a result, queries with reasoning will return additional results compared to queries that do not use reasoning.
Let’s start with a simple example where we define a class Person and its two subclasses Employee and Customer in our schema. We also have our instance data where we see one instance for each subclass. To follow along with this example, insert the schema and data like so:
INSERT DATA {
GRAPH <tag:stardog:api:context:schema> {
:Person a owl:Class .
:Customer a owl:Class ;
rdfs:subClassOf :Person .
:Employee a owl:Class ;
rdfs:subClassOf :Person .
}
}
INSERT DATA {
:Alice a :Employee .
:Bob a :Customer .
}
The following query retrieves all Person instances:
SELECT ?person {
?person a :Person
}
This query returns no results by default, since the data contains no triples that explicitly assert the type :Person. With reasoning enabled, the query returns:
person |
---|
:Alice |
:Bob |
The following example shows a user-defined rule that infers a :coworker relationship between two people if they work for the same organization:
IF {
?person1 :worksFor ?organization .
?person2 :worksFor ?organization .
FILTER (?person1 != ?person2)
}
THEN {
?person1 :coworker ?person2 .
}
Suppose the data contains the following triples:
:Alice :worksFor :ACME .
:Charlie :worksFor :ACME .
If we enable reasoning, we can execute any of the following queries:
SELECT ?person { ?person :coworker :Alice }
SELECT ?person { :Alice :coworker ?person }
SELECT ?person ?coworker { ?person :coworker ?coworker }
and get Alice and Charlie as coworkers in the results.
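To make the effect of the rule concrete, the following is a minimal Python sketch that evaluates it over the two :worksFor triples above. It is a toy model only: the function name infer_coworkers is hypothetical, and Stardog does not evaluate rules this way (it rewrites queries, as described in the next section).

```python
# Toy evaluation of the coworker rule; not Stardog's implementation.
from collections import defaultdict
from itertools import permutations

# Explicit data: (person, organization) pairs for :worksFor.
works_for = [(":Alice", ":ACME"), (":Charlie", ":ACME")]


def infer_coworkers(works_for):
    """Return the set of (person1, person2) pairs implied by the rule."""
    by_org = defaultdict(set)
    for person, org in works_for:
        by_org[org].add(person)
    inferred = set()
    for members in by_org.values():
        # permutations(..., 2) yields ordered pairs of distinct people,
        # which mirrors FILTER (?person1 != ?person2) in the rule body.
        inferred.update(permutations(sorted(members), 2))
    return inferred


coworkers = infer_coworkers(works_for)
print(sorted(coworkers))
# [(':Alice', ':Charlie'), (':Charlie', ':Alice')]
```

Both directions of the relationship are inferred because the rule body matches with the two people in either order.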
How does reasoning work?
Stardog computes inferences as needed, on-the-fly, using a query rewriting approach: Stardog rewrites the user’s query with respect to a schema, and then executes the resulting expanded query against the data in the normal way. This process is completely automated and requires no intervention from the user.
If we consider the example schema above, the input query:
SELECT ?person {
?person a :Person
}
would be rewritten to a query equivalent to:
SELECT DISTINCT ?person {
{ ?person a :Person } UNION
{ ?person a :Customer } UNION
{ ?person a :Employee }
}
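The rewriting step for this example can be sketched in Python as follows. This is only an illustration of the idea for rdfs:subClassOf axioms; the helper names subclasses_of and rewrite_type_query are hypothetical, and Stardog's actual rewriter works on an internal representation rather than on SPARQL text.

```python
# Toy model of query rewriting for subclass axioms; not Stardog's rewriter.

# Schema from the example: (subclass, superclass) pairs.
SUBCLASS_PAIRS = [(":Customer", ":Person"), (":Employee", ":Person")]


def subclasses_of(cls, subclass_pairs):
    """Return cls followed by all of its direct and indirect subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_pairs:
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return [cls] + sorted(found - {cls})


def rewrite_type_query(var, cls, subclass_pairs):
    """Expand '?var a cls' into a UNION over cls and its subclasses."""
    branches = [f"  {{ {var} a {c} }}" for c in subclasses_of(cls, subclass_pairs)]
    body = " UNION\n".join(branches)
    return f"SELECT DISTINCT {var} {{\n{body}\n}}"


expanded = rewrite_type_query("?person", ":Person", SUBCLASS_PAIRS)
print(expanded)
```

Running the sketch prints a query with one UNION branch each for :Person, :Customer, and :Employee.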
There are various optimizations involved in the query rewriting process that simplify the rewritten query. For example, it is very common that some classes will have no explicit instances in the data (think about abstract superclasses). In our very simple dataset above, there are no instances with an explicit Person type. The reasoner will detect these cases and, in this example, not include the pattern ?person a :Person in the expanded query.
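The pruning described above can be pictured with a short Python sketch. It assumes the reasoner knows which classes have explicit instances; how Stardog actually tracks this is not covered here, and the function name prune_empty_classes is hypothetical.

```python
# Toy illustration of dropping patterns for classes with no explicit instances.

# Explicit rdf:type triples from the example data: (instance, class).
type_triples = [(":Alice", ":Employee"), (":Bob", ":Customer")]

# Classes the unoptimized rewriting would query.
candidate_classes = [":Person", ":Customer", ":Employee"]


def prune_empty_classes(classes, type_triples):
    """Keep only the classes that occur as an explicit type in the data."""
    used = {cls for _, cls in type_triples}
    return [c for c in classes if c in used]


remaining = prune_empty_classes(candidate_classes, type_triples)
print(remaining)
# [':Customer', ':Employee']
```

Only two patterns remain after pruning, one for :Customer and one for :Employee.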
The expanded queries created by Stardog are not directly expressed in SPARQL syntax, but you can see the effect of reasoning in the query plans. The query plan for the above example would look like this:
Distinct [#2]
`─ Projection(?person) [#2]
`─ Union [#2]
+─ Scan[POSC](?person, rdf:type, :Customer) [#1]
`─ Scan[POSC](?person, rdf:type, :Employee) [#1]
Reasoning in Stardog is primarily founded in the Datalog formalism. RDFS and OWL axioms along with user-defined rules are (basically) Datalog rules over a graph. Stardog will translate the schemas into an internal Datalog representation and perform the query rewriting process.
Why Query Rewriting
Query rewriting has several advantages over materialization. The query rewriting approach allows for maximum flexibility while maintaining excellent performance; you only pay for the reasoning you use - no more and no less. In materialization, on the other hand, the data gets expanded with respect to the schema, not with respect to any actual query. And it’s the data – all of the data – that gets expanded, whether any subsequent query actually requires reasoning or not. The schema is used to generate new triples, typically when data is added or removed from the system. However, materialization introduces several thorny issues:
- data freshness. Materialization has to be performed every time the data or the schema change. This is particularly unsuitable for applications where the data changes frequently or data is stored externally and accessed via a virtualization layer.
- data size. Depending on the schema, materialization can significantly increase the size of the data. The cost of this data size blowup may be applied to every query (in terms of increased I/O).
- fixed schema. Materialization is computed based on a fixed schema. If there are different applications that require different kinds of inference rules, there will not be the flexibility to switch between different schemas.
- resources. Depending on the size of the original data and the complexity of the schema, materialization may be computationally expensive. And truth maintenance, which materialization requires, is always computationally expensive, especially after deletions.
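The data size and truth maintenance issues listed above can be illustrated with a small Python sketch of materialization over the Person example. It is a toy forward-chaining loop under simplified assumptions (only rdfs:subClassOf axioms, and recomputation from scratch after a deletion); the function name materialize is hypothetical.

```python
# Toy materialization over rdfs:subClassOf; for illustration only.

explicit = {(":Alice", ":Employee"), (":Bob", ":Customer")}
subclass_pairs = [(":Customer", ":Person"), (":Employee", ":Person")]


def materialize(type_triples, subclass_pairs):
    """Forward-chain subclass axioms until no new type triples appear."""
    closure = set(type_triples)
    changed = True
    while changed:
        changed = False
        for inst, cls in list(closure):
            for sub, sup in subclass_pairs:
                if cls == sub and (inst, sup) not in closure:
                    closure.add((inst, sup))
                    changed = True
    return closure


materialized = materialize(explicit, subclass_pairs)
print(len(explicit), "explicit ->", len(materialized), "stored")
# 2 explicit -> 4 stored

# Deleting an explicit triple invalidates an inferred one, so the stored
# inferences must be maintained; here they are recomputed from scratch.
after_delete = materialize(explicit - {(":Alice", ":Employee")}, subclass_pairs)
print((":Alice", ":Person") in after_delete)
# False
```

Even in this tiny dataset the stored data doubles, and a single deletion requires revisiting the inferred triples.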
Stardog Reasoners
As of version 9.0, Stardog comes with two different reasoner implementations, both providing query-time reasoning capability with some differences:
- Blackout is the more mature reasoner implementation that supports more of the RDFS and OWL specifications but has limitations with respect to user-defined rules.
- Stride (alpha) is the next-generation reasoner implementation that supports more expressive user-defined rules, including negation and aggregation, but only a smaller subset of RDFS and OWL.
Users can switch between the two reasoner implementations by setting the database configuration option reasoning.stride to true or false. The default value for this option is false, which means the Blackout reasoner will be used. No other changes are required after setting this option, and the corresponding reasoner will be used automatically behind the scenes.
Blackout Reasoner
The Blackout reasoner supports the expressivity of OWL 2 profiles, which means schemas can contain complex OWL axioms. Furthermore, OWL axioms can be filtered automatically by setting the reasoning.type database option. The default value of reasoning.type is SL, and for the most part, users don’t need to worry too much about which reasoning type is necessary since SL covers all of the OWL 2 profiles, as well as user-defined rules. This option may be set to a different value:
- RDFS for RDF Schema, mainly subclass, subproperty, domain, and range axioms
- QL for the OWL 2 QL axioms
- RL for the OWL 2 RL axioms
- EL for the OWL 2 EL axioms
- SL for a combination of RDFS, QL, RL, and EL axioms, plus SWRL rules.
Any axiom outside the selected type will be ignored by the reasoner.
The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Blackout:
Rule Features | Limitations |
---|---|
Triple patterns | No variables in predicate position or object position if the predicate is rdf:type . No property path operators * , + or ? . |
FILTER | EXISTS , NOT EXISTS or non-deterministic functions, e.g. RAND , cannot be used in filters |
BIND | EXISTS , NOT EXISTS or non-deterministic functions, e.g. RAND , cannot be used in bind expressions |
UNION | No limitations |
In addition to the restrictions above, Blackout supports only limited forms of recursive rules: a recursive rule is supported only if it can be translated to a SPARQL property path.
Stride Reasoner (Alpha)
The Stride reasoner has been introduced in Stardog 9.0 and is currently in alpha state. It is designed to support more expressive rules and exhibit more robust performance, but it is currently not recommended for production usage.
Stride only supports the following RDFS and OWL constructs and ignores any other axiom, regardless of the reasoning.type option value:
Terms | Description |
---|---|
rdfs:subClassOf , owl:equivalentClass | Class hierarchies and inheritance between named classes |
rdfs:subPropertyOf , owl:equivalentProperty | Property hierarchies and inheritance between properties |
owl:inverseOf , owl:SymmetricProperty | Inverse and symmetric properties |
owl:TransitiveProperty | Transitive properties |
The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Stride:
Rule Features | Limitations |
---|---|
Triple patterns | No variables in predicate position or object position if the predicate is rdf:type . No property path operators * , + or ? . |
FILTER | Non-deterministic functions, e.g. RAND , cannot be used in filters |
BIND | Non-deterministic functions, e.g. RAND , cannot be used in bind expressions |
UNION | No limitations |
VALUES | No UNDEF values |
GROUP BY | No cyclic dependencies between rules involving GROUP BY |
Stride behaves differently than Blackout if there is an invalid rule in the schema. Blackout logs such problematic rules or axioms and performs reasoning with the valid rules and axioms. This might cause subtle issues, as errors in the Stardog log can easily go unnoticed. Stride, on the other hand, will refuse to do any reasoning if there is an invalid rule or axiom, requiring the user to fix the issue first. Note that if multiple schemas are being used, errors in one schema will not affect reasoning with other schemas. Rules causing problems can be moved to named graphs outside the schema graphs so they can be fixed without preventing reasoning with other rules.
The Stride reasoner, in its alpha state, does not support reasoning for triple patterns that have variables in the predicate position or that have rdf:type in the predicate position and a variable in the object position. Such triple patterns will be answered without reasoning, as if the #pragma reasoning off hint had been used for that triple pattern. This limitation will be lifted in a future release.
Query Answering with Reasoning
As explained above, Stardog uses a query-time reasoning approach. This means you do not need to do anything up front when you create your database or add data to it if you want to use reasoning. You merely need to enable reasoning for your queries. All of Stardog’s interfaces (API, network, and CLI) support reasoning during query evaluation. All types of queries (that is, SELECT, ASK, CONSTRUCT, PATHS, DESCRIBE, VALIDATE) can be evaluated with reasoning. When reasoning is enabled, it applies to all query patterns in WHERE and VIA blocks.
When reasoning is enabled, the query execution will take into account the axioms and rules in the schema. There is one default schema associated with a database, but there can also be multiple named schemas (as explained in the next section). Reasoning queries will use the default schema by default, but a different reasoning schema can be selected for queries.
When reasoning is enabled for a query, it is possible to selectively disable reasoning for certain parts of the query using the #pragma reasoning hint. See Reasoning Query Hints.
CLI
To evaluate queries using reasoning via the command line, use the --reasoning flag in the query execute command:
$ stardog query execute --reasoning myDB "SELECT ?s { ?s a :Employee }"
This will use the default reasoning schema for the database. A named schema can be specified using the --schema option:
$ stardog query execute --schema schema-1.0 myDB "SELECT ?s { ?s a :Employee }"
HTTP
For HTTP, the reasoning flag is specified either with the other HTTP request parameters:
$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query?reasoning=true&query=..."
or, as a segment in the URL:
$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query/reasoning?query=..."
See the HTTP API for a detailed look at how to perform a SPARQL query with reasoning enabled.
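The query=... placeholder in the curl examples stands for the URL-encoded SPARQL query. As a minimal sketch, the first form of the request URL can be built in Python with the standard library. The function name reasoning_query_url is hypothetical, the host, port, and database name are taken from the examples above, and the sketch only builds the URL without sending a request.

```python
# Builds the request URL for a reasoning query; does not contact a server.
from urllib.parse import parse_qs, urlencode, urlsplit


def reasoning_query_url(base, database, sparql, reasoning=True):
    """Return <base>/<database>/query with URL-encoded parameters."""
    params = {"query": sparql}
    if reasoning:
        # Same parameter as in the first curl example above.
        params["reasoning"] = "true"
    return f"{base}/{database}/query?{urlencode(params)}"


url = reasoning_query_url(
    "http://localhost:5820", "myDB", "SELECT ?s { ?s a :Employee }"
)
print(url)

# Decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["reasoning"])
# ['true']
```

Encoding the query this way avoids problems with characters such as ?, {, and spaces, which are not safe to place in a URL unescaped.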
Programmatically
See the chapter on Programming for the details of how to use reasoning in the various programming languages Stardog supports.
Reasoning Schemas
A reasoning schema is simply one or more named graphs that contain RDFS/OWL axioms and user-defined rules. The schema elements stored in the corresponding named graphs are automatically identified and extracted by Stardog. There is a default schema associated with each database, which is configured by the reasoning.schema.graphs database configuration option. The default value for this option is the special named graph tag:stardog:api:context:schema, which is initially empty, so the reasoner will see an empty schema. You can load your schema into this named graph or change this option to point to a named graph that you create.
It is best practice to store your reasoning schema in specific named graphs and specify the named graphs explicitly in database configuration. This makes management of schemas easier and allows Stardog to extract schema elements more efficiently.
Prior to Stardog version 10, the default reasoning schema was set to tag:stardog:api:context:local, which is a built-in wildcard for all local graphs, including the default graph. Using wildcards for reasoning schemas is deprecated in Stardog 10. Wildcards will continue working in Stardog 10, but support for reasoning schema wildcards is scheduled to be removed in version 11.
No additional operations are needed when schema named graphs are updated. Stardog will automatically detect when schemas are updated and use the new versions of schemas going forward. Since schemas are represented as RDF triples, loading and unloading schemas into Stardog is done by following the regular instructions for adding data.
There are certain use cases where one might need to use different schemas to answer different queries. Some examples:
- There are two different versions of a schema that evolved over time. Older legacy applications need to use the previous version of the schema, whereas the newer applications need to use the newer version.
- Different applications require different rules and business logic, e.g., the threshold for a concept like Low or High might change based on the context.
- There could be a very large number of axioms and rules in the domain that can be partitioned into smaller schema subsets for performance reasons.
Starting with version 7.0, Stardog supports schema multi-tenancy: reasoning with multiple schemas and specifying a schema to be used for answering a query. Each schema has a name and a set of named graphs associated with it. When the schema is selected for answering a query, the axioms and the rules stored in the associated graphs will be taken into account. A named schema can be selected for a query using the --schema parameter in the query execute command:
$ stardog query execute --schema employeeSchema myDB "SELECT ?s { ?s a :Employee }"
When the --schema parameter is used, the --reasoning parameter does not need to be specified and will have no effect. Using the --reasoning flag without a --schema parameter is equivalent to specifying --schema default.
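The interaction between the two parameters can be summarized in a short Python sketch. The function names effective_schema and query_execute_args are hypothetical; only the --reasoning and --schema flags and the query execute command come from the text above, and the sketch builds an argument list without running the CLI.

```python
# Models how --reasoning and --schema combine; does not invoke the CLI.


def effective_schema(reasoning=False, schema=None):
    """Return the schema a query would use, or None if reasoning is off."""
    if schema is not None:
        return schema  # --schema implies reasoning; --reasoning is ignored
    return "default" if reasoning else None


def query_execute_args(database, sparql, reasoning=False, schema=None):
    """Build the argument list for 'stardog query execute'."""
    args = ["stardog", "query", "execute"]
    if schema is not None:
        args += ["--schema", schema]
    elif reasoning:
        args.append("--reasoning")
    return args + [database, sparql]


query = "SELECT ?s { ?s a :Employee }"
print(effective_schema(reasoning=True))
# default
print(effective_schema(reasoning=True, schema="employeeSchema"))
# employeeSchema
print(query_execute_args("myDB", query, schema="employeeSchema"))
```

The resulting list could be passed to subprocess.run on a machine where the stardog CLI is installed.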
The named schemas are defined via the reasoning.schemas configuration option, which is a set of pairs of schema names and graph IRIs. The CLI and the Java API provide convenience functionality for managing schemas. The named graphs for a new or an existing schema can be set as follows (using stored namespaces or full IRIs):
$ stardog reasoning schema --add employeeSchema --graphs :employeeGraph :personGraph -- myDB
The schemas can be removed using the reasoning schema command with the --remove flag. The --list option will list all the defined schemas and their named graphs:
$ stardog reasoning schema --list myDB
+----------------+----------------------------------+
| Schema | Graphs |
+----------------+----------------------------------+
| default | <tag:stardog:api:context:schema> |
| employeeSchema | :personGraph, :employeeGraph |
| customerSchema | :personGraph, :customerGraph |
| personSchema | :personGraph |
+----------------+----------------------------------+
Stardog does not follow ontology owl:imports statements automatically. Any schema information that is relevant for reasoning should be loaded into Stardog explicitly.
Schema Versioning
Stardog 10 introduces a new capability to track versions of schema graphs automatically. Tracking changes to schema graphs is an optimization used by the reasoner to avoid reloading and reprocessing the schema unless the schema graphs have been updated. Schema versioning is exposed to end users so that external applications can easily check whether a schema has been updated without inspecting the contents of the schema graphs.
The database option reasoning.schema.versioning.enabled needs to be set to true for schema versioning to be active. When this option is enabled, Stardog will automatically compute a 64-bit hash from the contents of the reasoning schema graphs. This hash will be updated every time any of the reasoning schemas are updated. The hash value for a schema can be checked to determine if any of the associated named graphs have been modified.
The configuration option reasoning.precompute.non_empty.predicates should be set to false before reasoning.schema.versioning.enabled can be set to true.
Take care when enabling schema versioning with very large schemas, e.g. if there are millions of triples in schema graphs, especially if the schema contents are updated frequently. Computing the version hash requires iterating over the schema graphs, so performing this operation frequently on large graphs could have a noticeable performance overhead. With smaller schemas there should not be any noticeable performance overhead.
Schema version hashes can be retrieved using the following SPARQL query:
SELECT ?schema ?version {
SERVICE stardog:schema:service {
[] stardog:schema:schema ?schema ;
stardog:schema:version ?version
}
}
This query will return all the schemas associated with the database and their version hashes. Different filters can be used within the query to retrieve the version hash for specific schemas. The name “default” can be used for the default schema.
The following example shows how updating any named graph associated with a schema will cause the version to be updated as a result. But updating a non-schema graph will not change the schema version.
$ stardog reasoning schema --list myDB
+---------+--------------------------------+
| Schema | Graphs |
+---------+--------------------------------+
| default | :customerGraph, :employeeGraph |
+---------+--------------------------------+
$ stardog query myDB 'SELECT ?schema ?version {
SERVICE stardog:schema:service {
[] stardog:schema:schema ?schema ;
stardog:schema:version ?version
}
}'
+-----------+--------------------+
| schema | version |
+-----------+--------------------+
| "default" | "defce74405837cbb" |
+-----------+--------------------+
Query returned 1 results in 00:00:00.210
$ stardog query myDB 'INSERT DATA { GRAPH :customerGraph { :Customer a owl:Class } }'
Transaction committed successfully in 00:00:00.156
$ stardog query myDB 'SELECT ?schema ?version {
SERVICE stardog:schema:service {
[] stardog:schema:schema ?schema ;
stardog:schema:version ?version
}
}'
+-----------+------------------+
| schema | version |
+-----------+------------------+
| "default" | "aa14af549c5661" |
+-----------+------------------+
Query returned 1 results in 00:00:00.226
$ stardog query myDB 'INSERT DATA { GRAPH :employeeGraph { :Employee a owl:Class } }'
Transaction committed successfully in 00:00:00.137
$ stardog query myDB 'SELECT ?schema ?version {
SERVICE stardog:schema:service {
[] stardog:schema:schema ?schema ;
stardog:schema:version ?version
}
}'
+-----------+--------------------+
| schema | version |
+-----------+--------------------+
| "default" | "b0d1a9c49cc9e825" |
+-----------+--------------------+
Query returned 1 results in 00:00:00.214
$ stardog query myDB 'INSERT DATA { GRAPH :dataGraph { :JohnDoe a :Customer } }'
Transaction committed successfully in 00:00:00.148
$ stardog query myDB 'SELECT ?schema ?version {
SERVICE stardog:schema:service {
[] stardog:schema:schema ?schema ;
stardog:schema:version ?version
}
}'
+-----------+--------------------+
| schema | version |
+-----------+--------------------+
| "default" | "b0d1a9c49cc9e825" |
+-----------+--------------------+
Query returned 1 results in 00:00:00.204
Updating the reasoning schema configuration to add or remove a non-empty named graph will also cause the schema version to change, since the contents of a schema are the union of all its named graphs.
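The behavior shown in this section can be modeled with a short Python sketch that hashes the union of a schema's named graphs. The hash function used by Stardog is not described here, so a truncated SHA-256 digest stands in for it as an assumption; the function name schema_version and the in-memory graphs dictionary are likewise hypothetical.

```python
# Toy model of schema version hashes; the real hash function is an assumption.
import hashlib


def schema_version(graphs, schema_graphs):
    """Return a 64-bit hex hash over the union of the schema's graphs."""
    triples = set()
    for name in schema_graphs:
        triples |= graphs.get(name, set())
    digest = hashlib.sha256()
    for triple in sorted(triples):
        digest.update(" ".join(triple).encode("utf-8") + b"\n")
    return digest.hexdigest()[:16]  # 16 hex digits = 64 bits


graphs = {
    ":customerGraph": {(":Customer", "a", "owl:Class")},
    ":employeeGraph": set(),
    ":dataGraph": set(),
}
schema = [":customerGraph", ":employeeGraph"]

v1 = schema_version(graphs, schema)

# Updating a schema graph changes the version.
graphs[":employeeGraph"].add((":Employee", "a", "owl:Class"))
v2 = schema_version(graphs, schema)

# Updating a non-schema graph does not.
graphs[":dataGraph"].add((":JohnDoe", "a", ":Customer"))
v3 = schema_version(graphs, schema)

# Adding a non-empty graph to the schema configuration changes the version.
v4 = schema_version(graphs, schema + [":dataGraph"])

print(v1 != v2, v2 == v3, v3 != v4)
# True True True
```

Because the hash is computed over the union of triples, adding an empty graph to the schema configuration leaves the version unchanged in this model.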