Advanced Reasoning Features
This page discusses advanced reasoning features in Stardog.
Page Contents
There are several advanced reasoning features in Stardog that provide additional expressivity and inference capabilities, as explained in the following sections.
The advanced reasoning features explained in this chapter are not supported by the new Stride reasoner. You need to disable the Stride reasoner by setting the reasoning.stride
option to false
in database configuration.
Same As Reasoning
Stardog has full support for OWL 2 sameAs
reasoning. However, sameAs
reasoning works in a different way than the rest of the reasoning mechanism. The sameAs
inferences are computed and indexed eagerly so that these materialized inferences can be used directly at query rewriting time. The sameAs
index is updated automatically as the database is modified, so the difference is not of much direct concern to users.
In order to use sameAs
reasoning, the database configuration option reasoning.sameas
should be set either at database creation time or at a later time when the database is offline. This can be done using the command line as follows:
$ stardog-admin db create -o reasoning.sameas=FULL -n myDB
There are three legal values for this option:
OFF
disables allsameAs
inferences; that is, only assertedsameAs
triples will be included in query results.ON
computessameAs
inferences using only assertedsameAs
triples, considering the reflexivity, symmetry, and transitivity of thesameAs
relation.FULL
is the same asON
but also considers OWL functional properties, inverse functional properties, andhasKey
axioms while computingsameAs
inferences.
The way sameAs
reasoning works differs from the OWL semantics slightly, in the sense that Stardog designates one canonical individual for each sameAs
equivalence set and only returns the canonical individual. This avoids the combinatorial explosion in query results while providing the data integration benefits.
Let’s see an example showing how sameAs
reasoning works. Consider the following database where sameAs
reasoning is set to ON
:
dbpedia:Elvis_Presley
dbpedia-owl:birthPlace dbpedia:Mississippi ;
owl:sameAs freebase:en.elvis_presley .
nyt:presley_elvis_per
nyt:associated_article_count 35 ;
rdfs:label "Elvis Presley" ;
owl:sameAs dbpedia:Elvis_Presley .
freebase:en.elvis_presley
freebase:common.topic.official_website <http://www.elvis.com/> .
Now consider the following query and its results:
$ stardog query execute --reasoning elvis 'SELECT * { ?s dbpedia-owl:birthPlace ?o; rdfs:label "Elvis Presley" }'
+-----------------------+---------------------+
| s | o |
+-----------------------+---------------------+
| nyt:presley_elvis_per | dbpedia:Mississippi |
+-----------------------+---------------------+
Let’s unpack this carefully. There are three things to note.
First, the query returns only one result even though there are three different URIs that denote Elvis Presley. Second, the URI returned is fixed but chosen randomly. Stardog picks one of the URIs as the canonical URI and always returns that and only that canonical URI in the results. If more sameAs
triples are added, the chosen canonical individual may change. Third, it is important to point out that even though only one URI is returned, the effect of sameAs
reasoning is visible in the results since the rdfs:label
and dbpedia-owl:birthPlace
properties were asserted about different instances (i.e., different URIs).
Now, you might be inclined to write queries such as this to get all the properties for a specific URI:
SELECT * {
nyt:presley_elvis_per owl:sameAs ?elvis .
?elvis ?p ?o
}
However, this is completely unnecessary; rather, you can write the following query:
SELECT * {
nyt:presley_elvis_per ?p ?o
}
and get the same results since sameAs
reasoning would automatically merge the results for you:
+----------------------------------------+-----------------------+
| p | o |
+----------------------------------------+-----------------------+
| rdfs:label | "Elvis Presley" |
| dbpedia-owl:birthPlace | dbpedia:Mississippi |
| nyt:associated_article_count | 35 |
| freebase:common.topic.official_website | http://www.elvis.com/ |
| rdf:type | owl:Thing |
+----------------------------------------+-----------------------+
The URI used in the query does not need to be the same one returned in the results. Thus, the following query would return the exact same results, too:
SELECT * {
dbpedia:Elvis_Presley ?p ?o
}
The only time Stardog will return a non-canonical URI in the query results is when you explicitly query for the sameAs
inferences, as in this next example:
$ stardog query execute -r elvis 'SELECT * { freebase:en.elvis_presley owl:sameAs ?elvis }'
+---------------------------+
| elvis |
+---------------------------+
| dbpedia:Elvis_Presley |
| freebase:en.elvis_presley |
| nyt:presley_elvis_per |
+---------------------------+
In the FULL
sameAs
reasoning mode, Stardog will also take other OWL axioms into account when computing sameAs
inferences. Consider the following example:
#Everyone has a unique SSN number
:hasSSN a owl:InverseFunctionalProperty , owl:DatatypeProperty .
:JohnDoe :hasSSN "123-45-6789" .
:JDoe :hasSSN "123-45-6789" .
#Nobody can work for more than one company (for the sake of the example)
:worksFor a owl:FunctionalProperty , owl:ObjectProperty ;
rdfs:domain :Employee ;
rdfs:range :Company .
:JohnDoe :worksFor :Acme .
:JDoe :worksFor :AcmeInc .
#For each company, there can only be one employee with the same employee ID
:Employee owl:hasKey (:employeeID :worksFor ).
:JohnDoe :employeeID "1234-ABC" .
:JohnD :employeeID "1234-ABC" ;
:worksFor :AcmeInc .
:JD :employeeID "5678-XYZ" ;
:worksFor :AcmeInc .
:John :employeeID "1234-ABC" ;
:worksFor :Emca .
For this database, with sameAs
reasoning set to FULL
, we would get the following answers:
$ stardog query execute -r acme "SELECT * {?x owl:sameAs ?y}"
+----------+----------+
| x | y |
+----------+----------+
| :JohnDoe | :JohnD |
| :JDoe | :JohnD |
| :Acme | :AcmeInc |
+----------+----------+
We can follow the chain of inferences to understand how these results were computed:
:JohnDoe owl:sameAs :JohnD
can be computed due to the fact that both have the same SSN numbers and thehasSSN
property is inverse functional.- We can infer
:Acme owl:sameAs :AcmeInc
since:JohnDoe
can work for at most one company. :JohnDoe owl:sameAs :JohnD
can be inferred using theowl:hasKey
definition since both individuals are known to work for the same company and have the same employee ID.- No more
sameAs
inferences can be computed due to the key definition, since other employees either have different IDs or work for other companies.
Special Reasoning Query Predicates
Stardog supports some built-in predicates with special meaning in order to make queries easier to read and write. These special predicates are primarily syntactic sugar for more complex structures.
Direct/Strict Subclasses, Subproperties, & Direct Types
Besides the standard RDF(S) predicates rdf:type
, rdfs:subClassOf
and rdfs:subPropertyOf
, Stardog supports the following special built-in predicates:
sp:directType
sp:directSubClassOf
sp:strictSubClassOf
sp:directSubPropertyOf
sp:strictSubPropertyOf
Where the sp
prefix binds to tag:stardog:api:property:
. Stardog also recognizes the sesame:directType
, sesame:directSubClassOf
, and sesame:strictSubClassOf
predicates, where the prefix sesame
binds to http://www.openrdf.org/schema/sesame#
.
We show what each of these predicates means by relating them to an equivalent triple pattern; that is, you can just write the predicate rather than the (more unwieldy) triple pattern.
#c1 is a subclass of c2 but not equivalent to c2
:c1 sp:strictSubClassOf :c2 => :c1 rdfs:subClassOf :c2 .
FILTER NOT EXISTS {
:c1 owl:equivalentClass :c2 .
}
#c1 is a strict subclass of c2 and there is no c3 between c1 and c2 in
#the strict subclass hierarchy
:c1 sp:directSubClassOf :c2 => :c1 sp:strictSubClassOf :c2 .
FILTER NOT EXISTS {
:c1 sp:strictSubClassOf :c3 .
:c3 sp:strictSubClassOf :c2 .
}
#ind is an instance of c1 but not an instance of any strict subclass of c1
:ind sp:directType :c1 => :ind rdf:type :c1 .
FILTER NOT EXISTS {
:ind rdf:type :c2 .
:c2 sp:strictSubClassOf :c1 .
}
The predicates sp:directSubPropertyOf
and sp:strictSubPropertyOf
are defined analogously.
New individuals with user-defined rules
Stardog supports a special predicate that extends the expressivity of user-defined rules. According to SWRL, you can’t create new individuals (i.e., new instances of classes) in a SWRL rule.
Don’t get hung up by the tech jargon here. “Can’t create new individuals” just means you can’t have a rule that creates a new instance of some RDF or OWL class as a result of the rule firing.
This restriction is well-motivated; without it, you can easily create rules that do not terminate (i.e., never reach a fixed point). Stardog’s user-defined rules weaken this restriction in some crucial aspects, subject to the following restrictions, conditions, and warnings.
This special predicate is very powerful, and you can mess things up badly if you aren’t careful.
Despite the general restriction in SWRL, in Stardog we can create new individuals with a rule by using the function UUID()
, as follows:
IF {
?p a :Parent .
BIND (UUID() AS ?parent) .
}
THEN {
?parent a :Person .
}
This rule will create a random URI for each instance of the class :Parent
and also assert that each new instance is an instance of :Person
– parents are people, too!
There are a couple important things to note about this feature:
-
The URIs for the generated individuals are meaningless, in the sense that they should not be used in further queries; that is, these URIs are not guaranteed by Stardog to be stable.
-
Due to normalization, rules with more than one atom in the head are broken up into several rules.
IF { ?person a :Person . BIND (UUID() AS ?parent) . } THEN { ?parent a :Parent ; a :Male . }
will be normalized into two rules:
IF { ?person a :Person . BIND (UUID() AS ?parent) . } THEN { ?parent a :Parent . } IF { ?person a :Person . BIND (UUID() AS ?parent) . } THEN { ?parent a :Male . }
As a consequence, instead of stating that the new individual is both an instance of
:Male
and:Parent
, we would create two different new individuals and assert that one is male and the other is a parent. If you need to assert various things about the new individual, we recommend the use of extra rules or axioms. In the previous example, we can introduce a new class (:Father
) and add the following rule to our schema:IF { ?person a :Father . } THEN { ?person a :Parent ; a :Male . }
And then modify the original rule accordingly:
IF { ?person a :Person . BIND (UUID() AS ?parent) . } THEN { ?parent a :Father . }
Explaining Reasoning Results
Stardog can be used to check if the current database logically entails a set of triples; moreover, Stardog can explain why this is so. An explanation of an inference is the minimum set of statements explicitly stored in the database that, together with the schema and any valid inferences, logically justify the inference. Explanations are useful for understanding data, schema, and their interactions, especially when large number of statements interact with each other to infer new statements.
Explanations can be retrieved using the CLI reasoning explain
command by providing an input file that contains the inferences to be explained:
$ stardog reasoning explain myDB inference_to_explain.ttl
The output is displayed in a concise syntax designed to be legible; but it can be rendered in any one of the supported RDF syntaxes, if desired. Explanations are also accessible through the HTTP API and programmatically. See the example in the stardog-examples Github repo for more details about retrieving explanations programmatically.
Proof Trees
Proof trees are a hierarchical presentation of multiple explanations (of inferences) to make data, schemas, and rules more intelligible. Proof trees provide an explanation for an inference or an inconsistency as a hierarchical structure. Nodes in the proof tree may represent an assertion in a Stardog database. Multiple assertion nodes are grouped under an inferred node.
Examples
For example, if we are explaining the inferred triple :Alice rdf:type :Employee
, the root of the proof tree will show that inference:
INFERRED :Alice rdf:type :Employee
The children of an inferred node will provide more explanation for that inference:
INFERRED :Alice rdf:type :Employee
ASSERTED :Manager rdfs:subClassOf :Employee
INFERRED :Alice rdf:type :Manager
The fully expanded proof tree will show the asserted triples and axioms for every inference:
INFERRED :Alice rdf:type :Employee
ASSERTED :Manager rdfs:subClassOf :Employee
INFERRED :Alice rdf:type :Manager
ASSERTED :Alice :supervises :Bob
ASSERTED :supervises rdfs:domain :Manager
The CLI reasoning explain
command prints the proof tree using indented text; but, using the Java API, it is easy to create a tree widget in a GUI to show the explanation tree, such that users can expand and collapse details in the explanation.
Another feature of proof trees is the ability to merge multiple explanations into a single proof tree with multiple branches when explanations have common statements. Consider the following example database:
#schema
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:supervises rdfs:domain :Manager
:ResearchProject rdfs:subClassOf :Project
:projectID rdfs:domain :Project
#instance data
:Alice :supervises :Bob
:Alice :manages :ProjectX
:ProjectX a :ResearchProject
:ProjectX :projectID "123-45-6789"
In this database, there are three different unique explanations for the inference :Alice rdf:type :Employee
:
Explanation 1
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:supervises rdfs:domain :Manager
:Alice :supervises :Bob
Explanation 2
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:ResearchProject rdfs:subClassOf :Project
:Alice :manages :ProjectX
:ProjectX a :ResearchProject
Explanation 3
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:projectID rdfs:domain :Project
:Alice :manages :ProjectX
:ProjectX :projectID "123-45-6789"
All three explanations have some triples in common; but when explanations are retrieved separately, it is hard to see how these explanations are related. When explanations are merged, we get a single proof tree where alternatives for subtrees of the proof are shown inline. In indented text rendering, the merged tree for the above explanations would look as follows:
INFERRED :Alice a :Employee
ASSERTED :Manager rdfs:subClassOf :Employee
1.1) INFERRED :Alice a :Manager
ASSERTED :supervises rdfs:domain :Manager
ASSERTED :Alice :supervises :Bob
1.2) INFERRED :Alice a :Manager
ASSERTED :ProjectManager rdfs:subClassOf :Manager
INFERRED :Alice a :ProjectManager
ASSERTED :ProjectManager owl:equivalentClass (:manages some :Project)
ASSERTED :Alice :manages :ProjectX
2.1) INFERRED :ProjectX a :Project
ASSERTED :projectID rdfs:domain :Project
ASSERTED :ProjectX :projectID "123-45-6789"
2.2) INFERRED :ProjectX a :Project
ASSERTED :ResearchProject rdfs:subClassOf :Project
ASSERTED :ProjectX a :ResearchProject
In the merged proof tree, alternatives for an explanation are shown with a number id. In the above tree, :Alice a :Manager
is the first inference for which we have multiple explanations, so it gets the id 1
. Then each alternative explanation gets an id appended to this (so explanations 1.1
and 1.2
are both alternative explanations for inference 1
). We also have multiple explanations for inference :ProjectX a :Project
, so its alternatives get ids 2.1
and 2.2
.