User-defined Rule Reasoning
This page discusses user-defined rule reasoning in Stardog.
Overview
Stardog supports user-defined rule reasoning in addition to RDFS and OWL reasoning. User-defined rules provide more expressivity compared to RDFS/OWL and allow users to encode more complex domain logic. Rules in Stardog correspond to Datalog rules expressed over RDF graphs using SPARQL syntax.
A rule is an IF-THEN statement. The IF clause defines the conditions to match in the data; if they match, then the contents of the THEN clause “fire”, that is, they are inferred and, thus, available for other queries, rules, or axioms, etc. Inferences implied by the rules will not be materialized. Instead, rules are used to expand queries and the results of the expanded query will include the relevant inferences as explained in this section.
If you are familiar with Datalog jargon, the IF clause of a rule is called the “rule body” (or antecedent) and the THEN clause is called the “rule head” (or consequent).
On this page, we explain the details of the rule syntax, the features that can be used in rules, and their limitations.
Stardog Rule Syntax (SRS)
Stardog Rule Syntax (SRS) is based on the SPARQL grammar and is defined as follows. If you are not familiar with the details of the SPARQL grammar, see the examples below to get started.
IF {
GroupOrUnionGraphPattern | Filter | Bind | InlineData
}
[
GROUP BY Var+
Bind+
[ HavingClause ]
]
THEN {
TriplesBlock
}
An example of a user-defined rule looks as follows:
# infers a person's country of residence from their address
IF {
?person :hasAddress ?address .
?address :country ?country .
}
THEN {
?person :countryOfResidence ?country .
}
In Datalog, the above rule would be represented as
countryOfResidence(P, C) :- hasAddress(P, A), country(A, C)
where classes in RDF correspond to unary Datalog predicates and properties in RDF correspond to binary Datalog predicates. The Datalog convention is to use uppercase letters for variables and lowercase strings for predicates.
Conceptually, user-defined rules are similar to SPARQL CONSTRUCT queries that generate new RDF triples. For example, the CONSTRUCT equivalent of the above rule would be:
CONSTRUCT {
?person :countryOfResidence ?country .
}
WHERE {
?person :hasAddress ?address .
?address :country ?country .
}
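The correspondence between the rule, its Datalog form, and the CONSTRUCT query can be made concrete with a small Python sketch. It is only a conceptual illustration: it computes the inferred triples eagerly over an in-memory set, whereas Stardog expands queries and does not materialize inferences. The tuple representation and the function name `country_of_residence` are assumptions made for this example.

```python
# Conceptual sketch only: facts are (subject, predicate, object) tuples.
# Stardog itself rewrites queries instead of materializing triples like this.
facts = {
    ("JohnDoe", "hasAddress", "addr1"),
    ("addr1", "country", "USA"),
}

def country_of_residence(triples):
    """Apply countryOfResidence(P, C) :- hasAddress(P, A), country(A, C)."""
    inferred = set()
    for person, p1, address in triples:
        if p1 != "hasAddress":
            continue
        # Join on the shared address variable A.
        for subject, p2, country in triples:
            if p2 == "country" and subject == address:
                inferred.add((person, "countryOfResidence", country))
    return inferred

inferred = country_of_residence(facts)
print(inferred)
```

The nested loop plays the role of the join on ?address in the IF clause, and each tuple added to `inferred` corresponds to one triple produced by the THEN clause.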
Many features of SPARQL can be used in rules but there are some limitations as explained below.
Stardog supports inclusion of rules directly in Turtle files by extending the Turtle syntax. Rules can be mixed with other triples in the file. Optionally, an IRI can be used to uniquely identify the rule and to refer to it in further triples. The following example shows data and a rule mixed in the same file:
PREFIX : <http://example.org/>
:JohnDoe a :Person ;
:hasAddress [
:street "123 Oak St." ;
:state :VA ;
:country :USA
] .
RULE :CountryOfResidenceRule
IF {
?person :hasAddress ?address .
?address :country ?country .
}
THEN {
?person :countryOfResidence ?country .
}
:CountryOfResidenceRule rdfs:comment "This rule infers a person's country of residence from their address" .
When rules are loaded into Stardog they are stored as standard RDF triples using terms from the tag:stardog:api:rule: namespace. The rule’s content is stored as a string literal:
PREFIX : <http://example.org/>
PREFIX rule: <tag:stardog:api:rule:>
:CountryOfResidenceRule a rule:SPARQLRule;
rule:content """
PREFIX : <http://example.org/>
RULE :CountryOfResidenceRule
IF {
?person :hasAddress ?address .
?address :country ?country .
}
THEN {
?person :countryOfResidence ?country .
}
""".
This representation of rules can be loaded into Stardog as an alternative to the inline syntax.
Rules can use namespaces stored in the corresponding Stardog database and omit the prefix declarations. If rules are using other namespaces then prefixes should be explicitly declared in the input file. If inline rule syntax is being used then the prefix declarations of the Turtle file will apply. If rules are represented inside literals then the prefix declaration should be repeated within each literal as shown above.
SWRL Support
The Semantic Web Rule Language (SWRL) is a proposed language for expressing rules on top of OWL ontologies and defines an RDF serialization for these rules. Stardog supports the SWRL serialization and will automatically detect and process rules defined in the SWRL syntax. Stardog can also translate rules between SRS and SWRL where possible. The negation and the aggregation features in SRS cannot be expressed in SWRL and therefore SRS rules using those features cannot be translated to SWRL.
Rule Examples
In this section we go over rule examples showcasing how different SPARQL constructs can be used in rules. In addition to triple patterns, the IF clause of a rule can optionally contain FILTER, BIND or UNION constructs. If the new Stride reasoner is enabled then VALUES and GROUP BY constructs can also be used, as well as the NOT EXISTS expression in FILTERs. See the examples for more details.
Filtering Matches
Filters can be used in rules to restrict the matches for the IF clause. The following rule classifies every person who is 18 years or older as Adult:
IF {
?person a :Person ;
:age ?age.
FILTER (?age >= 18)
}
THEN {
?person a :Adult
}
Almost all SPARQL functions that Stardog supports, including user-defined functions, can be used in rules. Special conditions apply to the NOT EXISTS function, which are explained in the negation section below. In addition, non-deterministic functions such as RAND should be avoided in rules. In general, if a function is not pure, the reasoning results are unpredictable because every invocation of the function might return a different value. Note that the function NOW() can be used in rules without problems: in the context of a single query, NOW() deterministically returns the same time, so all the inferences computed for a query involving rules containing NOW() use the beginning time of the query.
Binding Values
Rules can bind new values from existing facts and those values can be used in further filters or in the THEN clause. The following rule computes the full name of a person by concatenating the first and the last names.
IF {
?person :firstName ?firstName ;
:lastName ?lastName .
BIND (concat(?firstName, " ", ?lastName) as ?fullName)
}
THEN {
?person :fullName ?fullName
}
As mentioned in the previous section, it is possible to use the NOW() function in rules. In that way we can compute the age of a person from their birthdate with the following rule:
IF {
?person :birthDate ?birthDate
bind(year(now()) - year(?birthDate) AS ?age)
}
THEN {
?person :age ?age
}
The power of rules comes from the fact that they apply to both asserted facts and inferences obtained through other rules. For example, the age attribute inferred by this rule would be used by the rule above that infers the Adult type. We can encode the domain logic as simple rules that build on each other in a modular way.
Matching Alternatives
We can use the UNION construct in rules to match alternatives. Suppose we have a data model where MoneyTransfers are connected to Accounts via sentBy and receivedBy relationships and Accounts are linked to Person via the ownedBy relationship. The following rule relates a MoneyTransfer to a Person by either path:
IF {
?transfer a :MoneyTransfer .
{ ?transfer :sentBy ?acct }
UNION
{ ?transfer :receivedBy ?acct }
?acct :ownedBy ?owner
}
THEN {
?owner :participatesIn ?transfer
}
We can use SPARQL syntax shortcuts for property path alternatives and sequences to express this rule in a more succinct way:
IF {
?transfer a :MoneyTransfer .
?transfer (:sentBy|:receivedBy)/:ownedBy ?owner
}
THEN {
?owner :participatesIn ?transfer
}
The other possibility to match alternatives is to write multiple rules. We can split the above example into multiple rules as follows by introducing an intermediate involvesAccount relationship inference:
IF { ?transfer a :MoneyTransfer ; :sentBy ?acct }
THEN { ?transfer :involvesAccount ?acct };
IF { ?transfer a :MoneyTransfer ; :receivedBy ?acct }
THEN { ?transfer :involvesAccount ?acct };
IF { ?transfer :involvesAccount/:ownedBy ?owner }
THEN { ?owner :participatesIn ?transfer };
Negation
The negation support requires the new Stride reasoner to be enabled.
It is possible to write rules that infer new triples due to the absence of other triples or inferences. Continuing with the money transfer example from the previous section, suppose we want to write a rule that identifies accounts that have not been used in any transfer within the last year:
IF {
?acct a :Account .
FILTER NOT EXISTS {
?transfer a :MoneyTransfer .
?transfer :sentBy|:receivedBy ?acct .
?transfer :date ?date .
FILTER(NOW() - ?date < "P1Y"^^xsd:yearMonthDuration)
}
}
THEN {
?acct a :InactiveAccount
}
Making inferences due to absence of information is called negation as failure in Datalog. This is a non-monotonic inference rule which means if we add new data about money transfers, some of the inferences we have derived might not be derived anymore; that is, addition of data results in removal of inferences. When rules are monotonic, addition of new data can only result in additional inferences.
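The non-monotonic behavior can be illustrated with a minimal Python sketch. It is not how Stardog evaluates the rule; it only mimics the effect of the NOT EXISTS filter on in-memory sets. The function name `inactive_accounts` and the representation of recent transfers as (sender, receiver) pairs are assumptions made for this example.

```python
def inactive_accounts(accounts, recent_transfers):
    """Return accounts that appear in no recent transfer.

    recent_transfers is a set of (sender_account, receiver_account) pairs
    that already passed the date filter (hypothetical representation).
    """
    used = {acct for pair in recent_transfers for acct in pair}
    return {acct for acct in accounts if acct not in used}

accounts = {"acct1", "acct2", "acct3"}

# With one transfer, acct3 is inferred to be inactive.
before = inactive_accounts(accounts, {("acct1", "acct2")})

# Adding a transfer that involves acct3 removes that inference.
after = inactive_accounts(accounts, {("acct1", "acct2"), ("acct3", "acct1")})

print(before, after)
```

The second call receives strictly more data than the first, yet returns fewer inferences, which is exactly what distinguishes a non-monotonic rule from a monotonic one.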
Stardog supports stratified negation which refers to the notion that there cannot be recursion through negated (or aggregated) rules. See below for more details.
Aggregation
The aggregation support requires the new Stride reasoner to be enabled.
We can write a rule that will compute a value over the results matched by the IF condition. The following rule computes the total number of accounts owned by the same person:
IF {
?acct a :Account ;
:ownedBy ?owner
}
GROUP BY ?owner
BIND(count(?acct) AS ?acctCount)
THEN {
?owner :numberOfAccounts ?acctCount
}
Like negation, aggregation is non-monotonic, and the same stratification restrictions apply to rules with aggregation.
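As a rough illustration of what the GROUP BY rule computes, and of why aggregation is non-monotonic, the following Python sketch counts accounts per owner over in-memory pairs. The function name `account_counts` and the (account, owner) pair representation are assumptions for this example, not part of Stardog.

```python
from collections import Counter

def account_counts(owned_by):
    """Map each owner to the number of accounts they own.

    owned_by is an iterable of (account, owner) pairs (hypothetical layout).
    """
    return dict(Counter(owner for _acct, owner in owned_by))

ownership = [("acct1", "alice"), ("acct2", "alice"), ("acct3", "bob")]
counts = account_counts(ownership)

# Adding one more account changes the value inferred for alice, so the
# previously inferred count for her is no longer derived.
updated = account_counts(ownership + [("acct4", "alice")])

print(counts, updated)
```

The count inferred for an owner is replaced, not extended, when new ownership data arrives, which is the same kind of retraction seen with negation.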
Rule Limitations
There are several limitations on rules that we explain in this section. Most of these limitations come from the Datalog formalism and ensure unambiguous semantics.
Rule Safety
The so-called “rule safety” is a common Datalog requirement that says every variable used in the THEN clause should be bound in the IF clause. The reason for this requirement is that inferences should be based on what we know in the data. Unsafe rules most commonly occur due to typos. For example, in the following rule the variable ?person is mistyped in the THEN clause, making the rule unsafe:
# invalid rule example - ?persn in the THEN clause is never bound in the IF clause
IF { ?person :worksFor ?org }
THEN { ?org :hasEmployee ?persn }
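A safety check of this kind can be sketched in a few lines of Python. This is a purely textual approximation and not Stardog's validator: it treats any variable that occurs anywhere in the IF clause as bound, so it would miss a variable that only occurs inside a NOT EXISTS block, and it ignores the `$var` spelling of SPARQL variables. The function name `unsafe_variables` is hypothetical.

```python
import re

# Matches SPARQL variables written with a leading question mark.
VAR = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")

def unsafe_variables(if_clause, then_clause):
    """Return the variables used in THEN that never occur in IF."""
    return set(VAR.findall(then_clause)) - set(VAR.findall(if_clause))

# A mistyped variable in the THEN clause is reported as unsafe.
typo = unsafe_variables("?person :worksFor ?org", "?org :hasEmployee ?persn")

# The correctly spelled rule has no unsafe variables.
ok = unsafe_variables("?person :worksFor ?org", "?org :hasEmployee ?person")

print(typo, ok)
```

An empty result means every variable in the THEN clause also appears in the IF clause, which is the textual core of the safety requirement.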
Variable Positions
Triple patterns in rules cannot have a variable in the predicate position, nor in the object position when the predicate is rdf:type. This restriction is a direct result of how rules are represented in Datalog. As mentioned above, a triple pattern ?person :hasAddress ?address turns into a Datalog atom hasAddress(P, A), and a class used with rdf:type turns into a unary predicate, so a variable in either of these positions would leave the atom without a predicate name and cannot be translated to Datalog.
# invalid rule example - variable ?productType in the object position of rdf:type
IF { ?person :purchased ?product .
?product a ?productType }
THEN { ?person :purchasedProductType ?productType }
SPARQL Constructs
In addition to triple patterns, the IF clause of a rule can optionally contain UNION, BIND or FILTER constructs. If the new Stride reasoner is enabled then VALUES and GROUP BY constructs are allowed. Furthermore, the Stride reasoner also supports EXISTS and NOT EXISTS functions in FILTER expressions, which are otherwise not allowed.
Recursion Restrictions
Combining recursive Datalog rules with negation and aggregation complicates their semantics and can cause ambiguity. Stardog incorporates well-known restrictions from the Datalog literature to provide clear semantics for rules that can be implemented efficiently. These restrictions are explained in the following sections, but first we explain what a recursive rule is.
A recursive rule infers something by referring to itself directly or indirectly. A very simple recursive rule example is as follows:
IF { ?person :worksFor ?org .
?org :partOf ?otherOrg }
THEN { ?person :worksFor ?otherOrg }
This rule will infer that a person working for an organization is also working for the other organization that the first organization is part of. This is a recursive rule because the same worksFor relation is used both in the rule body and the rule head. If there is a path of partOf relations in the graph, this rule would infer the worksFor relation for all the organizations on the path.
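The effect of the recursive rule along a partOf path can be sketched as a naive fixpoint computation in Python. This is only an illustration of the rule's meaning over in-memory pairs, not a description of Stardog's query-time evaluation; the function name `infer_works_for` and the sample data are assumptions for this example.

```python
def infer_works_for(works_for, part_of):
    """Naive fixpoint of worksFor(P, O2) :- worksFor(P, O1), partOf(O1, O2)."""
    result = set(works_for)
    while True:
        # Apply the rule once to everything known so far.
        new = {
            (person, parent)
            for (person, org) in result
            for (child, parent) in part_of
            if child == org
        } - result
        if not new:
            return result  # nothing new was inferred: fixpoint reached
        result |= new

works_for = {("alice", "team")}
part_of = {("team", "dept"), ("dept", "company")}
closure = infer_works_for(works_for, part_of)
print(sorted(closure))
```

Each pass of the loop uses inferences from the previous pass as input, which is what makes the rule recursive; the loop stops because the set of possible pairs is finite.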
Recursion can also occur via multiple rules:
IF { ?person :worksFor ?org }
THEN { ?org :hasEmployee ?person }
IF { ?org :hasEmployee ?person }
THEN { ?person :worksFor ?org }
In this example, the rules create a recursion because the hasEmployee relation can be inferred using worksFor and vice versa.
Stratified Rules
Using negation or aggregation within recursive rules without any limits causes problems with semantics. Stardog adopts the well-known stratification restriction to avoid this. Stratification simply means there can be no recursion involving negation or aggregation. The following is an example of a pair of invalid recursive rules:
# invalid rule example - recursion via negation not allowed
IF {
?person a :Person .
FILTER NOT EXISTS {
?person a :Employed
}
}
THEN {
?person a :Unemployed
}
IF {
?person a :Person .
FILTER NOT EXISTS {
?person a :Unemployed
}
}
THEN {
?person a :Employed
}
It is easy to see why these rules are problematic. If a graph only has the triple :John a :Person, then it is not clear if the person should be inferred :Employed or :Unemployed. The following example, on the other hand, is valid:
# negation is stratified, second rule depends on first rule but not the other way around
IF {
?person :worksFor ?org
}
THEN {
?person a :Employed
}
IF {
?person a :Person .
FILTER NOT EXISTS {
?person a :Employed
}
}
THEN {
?person a :Unemployed
}
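The stratification condition can be checked on a predicate dependency graph. The Python sketch below is a simplified illustration under stated assumptions, not Stardog's implementation: each rule is reduced to a hypothetical (head, positive_body, negated_body) triple of predicate or class names, and a rule set is rejected when some negated dependency lies on a cycle. Aggregated dependencies could be listed alongside negated ones, since the same restriction applies to them.

```python
def is_stratified(rules):
    """rules: list of (head, positive_body, negated_body) predicate names."""
    edges = {}     # predicate -> set of predicates whose rules use it
    negative = []  # (body_predicate, head) pairs that pass through negation
    for head, positive, negated in rules:
        for pred in list(positive) + list(negated):
            edges.setdefault(pred, set()).add(head)
        negative.extend((pred, head) for pred in negated)

    def reaches(start, goal):
        # Depth-first search along "is used by" edges.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return False

    # A negated dependency pred -> head is on a cycle when head leads back to pred.
    return not any(reaches(head, pred) for pred, head in negative)

invalid = [
    ("Unemployed", ["Person"], ["Employed"]),
    ("Employed", ["Person"], ["Unemployed"]),
]
valid = [
    ("Employed", ["worksFor"], []),
    ("Unemployed", ["Person"], ["Employed"]),
]
print(is_stratified(invalid), is_stratified(valid))
```

In the invalid pair, Employed and Unemployed each depend negatively on the other, so the negated edges form a cycle; in the valid pair, Unemployed depends on Employed but nothing depends on Unemployed.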
Linear Recursion
Linear recursion is another well-known restriction in Datalog to ensure inference algorithms can be implemented efficiently. Linear recursion requires that every rule have at most one recursive term in its body. The following rule violates this restriction:
# invalid rule - recursion is not linear
IF {
?x a :Famous .
?x :knows ?y .
?y :knows ?z .
?z a :Famous .
}
THEN {
?y a :Famous
}
This rule is not linear because Famous is recursive and appears twice in the rule body.
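The syntactic side of this restriction amounts to counting recursive atoms in a rule body. The Python sketch below is a simplified illustration, not Stardog's algorithm: rules are reduced to hypothetical (head, body) pairs of predicate names, a body atom counts as recursive when its predicate depends, directly or transitively, on the head, and no attempt is made to model any rewriting of non-linear rules into linear ones.

```python
def recursive_atom_count(rule, rules):
    """Count body atoms of `rule` whose predicate is recursive with its head."""
    depends_on = {}  # head predicate -> predicates used to derive it
    for head, body in rules:
        depends_on.setdefault(head, set()).update(body)

    def depends(pred, target):
        # True when deriving `pred` can, transitively, use `target`.
        seen, stack = set(), [pred]
        while stack:
            node = stack.pop()
            for used in depends_on.get(node, ()):
                if used == target:
                    return True
                if used not in seen:
                    seen.add(used)
                    stack.append(used)
        return False

    head, body = rule
    return sum(1 for pred in body if pred == head or depends(pred, head))

famous = ("Famous", ["Famous", "knows", "knows", "Famous"])
works_for = ("worksFor", ["worksFor", "partOf"])

famous_count = recursive_atom_count(famous, [famous])
works_for_count = recursive_atom_count(works_for, [works_for])
print(famous_count, works_for_count)
```

A count above one marks a rule as syntactically non-linear, as with the Famous rule; the worksFor rule from the recursion example has a single recursive atom and is linear.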
Some rules syntactically satisfy the definition of a non-linear rule but can be transformed into a linear rule. The following transitivity rule is one example of a syntactically non-linear rule that can be linearized and is thus allowed:
# valid recursion - transitivity can be expressed with linear rules
IF { ?x :partOf ?y . ?y :partOf ?z . }
THEN { ?x :partOf ?z }
Stardog will first transform all non-linear rules into linear rules. If there is a non-linear rule that cannot be transformed, an error message will be produced.