User-defined Rule Reasoning

This page discusses user-defined rule reasoning in Stardog.

Page Contents

Overview
Stardog Rules Syntax (SRS)
SWRL Support
Rule Examples
Rule Limitations

Overview

Stardog supports user-defined rule reasoning in addition to RDFS and OWL reasoning. User-defined rules provide more expressivity compared to RDFS/OWL and allow users to encode more complex domain logic. Rules in Stardog correspond to Datalog rules expressed over RDF graphs using SPARQL syntax.

A rule is an IF-THEN statement. The IF clause defines the conditions to match in the data; if they match, then the contents of the THEN clause “fire”, that is, they are inferred and, thus, available for other queries, rules, or axioms, etc. Inferences implied by the rules will not be materialized. Instead, rules are used to expand queries and the results of the expanded query will include the relevant inferences as explained in this section.

If you are familiar with Datalog jargon, the IF clause of a rule is called the “rule body” (or antecedent) and the THEN clause is called the “rule head” (or consequent).

In this chapter, we explain the details of the rules syntax, the features that can be used in rules and the limitations.

Stardog Rules Syntax (SRS)

Stardog Rules Syntax (SRS) is based on the SPARQL grammar and is defined as follows. If you are not familiar with the details of the SPARQL grammar, see the examples below to get started.

IF {
    GroupOrUnionGraphPattern | Filter | Bind | InlineData
}
[
 GROUP BY Var+
 Bind+  
 [ HavingClause ]
]
THEN {
   TriplesBlock
}

The same definition written as grammar productions:

Rule := IF { IfClause } [RuleAggregation] THEN { ThenClause }
IfClause := GroupOrUnionGraphPattern | Filter | Bind | InlineData
RuleAggregation := GROUP BY Var+ Bind+ [ HavingClause ] 
ThenClause := TriplesBlock

An example of a user-defined rule looks as follows:

# infers a person's country of residence from their address
IF {
   ?person :hasAddress ?address .
   ?address :country ?country .
}
THEN {
   ?person :countryOfResidence ?country .
}

In Datalog, the above rule would be represented as

countryOfResidence(P, C) :- hasAddress(P, A), country(A, C)

where classes in RDF correspond to unary Datalog predicates and properties in RDF correspond to binary Datalog predicates. The Datalog convention is to use uppercase letters for variables and lowercase strings for predicates.

Conceptually, user-defined rules are similar to SPARQL CONSTRUCT queries that generate new RDF triples. For example, the CONSTRUCT equivalent of the above rule would be:

CONSTRUCT {
   ?person :countryOfResidence ?country .
}
WHERE {
   ?person :hasAddress ?address .
   ?address :country ?country .
}
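
To make the correspondence concrete, here is a minimal Python sketch of what the rule above computes when read as a Datalog join. It only illustrates the semantics; it is not how Stardog evaluates rules, since Stardog expands queries instead of materializing inferences. The triple representation and all names in the data are assumptions made for this illustration.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
triples = {
    ("JohnDoe", "hasAddress", "addr1"),
    ("addr1", "country", "USA"),
    ("JaneRoe", "hasAddress", "addr2"),  # addr2 has no country, so nothing is inferred
}

def country_of_residence(data):
    """Join hasAddress with country, as the rule's IF clause does."""
    inferred = set()
    for person, p1, address in data:
        if p1 != "hasAddress":
            continue
        for subject, p2, country in data:
            if p2 == "country" and subject == address:
                inferred.add((person, "countryOfResidence", country))
    return inferred

inferred = country_of_residence(triples)
print(sorted(inferred))
```

Only JohnDoe receives a countryOfResidence value because both patterns in the IF clause must match on the same ?address.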

Many features of SPARQL can be used in rules but there are some limitations as explained below.

Stardog supports inclusion of rules directly in Turtle files by extending the Turtle syntax. Rules can be mixed with other triples in the file. Optionally, an IRI can be used to uniquely identify the rule, and that IRI can be used in further triples to refer to the rule. The following example shows data and a rule mixed in one file:

PREFIX : <http://example.org/>

:JohnDoe a :Person ;
   :hasAddress [
      :street "123 Oak St." ;
      :state :VA ;
      :country :USA  
   ] .

RULE :CountryOfResidenceRule
IF {
   ?person :hasAddress ?address .
   ?address :country ?country .
}
THEN {
   ?person :countryOfResidence ?country .
}

:CountryOfResidenceRule rdfs:comment "This rule infers a person's country of residence from their address" .

When rules are loaded into Stardog they are stored as standard RDF triples using terms from the tag:stardog:api:rule: namespace. The rule’s content is stored as a string literal:

PREFIX : <http://example.org/>
PREFIX rule: <tag:stardog:api:rule:>

:CountryOfResidenceRule a rule:SPARQLRule;
   rule:content """
      PREFIX : <http://example.org/>
      RULE :CountryOfResidenceRule
      IF {
         ?person :hasAddress ?address .
         ?address :country ?country .
      }
      THEN {
         ?person :countryOfResidence ?country .
      }
   """.

This representation of rules can be loaded into Stardog as an alternative to the inline syntax.

Rules can use namespaces stored in the corresponding Stardog database and omit the prefix declarations. If rules use other namespaces, the prefixes should be explicitly declared in the input file. If the inline rule syntax is used, the prefix declarations of the Turtle file apply. If rules are represented inside literals, the prefix declarations should be repeated within each literal as shown above.

SWRL Support

The Semantic Web Rule Language (SWRL) is a proposed language for expressing rules on top of OWL ontologies and defines an RDF serialization for these rules. Stardog supports the SWRL serialization and will automatically detect and process rules defined in the SWRL syntax. Stardog can also translate rules between SRS and SWRL where possible. The negation and the aggregation features in SRS cannot be expressed in SWRL and therefore SRS rules using those features cannot be translated to SWRL.

Rule Examples

In this section we go over rule examples showcasing how different SPARQL constructs can be used in rules. In addition to triple patterns, the IF clause of a rule can optionally contain FILTER, BIND or UNION constructs. If the new Stride reasoner is enabled, VALUES and GROUP BY constructs can also be used, as well as the NOT EXISTS expression in FILTERs. See the examples for more details.

Filtering Matches

Filters can be used in rules to restrict the matches for the IF clause. The following rule classifies every person who is 18 years or older as Adult:

IF {
      ?person a :Person ; 
             :age ?age.
      FILTER (?age >= 18)
}
THEN {
      ?person a :Adult
}

Almost all SPARQL functions that Stardog supports, including user-defined functions, can be used in rules. Special conditions apply to the NOT EXISTS function; these are explained in the negation section below. In addition, non-deterministic functions such as RAND should be avoided in rules. In general, if a function is not pure, the reasoning results are unpredictable because every invocation of the function might return a different value. Note that the function NOW() can be used in rules without problems: within a single query, NOW() deterministically returns the same time, and all the inferences computed for a query involving rules that contain NOW() use the start time of the query.

Binding Values

Rules can bind new values from existing facts, and those values can be used in further filters or in the THEN clause. The following rule computes the full name of a person by concatenating the first and last names.

IF {
  ?person :firstName ?firstName ;
          :lastName ?lastName .
  BIND (concat(?firstName, " ", ?lastName) as ?fullName)
}
THEN {
  ?person :fullName ?fullName
}

As mentioned in the previous section, it is possible to use the NOW() function in rules. In that way we can compute the age of a person from their birthdate with the following rule:

IF {
  ?person :birthDate ?birthDate
  bind(year(now()) - year(?birthDate) AS ?age)
}
THEN {
  ?person :age ?age
}

The power of rules comes from the fact that they apply to both asserted facts and inferences obtained through other rules. For example, the age attribute inferred by this rule would be used by the rule above that infers the Adult type. We can encode the domain logic as simple rules that build on each other in a modular way.
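
The chaining described above can be sketched in Python as two small functions, one per rule, where the output of the first feeds the second. This is only an illustration of the idea; the function names, the sample data, and the fixed `query_time` value (standing in for NOW(), which is constant within one query) are assumptions.

```python
from datetime import datetime, timezone

def infer_ages(birth_years, query_time):
    # Mirrors year(now()) - year(?birthDate); query_time is fixed once per query.
    return {person: query_time.year - year for person, year in birth_years.items()}

def infer_adults(ages):
    # Mirrors FILTER (?age >= 18), applied to the inferred ages.
    return {person for person, age in ages.items() if age >= 18}

query_time = datetime(2024, 6, 1, tzinfo=timezone.utc)  # stand-in for NOW()
birth_years = {"alice": 1990, "bob": 2010}

ages = infer_ages(birth_years, query_time)
adults = infer_adults(ages)
print(ages, adults)
```

Neither person has an asserted age, yet alice is classified as an adult because the second rule consumes what the first one infers.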

Matching Alternatives

We can use the UNION construct in rules to match alternatives. Suppose we have a data model where MoneyTransfers are connected to Accounts via sentBy and receivedBy relationships and Accounts are linked to Person via the ownedBy relationship. The following rule relates a MoneyTransfer to a Person by either path:

IF {
  ?transfer a :MoneyTransfer .
  { ?transfer :sentBy ?acct }
  UNION 
  { ?transfer :receivedBy ?acct }
  ?acct :ownedBy ?owner
}
THEN {
  ?owner :participatesIn ?transfer
}

We can use the SPARQL property path syntax for alternatives and sequences to express this rule more succinctly:

IF {
  ?transfer a :MoneyTransfer .
  ?transfer (:sentBy|:receivedBy)/:ownedBy ?owner
}
THEN {
  ?owner :participatesIn ?transfer
}

The other possibility to match alternatives is to write multiple rules. We can split the above example into multiple rules by introducing an intermediate involvesAccount relationship:

IF { ?transfer a :MoneyTransfer ; :sentBy ?acct }
THEN { ?transfer :involvesAccount ?acct }

IF { ?transfer a :MoneyTransfer ; :receivedBy ?acct }
THEN { ?transfer :involvesAccount ?acct }

IF { ?transfer :involvesAccount/:ownedBy ?owner }
THEN { ?owner :participatesIn ?transfer }
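
As a sanity check on the claim that the single UNION rule and the split rules express the same logic, the following Python sketch evaluates both formulations over a small, made-up data set. The relation contents and function names are hypothetical, and the code models only the intended semantics, not Stardog's evaluation.

```python
transfers = {"t1", "t2"}
sent_by = {("t1", "acctA")}
received_by = {("t1", "acctB"), ("t2", "acctA")}
owned_by = {("acctA", "ann"), ("acctB", "ben")}

def owners_of(pairs):
    # Join (transfer, account) pairs with ownedBy to get (owner, transfer).
    return {(owner, t) for (t, a) in pairs for (a2, owner) in owned_by if a == a2}

def single_rule():
    # UNION: an account qualifies if it sent or received the transfer.
    return owners_of({(t, a) for (t, a) in sent_by | received_by if t in transfers})

def split_rules():
    # Two rules infer the intermediate relation; a third rule uses it.
    involves_account = {(t, a) for (t, a) in sent_by if t in transfers}
    involves_account |= {(t, a) for (t, a) in received_by if t in transfers}
    return owners_of(involves_account)

print(single_rule() == split_rules())
```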

Negation

The negation support requires the new Stride reasoner to be enabled.

It is possible to write rules that infer new triples due to the absence of other triples or inferences. Continuing with the money transfer example from the previous section, suppose we want to write a rule that identifies accounts that have not been used in any transfer within the last year:

IF {
  ?acct a :Account .
  FILTER NOT EXISTS {
     ?transfer a :MoneyTransfer .
     ?transfer :sentBy|:receivedBy ?acct .
     ?transfer :date ?date .
     FILTER(NOW() - ?date < "P1Y"^^xsd:yearMonthDuration)
  }
}
THEN {
  ?acct a :InactiveAccount
}

Making inferences due to absence of information is called negation as failure in Datalog. This is a non-monotonic inference rule which means if we add new data about money transfers, some of the inferences we have derived might not be derived anymore; that is, addition of data results in removal of inferences. When rules are monotonic, addition of new data can only result in additional inferences.
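
The non-monotonic behavior can be seen in a small Python sketch. It models the inactive-account logic with plain sets; the function name and data are hypothetical, and the date filter is simplified away by assuming the input already contains only recent transfers.

```python
def inactive_accounts(accounts, recent_transfers):
    """Accounts that appear in no recent transfer.

    recent_transfers holds (sender, receiver) account pairs.
    """
    used = {acct for pair in recent_transfers for acct in pair}
    return accounts - used

accounts = {"acctA", "acctB", "acctC"}

before = inactive_accounts(accounts, {("acctA", "acctB")})
# Adding one more transfer removes an inference that held before.
after = inactive_accounts(accounts, {("acctA", "acctB"), ("acctC", "acctA")})
print(before, after)
```

With a monotonic rule, `after` could only grow relative to `before`; here it shrinks.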

Stardog supports stratified negation which refers to the notion that there cannot be recursion through negated (or aggregated) rules. See below for more details.

Aggregation

The aggregation support requires the new Stride reasoner to be enabled.

We can write a rule that will compute a value over the results matched by the IF condition. The following rule computes the total number of accounts owned by the same person:

IF {
  ?acct a :Account ;
        :ownedBy ?owner
}
GROUP BY ?owner
BIND(count(?acct) AS ?acctCount)
THEN {
  ?owner :numberOfAccounts ?acctCount
}

Like negation, aggregation is non-monotonic, and the same stratification restrictions apply to rules with aggregation.
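
A short Python sketch of the grouping and counting step, and of why aggregation is non-monotonic: adding an account changes the count, so the previously inferred value is no longer derived. The data and the function name are assumptions for illustration.

```python
from collections import Counter

def account_counts(owned_by):
    """Count accounts per owner, like GROUP BY ?owner with count(?acct)."""
    return Counter(owner for _acct, owner in owned_by)

owned_by = {("acct1", "ann"), ("acct2", "ann"), ("acct3", "ben")}

before = account_counts(owned_by)
after = account_counts(owned_by | {("acct4", "ann")})
print(before["ann"], after["ann"])  # 2 3
```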

Rule Limitations

Rules are subject to several limitations that we explain in this section. Most of these limitations come from the Datalog formalism and ensure unambiguous semantics.

Rule Safety

The so-called “rule safety” is a common Datalog requirement that says every variable used in the THEN clause should be bound in the IF clause. The reason for this requirement should be obvious; the inferences should be based on what we know in the data. Most commonly unsafe rules would occur due to typos. For example, in the following rule the variable ?person is mistyped in the THEN clause making the rule unsafe:

# invalid rule example - ?persn in the THEN clause is not bound in the IF clause
IF   { ?person :worksFor ?org }
THEN { ?org :hasEmployee ?persn }
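
The safety condition is easy to check mechanically. The following Python sketch is a purely textual approximation, assuming rule clauses are given as strings; it is not Stardog's validator and it ignores subtleties such as variables that occur only inside a NOT EXISTS filter.

```python
import re

VAR = re.compile(r"\?(\w+)")

def unsafe_variables(if_clause, then_clause):
    """Return THEN-clause variables that never appear in the IF clause."""
    return set(VAR.findall(then_clause)) - set(VAR.findall(if_clause))

# A misspelled variable in the THEN clause is reported as unsafe.
print(unsafe_variables("?person :worksFor ?org", "?org :hasEmployee ?persn"))
# A rule whose THEN variables are all bound yields an empty set.
print(unsafe_variables("?person :worksFor ?org", "?org :hasEmployee ?person"))
```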

Variable Positions

Triple patterns in rules cannot have a variable in the predicate position, nor in the object position when the predicate is rdf:type. This restriction is a direct result of how rules are represented in Datalog. As mentioned above, a triple pattern ?person :hasAddress ?address turns into a Datalog atom hasAddress(P, A), so a variable in the predicate position cannot be translated to Datalog. Likewise, a class in the object position of rdf:type becomes a unary Datalog predicate, so it cannot be a variable either.

# invalid rule example - ?productType variable in the object position of rdf:type
IF   { ?person :purchased ?product .
       ?product a ?productType }
THEN { ?person :purchasedProductType ?productType }

SPARQL Constructs

In addition to triple patterns, the IF clause of a rule can optionally contain UNION, BIND or FILTER constructs. If the new Stride reasoner is enabled, VALUES and GROUP BY constructs are also allowed. Furthermore, the Stride reasoner supports the EXISTS and NOT EXISTS functions in FILTER expressions, which are otherwise not allowed.

Recursion Restrictions

Combining recursive Datalog rules with negation and aggregation complicates their semantics and can cause ambiguity. Stardog incorporates well-known restrictions from the Datalog literature to provide clear semantics for rules that can be implemented efficiently. These restrictions are explained in the following sections, but first we explain what a recursive rule is.

A recursive rule infers something by referring to itself directly or indirectly. A very simple recursive rule example is as follows:

IF   { ?person :worksFor ?org .
       ?org :partOf ?otherOrg }
THEN { ?person :worksFor ?otherOrg }

This rule infers that a person working for an organization also works for the organization that the first organization is part of. This is a recursive rule because the same worksFor relation is used both in the rule body and the rule head. If there is a path of partOf relations in the graph, this rule infers the worksFor relation for all the organizations on the path.
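
The effect of the recursion can be sketched as a fixpoint computation in Python: the rule is applied repeatedly until no new worksFor pair appears. This is a naive illustration of the semantics with made-up data, not a description of Stardog's query-rewriting implementation.

```python
def works_for_closure(works_for, part_of):
    """Apply the recursive rule until no new (person, org) pair is inferred."""
    inferred = set(works_for)
    changed = True
    while changed:
        changed = False
        for person, org in list(inferred):
            for child, parent in part_of:
                if child == org and (person, parent) not in inferred:
                    inferred.add((person, parent))
                    changed = True
    return inferred

works_for = {("ann", "teamX")}
part_of = {("teamX", "deptY"), ("deptY", "acme")}

closure = works_for_closure(works_for, part_of)
print(sorted(closure))
```

The loop terminates even if partOf contains cycles, because the set of possible pairs is finite and pairs are only ever added.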

Recursion can also occur via multiple rules:

IF   { ?person :worksFor ?org }
THEN { ?org :hasEmployee ?person }

IF   { ?org :hasEmployee ?person }
THEN { ?person :worksFor ?org }

These rules create a recursion because the hasEmployee relation can be inferred using worksFor and vice versa.

Stratified Rules

Using negation or aggregation within recursive rules without any limits causes problems with semantics. Stardog adopts the well-known stratification restriction to avoid this. Stratification simply means there can be no recursion involving negation or aggregation. The following is an example of an invalid recursive rule set:

# invalid rule example - recursion via negation not allowed
IF {
  ?person a :Person .
  FILTER NOT EXISTS {
     ?person a :Employed
  }
}
THEN {
  ?person a :Unemployed
}

IF {
  ?person a :Person .
  FILTER NOT EXISTS {
     ?person a :Unemployed
  }
}
THEN {
  ?person a :Employed
}

It is easy to see why these rules are problematic. If a graph only has the triple :John a :Person, it is not clear whether the person should be inferred to be :Employed or :Unemployed. The following example, on the other hand, is valid:

# negation is stratified, second rule depends on first rule but not the other way around
IF {
  ?person :worksFor ?org
}
THEN {
  ?person a :Employed
}

IF {
  ?person a :Person .
  FILTER NOT EXISTS {
     ?person a :Employed
  }
}
THEN {
  ?person a :Unemployed
}
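
The stratification condition can be checked on a predicate dependency graph. The Python sketch below assumes each rule is summarized as a head predicate plus the predicates its body uses positively and under negation (aggregated predicates would go in the same list as negated ones). The representation and function name are assumptions; Stardog's own check may differ.

```python
def is_stratified(rules):
    """rules: list of (head, positive_body, negated_body) predicate names."""
    deps = {}
    for head, positive, negated in rules:
        deps.setdefault(head, set()).update(positive, negated)

    def reaches(start, target):
        # Depth-first search over the "depends on" edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return False

    # A negated dependency head -> q is a problem if q depends back on head.
    return not any(
        reaches(q, head) for head, _positive, negated in rules for q in negated
    )

invalid = [("Unemployed", ["Person"], ["Employed"]),
           ("Employed", ["Person"], ["Unemployed"])]
valid = [("Employed", ["worksFor"], []),
         ("Unemployed", ["Person"], ["Employed"])]
print(is_stratified(invalid), is_stratified(valid))  # False True
```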

Linear Recursion

Linear recursion is another well-known restriction in Datalog that ensures inference algorithms can be implemented efficiently. Linear recursion requires that every rule has at most one recursive term in its body.

# invalid rule - recursion is not linear
IF { 
   ?x a :Famous .
   ?x :knows ?y .
   ?y :knows ?z .
   ?z a :Famous .
}
THEN { 
   ?y a :Famous
}

This rule is not linear because Famous is recursive and appears twice in the rule body.

There are various types of rules that syntactically satisfy the definition of a non-linear rule but can be transformed into a linear rule. The following transitivity rule is one example of a syntactically non-linear rule that can be linearized and is thus allowed:

# valid recursion - transitivity can be expressed with linear rules
IF   { ?x :partOf ?y . ?y :partOf ?z . }
THEN { ?x :partOf ?z }

Stardog first attempts to transform all non-linear rules into linear rules. If there is a non-linear rule that cannot be transformed, an error message is produced.
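
The syntactic notion of non-linearity can be sketched in Python by counting, for each rule, the body atoms whose predicate depends back on the rule head. The check below is purely syntactic and does not attempt the linearization step, so it flags the transitivity rule as well. The rule representation and names are assumptions made for this illustration.

```python
def nonlinear_rules(rules):
    """rules: list of (head, body); body lists one predicate name per atom."""
    deps = {}
    for head, body in rules:
        deps.setdefault(head, set()).update(body)

    def depends_on(start, target):
        # True if start depends on target through one or more rules.
        stack, seen = list(deps.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return False

    flagged = []
    for head, body in rules:
        recursive_atoms = [p for p in body if depends_on(p, head)]
        if len(recursive_atoms) > 1:
            flagged.append((head, body))
    return flagged

rules = [
    ("Famous", ["Famous", "knows", "knows", "Famous"]),  # two recursive atoms
    ("worksFor", ["worksFor", "partOf"]),                # one recursive atom
    ("partOf", ["partOf", "partOf"]),                    # non-linear, but linearizable
]
flagged = nonlinear_rules(rules)
print([head for head, _body in flagged])  # ['Famous', 'partOf']
```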
