This document proposes a method for validating Semantic Web and Linked Data by providing an alternative, integrity constraint (IC) semantics for OWL. A model-theoretic semantics based on the Closed World Assumption and a weak variant of the Unique Name Assumption is given for OWL axioms that are thereby interpreted as ICs. The document includes a structural specification in order to augment an ontology with a set of OWL ICs and a brief description of possible implementation approaches.
This document proposes a method of constraining and validating Semantic Web and Linked Data (i.e., RDF) instance data using Integrity Constraints (ICs) modeled as OWL axioms. The proposal enables an OWL ontology to be interpreted as a set of ICs; i.e., checks that must be satisfied by the information explicitly present or information that may be inferred. We define a model-theoretic semantics for OWL ICs based on the Closed World Assumption and a weak variant of the Unique Name Assumption and briefly describe feasible implementation strategies.
In some use cases and for some requirements, OWL users intend OWL axioms to be interpreted as ICs. However, the direct semantics of OWL [OWL 2 Direct Semantics] does not interpret OWL axioms in this way; thus, the consequences that one can draw from such ontologies differ from the ones that some users intuitively expect and require. In other words, some users want to use OWL as a validation or constraint language for RDF instance data, but that is not possible using OWL software that correctly implements the existing semantics of OWL. This document addresses that situation by providing a different semantics—compatible with the existing semantics—for OWL axioms which may be used, together with appropriate software, to validate RDF instance data.
To see the nature of the problem, consider an OWL ontology that describes terms and concepts regarding the product inventory of a supermarket. The ontology includes the classes Product and Provider, the object property hasProvider, and the data property hasID. Suppose we want to impose the following ICs on the data:
These constraints could be interpreted in the following way:
These constraints can be concisely and unambiguously represented as OWL axioms:
Class: Product
    SubClassOf: hasID some rdfs:Literal
DataProperty: hasID
    Domain: Product
ObjectProperty: hasProvider
    Characteristics: Functional
However, these axioms will not be interpreted as checks by software which implements the standard OWL semantics. In fact, according to the standard OWL semantics, we have that:
In some cases, users want these inferences; but in others, users want integrity constraint violations to be detected, reported, repaired, etc.
OWL adopts the Open World Assumption (OWA) and does not adopt the Unique Name Assumption (UNA). These design choices make it very difficult to treat these axioms as ICs. On the one hand, due to OWA, a statement must not be inferred to be false on the basis of failures to prove it; therefore, the fact that a piece of information has not been specified (e.g., a product's ID) does not mean that such information does not exist. On the other hand, the absence of UNA allows two different constants to refer to the same individual (e.g., provider_{i} and provider_{j}).
The standard interpretation of OWL axioms that are intended to be interpreted as ICs is inappropriate for some use cases and applications; therefore, it is useful to define an alternative semantics for OWL based on ICs. An IC semantics, together with associated software, will increase the number of satisfied users of OWL because OWL and the software will then behave as those users intuitively expect and require.
As formally defined in Section 2, our approach allows a standard OWL ontology O to import a set of IC ontologies, that is, OWL ontologies that are to be interpreted as ICs. Note that the IC semantics for OWL defined in this document is a strict extension of the standard OWL semantics: if O imports no IC ontology, then O is interpreted as a standard OWL ontology.
An OWL ontology that is to be interpreted as a set of ICs is called an IC ontology. We slightly extend the structural specification of OWL in order to allow ontologies to import a set of IC ontologies. We do so by introducing a new annotation property which is defined analogously to owl:imports—the annotation property that is used to import standard ontologies defined by OWL 2 [OWL 2 Specification].
We use an annotation property that resembles owl:imports with a different namespace: http://www.w3.org/Submission/owlic/. In the following, we denote this annotation property as ic:imports for brevity.
The following example shows how this annotation property is used:
Prefix(ic:=<http://www.w3.org/Submission/owlic/>)
Ontology(<http://www.example.com/instanceOntology>
    Import(<http://www.example.com/schemaOntology>)
    Annotation(ic:imports <http://www.example.com/constraintsOntology>)
    ...
)
where instanceOntology imports the standard axioms of schemaOntology and imports constraintsOntology as an IC ontology. This import approach to relating ICs to other OWL ontologies gives enough flexibility to users without too much maintenance cost and negligible impact on existing tools. See Implementation Remarks for a more detailed discussion of this design choice.
An OWL ontology can import a set of IC ontologies via ic:imports. An IC ontology can import a set of IC ontologies via ic:imports as well. And, of course, an IC ontology can import a set of standard ontologies via owl:imports as usual.
The import closure of an IC ontology is defined in the same vein as the import closure for standard OWL ontologies. The IC import closure of a standard or IC ontology O is a set containing all the IC ontologies that O imports via the ic:imports annotation property. The IC closure of a standard or IC ontology O is the smallest set that contains all the axioms from each ontology O' in the IC import closure of O.
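Computing the IC import closure amounts to a reachability search over ic:imports edges. The following Python sketch assumes the import declarations have already been parsed into a dictionary; the function names and the dictionary layout are hypothetical. It treats the closure as transitive over ic:imports (in the same vein as the owl:imports closure) and does not model owl:imports declared inside IC ontologies.

```python
from collections import deque
from typing import Dict, List, Set


def ic_import_closure(root: str, ic_imports: Dict[str, List[str]]) -> Set[str]:
    """IRIs of all IC ontologies reachable from `root` via ic:imports.

    `ic_imports` maps an ontology IRI to the IRIs it names with ic:imports
    (a hypothetical, pre-parsed representation). The traversal is transitive,
    and the visited set makes import cycles harmless.
    """
    closure: Set[str] = set()
    queue = deque(ic_imports.get(root, []))
    while queue:
        iri = queue.popleft()
        if iri in closure:
            continue  # already visited
        closure.add(iri)
        queue.extend(ic_imports.get(iri, []))
    return closure


def ic_closure(root: str, ic_imports: Dict[str, List[str]],
               axioms: Dict[str, Set[str]]) -> Set[str]:
    """Smallest set containing every axiom of every ontology in the IC import closure."""
    result: Set[str] = set()
    for iri in ic_import_closure(root, ic_imports):
        result |= axioms.get(iri, set())
    return result
```

For example, if instanceOntology ic-imports constraintsOntology, which in turn ic-imports a second IC ontology, both IC ontologies end up in the closure, while the axioms of instanceOntology itself are not part of the IC closure.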
We refer to the definitions of datatype map, vocabulary, and OWL interpretation and model in OWL 2 [OWL 2 Direct Semantics].
Let D = (N_{DT}, N_{LS}, N_{FS}, ·^{DT}, ·^{LS}, ·^{FS}) be a datatype map and let V = (V_{C}, V_{OP}, V_{DP}, V_{I}, V_{DT}, V_{LT}, V_{FA}) be a vocabulary over D. An IC-interpretation Γ = (Δ_{I}, Δ_{D}, I, U, ·^{C}, ·^{OP}, ·^{DP}, ·^{I}, ·^{DT}, ·^{LT}, ·^{FA}) for D and V is an 11-tuple with the following structure:
The extensions of ·^{C}, ·^{OP}, and ·^{DT} to class expressions, object property expressions, and data ranges, respectively, are defined analogously to OWL 2 [OWL 2 Direct Semantics]. For example, we extend ·^{C} to the class expression ObjectIntersectionOf(CE_{1} ... CE_{n}) as (CE_{1})^{C} ∩ ... ∩ (CE_{n})^{C}. However, we extend ·^{C} to ObjectComplementOf(CE) as {x^{I} | x ∈ V_{I}} \ (CE)^{C}; that is, the complement of a class expression is defined with respect to the set of named individuals as opposed to the object domain. The complete extensions for ·^{C}, ·^{OP}, and ·^{DT} can be found in the Appendix.
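The difference between the two complements can be seen on a small finite interpretation. In this minimal sketch (the function names are hypothetical), domain elements and individual names are plain strings, `names` plays the role of ·^{I} restricted to V_{I}, and an element that no name maps to stands for an anonymous individual.

```python
from typing import Dict, Set


def owl_complement(ce_ext: Set[str], domain: Set[str]) -> Set[str]:
    """Standard OWL 2 direct semantics: complement relative to the object domain."""
    return domain - ce_ext


def ic_complement(ce_ext: Set[str], names: Dict[str, str]) -> Set[str]:
    """IC semantics: { x^I | x in V_I } minus (CE)^C, i.e. named individuals only."""
    named = set(names.values())  # interpretations of the individual names
    return named - ce_ext
```

With domain {"e1", "e2", "e3"}, names {"p1": "e1", "p2": "e2"} and (Product)^{C} = {"e1"}, the standard complement of Product is {"e2", "e3"}, whereas the IC complement is only {"e2"}, because the anonymous element "e3" is not denoted by any name.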
Satisfaction of an IC-interpretation Γ with respect to a given axiom is defined analogously to satisfaction of standard interpretations as defined in OWL 2 [OWL 2 Direct Semantics]. For example, Γ satisfies the axiom SubClassOf(CE_{1} CE_{2}) if (CE_{1})^{C} ⊆ (CE_{2})^{C}. The complete definitions for axiom satisfaction can be found in the Appendix.
Let D = (N_{DT}, N_{LS}, N_{FS}, ·^{DT}, ·^{LS}, ·^{FS}) be a datatype map and let V = (V_{C}, V_{OP}, V_{DP}, V_{I}, V_{DT}, V_{LT}, V_{FA}) be a vocabulary over D.
We are mainly interested in the following inference problem:
Ontology Validation. Let O be an OWL ontology. We say that O is valid iff, for all axioms α in the IC closure of O, O IC-satisfies α.
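Operationally, validation can be organized as a loop over the IC closure that reports every violated constraint; O is valid exactly when that report is empty. The sketch below is only an illustration under strong assumptions: the assertions entailed by O are taken to be already materialized as (subject, predicate, object) strings, and each IC has been hand-compiled into a check function. All function names and the `:Product`/`:hasID` IRIs are hypothetical, loosely following the supermarket example.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# A compiled IC returns the named individuals that violate it (empty if satisfied).
Check = Callable[[Set[Triple]], List[str]]


def product_has_id(data: Set[Triple]) -> List[str]:
    """Product SubClassOf: hasID some rdfs:Literal, read under the closed world:
    every known Product must have an explicit or entailed hasID value."""
    products = {s for s, p, o in data if p == "rdf:type" and o == ":Product"}
    with_id = {s for s, p, _ in data if p == ":hasID"}
    return sorted(products - with_id)


def validate(data: Set[Triple], ic_closure: Dict[str, Check]) -> Dict[str, List[str]]:
    """Map each violated IC to its violators; O is valid iff the result is empty."""
    report: Dict[str, List[str]] = {}
    for name, check in ic_closure.items():
        violators = check(data)
        if violators:
            report[name] = violators
    return report
```

Given the facts {(":p1", "rdf:type", ":Product"), (":p1", ":hasID", "A-1"), (":p2", "rdf:type", ":Product")}, `validate` reports `{"product-has-id": [":p2"]}` for the constraint map `{"product-has-id": product_has_id}`: under the standard semantics, :p2 would simply be assumed to have some unknown ID.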
As discussed in Section 2, we use standard OWL syntax for ICs, store ICs in a separate document, and define a new annotation property, analogous to owl:imports, that associates a standard OWL ontology with a set of ICs defined for that ontology.
The motivation for this design choice is to minimize the effects of ICs on existing tools. From the perspective of creation and maintenance, users can continue using existing ontology authoring toolsets. For example, one can use an OWL editor to create ICs and store them in a document. The ontology for which the constraints are written can easily be augmented with the IC import annotation, since this is a standard OWL annotation. With an OWL editor that allows users to open and edit multiple ontologies at the same time, the regular ontology and the IC ontology can be edited together. Several OWL editors can move axioms between ontologies; hence, one can change the interpretation of an axiom simply by moving it from the regular ontology to the IC ontology.
The only issue in using an existing OWL editor is as follows: when a user opens a regular ontology that links to an IC ontology, the editor will not open the IC ontology automatically. The user needs to look at the ontology annotation and open the IC ontology manually. However, this is not a serious issue, since recognizing this annotation is a relatively simple extension for editors, and such extensions can be expected to appear if ICs become widely used.
Our approach has no impact on existing OWL reasoners that do not support ICs: since annotation properties have no semantic effect under the standard OWL semantics, such reasoners simply ignore the ic:imports annotation. Therefore, no additional work is needed to hide the ICs in order to avoid the unintended inferences that would occur if they were inadvertently interpreted as regular OWL axioms.
Note that our approach does not require an IC ontology to be identified as such. However, in case an ontology is to be exclusively interpreted as a set of ICs, one might use an ontology annotation to make this fact explicit. As usual, such annotations are for informational purposes only and have no effect on the semantics.
The IC semantics described in this document is strongly related to the semantics presented in a paper [TSBM10] giving a formal integrity constraint semantics for the description logic SROIQ. Based on the correspondence between SROIQ and OWL 2 semantics [OWL 2 Direct Semantics], the semantics we present here has been adapted to OWL 2 and extended to support datatypes.
As discussed in the paper, there is a close relationship between IC semantics and queries that use a negation as failure (NAF) operator. This is interesting from a practical point of view because a validator for OWL ICs can be implemented in a straightforward way: each axiom in an IC ontology defined with respect to an OWL ontology O can be transformed into a SPARQL query that is then answered over O using the SPARQL entailment regime that corresponds to O.
As a simple example of the translation, consider the IC presented in Example 6 above:
Class: Supervisor
    SubClassOf: supervises some Employee
The translation of this IC to SPARQL would yield the following SPARQL query:
ASK WHERE {
  ?x rdf:type :Supervisor .
  OPTIONAL { ?x :supervises ?y . ?y rdf:type :Employee . }
  FILTER ( !bound( ?y ) )
}
If the execution of the query over an ontology O returns true, we can conclude that the IC has been violated by O and, therefore, that O is not IC-valid with respect to this constraint. Note that the query uses the OPTIONAL/FILTER/!BOUND pattern to encode NAF. However, it is likely that SPARQL 1.1 [SPARQL 1.1] will make NAF more clearly visible syntactically, perhaps via NOT EXISTS as in current drafts.
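For this one IC pattern, the translation can be sketched as a string template. The Python function below is hypothetical and covers only SubClassOf(C ObjectSomeValuesFrom(P D)) with prefixed names supplied by the caller; a complete translator would have to handle every construct of the IC semantics. The optional `use_not_exists` flag emits the FILTER NOT EXISTS form defined in SPARQL 1.1 instead of the OPTIONAL/FILTER/!bound pattern.

```python
def some_values_ic_to_ask(cls: str, prop: str, filler: str,
                          use_not_exists: bool = False) -> str:
    """Translate SubClassOf(cls ObjectSomeValuesFrom(prop filler)) into an ASK
    query that returns true exactly when some ?x violates the constraint."""
    if use_not_exists:
        # SPARQL 1.1: negation as failure written directly.
        return (
            "ASK WHERE {\n"
            f"  ?x rdf:type {cls} .\n"
            f"  FILTER NOT EXISTS {{ ?x {prop} ?y . ?y rdf:type {filler} . }}\n"
            "}"
        )
    # SPARQL 1.0 idiom: OPTIONAL match, then keep only rows where ?y stayed unbound.
    return (
        "ASK WHERE {\n"
        f"  ?x rdf:type {cls} .\n"
        f"  OPTIONAL {{ ?x {prop} ?y . ?y rdf:type {filler} . }}\n"
        "  FILTER ( !bound(?y) )\n"
        "}"
    )
```

Calling `some_values_ic_to_ask(":Supervisor", ":supervises", ":Employee")` yields a query of the same shape as the one shown for the Supervisor IC.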
It has been shown that SPARQL [SPARQL] has the same expressive power as nonrecursive datalog programs with NAF [AG08]. Therefore, it is possible to translate OWL ICs to a set of rules that will be evaluated over an ontology O. Such rules can be written using RIF Framework for Logic Dialects [RIF-FLD] with the Naf operator:
Forall ?x (
    invalid() :- And( ?x[rdf:type -> :Supervisor]
                      Naf Exists ?y ( And( ?x[:supervises -> ?y]
                                           ?y[rdf:type -> :Employee] ) ) )
)
This rule uses the Naf operator for encoding NAF and defines an arbitrary RIF predicate invalid to detect the condition that an ontology O is invalid with respect to ICs. Implementations would be free to choose a different name for the predicate.
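What such a rule computes can be illustrated by evaluating it directly over a finite set of facts. This is a hypothetical Python evaluator, not a RIF engine, and it assumes that the facts passed in are already closed under the entailments of O; NAF then reduces to "no supporting fact was found".

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def invalid(facts: Set[Triple]) -> bool:
    """invalid() holds iff some ?x is a :Supervisor and there is no ?y such that
    ?x :supervises ?y and ?y rdf:type :Employee (negation as failure)."""
    supervisors = {s for s, p, o in facts if p == "rdf:type" and o == ":Supervisor"}
    employees = {s for s, p, o in facts if p == "rdf:type" and o == ":Employee"}
    for x in supervisors:
        # Naf: the inner existential conjunction cannot be proven from the facts.
        if not any(s == x and p == ":supervises" and o in employees
                   for s, p, o in facts):
            return True
    return False
```

A supervisor who supervises someone not known to be an :Employee still makes `invalid` true, since under the closed world the missing type assertion is not assumed.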
Details of the translation are out of the scope here; interested readers are referred to the formal semantics paper mentioned previously [TSBM10]. Translation-based IC validation is one of many possibilities to implement IC validation and has been mentioned here as an example. An IC validator conforming to the IC semantics described here can also be implemented with different approaches.
We wish to thank the following people for their assistance: Pavel Klinov, Michael Smith, Michael Grove, Jiao Tao, and Peter Patel-Schneider. We thank members of the OWLED community, including the anonymous reviewers, who gave us very early feedback on using OWL as integrity constraints, including, most helpfully, use cases and requirements. We also acknowledge the support of NIST SBIR funding under the auspices of which this document was prepared.
The class interpretation function ·^{C} is extended to class expressions as shown in Table 1. For S a set, #S denotes the number of elements in S.
Class Expression | Interpretation ·^{C} |
---|---|
ObjectIntersectionOf(CE_{1} ... CE_{n}) | (CE_{1})^{C} ∩ ... ∩ (CE_{n})^{C} |
ObjectUnionOf(CE_{1} ... CE_{n}) | (CE_{1})^{C} ∪ ... ∪ (CE_{n})^{C} |
ObjectComplementOf(CE) | { x^{I}| x ∈ V_{I} } \ (CE)^{C} |
ObjectOneOf(a_{1} ... a_{n}) | { (a_{1})^{I}, ..., (a_{n})^{I} } |
ObjectSomeValuesFrom(OPE CE) | { x | ∃ y: (x, y) ∈ (OPE)^{OP} and y ∈ (CE)^{C} } |
ObjectAllValuesFrom(OPE CE) | { x | ∀ y: (x, y) ∈ (OPE)^{OP} implies y ∈ (CE)^{C} } |
ObjectHasValue(OPE a) | { x | (x, (a)^{I}) ∈ (OPE)^{OP} } |
ObjectHasSelf(OPE) | { x | (x, x) ∈ (OPE)^{OP} } |
ObjectMinCardinality(n OPE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} } ≥ n } |
ObjectMaxCardinality(n OPE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} } ≤ n } |
ObjectExactCardinality(n OPE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} } = n } |
ObjectMinCardinality(n OPE CE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} and y ∈ (CE)^{C} } ≥ n } |
ObjectMaxCardinality(n OPE CE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} and y ∈ (CE)^{C} } ≤ n } |
ObjectExactCardinality(n OPE CE) | { x | #{ y | (x, y) ∈ (OPE)^{OP} and y ∈ (CE)^{C} } = n } |
DataSomeValuesFrom(DPE_{1} ... DPE_{n} DR) | { x | ∃ y_{1}, ..., y_{n}: (x, y_{k}) ∈ (DPE_{k})^{DP} for each 1 ≤ k ≤ n and (y_{1}, ..., y_{n}) ∈ (DR)^{DT} } |
DataAllValuesFrom(DPE_{1} ... DPE_{n} DR) | { x | ∀ y_{1}, ..., y_{n}: (x, y_{k}) ∈ (DPE_{k})^{DP} for each 1 ≤ k ≤ n imply (y_{1}, ..., y_{n}) ∈ (DR)^{DT} } |
DataHasValue(DPE lt) | { x | (x, (lt)^{LT}) ∈ (DPE)^{DP} } |
DataMinCardinality(n DPE) | { x | #{ y | (x, y) ∈ (DPE)^{DP}} ≥ n } |
DataMaxCardinality(n DPE) | { x | #{ y | (x, y) ∈ (DPE)^{DP} } ≤ n } |
DataExactCardinality(n DPE) | { x | #{ y | (x, y) ∈ (DPE)^{DP} } = n } |
DataMinCardinality(n DPE DR) | { x | #{ y | (x, y) ∈ (DPE)^{DP} and y ∈ (DR)^{DT} } ≥ n } |
DataMaxCardinality(n DPE DR) | { x | #{ y | (x, y) ∈ (DPE)^{DP} and y ∈ (DR)^{DT} } ≤ n } |
DataExactCardinality(n DPE DR) | { x | #{ y | (x, y) ∈ (DPE)^{DP} and y ∈ (DR)^{DT} } = n } |
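The cardinality rows of Table 1 can be evaluated on a finite interpretation by counting successors. In this sketch (all names hypothetical), (OPE)^{OP} is a set of pairs, (CE)^{C} is an optional set acting as the qualification, and `universe` is the set of elements x ranged over. Because successors are collected in a set, #S counts distinct elements, so two names denoting the same element are counted once.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]


def successors(x: str, ope: Set[Pair], ce: Optional[Set[str]] = None) -> Set[str]:
    """{ y | (x, y) in (OPE)^OP and, if CE is given, y in (CE)^C }."""
    return {y for (s, y) in ope if s == x and (ce is None or y in ce)}


def object_min_cardinality(n: int, ope: Set[Pair], universe: Set[str],
                           ce: Optional[Set[str]] = None) -> Set[str]:
    return {x for x in universe if len(successors(x, ope, ce)) >= n}


def object_max_cardinality(n: int, ope: Set[Pair], universe: Set[str],
                           ce: Optional[Set[str]] = None) -> Set[str]:
    return {x for x in universe if len(successors(x, ope, ce)) <= n}


def object_some_values_from(ope: Set[Pair], ce: Set[str], universe: Set[str]) -> Set[str]:
    # Per Table 1, this coincides with ObjectMinCardinality(1 OPE CE).
    return object_min_cardinality(1, ope, universe, ce)
```

For hasProvider = {("p1", "acme"), ("p1", "globex"), ("p2", "acme")} over {"p1", "p2", "p3"}, ObjectMaxCardinality(1 hasProvider) evaluates to {"p2", "p3"}: "p1" is excluded because it has two distinct providers.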
The object property interpretation function ·^{OP} is extended to object property expressions as shown in Table 2.
Object Property Expression | Interpretation ·^{OP} |
---|---|
ObjectInverseOf(OP) | { (x, y) | (y, x) ∈ (OP)^{OP} } |
The datatype interpretation function ·^{DT} is extended to data ranges as shown in Table 3. All datatypes in OWL 2 are unary, so each datatype DT is interpreted as a unary relation over Δ_{D}, that is, as a set (DT)^{DT} ⊆ Δ_{D}. OWL 2 currently does not define data ranges of arity more than one; however, by allowing for n-ary data ranges, the syntax of OWL 2 provides a "hook" that allows implementations to introduce extensions such as comparisons and arithmetic. An n-ary data range DR is interpreted as an n-ary relation (DR)^{DT} over Δ_{D}, that is, as a set (DR)^{DT} ⊆ (Δ_{D})^{n}.
Data Range | Interpretation ·^{DT} |
---|---|
DataIntersectionOf(DR_{1} ... DR_{n}) | (DR_{1})^{DT} ∩ ... ∩ (DR_{n})^{DT} |
DataUnionOf(DR_{1} ... DR_{n}) | (DR_{1})^{DT} ∪ ... ∪ (DR_{n})^{DT} |
DataComplementOf(DR) | (Δ_{D})^{n} \ (DR)^{DT} where n is the arity of DR |
DataOneOf(lt_{1} ... lt_{n}) | { (lt_{1})^{LT}, ..., (lt_{n})^{LT} } |
DatatypeRestriction(DT F_{1} lt_{1} ... F_{n} lt_{n}) | (DT)^{DT} ∩ (F_{1}, lt_{1})^{FA} ∩ ... ∩ (F_{n}, lt_{n})^{FA} |
Satisfaction of OWL 2 class expression axioms in Γ with respect to an ontology O is defined as shown in Table 4.
Axiom | Condition |
---|---|
SubClassOf(CE_{1} CE_{2}) | (CE_{1})^{C} ⊆ (CE_{2})^{C} |
EquivalentClasses(CE_{1} ... CE_{n}) | (CE_{j})^{C} = (CE_{k})^{C} for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n |
DisjointClasses(CE_{1} ... CE_{n}) | (CE_{j})^{C} ∩ (CE_{k})^{C} = ∅ for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n such that j ≠ k |
DisjointUnion(C CE_{1} ... CE_{n}) | (C)^{C} = (CE_{1})^{C} ∪ ... ∪ (CE_{n})^{C} and (CE_{j})^{C} ∩ (CE_{k})^{C} = ∅ for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n such that j ≠ k |
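Once the class expressions have been evaluated to finite extensions, the conditions in Table 4 become plain set comparisons. A minimal sketch with hypothetical function names; checking each unordered pair once suffices for DisjointClasses because the intersection condition is symmetric in j and k.

```python
from itertools import combinations
from typing import List, Set


def sub_class_of(ce1: Set[str], ce2: Set[str]) -> bool:
    """SubClassOf(CE1 CE2): (CE1)^C is a subset of (CE2)^C."""
    return ce1 <= ce2


def disjoint_classes(exts: List[Set[str]]) -> bool:
    """DisjointClasses: every pair of distinct operands has an empty intersection."""
    return all(not (a & b) for a, b in combinations(exts, 2))


def disjoint_union(c: Set[str], exts: List[Set[str]]) -> bool:
    """DisjointUnion(C CE1 ... CEn): C is exactly the union, and the parts are disjoint."""
    return c == set().union(*exts) and disjoint_classes(exts)
```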
Satisfaction of OWL 2 object property expression axioms in Γ with respect to an ontology O is defined as shown in Table 5.
Axiom | Condition |
---|---|
SubObjectPropertyOf(OPE_{1} OPE_{2}) | (OPE_{1})^{OP} ⊆ (OPE_{2})^{OP} |
SubObjectPropertyOf(ObjectPropertyChain(OPE_{1} ... OPE_{n}) OPE) | ∀ y_{0}, ..., y_{n}: (y_{0}, y_{1}) ∈ (OPE_{1})^{OP} and ... and (y_{n-1}, y_{n}) ∈ (OPE_{n})^{OP} imply (y_{0}, y_{n}) ∈ (OPE)^{OP} |
EquivalentObjectProperties(OPE_{1} ... OPE_{n}) | (OPE_{j})^{OP} = (OPE_{k})^{OP} for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n |
DisjointObjectProperties(OPE_{1} ... OPE_{n}) | (OPE_{j})^{OP} ∩ (OPE_{k})^{OP} = ∅ for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n such that j ≠ k |
ObjectPropertyDomain(OPE CE) | ∀ x, y: (x, y) ∈ (OPE)^{OP} implies x ∈ (CE)^{C} |
ObjectPropertyRange(OPE CE) | ∀ x, y: (x, y) ∈ (OPE)^{OP} implies y ∈ (CE)^{C} |
InverseObjectProperties(OPE_{1} OPE_{2}) | (OPE_{1})^{OP} = { (x, y) | (y, x) ∈ (OPE_{2})^{OP} } |
FunctionalObjectProperty(OPE) | ∀ x, y_{1}, y_{2}: (x, y_{1}) ∈ (OPE)^{OP} and (x, y_{2}) ∈ (OPE)^{OP} imply y_{1} = y_{2} |
InverseFunctionalObjectProperty(OPE) | ∀ x_{1}, x_{2}, y: (x_{1}, y) ∈ (OPE)^{OP} and (x_{2}, y) ∈ (OPE)^{OP} imply x_{1} = x_{2} |
ReflexiveObjectProperty(OPE) | ∀ x: x ∈ Δ_{I} implies (x, x) ∈ (OPE)^{OP} |
IrreflexiveObjectProperty(OPE) | ∀ x: x ∈ Δ_{I} implies (x, x) ∉ (OPE)^{OP} |
SymmetricObjectProperty(OPE) | ∀ x, y: (x, y) ∈ (OPE)^{OP} implies (y, x) ∈ (OPE)^{OP} |
AsymmetricObjectProperty(OPE) | ∀ x, y: (x, y) ∈ (OPE)^{OP} implies (y, x) ∉ (OPE)^{OP} |
TransitiveObjectProperty(OPE) | ∀ x, y, z: (x, y) ∈ (OPE)^{OP} and (y, z) ∈ (OPE)^{OP} imply (x, z) ∈ (OPE)^{OP} |
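For the hasProvider example, the FunctionalObjectProperty condition shows the effect of the weak Unique Name Assumption. Under the standard semantics, a product with two named providers leads to the inference that the providers are the same individual; under the IC semantics, the two names are treated as distinct unless they are known to be the same, so the constraint is violated. In this sketch, "known to be the same" is approximated by explicit SameIndividual pairs merged with a union-find; the full semantics would use whatever O entails. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def representatives(names: Iterable[str],
                    same_as: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Union-find over SameIndividual pairs: names share a representative only
    if they are linked, directly or through a chain, by same_as."""
    parent: Dict[str, str] = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in same_as:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)
    return {n: find(n) for n in parent}


def functional_violations(assertions: Iterable[Tuple[str, str]],
                          same_as: Iterable[Tuple[str, str]] = ()) -> List[str]:
    """Subjects with more than one distinct value for a functional object property."""
    assertions = list(assertions)
    names = {n for pair in assertions for n in pair}
    rep = representatives(names, same_as)
    values: Dict[str, Set[str]] = {}
    for s, o in assertions:
        values.setdefault(rep[s], set()).add(rep[o])
    return sorted(x for x, ys in values.items() if len(ys) > 1)
```

Here `functional_violations([("p1", "provA"), ("p1", "provB")])` reports `["p1"]`, whereas adding the pair `("provA", "provB")` to `same_as` removes the violation.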
Satisfaction of OWL 2 data property expression axioms in Γ with respect to an ontology O is defined as shown in Table 6.
Axiom | Condition |
---|---|
SubDataPropertyOf(DPE_{1} DPE_{2}) | (DPE_{1})^{DP} ⊆ (DPE_{2})^{DP} |
EquivalentDataProperties(DPE_{1} ... DPE_{n}) | (DPE_{j})^{DP} = (DPE_{k})^{DP} for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n |
DisjointDataProperties(DPE_{1} ... DPE_{n}) | (DPE_{j})^{DP} ∩ (DPE_{k})^{DP} = ∅ for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n such that j ≠ k |
DataPropertyDomain(DPE CE) | ∀ x, y: (x, y) ∈ (DPE)^{DP} implies x ∈ (CE)^{C} |
DataPropertyRange(DPE DR) | ∀ x, y: (x, y) ∈ (DPE)^{DP} implies y ∈ (DR)^{DT} |
FunctionalDataProperty(DPE) | ∀ x, y_{1}, y_{2}: (x, y_{1}) ∈ (DPE)^{DP} and (x, y_{2}) ∈ (DPE)^{DP} imply y_{1} = y_{2} |
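Read as ICs, the DataPropertyDomain and DataPropertyRange rows turn into simple filters over the known property pairs. For hasID with domain Product, the standard semantics would infer that any subject with an ID is a Product, whereas the IC reading flags that subject unless it is already known to be one. In this sketch (hypothetical names), (DPE)^{DP} is a set of (subject, value) pairs, (CE)^{C} is a set, and the data range is modeled as a Python predicate, which is an assumption made for illustration.

```python
from typing import Callable, Hashable, Set, Tuple

DataPair = Tuple[str, Hashable]


def domain_violations(dp_pairs: Set[DataPair], ce_ext: Set[str]) -> Set[str]:
    """DataPropertyDomain(DPE CE): subjects x with (x, y) in (DPE)^DP but x not in (CE)^C."""
    return {x for x, _ in dp_pairs if x not in ce_ext}


def range_violations(dp_pairs: Set[DataPair],
                     in_range: Callable[[Hashable], bool]) -> Set[DataPair]:
    """DataPropertyRange(DPE DR): pairs whose value y is not in (DR)^DT."""
    return {(x, y) for x, y in dp_pairs if not in_range(y)}
```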
Satisfaction of datatype definitions in Γ with respect to an ontology O is defined as shown in Table 7.
Axiom | Condition |
---|---|
DatatypeDefinition(DT DR) | (DT)^{DT} = (DR)^{DT} |
Satisfaction of keys in Γ with respect to an ontology O is defined as shown in Table 8.
Axiom | Condition |
---|---|
HasKey(CE (OPE_{1} ... OPE_{m}) (DPE_{1} ... DPE_{n})) | ∀ x, y, z_{1}, ..., z_{m}, w_{1}, ..., w_{n}: if x ∈ (CE)^{C} and ISNAMED_{O}(x) and y ∈ (CE)^{C} and ISNAMED_{O}(y) and (x, z_{i}) ∈ (OPE_{i})^{OP} and (y, z_{i}) ∈ (OPE_{i})^{OP} and ISNAMED_{O}(z_{i}) for each 1 ≤ i ≤ m and (x, w_{j}) ∈ (DPE_{j})^{DP} and (y, w_{j}) ∈ (DPE_{j})^{DP} for each 1 ≤ j ≤ n then x = y |
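The HasKey condition can be checked by comparing key values pairwise among the named members of CE. This simplified sketch (hypothetical names) covers only data-property keys, so the ISNAMED_{O}(z_{i}) conditions on object-property keys are omitted, and it assumes that distinct names denote distinct individuals; any pair it reports would have to be equal for the key to be satisfied.

```python
from itertools import combinations
from typing import List, Set, Tuple


def has_key_violations(ce_ext: Set[str], named: Set[str],
                       key_dps: List[Set[Tuple[str, str]]]) -> List[Tuple[str, str]]:
    """Pairs (x, y) of distinct named members of CE that share a value w_j
    on every key data property DPE_j."""
    members = sorted(ce_ext & named)  # only named individuals are constrained

    def values(x: str, dp: Set[Tuple[str, str]]) -> Set[str]:
        return {w for s, w in dp if s == x}

    return [(x, y) for x, y in combinations(members, 2)
            if all(values(x, dp) & values(y, dp) for dp in key_dps)]
```

With hasID as the only key property and facts {("p1", "42"), ("p2", "42"), ("p3", "7")}, the pair ("p1", "p2") is reported as a violation.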
Satisfaction of OWL 2 assertions in Γ with respect to an ontology O is defined as shown in Table 9.
Axiom | Condition |
---|---|
SameIndividual(a_{1} ... a_{n}) | (a_{j})^{I} = (a_{k})^{I} for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n |
DifferentIndividuals(a_{1} ... a_{n}) | (a_{j})^{I} ≠ (a_{k})^{I} for each 1 ≤ j ≤ n and each 1 ≤ k ≤ n such that j ≠ k |
ClassAssertion(CE a) | (a)^{I} ∈ (CE)^{C} |
ObjectPropertyAssertion(OPE a_{1} a_{2}) | ((a_{1})^{I}, (a_{2})^{I}) ∈ (OPE)^{OP} |
NegativeObjectPropertyAssertion(OPE a_{1} a_{2}) | ((a_{1})^{I}, (a_{2})^{I}) ∉ (OPE)^{OP} |
DataPropertyAssertion(DPE a lt) | ((a)^{I}, (lt)^{LT}) ∈ (DPE)^{DP} |
NegativeDataPropertyAssertion(DPE a lt) | ((a)^{I}, (lt)^{LT}) ∉ (DPE)^{DP} |
Copyright © 2010–2012 Clark & Parsia, LLC. Some rights reserved.