Edge Properties
This page discusses Stardog’s support for edge properties - bridging the gap between the RDF data model and the Property Graph data model.
This feature is in beta.
Overview
Stardog 7.1+ extends the RDF data model and the SPARQL query engine to store and query properties of RDF statements (edges in the RDF graph, hence the name “edge properties”). Edge properties let you attach information to RDF statements by using those statements as subjects of other RDF statements. Like named graphs, they add metadata to existing data, but at the statement level rather than the graph level.
Edge properties bridge the gap between the RDF data model and the Property Graph data model.
Edge properties require the use of the abort on conflict write strategy, which is problematic in the cluster if your workload contains write conflicts. See known cluster issues for more information.
Configuration
To enable edge properties, create the database with the edge.properties database configuration option. edge.properties is an immutable property and thus can only be set at database creation time.
Example
Common examples of statement metadata include provenance, uncertainty, and time. The two statements below can be annotated to specify where they come from, and in what time period they hold.
:Pete a :Engineer ;
      :worksAt :Stardog
as follows:
:Pete a { :since 2010 } :Engineer ;
      :worksAt { :source :HR } :Stardog
See Syntax(es) below for details on the Stardog syntax for edge properties.
Motivation
While it is technically possible to maintain statement-level metadata in plain RDF, the approaches are unintuitive and tend to complicate queries for accessing the data. For example, one way is to treat relations such as :worksAt as n-ary predicates, model them as nodes, and link them to metadata using ordinary RDF statements:
:PeteEmployment :employee :Pete ;
                :employer :Stardog ;
                :source :HR
A similar, domain-independent approach is to use the RDF reification vocabulary and turn each edge into a set of rdf:Statement/rdf:subject/rdf:predicate/rdf:object triples. Both approaches increase the number of triples in the database and the number of triple patterns in SPARQL queries, which typically leads to performance penalties for both data updates and queries.
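To make the overhead concrete, here is a sketch of a query over reified data. It assumes the :Pete :worksAt :Stardog edge and its :source :HR annotation were stored with the standard RDF reification vocabulary, which takes five triples (rdf:type, rdf:subject, rdf:predicate, rdf:object, and the metadata triple) for a single annotated edge. The http://example.org/ namespace is a placeholder chosen for illustration, not something from Stardog.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX :    <http://example.org/>   # hypothetical namespace

# Find the source of each :worksAt :Stardog edge in reified data.
# Four patterns are needed just to identify the edge,
# plus one more to read the metadata attached to it.
SELECT ?emp ?who {
  ?stmt a rdf:Statement ;
        rdf:subject ?emp ;
        rdf:predicate :worksAt ;
        rdf:object :Stardog ;
        :source ?who .
}
```

Every additional annotated edge in a query repeats the same four identifying patterns, which is where the query complexity comes from.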
Instead, the edge property support in Stardog 7.1+ is based on the recent work on RDF*/SPARQL* extensions and, for performance reasons, includes changes to both the storage layer and the query engine.
Syntax(es)
Stardog supports two syntactic flavors to represent edge properties in RDF and query them in SPARQL. The first notation was originally suggested as a part of the RDF*/SPARQL* proposal. It is based on Turtle and looks as follows:
<< :Pete a :Engineer >> :since 2010 .
<< :Pete :worksAt :Stardog >> :source :HR .
The corresponding triple pattern syntax in SPARQL also uses the << >> notation:
SELECT * {
  << ?emp a :Engineer >> :since ?year .
  << ?emp :worksAt :Stardog >> :source ?who .
}
It is also possible to use the extended BIND operator in SPARQL to bind variables to RDF edges:
SELECT ?emp ?year {
  BIND(<< ?emp a :Engineer >> AS ?edge)
  ?edge :since ?year .
}
Both the Turtle and SPARQL extensions support the standard Turtle shortcuts such as implicit bnodes ([]), predicate lists (;), and object lists (,) outside of << >> patterns, but not inside them. This makes the notation verbose when the same subject has multiple outgoing edges, only some of which have properties. That is particularly inconvenient in SPARQL, where multiple triple patterns with the same subject are common.
To address this issue, Stardog also supports an alternative syntax that puts edge properties next to predicates rather than next to full triples or triple patterns. That allows for all Turtle-style shortcuts:
:Pete a { :since 2010 ; :until 2018 } :Engineer ;
      :worksAt { :source :HR } :Stardog
The same works in SPARQL:
SELECT ?emp ?start ?end ?who {
  ?emp a { :since ?start ; :until ?end } :Engineer ;
       :worksAt { :source ?who } :Stardog .
}
Both notations are equally expressive, so data and queries can be translated from one to the other and back.
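As a sketch of that equivalence, the query above with :since, :until, and :source can be written in the << >> notation as follows. It relies only on the rule stated earlier that predicate lists (;) are allowed outside << >> patterns. The http://example.org/ namespace is a placeholder.

```sparql
PREFIX : <http://example.org/>   # hypothetical namespace

SELECT ?emp ?start ?end ?who {
  # { :since ?start ; :until ?end } on the 'a' edge becomes
  # a predicate list whose subject is the << >> pattern.
  << ?emp a :Engineer >> :since ?start ;
                         :until ?end .
  # { :source ?who } on the :worksAt edge.
  << ?emp :worksAt :Stardog >> :source ?who .
}
```

Going the other way, each << s p o >> subject moves its property list into braces after the predicate p.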
Scope of the Support
Stardog supports edge properties across all APIs (Java and HTTP) and all types of SPARQL queries. Specifically, edge property patterns can be used not only in the WHERE clauses of CONSTRUCT and SPARQL Update queries, but also in their graph templates.
CONSTRUCT query example:
CONSTRUCT { << ?emp :worksAt ?org >> :source :HR } WHERE {
  ?emp a :Engineer ;
       :worksAt ?org
}
SPARQL Update query examples:
INSERT DATA { << :Pete :worksAt :Stardog >> :since 2010 }
INSERT { << ?emp :worksAt ?org >> :source :HR } WHERE {
  ?emp a :Engineer ;
       :worksAt ?org
}
Stardog supports RDF* extensions for Turtle, TriG, the Binary RDF format, and JSON-LD. SPARQL SELECT query results with edge properties can be returned in the XML, binary, and JSON formats (content types application/sparql-results+xml, application/x-binary-rdf-results-table, and application/sparql-results+json, respectively).
Details
We have made several decisions regarding edge properties support and how it relates to the RDF*/SPARQL* proposal. They are motivated by performance and ease of use considerations.
- Only the subject of an RDF triple can itself be a triple (the original RDF* proposal also allows triples in the object position). Nested edge properties are not allowed.
- Named Graphs: Edges with properties can be stored in a named graph just as other triples. However, edge properties must be stored in the same named (or default) graph as the corresponding edges.
- Asserting Edges: Whether edges with properties should be asserted in the graph has been the subject of a lively debate. In Stardog, as in the Property Graph Model, edges are implicitly added to the graph as soon as their properties are added. Note that the Stardog syntax makes this explicit, whereas the RDF* syntax allows both interpretations.
- Cascading Deletes: The consequence of asserting edges is that edge properties are deleted in a cascading fashion when edges themselves are deleted.
- Transactions: Stardog automatically selects the Abort On Conflict strategy for conflict resolution during commits. This is required to guarantee that there are no orphaned edge properties under concurrent transactions.
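The named graph, asserting, and cascading delete rules above can be sketched as a short SPARQL Update sequence. This is an illustration of the behaviour as described in the list, not output captured from a server. The graph name :hrGraph and the http://example.org/ namespace are hypothetical.

```sparql
PREFIX : <http://example.org/>   # hypothetical namespace

# Step 1: attach a property to an edge inside a named graph.
# Per "Asserting Edges", the edge :Pete :worksAt :Stardog is implicitly
# added to :hrGraph as well; per "Named Graphs", the property is stored
# in the same graph as its edge.
INSERT DATA {
  GRAPH :hrGraph { << :Pete :worksAt :Stardog >> :source :HR }
} ;

# Step 2: delete only the edge. Per "Cascading Deletes", its :source
# property is expected to be removed together with it.
DELETE DATA {
  GRAPH :hrGraph { :Pete :worksAt :Stardog }
}
```

To observe the intermediate state, run the two steps as separate requests. After step 1, ASK { GRAPH :hrGraph { :Pete :worksAt :Stardog } } should be true under the rules above, and after step 2, ASK { GRAPH :hrGraph { << :Pete :worksAt :Stardog >> :source :HR } } should be false.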