
Property-based Data Protection

In addition to named graph security, Stardog can restrict access to sensitive information at the level of individual properties. This page describes this feature in detail.

This feature is in Beta.

Page Contents
  1. Overview
  2. Semantics and Implementation
  3. Configuration
  4. Current Limitations

Overview

In addition to named graph security, Stardog 7.3.2 introduces another way of restricting access to sensitive information: specifying that only particular users can read the values of particular properties (in what follows, we call those properties “sensitive”). The canonical example is a property like :ssn that links a person to their Social Security Number (or other private information), which only specific users, such as the HR department, should be able to see. All that is required is to add :ssn to the list of IRIs in the security.properties.sensitive database option and to grant the READ permission on the sensitive-properties resource to the right users. From then on, the query engine masks SSN values for all other users whenever their queries or API calls try to access them.

Semantics and Implementation

So what does “protecting access to property values” mean specifically? There is a single list of sensitive properties, P, and all users fall into two categories: those who have the READ permission on the sensitive-properties resource (for the relevant database) and those who do not. For the former, all queries run as usual. For the latter, queries return results as if the database had been pre-processed with the following SPARQL Update query:

DELETE { ?subject ?property ?object }
INSERT { ?subject ?property ?masked }
WHERE {
  ?subject ?property ?object .
  FILTER (?property IN ( P ))  # that is, if the predicate of the triple is sensitive
  BIND(mask(?object) AS ?masked)  # mask() stands for the configured masking function
}

That is, the data looks as if every triple with a sensitive property had been replaced by a triple whose object is masked (obfuscated) by a masking function (see Configuration below).
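To make the semantics concrete, here is a minimal in-memory sketch of the update query above, assuming a graph stored as a Python list of (subject, predicate, object) string tuples. The helper names (masked_view, sha256_mask) are hypothetical, and hashing the term text is an assumption of the sketch: the query plan later on this page shows Stardog hashing a string derived from the node's internal id, so real masked values will differ.

```python
import hashlib

# Hypothetical in-memory graph: each triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def sha256_mask(term: str) -> str:
    """Mask a term by hashing its text with SHA256 (hex digest). Illustrative only."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def masked_view(triples: list[Triple], sensitive: set[str], mask=sha256_mask) -> list[Triple]:
    """Apply the DELETE/INSERT semantics: every triple whose predicate is sensitive
    is replaced by the same triple with a masked object; other triples are kept."""
    return [(s, p, mask(o)) if p in sensitive else (s, p, o) for s, p, o in triples]


graph = [
    (":john", ":ssn", '"123-12-1111"'),
    (":john", ":name", '"John"'),
]
for triple in masked_view(graph, {":ssn"}):
    print(triple)
```

Only the object position changes, so subjects and predicates of sensitive triples stay visible; that matches the INSERT template, which reuses ?subject and ?property.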

One important consequence of this definition is that it prevents graph traversals through nodes that have both sensitive incoming edges and regular outgoing edges. Consider the following small graph:

:john :account :Acc1 .
:Acc1 :opened "2020-05-06"^^xsd:date .

and assume that :account is a sensitive property. The update query above would transform it into:

:john :account "...e6ac047..." .
:Acc1 :opened "2020-05-06"^^xsd:date .

That is, masking the account node breaks the connection between :john and the attributes of his account. As a result, the following query, run by a user who is not authorized to see :account values, will not return :john's account opening date:

SELECT ?name ?openDate {
  ?name :account/:opened ?openDate
}
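The effect on the :account/:opened path can be reproduced with a small sketch, assuming the same tuple-based graph representation and SHA256-of-text masking as a stand-in for the configured masking function (both are assumptions, not Stardog internals). The hypothetical sequence_path helper evaluates ?x p1/p2 ?y as a join on the middle node.

```python
import hashlib


def mask(term: str) -> str:
    """Stand-in masking function (SHA256 of the term text)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def masked_view(triples, sensitive):
    """Replace the object of every sensitive triple with its masked value."""
    return [(s, p, mask(o)) if p in sensitive else (s, p, o) for s, p, o in triples]


def sequence_path(triples, p1, p2):
    """Evaluate ?x p1/p2 ?y by joining p1 edges with p2 edges on the middle node."""
    out = []
    for s, p, mid in triples:
        if p != p1:
            continue
        for s2, q, o in triples:
            if q == p2 and s2 == mid:
                out.append((s, o))
    return out


graph = [
    (":john", ":account", ":Acc1"),
    (":Acc1", ":opened", '"2020-05-06"^^xsd:date'),
]
print(sequence_path(graph, ":account", ":opened"))                            # authorized view
print(sequence_path(masked_view(graph, {":account"}), ":account", ":opened"))  # masked view
```

In the masked view the middle node becomes a hash string that is the subject of no :opened triple, so the join finds nothing.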

This may seem counterintuitive at first, but it is the only way to stop a malicious user from “guessing” values of sensitive nodes with queries like:

SELECT ?name ?ssn ?guessed {
  ?name :ssn ?ssn
  VALUES (?ssn ?guessed) { ( "123-12-1111" "123-12-1111") ( "123-12-1112" "123-12-1112") ... }
}

If the system allowed the join over ?ssn, masking ?ssn afterwards would be too late: whenever the attacker guessed right, the value would be revealed by the ?guessed variable. For consistency, Stardog also does not allow traversals over sensitive nodes in property paths and path queries.
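The difference between masking after the join and masking before it can be shown with a toy evaluation of the guessing query. This is a sketch under stated assumptions: :ssn edges and the attacker's VALUES rows are plain Python lists, and SHA256 of the term text stands in for the masking function.

```python
import hashlib


def mask(term: str) -> str:
    """Stand-in masking function (SHA256 of the term text)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


# Stored :ssn edges as (?name, ?ssn) pairs.
SSN_EDGES = [(":john", '"123-12-1111"')]
# The attacker's VALUES (?ssn ?guessed) rows.
GUESSES = [('"123-12-1111"', '"123-12-1111"'), ('"123-12-1112"', '"123-12-1112"')]


def join_then_mask():
    """Unsafe order: join on the raw ?ssn value, mask ?ssn only in the output."""
    return [
        (name, mask(ssn), guessed)
        for name, ssn in SSN_EDGES
        for v_ssn, guessed in GUESSES
        if ssn == v_ssn
    ]


def mask_then_join():
    """Order implied by the masked-data semantics: ?ssn is masked before the join."""
    return [
        (name, mask(ssn), guessed)
        for name, ssn in SSN_EDGES
        for v_ssn, guessed in GUESSES
        if mask(ssn) == v_ssn
    ]


print(join_then_mask())  # the correct guess survives in ?guessed
print(mask_then_join())  # no guess can match a masked value
```

Even though ?ssn itself is hashed in the first result, the surviving ?guessed column gives the raw value away.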

Implementation-wise, Stardog does not physically change the data to mask values of sensitive properties. Instead, it rewrites queries on the fly to apply the configured masking function, but only when the current user lacks the permission. For example, a simple query like

select ?s ?ssn { ?s :ssn ?ssn }

would be processed according to this query plan:

Projection(?s, ?ssn) [#1]
`─ Bind(SHA256(Str(id(?ssn))) AS ?ssn)
   `─ Scan[POS](?s, <urn:ssn>, ?ssn)

The Bind operator masks values of the ?ssn variable by applying the default masking function (SHA256).
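A toy version of this rewriting can be sketched as follows, assuming basic graph patterns with constant predicates only; the plan tuples and the leaf, plan and render helpers are hypothetical. The masking Bind is placed directly above each sensitive scan, so any join above it only ever sees masked values, which is consistent with the guessing argument earlier. Variable predicates, property paths, index selection (such as [POS]) and cardinality estimates are out of scope.

```python
def leaf(pattern, sensitive, authorized):
    """Scan one triple pattern; wrap it in a masking Bind if the user is not
    authorized and the pattern's (constant) predicate is sensitive."""
    s, p, o = pattern
    node = ("Scan", f"{s}, {p}, {o}", [])
    if not authorized and p in sensitive and o.startswith("?"):
        node = ("Bind", f"SHA256(Str(id({o}))) AS {o}", [node])
    return node


def plan(projection, patterns, sensitive, authorized):
    """Build a Projection over a single scan or an n-ary Join of scans."""
    leaves = [leaf(pt, sensitive, authorized) for pt in patterns]
    body = leaves[0] if len(leaves) == 1 else ("Join", "", leaves)
    return ("Projection", ", ".join(projection), [body])


def render(node, depth=0):
    """Print a plan tree in the indented style used on this page."""
    name, args, children = node
    prefix = "" if depth == 0 else "   " * (depth - 1) + "`─ "
    lines = [f"{prefix}{name}({args})"]
    for child in children:
        lines += render(child, depth + 1)
    return lines


query = (["?s", "?ssn"], [("?s", "<urn:ssn>", "?ssn")])
print("\n".join(render(plan(*query, {"<urn:ssn>"}, authorized=False))))
```

For an authorized user the same call with authorized=True yields only the Projection over the Scan, i.e. the query runs as usual.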

This query rewriting mechanism also supports queries executed with reasoning. In that case, masking is applied after reasoning, so that inferred values of sensitive properties are protected too.
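Why the order matters can be sketched with a tiny sub-property example. Everything here is hypothetical: a schema in which :socialSecurityNumber is a sub-property of :ssn, only :ssn listed as sensitive, reasoning modelled as a union of scans over a property and its sub-properties, and SHA256 of the term text as the masking function.

```python
import hashlib


def mask(term: str) -> str:
    """Stand-in masking function (SHA256 of the term text)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


# Hypothetical schema: :socialSecurityNumber rdfs:subPropertyOf :ssn
SUB_PROPERTIES = {":ssn": [":ssn", ":socialSecurityNumber"]}
SENSITIVE = {":ssn"}

DATA = [
    (":john", ":ssn", '"123-12-1111"'),
    (":jane", ":socialSecurityNumber", '"987-65-4321"'),
]


def scan(pred):
    return [(s, o) for s, p, o in DATA if p == pred]


def answer_masking_after_reasoning(pred):
    """Answer ?s pred ?v: union the scans of pred and its sub-properties
    (the reasoning step), then mask ?v once if pred is sensitive."""
    rows = [row for sub in SUB_PROPERTIES.get(pred, [pred]) for row in scan(sub)]
    if pred in SENSITIVE:
        rows = [(s, mask(v)) for s, v in rows]
    return rows


def answer_masking_before_reasoning(pred):
    """Incorrect variant: mask each scan by its own predicate, then union."""
    rows = []
    for sub in SUB_PROPERTIES.get(pred, [pred]):
        rows += [(s, mask(v)) if sub in SENSITIVE else (s, v) for s, v in scan(sub)]
    return rows


print(answer_masking_after_reasoning(":ssn"))
print(answer_masking_before_reasoning(":ssn"))
```

With masking applied after reasoning, the inferred :ssn value for :jane is masked like the asserted one; the other order lets it through as an :ssn answer.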

Configuration

This feature is disabled by default; that is, the default value of security.properties.sensitive is empty. When set, its value should be a comma-separated list of the IRIs of the protected properties:

$ stardog-admin metadata set -o security.properties.sensitive=urn:ssn -- myDB
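When the list of properties is kept in a script, the command above can be assembled programmatically. This is a minimal sketch: the helper names are hypothetical, it only builds the argument list shown above (which could then be passed to subprocess.run), and the whitespace trimming in parse_sensitive is an assumption of the sketch, not documented Stardog behavior.

```python
import shlex


def metadata_set_command(db: str, iris: list[str]) -> list[str]:
    """Build the argv of the stardog-admin metadata set call for a list of IRIs."""
    return ["stardog-admin", "metadata", "set", "-o",
            "security.properties.sensitive=" + ",".join(iris), "--", db]


def parse_sensitive(value: str) -> set[str]:
    """Parse a comma-separated option value into a set of IRIs.
    An empty value yields an empty set, i.e. the feature is disabled.
    Trimming surrounding whitespace is an assumption of this sketch."""
    return {iri.strip() for iri in value.split(",") if iri.strip()}


print(shlex.join(metadata_set_command("myDB", ["urn:ssn"])))
```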

Once the properties are set, the next step is to grant the READ permission on the sensitive-properties resource to the users who should see the data, for example with the following CLI command:

$ stardog-admin user grant -a read -o sensitive-properties:myDB myUser

Now myUser can see values of urn:ssn, while other regular users see only masked strings (SHA256 hashes by default).
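The resulting access decision can be sketched as follows. The GRANTS dictionary is a hypothetical stand-in for Stardog's permission store, keyed by user and holding (action, resource) pairs, and SHA256 of the value text stands in for the default masking; only the resource naming (sensitive-properties:myDB) comes from the command above.

```python
import hashlib

# Hypothetical permission store: user -> set of (action, resource) grants.
GRANTS = {"myUser": {("read", "sensitive-properties:myDB")}}


def can_see_sensitive(user: str, db: str) -> bool:
    """True if the user holds READ on the sensitive-properties resource of db."""
    return ("read", f"sensitive-properties:{db}") in GRANTS.get(user, set())


def render_value(user: str, db: str, predicate: str, value: str, sensitive: set[str]) -> str:
    """Return the raw value, or a masked one for sensitive predicates and unauthorized users."""
    if predicate in sensitive and not can_see_sensitive(user, db):
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return value


print(render_value("myUser", "myDB", "urn:ssn", "123-12-1111", {"urn:ssn"}))
print(render_value("otherUser", "myDB", "urn:ssn", "123-12-1111", {"urn:ssn"}))
```

Because the grant names a specific database, the same user still sees masked values in a database where no grant exists.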

Finally, a different function can be used to mask sensitive values. It is configured with the security.masking.function database option, e.g. security.masking.function=replace(str(?object),".+","XXXX"). The function can be either a constant or any SPARQL function with zero or one argument.
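For intuition, the three kinds of masking functions can be approximated in Python. These are illustrative equivalents under stated assumptions, not Stardog code: replace_mask mirrors replace(str(?object),".+","XXXX") with Python's re.sub, the constant "***" is a hypothetical value, and sha256_mask hashes the value text, whereas the query plan shown earlier hashes a string derived from the node's internal id.

```python
import hashlib
import re


def sha256_mask(value: str) -> str:
    """Approximates the default: a SHA256 hex digest (here of the value text)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def replace_mask(value: str) -> str:
    """Mirrors replace(str(?object), ".+", "XXXX"): any non-empty line becomes XXXX."""
    return re.sub(r".+", "XXXX", value)


def constant_mask(_value: str) -> str:
    """A constant masking function: every value maps to the same hypothetical string."""
    return "***"


for fn in (sha256_mask, replace_mask, constant_mask):
    print(fn.__name__, fn("123-12-1111"))
```

A hash keeps distinct values distinct (so masked values can still be counted or grouped), while the regex and constant variants collapse all values into one, which hides more but loses that property.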

Current Limitations

As noted above, this feature is in beta and should not be considered production-ready at this time. The following limitations are expected to be addressed before the final release:

  • Values of sensitive nodes can be revealed through zero-length paths (e.g. queries like ?s :p? ?o) and full-text search. Technically, these cases violate the definition based on the update query above. However, it should still not be possible to see how these values are connected to other nodes via sensitive properties.
  • Only a single list of sensitive properties is supported. It is not possible to restrict access to different properties for different users.
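The zero-length path case follows from SPARQL 1.1 semantics: with both ends unbound, ?s :p? ?o pairs every node of the graph with itself. The sketch below assumes, for illustration only, that node enumeration reads stored terms rather than the masked view; it is not a description of Stardog internals. It also shows why no connection is exposed, since each such node is paired only with itself.

```python
import hashlib


def mask(term: str) -> str:
    """Stand-in masking function (SHA256 of the term text)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def zero_or_one_path(triples, pred):
    """?s pred? ?o with both ends unbound (SPARQL 1.1): every node of the graph
    is paired with itself (the zero-length case), plus every pred edge."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return {(n, n) for n in nodes} | {(s, o) for s, p, o in triples if p == pred}


stored = [(":john", ":ssn", '"123-12-1111"'), (":john", ":name", '"John"')]
masked = [(s, p, mask(o)) if p == ":ssn" else (s, p, o) for s, p, o in stored]

print(('"123-12-1111"', '"123-12-1111"') in zero_or_one_path(stored, ":p"))  # enumerating stored nodes
print(('"123-12-1111"', '"123-12-1111"') in zero_or_one_path(masked, ":p"))  # enumerating masked nodes
```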