Property-based Data Protection
In addition to named graph security, Stardog has the ability to restrict access to sensitive information. This page discusses this feature in detail.
This feature is in Beta.
Overview
In addition to named graph security, Stardog 7.3.2 introduces another way of restricting access to sensitive information: indicating that only particular users can read the values of particular properties (in what follows, we call those properties “sensitive”). The canonical example of such a property is :ssn, which links a person to their Social Security Number (or other private information), so that only specific users, such as the HR department, can access it. All that is required is to add :ssn to the list of IRIs in the security.properties.sensitive database option and to grant the READ permission on the sensitive-properties resource to the right users. After that, the query engine ensures that SSN values are masked for all other users when they run queries or make API calls that try to access them.
Semantics and Implementation
So what does “protecting access to property values” mean specifically? There is a single list of sensitive properties, P, and all users are split into two categories: those who have the READ permission on the sensitive-properties resource (for the relevant database) and those who do not. For the former, all queries run as usual. For the latter, queries return results as if the database had been pre-processed with the following SPARQL Update query:
DELETE { ?subject ?property ?object }
INSERT { ?subject ?property ?masked }
WHERE {
  ?subject ?property ?object .
  FILTER (?property IN (P)) # that is, if the predicate of the triple is sensitive
  BIND(mask(?object) AS ?masked)
}
i.e. the graph data looks as if every triple with a sensitive property had been replaced by another triple whose object node is masked (obfuscated) by a masking function (see Configuration below).
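To make the definition concrete, the following Python sketch emulates the effect of the update on an in-memory list of triples. It is only a toy model: Stardog does not rewrite stored data, and the names mask_graph and sha256_mask as well as the sample triples are hypothetical. Hashing the string form of the object is an assumption standing in for whatever masking function is configured.

```python
import hashlib

def sha256_mask(value: str) -> str:
    """Stand-in masking function (assumption): hex SHA-256 of the value's string form."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def mask_graph(triples, sensitive, mask=sha256_mask):
    """Return the graph as seen by a user without READ on sensitive-properties.

    A triple whose predicate is in `sensitive` keeps its subject and
    predicate, but its object is replaced by mask(object).
    """
    return [
        (s, p, mask(o)) if p in sensitive else (s, p, o)
        for (s, p, o) in triples
    ]

# Hypothetical sample data.
graph = [
    (":john", ":ssn", "123-12-1111"),
    (":john", ":name", "John"),
]

masked_view = mask_graph(graph, sensitive={":ssn"})
for triple in masked_view:
    print(triple)
```

Only the :ssn triple changes; the :name triple is returned as stored.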
One important aspect of this definition is that it prevents graph traversals over nodes that have both sensitive incoming edges and regular outgoing edges. Consider the following little graph:
:john :account :Acc1 .
:Acc1 :opened "2020-05-06"^^xsd:date .
and assume that :account is a sensitive property. The update query above would transform it into:
:john :account "...e6ac047..." .
:Acc1 :opened "2020-05-06"^^xsd:date .
i.e. it would break the connection between :john and the attributes of his account by masking the account node. Thus the following query (if executed by a user not authorized to see :account values) will not return the expected results:
SELECT ?name ?openDate {
?name :account/:opened ?openDate
}
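The broken traversal can be reproduced with a small Python sketch that treats the path :account/:opened as a join on the account node. This is a toy emulation under the same assumptions as the definition above, not Stardog's query engine; the helper names masked_view and account_open_dates are hypothetical.

```python
import hashlib

def mask(value):
    # Stand-in masking function (assumption): hex SHA-256 of the string form.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def masked_view(triples, sensitive):
    """Graph as seen by a user who lacks the READ permission."""
    return [(s, p, mask(o)) if p in sensitive else (s, p, o) for s, p, o in triples]

def account_open_dates(triples):
    """Emulate `?name :account/:opened ?openDate` as a join on the account node."""
    return [
        (s1, o2)
        for (s1, p1, o1) in triples if p1 == ":account"
        for (s2, p2, o2) in triples if p2 == ":opened" and s2 == o1
    ]

graph = [
    (":john", ":account", ":Acc1"),
    (":Acc1", ":opened", "2020-05-06"),
]

authorized = account_open_dates(graph)
unauthorized = account_open_dates(masked_view(graph, {":account"}))
print(authorized)    # [(':john', '2020-05-06')]
print(unauthorized)  # []
```

For the unauthorized user, the object of :account is a hash that no longer matches the subject :Acc1, so the join produces no rows.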
That may seem counterintuitive at first but, in fact, it’s the only way to prevent attempts to “guess” values of sensitive nodes by a malicious user via queries like:
SELECT ?name ?ssn ?guessed {
?name :ssn ?ssn
VALUES (?ssn ?guessed) { ( "123-12-1111" "123-12-1111") ( "123-12-1112" "123-12-1112") ... }
}
If the system allowed the join over ?ssn, it would be too late to mask ?ssn values, since they would be revealed by the ?guessed variable whenever the attacker guessed right. To keep things consistent, Stardog also does not allow traversals over sensitive nodes in property path and path queries.
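The difference between the two evaluation orders can be sketched in Python. This is an illustrative model only: the sample SSNs, the guess list, and the function names join_then_mask and mask_then_join are hypothetical, and SHA-256 over the string form is an assumed masking function.

```python
import hashlib

def mask(value):
    # Stand-in masking function (assumption): hex SHA-256 of the string form.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical (subject, ssn) pairs and attacker guesses.
ssn_triples = [(":john", "123-12-1111"), (":mary", "987-65-4321")]
guesses = ["123-12-1111", "123-12-1112"]

def join_then_mask(triples, guesses):
    """Unsafe order: join on the real value, mask ?ssn only in the output."""
    return [(s, mask(ssn), g) for (s, ssn) in triples for g in guesses if ssn == g]

def mask_then_join(triples, guesses):
    """Order implied by the definition: the join only ever sees masked values."""
    masked = [(s, mask(ssn)) for (s, ssn) in triples]
    return [(s, m, g) for (s, m) in masked for g in guesses if m == g]

leaked = join_then_mask(ssn_triples, guesses)
safe = mask_then_join(ssn_triples, guesses)
print(leaked)  # one row for :john; its ?guessed column holds the real SSN
print(safe)    # []
```

In the unsafe order the ?ssn column is hashed, yet the surviving row still tells the attacker which guess was correct.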
Implementation-wise, Stardog does not make physical changes to the data to mask values of the sensitive properties. Instead, it rewrites queries on the fly to apply the configured masking function (but only when the current user lacks the permission). That is, a simple query like
select ?s ?ssn { ?s :ssn ?ssn }
would be processed according to this query plan:
Projection(?s, ?ssn) [#1]
`─ Bind(SHA256(Str(id(?ssn))) AS ?ssn)
   `─ Scan[POS](?s, <urn:ssn>, ?ssn)
The Bind operator masks values of the ?ssn variable by applying the default masking function (SHA256).
This query rewriting mechanism supports queries executed with reasoning. In that case, masking is done after reasoning to ensure that inferred values of sensitive properties are protected too.
Configuration
This feature is disabled by default, i.e. the default value of security.properties.sensitive is empty. If it is non-empty, it should be a comma-separated list of IRIs specifying the protected properties:
$ stardog-admin metadata set -o security.properties.sensitive=urn:ssn -- myDB
Once the properties are set, the next step is to grant the READ permission on the sensitive-properties resource to the users who are supposed to see the data, for example with the following CLI command:
$ stardog-admin user grant -a read -o sensitive-properties:myDB myUser
Now myUser can see values of urn:ssn, while other regular users see only masked strings (SHA256 hashes by default).
Finally, it is possible to use a different masking function for sensitive values. It can be configured using the security.masking.function database property, e.g. security.masking.function=replace(str(?object),".+","XXXX"). The function can be either a constant or any SPARQL function with zero or one argument.
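Because the example expression contains parentheses, double quotes, and a question mark, it needs quoting when passed on a shell command line. The Python sketch below builds such a command safely. It assumes, without confirmation from this page, that security.masking.function can be set with the same metadata set -o form shown earlier for security.properties.sensitive; the database name myDB is reused from the examples above.

```python
import shlex

# Hypothetical invocation: assumes security.masking.function is set with the
# same `metadata set -o` form used for security.properties.sensitive.
masking_function = 'replace(str(?object),".+","XXXX")'
argv = [
    "stardog-admin", "metadata", "set",
    "-o", f"security.masking.function={masking_function}",
    "--", "myDB",
]

# shlex.join quotes each argument so a POSIX shell passes it through unchanged.
command = shlex.join(argv)
print(command)
```

Splitting the printed command with shlex.split yields the original argument list, which is a quick way to check that the expression survives shell parsing intact.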
Current Limitations
As said above, this feature is in beta and should not be considered production ready at this time. The following limitations are expected to be addressed before the final release:
- Values of sensitive nodes can be revealed through zero-length paths (e.g. queries like ?s :p? ?o) and full-text search. Technically these are violations of the definition based on the update query. However, it should not be possible to see connections of these values to other nodes via sensitive properties.
- The query plan cache should be disabled when using this feature (by setting query.plan.reuse=never for the database). Otherwise it is possible that a query plan cached for a user with the permission to see sensitive values is reused for a user without the permission, thus bypassing the masking.
- Only a single list of sensitive properties is supported. It is not possible to restrict access to different properties for different users.
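The plan cache limitation can be illustrated with a toy model in Python. It is not Stardog's cache: the PlanCache class, the build_plan helper, and the assumption that plans are keyed by query text alone are all hypothetical, chosen only to show how a plan built for an authorized user could bypass masking when reused.

```python
import hashlib

def mask(value):
    # Stand-in masking function (assumption): hex SHA-256 of the string form.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

DATA = {":john": "123-12-1111"}  # hypothetical subject -> SSN data

def build_plan(can_read_sensitive):
    """Build a 'plan': a function returning (subject, ssn) rows, masked if required."""
    if can_read_sensitive:
        return lambda: list(DATA.items())
    return lambda: [(s, mask(v)) for s, v in DATA.items()]

class PlanCache:
    """Toy cache keyed by query text only, so plans are shared across users."""
    def __init__(self):
        self._plans = {}

    def run(self, query, can_read_sensitive, reuse=True):
        if reuse and query in self._plans:
            plan = self._plans[query]
        else:
            plan = build_plan(can_read_sensitive)
            if reuse:
                self._plans[query] = plan
        return plan()

query = "select ?s ?ssn { ?s :ssn ?ssn }"

cache = PlanCache()
cache.run(query, can_read_sensitive=True)            # authorized user warms the cache
leaked = cache.run(query, can_read_sensitive=False)  # unmasked plan is reused

no_reuse = PlanCache()
no_reuse.run(query, can_read_sensitive=True, reuse=False)
masked = no_reuse.run(query, can_read_sensitive=False, reuse=False)

print(leaked)  # [(':john', '123-12-1111')]
print(masked[0][1] != "123-12-1111")  # True
```

With reuse turned off, every request builds a plan for the current user's permissions, which is the behavior that setting query.plan.reuse=never is meant to guarantee.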