Property-based Data Protection
In addition to named graph security, Stardog has the ability to restrict access to sensitive information. This page discusses this feature in detail.
Overview
In addition to named graph security, Stardog 7.3.2 introduces another way of restricting access to sensitive information: indicating that only particular users can read the values of particular properties (in what follows, we call those properties “sensitive”). The canonical example of such a property is :ssn, which links a person to their Social Security Number (or other private information), so that only specific users, such as the HR department, can access it. All that is required is to add :ssn to the list of sensitive properties and grant the permission to access sensitive properties to the right users. After that, the query engine ensures that SSN values are masked for all other users when they run queries or make API calls that try to access them.
Semantics and Implementation
So what does “protecting access to property values” mean specifically? There is a single list of sensitive properties, P, and all users are split into two categories: those who have the READ permission on the sensitive-properties resource (for the right database) and those who do not. For the former, all queries run as usual. For the latter, queries return results as if the database had been pre-processed with the following SPARQL Update query:
DELETE { ?subject ?property ?object }
INSERT { ?subject ?property ?masked }
WHERE {
  ?subject ?property ?object .
  FILTER (?property IN (P)) # that is, if the predicate of the triple is sensitive
  BIND(mask(?object) AS ?masked)
}
i.e. the graph data looks as if every triple with a sensitive property had been replaced by another triple whose object node is masked (obfuscated) by applying a masking function (see the Masking Function section below).
One important aspect of this definition is that it prevents graph traversals over nodes that have sensitive incoming edges and regular outgoing edges. Consider the following little graph:
:john :account :Acc1 .
:Acc1 :opened "2020-05-06"^^xsd:date .
and assume that :account is a sensitive property. The update query above would transform it into:
:john :account "...e6ac047..." .
:Acc1 :opened "2020-05-06"^^xsd:date .
i.e. it would break the connection between :john and the attributes of his account by masking the account node. Thus the following query (if executed by a user not authorized to see :account values) will not return the expected results:
SELECT ?name ?openDate {
?name :account/:opened ?openDate
}
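The effect on traversal can be reproduced outside Stardog with a small in-memory model. The sketch below is an illustration only, not Stardog's implementation: the triple list, the `visible_triples` and `account_open_dates` helpers, and the use of a plain SHA-256 hex digest as the mask are all assumptions made for the example.

```python
import hashlib

# Hypothetical stand-in for the list P of sensitive properties.
SENSITIVE = {":account"}


def mask(value: str) -> str:
    # Assumed masking function: SHA-256 hex digest of the node's string form.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def visible_triples(triples, can_read_sensitive):
    """Return the triples as a given user would see them."""
    if can_read_sensitive:
        return list(triples)
    # Mirror the update query: replace the object of every sensitive triple.
    return [(s, p, mask(o) if p in SENSITIVE else o) for s, p, o in triples]


def account_open_dates(triples):
    """Evaluate '?name :account/:opened ?openDate' by joining on the account node."""
    opened = {s: o for s, p, o in triples if p == ":opened"}
    return [(s, opened[o]) for s, p, o in triples if p == ":account" and o in opened]


DATA = [
    (":john", ":account", ":Acc1"),
    (":Acc1", ":opened", "2020-05-06"),
]

authorized = account_open_dates(visible_triples(DATA, can_read_sensitive=True))
unauthorized = account_open_dates(visible_triples(DATA, can_read_sensitive=False))
print(authorized)    # [(':john', '2020-05-06')]
print(unauthorized)  # []
```

For the unauthorized user the masked account node no longer matches the subject of the :opened triple, so the join produces no rows.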
That may seem counterintuitive at first but, in fact, it’s the only way to prevent attempts to “guess” values of sensitive nodes by a malicious user via queries like:
SELECT ?name ?ssn ?guessed {
?name :ssn ?ssn
VALUES (?ssn ?guessed) { ( "123-12-1111" "123-12-1111") ( "123-12-1112" "123-12-1112") ... }
}
If the system allowed the join over ?ssn, it would be too late to mask ?ssn values, since they would be revealed by the ?guessed variable whenever the attacker guessed right. To keep things consistent, Stardog also does not allow traversals over sensitive nodes in property path and path queries.
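To see why the order of masking and joining matters, consider a toy model of the guessing query. This is only a sketch under assumptions: the stored triple, the guess list, and both helper functions are hypothetical, and a SHA-256 digest stands in for the masking function.

```python
import hashlib


def mask(value: str) -> str:
    # Assumed masking function for the illustration.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


STORED = [(":john", ":ssn", "123-12-1111")]
GUESSES = ["123-12-1111", "123-12-1112"]


def join_then_mask(triples, guesses):
    # Unsafe order: join on the real value first, mask the output afterwards.
    # The matching guess survives in the third column and reveals the value.
    return [(s, mask(o), g) for s, _, o in triples for g in guesses if o == g]


def mask_then_join(triples, guesses):
    # Order implied by the update-query semantics: mask first, then join.
    masked = [(s, p, mask(o)) for s, p, o in triples]
    return [(s, o, g) for s, _, o in masked for g in guesses if o == g]


leaked = join_then_mask(STORED, GUESSES)
safe = mask_then_join(STORED, GUESSES)
print(leaked[0][2])  # 123-12-1111
print(safe)          # []
```

In the first variant the ?ssn column is masked but the ?guessed column confirms the real value; in the second, no guess can match a masked value, so nothing is returned.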
Implementation-wise Stardog does not make physical changes to the data to mask values of the sensitive properties. Instead it rewrites queries on-the-fly to apply the configured masking function (but only when the current user lacks the permission). That is, a simple query like
select ?s ?ssn { ?s :ssn ?ssn }
would be processed according to this query plan:
Projection(?s, ?ssn) [#1]
`─ Bind(SHA256(Str(id(?ssn))) AS ?ssn)
   `─ Scan[POS](?s, <urn:ssn>, ?ssn)
The Bind operator masks values of the ?ssn variable by applying the default masking function (SHA256).
This query rewriting mechanism supports queries executed with reasoning. In that case, masking is done after reasoning to ensure that inferred values of sensitive properties are protected too.
Configuration
This feature is disabled by default, i.e. there are no sensitive properties, and no value will be masked. We describe how sensitive properties can be configured for simple and advanced use cases. The configuration in both cases involves two steps:
- Define which properties are sensitive
- Grant permissions to users/roles to access sensitive properties
In both cases described below, the user needs write permission for database metadata to be able to define sensitive properties. Granting permissions for sensitive properties requires the user to have the privilege to grant permissions, as usual.
Simple Configuration
In the simple configuration you can define a single group of properties to be sensitive. Any user that is granted permission to read sensitive properties will have access to every sensitive property.
Defining Sensitive Properties
Defining a sensitive property can be done via the following CLI command:
$ stardog-admin sensitive-property add myDb :ssn
1 sensitive properties have been added to the default group
Multiple properties can be added to the sensitive property group at once:
$ stardog-admin sensitive-property add myDb foaf:mbox foaf:phone
2 sensitive properties have been added to the default group
This example assumes the foaf prefix has been added as a namespace for myDb. It is equivalent to the following command, which uses full IRIs:
$ stardog-admin sensitive-property add myDb http://xmlns.com/foaf/0.1/mbox http://xmlns.com/foaf/0.1/phone
2 sensitive properties have been added to the default group
Properties can be removed from the sensitive property group with the following command:
$ stardog-admin sensitive-property remove myDb :ssn
1 sensitive properties have been removed from the default group
All the sensitive properties can be listed with the following command:
$ stardog-admin sensitive-property list myDb
+-------+-----------------------------+
| Group | Properties                  |
+-------+-----------------------------+
|       | :ssn, foaf:mbox, foaf:phone |
+-------+-----------------------------+
As we explain below, for more advanced use cases, it is possible to define multiple groups of sensitive properties and assign a name to each group. In the simple use case there is only one unnamed default group, which is why the Group column in the above table is empty.
The CLI commands described above are provided for convenience. The default group of sensitive properties is stored as the database configuration option security.properties.sensitive. One can update this option directly as well.
Granting Sensitive Property Permissions
Once the properties are set, the next step is to grant the READ permission on the sensitive-properties resource to the users who are supposed to see the data, for example, using the following CLI command:
$ stardog-admin user grant -a read -o sensitive-properties:myDB myUser
Now myUser can see values of :ssn while other regular users would only see masked strings (SHA256 hashes by default).
Advanced Configuration
In use cases where different users need different permissions to access different sets of sensitive properties it is possible to define named groups of sensitive properties.
Defining Sensitive Properties
Defining named groups of sensitive properties is done similarly to the Simple Configuration case, with an additional --group parameter used to specify the group name. The following command adds two properties to the group named PII:
$ stardog-admin sensitive-property add --group PII myDb :ssn :email
2 sensitive properties have been added to the group PII
The name of a sensitive property group is case-sensitive.
If we define another named group:
$ stardog-admin sensitive-property add --group Finance myDb :hasSalary :bankAccountBalance
2 sensitive properties have been added to the group Finance
The list of sensitive properties will show both groups:
$ stardog-admin sensitive-property list myDb
+---------+---------------------------------+
| Group   | Properties                      |
+---------+---------------------------------+
| Finance | :bankAccountBalance, :hasSalary |
| PII     | :email, :ssn                    |
+---------+---------------------------------+
Removing properties from a group can be done as follows:
$ stardog-admin sensitive-property remove --group Finance myDb :hasSalary
1 sensitive properties have been removed from the group Finance
Granting Sensitive Property Permissions
In order to access the sensitive properties defined within a named group, the user needs an explicit READ permission for that group. The permission can be granted with the following command, where the name of the group is appended to the database name, separated by the \ character:
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\PII" myUser
We put double quotes around sensitive-properties:myDB\PII because most shells treat an unquoted \ as an escape character and would strip it, so the permission would be granted for the wrong resource.
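The quoting behavior can be checked without touching a server. The following sketch uses Python's standard `shlex` module, which splits a command line using POSIX shell rules, to show which resource string the CLI would actually receive. It only models argument parsing; it does not invoke `stardog-admin`, and shells that do not follow POSIX quoting may behave differently.

```python
import shlex

# Raw strings keep the backslash exactly as it would be typed at the prompt.
unquoted_cmd = r'stardog-admin user grant -a read -o sensitive-properties:myDB\PII myUser'
quoted_cmd = r'stardog-admin user grant -a read -o "sensitive-properties:myDB\PII" myUser'

# shlex.split uses POSIX rules by default; index 6 is the value of -o.
unquoted_resource = shlex.split(unquoted_cmd)[6]
quoted_resource = shlex.split(quoted_cmd)[6]

print(unquoted_resource)  # sensitive-properties:myDBPII
print(quoted_resource)    # sensitive-properties:myDB\PII
```

Without quotes the backslash is consumed as an escape and the group separator disappears; inside double quotes a backslash before an ordinary letter is preserved.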
Usage Examples
Here is a quick summary of how named sensitive property groups work in practice:
- Each database can define multiple groups. Sensitive groups defined in different databases are completely independent.
- Each sensitive property group contains one or more properties. There is no notion of an empty group.
- A property appearing in any of the groups will be considered sensitive and require the additional permission for read access.
- A property may appear in multiple groups. A user who has permission for at least one of these groups will have access to that property.
Let’s use the following simple example to see how this works in practice:
$ stardog-admin sensitive-property list myDb
+------------+-------------------------------+
| Group      | Properties                    |
+------------+-------------------------------+
| Contact    | :hasEmail                     |
| Membership | :isMemberOf                   |
| Personal   | :hasBirthdate, :hasEmail      |
| Project    | :currentProject, :pastProject |
+------------+-------------------------------+
There are four different groups of properties defined in this database. Note that the property :hasEmail appears in two different groups. The permissions are granted to users as follows:
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Contact" Alice
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Personal" Bob
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Personal" Charlie
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Membership" Charlie
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Contact" Daisy
$ stardog-admin user grant -a read -o "sensitive-properties:myDB\Project" Daisy
The following example shows the properties accessible by each user:
Alice can access the :hasEmail property because she has permission for the Contact group. It does not matter that she does not have permission for the Personal group, which also includes the :hasEmail property. Bob can access the :hasEmail property via the Personal group, which also allows access to the :hasBirthdate property. Charlie has access to three sensitive properties (:hasBirthdate, :hasEmail, and :isMemberOf) via the two groups he has permissions for. Daisy can access :hasEmail via the Contact group and :currentProject and :pastProject via the Project group.
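The group rules summarized above amount to taking the union of the properties in every group a user has been granted. The following sketch models this with plain dictionaries. It is an illustration of the rules as described on this page, not Stardog's internal logic, and the `readable_properties` and `is_masked` helpers are hypothetical names.

```python
# Groups and grants copied from the example above.
GROUPS = {
    "Contact": {":hasEmail"},
    "Membership": {":isMemberOf"},
    "Personal": {":hasBirthdate", ":hasEmail"},
    "Project": {":currentProject", ":pastProject"},
}
GRANTS = {
    "Alice": {"Contact"},
    "Bob": {"Personal"},
    "Charlie": {"Personal", "Membership"},
    "Daisy": {"Contact", "Project"},
}


def readable_properties(user):
    """Union of the properties in every group the user may read."""
    props = set()
    for group in GRANTS.get(user, set()):
        props |= GROUPS[group]
    return props


def is_masked(user, prop):
    """A property is masked if it is in any group and the user can read none of them."""
    sensitive = set().union(*GROUPS.values())
    return prop in sensitive and prop not in readable_properties(user)


for user in sorted(GRANTS):
    print(user, sorted(readable_properties(user)))
```

A user with no grants at all, such as a hypothetical `Eve`, gets an empty set, so every property listed in any group is masked for her, while properties outside all groups are never masked.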
Masking Function
It is possible to use a different masking function for sensitive values. It can be configured using the security.masking.function database property, e.g. security.masking.function=replace(str(?object),".+","XXXX"). The function can be either a constant or any SPARQL function with zero or one argument. For example, you can use a constant string to be displayed for any sensitive value, e.g. security.masking.function="You do not have permission to see this value". If you are setting the configuration option using the CLI, you will need additional quotes around the function to make sure the quotes in the expression are not processed by the shell:
$ stardog-admin metadata set -o security.masking.function='replace(str(?object),".+","XXXX")' -- myDb
If there is a syntax error in your masking function, the default masking function will be used and an error message will be logged in stardog.log.
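To get a feel for what the two example masking functions produce, here is a rough Python analogue. It is only an approximation for illustration: SPARQL's replace follows XPath regular expression semantics, which differ from Python's `re` module in some details, and the helper names `mask_replace` and `mask_constant` are hypothetical.

```python
import re


def mask_replace(value) -> str:
    # Rough analogue of replace(str(?object), ".+", "XXXX"):
    # any non-empty single-line value collapses to the same placeholder.
    return re.sub(r".+", "XXXX", str(value))


def mask_constant(value) -> str:
    # Analogue of a constant masking function: the input is ignored entirely.
    return "You do not have permission to see this value"


print(mask_replace("123-45-6789"))   # XXXX
print(mask_replace(42))              # XXXX
print(mask_constant("123-45-6789"))  # You do not have permission to see this value
```

Unlike the default hash, both of these map different values to the same output, so masked results can no longer be grouped or compared for equality.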
Current Limitations
This feature currently has a few limitations and should not be considered production ready. These limitations are:
- Values of sensitive nodes can be revealed through zero-length paths (e.g. queries like ?s :p? ?o) and full-text search. Technically these are violations of the definition based on the update query. However, it should not be possible to see connections of these values to other nodes via sensitive properties.
- Values of sensitive nodes can be revealed through edge properties. The sensitive properties feature should not be used when the edge properties feature is enabled.
- Sensitive property permissions only limit read access. Users without read permission on sensitive properties can still write triples with those properties, even though they will not be able to read the values they have written in the clear and will see only the masked values.