Property-based Data Protection

In addition to named graph security, Stardog can restrict access to sensitive information at the level of individual properties. This page discusses this feature in detail.

Page Contents

Overview
Semantics and Implementation
Configuration
Current Limitations

Overview

In addition to named graph security, Stardog offers another way to restrict access to sensitive information: specifying that only particular users can read the values of particular properties (in what follows, we call those properties “sensitive”). The canonical example is restricting access to a property like :ssn (linking a person to their Social Security Number) so that only specific users, such as members of the HR department, can read it. All that is required is to add :ssn to the list of sensitive properties and grant the permission to read sensitive properties to the right users. After that, the query engine ensures that SSN values are masked for all other users whenever they run queries or make API calls that try to access them.

Semantics and Implementation

So what does “restricting access to sensitive information” mean specifically? There is a single list of sensitive properties, P, and all users are split into two categories: those who have the READ permission on the sensitive-properties resource (for the right database) and those who do not. For the former, all queries run as usual. For the latter, queries return results as if the database had been pre-processed with the following SPARQL Update query (pseudocode: P stands for the list of sensitive properties and mask for the masking function):

DELETE { ?subject ?property ?object }
INSERT { ?subject ?property ?masked }
WHERE {
  ?subject ?property ?object .
  FILTER (?property IN ( P ))  # that is, if the predicate of the triple is sensitive
  BIND(mask(?object) AS ?masked)
}

That is, the graph data looks as if every triple with a sensitive property had been replaced by another triple whose object node is masked (obfuscated) by applying a masking function (see the Masking Function section below).
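The effect of this definition can be modeled in a few lines of Python. This is only an illustrative sketch, not Stardog code: the triples, the SENSITIVE set, and the mask helper (a SHA-256 hash of the value's string form) are assumptions made for the example.

```python
import hashlib

# Hypothetical list P of sensitive properties.
SENSITIVE = {":ssn"}

def mask(value: str) -> str:
    """Stand-in masking function: SHA-256 hex digest of the value (an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def masked_view(triples, sensitive=SENSITIVE):
    """Return the graph as a user without the READ permission would see it."""
    return [
        (s, p, mask(o)) if p in sensitive else (s, p, o)
        for (s, p, o) in triples
    ]

graph = [
    (":john", ":ssn", "123-12-1111"),
    (":john", ":name", "John"),
]
view = masked_view(graph)
for triple in view:
    print(triple)
```

Only the object of the :ssn triple changes in the view; the :name triple and the original graph list are left as they were.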

One important aspect of this definition is that it prevents graph traversals over nodes having sensitive incoming edges but regular outgoing edges. Consider the following graph:

:john :account :Acc1 .
:Acc1 :opened "2020-05-06"^^xsd:date .

and assume that :account is a sensitive property. The update query above would transform it into:

:john :account "...e6ac047..." .
:Acc1 :opened "2020-05-06"^^xsd:date .

i.e., it would break the connection between :john and the attributes of his account by masking the account node. Thus, the following query (if executed by a user not authorized to see :account values) will not return the expected results:

SELECT ?name ?openDate {
  ?name :account/:opened ?openDate
}

That may seem counterintuitive, but it’s the only way to prevent attempts to “guess” values of sensitive nodes by a malicious user via queries like:

SELECT ?name ?ssn ?guessed {
  ?name :ssn ?ssn
  VALUES (?ssn ?guessed) { ( "123-12-1111" "123-12-1111") ( "123-12-1112" "123-12-1112") ... }
}

If the system allowed the join over ?ssn, it would be too late to mask ?ssn values; they would be revealed by the ?guessed variable if the attacker has guessed right. To keep things consistent, Stardog also does not allow traversals over sensitive nodes in property path and path queries.
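Both effects can be reproduced on a toy model. The sketch below is not Stardog code: it treats the graph as a Python list of triples, uses a plain SHA-256 hash as a stand-in mask, and evaluates the path and the guessing query as simple joins. All names are hypothetical.

```python
import hashlib

def mask(value):
    """Stand-in masking function (an assumption for this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def masked_view(triples, sensitive):
    """Graph as seen by a user without permission to read sensitive properties."""
    return [(s, p, mask(o)) if p in sensitive else (s, p, o) for s, p, o in triples]

def follow(triples, first, second):
    """Evaluate the two-step path first/second as a join on the middle node."""
    return [
        (s, o2)
        for (s, p1, mid) in triples if p1 == first
        for (s2, p2, o2) in triples if p2 == second and s2 == mid
    ]

graph = [
    (":john", ":account", ":Acc1"),
    (":Acc1", ":opened", "2020-05-06"),
]
full = follow(graph, ":account", ":opened")
restricted = follow(masked_view(graph, {":account"}), ":account", ":opened")

# Guessing attack: join candidate values against the masked :ssn objects.
ssn_graph = [(":john", ":ssn", "123-12-1111")]
guesses = ["123-12-1111", "123-12-1112"]
hits = [
    g
    for (_, p, o) in masked_view(ssn_graph, {":ssn"}) if p == ":ssn"
    for g in guesses if g == o
]
print(full, restricted, hits)
```

The authorized path query finds the opening date, while the restricted one finds nothing because the masked account node no longer joins with :Acc1. For the same reason the guessed values never match, so the join reveals nothing.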

The actual implementation does not make physical changes to the data to mask values of the sensitive properties. Instead, it rewrites queries on the fly to apply the configured masking function (but only when the current user lacks the permission). That is, a simple query like:

select ?s ?ssn { ?s :ssn ?ssn }

would be processed according to this query plan:

Projection(?s, ?ssn) [#1]
`─ Bind(SHA256(Str(id(?ssn))) AS ?ssn)
   `─ Scan[POS](?s, <urn:ssn>, ?ssn)

The Bind operator masks values of the ?ssn variable by applying the default masking function (SHA256).

This query rewriting mechanism supports queries executed with reasoning. In that case, masking is done after reasoning to ensure that inferred values of sensitive properties are protected too.
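The idea of masking at query time rather than in storage can be sketched as follows. This is a simplified model under stated assumptions, not the actual query engine: the store is a Python list, the permission check is a boolean flag, and mask hashes the value itself rather than an internal identifier as in the plan above.

```python
import hashlib

SENSITIVE = {":ssn"}  # hypothetical sensitive property list
STORE = [(":john", ":ssn", "123-12-1111"), (":john", ":name", "John")]

def mask(value):
    """Stand-in for the default masking function (an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def select_objects(store, predicate, can_read_sensitive):
    """Scan for a predicate and mask the results only for users lacking permission."""
    rows = [(s, o) for (s, p, o) in store if p == predicate]   # plays the role of Scan
    if predicate in SENSITIVE and not can_read_sensitive:
        rows = [(s, mask(o)) for (s, o) in rows]               # plays the role of Bind
    return rows                                                # plays the role of Projection

hr_rows = select_objects(STORE, ":ssn", can_read_sensitive=True)
other_rows = select_objects(STORE, ":ssn", can_read_sensitive=False)
print(hr_rows)
print(other_rows)
```

The stored triples are never modified; the same scan yields clear values for an authorized user and masked values for everyone else.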

Configuration

This feature is disabled by default (i.e., there are no sensitive properties), and no values will be masked. We describe how sensitive properties can be configured for simple and advanced use cases. In both cases, the configuration involves two steps:

Define which properties are sensitive.
Grant permissions to users/roles to access sensitive properties.

For both cases we describe below, the user needs to have write permissions for database metadata to be able to define sensitive properties. Granting permissions for sensitive properties requires the user to have the privilege to grant permissions, as usual.

Simple Configuration

In the simple configuration, you can define a single group of properties to be sensitive. Any user that is granted permission to read sensitive properties will have access to every sensitive property.

Defining Sensitive Properties

Defining a sensitive property can be done via the sensitive-property add command:

$ stardog-admin sensitive-property add myDb :ssn
1 sensitive properties have been added to the default group

Multiple properties can be added to the sensitive property group at once:

$ stardog-admin sensitive-property add myDb foaf:mbox foaf:phone
2 sensitive properties have been added to the default group

This example assumes the foaf prefix has been added as a namespace for myDb, and it would be equivalent to the following command that uses full IRIs:

$ stardog-admin sensitive-property add myDb http://xmlns.com/foaf/0.1/mbox http://xmlns.com/foaf/0.1/phone
2 sensitive properties have been added to the default group

Properties can be removed from the sensitive property group with the sensitive-property remove command:

$ stardog-admin sensitive-property remove myDb :ssn
1 sensitive properties have been removed from the default group

All the sensitive properties can be listed with the sensitive-property list command:

$ stardog-admin sensitive-property list myDb
+-------+-----------------------------+
| Group |         Properties          |
+-------+-----------------------------+
|       | :ssn, foaf:mbox, foaf:phone |
+-------+-----------------------------+

As we explain below, for more advanced use cases, it is possible to define multiple groups of sensitive properties and assign a name to each group. In the simple use case, there is only one unnamed default group, which is why the group column in the above table is empty.

The CLI commands described above are provided for convenience. The default group of sensitive properties is stored as the database configuration option security.properties.sensitive. One can update this option directly as well.

Granting Sensitive Property Permissions

Once the properties are set, the next step is to grant the READ permission on the sensitive-properties resource to the users who are supposed to see the data, for example, using the following CLI command:

$ stardog-admin user grant -a read -o sensitive-properties:myDb myUser

Now myUser can see values of :ssn, while other regular users will see only masked strings (SHA256 hashes by default).

Advanced Configuration

In use cases where different users need different permissions to access different sets of sensitive properties, it is possible to define named groups of sensitive properties.

Defining Sensitive Properties

Defining named groups of sensitive properties is done similarly to the Simple Configuration case, with an additional --group parameter used to specify the group name. The following command adds two properties to the group named PII:

$ stardog-admin sensitive-property add --group PII myDb :ssn :email
2 sensitive properties have been added to the group PII

The name of a sensitive property group is case-sensitive.

If we define another named group:

$ stardog-admin sensitive-property add --group Finance myDb :hasSalary :bankAccountBalance
2 sensitive properties have been added to the group Finance

the list of sensitive properties will show both groups:

$ stardog-admin sensitive-property list myDb
+---------+---------------------------------+
|  Group  |           Properties            |
+---------+---------------------------------+
| Finance | :bankAccountBalance, :hasSalary |
| PII     | :email, :ssn                    |
+---------+---------------------------------+

Removing properties from a group can be done as follows:

$ stardog-admin sensitive-property remove --group Finance myDb :hasSalary
1 sensitive properties have been removed from the group Finance

Granting Sensitive Property Permissions

In order to access the sensitive properties defined within a named group, the user needs explicit READ permission for that group. The permission can be granted with the following command, where the name of the group is appended to the database name, separated by the \ character:

$ stardog-admin user grant -a read -o "sensitive-properties:myDb\PII" myUser

We use double quotes around sensitive-properties:myDb\PII because most shells treat the \ character as an escape character. Without the quotes, the shell would consume the backslash and the permission would be granted on the wrong resource.
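The difference the quotes make can be checked without a Stardog server. The sketch below uses Python's shlex module, which follows POSIX shell quoting rules, to show the argument the command would receive in each case; a particular shell may behave differently, so treat this as an approximation.

```python
import shlex

# Split the command line the way a POSIX shell would.
unquoted = shlex.split(
    r'stardog-admin user grant -a read -o sensitive-properties:myDb\PII myUser'
)
quoted = shlex.split(
    r'stardog-admin user grant -a read -o "sensitive-properties:myDb\PII" myUser'
)

# Index 6 is the value passed to the -o option.
print(unquoted[6])
print(quoted[6])
```

Without quotes the backslash is dropped and the resource becomes sensitive-properties:myDbPII; with double quotes the backslash survives and the group name stays separated from the database name.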

Usage Examples

Here is a quick summary of how named sensitive property groups work in practice:

Each database can define multiple groups. Sensitive groups defined in different databases are completely independent.
Each sensitive property group contains one or more properties. There is no notion of an empty group.
A property appearing in any of the groups is considered sensitive and requires the additional permission for read access.
A property may appear in multiple groups. A user who has permission for at least one of these groups has access to that property.

Let’s use the following simple example to see how this works in practice:

$ stardog-admin sensitive-property list myDb
+------------+-------------------------------+
|   Group    |          Properties           |
+------------+-------------------------------+
| Contact    | :hasEmail                     |
| Membership | :isMemberOf                   |
| Personal   | :hasBirthdate, :hasEmail      |
| Project    | :currentProject, :pastProject |
+------------+-------------------------------+

There are four different groups of properties defined in this database. Note that the property :hasEmail appears in two different groups. The permissions are granted to users as follows:

$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Contact" Alice
$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Personal" Bob
$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Personal" Charlie
$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Membership" Charlie
$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Contact" Daisy
$ stardog-admin user grant -a read -o "sensitive-properties:myDb\Project" Daisy

The properties accessible to each user are as follows:

Alice can access the :hasEmail property because she has permission for the Contact group. It does not matter that she lacks permission for the Personal group, which also includes :hasEmail.
Bob can access :hasEmail via the Personal group, which also gives him access to :hasBirthdate.
Charlie can access three sensitive properties (:hasBirthdate, :hasEmail, and :isMemberOf) via the Personal and Membership groups.
Daisy can access :hasEmail via the Contact group, and :currentProject and :pastProject via the Project group.
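The rule behind these results (a user can read a sensitive property if it belongs to at least one group the user has permission for) can be written down as a short Python model. The dictionaries mirror the groups and grants in this example; the function names are hypothetical and this is not how Stardog stores permissions.

```python
# Groups and grants copied from the example above.
GROUPS = {
    "Contact": {":hasEmail"},
    "Membership": {":isMemberOf"},
    "Personal": {":hasBirthdate", ":hasEmail"},
    "Project": {":currentProject", ":pastProject"},
}
GRANTS = {
    "Alice": {"Contact"},
    "Bob": {"Personal"},
    "Charlie": {"Personal", "Membership"},
    "Daisy": {"Contact", "Project"},
}

def readable_properties(user):
    """Union of the properties in every group the user has READ permission for."""
    props = set()
    for group in GRANTS.get(user, set()):
        props |= GROUPS[group]
    return props

def is_masked(user, prop):
    """A property is masked if it is sensitive and not readable by the user."""
    sensitive = set().union(*GROUPS.values())
    return prop in sensitive and prop not in readable_properties(user)

for user in sorted(GRANTS):
    print(user, sorted(readable_properties(user)))
```

A property that appears in no group, such as a hypothetical :name, is never masked, and a user with no grants at all sees every sensitive property masked.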

Masking Function

It is possible to apply a different masking function to sensitive values. It can be configured using the security.masking.function database property, e.g., security.masking.function=replace(str(?object),".+","XXXX"). The function can be either a constant or any SPARQL function with zero or one arguments. For example, you can use a constant string to be displayed for any sensitive value, e.g. security.masking.function="You do not have permission to see this value". If you are setting the configuration option using the CLI, you will need additional quotes around the function to make sure the quotes in the expression are not processed by the shell:

$ stardog-admin metadata set -o security.masking.function='replace(str(?object),".+","XXXX")' -- myDb

If there is a syntax error in your masking function, the default masking function will be used, and an error message will be logged in stardog.log.
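To get a feel for what the two example masking functions produce, here are rough Python analogues. They only illustrate the intended output; SPARQL's replace and Python's re.sub are different implementations, and the function names are made up for this sketch.

```python
import re

def replace_mask(value):
    """Rough analogue of replace(str(?object), ".+", "XXXX")."""
    return re.sub(r".+", "XXXX", str(value))

def constant_mask(_value):
    """Analogue of a constant masking function: the input is ignored."""
    return "You do not have permission to see this value"

print(replace_mask("123-12-1111"))
print(replace_mask(42))
print(constant_mask("123-12-1111"))
```

Unlike the default hash, both of these map every value to the same output, so users without permission cannot even tell whether two masked values are equal.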

Current Limitations

There are a few limitations with this feature, and it should not be considered production-ready. These limitations are:

Values of sensitive nodes can be revealed through zero-length paths (e.g., queries like ?s :p? ?o) and full-text search. Technically, these are violations of the definition based on the Update query. However, it should not be possible to see connections of these values to other nodes via sensitive properties.
Values of sensitive nodes can be revealed through edge properties. The sensitive properties feature should not be used when the edge properties feature is enabled.
Sensitive property permissions only limit read access. Users without read permission to sensitive properties can still write triples with those properties, even though they would not be able to read the values they have written.
