Backup and Restore
This page discusses backing up and restoring individual Stardog databases. For more information on backing up (and restoring) the entire Stardog server, please see the Server Backup section.
Overview
Stardog provides two kinds of backup operations: database backups and server backups. This section covers database backups only. These commands perform physical backups, including database metadata, rather than logical backups via some RDF serialization. They are native Stardog backups and can only be restored with Stardog tools as explained below.
Backups may be accomplished while a database is online. The backup is performed in a read transaction; reads and writes may continue, but writes performed during the backup are not reflected in the backup.
Backing up the Database
A database backup saves the contents of a single database along with database metadata, including user and role permissions associated with the database. Database backups can be written to the file system, AWS S3, Google Cloud Platform, or Azure Blob Storage.
The stardog-admin db backup command assumes a default location for its output, namely, $STARDOG_HOME/.backup. That default may be overridden by setting the backup.dir server property in your stardog.properties file. Backups are stored in directories by database name and then in date-versioned subdirectories for each backup volume.
Your typical backup directory would have a layout similar to this:
.backup/myDb/2020-10-02
.backup/myDb/2020-10-11
.backup/myOtherDb/2020-06-21
If you need to specify a location outside of $STARDOG_HOME (e.g., a network mount), you can set the backup.location server property in your stardog.properties file or pass the location to the --to argument of the stardog-admin db backup command.
EXAMPLE
To back up a Stardog database called foobar:
$ stardog-admin db backup foobar
EXAMPLE
To write a backup to a remote file system, pass a directory that is mounted in the local OS namespace via a network protocol:
$ stardog-admin db backup --to /my/network/share/stardog-backups foobar
In the progress monitor, backups to remote systems (e.g., S3, GCP, and Azure) may show progress at 100% for an extended period of time due to file transfers.
Backup to S3
For S3 backups use a URL in the following format:
s3://[<endpoint hostname>:<endpoint port>]/<bucket name>/<path prefix>?region=<AWS Region>&AWS_ACCESS_KEY_ID=<access key>&AWS_SECRET_ACCESS_KEY=<verySecretKey1>
The endpoint hostname and endpoint port values are only used for on-premises S3 clones. To use Amazon S3, those values can be left blank, and the URL will have three / characters before the bucket, as in:
s3:///mybucket/backup/prefix?region=us-east-1&AWS_ACCESS_KEY_ID=accessKey&AWS_SECRET_ACCESS_KEY=secret
A default S3 location can also be specified in the stardog.properties file with the key backup.location.
IAM roles
In Stardog 9.1, database backup and restore to S3 support IAM roles attached to the instance as a means to provide credentials. If an access key and secret key are not specified in the S3 URL, Stardog will attempt to use an IAM role attached to the instance. You can read more about how to use IAM roles in the AWS documentation.
Requests using IAM roles attached to the instance are essentially the same except you do not provide the AWS access key or secret:
s3:///mybucket/backup/prefix?region=us-east-1
If an IAM role is attached to the instance and AWS keys are also provided in the S3 URL, Stardog will only attempt the request with the access and secret keys from the URL. If those credentials are incorrect and the request fails, Stardog will not fall back to the IAM role attached to the instance. You will need to either specify the correct credentials in the S3 URL or remove the credentials from the URL so the IAM role can be used.
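The two S3 URL variants above (explicit keys versus an attached IAM role) can be sketched with a small shell helper. This is only an illustration: build_s3_url is a hypothetical function, not part of Stardog, and the bucket, prefix, and key values are the placeholders used earlier on this page. The commands are printed rather than executed. Note that the URL should be quoted on the command line, because & and ? are special characters in most shells.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Stardog command): build an S3 backup URL.
# Arguments: bucket, path prefix, region, optional access key, optional secret key.
build_s3_url() {
  local bucket="$1" prefix="$2" region="$3" access_key="${4:-}" secret_key="${5:-}"
  # Three slashes: endpoint hostname and port are left blank for Amazon S3.
  local url="s3:///${bucket}/${prefix}?region=${region}"
  # Append credentials only when both are supplied; without them the URL
  # relies on an IAM role attached to the instance.
  if [ -n "$access_key" ] && [ -n "$secret_key" ]; then
    url="${url}&AWS_ACCESS_KEY_ID=${access_key}&AWS_SECRET_ACCESS_KEY=${secret_key}"
  fi
  printf '%s\n' "$url"
}

# Print the resulting commands instead of running them.
echo "stardog-admin db backup --to '$(build_s3_url mybucket backup/prefix us-east-1)' foobar"
echo "stardog-admin db backup --to '$(build_s3_url mybucket backup/prefix us-east-1 accessKey secret)' foobar"
```

Keeping the credentials optional makes it harder to mix the two modes by accident, which matters because Stardog does not fall back to the IAM role once keys are present in the URL.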
Backup to Google Cloud Platform (GCP)
For GCP backups use a URL in the following format:
gs://<bucket name>/<path prefix>?GOOGLE_APPLICATION_CREDENTIALS=<path to Google Credentials JSON file>
See the GCP documentation for how to create a Google credentials JSON file.
A default GCP backup location can also be specified in the stardog.properties file with the key backup.location.
Backup to Azure Blob Storage
For Azure backups use a URL in the following format:
https://<storage account>.blob.core.windows.net/<container>/<prefix>?<token>
The database will be stored in your Azure storage account under the specified container and directory identified by prefix.
If another scheme or host is required for your Azure account, they can be configured in stardog.properties with backup.azure.scheme, which defaults to https, and backup.azure.host, which defaults to blob.core.windows.net.
See Azure Blob Storage documentation for configuring and securing a storage container in your storage account.
Similar to S3 and GCP backups, a default Azure Blob Storage backup location can also be specified in the stardog.properties file with the key backup.location.
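As a rough sketch of assembling the Azure URL format shown above, the helper below joins the storage account, container, prefix, and token. build_azure_url is a hypothetical function, the account and container names are made up, and the token value is a placeholder. It assumes the token is a query string that may have been copied with a leading ?, which is stripped so the URL does not end up with two.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Stardog command): build an Azure Blob Storage backup URL.
# Arguments: storage account, container, prefix, token.
build_azure_url() {
  local account="$1" container="$2" prefix="$3" token="$4"
  # Assumption: the token may carry a leading '?'; drop it if present.
  token="${token#\?}"
  printf 'https://%s.blob.core.windows.net/%s/%s?%s\n' \
    "$account" "$container" "$prefix" "$token"
}

# Placeholder values; print the command instead of running it.
url="$(build_azure_url myaccount mycontainer backups '?sv=PLACEHOLDER&sig=PLACEHOLDER')"
echo "stardog-admin db backup --to '$url' foobar"
```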
Restoring a Database
To restore a Stardog database from a Stardog backup volume, simply pass a fully-qualified path to the volume in question. The location of the backup should be the full path to the backup, not the location of the backup directory as specified in your Stardog configuration. There is no need to specify the name of the database to restore.
To restore a database from its backup:
$ stardog-admin db restore $STARDOG_HOME/.backup/myDb/2012-06-21
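Because restore needs the full path to a specific backup volume, a script usually has to pick one of the date-versioned subdirectories first. The sketch below is one way to do that, assuming the subdirectory names are dates in YYYY-MM-DD form (as in the layout shown earlier) so that alphabetical order equals chronological order. latest_backup is a hypothetical helper, and the demo builds a throwaway directory tree instead of touching a real $STARDOG_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Stardog command): print the newest backup volume
# for a database, given the backup root directory and the database name.
latest_backup() {
  local backup_root="$1" db="$2" latest="" dir
  # Glob results are sorted, so the last directory seen is the newest date.
  for dir in "$backup_root/$db"/*/; do
    [ -d "$dir" ] || continue   # pattern stays literal when nothing matches
    latest="${dir%/}"
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Demo with a scratch copy of the example layout.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/myDb/2020-10-02" "$demo_root/myDb/2020-10-11" \
         "$demo_root/myOtherDb/2020-06-21"

volume="$(latest_backup "$demo_root" myDb)"
# Print the restore command instead of running it.
echo "stardog-admin db restore $volume"
```

In real use the first argument would be the configured backup directory, for example "$STARDOG_HOME/.backup".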
Restore from S3
Backups can also be restored directly from S3 by using an S3 URL in the following format:
s3://[<endpoint hostname>:<endpoint port>]/<bucket name>/<path prefix>/<database name>?region=<AWS Region>&AWS_ACCESS_KEY_ID=<access key>&AWS_SECRET_ACCESS_KEY=<verySecretKey1>
Unlike the backup URL, the database name must be specified as the last entry of the path field in the URL.
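Since the restore URL differs from the backup URL only by the database name at the end of the path, one can derive it mechanically. The following is a minimal sketch of that transformation; to_restore_url is a hypothetical helper, and the URL is the placeholder S3 example from the backup section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Stardog command): turn a backup URL into a
# restore URL by appending the database name to the path, before the query.
to_restore_url() {
  local backup_url="$1" db="$2" path query
  case "$backup_url" in
    *\?*)
      path="${backup_url%%\?*}"    # everything before the first '?'
      query="${backup_url#*\?}"    # everything after the first '?'
      printf '%s/%s?%s\n' "${path%/}" "$db" "$query"
      ;;
    *)
      # No query string: just append the database name to the path.
      printf '%s/%s\n' "${backup_url%/}" "$db"
      ;;
  esac
}

restore_url="$(to_restore_url 's3:///mybucket/backup/prefix?region=us-east-1' foobar)"
# Print the restore command instead of running it.
echo "stardog-admin db restore '$restore_url'"
```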
Restore from Google Cloud Platform (GCP)
Backups can also be restored directly from GCP by using a GCP URL in the following format:
gs://<bucket name>/<path prefix>?GOOGLE_APPLICATION_CREDENTIALS=<path to Google Credentials JSON file>
Restore from Azure Blob Storage
Backups can also be restored directly from Azure Blob Storage by using a URL in the following format:
https://<storage account>.blob.core.windows.net/<container>/<prefix>/<database name>?<token>
Unlike the backup URL, the database name must be specified as the last entry of the path field in the URL.
Automatic Restore
Stardog can be configured to automatically restore databases from a backup location on startup. For example, when a Stardog cluster node first starts it could pull all of the database data down from an S3 backup before joining the cluster.
Automatic restore is not supported for GCP or Azure Blob Storage backups.
There are two server properties that control this behavior.
Properties | Description |
---|---|
backup.autorestore.dbnames | A regular expression that matches the names of the databases to automatically restore on startup, e.g., .* for every database. |
backup.autorestore.onfailure | A boolean value that determines if all databases which failed to load should be automatically restored from a backup location. |
As with any server property, they should be set in your stardog.properties file.
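For illustration, the snippet below writes a sample properties fragment with both settings to a scratch file and prints it. The values are examples only. Pairing these properties with backup.location is an assumption based on the description of a default backup location earlier on this page; the S3 URL is a placeholder.

```shell
#!/usr/bin/env bash
# Write an example fragment to a scratch file; in practice these lines would
# be merged into your own stardog.properties by hand.
props_file="$(mktemp)"
cat > "$props_file" <<'EOF'
# Restore every database whose name matches this regular expression on startup.
backup.autorestore.dbnames=.*
# Also restore any database that failed to load.
backup.autorestore.onfailure=true
# Assumption: restores are read from the default backup location (placeholder URL).
backup.location=s3:///mybucket/backup/prefix?region=us-east-1
EOF

cat "$props_file"
```

A narrower expression for backup.autorestore.dbnames limits the restore to a subset of databases.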
Restoring Permissions
Backups created by version 7.7.1 or newer include permissions related to the database in the backup and grant these permissions when the database is restored. The permissions included in the backup cover the database, database metadata, database admin, named graphs, data quality constraints, and sensitive properties. See the security model section for details on these security resources.
There are some caveats with restoring permissions and some permissions might not be restored. For example, at the time the backup was created a certain user might have had permissions over the database. But if that user was deleted and does not exist at the time the database is being restored, the corresponding permissions will not be restored.
Furthermore, for the permissions in the backup to be restored, the user performing the restore operation should have privileges to grant the permissions specified in the backup. If the user performing the restore operation does not have such privileges, then permissions will not be restored. As a best practice, it is recommended that a superuser or a user with grant:*:* privileges perform the restore operation.
The database will be restored even if errors are encountered while restoring the permissions. Such errors will be included in the restore operation output.
Backups created by version 7.7.0 or earlier do not contain any permission information. When such backups are restored the database owner is granted the default permission but no additional permissions will be granted. Any required permissions need to be manually granted after the restore is complete.
Logical Backup
In addition to physical backups, one can perform a logical backup using the stardog data export command, which saves the contents of a database into a standard RDF file.
EXAMPLE
Export the database myDb as NTRIPLES:
$ stardog data export --format NTRIPLES myDb
EXAMPLE
Export the database myDb to a gzipped file in TURTLE:
$ stardog data export myDb export.ttl.gz
Logical backups do not contain database metadata or configuration options.
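If logical backups are taken regularly, a date-stamped file name keeps earlier exports from being overwritten. The sketch below only builds and prints such a command. The naming scheme is an arbitrary choice for illustration, and the .ttl.gz extension follows the gzipped Turtle example above.

```shell
#!/usr/bin/env bash
# Build a date-stamped output file name for a logical export of myDb.
db="myDb"
outfile="${db}-$(date +%Y-%m-%d).ttl.gz"

# Print the export command instead of running it.
echo "stardog data export $db $outfile"
```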