Backup and Restore

This page discusses backing up and restoring individual Stardog databases. For more information on backing up (and restoring) the entire Stardog server, please see the Server Backup section.

Page Contents
  1. Overview
  2. Backing up the Database
    1. Backup Methods
      1. Impact on cluster synchronization
    2. Backup to S3
      1. IAM roles
    3. Backup to Google Cloud Platform (GCP)
    4. Backup to Azure Blob Storage
  3. Restoring a Database
    1. Restoring with Metadata Overrides
    2. Restore from S3
    3. Restore from Google Cloud Platform (GCP)
    4. Restore from Azure Blob Storage
    5. Automatic Restore
    6. Restoring Permissions
  4. Logical Backup

Overview

Stardog provides two different kinds of backup operations: database backups and server backups. In this section, we’ll just be discussing database backups. These commands perform physical backups, including database metadata, rather than logical backups via some RDF serialization. They are native Stardog backups and can only be restored with Stardog tools as explained below.

Backups may be accomplished while a database is online. The backup is performed in a read transaction; reads and writes may continue, but writes performed during the backup are not reflected in the backup.

For point-in-time recovery beyond the backup timestamp, combine database backups with transaction log replay. This allows recovery to any point between backups by first restoring from backup and then replaying transactions from the transaction log.

Backing up the Database

A database backup saves the contents of a single database along with database metadata, including user and role permissions associated with the database. Database backups can be written to the file system, AWS S3, Google Cloud Platform, or Azure Blob Storage.

The stardog-admin db backup command assumes a default location for its output, namely, $STARDOG_HOME/.backup. That default may be overridden by setting the backup.dir server property in your stardog.properties file. Backups are stored in directories by database name and then in date-versioned subdirectories for each backup volume.

Your typical backup directory would have a layout similar to this:

.backup/myDb/2020-10-02
.backup/myDb/2020-10-11
.backup/myOtherDb/2020-06-21
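
As a minimal sketch of working with this layout, the function below prints the most recent backup directory for a database. The function name is hypothetical, and it assumes the date-only directory names shown above, which sort chronologically as plain strings.

```shell
# Print the most recent date-versioned backup directory for a database.
# Assumes the layout <backup root>/<database>/<YYYY-MM-DD>.
latest_backup_dir() (
  root=$1
  db=$2
  # ISO dates sort chronologically when sorted as plain strings.
  latest=$(ls -1 "$root/$db" 2>/dev/null | sort | tail -n 1)
  [ -n "$latest" ] || { echo "no backups found for $db" >&2; exit 1; }
  printf '%s\n' "$root/$db/$latest"
)
```

For example, latest_backup_dir "$STARDOG_HOME/.backup" myDb would print .../.backup/myDb/2020-10-11 for the layout above.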

If you need to specify a location outside of $STARDOG_HOME (e.g. a network mount) you can set the backup.location server property in your stardog.properties or pass it to the --to argument in the stardog-admin db backup command.

EXAMPLE

To create a backup of a Stardog database called foobar:

$ stardog-admin db backup foobar

EXAMPLE

To back up to a remote location, pass --to a directory that is mounted into the local file system over a network protocol:

$ stardog-admin db backup --to /my/network/share/stardog-backups foobar
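
A network mount can be missing or read-only when the backup starts. The sketch below checks the target directory before calling the command. The function name and the STARDOG_ADMIN variable (an override for the binary path) are hypothetical and not part of Stardog.

```shell
# Back up a database to an explicit directory after checking it is usable.
# STARDOG_ADMIN is a hypothetical override for the stardog-admin binary path.
backup_to_dir() (
  target=$1
  db=$2
  # A stale or missing mount is easier to diagnose before the backup starts.
  [ -d "$target" ] && [ -w "$target" ] || {
    echo "backup target $target is not a writable directory" >&2
    exit 1
  }
  "${STARDOG_ADMIN:-stardog-admin}" db backup --to "$target" "$db"
)
```

For example: backup_to_dir /my/network/share/stardog-backups foobar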

In the progress monitor, backups to remote systems (e.g., S3, GCP, and Azure) may show progress at 100% for an extended period of time due to file transfers.

Backup Methods

Stardog 12.1 introduced a new backup implementation. Two backup methods are now available, selected by the backup.method option (CLI: --method):

  • EXPORT — this is the method used in all Stardog versions prior to 12.1. It is also the method used by Stardog 12.1 when the transaction.store.mode database option is set to system (the default, see below). It exports all triples and the dictionary to files in a custom format. Its performance is similar to that of stardog data export. It is the only method that supports the repair option.
  • CHECKPOINT — a physical snapshot of the database’s data files using hard links. It is available only for databases created in Stardog 12.1 or later with the transaction.store.mode=database option (in which case it is the default method). Backups created using this method can only be restored on Stardog 12.1+. This method is faster for both backup and restore, but typically requires more disk space, especially if the database was not recently optimized to compact data files.

transaction.store.mode is a new storage option in Stardog 12.1.0 that controls where the list of committed and aborted transactions for a database is kept. With the default system value the list is maintained globally for the Stardog server; with database the list is maintained as a part of the database itself. This means the database can be restored from a checkpoint on a different Stardog instance, such as a different cluster node.

To enable checkpoint backups, set transaction.store.mode=database when creating the database. The option is fixed at creation time and cannot be changed later, except when restoring from a backup (see Restoring a Database).

$ stardog-admin --server=... db create -o transaction.store.mode=database -n myDb

Creating a database with transaction.store.mode=database makes the STARDOG_HOME directory unreadable by Stardog 12.0 or earlier. Once any such database exists, those older versions will fail to start against that home directory. Downgrading to Stardog 12.0.* is only possible after deleting every database that uses the per-database transaction store. Do not enable this option until you are committed to running Stardog 12.1 or later.

Backups for this database will now use the CHECKPOINT method by default. For backwards compatibility, the EXPORT method can still be selected using the --method argument for the db backup command:

$ stardog-admin --server=... db backup --method EXPORT myDb

Choosing checkpoint-based backups

Advantages of CHECKPOINT:

  • Faster to create a backup. A checkpoint does not scan the data: it snapshots the data files in place, so the runtime is dominated by file I/O rather than CPU. On a local SSD this is typically orders of magnitude faster than EXPORT.
  • Faster to restore. Restoring from the backup does not require data re-indexing or statistics recomputation. The data files can be directly ingested into Stardog storage.

Disadvantages of CHECKPOINT:

  • Backup size tracks on-disk size, not logical size. A checkpoint captures the data files as they are, so it includes multiple versions of the same triples and deleted data that have not yet been compacted away. A database with many un-compacted updates or deferred vacuum work will produce a checkpoint backup that is significantly larger than an EXPORT backup of the same logical content. Running db optimize before a checkpoint backup reclaims that space.
  • Cannot repair corruption. Because the data files are copied verbatim, any index-level corruption is copied along with them. Backups taken with --repair or --partial always use EXPORT, even when --method CHECKPOINT is requested.
  • Not backwards-compatible. A checkpoint backup cannot be restored on Stardog 12.0 or earlier. Use --method EXPORT to produce a portable archive.

The on-disk backup is self-describing: the restore command detects the format from the backup metadata, so callers do not need to specify a method when restoring.

A checkpoint-based backup does not always report the exact number of triples in the backup because, as opposed to EXPORT, it does not scan the data. It is based on the index.size metadata value, which might deviate from the true size if the same triples have been added or deleted multiple times.

Concurrency and locking

Unlike an EXPORT backup (which runs in a read transaction and lets reads and writes continue), a CHECKPOINT backup needs a point-in-time snapshot that is consistent across all of the database’s internal data stores. To capture one, it briefly takes an exclusive lock on the database and cancels all transactions that are in flight when the backup starts. Cancelled transactions fail and must be retried by the client; new writes are simply held off for the (typically sub-second) duration of the snapshot.
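
Client-side retries can be handled with a small wrapper. The sketch below is a generic retry helper with a doubling delay, not a Stardog feature. It assumes the wrapped command exits with a non-zero status when its transaction is cancelled.

```shell
# Retry a command up to max_attempts times, doubling the delay after each failure.
retry() (
  max_attempts=$1
  delay=$2          # initial delay in whole seconds
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      exit 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
)
```

For example, retry 5 1 stardog data add myDb changes.ttl would attempt the write up to five times, where changes.ttl is a placeholder file name.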

We recommend not running a checkpoint backup concurrently with write activity or db optimize on the same database. Because in-flight transactions are cancelled, combining a checkpoint backup with a sustained write load (and especially with a concurrent db optimize) on the same database in a cluster can abort those transactions and, in rare cases, cause a cluster node to be expelled and then automatically rejoined. The cluster recovers on its own and the backup itself remains consistent, but writes in that window may fail. Schedule checkpoint backups when write load is low, and do not run db optimize and db backup against the same database at the same time.
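
One way to follow this advice while still compacting data files before a checkpoint backup is to run the two commands strictly one after the other. This is a sketch; the function name and the STARDOG_ADMIN override are hypothetical.

```shell
# Run db optimize and then db backup one after the other, never concurrently.
# STARDOG_ADMIN is a hypothetical override for the stardog-admin binary path.
optimize_then_backup() (
  db=$1
  admin=${STARDOG_ADMIN:-stardog-admin}
  # && starts the backup only after optimize has finished successfully.
  "$admin" db optimize "$db" && "$admin" db backup "$db"
)
```

Chaining the commands in a single scheduled job, rather than scheduling them separately, avoids an accidental overlap when optimize runs longer than expected.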

Impact on cluster synchronization

A Stardog cluster uses backups to synchronize its nodes when it detects that a partial synchronization using transaction logs is not possible. For databases created with transaction.store.mode=database, the cluster uses checkpoint-based backups. This typically makes synchronization faster, especially on the joining node, but at the expense of sending more data across the network.

Backup to S3

For S3 backups use a URL in the following format:

s3://[<endpoint hostname>:<endpoint port>]/<bucket name>/<path prefix>?region=<AWS Region>&AWS_ACCESS_KEY_ID=<access key>&AWS_SECRET_ACCESS_KEY=<verySecretKey1>

The endpoint hostname and endpoint port values are only used for on-premises S3 clones. To use Amazon S3 those values can be left blank and the URL will have three / before the bucket as in:

s3:///mybucket/backup/prefix?region=us-east-1&AWS_ACCESS_KEY_ID=accessKey&AWS_SECRET_ACCESS_KEY=secret
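
The URL can be assembled from its parts, as in the sketch below. The function name and argument order are hypothetical. Values are inserted verbatim, so any character that is not URL-safe would need encoding first.

```shell
# Build an S3 backup URL. An empty endpoint yields the s3:///... form for Amazon S3.
s3_backup_url() (
  endpoint=$1          # "host:port" for an S3-compatible store, or "" for Amazon S3
  bucket=$2
  prefix=$3
  region=$4
  access_key=${5:-}
  secret_key=${6:-}
  url="s3://$endpoint/$bucket/$prefix?region=$region"
  # Keys are appended only when both are given.
  if [ -n "$access_key" ] && [ -n "$secret_key" ]; then
    url="$url&AWS_ACCESS_KEY_ID=$access_key&AWS_SECRET_ACCESS_KEY=$secret_key"
  fi
  printf '%s\n' "$url"
)
```

The result can be passed to --to, for example: stardog-admin db backup --to "$(s3_backup_url "" mybucket backup/prefix us-east-1 accessKey secret)" foobar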

A default S3 location can also be specified in the stardog.properties file with the key backup.location.

IAM roles

As of Stardog 9.1, database backup to and restore from S3 support IAM roles attached to the instance as a means to provide credentials. If an access key and secret key are not specified in the S3 URL, Stardog will attempt to use an IAM role attached to the instance. You can read more about how to use IAM roles in the AWS documentation.

Requests using IAM roles attached to the instance are essentially the same except you do not provide the AWS access key or secret:

s3:///mybucket/backup/prefix?region=us-east-1

If an IAM role is attached to the instance and AWS keys are also provided in the S3 URL, Stardog will only attempt the request with the keys from the URL. If those credentials are incorrect and the request fails, Stardog will not fall back to the IAM role. You will need to either specify the correct credentials in the S3 URL or remove the credentials from the URL so that the IAM role can be used.
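
Because keys in the URL take precedence without any fallback, it can help to check a URL before using it. The sketch below classifies a URL by the credential source it implies under the rules above. The function name and output labels are hypothetical, and treating a URL with only one of the two keys as an error is this sketch's own choice.

```shell
# Classify an S3 URL by the credential source it implies.
s3_credential_source() (
  url=$1
  has_id=no
  has_secret=no
  case $url in *AWS_ACCESS_KEY_ID=*) has_id=yes ;; esac
  case $url in *AWS_SECRET_ACCESS_KEY=*) has_secret=yes ;; esac
  if [ "$has_id" = yes ] && [ "$has_secret" = yes ]; then
    echo "url-keys"      # keys in the URL are used, with no fallback to the IAM role
  elif [ "$has_id" = no ] && [ "$has_secret" = no ]; then
    echo "iam-role"      # no keys in the URL, so the instance's IAM role applies
  else
    echo "only one of the two keys is present in the URL" >&2
    exit 2
  fi
)
```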

Backup to Google Cloud Platform (GCP)

For GCP backups use a URL in the following format:

gs://<bucket name>/<path prefix>?GOOGLE_APPLICATION_CREDENTIALS=<path to Google Credentials JSON file>

See the GCP documentation for how to create a Google credentials JSON file.

A default GCP backup location can also be specified in the stardog.properties file with the key backup.location.

Backup to Azure Blob Storage

For Azure backups use a URL in the following format:

https://<storage account>.blob.core.windows.net/<container>/<prefix>?<token>

The backup will be stored in your Azure storage account under the specified container, in the directory identified by prefix.

If another scheme or host is required for your Azure account, they can be configured in stardog.properties with backup.azure.scheme, which defaults to https, and backup.azure.host, which defaults to blob.core.windows.net.
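
The sketch below assembles an Azure backup URL with the same defaults for scheme and host. The function name and the AZURE_SCHEME and AZURE_HOST variables are hypothetical stand-ins for backup.azure.scheme and backup.azure.host; it assumes the URL should use the same scheme and host that are configured on the server.

```shell
# Build an Azure Blob Storage backup URL.
# AZURE_SCHEME and AZURE_HOST are hypothetical variables mirroring the
# backup.azure.scheme and backup.azure.host server properties.
azure_backup_url() (
  account=$1
  container=$2
  prefix=$3
  token=$4             # access token, without the leading '?'
  scheme=${AZURE_SCHEME:-https}
  host=${AZURE_HOST:-blob.core.windows.net}
  printf '%s://%s.%s/%s/%s?%s\n' "$scheme" "$account" "$host" "$container" "$prefix" "$token"
)
```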

See Azure Blob Storage documentation for configuring and securing a storage container in your storage account.

Similar to S3 and GCP backups, a default Azure Blob Storage backup location can also be specified in the stardog.properties file with the key backup.location.

Restoring a Database

To restore a Stardog database from a Stardog backup volume, simply pass a fully-qualified path to the volume in question. The location of the backup should be the full path to the backup, not the location of the backup directory as specified in your Stardog configuration. There is no need to specify the name of the database to restore.

To restore a database from its backup:

$ stardog-admin db restore $STARDOG_HOME/.backup/myDb/2012-06-21

It is also possible to restore a database and overwrite the existing database:

$ stardog-admin db restore --overwrite $STARDOG_HOME/.backup/myDb/2012-06-21

With --overwrite, the restore process drops the existing database only after the backup data has been loaded.

Note that data written to the existing database while the restore is in progress will be lost: transactions that were started before or during the restore operation will not be present in the restored database. During a long-running restore, the old database can still be read and queried. At the end of the restore process, the old database is dropped and replaced with the restored database; during this short period the database is unavailable.
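
Since the restore command expects the backup volume itself rather than the configured backup directory, a wrapper can reject paths that do not look like a volume. The sketch below applies to local paths only and assumes date-only volume names as in the backup layout shown earlier. The function name and the STARDOG_ADMIN override are hypothetical.

```shell
# Restore from a local backup volume, refusing paths that do not look like one.
# STARDOG_ADMIN is a hypothetical override for the stardog-admin binary path.
restore_volume() (
  volume=${1%/}        # drop a trailing slash, if any
  shift
  # The last path component of a volume is a date such as 2020-10-11.
  case ${volume##*/} in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *)
      echo "$volume does not look like <backup dir>/<database>/<date>" >&2
      exit 1
      ;;
  esac
  # Remaining arguments (for example --overwrite) are passed through unchanged.
  "${STARDOG_ADMIN:-stardog-admin}" db restore "$@" "$volume"
)
```

For example: restore_volume "$STARDOG_HOME/.backup/myDb/2020-10-11" --overwrite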

Restoring with Metadata Overrides

Database options can be overridden on the restored database with -m/--metadata. Each override is a key=value pair, and overrides are applied on top of the metadata stored in the backup. For example, an EXPORT backup of a database on the system transaction store can be restored as a checkpoint-enabled database:

$ stardog-admin db restore -m transaction.store.mode=database $STARDOG_HOME/.backup/myDb/2012-06-21

Restoring with -m transaction.store.mode=database has the same downgrade implications as creating such a database directly: the STARDOG_HOME directory becomes unreadable by Stardog 12.0 or earlier, and downgrading to 12.0.* then requires deleting the database. See the warning under Backup Methods.

Overriding options at restore time can put the restored database into a state that is inconsistent with the data in the backup. If the overridden options are not compatible with the actual contents of the backup, the results are undefined and the resulting database may be unusable.

This is particularly relevant for RDF*-enabled databases: edge.properties must only be changed to false if no edge properties are present in the backup.

Restore from S3

Backups can also be restored directly from S3 by using an S3 URL in the following format:

s3://[<endpoint hostname>:<endpoint port>]/<bucket name>/<path prefix>/<database name>?region=<AWS Region>&AWS_ACCESS_KEY_ID=<access key>&AWS_SECRET_ACCESS_KEY=<verySecretKey1>

Unlike in the backup URL, the database name must be specified as the last entry of the path in the URL.
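
A restore URL can be derived from a backup URL by inserting the database name before the query string. This is a minimal sketch with a hypothetical function name; it treats the URL as a plain string.

```shell
# Insert the database name as the last path entry, before the query string.
restore_url() (
  backup_url=$1
  db=$2
  base=${backup_url%%\?*}            # everything before the first '?'
  case $backup_url in
    *\?*) query="?${backup_url#*\?}" ;;
    *)    query="" ;;
  esac
  # ${base%/} avoids a double slash when the prefix ends in '/'.
  printf '%s/%s%s\n' "${base%/}" "$db" "$query"
)
```

For example, restore_url "s3:///mybucket/backup/prefix?region=us-east-1" myDb prints s3:///mybucket/backup/prefix/myDb?region=us-east-1.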

Restore from Google Cloud Platform (GCP)

Backups can also be restored directly from GCP by using a GCP URL in the following format:

gs://<bucket name>/<path prefix>?GOOGLE_APPLICATION_CREDENTIALS=<path to Google Credentials JSON file>

Restore from Azure Blob Storage

Backups can also be restored directly from Azure Blob Storage by using a URL in the following format:

https://<storage account>.blob.core.windows.net/<container>/<prefix>/<database name>?<token>

Unlike in the backup URL, the database name must be specified as the last entry of the path in the URL.

Automatic Restore

Stardog can be configured to automatically restore databases from a backup location on startup. For example, when a Stardog cluster node first starts it could pull all of the database data down from an S3 backup before joining the cluster.

Automatic restore is not supported for GCP or Azure Blob Storage backups.

There are two server properties that control this behavior.

Property                        Description
backup.autorestore.dbnames      A regular expression that matches the names of the databases to automatically restore on startup, e.g. .* for every database.
backup.autorestore.onfailure    A boolean value that determines if all databases which failed to load should be automatically restored from a backup location.

As with any server property, they should be set in your stardog.properties file.
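
As a sketch, the two properties can be appended to a properties file from a provisioning script. The function name is hypothetical. Setting backup.autorestore.onfailure to true is only an example value, and a regular expression containing backslashes may need extra escaping in a properties file.

```shell
# Append automatic-restore settings to a stardog.properties file.
enable_autorestore() (
  props=$1             # path to stardog.properties
  db_regex=$2          # e.g. '.*' for every database
  cat >> "$props" <<EOF
backup.autorestore.dbnames=$db_regex
backup.autorestore.onfailure=true
EOF
)
```

For example: enable_autorestore "$STARDOG_HOME/stardog.properties" '.*'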

Restoring Permissions

Backups created by version 7.7.1 or newer include the permissions related to the database and grant these permissions when the database is restored. The permissions included in the backup cover the database, database metadata, database admin, named graphs, data quality constraints and sensitive properties. See the security model section for details on these security resources.

There are some caveats with restoring permissions and some permissions might not be restored. For example, at the time the backup was created a certain user might have had permissions over the database. But if that user was deleted and does not exist at the time the database is being restored, the corresponding permissions will not be restored.

Furthermore, for the permissions in the backup to be restored, the user performing the restore operation should have privileges to grant the permissions specified in the backup. If the user performing the restore operation does not have such privileges, then permissions will not be restored. As a best practice, it is recommended that a superuser or a user with grant:*:* privileges perform the restore operation.

The database will be restored even if errors are encountered while restoring the permissions. Such errors are included in the output of the restore operation.

Backups created by version 7.7.0 or earlier do not contain any permission information. When such backups are restored the database owner is granted the default permission but no additional permissions will be granted. Any required permissions need to be manually granted after the restore is complete.

Logical Backup

In addition to physical backups, one can perform a logical backup using the stardog data export command that will save the contents of a database into a standard RDF file.

EXAMPLE

Export the database myDb as NTRIPLES:

$ stardog data export --format NTRIPLES myDb

EXAMPLE

Export the database myDb to a gzipped file in TURTLE:

$ stardog data export myDb export.ttl.gz

Logical backups do not contain database metadata or configuration options.
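
For recurring logical backups, the export file name can carry the date, mirroring the date-versioned layout of physical backups. This is a sketch; the function name, the file naming scheme, and the STARDOG override for the binary path are hypothetical.

```shell
# Export a database to a gzipped Turtle file named <database>-<date>.ttl.gz.
# STARDOG is a hypothetical override for the stardog binary path.
logical_backup() (
  db=$1
  out_dir=$2
  stamp=${3:-$(date +%Y-%m-%d)}      # optional explicit date, defaults to today
  "${STARDOG:-stardog}" data export "$db" "$out_dir/$db-$stamp.ttl.gz"
)
```

For example: logical_backup myDb /var/backups/stardog-exports, where the output directory is a placeholder.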