This chapter discusses operating a Stardog Cluster. This page discusses basic operations like adding nodes to a cluster and taking backups. More advanced topics like managing a distributed cache and a standby node are discussed in separate pages.
Stardog cluster stores the UUID of the last committed transaction for each database in ZooKeeper. When a new node is joining the cluster it will compare the local transaction ID of each database with the corresponding transaction ID stored in ZooKeeper. If there is a mismatch the node will synchronize the database contents from another node in the cluster. If there are no nodes in the cluster the new node cannot join the cluster and will shut itself down. For this reason, if you are starting a new cluster then you should make sure that the ZooKeeper state is cleared. If you are retaining an existing cluster then new nodes should be started when there is at least one node in the cluster.
If there are active transactions in the cluster joining node will wait for those transactions to finish and then synchronize its databases. More transactions may take place during synchronization and in that case the joining node will continue synchronization and retrieve the data from new transactions. Thus, it will take longer for a node to join the cluster if there are continuous transactions. Note that, the new node will not be available for requests until all the databases are synchronized. The proxy/load-balancer should perform a health check before forwarding the requests to a new node (as shown in the above configuration) so user requests will always be forwarded to available nodes.
The process to upgrade Stardog Cluster is straightfoward; however, there are a few extra steps you should take to ensure the upgrade goes as quickly and smoothly as possible. Before you begin the upgrade, make sure to place the new Stardog binaries on all of the cluster nodes.
Also make sure to note which node is the coordinator since this is the first node that will be started as part of the upgrade.
stardog-admin cluster info will show the nodes in the cluster and which one is the coordinator.
Next you should ensure that there are no transactions running, e.g.,
stardog-admin db status <db name> will show if there are any open transactions for a database. This step is not strictly required, however, it can minimize downtime and streamline the process, allowing the cluster to stop quickly and helping avoid non-coordinator nodes from having to re-sync when they attempt to join the upgraded cluster.
When you are ready to begin the upgrade, you can shutdown the cluster with
stardog-admin cluster stop. Once all nodes have stopped, backup the
STARDOG_HOME directories on all of the nodes.
With the new version of Stardog, bring the cluster up one node at a time, starting with the previous coordinator. As each node starts make sure that it is able to join the cluster cleanly before moving on to the next node.
Backing up the cluster is similar to single-node backups. However, there are a few points to be aware of. All nodes in the cluster will perform a backup unless S3 is the backup location, in which case only a single node will perform the backup to the S3 bucket.
If you are backing up to S3 then the
backup.location server property should be the same on all nodes in the cluster since any node may perform the backup.
You can also disable backup replication in the cluster by setting the following option:
backup.location is used to specify a backup directory mounted on the Stardog nodes then
backup.location can specify different directories on each node in the cluster, if required. In this case you should disable replicated backups and issue the backup command to each node individually.
--to is passed to the backup CLI commands, it will take precedence over either
backup.location specified in
stardog.properties and all nodes will perform a backup to the specified location.
Similar to single-node Stardog, you can restore individual databases with
db restore and the cluster will replicate the database to all nodes in the cluster.
Because Stardog Cluster uses ZooKeeper to ensure strong consistency between all of the nodes in the cluster, we recommend
server restore be done on only one node with a fresh deploy of ZooKeeper (i.e., clear ZooKeeper’s state once it is no longer in use).
The operational process is to:
- Shutdown Stardog on all nodes in the cluster
- Shutdown the ZooKeeper ensemble, if possible. If that’s not possible we recommend backing up ZooKeeper’s state and wiping the contents stored by Stardog.
- Create an empty
$STARDOG_HOMEdirectory on all of the Stardog Cluster nodes.
$STARDOG_HOMEto the empty home and run
server restore(the same as you would for a single node) on a single node.
- Start a fresh ZooKeeper ensemble with an empty data directory.
- Start ONLY the Stardog node where you performed
server restore. Verify the node starts and is in the cluster with the
cluster infocommand before continuing to step 7.
- Start a second node in the cluster with its empty home directory, wait for it to sync and join the cluster, as reported by
cluster info. Wait until the node joins before moving to step 8.
- Repeat step 7, one node at a time, for the remaining cluster nodes.