Standby Nodes
This page describes Standby Nodes in Stardog - a useful feature for maintaining HA clusters.
Overview
Standby nodes were introduced in Stardog 6.2.3. A standby node runs alongside the Stardog cluster and periodically requests updates. It does not service any user requests, neither reads nor writes. Its purpose is to stay closely synchronized with the cluster without subjecting the cluster to the more expensive join event. Because it drifts from full synchronization only within a limited time window, a standby node enables two important features:
- The standby node can safely run database and server backups while taking minimal CPU cycles away from cluster nodes that are servicing user requests (see the example after this list).
- The standby node can be upgraded to a full node and thereby join the cluster quickly, because it is already closely in sync.
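For example, backups can be taken directly from the standby node. The following is a minimal sketch; myDatabase is a placeholder database name, and the backup destination is whatever your server is configured to use:
$ stardog-admin --server http://<standby node IP>:5820 server backup
$ stardog-admin --server http://<standby node IP>:5820 db backup myDatabase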
This latter point is important for maintaining HA clusters. If one node goes down, a standby node can be promoted to a real, functional node, restoring the cluster to full strength.
Managing a Standby Node
To start a cluster node as a standby node, add the following lines to stardog.properties:
pack.standby=true
pack.standby.node.sync.interval=5m
This will configure the node to be in standby mode and to wait 5 minutes between synchronization attempts. The interval begins when the synchronization completes. In other words, if a synchronization takes 3 minutes, it will be 8 minutes before the next synchronization attempt.
Starting with Stardog 9.0, converting a standby node to a full node requires shutting it down, removing the standby configuration properties, and restarting it. Once upgraded, it may take some time for the node to fully join the cluster. Its progress can be monitored with stardog-admin cluster status.
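A promotion might therefore look like the sketch below. It assumes stardog.properties lives in $STARDOG_HOME and that the server is started by hand rather than by a service manager:
# Stop the standby node
$ stardog-admin --server http://<standby node IP>:5820 server stop
# Remove the standby settings (both pack.standby properties)
$ sed -i '/^pack.standby/d' $STARDOG_HOME/stardog.properties
# Restart; the node now attempts a full cluster join
$ stardog-admin server start
# Monitor the join from any full cluster node
$ stardog-admin --server http://<cluster node IP>:5820 cluster status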
To check the status of a standby node, run the cluster standby-status command:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-status
A standby node can also pause synchronization. To request a pause, run the cluster standby-pause command:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-pause
This tells the standby node that you want to pause it; it does not mean the node is paused yet. Pausing can take some time if the node is in the middle of a large synchronization event. The progress of the pause can be monitored with the cluster standby-status command:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-status
A node is not safely paused until the state PAUSED is returned. To resume synchronization, run the cluster standby-resume command:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-resume
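Putting these together, a maintenance window can be scripted as in the sketch below. It assumes the standby-status output contains the state string and that polling every 10 seconds is acceptable:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-pause
# Poll until the node reports PAUSED
$ until stardog-admin --server http://<standby node IP>:5820 cluster standby-status | grep -q PAUSED; do sleep 10; done
# ... perform maintenance while synchronization is paused ...
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-resume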
Finally, you can attempt a synchronization outside of the configured standby synchronization schedule with the cluster standby-attempt-sync command:
$ stardog-admin --server http://<standby node IP>:5820 cluster standby-attempt-sync
For all of these standby commands, you cannot use the IP address of a full cluster node, nor that of a load balancer directing requests to full cluster nodes; you must point directly at the standby node's address.
Because standby nodes are not full cluster members, many cluster commands do not work with standby nodes, such as:
- cluster info
- cluster status
- cluster readonly-start
- cluster readonly-stop
- cluster shutdown
- cluster diagnostics-report
To shut down a standby node, you must issue the server stop command directly to the standby node's address or send the process a SIGTERM.
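For example, either of the following stops a standby node; the pgrep pattern is an assumption about how the Stardog process appears on your system:
$ stardog-admin --server http://<standby node IP>:5820 server stop
# Or send SIGTERM to the server process
$ kill -TERM "$(pgrep -f stardog)"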
Standby Node Limits
The number of standby nodes that can run at a time is controlled by the pack.standby.node value in your license. Each standby node has an auto-generated unique ID and registers itself with the cluster using this ID when it first starts. When the number of standby nodes registered for a cluster reaches the limit allowed by the license, additional standby nodes will refuse to start.
Read replicas and geo replicas are categorized as standby nodes too. For this reason, the limit in the license applies to the total number of standby nodes, read replicas, and geo replicas registered for the cluster.
When a standby node shuts down, it does not deregister itself from the cluster. This avoids situations where a standby node cannot start after a shutdown because another standby node registered itself in the meantime. However, it also means that if you delete a standby node permanently, new standby nodes may fail to start because the license limit has been reached. For this reason, you can manually revoke standby access for deleted standby nodes using the standbyRevokeAccess API call.
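As a rough sketch only: since this is an HTTP admin call, it might be issued with curl as shown below. The endpoint path and the way the node ID is passed are assumptions, not the documented API, so consult the HTTP API reference for the actual call:
# Hypothetical: path and parameter below are assumptions, not the documented endpoint
$ curl -u admin:admin -X POST "http://<cluster node IP>:5820/admin/cluster/standby/revoke?nodeId=<standby node ID>"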