Geo Replica Nodes (Feature Preview)
This page describes Geo Replica Nodes in Stardog - a useful feature for maintaining HA clusters.
The notion of a geo replica node was introduced in Stardog 9.0. A geo replica node runs outside the data center or region where Stardog Cluster is running and periodically requests updates, as shown in the figure above. A geo replica can operate in read replica mode, or simply as a standby node. It cannot service any user write requests. Similar to a standby node, a geo replica stays closely synchronized with the cluster but without disturbing the cluster with the more difficult
join event. By only drifting from full synchronization by limited time windows it allows for a few important features:
A geo replica is designed to run outside the data center where Stardog Cluster is running. This is an important feature for disaster recovery and maintaining high-availability when an entire region is unavailable.
When configured as a read replica, the geo replica node can respond to SPARQL queries with minimal impact on the cluster. Note that the query results may not be exactly the same as those returned by the cluster, since the replica’s databases may be slightly out of date.
The replica node can safely run database and server backups while taking minimal CPU cycles from servicing user requests on cluster nodes.
A geo replica node cannot be upgraded to be a full node in the cluster. Instead a geo replica would need to be configured as a separate, independent, cluster with its own ZooKeeper deployment (or started as a standalone Stardog node) if you want to enable write requests in another region. However, at this point it will be completely separated from the original cluster and if it is configured as a replica again data loss may occur.
Configuring a Geo Replica
A geo replica is a single node Stardog configured with the following settings in
pack.enabled=true pack.geoReplica=true pack.geoReplica.address=<Stardog source cluster IP>:5820 pack.geoReplica.connection.user=<Stardog source cluster superuser> pack.geoReplica.connection.password=<Stardog source cluster superuser password>
The geo replica address is the IP or domain name of the source Stardog Cluster endpoint and port (typically this is the load balancer). By default geo replicas will sync over https to the address provided for the cluster. The geo replica requires a Stardog superuser and password to connect to the Stardog cluster endpoint and sync. The standby sync interval is the frequency that the geo replica attempts to sync from the primary cluster. Because the credentials are provided in
stardog.properties we recommend that the file is only readable to Stardog and that the file system is encrypted.
If you need to test a geo replica without https for non-production use cases you can pass the flag
--insecure-geo-replica to the
stardog-admin server start command when Stardog starts and syncs will happen over http.
Do not use the
--insecure-geo-replica flag in production environments or across wide-area networks with sensitive data. All Stardog data and metadata will be sent unencrypted over http.
If you wish to configure the geo replica as a read replica you can also specify:
To change the interval that the geo replica syncs you can change the standby sync interval (default is 5 minutes):
Using a Geo Replica
A geo replica node functions similar to a standby node (or read replica if configured as one) except that it can run in another region. Therefore, working with a geo replica uses all the standby commands. Two key commands for working with geo replicas are
cluster standby-status, which shows how far out of date different databases are, and
cluster standby-attempt-sync, which will trigger a sync attempt immediately.
Similar to standby nodes and read replicas,
stardog-admin server stop should be sent directly to the geo replica or SIGTERM sent to the process to shut it down.