Read Replica Nodes

This page describes read replica nodes in Stardog, a useful feature for maintaining high-availability (HA) clusters.

Page Contents

Overview
Configuring a Read Replica Node
Example HAProxy Configuration

Overview

The read replica node was introduced in Stardog 8.0.0. A read replica node runs alongside the Stardog cluster and periodically requests updates from it. It can respond to read-only requests, such as SPARQL queries, but it cannot service any user write requests. Like a standby node, a read replica node stays closely synchronized with the cluster without disturbing it with the more demanding join event. Because it drifts from full synchronization only within a limited time window, it enables two important features:

The read replica node can respond to SPARQL queries with minimal impact on the cluster. Note that its query results may differ from those the cluster would return, because the read replica node’s databases may be slightly out of date. If this is not acceptable for a given query, send that query to the cluster instead.
The read replica node can safely run database and server backups, so backups take few CPU cycles away from the cluster nodes that are servicing user requests.

Unlike a standby node, a read replica node cannot be upgraded directly to a full node in the cluster. Instead, it becomes a full node when you remove its read replica configuration settings and restart it.

A read replica is a specialization of a standby node, so the license restrictions described for standby nodes also apply to read replicas.

Configuring a Read Replica Node

To start a cluster node as a read replica node, simply add the following lines to stardog.properties:

pack.standby=true
pack.standby.node.sync.interval=5m
pack.readReplica=true

This configures the node to run in read replica mode and to wait 5 minutes between synchronization attempts. The interval starts when a synchronization completes, not when it begins. In other words, if a synchronization takes 3 minutes, the next attempt starts 8 minutes after the previous one started.
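The scheduling rule above can be expressed as a small calculation. The following Python sketch is not Stardog code: sync_start_times is a hypothetical helper that assumes the first synchronization starts at time 0 and that each synchronization's duration is known in advance, and it returns the time at which each attempt begins.

```python
from typing import List


def sync_start_times(interval: float, durations: List[float]) -> List[float]:
    """Return the start time of each synchronization attempt.

    Hypothetical model of pack.standby.node.sync.interval: the interval
    timer starts only after the previous synchronization completes.
    All values use the same unit (minutes in the example below).
    """
    starts = []
    t = 0.0  # assumption: the first synchronization starts at time 0
    for duration in durations:
        starts.append(t)
        # next attempt = this start + time spent syncing + configured wait
        t = t + duration + interval
    return starts


# 5-minute interval, each synchronization takes 3 minutes
print(sync_start_times(5, [3, 3, 3]))
```

With these numbers the attempts begin 8 minutes apart (3 minutes of synchronization plus the 5-minute wait), matching the example in the text.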

The synchronization interval can be increased or decreased to suit your requirements and use case. For example, if you require that query results returned from the read replica node be no more than 2 minutes behind the data stored on the cluster, set the pack.standby.node.sync.interval configuration value to 2m, or lower if synchronizations take a significant amount of time, since the interval starts only after each synchronization completes. Be sure to monitor the performance of all nodes in the cluster (including the read replica node) to ensure that the more frequent synchronization does not degrade performance to an unacceptable level.
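To relate the interval to a staleness requirement, the Python sketch below estimates the worst-case lag under assumptions that this page does not confirm: every synchronization takes the same time d, it captures the cluster's state as of the moment it starts, and the new data becomes visible on the replica only when it completes. Under those assumptions, a write that lands just after a synchronization starts is missed by it and appears only when the next synchronization finishes, d + interval + d later. The function names are hypothetical.

```python
def worst_case_lag(interval: float, sync_duration: float) -> float:
    """Worst-case delay before a cluster write is visible on the replica.

    Hypothetical model: constant sync duration, snapshot taken when each
    sync starts, data visible only when the sync completes.
    """
    return interval + 2 * sync_duration


def max_interval_for(budget: float, sync_duration: float) -> float:
    """Largest interval that keeps worst_case_lag within the budget."""
    interval = budget - 2 * sync_duration
    if interval <= 0:
        raise ValueError("synchronization is too slow for this staleness budget")
    return interval


# 2-minute budget with synchronizations that take 30 seconds (0.5 minutes)
print(worst_case_lag(2, 0.5))    # lag if the interval is set to 2m
print(max_interval_for(2, 0.5))  # interval that fits the 2-minute budget
```

If synchronizations are slow relative to the budget, shortening the interval alone cannot meet the requirement, which is why the helper raises an error in that case.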

As mentioned above, a read replica node can join the cluster as a full node by stopping the node, removing the above lines from the node’s stardog.properties file, and restarting the node.
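If you manage stardog.properties with scripts, removing the read replica settings before the restart can be automated. This is a minimal Python sketch, not a Stardog tool: strip_read_replica_settings is a hypothetical helper that drops exactly the three keys shown above and leaves every other line, including comments, untouched. It assumes simple key=value lines without line continuations, and the pack.enabled line in the example is only illustrative.

```python
READ_REPLICA_KEYS = {
    "pack.standby",
    "pack.standby.node.sync.interval",
    "pack.readReplica",
}


def strip_read_replica_settings(properties_text: str) -> str:
    """Return properties_text without the read replica settings."""
    kept = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # keep blank lines and comments exactly as they are
        if not stripped or stripped.startswith(("#", "!")):
            kept.append(line)
            continue
        # assumption: every setting uses the key=value form
        key = stripped.split("=", 1)[0].strip()
        if key not in READ_REPLICA_KEYS:
            kept.append(line)
    return "\n".join(kept) + "\n"


example = """# cluster settings
pack.enabled=true
pack.standby=true
pack.standby.node.sync.interval=5m
pack.readReplica=true
"""
print(strip_read_replica_settings(example), end="")
```

Stop the node before rewriting the file and restart it afterwards, as described above.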

Example HAProxy Configuration

One scenario where a read replica node is useful is an e-commerce recommendation system. The cluster is constantly being updated as users make purchases in the e-commerce system, and the data in the cluster must be queried to provide browsing shoppers with recommendations of related items to purchase. The recommendations provided to a browsing shopper do not need to be perfectly synchronized with the purchase data stored in the cluster, so querying a read replica node for recommendations makes sense (and removes the query load from the cluster).

Suppose we have configured a six-node cluster as described in the Installation and Setup page. Now that we have decided to add a read replica node to this cluster, we need to configure our proxy server to route SPARQL queries to the read replica node instead of to the cluster.

Continuing with the HAProxy configuration example, we add the following backend definition for our read replica node at the end of the haproxy.cfg file.

# the Stardog read replica
backend stardog_read_replica
    mode http
    option tcpka # keep-alive
    # the following line returns 200 for a read replica node and
    # 400 for a node that is not a read replica (e.g., a standby
    # node or a full cluster node)
    option httpchk GET /admin/cluster/readreplica
    # the check interval can be increased or decreased depending
    # on your requirements and use case
    default-server inter 5s
    # replace the IP address with the corresponding node address;
    # the maxconn value can be increased if you expect more
    # concurrent connections
    server stardog4 196.69.68.4:5821 maxconn 64 check
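The same health check can be reproduced from a script, for example to confirm a node's role before adding it to the backend. The Python sketch below relies only on the behavior described in the comments above (HTTP 200 from /admin/cluster/readreplica on a read replica, 400 otherwise). The function name is hypothetical, and it assumes the endpoint answers without credentials, as the unauthenticated HAProxy check implies.

```python
import urllib.error
import urllib.request


def is_read_replica(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the node at base_url reports itself as a read replica.

    Mirrors the HAProxy check: GET /admin/cluster/readreplica returns 200
    on a read replica and 400 on a standby or full cluster node. Network
    errors such as a refused connection propagate to the caller.
    """
    url = base_url.rstrip("/") + "/admin/cluster/readreplica"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 400 (or any other error status) means "not a read replica"
        return False
```

For the node in the backend above, you would call is_read_replica("http://196.69.68.4:5821"), replacing the address with your own.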

Next we update the frontend stardog-in definition in the haproxy.cfg file with an Access Control List (ACL) that routes SPARQL queries to the read replica node.

frontend stardog-in
    mode http
    option tcpka # keep-alive
    bind *:5820
    # the following lines identify any routes that end with
    # "/query" and send them directly to the read replica node;
    # if haproxy is unable to find a healthy read replica node,
    # the request falls through and will be routed via the
    # default_backend
    # checking read_replica_down on the same rule (instead of in a
    # separate use_backend line placed first) keeps transaction
    # requests routed to the coordinator while the replica is down
    acl read_replica_down nbsrv(stardog_read_replica) lt 1
    acl query_route path -i -m end /query
    use_backend stardog_read_replica if query_route !read_replica_down
    # the following lines identify any routes with "transaction"
    # in the path and send them directly to the coordinator; if
    # haproxy is unable to determine the coordinator, all requests
    # will fall through and be routed via the default_backend
    acl transaction_route path_sub -i transaction
    use_backend stardog_coordinator if transaction_route
    default_backend all_stardogs
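HAProxy evaluates use_backend rules in the order they are declared, applies the first one whose condition is true, and falls back to default_backend when none match. The Python sketch below models that first-match evaluation for the routing intent stated in the comments above: queries go to the read replica while it is healthy, transaction requests go to the coordinator, and everything else goes to all_stardogs. It illustrates the rule semantics only and is not part of HAProxy or Stardog; the function names and sample paths are hypothetical.

```python
from typing import Callable, List, Tuple

# (backend name, condition on (path, number of healthy replica servers))
Rule = Tuple[str, Callable[[str, int], bool]]


def query_route(path: str) -> bool:
    # acl query_route path -i -m end /query  (case-insensitive suffix match)
    return path.lower().endswith("/query")


def transaction_route(path: str) -> bool:
    # acl transaction_route path_sub -i transaction  (case-insensitive substring)
    return "transaction" in path.lower()


RULES: List[Rule] = [
    # queries go to the replica only while nbsrv(stardog_read_replica) >= 1
    ("stardog_read_replica", lambda p, up: query_route(p) and up >= 1),
    ("stardog_coordinator", lambda p, up: transaction_route(p)),
]
DEFAULT_BACKEND = "all_stardogs"


def choose_backend(path: str, replica_servers_up: int) -> str:
    """Return the backend selected by the first matching rule."""
    for backend, condition in RULES:
        if condition(path, replica_servers_up):
            return backend
    return DEFAULT_BACKEND


print(choose_backend("/mydb/query", 1))              # stardog_read_replica
print(choose_backend("/mydb/query", 0))              # all_stardogs
print(choose_backend("/mydb/transaction/begin", 0))  # stardog_coordinator
print(choose_backend("/mydb/update", 1))             # all_stardogs
```

Because evaluation stops at the first match, a rule that matches every request whenever the replica is down would, if declared first, also capture transaction requests before the coordinator rule is reached.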

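Finally, we update the backend all_stardogs definition in the haproxy.cfg file by inserting mode http before the option tcpka line, because HAProxy requires the backends used by an HTTP-mode frontend such as stardog-in to run in HTTP mode as well.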
