
Point-in-Time Recovery

In this tutorial you will build a continuous point-in-time recovery (PITR) pipeline for a Stardog database. The pipeline takes one full backup and then captures incremental transaction-log slices on demand, so that any moment from the backup forward can be restored — including a specific transaction.

Page Contents
  1. Overview
  2. Prerequisites
  3. How it works
    1. Anchoring on the backup
    2. Detecting new transactions
    3. Slicing the log
  4. The backup script
  5. The replay script
  6. Walking through a full cycle
  7. Recovering to a specific moment
    1. Finding the right UUID to stop at
  8. Tuning and operational notes
  9. See also

Overview

This tutorial assumes you have already read Transaction Logs for the underlying concepts. Briefly:

  • A full backup captures the database at a point in time. It also records, in its metadata, the UUID of the last committed transaction at that moment. This is the anchor for replay.
  • A transaction-log slice is a contiguous portion of the database’s transaction log, identified by its starting and ending transaction UUIDs. Replayed back-to-back, slices reconstruct everything that happened between two anchors.
  • Point-in-time recovery means: restore the backup → replay every slice in order until the desired point.

The pipeline you’ll build has two parts:

  1. pitr-backup.sh — takes one full backup, extracts the anchor UUID from the backup’s metadata, then polls the live database for its current index.last.tx. Each time the UUID advances, it exports a new tx-log slice covering exactly the new range.
  2. pitr-restore.sh — restores the backup, then replays every captured slice in order. Optionally stops at a specific UUID.

The “poll, then slice only on change” approach avoids the obvious trap of writing empty slices when nothing has changed, and it makes every slice exactly cover one chunk of history with no overlap and no gap.

Prerequisites

  • Stardog 12.0 or later.
  • jq installed on the backup host — used to parse the JSON metadata output.
  • A user with DBMS execute permission [EXECUTE, "admin:<db>"] on the target database (the same permission required for db backup and db restore).
  • Transaction logging enabled on the database (always enabled for cluster databases). If it’s not already enabled, turn it on:

     $ stardog-admin db offline myDatabase
     $ stardog-admin metadata set -o transaction.logging=true -- myDatabase
     $ stardog-admin db online myDatabase
    

Without transaction.logging=true, no records are written and tx log returns an empty log. Cluster databases have transaction logging enabled by default, and it cannot be disabled.
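Before starting a capture, you can confirm the option is on using the same metadata get pattern the scripts use for index.last.tx (illustrative transcript; assumes transaction.logging is readable like any other metadata option):

```shell
$ stardog-admin metadata get -o transaction.logging myDatabase --output-format json \
        | jq -r '.["transaction.logging"]'
true
```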

How it works

Anchoring on the backup

When a backup is created, the database’s metadata file (metadata.bk, inside the dated backup directory) records the UUID of the last committed transaction. That UUID is the exact point at which the backup’s data is consistent. Any tx-log slice you want to replay onto the restored backup must start from this UUID.

Use metadata convert to read the UUID directly from the file:

$ stardog-admin metadata convert \
        --input-format BINARY --output-format json \
        /var/stardog/pitr/backup/myDatabase/2026-05-12/metadata.bk \
        | jq -r '.["index.last.tx"]'
54a1f110-fa6c-4d71-8b11-820bd9ea01be

Reading from the backup file (rather than querying the live database) guarantees the anchor UUID and the backup data are in sync — even if more transactions committed in the small window between the snapshot completing and the script reading the metadata.

Detecting new transactions

The currently-committed UUID on the live database is exposed under the metadata key index.last.tx:

$ stardog-admin metadata get -o index.last.tx myDatabase --output-format json \
        | jq -r '.["index.last.tx"]'
9f823c11-77be-4d28-94a1-1d04e8a3aabe

If this UUID is different from the one at the end of the last slice, new transactions have been committed and need to be captured. If it’s identical, there’s nothing to do — skip the export.
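The skip logic reduces to a single comparison. A minimal sketch (should_slice is a hypothetical helper, not part of the scripts below):

```shell
# Hypothetical helper: succeed only when a new, non-empty UUID differs
# from the last sliced one, so a caller can write `should_slice || continue`.
should_slice() {
    local last="$1" current="$2"
    [ -n "$current" ] && [ "$current" != "$last" ]
}

should_slice abc abc || echo "skip"     # unchanged: nothing to export
should_slice abc def && echo "slice"    # advanced: export the new range
```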

Slicing the log

stardog-admin tx log --format raw exports a binary slice that tx replay can consume directly. Bounding it with --from-uuid and --to-uuid, using the UUID values captured earlier, makes the slice precisely cover the new range:

$ stardog-admin tx log myDatabase \
        --from-uuid 54a1f110-fa6c-4d71-8b11-820bd9ea01be \
        --to-uuid   9f823c11-77be-4d28-94a1-1d04e8a3aabe \
        --format raw --output /var/stardog/pitr/txlog/tx-0001.log
Transaction log exported to: /var/stardog/pitr/txlog/tx-0001.log
Last transaction UUID: 9f823c11-77be-4d28-94a1-1d04e8a3aabe

The backup script

pitr-backup.sh writes everything under a single capture directory:

$BACKUP_DIR/
├── backup/myDatabase/2026-05-12/   # the full backup
└── txlog/
    ├── tx-0001-<uuid>.log
    ├── tx-0002-<uuid>.log
    └── ...

Each slice’s filename embeds its terminal UUID for traceability. The numeric prefix is a monotonic counter, so lexicographic sort = chronological order at replay time.
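The zero padding is what keeps lexicographic and numeric order aligned; without it, tx-10 would sort before tx-2. A quick demonstration:

```shell
# With %04d padding, byte-wise sort order matches capture order.
for n in 2 10 1 100; do
    printf 'tx-%04d-example.log\n' "$n"
done | LC_ALL=C sort
```

This prints tx-0001-example.log through tx-0100-example.log in numeric order.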

#!/usr/bin/env bash
set -euo pipefail

# Continuous point-in-time backup. Takes one full database backup, then
# polls index.last.tx every $INTERVAL seconds. Whenever the UUID
# advances, exports the new range as a tx-log slice.
#
# Args / env:
#   $1               database name
#   BACKUP_DIR       capture root (default: /var/stardog/pitr)
#   INTERVAL         seconds between polls (default: 60)
#   STARDOG_ADMIN    path to stardog-admin (default: stardog-admin)

DB="${1:?usage: $0 <database>}"
BACKUP_DIR="${BACKUP_DIR:-/var/stardog/pitr}"
INTERVAL="${INTERVAL:-60}"
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

mkdir -p "$BACKUP_DIR/backup" "$BACKUP_DIR/txlog"

# 1. Take a full backup. db backup prints a line of the form
#    "Database <db> backed up <n> triples to <path> in <time>".
#    Extract the path so we can read the metadata file out of it.
echo "Taking full backup of $DB..."
backup_path="$("$STARDOG_ADMIN" db backup --to "$BACKUP_DIR/backup" "$DB" \
               | sed -n 's/.* triples to \(.*\) in .*/\1/p' \
               | tail -n 1)"

if [ -z "$backup_path" ] || [ ! -d "$backup_path" ]; then
    echo "Could not determine the backup directory from db backup output." >&2
    exit 1
fi
echo "Backup written to $backup_path"

# 2. Read the anchor UUID directly from the backup metadata. This is the
#    UUID of the last committed transaction at the exact moment of the
#    snapshot — reading it from the file guarantees consistency with the
#    backup's data, regardless of what commits afterwards on the live db.
last_uuid="$("$STARDOG_ADMIN" metadata convert \
                 --input-format BINARY --output-format json \
                 "$backup_path/metadata.bk" \
             | jq -r '.["index.last.tx"]')"

if [ -z "$last_uuid" ] || [ "$last_uuid" = "null" ]; then
    echo "Could not read index.last.tx from $backup_path/metadata.bk;" \
         "is transaction.logging enabled for $DB?" >&2
    exit 1
fi
echo "Backup anchored at $last_uuid"

# 3. Poll the live db. Whenever index.last.tx advances, slice the log.
seq=0
while true; do
    sleep "$INTERVAL"

    current_uuid="$("$STARDOG_ADMIN" metadata get -o index.last.tx "$DB" \
                       --output-format json | jq -r '.["index.last.tx"]')"

    if [ -z "$current_uuid" ] || [ "$current_uuid" = "null" ]; then
        echo "[$(date -u +%FT%TZ)] failed to read current UUID; will retry" >&2
        continue
    fi

    if [ "$current_uuid" = "$last_uuid" ]; then
        # No new transactions since the previous slice.
        continue
    fi

    seq=$((seq + 1))
    slice="$BACKUP_DIR/txlog/$(printf 'tx-%04d-%s.log' "$seq" "$current_uuid")"

    "$STARDOG_ADMIN" tx log "$DB" \
        --from-uuid "$last_uuid" --to-uuid "$current_uuid" \
        --format raw --output "$slice" >/dev/null

    echo "[$(date -u +%FT%TZ)] slice $seq: $last_uuid -> $current_uuid"
    last_uuid="$current_uuid"
done

A few details worth pointing out:

  • $backup_path is parsed out of db backup’s stdout, which prints Database <db> backed up <n> triples to <path> in <time>. The sed extracts the path between ` triples to ` and ` in `. This avoids guessing the date-versioned directory name and works whether backup.dir defaults are used or --to overrides them.
  • Polling reuses the same UUID the previous slice ended on as --from-uuid for the next slice. Slices form a continuous chain with no overlap and no gap.
  • If a single iteration fails (network blip, server restart), the loop retries from the same last_uuid on the next tick. The interval’s worth of transactions simply rolls into the next successful slice.
  • The slice filename tx-NNNN-<uuid>.log uses the numeric prefix for ordering and embeds the terminal UUID for human traceability.
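Should you ever need the terminal UUID back out of a slice filename (say, to cross-check a slice against index.last.tx), plain parameter expansion on the tx-NNNN-<uuid>.log scheme recovers it. A sketch, not part of the scripts:

```shell
# Strip the fixed-width numeric prefix and the extension to recover the
# terminal UUID embedded in a slice filename.
f="tx-0002-9f823c11-77be-4d28-94a1-1d04e8a3aabe.log"
uuid="${f#tx-????-}"   # drop "tx-NNNN-"
uuid="${uuid%.log}"    # drop ".log"
echo "$uuid"           # 9f823c11-77be-4d28-94a1-1d04e8a3aabe
```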

The replay script

pitr-restore.sh restores the captured backup and then replays each tx-log slice in chronological (lexical) order. tx replay validates log continuity by default, so any gap will surface as a validation error rather than silently producing an inconsistent state.

#!/usr/bin/env bash
set -euo pipefail

# Restore a database from a pitr-backup.sh capture and replay the
# accumulated tx-log slices.
#
# Args / env:
#   $1               capture root (the BACKUP_DIR used by pitr-backup.sh)
#   $2               source database name (must match the name used at
#                    backup time; this is the directory under $1/backup/)
#   $3               optional target database name to restore into
#                    (default: same as source). Must not already exist on
#                    the target server.
#   STOP_UUID        optional: replay only up to this UUID (inclusive);
#                    passed to every tx replay call until the restored
#                    database reaches it
#   STARDOG_ADMIN    path to stardog-admin (default: stardog-admin)

CAPTURE="${1:?usage: $0 <capture-dir> <source-db> [target-db]}"
SOURCE_DB="${2:?usage: $0 <capture-dir> <source-db> [target-db]}"
TARGET_DB="${3:-$SOURCE_DB}"
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# 1. Locate and restore the most recent full backup under the capture
#    directory. The backup is looked up by the source name; the restored
#    database is named with the target name.
latest_backup="$(ls -1d "$CAPTURE/backup/$SOURCE_DB"/* | LC_ALL=C sort | tail -n 1)"
echo "Restoring $latest_backup -> $TARGET_DB"
"$STARDOG_ADMIN" db restore --name "$TARGET_DB" "$latest_backup"

# 2. Replay tx-log slices in order. tx-NNNN-<uuid>.log makes lex sort
#    = chronological. The numeric prefix carries the ordering; the
#    trailing UUID is for human traceability.
shopt -s nullglob
slices=("$CAPTURE/txlog/"tx-*.log)
shopt -u nullglob

if [ "${#slices[@]}" -eq 0 ]; then
    echo "No tx-log slices found; restored backup as-is."
    exit 0
fi

mapfile -t sorted < <(printf '%s\n' "${slices[@]}" | LC_ALL=C sort)

for i in "${!sorted[@]}"; do
    slice="${sorted[$i]}"
    if [ -n "${STOP_UUID:-}" ]; then
        echo "Replaying $slice up to $STOP_UUID"
        "$STARDOG_ADMIN" tx replay --to-uuid "$STOP_UUID" "$TARGET_DB" "$slice"

        current_uuid="$("$STARDOG_ADMIN" metadata get -o index.last.tx "$TARGET_DB" \
                           --output-format json | jq -r '.["index.last.tx"]')"

        if [ "$current_uuid" = "$STOP_UUID" ]; then
            echo "Reached stop UUID $STOP_UUID"
            break
        fi
    else
        echo "Replaying $slice"
        "$STARDOG_ADMIN" tx replay "$TARGET_DB" "$slice"
    fi
done

if [ -n "${STOP_UUID:-}" ] && [ "${current_uuid:-}" != "$STOP_UUID" ]; then
    echo "STOP_UUID $STOP_UUID was not reached in the available slices." >&2
    exit 1
fi

echo "Recovery complete."

db restore rejects restoring over an existing database of the same name unless --overwrite is used, and --overwrite cannot be combined with --name. The script above always passes --name, so the target database name must be free on the target server. If you need to overwrite the existing same-named database, either drop it first (stardog-admin db drop <db>) or invoke db restore --overwrite directly without --name.
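For example, one way to recover over the live database under its original name is to drop it first and then run the restore script as usual. This is destructive: after the drop, the capture directory holds the only remaining copy of the database's history.

```shell
$ stardog-admin db drop myDatabase
$ ./pitr-restore.sh /var/stardog/pitr myDatabase
```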

Walking through a full cycle

Put the two scripts on the backup host and start the capture in the background:

$ chmod +x pitr-backup.sh pitr-restore.sh
$ BACKUP_DIR=/var/stardog/pitr INTERVAL=60 ./pitr-backup.sh myDatabase &
Taking full backup of myDatabase...
Backup written to /var/stardog/pitr/backup/myDatabase/2026-05-12
Backup anchored at 54a1f110-fa6c-4d71-8b11-820bd9ea01be
[2026-05-12T14:32:11Z] slice 1: 54a1f110-... -> 7c9e1d44-...
[2026-05-12T14:34:11Z] slice 2: 7c9e1d44-... -> 9f823c11-...
...

The capture directory now looks like this:

$ tree /var/stardog/pitr
/var/stardog/pitr
├── backup
│   └── myDatabase
│       └── 2026-05-12
│           ├── data.bk
│           └── metadata.bk
└── txlog
    ├── tx-0001-7c9e1d44-7c1a-4b3f-9e88-6c2b73f9aa11.log
    ├── tx-0002-9f823c11-77be-4d28-94a1-1d04e8a3aabe.log
    └── tx-0003-8aa3f114-f2b7-44e0-9c5e-0a3a8a59e4cd.log

To recover the captured backup to its original name (myDatabase must not already exist on the target server; drop it first if needed):

$ ./pitr-restore.sh /var/stardog/pitr myDatabase
Restoring /var/stardog/pitr/backup/myDatabase/2026-05-12 -> myDatabase
Replaying /var/stardog/pitr/txlog/tx-0001-7c9e1d44-....log
Replaying /var/stardog/pitr/txlog/tx-0002-9f823c11-....log
Replaying /var/stardog/pitr/txlog/tx-0003-8aa3f114-....log
Recovery complete.

To restore the same capture under a different name — for example into a side-by-side copy you can inspect before promoting — pass the target name as the third argument:

$ ./pitr-restore.sh /var/stardog/pitr myDatabase myDatabase_recovered
Restoring /var/stardog/pitr/backup/myDatabase/2026-05-12 -> myDatabase_recovered
...

Recovering to a specific moment

To stop replay at a precise point — for example, just before a bad transaction — pass that transaction’s UUID as STOP_UUID:

$ STOP_UUID=8aa3f114-f2b7-44e0-9c5e-0a3a8a59e4cd \
        ./pitr-restore.sh /var/stardog/pitr myDatabase myDatabase_recovered

When STOP_UUID is set, the script passes --to-uuid on every replay call and checks index.last.tx after each slice. As soon as the restored database reaches STOP_UUID, the script stops. If none of the captured slices reaches that UUID, the script exits with an error instead of replaying past the available history.

UUID-bounded replay is exact. Time-bounded replay (--from-time / --to-time) is also supported but requires --skip-validate because the filter doesn’t necessarily align with the transaction boundaries (see How Validation Works). Prefer UUIDs when you have them.

Finding the right UUID to stop at

If you don’t know the UUID off-hand, list the slices and inspect the relevant one in text format. The terminal UUID in each filename tells you which range it covers:

$ ls /var/stardog/pitr/txlog/
tx-0001-7c9e1d44-7c1a-4b3f-9e88-6c2b73f9aa11.log
tx-0002-9f823c11-77be-4d28-94a1-1d04e8a3aabe.log
tx-0003-8aa3f114-f2b7-44e0-9c5e-0a3a8a59e4cd.log

$ stardog-admin tx log --file \
        /var/stardog/pitr/txlog/tx-0003-8aa3f114-f2b7-44e0-9c5e-0a3a8a59e4cd.log \
        --format text

tx log --file takes a single file path — pass the exact slice you want to inspect, not a glob. Each transaction shows its Started / Commit / Done records with their UUIDs and timestamps; pick the UUID of the last good transaction.

Tuning and operational notes

  • Polling interval (INTERVAL). Shorter intervals make slices smaller and tighten your achievable recovery point objective (RPO). Most deployments do well between 30 seconds and 5 minutes. The cost of a poll is one cheap metadata read.
  • Slice retention. Slices accumulate forever as written. Pair the script with a retention policy that keeps slices at least as long as your oldest usable backup. When you take a new full backup, you can safely discard slices older than that backup’s anchor UUID.
  • Disk space. Each slice contains the raw bytes of all transactions in its range — roughly the same size you’d expect from the source database’s own tx-log file for that window. Plan for peak write rates.
  • No concurrent writers during recovery. While replaying, no other application should be writing to the target database. Concurrent writes can interleave their own transaction UUIDs and break validation on subsequent slices.
  • Don’t reuse a recovered database for further capture. Once you’ve replayed to a stop point, the recovered database’s index.last.tx is that stop UUID — restarting pitr-backup.sh against it would diverge from the original timeline. Take a fresh full backup of the recovered database first if you want to continue PITR going forward.
  • Cluster deployments. Run the backup script against any single cluster node; the tx log is replicated, so any node will return the same UUIDs. During recovery, db restore from a file-based backup can only be run when the cluster has a single node — scale the cluster down to one node, restore, then scale back up (see Restoring the Cluster). db restore from a cloud backup (S3, GCP) replicates to all nodes and can run against a multi-node cluster directly.
  • Securing credentials. The scripts assume stardog-admin is authenticated (e.g., via a stored token or ~/.stardog/credentials). Don’t pass credentials on the command line, since they will appear in the process listing and shell history.
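The retention rule above can be sketched as a small helper. prune_slices and the cutoff marker file are hypothetical, not part of the scripts in this tutorial; the marker is assumed to be touch(1)-ed at the moment the new full backup starts, so any slice not newer than it predates the new anchor and is safe to discard.

```shell
# Hypothetical retention pass: delete every slice whose mtime is not
# newer than the cutoff marker recorded when the new full backup began.
prune_slices() {
    local root="$1" cutoff="$2"
    find "$root/txlog" -name 'tx-*.log' ! -newer "$cutoff" -delete
}
```

Run it only after verifying the new full backup completed successfully; until then the old slices are still your recovery chain.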

See also

  • Transaction Logs — full reference for tx log and tx replay, including validation rules and configuration options.
  • Backup and Restore — full reference for db backup and db restore.