Storage Layer

The data storage layer is the foundation of any data management system. Understanding how data is stored and updated is a prerequisite to effective administration of the system, especially for scheduling necessary maintenance operations.

As of version 7, Stardog’s storage layer is based on RocksDB, an embedded key-value database developed by Facebook. RocksDB uses LSM trees for storing data. The LSM tree data structure is optimized for write performance on modern hardware due to its append-only update strategy: instead of changing data items in place (as a classical B+ tree would do), each update appends the item to the end of a sorted sequence in main memory and, when that sequence fills up, it is flushed to a new file on disk. The tree is a collection of levels, each of which contains one or more such immutable sorted files, the so-called SSTables (or SSTs).

This update strategy achieves very good write throughput because appending data is much more efficient than random, dispersed lookups or in-place updates. However, it has important implications for read query performance, since reads may require scanning multiple files and merging their contents on the fly. As the number of files grows, so does the read overhead. To mitigate this issue, RocksDB periodically runs a background compaction operation that merges files and reduces their number.
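
As a rough illustration of this mechanism, here is a toy sketch in Python (illustrative only, not Stardog's or RocksDB's actual implementation): writes go to an in-memory buffer, full buffers are flushed as immutable sorted runs, and compaction merges the accumulated runs.

# Toy LSM-style store: illustrative only, not Stardog's or RocksDB's actual code.
class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}       # mutable in-memory buffer
        self.sstables = []       # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Nothing already on disk is modified; updates only go to the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The full memtable becomes a new immutable sorted run ("SSTable").
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads consult the memtable first, then every run, newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs into one, keeping only the latest value per key.
        merged = {}
        for run in self.sstables:            # oldest to newest, later wins
            merged.update(dict(run))
        self.sstables = [sorted(merged.items())]

store = ToyLSM()
for i in range(10):
    store.put(f"key{i}", i)
print(len(store.sstables))   # 2 sorted runs accumulated so far
store.compact()
print(len(store.sstables))   # 1 run after compaction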

It is important to understand that every update in a Stardog database, not just adding a triple but also deleting triples, is an append operation on the physical storage level (that is, a new key inside RocksDB). This has the following implications:

  1. When a triple is deleted, it is written to storage just as if it were added, but with a flag indicating deletion (the so-called tombstone). The existing triple, if it exists, is left intact.
  2. When a triple is deleted, Stardog does not check whether that triple exists in the database. The tombstone is appended regardless (in other words, Stardog does not search for existing data at all during updates, again to optimize write performance).
  3. When a triple is added, Stardog likewise does not check whether it already exists. It is appended anyway, so if it existed before, the storage now contains a duplicate.

Since write transactions can leave duplicate triples and tombstones behind, read queries have to filter them out to ensure that client applications see a logically consistent view of the data. That adds overhead on top of having to process multiple files. To address this problem, the server can compact the multiple records into one via the following CLI command: stardog-admin db optimize.
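
The effect of this model can be sketched in a few lines of Python (illustrative only; Stardog's real key layout is different): every add and delete appends a record, and a read merges the records per triple, dropping tombstones and duplicates on the fly.

# Append-only log of (sequence, triple, is_tombstone) records: illustrative only.
log = []

def add(triple):
    log.append((len(log), triple, False))    # appended without an existence check

def delete(triple):
    log.append((len(log), triple, True))     # tombstone, also without a check

def visible_triples():
    latest = {}
    for seq, triple, tombstone in log:       # later records win
        latest[triple] = tombstone
    return {t for t, tombstone in latest.items() if not tombstone}

add(("ex:a", "ex:p", "ex:b"))
add(("ex:a", "ex:p", "ex:b"))                # duplicate is appended anyway
delete(("ex:a", "ex:p", "ex:b"))             # tombstone is appended anyway
delete(("ex:x", "ex:p", "ex:y"))             # deleting a non-existent triple still appends
print(len(log), len(visible_triples()))      # 4 physical records, 0 visible triples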

Database optimization

Database optimization, or executing stardog-admin db optimize, is a key maintenance operation that reorganizes the data in storage so it can be read in the most efficient way. The operation does not require exclusive access to the database and does not block other operations, i.e. write transactions, read queries, backups, etc. It is however a good idea to run optimize at times of low system activity since it is disk IO intensive (it reads every file on disk that stores the database’s data). Also, concurrently running queries or transactions may hold references to data snapshots that have since been superseded by later transactions; in that case optimize will not be able to compact or remove the data belonging to those snapshots.

Check the databases.{dbName}.queries.running and databases.{dbName}.txns.openTransactions metrics to see whether queries or transactions are running against the database.
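
For example, assuming a database named myDb, the server metrics can be retrieved via the CLI and filtered (the exact output format may vary between versions):

$ stardog-admin server metrics | grep -E 'myDb\.(queries\.running|txns\.openTransactions)'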

What does db optimize do

The operation performs the following tasks:

  • Data compaction. This is similar to background compaction executed by RocksDB but also includes Stardog-specific operations such as eliminating multiple versions of the same triple (if it was added in multiple transactions over time).
  • Data vacuuming. This scans tombstones representing previously deleted triples and removes them from storage. Vacuuming is similar to garbage collection in programming languages in the sense that it only collects triples which no currently active transaction or query can see. If a triple was deleted but a currently active query started before the delete transaction committed, that query can still see the triple (as per the Snapshot Isolation semantics), so optimize will not remove it. This is another reason why optimize is better scheduled at a time of inactivity. Vacuuming requires compaction.
  • Statistics recomputation. Stardog’s query optimizer uses selectivity statistics to plan query execution. The statistics are updated periodically in the background (according to the index.statistics.update.automatic option), but optimize will also recompute them from scratch.
  • Recomputation of additional indexes, if enabled, such as the full-text search, geospatial, and literal indexes.

The db optimize command executes all the tasks above but takes options to exclude some of them. For example, one may execute just the compaction/vacuuming steps to optimize the physical data layout on disk:

$ stardog-admin db optimize -o optimize.statistics=false optimize.compact=true optimize.vacuum.data=true -- dbName

or just the statistics. It is usually the compaction and vacuuming steps that take most of the time, so if only a statistics refresh is needed, one can use:

$ stardog-admin db optimize -o optimize.statistics=true optimize.compact=false optimize.vacuum.data=false -- dbName

See the man page for the full list of options.

When to execute db optimize

Since db optimize is important for read performance but can run for a long time (possibly over an hour for a database with several hundred million triples or more), it is important to schedule it properly. Running it too often places an unnecessary burden on the disk subsystem, while not running it often enough will likely result in slow queries. Below are guidelines for deciding when to execute it.

The most informed way to make that decision is by monitoring the server’s metrics. Specifically Stardog reports the approximate size of the database (i.e. the number of triples) and the number of keys in the tables inside storage, as follows:

databases.{db}.size: 1,000,000
databases.{db}.ternary.numKeys: 20,000,000

The *.ternary.numKeys metric shows the number of keys in the indexes representing sorted collections of quads. In the standard configuration Stardog maintains 8 such indexes (SPOC, POSC, OSPC, etc.). Therefore, in the optimal state the database should report a *.ternary.numKeys value roughly 8 times higher than *.size. With each transaction that deletes data the difference will increase (because, as explained above, each deleted triple is appended as a tombstone). Based on current experience, we suggest running db optimize no later than when *.ternary.numKeys grows beyond 16x *.size.
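
As a sketch of this heuristic in Python, using the example metric values above (the 8x and 16x figures come from this section):

# Illustrative check of the numKeys/size ratio heuristic described above.
def should_optimize(size, ternary_num_keys, index_count=8, factor=2):
    ratio = ternary_num_keys / max(size, 1)
    return ratio > index_count * factor      # beyond 16x with the defaults

print(should_optimize(size=1_000_000, ternary_num_keys=20_000_000))  # True: 20x > 16x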

However, db optimize might be needed before that. For example, when data is only added and never deleted, there will be no tombstones, so the ratio will not deviate from 8x. The data might still need to be compacted, however, to eliminate duplicate versions of the same triple if the same triples were added repeatedly over time. One can execute stardog data size --exact {dbName} to obtain the exact size of the database and, if that is substantially smaller than the *.size metric, execute db optimize. If there were no deletes, vacuuming can be disabled to speed up the process.

stardog data size --exact {dbName} will scan the entire database. While faster than db optimize it may also take a considerable amount of time.
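
For example, if duplicates have accumulated but nothing was ever deleted, a compaction run without vacuuming might look like this (dbName is a placeholder):

$ stardog-admin db optimize -o optimize.compact=true optimize.vacuum.data=false -- dbName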

In addition to monitoring the metrics, there are some clear-cut cases which warrant optimization, typically involving deletion of a large amount of data. The classical example is a wipe-and-load operation which drops a large named graph and re-creates it with new data, for example when data is periodically refreshed or a staging graph’s data moves to production after cleansing. A wipe-and-load means that all deleted triples are first written as tombstones and then the new triples are appended on top of that (in addition to existing triples still being present). That is likely to have an impact on read performance unless db optimize runs immediately after the commit.
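
A sketch of such a workflow, assuming a database myDb, a named graph urn:production, and a refreshed data file (check the CLI help for the exact flags in your version), could be:

$ stardog query execute myDb "DROP GRAPH <urn:production>"
$ stardog data add --named-graph urn:production myDb refreshed-data.ttl
$ stardog-admin db optimize myDb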

Adaptive Transactional Bulk Loading

Stardog employs an adaptive transactional bulk loading mechanism that automatically switches between different data loading strategies based on the total size of the transaction at commit time. There are two primary code paths in Stardog for handling writes: the default “pipelined” write path, which buffers writes in memory and flushes them to disk in the background, and the “bulk loading” path, which is optimized for large data loads. The intent of adaptive loading is to optimize performance for large data operations while maintaining efficient processing for smaller updates.

Configuration properties

The adaptive bulk loading behavior is controlled by two database configuration properties. To understand how they work, one needs to remember that there is a difference between the estimated transaction size and the actual transaction size: the estimated size is based on the size of the input data, while the actual size is based on the number of quads added or removed.

Visualize these thresholds on a horizontal axis with three ranges (a simplified decision sketch follows the list):

  • 0 - index.bulk.load.sst.txlimit (estimated): the transaction pipeline is used without adaptation.
  • index.bulk.load.sst.txlimit - index.bulk.load.tx.threshold (actual): the adaptive method is used; the final decision is made either at prepare time or when the size exceeds the upper bound.
  • index.bulk.load.tx.threshold (estimated) - infinity: the bulk load method is used without adaptation.
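
The following simplified Python sketch illustrates these three ranges; the real decision logic is internal to Stardog and more involved, and the constants below are just the documented defaults:

# Illustrative sketch of the adaptive decision, not Stardog's actual implementation.
SST_TXLIMIT = 495_000      # index.bulk.load.sst.txlimit (estimated size threshold)
TX_THRESHOLD = 4_000_000   # index.bulk.load.tx.threshold (actual quad threshold)

def choose_write_path(estimated_size, actual_quads=None):
    if estimated_size < SST_TXLIMIT:
        return "pipelined"                   # range 1: no adaptation
    if estimated_size >= TX_THRESHOLD:
        return "bulk"                        # range 3: no adaptation
    # Range 2: adaptive; the final choice depends on the actual quad count,
    # determined at prepare time or when the upper bound is exceeded.
    if actual_quads is None:
        return "adaptive (decided at prepare/flush time)"
    return "bulk" if actual_quads >= TX_THRESHOLD else "pipelined"

print(choose_write_path(100_000))                          # pipelined
print(choose_write_path(600_000, actual_quads=5_000_000))  # bulk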

index.bulk.load.tx.threshold

This property defines the threshold on the actual number of quads above which Stardog will use bulk loading. If fewer quads are part of the transaction, standard transactional updates are used. Standard writes are done single-threaded, and data is written to RocksDB [memtables](https://github.com/facebook/rocksdb/wiki/Memtable).

Mutability: Mutable
Default: 4000000
Description: The threshold number of actual quads at which Stardog switches from standard transactional updates to bulk loading. When a transaction is found to contain more quads than this threshold, the bulk loading mechanism is engaged. The maximum effective value is 8388608 (approximately 8.4 million quads).

index.bulk.load.sst.txlimit

This property controls when bulk loading is considered based on the estimated input size. Should the estimated input size of the first batch of data added in a transaction exceed this threshold, the bulk loading mechanism will be considered for the entire transaction. Should it be less than this threshold, standard transactional updates will be used.

Mutability: Mutable
Default: 495000
Description: The threshold of estimated input size at which the bulk loading mechanism will be considered for transactional updates. This provides a heuristic for determining when to use bulk loading, particularly useful when the exact quad count is not yet known.
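
To inspect the current values of both properties for a database named myDatabase, something like the following can be used:

$ stardog-admin metadata get -o index.bulk.load.tx.threshold index.bulk.load.sst.txlimit -- myDatabase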

Transactional Bulk Loading Usage

Bulk loading is automatically triggered when using commands like:

  • stardog data add or stardog data remove with data sizes larger than the configured thresholds.
  • When using the --tx <id> option, the total data size of the transaction is considered at commit time, or whenever a flush to disk is required.
  • For multi-operation transactions, bulk loading may be used whenever a flush to disk is done, if the total data size exceeds the threshold. To elaborate (see the example workflow after this list):
    • stardog data add --tx <id> will use adaptive transactional bulk loading as long as only the add command is used consecutively and the count of triples added exceeds the threshold.
    • stardog data remove --tx <id> will use adaptive transactional bulk loading as long as only the remove command is used consecutively and the count of triples removed exceeds the threshold.
    • Executing a read or update query via stardog query --tx <id> within the transaction will force a flush of the already written data.
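
For example, a multi-operation transaction that loads two large files and then commits might look like this (myDb, the file names, and <txId> are placeholders; stardog tx begin prints the transaction ID):

$ stardog tx begin myDb
$ stardog data add --tx <txId> myDb large-batch-1.ttl
$ stardog data add --tx <txId> myDb large-batch-2.ttl
$ stardog tx commit myDb <txId>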

You can adjust these thresholds to better match your workload patterns:

$ stardog-admin metadata set -o index.bulk.load.tx.threshold=2000000 index.bulk.load.sst.txlimit=50000 -- myDatabase

Performance effects of transactional bulk loading

Understanding the performance implications of transactional bulk loading is crucial for optimizing your Stardog database operations.

Write performance

Transactional bulk loading significantly improves write throughput for large data operations by:

  • Writing data directly as SST (Sorted String Table) files to RocksDB
  • Bypassing the standard transactional write path, avoiding write stalls and other limitations
  • Utilizing multi-threaded external disk sorting, which can lead to substantial performance gains for large datasets.

However, transactional bulk loading creates SST files directly in the LSM tree structure, which can have downstream effects on read performance and compaction. Subsequent read operations may suffer temporarily.

Read performance impact

After bulk loading operations:

  • Initial read performance may degrade due to an increased number of SST files that need to be scanned
  • Index scans may be particularly affected as they need to scan across multiple SST files
  • Query performance gradually rebounds over time as background compaction merges the SST files

Effects on database optimization

Bulk loading has significant implications for db optimize operations:

  1. Increased optimization time: Databases that have undergone multiple transactional bulk loading operations typically require more time to optimize.

  2. More frequent optimization needed: The SST files created by bulk loading at lower LSM tree levels require compaction to maintain optimal read performance.

  3. Tombstone accumulation: When bulk loading is used for delete operations (data remove --tx), tombstones accumulate more rapidly, necessitating more frequent vacuuming.

Recommendations

Mixed read/write workloads:

  • Consider using a higher value for index.bulk.load.tx.threshold if SPARQL query performance suffers after transactional bulk loading
  • Experiments show that queries may perform 10-30% slower immediately after transactional bulk loading compared to standard writes. In the worst cases, the performance hit can be larger.
  • Schedule db optimize after repeated bulk loading operations
  • Monitor the databases.{db}.ternary.numKeys metric

For write-heavy workloads:

  • Use the default or lower thresholds to maximize write throughput
  • Plan for optimization windows
  • Consider running db optimize when the ratio of *.ternary.numKeys to *.size exceeds 12-16x

Frequent incremental updates:

  • If your workload consists of many small to medium transactions, the default thresholds should be sufficient to avoid unnecessary bulk loading
  • Standard transactional updates may provide better overall performance for frequent, smaller updates

Monitor the impact of bulk loading on your specific workload by tracking query performance metrics before and after large data operations, and adjust the thresholds accordingly.
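
For instance, a representative query can be timed before and after a large load (myDb and the query file are placeholders):

$ time stardog query execute myDb representative-query.sparql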