Storage Layer
The data storage layer is the foundation of any data management system. Understanding how data is stored and updated is a prerequisite for effective administration of a running system, in particular for scheduling the necessary maintenance operations.
As of version 7, Stardog’s storage layer is based on RocksDB, a key-value database maintained by Facebook. RocksDB uses Log-Structured Merge (LSM) trees for storing data. The LSM tree is optimized for write performance on modern hardware due to its append-only update strategy: instead of changing data items in place (as a classical B+ tree would do), each update appends the item to the end of a sorted sequence in main memory and, when that fills up, it is flushed to a new file on disk. The tree is a collection of levels, each of which contains one or more such immutable sorted files, the so-called SSTables (or SSTs).
This update strategy achieves very good write throughput because appending data is much more efficient than random, dispersed lookups or updates in place. However, it has important implications for the performance of read queries, since they may need to read multiple files and merge their contents on the fly. As the number of files grows, so does the read overhead. To rectify that issue, RocksDB periodically executes a compaction operation in the background to merge files and reduce their number.
It is important to understand that every update in a Stardog database, not just adding a triple but also deleting one, is an append operation at the physical storage level (that is, a new key inside RocksDB). This has the following implications:
- When a triple is deleted, the deletion is written to the storage just as if the triple were added, but with a flag indicating deletion, the so-called tombstone. The existing triple, if it exists, is left intact.
- When a triple is deleted, Stardog will not check whether that triple exists in the database. The tombstone is appended regardless (in other words, Stardog does not search for existing data at all during updates, again to optimize write performance).
- When a triple is added, Stardog likewise won’t check whether it already exists. It is appended anyway, so if it existed before, it becomes a duplicate (see the sketch after this list).
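As a minimal illustration of this append-only behavior (using a hypothetical database `myDb` and file `example.ttl`), both of the following commands append new keys to the storage rather than modifying anything in place:
$ stardog data add myDb example.ttl
$ stardog data remove myDb example.ttl
The first command writes new triple keys; the second writes tombstone keys for the same triples, while the previously added keys remain physically present until compaction.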
Since write transactions can leave duplicate triples and tombstones, read queries have to filter them out to ensure that client applications get a logically consistent view of the data. That adds overhead on top of having to process multiple files. To address this problem, the server can compact these multiple records into one via the following CLI command: `stardog-admin db optimize`.
Database optimization
Database optimization, or executing `stardog-admin db optimize`, is a key maintenance operation that reorganizes the data in the storage so it can be read in the most efficient way. The operation does not require exclusive access to the database and does not block other operations, i.e. write transactions, read queries, backups, etc. It is, however, a good idea to run optimize at times of low system activity since it is disk I/O intensive (it essentially reads every file on disk that stores the database’s data). Also, concurrently running queries or transactions may still have access to snapshots of data that was updated in later transactions; in that case `optimize` will not be able to compact or remove those snapshots.
Check the `databases.{dbName}.queries.running` and `databases.{dbName}.txns.openTransactions` metrics to see whether queries or transactions are running against the database.
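For example, one can filter the full metrics report from the command line (the database name `myDb` is a placeholder):
$ stardog-admin server metrics | grep -E 'databases.myDb.(queries.running|txns.openTransactions)'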
What does `db optimize` do
The operation performs the following tasks:
- Data compaction. This is similar to background compaction executed by RocksDB but also includes Stardog-specific operations such as eliminating multiple versions of the same triple (if it was added in multiple transactions over time).
- Data vacuuming. This scans the tombstones representing previously deleted triples and removes them from the storage. Vacuuming is similar to garbage collection in programming languages in the sense that it only collects triples which no currently active transaction or query can see. If a triple was deleted but there is a currently active query which started before the delete transaction committed, that query can still see the triple (as per the Snapshot Isolation semantics), so optimize won’t remove it. This is another reason why optimize is best scheduled at a time of inactivity. Vacuuming requires compaction.
- Statistics recomputation. Stardog’s query optimizer uses selectivity statistics to plan query execution. The statistics are updated periodically in the background (according to the `index.statistics.update.automatic` option), but `optimize` will also recompute them from scratch.
- Recomputation of additional indexes, if enabled, such as the full-text search, geospatial, and literal indexes.
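By default a single run performs all of these tasks; the simplest invocation (with a hypothetical database name `myDb`) is:
$ stardog-admin db optimize myDb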
The `db optimize` command executes all of the tasks above by default but takes options to exclude some of them. For example, one may execute just the compaction and vacuuming steps to optimize the physical data layout on disk:
$ stardog-admin db optimize -o optimize.statistics=false optimize.compact=true optimize.vacuum.data=true -- dbName
or just the statistics. Usually it is the compaction and vacuuming steps that take most of the time, so if only a statistics refresh is needed, one can use
$ stardog-admin db optimize -o optimize.statistics=true optimize.compact=false optimize.vacuum.data=false -- dbName
See the man page for the full list of options.
When to execute `db optimize`
Since `db optimize` is important for read performance but can run for a long time (possibly over an hour for a database with several hundred million triples or more), it is important to schedule it properly. Running it too often places an unnecessary burden on the disk subsystem, while not running it often enough will likely result in slow queries. We provide guidelines here for deciding when to execute it.
The most informed way to make that decision is by monitoring the server’s metrics. Specifically, Stardog reports the approximate size of the database (i.e. the number of triples) and the number of keys in the tables inside the storage, as follows:
databases.{db}.size: 1,000,000
databases.{db}.ternary.numKeys: 20,000,000
The `*.ternary.numKeys` metric shows the number of keys in the indexes representing sorted collections of quads. In the standard configuration Stardog maintains 8 such indexes (`SPOC`, `POSC`, `OSPC`, etc.). Therefore, in the optimal state the database should report a `*.ternary.numKeys` value about 8 times its `*.size` value. With each transaction that deletes some data the difference will increase (because, as explained above, each deleted triple is appended as a tombstone). Based on current experience we suggest running `db optimize` no later than when `*.ternary.numKeys` grows beyond 16x `*.size`.
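For example, with the sample metrics shown above the ratio is 20,000,000 / 1,000,000 = 20, well above the 16x threshold, so that database is due for optimization.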
However, `db optimize` might be needed before that. For example, when data is only added but never deleted, there will be no tombstones, so the ratio won’t deviate from 8x. Nevertheless, the data might still need to be compacted to eliminate duplicate versions of the same triple if the same triples were added repeatedly over time. One can execute `stardog data size --exact {dbName}` to obtain the accurate size of the database and, if that is substantially smaller than the `*.size` metric, execute `db optimize`. If there were no deletes, vacuuming can be disabled to speed up the process.
`stardog data size --exact {dbName}` will scan the entire database. While faster than `db optimize`, it may also take a considerable amount of time.
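For example (again using a hypothetical database name `myDb`), one could compare the exact triple count with the reported metric and, if only duplicates need to be eliminated, run optimize with vacuuming disabled:
$ stardog data size --exact myDb
$ stardog-admin server metrics | grep 'databases.myDb.size'
$ stardog-admin db optimize -o optimize.vacuum.data=false -- myDb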
In addition to monitoring the metrics, there are some clear-cut cases which warrant optimization, typically involving deletion of a large amount of data. The classical example is a wipe-and-load operation which drops a large named graph and re-creates it with new data, for example when data is periodically refreshed or a staging graph’s data moves to production after cleansing. A wipe-and-load means that all deleted triples are first written as tombstones and then the new triples are appended on top of that (in addition to the existing triples still being present). That is likely to have an impact on read performance unless `db optimize` runs immediately after the commit.
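A sketch of such a wipe-and-load followed by an immediate optimization might look as follows; the graph IRI, file, and database name are placeholders, and the exact commands depend on how the data is loaded in practice:
$ stardog query myDb "DROP GRAPH <urn:example:staging>"
$ stardog data add --named-graph urn:example:staging myDb new-data.ttl
$ stardog-admin db optimize myDb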