Memory Management
This page discusses Stardog’s memory management approach and some configurations that can be made based on the usage scenario.
Overview
Stardog by default uses a custom memory management approach to minimize GC activity during query evaluation. All intermediate query results are managed in native (aka off-heap or direct) memory, which is pre-allocated on server start-up and not returned to the OS until server shutdown. Every query, including SPARQL Update queries with a `WHERE` clause, gets a chunk of memory from that pre-allocated pool to handle intermediate results and returns it to the pool when it finishes or is cancelled.
Learn more about this GC-less memory management scheme in this blog post!
The main goal of this memory management approach is to improve the server’s resilience under heavy load. A common problem with JVM applications under load is the notorious Out-Of-Memory (OOM) exceptions, which are hard to foresee and impossible to reliably recover from. Also, in the SPARQL world, it is generally difficult to estimate how many intermediate results any particular query will have to process before the query starts (although the selectivity statistics offer great help to this end). As such, the server has to deal with the situation when there is no memory available to continue with the current query. Stardog handles this by placing all intermediate results into custom collections, which are tightly integrated with the memory manager. Every collection, e.g. for hashing, sorting, or aggregating binding sets, requests memory blocks from the manager and transparently spills data to disk when such requests are denied.
This helps avoid OOMs at any time during query evaluation, since running out of memory only triggers spilling, and the query continues more slowly because of the additional disk access. This also means Stardog can run harder (e.g. analytic) queries that may exceed the memory capacity of your server. We have also seen performance improvements in specific (but common) scenarios, such as many concurrent queries, where GC pressure would considerably slow down a server running on heap. However, everything comes at a price. The custom collections can be slightly slower than those based on JDK collections when the server is under light load, all queries are selective, and there is no GC pressure. For that reason, Stardog has a server option `memory.management`. You can set it to `JVM` in `stardog.properties` to disable custom memory management and have Stardog run all queries on heap.
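The request-then-spill behaviour described above can be illustrated with a small Python sketch. This is not Stardog's implementation: `BlockPool`, `SpillingList`, and the use of `pickle` files are hypothetical stand-ins, and the sketch assumes every collection obtains one block up front.

```python
import os
import pickle
import tempfile


class BlockPool:
    """Hypothetical fixed pool of memory blocks shared by all queries."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks

    def request(self):
        """Grant one block if any is free; return False to deny the request."""
        if self.free_blocks == 0:
            return False
        self.free_blocks -= 1
        return True


class SpillingList:
    """Append-only collection that spills to disk when the pool denies a block."""

    def __init__(self, pool, rows_per_block, spill_dir):
        if not pool.request():  # assumption: one block is always taken up front
            raise MemoryError("no block available for a new collection")
        self.pool = pool
        self.rows_per_block = rows_per_block
        self.spill_dir = spill_dir
        self.blocks = 1         # blocks currently held by this collection
        self.rows = []          # rows held in the blocks
        self.spill_files = []   # spill files written so far, in order

    def add(self, row):
        # The held blocks are full: ask for another one, spill if denied.
        if len(self.rows) == self.blocks * self.rows_per_block:
            if self.pool.request():
                self.blocks += 1
            else:
                self._spill()
        self.rows.append(row)

    def _spill(self):
        # Write the in-memory rows to a new file and reuse the held blocks.
        fd, path = tempfile.mkstemp(dir=self.spill_dir, suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.rows, f)
        self.spill_files.append(path)
        self.rows = []

    def __iter__(self):
        # Spilled rows come first, in the order they were written.
        for path in self.spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.rows
```

A denied request never raises here; it only costs a disk write, which mirrors the trade-off of slower queries instead of OOMs.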
Spilling Data
Some queries may keep large amounts of intermediate results in memory. The `spilling.dir` server option specifies the directory used for spilling data when the server runs out of native memory. It may make sense to set this to a directory on another disk to minimize disk contention.
A query can generate multiple spilling files, and each is capped independently. The cap is determined by the server configuration option `spilling.max.file.length`. The default value is `10G`.
A query allocates memory for its own exclusive use from the global pool of memory blocks. This reduces the amount of memory available to other queries. Once a query has spilled to disk, it has the opportunity to release some of its allocated memory blocks back into the common pool. A query does this if it has allocated more than `256kB` of memory and has spilled to disk. The query may later reallocate these blocks, but other concurrently running queries have a chance to claim them first.
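The release condition can be written as a one-line predicate. This is only a sketch: the function name is hypothetical, and reading "256kB" as 256 * 1024 bytes is an assumption. How many blocks are actually released is not stated in the text, so it is not modelled.

```python
# "256kB" from the text, read here as 256 * 1024 bytes (assumption).
RELEASE_THRESHOLD_BYTES = 256 * 1024


def may_release_blocks(allocated_bytes, has_spilled):
    """True when a query qualifies for handing blocks back to the shared pool."""
    return has_spilled and allocated_bytes > RELEASE_THRESHOLD_BYTES
```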
The database configuration option `query.memory.exceeds.strategy` can be changed from its default value of `SPILL_TO_DISK` to `FINISH_QUERY_EXECUTION` to prevent a query from spilling when the memory allocated to the query is exceeded. This effectively terminates the query, similar to a timeout, and releases all the memory the query has acquired.
The database configuration option `query.memory.limit` can also be changed from its default value of `9223372036854775807B` (`B` for bytes) to set the memory limit for an individual query. Once this limit has been reached, the query will either start spilling or terminate (depending on `query.memory.exceeds.strategy`), regardless of the amount of memory still left in the shared pool.
The above options can be set for an individual database, as well as for the entire server via the Stardog properties. The relevant server options are the query memory limit and the query memory exceeds strategy.
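How the per-query limit and the exceeds strategy interact can be sketched as a small decision function. The function name and return labels are hypothetical, only the per-query limit is modelled (not exhaustion of the shared pool), and treating "limit reached" as "the next block would push usage past the limit" is an assumption.

```python
from enum import Enum


class ExceedsStrategy(Enum):
    SPILL_TO_DISK = "SPILL_TO_DISK"                    # default per the text
    FINISH_QUERY_EXECUTION = "FINISH_QUERY_EXECUTION"


# Default query.memory.limit in bytes; this value equals 2**63 - 1.
DEFAULT_QUERY_MEMORY_LIMIT = 9223372036854775807


def on_block_request(used_bytes, block_bytes, limit_bytes, strategy):
    """Return 'allocate', 'spill' or 'terminate' for one block request."""
    if used_bytes + block_bytes <= limit_bytes:
        return "allocate"
    if strategy is ExceedsStrategy.FINISH_QUERY_EXECUTION:
        return "terminate"  # query stops and gives back all its memory
    return "spill"
```

With the default limit the first branch is effectively always taken, so a query only spills when the shared pool itself runs out.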
Memory Configuration
Stardog provides a range of configuration options related to memory management. By default, the query engine uses the custom memory management approach described above, but it is not the only critical Stardog component which may require a large amount of memory. Memory is also consumed aggressively during bulk loading and updates.
The term “bulk loading” in Stardog exclusively means loading data at database creation time.
Stardog defines three standard memory consumption modes to allow users to configure how memory should be distributed based on the usage scenario.
The corresponding server property is `memory.mode`, and it accepts the following values:
Value | Description |
---|---|
default | This is the default option. It provides a roughly equal amount of memory for queries and updates (including bulk loading). This should be used either when the server is expected to run read queries and updates in roughly equal proportion, or when the expected load is unknown. |
read_optimized | This option provides more memory to read queries and SPARQL Update queries with a WHERE clause. This minimizes the chance of having to spill data to disk during query execution, at the expense of update and bulk loading operations. This option should be used when transactions are infrequent or small in size (up to a thousand triples), since such transactions do not use a significant amount of memory. |
write_optimized | This option should be used for optimal loading and update performance. Queries may run slower if there is not enough memory for processing intermediate results. It may also be suitable when the server is doing a lot of updates and some read queries but the latter are selective and not highly concurrent. |
bulk_load | This option should be used for bulk loading very large databases (billions of triples) when there is no other workload on the server. When bulk loading is complete, the memory configuration should be changed and the server restarted. |
As with any server option, the server has to be restarted after the user changes the memory mode.
The `stardog-admin server status` command displays detailed information on memory usage and the current configuration.
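Because a change to `memory.mode` only takes effect after a restart, it can be worth checking the value before restarting. The helper below is a hypothetical sketch, not a Stardog tool: it assumes the properties file uses plain `key=value` lines with `#` comments, which is only a subset of the full Java properties syntax.

```python
VALID_MEMORY_MODES = {"default", "read_optimized", "write_optimized", "bulk_load"}


def read_memory_mode(properties_text):
    """Return the memory.mode value from properties-style text.

    Falls back to 'default' when the key is absent and raises ValueError
    for a value that is not one of the four documented modes.
    """
    mode = "default"
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "memory.mode":
            mode = value.strip()  # the last occurrence wins
    if mode not in VALID_MEMORY_MODES:
        raise ValueError(f"unknown memory.mode: {mode!r}")
    return mode
```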
Memory Block Size
The Stardog server allocates memory in blocks of a fixed size. Reducing the block size can reduce memory fragmentation, but may increase overhead. See the option `memory.managed.block.size`.