This page discusses Stardog’s memory management approach and the configuration options available for different usage scenarios.
As of version 5.0, Stardog by default uses a custom memory management approach to minimize GC activity during query evaluation. All intermediate query results are now managed in native (aka off-heap or direct) memory, which is pre-allocated on server start-up and not returned to the OS until server shutdown. Every query, including SPARQL Update queries with a WHERE clause, gets a chunk of memory from that pre-allocated pool to handle intermediate results and returns it to the pool when it finishes or is cancelled.
Learn more about this GC-less memory management scheme in the blog post!
The main goal of this memory management approach is to improve the server’s resilience under heavy load. A common problem with JVM applications under load is the notorious Out-Of-Memory (OOM) exception, which is hard to foresee and impossible to reliably recover from. Also, in the SPARQL world, it is generally difficult to estimate how many intermediate results a particular query will have to process before the query starts (although selectivity statistics offer great help to this end). As such, the server has to deal with the situation where there is no memory available to continue with the current query. Stardog handles this by placing all intermediate results into custom collections which are tightly integrated with the memory manager. Every collection, e.g. for hashing, sorting, or aggregating binding sets, requests memory blocks from the manager and transparently spills data to disk when such requests are denied.
This helps avoid OOMs at any time during query evaluation, since running out of memory only triggers spilling and the query continues, more slowly, because of the additional disk access. This also means Stardog 5.0+ can run harder queries, e.g. analytic ones, which may exceed the memory capacity of your server. We have also seen performance improvements in specific (but common) scenarios, such as many concurrent queries, where GC pressure would considerably slow down a server running on heap. However, everything comes at a price: the custom collections can be slightly slower than those based on JDK collections when the server is under light load, all queries are selective, and there is no GC pressure. For that reason Stardog has a server option, memory.management, which you can set in stardog.properties to disable custom memory management and have Stardog run all queries on heap.
The spilling.dir server option specifies the directory used for spilling data when the server runs out of native memory. It may make sense to point it at a separate disk to minimize disk contention.
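The interplay between a pre-allocated pool, collections that request blocks from it, and a spilling directory can be illustrated with a toy model. This is a minimal sketch and not Stardog's implementation: `MemoryPool`, `SpillableList`, `ROWS_PER_BLOCK`, and the use of `pickle` files are all assumptions made for illustration, and the "blocks" are plain counters rather than native memory.

```python
import os
import pickle
import tempfile


class MemoryPool:
    """Hypothetical fixed budget, reserved once and handed out in blocks."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks

    def request_block(self):
        """Grant one block if any is free; otherwise deny the request."""
        if self.free_blocks == 0:
            return False
        self.free_blocks -= 1
        return True

    def release_blocks(self, count):
        self.free_blocks += count


class SpillableList:
    """Append-only collection that spills to disk when a block is denied."""

    ROWS_PER_BLOCK = 4  # tiny on purpose, so spilling is easy to trigger

    def __init__(self, pool, spill_dir):
        self.pool = pool
        self.spill_dir = spill_dir  # plays the role of the spilling directory
        self.in_memory = []
        self.held_blocks = 0
        self.spill_files = []

    def append(self, row):
        capacity = self.held_blocks * self.ROWS_PER_BLOCK
        if len(self.in_memory) < capacity:
            self.in_memory.append(row)
        elif self.pool.request_block():
            self.held_blocks += 1
            self.in_memory.append(row)
        else:
            # Request denied: move the buffered rows plus this one to disk
            # and keep going with the blocks already held.
            self._spill(self.in_memory + [row])
            self.in_memory = []

    def _spill(self, rows):
        fd, path = tempfile.mkstemp(dir=self.spill_dir, suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(rows, f)
        self.spill_files.append(path)

    def __iter__(self):
        # Spilled rows are older than buffered rows, so read them first.
        for path in self.spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.in_memory

    def close(self):
        """Return all blocks to the pool and delete the spill files."""
        self.pool.release_blocks(self.held_blocks)
        self.held_blocks = 0
        self.in_memory = []
        for path in self.spill_files:
            os.remove(path)
        self.spill_files = []


def demo():
    pool = MemoryPool(total_blocks=2)
    with tempfile.TemporaryDirectory() as spill_dir:
        rows = SpillableList(pool, spill_dir)
        for i in range(20):
            rows.append(i)  # never raises, even after the pool is exhausted
        result = list(rows)
        spilled = len(rows.spill_files)
        rows.close()
    return result, spilled, pool.free_blocks
```

In this model a denied request costs extra disk I/O instead of raising an error, and `close()` gives every block back to the pool, mirroring how a query returns its memory when it finishes or is cancelled.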
Stardog provides a range of configuration options related to memory management. The query engine uses the custom memory management approach described above by default, but it is not the only critical Stardog component that may require a large amount of memory. Memory is also consumed aggressively during bulk loading and updates. Stardog defines several standard memory consumption modes that let users configure how memory is distributed based on the usage scenario.
The corresponding server property is memory.mode, which accepts the following values:
| ||This is the default option, which provides roughly equal amounts of memory for queries and updates (including bulk loading). It should be used either when the server is expected to run read queries and updates in roughly equal proportion or when the expected load is unknown.|
| ||This option provides more memory to read queries and SPARQL Update queries with a WHERE clause. This minimizes the chance of having to spill data to disk during query execution, at the expense of update and bulk loading operations. It should be used when transactions are infrequent or small (e.g. up to a thousand triples), since such transactions do not use a significant amount of memory.|
| ||This option should be used for optimal loading and update performance. Queries may run slower if there is not enough memory for processing intermediate results. It may also be suitable when the server is doing a lot of updates and some read queries, provided the latter are selective and not highly concurrent.|
| ||This option should be used for bulk loading very large databases (billions of triples) where there is no other workload on the server. When bulk loading is complete, the memory configuration should be changed and the server restarted.|
As with any server option, the server has to be restarted after the memory mode is changed.
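The guidance in the table above can be condensed into a small decision helper. The sketch below is only an illustration of that guidance: the `Mode` members are hypothetical labels for the four table rows (in order), not the literal values that memory.mode accepts, and the `Workload` fields are assumptions about how one might describe an expected load.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    # Hypothetical labels for the four table rows, top to bottom.
    BALANCED = 1
    FAVOUR_READS = 2
    FAVOUR_WRITES = 3
    BULK_LOAD_ONLY = 4


@dataclass
class Workload:
    known: bool = True                      # False if the load is unknown
    bulk_loading_only: bool = False         # huge load, nothing else running
    frequent_updates: bool = False
    largest_transaction_triples: int = 0
    queries_selective: bool = False
    queries_highly_concurrent: bool = False


SMALL_TRANSACTION_TRIPLES = 1000  # "up to a thousand triples" in the table


def suggest_mode(w):
    """Map an expected workload to the table row that describes it."""
    if not w.known:
        return Mode.BALANCED
    if w.bulk_loading_only:
        return Mode.BULK_LOAD_ONLY
    small = w.largest_transaction_triples <= SMALL_TRANSACTION_TRIPLES
    if not w.frequent_updates or small:
        # Infrequent or small transactions need little memory.
        return Mode.FAVOUR_READS
    if w.queries_selective and not w.queries_highly_concurrent:
        # Heavy updates, and queries that need few intermediate results.
        return Mode.FAVOUR_WRITES
    return Mode.BALANCED
```

For example, `suggest_mode(Workload(known=False))` falls back to the balanced row, which matches the advice to use the default when the expected load is unknown.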
The stardog-admin server status command displays detailed information on memory usage and the current configuration.