
Administering Stardog 101

This page discusses best practices for operating and administering Stardog, including capacity planning and some common pitfalls Stardog operators run into.


Memory Settings

  1. Stardog uses both JVM memory (heap memory) and the operating system memory outside the JVM (direct or native memory). Review the Memory Usage section for guidance on allocating sufficient memory to Stardog.
  2. Some queries may keep large amounts of intermediate results in memory, e.g. see pipeline breakers. Once the available memory is consumed, the results will start spilling to disk (discussed below), which will slow down the query.
    1. Using query hints and/or more selective query patterns can help alleviate this situation.
  3. When loading data, some amount of it will need to be kept in memory while it's being processed and indexed. Enabling full-text search or geospatial support on a database can increase this amount dramatically.
  4. Virtual Import of JSON files can use a lot of heap memory in very specific circumstances (generally those involving long JSON arrays).

We strongly recommend not running other processes on the same machine Stardog is running on. Doing so can reduce the amount of memory available to Stardog.
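Heap and direct memory sizes are set via JVM flags. A minimal sketch, assuming the standard `STARDOG_SERVER_JAVA_ARGS` environment variable is picked up by the Stardog startup scripts, with illustrative values for a machine with 32 GB of RAM:

```shell
# Illustrative values only; tune for your workload and hardware.
# -Xms/-Xmx control the JVM heap; -XX:MaxDirectMemorySize controls
# the direct (native) memory available outside the heap.
export STARDOG_SERVER_JAVA_ARGS="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g"
echo "$STARDOG_SERVER_JAVA_ARGS"
```

Setting `-Xms` equal to `-Xmx` avoids heap resizing at runtime; the right split between heap and direct memory depends on your workload (see the Memory Usage section).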

Disk Settings

  1. Review the Disk Usage section for guidance on allocating sufficient disk space.
  2. Stardog requires adequate disk space both in $STARDOG_HOME and java.io.tmpdir. See Configuring Temp Space for more information.
  3. Some queries may keep large amounts of intermediate results in memory.
    1. Once the available memory is consumed, these query results spill to disk (by default, into the $STARDOG_HOME/.spilling directory). This location can be modified by the server configuration option spilling.dir in the stardog.properties file.
    2. A query can generate multiple spilling files, and each will be capped independently. The cap is determined by the server configuration option spilling.max.file.length. The default value is 10G.
    3. It is important to ensure that spilling.dir has at least enough space to accommodate a file of size spilling.max.file.length (10G by default). In practice it should have considerably more, because (as stated in the previous point) Stardog will continue spilling data to additional files once a spilling file reaches its maximum size.

    The database configuration option query.memory.exceeds.strategy can be changed from its default value of SPILL_TO_DISK to FINISH_QUERY_EXECUTION, which prevents a query from spilling when it exceeds its allocated memory. This effectively kills the query, similar to a timeout.

    The database configuration option query.memory.limit can also be set to cap the memory used by an individual query. Its default value of 9223372036854775807B (B for bytes) is effectively unlimited.

  4. Most UNIX-like operating systems provide a way to limit and control the usage of system resources (e.g., open files and threads). We recommend a limit of 100k open file handles, though we've seen successful deployments using limits as low as 10k and as high as 1m. See the ulimit manual page for more information.
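The spilling-related server options above live in stardog.properties. A hypothetical fragment with illustrative values (the directory path is an assumption for this sketch):

```properties
# Illustrative stardog.properties fragment; the path is hypothetical.
# Where spilled intermediate results are written (default: $STARDOG_HOME/.spilling)
spilling.dir = /data/stardog-spill
# Cap on the size of each individual spilling file (default: 10G)
spilling.max.file.length = 10G
```

Note that query.memory.exceeds.strategy and query.memory.limit are database (not server) configuration options, so they are set per database rather than in stardog.properties.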

Monitoring

  1. Use monitoring software on particularly important servers to produce alerts when memory or disk usage is approaching unacceptable thresholds.
    1. Stardog exposes memory usage metrics that can be analyzed to test your workload. See Server Monitoring to learn more about how Stardog exposes its metrics (e.g. via CLI, HTTP, JMX, Prometheus endpoint).
    2. Some key metrics to monitor with respect to memory are:

    | Metric Name | Description |
    | --- | --- |
    | dbms.memory.heap.used | The current amount of used heap memory. |
    | dbms.memory.heap.max | The maximum amount of memory allowed for the Java heap. Equivalent to the -Xmx setting. |
    | dbms.memory.native.max | The amount of available native memory. |
    | dbms.memory.system.rss | The current RSS (resident set size) for Stardog, which includes all types of memory. |
    | dbms.memory.system.rss.peak | The peak RSS. This is often more important than the current RSS, because the current value may not be representative of what the workload requires. |
    | dbms.memory.system.usageRatio | The ratio of currently used memory to the total amount available to the process. It is highly recommended to configure an alert to fire when this ratio exceeds 0.9. |
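As an illustration of the kind of threshold check monitoring software performs on dbms.memory.system.usageRatio, here is a minimal sketch. The sample metric values below are fabricated; real values come from the CLI, HTTP, JMX, or Prometheus endpoints described in Server Monitoring:

```shell
# Fabricated sample of metrics output; in practice this would come from
# the server's metrics endpoints (see Server Monitoring).
metrics='dbms.memory.heap.used 4294967296
dbms.memory.system.usageRatio 0.93'

# Extract the usage ratio from the metrics output.
ratio=$(printf '%s\n' "$metrics" | awk '$1 == "dbms.memory.system.usageRatio" {print $2}')

# Fire an alert when the ratio exceeds the recommended 0.9 threshold.
if awk -v r="$ratio" 'BEGIN { exit !(r > 0.9) }'; then
  echo "ALERT: memory usage ratio $ratio exceeds 0.9"
fi
```

In production you would typically express this as an alerting rule in your monitoring system (e.g. Prometheus) rather than a shell script, but the comparison is the same.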

We highly recommend testing your production load in your development and test environments to ensure you have adequate resources to run Stardog safely and efficiently.

Security

  1. In all environments except development/sandbox environments where security is not a concern:
    1. At a minimum, change the admin user’s default password of admin to something more secure. Optionally, remove the default admin superuser entirely.
    2. Set a password policy for new Stardog users (e.g., ensure Stardog user passwords have a minimum length of 8 with at least one capital letter). See Setting Password Constraints for more information.
    3. Optionally use Kerberos/LDAP as a means to authenticate users.
  2. Set credentials and permissions appropriately.
    1. Stardog’s security model is based on standard role-based access control (RBAC) - it’s possible to assign permissions directly to users, but it’s much easier in the long run to assign all permissions to roles and then add those roles to users. Read more about this in Managing Users and Roles.
  3. Enable or require SSL communications with the Stardog server. See Encryption in Transit for more information.
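Password constraints are configured in stardog.properties. A hypothetical fragment matching the example above (verify the option names against the Setting Password Constraints documentation; the regex is illustrative):

```properties
# Hypothetical fragment; check option names against the
# Setting Password Constraints documentation.
# Minimum password length of 8 characters
password.length.min = 8
# Require at least one capital letter (illustrative regex)
password.regex = .*[A-Z].*
```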

Pitfalls

  1. Do not start two instances of Stardog on the same $STARDOG_HOME directory. This will result in data corruption. Stardog uses a file lock to prevent two instances from running on the same home directory. If Stardog fails to start because of the lock file, do not manually remove the file without first verifying that all Stardog processes configured to use that directory have been stopped and creating a complete backup of $STARDOG_HOME.

Not all Docker volume drivers respect this file locking mechanism. If Stardog is running outside of Docker and you use Docker to start another instance mapped to the same $STARDOG_HOME, both instances will start, causing data loss.
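Before ever touching the lock file, checking for a running process is the minimum precaution. A sketch, where the home path and process pattern are assumptions for illustration:

```shell
# Assumed home directory and process pattern; adjust for your deployment.
STARDOG_HOME=/var/opt/stardog

# pgrep -f matches against full process command lines; prints nothing on no match.
running=$(pgrep -f "stardog" || true)

if [ -n "$running" ]; then
  echo "Stardog appears to be running (PIDs: $running); do not remove the lock file."
else
  echo "No Stardog process found; back up $STARDOG_HOME in full before removing the lock file."
fi
```

Even when no process is found, take the complete backup of $STARDOG_HOME first; a stale-looking lock can still belong to a process on another host sharing the same volume.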