
Administrating Stardog 101

This page discusses best practices for operating and administering Stardog, including capacity planning and some common pitfalls that Stardog operators run into.


Memory Settings

  1. Stardog uses both JVM memory (heap memory) and also the operating system memory outside the JVM (direct or native memory). Review the Memory Usage section for guidance on allocating sufficient memory to Stardog.
  2. Some queries keep large amounts of intermediate results in memory (see the notion of pipeline breakers, for example). Once the available memory is consumed, the results start spilling to disk (discussed below), which slows down the query.
    1. Using query hints and/or more selective query patterns can help alleviate this situation.
  3. When loading data, some of it must be kept in memory while it is being processed and indexed. Enabling full-text search or geospatial support on a database can multiply this memory requirement.
  4. Virtual Import of JSON files can use a lot of heap memory in very specific circumstances, generally involving long JSON arrays.

We strongly recommend not running other processes on the same machine as Stardog, because they can reduce the amount of memory available to Stardog.

Disk Settings

  1. Review the Disk Usage section for guidance on allocating sufficient disk space.
  2. Stardog requires adequate disk space both in $STARDOG_HOME and java.io.tmpdir. See Configuring Temp Space for more information.
  3. Some queries may keep large amounts of intermediate results in memory.
    1. Once the available memory is consumed, these query results spill on to disk by default into the $STARDOG_HOME/.spilling directory. This location can be modified by the server configuration option spilling.dir in the stardog.properties file.
    2. A query can generate multiple spilling files and each will be capped independently. The cap is determined by the server configuration option spilling.max.file.length. The default value is 10G.
    3. Ensure that spilling.dir has at least enough space to accommodate a file of size spilling.max.file.length (default is 10G). In practice the available space should be larger because, as noted in the previous item, once a spilling file reaches its maximum size Stardog continues spilling data to another spilling file.

    The database configuration option query.memory.exceeds.strategy can be modified from its default option of SPILL_TO_DISK to FINISH_QUERY_EXECUTION to prevent a query from spilling when the memory allocated to the query is exceeded. This effectively kills the query, similar to a timeout.

    The database configuration option query.memory.limit can also be modified from its default value of 9223372036854775807B (B for bytes) to set the memory limit for an individual query.

  4. Most UNIX-like operating systems provide a way to limit and control the usage of system resources (e.g. files, threads, etc). We recommend providing a limit of 100k for open file handles. We’ve seen successful deployments using a limit of 10k and as high as 1m. See the ulimit manual page for more information.
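
The disk space and open file handle guidance above can be turned into a small pre-flight check. The following is a minimal sketch in Python, not a Stardog tool: the function names are hypothetical, the 10G default is assumed to mean 10 * 1024^3 bytes, and the choice of leaving room for two full spilling files is an arbitrary illustration of "more than one file".

```python
import os
import resource
import shutil

# Values taken from the guidance above; adjust them to your configuration.
SPILL_MAX_FILE_BYTES = 10 * 1024**3   # spilling.max.file.length default (10G, assumed binary units)
RECOMMENDED_OPEN_FILES = 100_000      # recommended limit for open file handles


def spill_space_ok(spill_dir, expected_files=2, max_file_bytes=SPILL_MAX_FILE_BYTES):
    """Return True if spill_dir has room for `expected_files` full spilling files."""
    free = shutil.disk_usage(spill_dir).free
    return free >= expected_files * max_file_bytes


def open_file_limit_ok(minimum=RECOMMENDED_OPEN_FILES):
    """Return True if this process's soft open-file limit is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the minimum.
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    home = os.environ.get("STARDOG_HOME", ".")
    spill_dir = os.path.join(home, ".spilling")
    # Fall back to the home directory if the spilling directory does not exist yet.
    target = spill_dir if os.path.isdir(spill_dir) else home
    print("spill space ok:", spill_space_ok(target))
    print("open file limit ok:", open_file_limit_ok())
```

The open file check reports the limit of the process running the script, so it is only meaningful when run as the same user and from the same kind of shell or service environment that starts Stardog.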

Monitoring

  1. Use monitoring software on particularly important servers to produce alerts when memory or disk usage is approaching unacceptable thresholds.
    1. Stardog exposes memory usage metrics that can be analyzed to test your workload. See Server Monitoring to learn more about how Stardog exposes its metrics (e.g. via CLI, HTTP, JMX, Prometheus endpoint).
    2. Some key metrics to monitor with respect to memory are:
    Metric Name                      Description
    dbms.memory.heap.used            The current amount of used heap.
    dbms.memory.heap.max             The maximum amount of memory allowed for the Java heap. Equivalent to the -Xmx setting.
    dbms.memory.native.max           The amount of available native memory.
    dbms.memory.system.rss           The current RSS size for Stardog (this includes all kinds of memory).
    dbms.memory.system.rss.peak      The peak RSS. This is often more important than the current RSS because the current value may not be representative of what the workload requires.
    dbms.memory.system.usageRatio    The ratio of currently used memory to the total amount available to the process. It is highly recommended to configure an alarm that fires when this ratio exceeds 0.9.
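
As an illustration of the alerting advice above, the sketch below evaluates a snapshot of the listed metrics and returns warnings. It is deliberately independent of how the metrics are fetched (CLI, HTTP, JMX, or Prometheus): it assumes you have already collected them into a dictionary keyed by the metric names above. The function name and the separate heap threshold are assumptions made for this example, not Stardog settings.

```python
USAGE_RATIO_THRESHOLD = 0.9  # alarm threshold suggested in the guidance above


def memory_alerts(metrics, ratio_threshold=USAGE_RATIO_THRESHOLD, heap_threshold=0.9):
    """Return a list of warning strings for one snapshot of memory metrics.

    `metrics` maps metric names to numeric values. Missing metrics are
    skipped rather than treated as errors.
    """
    alerts = []

    ratio = metrics.get("dbms.memory.system.usageRatio")
    if ratio is not None and ratio > ratio_threshold:
        alerts.append(f"usageRatio {ratio:.2f} exceeds {ratio_threshold:.2f}")

    # The heap check is an extra assumption of this sketch: used / max heap.
    used = metrics.get("dbms.memory.heap.used")
    heap_max = metrics.get("dbms.memory.heap.max")
    if used is not None and heap_max:
        heap_ratio = used / heap_max
        if heap_ratio > heap_threshold:
            alerts.append(f"heap usage {heap_ratio:.2f} exceeds {heap_threshold:.2f}")

    return alerts
```

For example, `memory_alerts({"dbms.memory.system.usageRatio": 0.95})` returns a single warning, while a snapshot with a ratio of 0.5 returns an empty list. Tracking dbms.memory.system.rss.peak over a test run of the production workload is a useful complement, since a single snapshot may understate what the workload requires.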

We highly recommend testing your production load in your development and test environments to ensure you have adequate resources allocated to safely and efficiently run Stardog.

Security

  1. In all environments except development/sandbox environments where security is not a concern:
    1. At a minimum, change the admin user’s default password of admin to something more secure. Optionally, remove the default admin superuser entirely.
    2. Optionally set a password policy for new Stardog users (e.g. require that Stardog user passwords have a minimum length of 8 with at least one capital letter). See Setting Password Constraints for more information.
    3. Optionally use Kerberos/LDAP as a means to authenticate users.
  2. Set credentials and permissions appropriately.
    1. Stardog’s security model is based on standard role-based access control (RBAC). It is possible to assign permissions directly to users, but it is much easier in the long run to assign all permissions to roles and then add those roles to the users. Read more about this in the Security chapter.
  3. Enable or require SSL communications with the Stardog server. See Deploying Stardog Securely for more information.
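
To make the example password policy above concrete (minimum length of 8 with at least one capital letter), here is a minimal sketch of that rule as a regular expression in Python. It only illustrates the rule itself; it is not Stardog's configuration syntax, and the names PASSWORD_POLICY and satisfies_policy are hypothetical. See Setting Password Constraints for how Stardog expresses such constraints.

```python
import re

# At least 8 characters in total, at least one uppercase ASCII letter.
PASSWORD_POLICY = re.compile(r"(?=.*[A-Z]).{8,}")


def satisfies_policy(password):
    """Return True if the whole password matches the example policy."""
    return PASSWORD_POLICY.fullmatch(password) is not None
```

With this rule, "Stardog1" is accepted, while "stardog1" (no capital letter) and "Short1A" (only 7 characters) are rejected.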

Pitfalls

  1. Do not start two instances of Stardog on the same $STARDOG_HOME directory; doing so will result in data corruption. Stardog uses a file lock to prevent two instances of Stardog from running on the same home directory. If Stardog fails to start because of the lock file, do not remove the file manually until you have verified that every Stardog process configured to use that directory has been stopped and you have created a complete backup of $STARDOG_HOME.

Not all Docker volume drivers respect the file locking mechanism. This means that if you have Stardog running outside of Docker and you start another instance of Stardog via Docker that maps its home to the same $STARDOG_HOME, they will both start and cause data loss.
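
To show why a lock file protects a home directory only when the underlying file system honors it, the sketch below takes an exclusive, non-blocking advisory lock with flock on a UNIX-like system. This is an illustration of the general technique, not Stardog's actual locking code; the function name try_lock and the file name example.lock are hypothetical.

```python
import fcntl
import os


def try_lock(home_dir, name="example.lock"):
    """Try to take an exclusive, non-blocking lock on a file in home_dir.

    Returns the open file descriptor on success, or None if another holder
    already has the lock. Closing the descriptor releases the lock.
    """
    fd = os.open(os.path.join(home_dir, name), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: back off instead of proceeding.
        os.close(fd)
        return None
    return fd
```

A second call to try_lock on the same directory returns None for as long as the first descriptor stays open. If the file system or volume driver does not propagate such locks between holders, both callers would succeed, which is the failure mode described above.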