
Server Monitoring

This page discusses how to monitor the Stardog server.

Page Contents
  1. Overview
  2. Accessing Monitoring Information
    1. Prometheus
      1. Prometheus Metrics Filters
      2. Filter examples
  3. JMX Monitoring
  4. Disabling Monitoring
  5. Knowledge Graph Metrics
    1. Instance Wide Metrics
  6. Block Cache Metrics
  7. Database Metrics
    1. Per Index Metrics
  8. HTTP Server Metrics
  9. Memory Usage Metrics
    1. Memory Management Metrics
    2. Java Memory Metrics
    3. Process Memory Metrics
  10. Process Metrics

Overview

Stardog provides server monitoring via the Metrics library. In addition to providing some basic JVM information, Stardog also exports information about the Stardog DBMS configuration, as well as stats for all databases within the system (e.g., the total number of open connections, database size, and average query time).

Accessing Monitoring Information

Monitoring information is available via the Java API, the HTTP API, the CLI, or (if configured) the JMX interface.

Performing a GET on the /admin/status endpoint will return a JSON object containing all the information available about the server and its databases.

$ curl -u admin:admin "http://localhost:5820/admin/status/"

The stardog-admin server status CLI command will print a subset of this information to the console.
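
For example:

$ stardog-admin server status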

The /{yourDatabaseName}/status endpoint returns monitoring information about that database’s status.

$ curl -u admin:admin "http://localhost:5820/{yourDatabaseName}/status/"

Prometheus

Monitoring information is also available for Prometheus via the /admin/status/prometheus endpoint, allowing Prometheus servers to scrape Stardog directly. This endpoint requires authentication.

In some environments, it can be advantageous to scrape metrics from an unauthenticated endpoint. The /admin/status/prometheus/internal endpoint allows a service from a private address space to scrape the metrics without Stardog authentication. By default, it is restricted to connections from 127.0.0.1/32. The allowed CIDR can be changed via stardog.properties with the prometheus.allowCIDR option.

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus/"

READ permission on dbms-admin:metrics is required to consume server metrics from both the /admin/status and /admin/status/prometheus endpoints. This prevents exposing sensitive information in server metrics to users who lack that permission. If the user does not have this permission, /admin/status will only return a few metrics, such as the server version.

Prometheus Metrics Filters

The Prometheus API offers ways to limit the number of metrics returned. This can be achieved by supplying a regex either in stardog.properties or directly via a query parameter. The API endpoint supports the query parameters include and exclude, which can be configured in the scrape_config section of the Prometheus configuration file. stardog.properties has corresponding configuration options named metrics.prometheus.include and metrics.prometheus.exclude. When a metric matches both an include and an exclude pattern, the exclusion wins. Query parameters take precedence over the configuration options.

Filter examples

Get all metrics starting with dbms_ and com_ with the regex ^(dbms|com)_.*:

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=dbms_.%2A&include=%5E%28dbms_%7Ccom%29_.%2A"

Or add the following line to stardog.properties:

metrics.prometheus.include=^(dbms|com)_.*

Exclude database specific metrics and kga metrics with the regex ^(databases|kga)_.*:

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?exclude=%5E%28databases%7Ckga%29_.%2A"

Or add the following line to stardog.properties:

metrics.prometheus.exclude=^(databases|kga)_.*

JMX Monitoring

By default, JMX monitoring is not enabled. You can enable it by setting metrics.reporter=jmx in the stardog.properties file. You can then use a tool like VisualVM or JConsole to attach to the process running the JVM, or connect directly to the JMX server.

If you want to connect to the JMX server remotely, you need to set metrics.jmx.remote.access=true in stardog.properties. Stardog will bind an RMI server for remote access on port 5833. If you want to change the port Stardog binds the remote server to, you can set the property metrics.jmx.port in stardog.properties.
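
As a sketch, the relevant stardog.properties entries for remote JMX access might look like the following (the port value is only an illustrative non-default choice; the default is 5833):

metrics.reporter=jmx
metrics.jmx.remote.access=true
metrics.jmx.port=5899

A JMX client such as JConsole can then connect to that host and port (the hostname below is a placeholder for your server):

$ jconsole stardog.example.com:5899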

Disabling Monitoring

If you wish to disable monitoring completely, set metrics.enabled to false in stardog.properties.
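
For example, in stardog.properties:

metrics.enabled=false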

Knowledge Graph Metrics

These metrics focus on measuring a few key aspects of the knowledge graph.

Metric Name Type Unit Description
kga.YourDb.cn long count The number of “Connected Nodes” in the Knowledge Graph, which is the number of nodes with outgoing edges
kga.YourDb.ce.YourClass long count The number of entities of a particular class
kga.YourDb.take long count The number of edges in the Knowledge Graph
kga.YourDb.reach.cardinality long count The number of edges needed to answer all queries in the last 1 hr
kga.YourDb.reach.accuracy string enum The estimated accuracy of the current reach cardinality
kga.YourDb.reach.rate double count The number of edges per second used to answer queries in the last 1 hr
kga.YourDb.reach.histogram histogram - The histogram of reach measurements for the last 1 hr
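
To retrieve only these Knowledge Graph metrics from the Prometheus endpoint, the include filter described earlier can be used (this assumes the kga_ prefix shown in the filter examples above):

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5Ekga_.%2A"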

Instance Wide Metrics

These metrics are written out for the entire Stardog instance and capture triple counts for each virtual graph and the total triple count. For the purposes of this document, virtual graph IRIs such as virtual://name consist of UriScheme (virtual) and VirtualGraphName (name).

Metric Name Type Unit Description
kga.totalTriples double count Total number of triples in all virtual graphs
kga.UriScheme.VirtualGraphName.triples double count Number of triples in the given virtual graph

Block Cache Metrics

These are metrics for the three global block caches: Data, Dictionary, and Txn. Each block cache is global and shared by all databases simultaneously. The three caches serve distinct purposes, but each exposes its own set of metrics.

  1. The Data Cache stores data from indices.
  2. The Dictionary Cache stores entries from dictionary mappings.
  3. The Txn Cache stores transaction entries. This speeds up access to transaction metadata.

For convenience, the metrics system also rolls up statistics for all three block caches into “total” metrics. Thus, there are four prefix forms:

  1. dbms.memory.blockcache.data
  2. dbms.memory.blockcache.dictionary
  3. dbms.memory.blockcache.txn
  4. dbms.memory.blockcache.total

Since the metrics have the same definitions for each cache, we list them only once, using the form dbms.memory.blockcache.CACHE.<metric>, where CACHE can be data, dictionary, txn, or total.

Each block cache has three internal components:

  1. The “data” component is where actual bytes from files are stored.
  2. The “index” component is where file indices are stored.
  3. The “filter” component is where bloom filters are stored.

If storage.cacheIndexBlocks, storage.cacheDictionaryIndexBlocks, or storage.cacheTxnIndexBlocks is false, the “index” and “filter” sections will not be populated for the corresponding block cache and will instead report all zeros.
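
As a sketch, the corresponding options can be enabled in stardog.properties so that these sections are populated (shown with illustrative values; check the defaults for your version):

storage.cacheIndexBlocks=true
storage.cacheDictionaryIndexBlocks=true
storage.cacheTxnIndexBlocks=true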

Metric Name Type Unit Description
dbms.memory.blockcache.CACHE.ratio double percentage The percentage of cache requests that were served by the cache directly since the process started
dbms.memory.blockcache.CACHE.hits long count The number of cache requests that were served by the cache directly since the process started
dbms.memory.blockcache.CACHE.misses long count The number of cache requests that could not be served by the cache since the process started
dbms.memory.blockcache.CACHE.add.count long count The number of entries that were added to the cache since the process started
dbms.memory.blockcache.CACHE.add.failure.count long count The number of times adding to the cache failed, for any reason, since the process started
dbms.memory.blockcache.CACHE.index.ratio double percentage The percentage of index requests to the cache that were served by the cache directly since the process started
dbms.memory.blockcache.CACHE.index.hits long count The number of index requests to the cache that were successfully served by the cache directly since the process started
dbms.memory.blockcache.CACHE.index.misses long count The number of index requests to the cache that did not find any data since the process started
dbms.memory.blockcache.CACHE.filter.ratio double percentage The percentage of filter requests to the cache that were served by the cache directly since the process started
dbms.memory.blockcache.CACHE.filter.hits long count The number of filter requests to the cache that were successfully served by the cache directly since the process started
dbms.memory.blockcache.CACHE.filter.misses long count The number of filter requests to the cache that did not find any data since the process started
dbms.memory.blockcache.CACHE.data.ratio double percentage The percentage of data requests to the cache that were served by the cache directly since the process started
dbms.memory.blockcache.CACHE.data.hits long count The number of data requests to the cache that were successfully served by the cache directly since the process started
dbms.memory.blockcache.CACHE.data.misses long count The number of data requests to the cache that did not find any data since the process started
dbms.memory.blockcache.CACHE.read long bytes Amount of data read from the cache since the process started
dbms.memory.blockcache.CACHE.written long bytes The amount of data written to the cache since the process started
dbms.memory.blockcache.CACHE.cachesIndexBlocks Boolean N/A If true, the cache will store index and filter blocks
dbms.memory.blockcache.CACHE.strictCapacity Boolean N/A If true, the cache will throw an error if there is no more room in the cache. When false, the cache will be allowed to “soft” grow past the capacity limit temporarily in the event of high contention
dbms.memory.blockcache.CACHE.usage long bytes The amount of memory currently being used for this block cache
dbms.memory.blockcache.CACHE.pinnedUsage long bytes The amount of memory in the block cache currently in use (i.e. by readers)
dbms.memory.blockcache.CACHE.capacity long bytes The maximum amount of memory that can be used for this block cache
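
To scrape only the block cache metrics from the Prometheus endpoint, an include filter such as the following can be used (this assumes the dot-to-underscore naming seen in the dbms_ filter examples above):

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5Edbms_memory_blockcache_.%2A"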

Database Metrics

These metrics are written out once per database, in the form databases.*. In JMX, these metrics are collected in a separate folder. For reference, the metrics in this table use YourDb as the database name.

Metric Name Type Unit Description
databases.YourDb.state String N/A A String describing the current state of the database. Can be one of Online, GoingOffline, Offline, ComingOnline, Disabled
databases.YourDb.size long count An estimate of the number of quads contained in the database. This number may be inaccurate in mastiff, due to transactional considerations, and should be treated only as an estimate
databases.YourDb.openConnections long count The current number of open connections to this database
databases.YourDb.txns.openTransactions long count The current number of open transactions on this database
databases.YourDb.txns.latency.count long count The number of transactions that were recorded
databases.YourDb.txns.latency.duration_units String N/A The units that duration is measured in (usually seconds)
databases.YourDb.txns.latency.max double time The highest latency transaction measured since the database was created or the process started
databases.YourDb.txns.latency.mean double time The overall average latency of a transaction since the database was created or the process started
databases.YourDb.txns.latency.stddev double time The standard deviation latency of a transaction since the database was created or the process started
databases.YourDb.txns.latency.min double time The lowest latency transaction measured since the database was created or the process started
databases.YourDb.txns.latency.p50 double time The 50th percentile transaction latency (50% of all transactions have lower latency than this)
databases.YourDb.txns.latency.p75 double time The 75th percentile transaction latency (75% of all transactions have lower latency than this)
databases.YourDb.txns.latency.p95 double time The 95th percentile transaction latency (95% of all transactions have lower latency than this)
databases.YourDb.txns.latency.p98 double time The 98th percentile transaction latency (98% of all transactions have lower latency than this)
databases.YourDb.txns.latency.p99 double time The 99th percentile transaction latency (99% of all transactions have lower latency than this)
databases.YourDb.txns.latency.p999 double time The 99.9th percentile transaction latency (99.9% of all transactions have lower latency than this )
databases.YourDb.txns.latency.mean_rate double Rate The overall average throughput of transactions since the database was created or the process started
databases.YourDb.txns.latency.m15_rate double Rate The 15-minute exponentially-weighted moving average throughput of transactions per unit time
databases.YourDb.txns.latency.m1_rate double Rate The 1-minute exponentially-weighted moving average throughput of transactions per unit time
databases.YourDb.txns.latency.m5_rate double Rate The 5-minute exponentially-weighted moving average throughput of transactions per unit time
databases.YourDb.txns.latency.rate_units String N/A The configured units to use when measuring transaction throughput (usually in calls/unit time, where calls = ‘transactions’)
databases.YourDb.txns.size.count long count The number of transactions that were measured
databases.YourDb.txns.size.max long count The largest transaction size measured
databases.YourDb.txns.size.mean double count The average transaction size, since the database was created or the process started
databases.YourDb.txns.size.stddev double count The standard deviation in transaction size, since the database was created or the process started
databases.YourDb.txns.size.min double count The smallest transaction size, since the database was created or the process started
databases.YourDb.txns.size.p50 double count The 50th percentile transaction size (50% of all transactions are smaller than this number)
databases.YourDb.txns.size.p75 double count The 75th percentile transaction size (75% of all transactions are smaller than this number)
databases.YourDb.txns.size.p95 double count The 95th percentile transaction size (95% of all transactions are smaller than this number)
databases.YourDb.txns.size.p98 double count The 98th percentile transaction size (98% of all transactions are smaller than this number)
databases.YourDb.txns.size.p99 double count The 99th percentile transaction size (99% of all transactions are smaller than this number)
databases.YourDb.txns.size.p999 double count The 99.9th percentile transaction size (99.9% of all transactions are smaller than this number)
databases.YourDb.queries.latency.count long count The number of queries that were measured since the database was created or the process started
databases.YourDb.queries.latency.duration_units String N/A The units that query latency is measured in (usually seconds)
databases.YourDb.queries.latency.max double time The highest latency query measured since the database was created or the process started
databases.YourDb.queries.latency.min double time The lowest latency query measured since the database was created or the process started
databases.YourDb.queries.latency.mean double time The overall average latency of a query since the database was created or the process started
databases.YourDb.queries.latency.stddev double time The standard deviation latency of a query since the database was created or the process started
databases.YourDb.queries.latency.p50 double time The 50th percentile query latency (50% of all queries have lower latency than this)
databases.YourDb.queries.latency.p75 double time The 75th percentile query latency (75% of all queries have lower latency than this)
databases.YourDb.queries.latency.p95 double time The 95th percentile query latency (95% of all queries have lower latency than this)
databases.YourDb.queries.latency.p98 double time The 98th percentile query latency (98% of all queries have lower latency than this)
databases.YourDb.queries.latency.p99 double time The 99th percentile query latency (99% of all queries have lower latency than this)
databases.YourDb.queries.latency.p999 double time The 99.9th percentile query latency (99.9% of all queries have lower latency than this )
databases.YourDb.queries.latency.mean_rate double Rate The overall average throughput of queries since the database was created or the process started
databases.YourDb.queries.latency.m15_rate double Rate The 15-minute exponentially-weighted moving average throughput of queries per unit time
databases.YourDb.queries.latency.m1_rate double Rate The 1-minute exponentially-weighted moving average throughput of queries per unit time
databases.YourDb.queries.latency.m5_rate double Rate The 5-minute exponentially-weighted moving average throughput of queries per unit time
databases.YourDb.queries.latency.rate_units String N/A The configured units to use when measuring query throughput (usually in calls/unit time, where calls = ‘queries’)
databases.YourDb.queries.running long count The number of currently running queries
databases.YourDb.planCache.ratio double count The hit ratio of the plan cache, as a percentage
databases.YourDb.planCache.size double count The size of the plan cache, in entries
databases.YourDb.backgroundErrors long count The number of errors that occur during compaction or flushing, asynchronously to user calls
databases.YourDb.files.total long count The total number of files held in the database, over all indices
databases.YourDb.numKeys long count The estimated number of quads in the database. Note that this number is not transactional, so deleted quads may still be counted. Also, it’s an estimate, so it may not be very accurate to begin with
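
To scrape only these per-database metrics from the Prometheus endpoint, the include filter can mirror the exclude example shown earlier:

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5Edatabases_.%2A"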

Per Index Metrics

These metrics are written out once per index within a database (e.g. SPOC, C, CPO). They are of the form databases.*. For the purposes of this document, we will use YourDb as the database name and INAME as the index name.

There are 8 different kinds of indices in Stardog:

Index Name Description
ternary The main index storing encoded data
dictionary.dict The dictionary encoding table for the database
dictionary.value The dictionary decoding table for the database
stats The statistics index
equality The equality index
binary Binary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default)
unary Unary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default)
context Context count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default)

Table of Index Metrics:

Metric Name Type Unit Description
databases.YourDb.INAME.files.total long count The total number of files currently held by this index on disk
databases.YourDb.INAME.flushes.pending long count The number of flushes currently pending on this index (no more than the max. number of configured memtables)
databases.YourDb.INAME.flushes.running long count The number of flushes currently running for this index (no more than the max. number of configured memtables)
databases.YourDb.INAME.liveDataSize long count The estimated size of the “live” data for this index. “Live” data is data which will actively be processed by the read and write systems or by compaction (disregarding out of date files)
databases.YourDb.INAME.numKeys long count The estimated number of keys in this index. For Ternary indices, this is a (rough) estimate of the number of quads in the database; for the dictionary, it’s an estimate of how many statements are in the dictionary. Note that this is not a transactional estimate: deleted entries are ignored, so this value will likely overcount in that scenario
databases.YourDb.INAME.numLevels Int count The configured number of levels for this index. This is set by configuration, and won’t change during the lifecycle of the process
databases.YourDb.INAME.backgroundErrors long count The number of errors that were detected during background processing of this index since the process began
databases.YourDb.INAME.tableReaderMemory.bytes long count The amount of memory currently pinned in the OS to support active readers
databases.YourDb.INAME.memory.total long count The estimated total memory used by this index, for all purposes, including memtable, reader memory, and block cache contributions
databases.YourDb.INAME.memtable.immutable.count long count The number of memtables which are currently in the “immutable phase” (i.e. waiting to flush to disk). Can never be more than the configured maximum number of memtables
databases.YourDb.INAME.memtable.total.size.bytes long bytes The current size of all memtables (active, inactive, and immutable), in bytes
databases.YourDb.INAME.memtable.unpinned.size.bytes long bytes The current size of all unpinned memtables for this index. Unpinned memtables are memtables which are not currently pinned in memory for readers
databases.YourDb.INAME.memtable.pinned.size.bytes long bytes The current size of all memtables which are pinned for readers for this index
databases.YourDb.INAME.memtable.immutable.size.bytes long bytes The current size of all immutable memtables (memtables waiting to flush)
databases.YourDb.INAME.memtable.immutable.entries long count The current number of entries in all immutable memtables
databases.YourDb.INAME.memtable.active.entries long count The current number of entries in the active memtable (the active memtable is the memtable currently accepting writes)
databases.YourDb.INAME.memtable.active.size.bytes long bytes The current size of the active memtable
databases.YourDb.INAME.memtable.memtableStalls long count The total number of memtable stalls which have occurred since the process started or the database was created. Memtable stalls occur when a flush is forced to wait for the number of L0 files to be reduced
databases.YourDb.INAME.memtable.memtableSlowdowns long count The total number of memtable slowdowns which have occurred since the process started or the database was created. Memtable slowdowns are when a flush is delayed in order to allow the L0 file count to be reduced
databases.YourDb.INAME.stalls long count The total number of stalls which have occurred on this index since the process started or the database was created. Stalls are when data cannot be accepted into a given level because it is full, and all writes must stop until that level has reduced its file count
databases.YourDb.INAME.slowdowns long count The total number of slowdowns which have occurred on this index since the process started or the database was created. Slowdowns are when writes must be delayed in order to allow compaction to reduce the file count for a given level
databases.YourDb.INAME.stalls.pendingCompaction long count The total number of stalls which occurred while a compaction was pending since the process started or the database was created
databases.YourDb.INAME.slowdowns.pendingCompaction long count The total number of slowdowns which occurred while a compaction was pending since the process started or the database was created
databases.YourDb.INAME.slowdowns.l0 long count The total number of slowdowns which occurred because the number of files in the L0 level exceeded the soft limit, and writes must be delayed because of it.
databases.YourDb.INAME.stalls.l0 long count The total number of stalls which occurred because the number of files in the L0 level exceeded the hard limit, and all writes must pause because of it
databases.YourDb.INAME.slowdowns.l0.withCompaction long count The total number of slowdowns which occurred in the L0 level while a compaction was currently running
databases.YourDb.INAME.stalls.l0.withCompaction long count The total number of stalls which occurred in the L0 level while a compaction was currently running
databases.YourDb.INAME.numFilesCompacting long count The current number of files compacting for this index
databases.YourDb.INAME.compactions.pending long count The current number of compactions which are waiting to run for this index
databases.YourDb.INAME.compactions.completed long count The number of compactions which have completed for this index since the process began or the database was created
databases.YourDb.INAME.compactions.read.bytes long bytes The number of bytes read during compaction since the process started or the database was created
databases.YourDb.INAME.compactions.written.bytes long bytes The number of bytes written during compaction since the process started or the database was created
databases.YourDb.INAME.compaction.read.throughput.bytesPerSec double bytes/sec The overall read throughput of compaction (off disk) for this index since the process started or the database was created
databases.YourDb.INAME.compaction.write.throughput.bytesPerSec double bytes/sec The overall write throughput of compaction (to disk) for this index since the process started or the database was created
databases.YourDb.INAME.compaction.time.sec double seconds The total time spent compacting files for the index since the process started or the database was created
databases.YourDb.INAME.compaction.time.avg.sec double seconds The overall average time spent performing a compaction for this index since the process started or the database was created
databases.YourDb.INAME.compaction.keysProcessed long count The number of keys which were processed during compaction
databases.YourDb.INAME.compaction.keysDropped long count The number of keys which were removed as part of the compaction process
databases.YourDb.INAME.compaction.memory.total long count The total amount of memory currently being used to perform compactions for this index
databases.YourDb.INAME.compactions.running long count The total number of compactions currently running for this index
databases.YourDb.INAME.writeAmplification double ratio The ratio of bytes written to storage versus bytes written to the database. This is a guide to how many copies of the same data are presently on disk; for example, a write amplification of 3 means that you are writing roughly three times as much data to disk as you are writing entries to the index

HTTP Server Metrics

These metrics are used to monitor the HTTP subsystem. They are general to the process itself (since there is only one HTTP layer per process).

Metric Name Type Unit Description
admin.threads.active Integer count The current number of active threads in the admin pool (equivalent to the number of admin-level operations occurring)
admin.threads.queued Integer count The current number of admin-level operations which are queued up waiting for a thread to operate on them
admin.threads.size Integer count The maximum number of threads that admin-level operations can make use of.
user.threads.active Integer count The current number of active threads in the user pool (equivalent to the number of user-level operations currently occurring)
user.threads.queued Integer count The current number of user-level operations which are enqueued waiting for a thread. A high number here may indicate an overloaded server
user.threads.size Integer count The maximum number of threads that user-level operations can make use of.
com.stardog.http.server-PORT.avgRequesttime.count long count The number of HTTP requests that have been made since the process started, where PORT is the HTTP port of the process
com.stardog.http.server-PORT.avgRequesttime.max double seconds The longest HTTP request that has been made since the process started
com.stardog.http.server-PORT.avgRequesttime.mean double seconds The average time taken to process an HTTP request since the process started
com.stardog.http.server-PORT.avgRequesttime.stddev double seconds The standard deviation in time taken to process an HTTP request since the process started
com.stardog.http.server-PORT.avgRequesttime.min double seconds The minimum time taken to process an HTTP request since the process started
com.stardog.http.server-PORT.avgRequesttime.p50 double seconds The 50th percentile time taken to process an HTTP request since the process started (50% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.avgRequesttime.p75 double seconds The 75th percentile time taken to process an HTTP request since the process started (75% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.avgRequesttime.p95 double seconds The 95th percentile time taken to process an HTTP request since the process started (95% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.avgRequesttime.p98 double seconds The 98th percentile time taken to process an HTTP request since the process started (98% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.avgRequesttime.p99 double seconds The 99th percentile time taken to process an HTTP request since the process started (99% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.avgRequesttime.p999 double seconds The 99.9th percentile time taken to process an HTTP request since the process started (99.9% of all HTTP requests are shorter than this number)
com.stardog.http.server-PORT.currentRequests long count The current number of open HTTP requests

Memory Usage Metrics

Memory Management Metrics

The Memory Management subsystem is responsible for efficiently managing Stardog’s internal memory usage, especially during query answering. Memory is broken down into a set of reusable memory “blocks”.

Metric Name Type Unit Description
dbms.memory.heap.query.blocks.used long bytes The amount of Java heap which is currently being used by query blocks in the memory management system
dbms.memory.heap.query.blocks.max long bytes The maximum amount of Java heap which is devoted to use by query blocks
dbms.memory.native.query.blocks.used long bytes The amount of native (off-heap) memory which is currently being used by query blocks in the memory management system
dbms.memory.native.query.blocks.max long bytes The maximum amount of native (off-heap) memory which is devoted to use by query blocks
databases.YourDb.queries.memory.spilled long bytes The monotonically increasing number of bytes spilled over to disk during evaluation of queries against the given database
databases.YourDb.queries.memory.acquired long bytes The monotonically increasing number of bytes acquired for processing intermediate results for queries against the given database

Java Memory Metrics

These are metrics about (or related to) the JVM’s memory usage. They are usually accessible through other JVM tools (like JMX) but are provided as explicit metrics for end-user convenience.

Metric Name Type Unit Description
dbms.memory.heap.used long bytes The amount of memory currently being used by the Java heap
dbms.memory.heap.max long bytes The maximum amount of memory allowed for the Java heap. Equivalent to -Xmx settings
dbms.memory.mapped.used long bytes The amount of memory currently in use for memory-mapped buffers in the Java subsystem. Note that this does not include any memory-mapped usage from native sources (such as RocksDB)
dbms.memory.direct.buffer.used long bytes The amount of off-heap memory currently being used by Java buffers which are managed by the JVM. Note that this does not include memory buffers which are created inside of native code
dbms.memory.native.max long bytes The maximum amount of native memory that the process is allowed to use outside of the JVM. This includes any buffers that are natively created but populated inside the JVM, as well as any memory which is natively allocated (like RocksDB)

Thread dumps for the server can be retrieved with the metric jvm.threads, but only if the threads parameter is set to true in the HTTP request. Using the --threads option in the stardog-admin server metrics CLI command will achieve this. This capability is useful as an alternative to jstack, as it does not require login access to the server.
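
For example:

$ stardog-admin server metrics --threads

Over HTTP, the same information can be requested by adding the parameter to the status call (a sketch, assuming the parameter is passed to the /admin/status endpoint):

$ curl -u admin:admin "http://localhost:5820/admin/status?threads=true"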

When metrics.jvm.enabled is set to true in stardog.properties, Stardog additionally reports a set of JVM metrics. They have the following prefixes:

  • jvm.gc.* for GC related metrics
  • jvm.memory.* for JVM heap related metrics
  • jvm.memory.buffers.* for JVM metrics related to use of memory buffers
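
To enable these, add the following to stardog.properties:

metrics.jvm.enabled=true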

Process Memory Metrics

These are metrics about the process itself, ignoring the JVM. These are almost always accessible through other means (such as ps on Linux systems) but are provided as metrics within Stardog both for end-user convenience and for automatic management (such as warning when memory usage exceeds a threshold).

Metric Name Type Unit Description
dbms.memory.system.rss long bytes The current OS-reported RSS (Resident Set Size) for this process
dbms.memory.system.rss.peak long bytes The OS-reported maximum RSS achieved by this process since it started
dbms.memory.system.virtual long bytes The current OS-reported virtual memory size for this process. Note that a large virtual size does not automatically equate to large actual memory usage
dbms.memory.system.regioncount long count The current OS-reported number of regions in use by this process. This number only applies to operating systems that have a region-based memory system, like OS X (but not Linux or Windows). For operating systems that do not use regional memory, this number will be set to 1
dbms.memory.system.pinnedSize long bytes The current amount of memory which is “pinned” by the operating system, and cannot be swapped out by the process. Note that only some operating systems support this; operating systems which do not support the metric will always report -1 for this value
dbms.memory.system.pageSize long bytes The size of a single memory page in the OS
dbms.memory.system.usageRatio long percentage The ratio of currently used memory to the total amount available to the process

Process Metrics

Process metrics are metrics that are unique to the Stardog process currently running and its environment. They contain information about the process itself without referencing any specific database.

Metric Name Type Unit Description
dbms.version String N/A The release version of the server
dbms.type String N/A The type of license in effect for the server. Can be one of: Community, Developer, Enterprise
dbms.id String N/A The id of the kernel. This is a unique identifier for the specific Stardog process. In non-clustered environments, this is just a random ID which is not persisted across restarts. In clustered environments, the kernel id is constructed from configuration and IP addresses to allow for unique identity within a cluster
dbms.home String N/A The full path to the home directory of this running process (i.e. $STARDOG_HOME)
system.uptime long milliseconds The amount of time since the process started
system.os String N/A An identifier for the operating system that Stardog is running on
system.arch String N/A An identifier of the specific architecture that Stardog is running on
system.cpu.usage double percentage The percentage of available system CPUs that are being used for the Stardog process. Calculated as the total CPU cycles used by the process (as reported by the Operating System) divided by the number of processors available
dbms.credentials.cache.size long count The approximate number of entries in the security cache
dbms.credentials.cache.hits long count The number of cache hits in the security cache
dbms.credentials.cache.misses long count The number of cache misses in the security cache
dbms.credentials.cache.loadSuccesses long count The number of times a cache miss resulted in successfully loading a value from the underlying cache storage system since this process started
dbms.credentials.cache.loadFailures long count The number of times a load into the security cache failed, for any reason, since the process started
dbms.credentials.cache.evictions long count The number of entries which have been evicted from the security cache since the process started
system.db.count long count The number of databases stored in Stardog