Server Monitoring
This page discusses how to monitor the Stardog server.
Overview
Stardog provides server monitoring via the Metrics library. In addition to providing some basic JVM information, Stardog also exports information about the Stardog DBMS configuration, as well as stats for all databases within the system (e.g., the total number of open connections, database size, and average query time).
Accessing Monitoring Information
Monitoring information is available via the Java API, the HTTP API, the CLI, or (if configured) the JMX interface.
Performing a GET on the /admin/status endpoint will return a JSON object containing all the information available about the server and its databases.
$ curl -u admin:admin "http://localhost:5820/admin/status/"
The stardog-admin server status CLI command will print a subset of this information to the console.
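For example:
$ stardog-admin server status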
The /{yourDatabaseName}/status endpoint will return the monitoring information about that database.
$ curl -u admin:admin "http://localhost:5820/{yourDatabaseName}/status/"
Prometheus
Monitoring information is also available for Prometheus via the /admin/status/prometheus endpoint, allowing Prometheus servers to scrape Stardog directly. This endpoint requires authentication.
In some environments, it can be advantageous to scrape metrics from an unauthenticated endpoint. The /admin/status/prometheus/internal endpoint allows a service from a private address space to scrape the metrics without Stardog authentication. By default, it is restricted to connections from 127.0.0.1/32. The allowed CIDR can be changed via stardog.properties with the prometheus.allowCIDR option.
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus/"
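For example, a service running in the allowed address range (by default, only the local machine) could scrape the internal endpoint without credentials:
$ curl "http://localhost:5820/admin/status/prometheus/internal"
To allow scraping from a wider private address range, add a line like the following to stardog.properties (the CIDR value shown here is only an illustration; use the address range of your monitoring hosts):
prometheus.allowCIDR=10.0.0.0/8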
READ permission on dbms-admin:metrics is required to consume server metrics information from both the /admin/status and /admin/status/prometheus endpoints. This prevents unauthenticated users from gaining access to sensitive information in server metrics. If the user does not have this permission, /admin/status will only return a few metrics, such as the version of the server.
Prometheus Metrics Filters
The Prometheus API offers ways to limit the number of metrics returned. This is achieved by supplying a regex either in stardog.properties or directly via a query parameter. The API endpoint supports the include and exclude query parameters, which can also be supplied in the scrape_config section of the Prometheus configuration file. stardog.properties has corresponding config options named metrics.prometheus.include and metrics.prometheus.exclude. When both parameters are supplied, the exclusion wins. Query parameters take precedence over the configuration options.
Filter examples
Get all metrics starting with dbms_ and com_ with the regex ^(dbms|com)_.*:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5E%28dbms%7Ccom%29_.%2A"
Or add the following line to stardog.properties:
metrics.prometheus.include=^(dbms|com)_.*
Exclude database-specific metrics and kga metrics with the regex ^(databases|kga)_.*:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?exclude=%5E%28databases%7Ckga%29_.%2A"
Or add the following line to stardog.properties:
metrics.prometheus.exclude=^(databases|kga)_.*
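The same filters can be supplied from the Prometheus side via the scrape_config section of the Prometheus configuration file. The following is a minimal sketch; the job name, target address, credentials, and filter regex are illustrative placeholders, not required values:
scrape_configs:
  - job_name: "stardog"                    # placeholder job name
    metrics_path: /admin/status/prometheus
    basic_auth:                            # required unless scraping the /internal endpoint
      username: admin
      password: admin
    params:
      include: ["^(dbms|com)_.*"]          # same effect as the include query parameter above
    static_configs:
      - targets: ["localhost:5820"]        # placeholder Stardog host:port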
JMX Monitoring
By default, JMX monitoring is not enabled. You can enable it by setting metrics.reporter=jmx in the stardog.properties file. Then, you can use a tool like VisualVM or JConsole to attach to the process running the JVM, or connect directly to the JMX server.
If you want to connect to the JMX server remotely, you need to set metrics.jmx.remote.access=true in stardog.properties. Stardog will bind an RMI server for remote access on port 5833. If you want to change the port Stardog binds the remote server to, you can set the property metrics.jmx.port in stardog.properties.
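For example, to enable JMX reporting with remote access on a non-default port, add lines like the following to stardog.properties (the port value here is only an illustration):
metrics.reporter=jmx
metrics.jmx.remote.access=true
metrics.jmx.port=5834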
Disabling Monitoring
If you wish to disable monitoring completely, set metrics.enabled to false in stardog.properties.
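For example, add the following line to stardog.properties:
metrics.enabled=false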
Knowledge Graph Metrics
These metrics focus on measuring a few key aspects of the knowledge graph.
Metric Name | Type | Unit | Description |
---|---|---|---|
kga.YourDb.cn | long | count | The number of “Connected Nodes” in the Knowledge Graph, which is the number of nodes with outgoing edges |
kga.YourDb.ce.YourClass | long | count | The number of entities of a particular class |
kga.YourDb.take | long | count | The number of edges in the Knowledge Graph |
kga.YourDb.reach.cardinality | long | count | The number of edges needed to answer all queries in the last 1 hr |
kga.YourDb.reach.accuracy | string | enum | The estimated accuracy of the current reach cardinality |
kga.YourDb.reach.rate | double | count | The number of edges per second used to answer queries in the last 1 hr |
kga.YourDb.reach.histogram | histogram | - | The histogram of reach measurements for the last 1 hr |
Instance Wide Metrics
These metrics are written out for the entire Stardog instance and capture triple counts for each virtual graph and the total triple count. For the purposes of this document, virtual graph IRIs such as virtual://name consist of UriScheme (virtual) and VirtualGraphName (name).
Metric Name | Type | Unit | Description |
---|---|---|---|
kga.totalTriples | double | count | Total number of triples in all virtual graphs |
kga.UriScheme.VirtualGraphName.triples | double | count | Number of triples in the given virtual graph |
Block Cache Metrics
These are metrics for the three global block caches: Data, Dictionary, and Txn. Each block cache is shared by all databases simultaneously. The three caches serve distinct purposes, but each has its own set of metrics.
- The Data Cache stores data from indices.
- The Dictionary Cache stores entries from dictionary mappings.
- The Txn Cache stores transaction entries. This speeds up access to transaction metadata.
For convenience, the metrics system rolls up statistics for all three block caches into a set of “total” metrics. Thus, there are four prefix forms:
dbms.memory.blockcache.data
dbms.memory.blockcache.dictionary
dbms.memory.blockcache.txn
dbms.memory.blockcache.total
Since the metrics have the same definition for each cache, they are listed only once, using the form dbms.memory.blockcache.CACHE.<metric>, where CACHE can be data, dictionary, txn, or total.
Each Block cache has three internal components:
- The “data” component is where actual bytes from files are stored.
- The “index” component is where file indices are stored.
- The “filter” component is where bloom filters are stored.
If storage.cacheIndexBlocks, storage.cacheDictionaryIndexBlocks, and storage.cacheTxnIndexBlocks are false, then the “index” and “filter” sections will not be populated for that block cache and will instead have all zeros.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.blockcache.CACHE.ratio | double | percentage | The percentage of cache requests that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.hits | long | count | The number of cache requests that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.misses | long | count | The number of cache requests that could not be served by the cache since the process started |
dbms.memory.blockcache.CACHE.add.count | long | count | The number of entries that were added to the cache since the process started |
dbms.memory.blockcache.CACHE.add.failure.count | long | count | The number of times adding to the cache failed, for any reason, since the process started |
dbms.memory.blockcache.CACHE.index.ratio | double | percentage | The percentage of index requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.index.hits | long | count | The number of index requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.index.misses | long | count | The number of index requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.filter.ratio | double | percentage | The percentage of filter requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.filter.hits | long | count | The number of filter requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.filter.misses | long | count | The number of filter requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.data.ratio | double | percentage | The percentage of data requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.data.hits | long | count | The number of data requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.data.misses | long | count | The number of data requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.read | long | bytes | Amount of data read from the cache since the process started |
dbms.memory.blockcache.CACHE.written | long | bytes | The amount of data written to the cache since the process started |
dbms.memory.blockcache.CACHE.cachesIndexBlocks | Boolean | N/A | If true, the cache will store index and filter blocks |
dbms.memory.blockcache.CACHE.strictCapacity | Boolean | N/A | If true, the cache will throw an error if there is no more room in the cache. When false, the cache will be allowed to “soft” grow past the capacity limit temporarily in the event of high contention |
dbms.memory.blockcache.CACHE.usage | long | bytes | The amount of memory currently being used for this block cache |
dbms.memory.blockcache.CACHE.pinnedUsage | long | bytes | The amount of memory in the block cache currently in use (i.e. by readers) |
dbms.memory.blockcache.CACHE.capacity | long | bytes | The maximum amount of memory that can be used for this block cache |
Database Metrics
These metrics are written out once per database, in the form databases.*. In JMX, these metrics are collected in a separate folder. For reference purposes, metrics in this table will use YourDb as the database name.
Metric Name | Type | Unit | Description |
---|---|---|---|
databases.YourDb.state | String | N/A | A String describing the current state of the database. Can be one of Online, GoingOffline, Offline, ComingOnline, Disabled |
databases.YourDb.size | long | count | An estimate of the number of quads contained in the database. This number may be inaccurate in mastiff, due to transactional considerations, and should be treated only as an estimate |
databases.YourDb.openConnections | long | count | The current number of open connections to this database |
databases.YourDb.txns.openTransactions | long | count | The current number of open transactions on this database |
databases.YourDb.txns.latency.count | long | count | The number of transactions that were recorded |
databases.YourDb.txns.latency.duration_units | String | N/A | The units that duration is measured in (usually seconds) |
databases.YourDb.txns.latency.max | double | time | The highest latency transaction measured since the database was created or the process started |
databases.YourDb.txns.latency.mean | double | time | The overall average latency of a transaction since the database was created or the process started |
databases.YourDb.txns.latency.stddev | double | time | The standard deviation latency of a transaction since the database was created or the process started |
databases.YourDb.txns.latency.min | double | time | The lowest latency transaction measured since the database was created or the process started |
databases.YourDb.txns.latency.p50 | double | time | The 50th percentile transaction latency (50% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p75 | double | time | The 75th percentile transaction latency (75% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p95 | double | time | The 95th percentile transaction latency (95% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p98 | double | time | The 98th percentile transaction latency (98% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p99 | double | time | The 99th percentile transaction latency (99% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p999 | double | time | The 99.9th percentile transaction latency (99.9% of all transactions have lower latency than this ) |
databases.YourDb.txns.latency.mean_rate | double | Rate | The overall average throughput of transactions since the database was created or the process started |
databases.YourDb.txns.latency.m15_rate | double | Rate | The 15-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.m1_rate | double | Rate | The 1-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.m5_rate | double | Rate | The 5-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.rate_units | String | N/A | The configured units to use when measuring transaction throughput (usually in calls/unit time, where calls = ‘transactions’) |
databases.YourDb.txns.size.count | long | count | The number of transactions that were measured |
databases.YourDb.txns.size.max | long | count | The largest transaction size measured |
databases.YourDb.txns.size.mean | double | count | The average transaction size, since the database was created or the process started |
databases.YourDb.txns.size.stddev | double | count | The standard deviation in transaction size, since the database was created or the process started |
databases.YourDb.txns.size.min | double | count | The smallest transaction size, since the database was created or the process started |
databases.YourDb.txns.size.p50 | double | count | The 50th percentile transaction size (50% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p75 | double | count | The 75th percentile transaction size (75% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p95 | double | count | The 95th percentile transaction size (95% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p98 | double | count | The 98th percentile transaction size (98% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p99 | double | count | The 99th percentile transaction size (99% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p999 | double | count | The 99.9th percentile transaction size (99.9% of all transactions are smaller than this number) |
databases.YourDb.queries.latency.count | long | count | The number of queries that were measured since the database was created or the process started |
databases.YourDb.queries.latency.duration_units | String | N/A | The units that query latency is measured in (usually seconds) |
databases.YourDb.queries.latency.max | double | time | The highest latency query measured since the database was created or the process started |
databases.YourDb.queries.latency.min | double | time | The lowest latency query measured since the database was created or the process started |
databases.YourDb.queries.latency.mean | double | time | The overall average latency of a query since the database was created or the process started |
databases.YourDb.queries.latency.stddev | double | time | The standard deviation latency of a query since the database was created or the process started |
databases.YourDb.queries.latency.p50 | double | time | The 50th percentile query latency (50% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p75 | double | time | The 75th percentile query latency (75% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p95 | double | time | The 95th percentile query latency (95% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p98 | double | time | The 98th percentile query latency (98% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p99 | double | time | The 99th percentile query latency (99% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p999 | double | time | The 99.9th percentile query latency (99.9% of all queries have lower latency than this ) |
databases.YourDb.queries.latency.mean_rate | double | Rate | The overall average throughput of queries since the database was created or the process started |
databases.YourDb.queries.latency.m15_rate | double | Rate | The 15-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.m1_rate | double | Rate | The 1-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.m5_rate | double | Rate | The 5-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.rate_units | String | N/A | The configured units to use when measuring query throughput (usually in calls/unit time, where calls = ‘queries’) |
databases.YourDb.queries.running | long | count | The number of currently running queries |
databases.YourDb.planCache.ratio | double | percentage | The hit ratio of the plan cache, as a percentage |
databases.YourDb.planCache.size | double | count | The size of the plan cache, in entries |
databases.YourDb.backgroundErrors | long | count | The number of errors that occur during compaction or flushing, asynchronously to user calls |
databases.YourDb.files.total | long | count | The total number of files held in the database, over all indices |
databases.YourDb.numKeys | long | count | The estimated number of quads in the database. Note that this number is not transactional, so deleted quads may still be counted. Also, it’s an estimate, so it may not be very accurate to begin with |
Per Index Metrics
These metrics are written out once per index within a database (i.e. SPOC, C, CPO, etc.). They are of the form databases.*. For the purposes of this document, we will use YourDb as the database name and INAME as the index name.
There are 8 different kinds of indices in Stardog:
Index Name | Description |
---|---|
ternary | The main index storing encoded data |
dictionary.dict | The dictionary encoding table for the database |
dictionary.value | The dictionary decoding table for the database |
stats | The statistics index |
equality | The equality index |
binary | Binary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
unary | Unary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
context | Context count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
Table of Index Metrics:
Metric Name | Type | Unit | Description |
---|---|---|---|
databases.YourDb.INAME.files.total | long | count | The total number of files currently held by this index on disk |
databases.YourDb.INAME.flushes.pending | long | count | The number of flushes currently pending on this index (no more than the max. number of configured memtables) |
databases.YourDb.INAME.flushes.running | long | count | The number of flushes currently running for this index (no more than the max. number of configured memtables) |
databases.YourDb.INAME.liveDataSize | long | count | The estimated size of the “live” data for this index. “Live” data is data which will actively be processed by the read and write systems or by compaction (disregarding out of date files) |
databases.YourDb.INAME.numKeys | long | count | The estimated number of keys in this index. For ternary indices, this is a (rough) estimate of the number of quads in the database; for the dictionary, it’s an estimate of how many statements are in the dictionary. Note that this is not a transactional estimate: deleted entries are ignored, so this value will likely overcount in that scenario |
databases.YourDb.INAME.numLevels | Int | count | The configured number of levels for this index. This is set by configuration, and won’t change during the lifecycle of the process |
databases.YourDb.INAME.backgroundErrors | long | count | The number of errors that were detected during background processing of this index since the process began |
databases.YourDb.INAME.tableReaderMemory.bytes | long | count | The amount of memory currently pinned in the OS to support active readers |
databases.YourDb.INAME.memory.total | long | count | The estimated total memory used by this index, for all purposes, including memtable, reader memory, and block cache contributions |
databases.YourDb.INAME.memtable.immutable.count | long | count | The number of memtables which are currently in the “immutable phase” (i.e. waiting to flush to disk). Can never be more than the configured maximum number of memtables |
databases.YourDb.INAME.memtable.total.size.bytes | long | bytes | The current size of all memtables (active, inactive, and immutable), in bytes |
databases.YourDb.INAME.memtable.unpinned.size.bytes | long | bytes | The current size of all unpinned memtables for this index. Unpinned memtables are memtables which are not currently pinned in memory for readers |
databases.YourDb.INAME.memtable.pinned.size.bytes | long | bytes | The current size of all memtables which are pinned for readers for this index |
databases.YourDb.INAME.memtable.immutable.size.bytes | long | bytes | The current size of all immutable memtables (memtables waiting to flush) |
databases.YourDb.INAME.memtable.immutable.entries | long | count | The current number of entries in all immutable memtables |
databases.YourDb.INAME.memtable.active.entries | long | count | The current number of entries in the active memtable (the active memtable is the memtable currently accepting writes) |
databases.YourDb.INAME.memtable.active.size.bytes | long | bytes | The current size of the active memtable |
databases.YourDb.INAME.memtable.memtableStalls | long | count | The total number of memtable stalls which have occurred since the process started or the database was created. Memtable stalls are where a flush is forced to wait for the number of L0 files to be reduced |
databases.YourDb.INAME.memtable.memtableSlowdowns | long | count | The total number of memtable slowdowns which have occurred since the process started or the database was created. Memtable slowdowns are when a flush is delayed in order to allow the L0 file count to be reduced |
databases.YourDb.INAME.stalls | long | count | The total number of stalls which have occurred on this index since the process started or the database was created. Stalls are when data cannot be accepted into a given level because it is full, and all writes must stop until that level has reduced its file count |
databases.YourDb.INAME.slowdowns | long | count | The total number of slowdowns which have occurred on this index since the process started or the database was created. Slowdowns are when data must be delayed in order to allow compaction to reduce the file count at a given level |
databases.YourDb.INAME.stalls.pendingCompaction | long | count | The current number of stalls which happened while a compaction was pending since the process started or the database was created |
databases.YourDb.INAME.slowdowns.pendingCompaction | long | count | The current number of slowdowns which occurred while a compaction was pending since the process started or the database was created |
databases.YourDb.INAME.slowdowns.l0 | long | count | The total number of slowdowns which occurred because the number of files in the L0 level exceeded the soft limit, and writes must be delayed because of it. |
databases.YourDb.INAME.stalls.l0 | long | count | The total number of stalls which occurred because the number of files in the L0 level exceeded the hard limit, and all writes must pause because of it |
databases.YourDb.INAME.slowdowns.l0.withCompaction | long | count | The total number of slowdowns which occurred in the L0 level while a compaction was currently running |
databases.YourDb.INAME.stalls.l0.withCompaction | long | count | The total number of stalls which occurred in the L0 level while a compaction was currently running |
databases.YourDb.INAME.numFilesCompacting | long | count | The current number of files compacting for this index |
databases.YourDb.INAME.compactions.pending | long | count | The current number of compactions which are waiting to run for this index |
databases.YourDb.INAME.compactions.completed | long | count | The number of compactions which have completed for this index since the process began or the database was created |
databases.YourDb.INAME.compactions.read.bytes | long | bytes | The number of bytes read during compaction since the process started or the database was created |
databases.YourDb.INAME.compactions.written.bytes | long | bytes | The number of bytes written during compaction since the process started or the database was created |
databases.YourDb.INAME.compaction.read.throughput.bytesPerSec | double | bytes/sec | The overall read throughput of compaction (off disk) for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.write.throughput.bytesPerSec | double | bytes/sec | The overall write throughput of compaction (to disk) for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.time.sec | double | seconds | The total time spent compacting files for the index since the process started or the database was created |
databases.YourDb.INAME.compaction.time.avg.sec | double | seconds | The overall average time spent performing a compaction for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.keysProcessed | long | count | The number of keys which were processed during compaction |
databases.YourDb.INAME.compaction.keysDropped | long | count | The number of keys which were removed as part of the compaction process |
databases.YourDb.INAME.compaction.memory.total | long | count | The total amount of memory currently being used to perform compactions for this index |
databases.YourDb.INAME.compactions.running | long | count | The total number of compactions currently running for this index |
databases.YourDb.INAME.writeAmplification | double | ratio | The ratio of bytes written to storage versus bytes written to the database. This is a guide to how many copies of the same data are presently on disk; for example, a write amplification of 3 means that you are writing roughly three times as much data to disk as you are writing entries to the index |
HTTP Server Metrics
These metrics are used to monitor the HTTP subsystem. They are general to the process itself (since there is only one HTTP layer per process).
Metric Name | Type | Unit | Description |
---|---|---|---|
admin.threads.active | Integer | count | The current number of active threads in the admin pool (equivalent to the number of admin-level operations occurring) |
admin.threads.queued | Integer | count | The current number of admin-level operations which are queued up waiting for a thread to operate on them |
admin.threads.size | Integer | count | The maximum number of threads that admin-level operations can make use of. |
user.threads.active | Integer | count | The current number of active threads in the user pool (equivalent to the number of user-level operations currently occurring) |
user.threads.queued | Integer | count | The current number of user-level operations which are enqueued waiting for a thread. A high number here may indicate an overloaded server |
user.threads.size | Integer | count | The maximum number of threads that user-level operations can make use of. |
com.stardog.http.server-PORT.avgRequesttime.count | long | count | The number of HTTP requests that have been made since the process started, where PORT is the HTTP port of the process |
com.stardog.http.server-PORT.avgRequesttime.max | double | seconds | The longest HTTP request that has been made since the process started |
com.stardog.http.server-PORT.avgRequesttime.mean | double | seconds | The average time taken to process an HTTP request since the process started |
com.stardog.http.server-PORT.avgRequesttime.stddev | double | seconds | The standard deviation in time taken to process an HTTP request since the process started |
com.stardog.http.server-PORT.avgRequesttime.min | double | seconds | The minimum time taken to process an HTTP request since the process started |
com.stardog.http.server-PORT.avgRequesttime.p50 | double | seconds | The 50th percentile time taken to process an HTTP request since the process started (50% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.avgRequesttime.p75 | double | seconds | The 75th percentile time taken to process an HTTP request since the process started (75% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.avgRequesttime.p95 | double | seconds | The 95th percentile time taken to process an HTTP request since the process started (95% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.avgRequesttime.p98 | double | seconds | The 98th percentile time taken to process an HTTP request since the process started (98% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.avgRequesttime.p99 | double | seconds | The 99th percentile time taken to process an HTTP request since the process started (99% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.avgRequesttime.p999 | double | seconds | The 99.9th percentile time taken to process an HTTP request since the process started (99.9% of all HTTP requests are shorter than this number) |
com.stardog.http.server-PORT.currentRequests | long | count | The current number of open HTTP requests |
Memory Usage Metrics
Memory Management Metrics
The Memory Management subsystem is responsible for efficiently managing Stardog’s internal memory usage, especially during query answering. Memory is broken down into a set of reusable memory “blocks”.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.heap.query.blocks.used | long | bytes | The amount of Java heap which is currently being used by query blocks in the memory management system |
dbms.memory.heap.query.blocks.max | long | bytes | The maximum amount of Java heap which is devoted to use by query blocks |
dbms.memory.native.query.blocks.used | long | bytes | The amount of native (off-heap) memory which is currently being used by query blocks in the memory management system |
dbms.memory.native.query.blocks.max | long | bytes | The maximum amount of native (off-heap) memory which is devoted to use by query blocks |
databases.YourDb.queries.memory.spilled | long | bytes | The monotonically increasing number of bytes spilled over to disk during evaluation of queries against the given database |
databases.YourDb.queries.memory.acquired | long | bytes | The monotonically increasing number of bytes acquired for processing intermediate results for queries against the given database |
Java Memory Metrics
These are metrics about (or related to) the JVM’s memory usage. They are usually accessible through other JVM tools (like JMX) but are provided as explicit metrics for end-user convenience.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.heap.used | long | bytes | The amount of memory currently being used by the Java heap |
dbms.memory.heap.max | long | bytes | The maximum amount of memory allowed for the Java heap. Equivalent to -Xmx settings |
dbms.memory.mapped.used | long | bytes | The amount of memory currently in use for memory-mapped buffers in the Java subsystem. Note that this does not include any memory-mapped usage from native sources (such as RocksDB) |
dbms.memory.direct.buffer.used | long | bytes | The amount of off-heap memory currently being used by Java buffers which are managed by the JVM. Note that this does not include memory buffers which are created inside of native code |
dbms.memory.native.max | long | bytes | The maximum amount of native memory that the process is allowed to use outside of the JVM. This includes any buffers that are natively created but populated inside the JVM, as well as any memory which is natively allocated (like RocksDB) |
Thread dumps for the server can be retrieved with the metric jvm.threads, but only if the threads parameter is set to true in the HTTP request. Using the --threads option of the stardog-admin server metrics CLI command will achieve this. This capability is useful as an alternative to jstack, as it does not require login access to the server.
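For example, either of the following will include thread information along with the other metrics; the query-parameter form shown here assumes the threads flag is passed to the /admin/status endpoint:
$ stardog-admin server metrics --threads
$ curl -u admin:admin "http://localhost:5820/admin/status?threads=true"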
When metrics.jvm.enabled is set to true in stardog.properties, Stardog additionally reports a set of JVM metrics. They have the following prefixes:
- jvm.gc.* for GC-related metrics
- jvm.memory.* for JVM heap-related metrics
- jvm.memory.buffers.* for JVM metrics related to the use of memory buffers
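To enable these metrics, add the following line to stardog.properties:
metrics.jvm.enabled=true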
Process Memory Metrics
These are metrics about the process itself, ignoring the JVM. These are almost always accessible through other means (such as ps on Linux systems) but are provided as metrics within Stardog both for end-user convenience and for automatic management (such as warning when memory usage exceeds a threshold).
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.system.rss | long | bytes | The current OS-reported RSS (Resident Set Size) for this process. For more information on RSS, see this article |
dbms.memory.system.rss.peak | long | bytes | The OS-reported maximum RSS achieved by this process since it started |
dbms.memory.system.virtual | long | bytes | The current OS-reported Virtual memory size for this process. Note that a large virtual size does not automatically equate to a large actual memory usage. For more information see this StackOverflow description |
dbms.memory.system.regioncount | long | count | The current OS-reported number of regions in use by this process. This number only applies to operating systems which have a region-based memory system, like OS X (but not Linux or Windows). For operating systems which do not use regional memory, this number will be set to 1 |
dbms.memory.system.pinnedSize | long | bytes | The current amount of memory which is “pinned” by the operating system, and cannot be swapped out by the process. Note that only some operating systems support this; operating systems which do not support the metric will always report -1 for this value |
dbms.memory.system.pageSize | long | bytes | The size of a single memory page in the OS |
dbms.memory.system.usageRatio | long | percentage | The ratio of currently used memory to the total amount available to the process |
Process Metrics
Process metrics are metrics that are unique to the Stardog process currently running and its environment. They contain information about the process itself without referencing any specific database.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.version | String | N/A | The release version of the server |
dbms.type | String | N/A | The type of license in effect for the server. Can be one of: Community, Developer, Enterprise |
dbms.id | String | N/A | The id of the kernel. This is a unique identifier for the specific Stardog process. In non-clustered environments, this is just a random ID which is not persisted across restarts. In clustered environments, the kernel id is constructed from configuration and IP addresses to allow for unique identity within a cluster |
dbms.home | String | N/A | The full path to the home directory of this running process (i.e. $STARDOG_HOME) |
system.uptime | long | milliseconds | The amount of time since the process started |
system.os | String | N/A | An identifier for the operating system that Stardog is running on |
system.arch | String | N/A | An identifier of the specific architecture that Stardog is running on |
system.cpu.usage | double | percentage | The percentage of available system CPUs that are being used for the Stardog process. Calculated as the total CPU cycles used by the process (as reported by the Operating System) divided by the number of processors available |
dbms.credentials.cache.size | long | count | The approximate number of entries in the security cache |
dbms.credentials.cache.hits | long | count | The number of cache hits in the security cache |
dbms.credentials.cache.misses | long | count | The number of cache misses in the security cache |
dbms.credentials.cache.loadSuccesses | long | count | The number of times a cache miss resulted in successfully loading a value from the underlying cache storage system since this process started |
dbms.credentials.cache.loadFailures | long | count | The number of times a load into the security cache failed, for any reason, since the process started |
dbms.credentials.cache.evictions | long | count | The number of entries which have been evicted from the security cache since the process started |
system.db.count | long | count | The number of databases stored in Stardog |