Server Monitoring
This page discusses how to monitor the Stardog server.
Overview
Stardog provides server monitoring via the Metrics library. In addition to providing some basic JVM information, Stardog also exports information about the Stardog DBMS configuration, as well as stats for all databases within the system (e.g., the total number of open connections, database size, and average query time).
Accessing Monitoring Information
Monitoring information is available via the Java API, the HTTP API, the CLI, or (if configured) the JMX interface.
Performing a GET on the /admin/status endpoint will return a JSON object containing all the information available about the server and its databases.
$ curl -u admin:admin "http://localhost:5820/admin/status/"
The stardog-admin server status CLI command will print a subset of this information to the console.
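For example, against a local server with default settings:
$ stardog-admin server status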
The /{yourDatabaseName}/status endpoint will return the monitoring information for that database.
$ curl -u admin:admin "http://localhost:5820/{yourDatabaseName}/status/"
Prometheus
Monitoring information is also available for Prometheus via the /admin/status/prometheus endpoint, allowing Prometheus servers to scrape Stardog directly. This endpoint requires authentication.
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus/"
In some environments, it can be advantageous to scrape metrics from an unauthenticated endpoint. The /admin/status/prometheus/internal endpoint allows a service from a private address space to scrape the metrics without Stardog authentication. By default, it is restricted to connections from 127.0.0.1/32. The allowed CIDR can be changed via stardog.properties with the prometheus.allowCIDR option.
READ permission on dbms-admin:metrics is required to access both the /admin/status and /admin/status/prometheus endpoints and consume server metrics information. This prevents unauthenticated users from seeing sensitive information in server metrics. If the user does not have this permission, /admin/status will only return a few metrics, such as the version of the server.
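As a sketch, the permission could be granted with the user grant CLI command; metricsReader is a hypothetical user, and the exact flags may vary by version:
$ stardog-admin user grant -a read -o dbms-admin:metrics metricsReader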
Prometheus Metrics Filters
The Prometheus API offers ways to limit the number of metrics returned. This may be achieved by supplying a regex either in stardog.properties or directly via a query parameter. The endpoint supports the include and exclude query parameters, which can be configured in the scrape_config section of the Prometheus configuration file. stardog.properties has corresponding config options named metrics.prometheus.include and metrics.prometheus.exclude. When both parameters are supplied, the exclusion wins. Query parameters take precedence over the configuration options.
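For instance, a Prometheus scrape_config along the following lines could authenticate against the endpoint and pass an include filter as a query parameter (the job name, credentials, and regex are illustrative):
scrape_configs:
  - job_name: 'stardog'
    metrics_path: '/admin/status/prometheus'
    basic_auth:
      username: 'admin'
      password: 'admin'
    params:
      include: ['^(dbms|com)_.*']
    static_configs:
      - targets: ['localhost:5820']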
Filter examples
Get all metrics starting with dbms_ and com_ with the regex ^(dbms|com)_.*:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=dbms_.%2A&include=%5E%28dbms_%7Ccom%29_.%2A"
Or add the following line to stardog.properties:
metrics.prometheus.include=^(dbms|com)_.*
Exclude database-specific metrics and kga metrics with the regex ^(databases|kga)_.*:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?exclude=%5E%28databases%7Ckga%29_.%2A"
Or add the following line to stardog.properties:
metrics.prometheus.exclude=^(databases|kga)_.*
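To illustrate the precedence rule, an include pattern supplied as a query parameter overrides metrics.prometheus.include from stardog.properties for that request; the pattern below (URL-encoded ^databases_.*) is illustrative:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5Edatabases_.%2A"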
JMX Monitoring
By default, JMX monitoring is not enabled. You can enable it by setting metrics.reporter=jmx in the stardog.properties file. Then, you can use a tool like VisualVM or JConsole to attach to the process running the JVM, or connect directly to the JMX server.
If you want to connect to the JMX server remotely, you need to set metrics.jmx.remote.access=true in stardog.properties. Stardog will bind an RMI server for remote access on port 5833. If you want to change the port Stardog binds the remote server to, you can set the property metrics.jmx.port in stardog.properties.
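Putting these options together, a stardog.properties sketch for remote JMX access might look like the following (the custom port is illustrative):
# report metrics over JMX and allow remote RMI connections
metrics.reporter=jmx
metrics.jmx.remote.access=true
# optional: change the RMI port from the default 5833
metrics.jmx.port=5834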
Disabling Monitoring
If you wish to disable monitoring completely, set metrics.enabled to false in stardog.properties.
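That is, the following line in stardog.properties turns off metrics collection entirely:
metrics.enabled=false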
Knowledge Graph Metrics
These metrics focus on measuring a few key aspects of the knowledge graph.
Metric Name | Type | Unit | Description |
---|---|---|---|
kga.YourDb.cn | long | count | The number of “Connected Nodes” in the Knowledge Graph, which is the number of nodes with outgoing edges |
kga.YourDb.ce.YourClass | long | count | The number of entities of a particular class |
kga.YourDb.take | long | count | The number of edges in the Knowledge Graph |
kga.YourDb.reach.cardinality | long | count | The number of edges needed to answer all queries in the last 1 hr |
kga.YourDb.reach.accuracy | string | enum | The estimated accuracy of the current reach cardinality |
kga.YourDb.reach.rate | double | count | The number of edges per second used to answer queries in the last 1 hr |
kga.YourDb.reach.histogram | histogram | - | The histogram of reach measurements for the last 1 hr |
Instance Wide Metrics
These metrics are written out for the entire Stardog instance and capture triple counts for each virtual graph and the total triple count. For the purposes of this document, virtual graph IRIs such as virtual://name consist of a UriScheme (virtual) and a VirtualGraphName (name).
Metric Name | Type | Unit | Description |
---|---|---|---|
kga.totalTriples | double | count | Total number of triples in all virtual graphs |
kga.UriScheme.VirtualGraphName.triples | double | count | Number of triples in the given virtual graph |
Block Cache Metrics
These are metrics for the three global block caches: Data, Dictionary, and TXN. Each block cache is global and shared by all databases simultaneously. The three caches serve distinct purposes, and each has its own set of metrics.
- The Data Cache stores data from indices.
- The Dictionary Cache stores entries from dictionary mappings.
- The Txn Cache stores transaction entries. This speeds up access to transaction metadata.
For convenience, the metrics system rolls up statistics for all three block caches into a set of "total" metrics. Thus, there are four prefix forms:
dbms.memory.blockcache.data
dbms.memory.blockcache.dictionary
dbms.memory.blockcache.txn
dbms.memory.blockcache.total
Since the metrics have the same definitions for every cache, we will only list them once, using the form dbms.memory.blockcache.CACHE.<metric>, where CACHE can be data, dictionary, txn, or total.
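For example, assuming the Prometheus endpoint's naming convention in which dots become underscores (as in the earlier filter examples), all block cache metrics could be scraped with an include filter such as:
$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5Edbms_memory_blockcache_.%2A"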
Each Block cache has three internal components:
- The “data” component is where actual bytes from files are stored.
- The “index” component is where file indices are stored.
- The “filter” component is where bloom filters are stored.
If storage.cacheIndexBlocks, storage.cacheDictionaryIndexBlocks, or storage.cacheTxnIndexBlocks is set to false, the "index" and "filter" sections for the corresponding block cache will not be populated and will instead report all zeros.
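Conversely, index and filter block caching can be enabled explicitly in stardog.properties; the sketch below sets all three options:
storage.cacheIndexBlocks=true
storage.cacheDictionaryIndexBlocks=true
storage.cacheTxnIndexBlocks=true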
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.blockcache.CACHE.ratio | double | percentage | The percentage of cache requests that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.hits | long | count | The number of cache requests that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.misses | long | count | The number of cache requests that could not be served by the cache since the process started |
dbms.memory.blockcache.CACHE.add.count | long | count | The number of entries that were added to the cache since the process started |
dbms.memory.blockcache.CACHE.add.failure.count | long | count | The number of times adding to the cache failed, for any reason, since the process started |
dbms.memory.blockcache.CACHE.index.ratio | double | percentage | The percentage of index requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.index.hits | long | count | The number of index requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.index.misses | long | count | The number of index requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.filter.ratio | double | percentage | The percentage of filter requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.filter.hits | long | count | The number of filter requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.filter.misses | long | count | The number of filter requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.data.ratio | double | percentage | The percentage of data requests to the cache that were served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.data.hits | long | count | The number of data requests to the cache that were successfully served by the cache directly since the process started |
dbms.memory.blockcache.CACHE.data.misses | long | count | The number of data requests to the cache that did not find any data since the process started |
dbms.memory.blockcache.CACHE.read | long | bytes | Amount of data read from the cache since the process started |
dbms.memory.blockcache.CACHE.written | long | bytes | The amount of data written to the cache since the process started |
dbms.memory.blockcache.CACHE.cachesIndexBlocks | Boolean | N/A | If true, the cache will store index and filter blocks |
dbms.memory.blockcache.CACHE.strictCapacity | Boolean | N/A | If true, the cache will throw an error if there is no more room in the cache. When false, the cache will be allowed to “soft” grow past the capacity limit temporarily in the event of high contention |
dbms.memory.blockcache.CACHE.usage | long | bytes | The amount of memory currently being used for this block cache |
dbms.memory.blockcache.CACHE.pinnedUsage | long | bytes | The amount of memory in the block cache currently in use (i.e. by readers) |
dbms.memory.blockcache.CACHE.capacity | long | bytes | The maximum amount of memory that can be used for this block cache |
Database Metrics
These metrics are written out once per database and are written in the form databases.*. In JMX, these metrics are collected in a separate folder. For reference purposes, metrics in this table will use YourDb as the database name.
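As a quick way to read a single database metric from the command line, the JSON from the status endpoint can be filtered; the snippet below assumes the response keys metrics by their dotted names and that jq is installed:
$ curl -s -u admin:admin "http://localhost:5820/admin/status" | jq '."databases.YourDb.size"'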
Metric Name | Type | Unit | Description |
---|---|---|---|
databases.YourDb.state | String | N/A | A String describing the current state of the database. Can be one of Online, GoingOffline, Offline, ComingOnline, Disabled |
databases.YourDb.size | long | count | An estimate of the number of quads contained in the database. This number may be inaccurate in mastiff, due to transactional considerations, and should be treated only as an estimate |
databases.YourDb.openConnections | long | count | The current number of open connections to this database |
databases.YourDb.txns.openTransactions | long | count | The current number of open transactions on this database |
databases.YourDb.txns.latency.count | long | count | The number of transactions that were recorded |
databases.YourDb.txns.latency.duration_units | String | N/A | The units that duration is measured in (usually seconds) |
databases.YourDb.txns.latency.max | double | time | The highest latency transaction measured since the database was created or the process started |
databases.YourDb.txns.latency.mean | double | time | The overall average latency of a transaction since the database was created or the process started |
databases.YourDb.txns.latency.stddev | double | time | The standard deviation latency of a transaction since the database was created or the process started |
databases.YourDb.txns.latency.min | double | time | The lowest latency transaction measured since the database was created or the process started |
databases.YourDb.txns.latency.p50 | double | time | The 50th percentile transaction latency (50% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p75 | double | time | The 75th percentile transaction latency (75% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p95 | double | time | The 95th percentile transaction latency (95% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p98 | double | time | The 98th percentile transaction latency (98% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p99 | double | time | The 99th percentile transaction latency (99% of all transactions have lower latency than this) |
databases.YourDb.txns.latency.p999 | double | time | The 99.9th percentile transaction latency (99.9% of all transactions have lower latency than this ) |
databases.YourDb.txns.latency.mean_rate | double | Rate | The overall average throughput of transactions since the database was created or the process started |
databases.YourDb.txns.latency.m15_rate | double | Rate | The 15-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.m1_rate | double | Rate | The 1-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.m5_rate | double | Rate | The 5-minute exponentially-weighted moving average throughput of transactions per unit time |
databases.YourDb.txns.latency.rate_units | String | N/A | The configured units to use when measuring transaction throughput (usually in calls/unit time, where calls = ‘transactions’) |
databases.YourDb.txns.size.count | long | count | The number of transactions that were measured |
databases.YourDb.txns.size.max | long | count | The largest transaction size measured |
databases.YourDb.txns.size.mean | double | count | The average transaction size, since the database was created or the process started |
databases.YourDb.txns.size.stddev | double | count | The standard deviation in transaction size, since the database was created or the process started |
databases.YourDb.txns.size.min | double | count | The smallest transaction size, since the database was created or the process started |
databases.YourDb.txns.size.p50 | double | count | The 50th percentile transaction size (50% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p75 | double | count | The 75th percentile transaction size (75% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p95 | double | count | The 95th percentile transaction size (95% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p98 | double | count | The 98th percentile transaction size (98% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p99 | double | count | The 99th percentile transaction size (99% of all transactions are smaller than this number) |
databases.YourDb.txns.size.p999 | double | count | The 99.9th percentile transaction size (99.9% of all transactions are smaller than this number) |
databases.YourDb.queries.latency.count | long | count | The number of queries that were measured since the database was created or the process started |
databases.YourDb.queries.latency.duration_units | String | N/A | The units that query latency is measured in (usually seconds) |
databases.YourDb.queries.latency.max | double | time | The highest latency query measured since the database was created or the process started |
databases.YourDb.queries.latency.min | double | time | The lowest latency query measured since the database was created or the process started |
databases.YourDb.queries.latency.mean | double | time | The overall average latency of a query since the database was created or the process started |
databases.YourDb.queries.latency.stddev | double | time | The standard deviation latency of a query since the database was created or the process started |
databases.YourDb.queries.latency.p50 | double | time | The 50th percentile query latency (50% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p75 | double | time | The 75th percentile query latency (75% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p95 | double | time | The 95th percentile query latency (95% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p98 | double | time | The 98th percentile query latency (98% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p99 | double | time | The 99th percentile query latency (99% of all queries have lower latency than this) |
databases.YourDb.queries.latency.p999 | double | time | The 99.9th percentile query latency (99.9% of all queries have lower latency than this ) |
databases.YourDb.queries.latency.mean_rate | double | Rate | The overall average throughput of queries since the database was created or the process started |
databases.YourDb.queries.latency.m15_rate | double | Rate | The 15-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.m1_rate | double | Rate | The 1-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.m5_rate | double | Rate | The 5-minute exponentially-weighted moving average throughput of queries per unit time |
databases.YourDb.queries.latency.rate_units | String | N/A | The configured units to use when measuring query throughput (usually in calls/unit time, where calls = ‘queries’) |
databases.YourDb.queries.running | long | count | The number of currently running queries |
databases.YourDb.planCache.ratio | double | count | The hit ratio of the plan cache, as a percentage |
databases.YourDb.planCache.size | double | count | The size of the plan cache, in entries |
databases.YourDb.backgroundErrors | long | count | The number of errors that occur during compaction or flushing, asynchronously to user calls |
databases.YourDb.files.total | long | count | The total number of files held in the database, over all indices |
databases.YourDb.numKeys | long | count | The estimated number of quads in the database. Note that this number is not transactional, so deleted quads may still be counted. Also, it’s an estimate, so it may not be very accurate to begin with |
Per Index Metrics
These metrics are written out once per index within a database (i.e. SPOC, C, CPO, etc.). They are of the form databases.*. For the purposes of this document, we will use YourDb as the database name and INAME as the index name.
There are 8 different kinds of indices in Stardog:
Index Name | Description |
---|---|
ternary | The main index storing encoded data |
dictionary.dict | The dictionary encoding table for the database |
dictionary.value | The dictionary decoding table for the database |
stats | The statistics index |
equality | The equality index |
binary | Binary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
unary | Unary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
context | Context count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
Table of Index Metrics:
Metric Name | Type | Unit | Description |
---|---|---|---|
databases.YourDb.INAME.files.total | long | count | The total number of files currently held by this index on disk |
databases.YourDb.INAME.flushes.pending | long | count | The number of flushes currently pending on this index (no more than the max. number of configured memtables) |
databases.YourDb.INAME.flushes.running | long | count | The number of flushes currently running for this index (no more than the max. number of configured memtables) |
databases.YourDb.INAME.liveDataSize | long | count | The estimated size of the “live” data for this index. “Live” data is data which will actively be processed by the read and write systems or by compaction (disregarding out of date files) |
databases.YourDb.INAME.numKeys | long | count | The estimated number of keys in this index. For Ternary indices, this is a (rough) estimate of the number of quads in the database; for the dictionary, it’s an estimate of how many statements are in the dictionary. Note that this is not a transactional estimate: deleted entries are ignored, so this value will likely overcount in that scenario |
databases.YourDb.INAME.numLevels | Int | count | The configured number of levels for this index. This is set by configuration, and won’t change during the lifecycle of the process |
databases.YourDb.INAME.backgroundErrors | long | count | The number of errors that were detected during background processing of this index since the process began |
databases.YourDb.INAME.tableReaderMemory.bytes | long | count | The amount of memory currently pinned in the OS to support active readers |
databases.YourDb.INAME.memory.total | long | count | The estimated total memory used by this index, for all purposes, including memtable, reader memory, and block cache contributions |
databases.YourDb.INAME.memtable.immutable.count | long | count | The number of memtables which are currently in the “immutable phase” (i.e. waiting to flush to disk). Can never be more than the configured maximum number of memtables |
databases.YourDb.INAME.memtable.total.size.bytes | long | bytes | The current size of all memtables (active, inactive, and immutable), in bytes |
databases.YourDb.INAME.memtable.unpinned.size.bytes | long | bytes | The current size of all unpinned memtables for this index. Unpinned memtables are memtables which are not currently pinned in memory for readers |
databases.YourDb.INAME.memtable.pinned.size.bytes | long | bytes | The current size of all memtables which are pinned for readers for this index |
databases.YourDb.INAME.memtable.immutable.size.bytes | long | bytes | The current size of all immutable memtables (memtables waiting to flush) |
databases.YourDb.INAME.memtable.immutable.entries | long | count | The current number of entries in all immutable memtables |
databases.YourDb.INAME.memtable.active.entries | long | count | The current number of entries in the active memtable (the active memtable is the memtable currently accepting writes) |
databases.YourDb.INAME.memtable.active.size.bytes | long | bytes | The current size of the active memtable |
databases.YourDb.INAME.memtable.memtableStalls | long | count | The total number of memtable stalls which have occurred since the process started or the database was created. Memtable stalls are where a flush is forced to wait for the number of L0 files to be reduced |
databases.YourDb.INAME.memtable.memtableSlowdowns | long | count | The total number of memtable slowdowns which have occurred since the process started or the database was created. Memtable slowdowns are when a flush is delayed in order to allow the L0 file count to be reduced |
databases.YourDb.INAME.stalls | long | count | The total number of stalls which have occurred on this index since the process started or the database was created. Stalls are when data cannot be accepted into a given level because it is full, and all writes must stop until that level has reduced its file count |
databases.YourDb.INAME.slowdowns | long | count | The total number of slowdowns which have occurred on this index since the process started or the database was created. Slowdowns are when writes must be delayed in order to allow compaction to reduce the file count of a given level |
databases.YourDb.INAME.stalls.pendingCompaction | long | count | The current number of stalls which happened while a compaction was pending since the process started or the database was created |
databases.YourDb.INAME.slowdowns.pendingCompaction | long | count | The current number of slowdowns which occurred while a compaction was pending since the process started or the database was created |
databases.YourDb.INAME.slowdowns.l0 | long | count | The total number of slowdowns which occurred because the number of files in the L0 level exceeded the soft limit, and writes must be delayed because of it. |
databases.YourDb.INAME.stalls.l0 | long | count | The total number of stalls which occurred because the number of files in the L0 level exceeded the hard limit, and all writes must pause because of it |
databases.YourDb.INAME.slowdowns.l0.withCompaction | long | count | The total number of slowdowns which occurred in the L0 level while a compaction was currently running |
databases.YourDb.INAME.stalls.l0.withCompaction | long | count | The total number of stalls which occurred in the L0 level while a compaction was currently running |
databases.YourDb.INAME.numFilesCompacting | long | count | The current number of files compacting for this index |
databases.YourDb.INAME.compactions.pending | long | count | The current number of compactions which are waiting to run for this index |
databases.YourDb.INAME.compactions.completed | long | count | The number of compactions which have completed for this index since the process began or the database was created |
databases.YourDb.INAME.compactions.read.bytes | long | bytes | The number of bytes read during compaction since the process started or the database was created |
databases.YourDb.INAME.compactions.written.bytes | long | bytes | The number of bytes written during compaction since the process started or the database was created |
databases.YourDb.INAME.compaction.read.throughput.bytesPerSec | double | bytes/sec | The overall read throughput of compaction (off disk) for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.write.throughput.bytesPerSec | double | bytes/sec | The overall write throughput of compaction (to disk) for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.time.sec | double | seconds | The total time spent compacting files for the index since the process started or the database was created |
databases.YourDb.INAME.compaction.time.avg.sec | double | seconds | The overall average time spent performing a compaction for this index since the process started or the database was created |
databases.YourDb.INAME.compaction.keysProcessed | long | count | The number of keys which were processed during compaction |
databases.YourDb.INAME.compaction.keysDropped | long | count | The number of keys which were removed as part of the compaction process |
databases.YourDb.INAME.compaction.memory.total | long | count | The total amount of memory currently being used to perform compactions for this index |
databases.YourDb.INAME.compactions.running | long | count | The total number of compactions currently running for this index |
databases.YourDb.INAME.writeAmplification | double | ratio | The ratio of bytes written to storage versus bytes written to the database. This is a guide to how many copies of the same data are presently on disk; for example, a write amplification of 3 means that you are writing roughly three times as much data to disk as you are writing entries to the index |
HTTP Server Metrics
These metrics are used to monitor the HTTP subsystem. They are general to the process itself (since there is only one HTTP layer per process).
Metric Name | Type | Unit | Description |
---|---|---|---|
admin.threads.active | Integer | count | The current number of active threads in the admin pool (equivalent to the number of admin-level operations occurring) |
admin.threads.queued | Integer | count | The current number of admin-level operations which are queued up waiting for a thread to operate on them |
admin.threads.size | Integer | count | The maximum number of threads that admin-level operations can make use of. |
user.threads.active | Integer | count | The current number of active threads in the user pool (equivalent to the number of user-level operations currently occurring) |
user.threads.queued | Integer | count | The current number of user-level operations which are enqueued waiting for a thread. A high number here may indicate an overloaded server |
user.threads.size | Integer | count | The maximum number of threads that user-level operations can make use of. |
com.stardog.http.server-{port}.avgRequesttime.count | long | count | The number of HTTP requests that have been made since the process started, where {port} is the HTTP port of the process |
com.stardog.http.server-{port}.avgRequesttime.max | double | milliseconds | The longest HTTP request that has been made since the process started |
com.stardog.http.server-{port}.avgRequesttime.mean | double | milliseconds | The average time taken to process an HTTP request since the process started |
com.stardog.http.server-{port}.avgRequesttime.stddev | double | milliseconds | The standard deviation in time taken to process an HTTP request since the process started |
com.stardog.http.server-{port}.avgRequesttime.min | double | milliseconds | The minimum time taken to process an HTTP request since the process started |
com.stardog.http.server-{port}.avgRequesttime.p50 | double | milliseconds | The 50th percentile time taken to process an HTTP request since the process started (50% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.avgRequesttime.p75 | double | milliseconds | The 75th percentile time taken to process an HTTP request since the process started (75% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.avgRequesttime.p95 | double | milliseconds | The 95th percentile time taken to process an HTTP request since the process started (95% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.avgRequesttime.p98 | double | milliseconds | The 98th percentile time taken to process an HTTP request since the process started (98% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.avgRequesttime.p99 | double | milliseconds | The 99th percentile time taken to process an HTTP request since the process started (99% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.avgRequesttime.p999 | double | milliseconds | The 99.9th percentile time taken to process an HTTP request since the process started (99.9% of all HTTP requests are shorter than this number) |
com.stardog.http.server-{port}.currentRequests | long | count | The current number of open HTTP requests |
Memory Usage Metrics
Memory Management Metrics
The Memory Management subsystem is responsible for efficiently managing Stardog’s internal memory usage, especially during query answering. Memory is broken down into a set of reusable memory “blocks”.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.heap.query.blocks.used | long | bytes | The amount of Java heap which is currently being used by query blocks in the memory management system |
dbms.memory.heap.query.blocks.max | long | bytes | The maximum amount of Java heap which is devoted to use by query blocks |
dbms.memory.native.query.blocks.used | long | bytes | The amount of native (off-heap) memory which is currently being used by query blocks in the memory management system |
dbms.memory.native.query.blocks.max | long | bytes | The maximum amount of native (off-heap) memory which is devoted to use by query blocks |
databases.YourDb.queries.memory.spilled | long | bytes | The monotonically increasing number of bytes spilled over to disk during evaluation of queries against the given database |
databases.YourDb.queries.memory.acquired | long | bytes | The monotonically increasing number of bytes acquired for processing intermediate results for queries against the given database |
Java Memory Metrics
These are metrics about (or related to) the JVM’s memory usage. They are usually accessible through other JVM tools (like JMX) but are provided as explicit metrics for end-user convenience.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.heap.used | long | bytes | The amount of memory currently being used by the Java heap |
dbms.memory.heap.max | long | bytes | The maximum amount of memory allowed for the Java heap. Equivalent to -Xmx settings |
dbms.memory.mapped.used | long | bytes | The amount of memory currently in use for memory-mapped buffers in the Java subsystem. Note that this does not include any memory-mapped usage from native sources (such as RocksDB) |
dbms.memory.direct.buffer.used | long | bytes | The amount of off-heap memory currently being used by Java buffers which are managed by the JVM. Note that this does not include memory buffers which are created inside of native code |
dbms.memory.native.max | long | bytes | The maximum amount of native memory that the process is allowed to use outside of the JVM. This includes any buffers that are natively created but populated inside the JVM, as well as any memory which is natively allocated (like RocksDB) |
Thread dumps for the server can be retrieved with the jvm.threads metric, but only if the threads parameter is set to true in the HTTP request. Using the --threads option of the stardog-admin server metrics CLI command will achieve this. This capability is useful as an alternative to jstack, as it does not require login access to the server.
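For example, the CLI option requests thread dumps directly; the curl variant below assumes the threads parameter is accepted by the /admin/status endpoint:
$ stardog-admin server metrics --threads
$ curl -u admin:admin "http://localhost:5820/admin/status?threads=true"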
When metrics.jvm.enabled is set to true in stardog.properties, Stardog additionally reports a set of JVM metrics. They have the following prefixes:
- jvm.gc.* for GC-related metrics
- jvm.memory.* for JVM heap-related metrics
- jvm.memory.buffers.* for JVM metrics related to the use of memory buffers
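To enable these, add the following line to stardog.properties:
metrics.jvm.enabled=true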
Process Memory Metrics
These are metrics about the process itself, ignoring the JVM. They are almost always accessible through other means (such as ps on Linux systems) but are provided as metrics within Stardog both for end-user convenience and for automatic management (such as warning when memory usage exceeds a threshold).
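For comparison with dbms.memory.system.rss and dbms.memory.system.virtual, the same figures can be read from the OS on Linux; the pgrep pattern below is illustrative and assumes a single Stardog process, and note that ps reports kilobytes while the Stardog metrics are in bytes:
$ ps -o rss=,vsz= -p $(pgrep -f stardog)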
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.memory.system.rss | long | bytes | The current OS-reported RSS (Resident Set Size) for this process. For more information on RSS, see this article |
dbms.memory.system.rss.peak | long | bytes | The OS-reported maximum RSS achieved by this process since it started |
dbms.memory.system.virtual | long | bytes | The current OS-reported Virtual memory size for this process. Note that a large virtual size does not automatically equate to a large actual memory usage. For more information see this StackOverflow description |
dbms.memory.system.regioncount | long | count | The current OS-reported number of regions in use by this process. This number only applies to operating systems which have a region-based memory system, like OS X (but not Linux or Windows). For operating systems that do not use region-based memory, this number will be set to 1 |
dbms.memory.system.pinnedSize | long | bytes | The current amount of memory which is “pinned” by the operating system, and cannot be swapped out by the process. Note that only some operating systems support this; operating systems which do not support the metric will always report -1 for this value |
dbms.memory.system.pageSize | long | bytes | The size of a single memory page in the OS |
dbms.memory.system.usageRatio | long | percentage | The ratio of currently used memory to the total amount available to the process |
Process Metrics
Process metrics are metrics that are unique to the Stardog process currently running and its environment. They contain information about the process itself without referencing any specific database.
Metric Name | Type | Unit | Description |
---|---|---|---|
dbms.version | String | N/A | The release version of the server |
dbms.type | String | N/A | The type of license in effect for the server. Can be one of: Community, Developer, Enterprise |
dbms.id | String | N/A | The id of the kernel. This is a unique identifier for the specific Stardog process. In non-clustered environments, this is just a random ID which is not persisted across restarts. In clustered environments, the kernel id is constructed from configuration and IP addresses to allow for unique identity within a cluster |
dbms.home | String | N/A | The full path to the home directory of this running process (i.e. $STARDOG_HOME) |
system.uptime | long | milliseconds | The amount of time since the process started |
system.os | String | N/A | An identifier for the operating system that Stardog is running on |
system.arch | String | N/A | An identifier of the specific architecture that Stardog is running on |
system.cpu.usage | double | percentage | The percentage of available system CPUs that are being used for the Stardog process. Calculated as the total CPU cycles used by the process (as reported by the Operating System) divided by the number of processors available |
dbms.credentials.cache.size | long | count | The approximate number of entries in the security cache |
dbms.credentials.cache.hits | long | count | The number of cache hits in the security cache |
dbms.credentials.cache.misses | long | count | The number of cache misses in the security cache |
dbms.credentials.cache.loadSuccesses | long | count | The number of times a cache miss resulted in successfully loading a value from the underlying cache storage system since this process started |
dbms.credentials.cache.loadFailures | long | count | The number of times a load into the security cache failed, for any reason, since the process started |
dbms.credentials.cache.evictions | long | count | The number of entries which have been evicted from the security cache since the process started |
system.db.count | long | count | The number of databases stored in Stardog |