This page contains guides for migrating Stardog from one major version to another (e.g., 6.X to 7.Y).
- Migrating to Stardog 7
- Migrating Single-Server Stardog
- Migrating Docker-hosted Stardog
- Migrating Stardog Cluster
- Disk Usage and Layout
- Web Console Removed
- Memory Databases
- Memory Configuration
- Database Optimization & Compaction
- Database Configuration
- Snapshot Isolation
- Configuration for new Stardog 7 features
- Migrating to Stardog 6
Stardog 7 introduces a new storage engine and snapshot isolation for concurrent transactions. This section provides an overview of those changes and how they affect users and programs written against previous versions.
The new storage engine in Stardog 7 introduces a completely new disk index format, so databases created by previous versions of Stardog must be migrated in order to work with Stardog 7. There is a dedicated CLI command for migrating the contents of an existing Stardog home directory (i.e., all of the databases in a multi-tenant system).
The following instructions are for migrating all the databases in an existing STARDOG_HOME directory. Instead of migrating all the databases, you can start with a new empty home directory and restore select databases using backups created by Stardog versions 4 or 5. If you use the following instructions with very large databases, you should increase the memory settings by setting the environment variable
The steps for a single server migration:
- Stop the existing Stardog server; do not start Stardog 7 or have either server running
- Create a new empty Stardog home folder (we’ll call it
- Copy your license file to
- Install Stardog 7
- cd to the directory where you’ve installed Stardog 7
# OLD_HOME is the STARDOG_HOME before you start the migration
$ stardog-admin server migrate OLD_HOME NEW_HOME
.bashrc profile or otherwise) to be equal to
The command migrates the contents of each database along with the system database that contains users, roles, permissions, and other metadata. Progress is printed to STDOUT, and the migration can take a significant amount of time if you have large databases. The stardog.properties file (if it exists) will not be copied automatically. See Disk Usage and Layout for changes to the configuration options and other information.
The migration process for Stardog running in Docker is effectively the same with a couple of Docker-specific differences.
- Stop your Docker container.
- Create a new directory on the Docker host machine (we’ll call it
- Copy your license file to
- Run the Stardog 7 Docker container in the following way, which will bring you to a command prompt within the container:
# OLD_HOME is the STARDOG_HOME before you start the migration
$ docker run -v <path to NEW_HOME>:/var/opt/stardog -v <path to OLD_HOME>:/old_stardog \
    --entrypoint /bin/bash -it stardog-eps-docker.jfrog.io/stardog:6.0.0-alpha
- Run the Stardog 7 migration tool in the following way:
$ stardog-admin server migrate /old_stardog /var/opt/stardog
STARDOG_HOME (in your .bashrc profile or otherwise) to be equal to
The migration steps for the cluster:
- Stop all of the cluster nodes, but not the ZK cluster
- Follow the above steps for single server migration on any one cluster node
- Run the command stardog-admin zk clear
- Start the node where migration completed with Stardog 7
- On the other cluster nodes, create empty home folders
- Start another node, wait for the node to join the cluster, and then repeat for each cluster node
The layout of data in the Stardog 7 home directory is different from all previous versions. Previously, the data in a database was stored under a directory with the name of the database. In Stardog 7, the data for all databases is stored in a directory named data in the home directory. The database directories still exist, but they contain only index metadata along with the search and spatial indexes if those features are enabled.
The disk usage requirements for Stardog 7 are higher than for Stardog 6. The actual difference will depend on the characteristics of your data, but you should expect to see a 20% to 30% increase in disk usage. As in Stardog 6, the disk usage of bulk-loaded databases (e.g., when data is loaded by the db create command) will be lower than the disk usage when the same data is added incrementally, that is, in smaller transactions over time.
The web console, which had been deprecated in Stardog 6, has been removed entirely from Stardog 7. We encourage you to use Stardog Studio instead.
Stardog 7 no longer supports in-memory databases. If keeping all data in memory is desired, we recommend placing the home directory on a RAM disk and creating databases in the usual way.
Stardog 7 uses a new storage engine (RocksDB), which is a native library. No changes to the JVM memory settings are required, as Stardog will allocate memory to the storage engine from its off-heap pool. As with Stardog 6, users provide limits for the Java heap memory (-Xmx option) and the off-heap memory (-XX:MaxDirectMemorySize option). See Memory Usage for details.
As in Stardog 6, Stardog 7 performance degrades over time as the database is updated with transactions. The disk usage will continue to increase, and data deleted by transactions will not be removed from disk. The existing db optimize command can be used to perform index compaction on disk to improve the performance of reads and writes. The optimize command now provides additional options that let administrators specify exactly which optimization steps to perform.
All server and database options and their meanings are unchanged in Stardog 7, with the following exceptions:
- Options starting with index.writer: Stardog 7 has a new mechanism that replaces the previous implementation of Differential Indexes and Read-Your-Writes, so these options are ignored.
- transaction.isolation needs to be set to SERIALIZABLE for ICV Guard Mode in order to ensure data integrity with respect to the constraints.
Stardog 7 uses a multi-versioned concurrency control (MVCC) model providing lock-free transactions with snapshot isolation guarantees. Stardog 6 provided a weaker snapshot isolation mechanism that required writers to acquire locks that sometimes blocked other transactions for a very long time, which is no longer the case. As a result, the performance of concurrent updates is greatly improved in Stardog 7, especially in the cluster setting.
MVCC transactions have two modes, which differ in how conflicting changes made by two concurrent transactions are handled. The mode is selected by setting the transaction.write.conflict.strategy database option.
Last commit wins is the default behavior (transaction.write.conflict.strategy=last_commit_wins): the change made by the last committed transaction is accepted. If two concurrent transactions try to add or remove the same quad, the change made by the transaction that commits last is accepted, while the other change is silently ignored. This is similar to the Stardog 6 behavior, which uses locks to achieve the same effect in a less efficient way.
This option provides the best write throughput performance, but it also means Stardog cannot maintain the aggregate indexes it otherwise uses for statistics and answering some queries. For this reason, the database option index.aggregate is set to off in this mode.
This also means Stardog cannot track the exact size of the database without introducing additional overhead. In this mode, when you ask for the size of the database using the data size CLI command or the Connection.size() API call, you will get an approximate number. For example, if you add a quad that already exists in the database, it might be double counted. Stardog will periodically update this number to be accurate, but the accuracy is not guaranteed in general. The option to retrieve the exact size of the database is provided in both the CLI and the Java API, but it requires scanning the contents of the whole database, which might be slow for large databases.
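The double-counting effect can be illustrated with a toy model. This is not how Stardog stores data: ApproximateSizeDemo and all of its members are invented for illustration. It contrasts a counter that is bumped on every add, without an existence check, against an exact count that has to look at the stored data.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: shows why a size kept without existence checks can drift.
final class ApproximateSizeDemo {
    private final Set<String> quads = new HashSet<>(); // the data actually stored
    private long approximateSize = 0;                  // counter bumped on every add

    /** Adds a quad without first checking whether it already exists. */
    void add(String quad) {
        quads.add(quad);
        approximateSize++; // a duplicate add is counted again
    }

    long approximateSize() { return approximateSize; }

    /** Exact size: requires looking at the stored data. */
    long exactSize() { return quads.size(); }

    /** Models a periodic correction of the approximate number. */
    void refresh() { approximateSize = exactSize(); }

    public static void main(String[] args) {
        ApproximateSizeDemo db = new ApproximateSizeDemo();
        db.add(":a :p :b");
        db.add(":a :p :b"); // same quad again
        db.add(":a :p :c");
        // prints "approximate=3 exact=2"
        System.out.println("approximate=" + db.approximateSize() + " exact=" + db.exactSize());
        db.refresh();
        // prints "after refresh approximate=2"
        System.out.println("after refresh approximate=" + db.approximateSize());
    }
}
```

The cheap counter avoids a read on every write, which is the trade-off the text describes: faster writes in exchange for a size that is only approximate between corrections.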
In abort-on-conflict mode (transaction.write.conflict.strategy=abort_on_conflict), if two concurrent transactions try to add or remove the same quad, one of the transactions will be aborted with a transaction conflict. The client should then decide whether the conflicted transaction should be retried or aborted. This check introduces additional overhead for write transactions but makes it possible to maintain additional indexes and provide exact size information by setting the option
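The retry decision that abort-on-conflict mode leaves to the client can be sketched as a bounded retry loop. The code below is not Stardog client code: TransactionConflict, withRetry, and maxAttempts are hypothetical stand-ins, and a real client would catch whatever conflict exception its Stardog API actually raises.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the Stardog API.
final class ConflictRetry {

    /** Stand-in for whatever exception a client sees on a transaction conflict. */
    static final class TransactionConflict extends RuntimeException {
        TransactionConflict(String message) { super(message); }
    }

    /**
     * Runs the work, retrying up to maxAttempts times when it fails with a conflict.
     * Each attempt is expected to begin, apply, and commit its own transaction.
     */
    static <T> T withRetry(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransactionConflict last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (TransactionConflict e) {
                last = e; // conflict: try again with a fresh transaction
            }
        }
        throw last; // every attempt conflicted, so give up
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated work that conflicts twice before succeeding.
        String result = withRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new TransactionConflict("simulated conflict");
            }
            return "committed";
        }, 5);
        // prints "committed after 3 attempts"
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Bounding the number of attempts keeps a heavily contended write from retrying forever; whether to retry at all, and how many times, remains an application decision.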
You may want to do additional configuration for two features added in Stardog 7. Read more about those here:
When migrating to Stardog 6, there are two major changes to take into account.
First, the primary incompatible change in Stardog 6 is a new core API, called Stark, which replaces RDF4j/Sesame as the core API around graph concepts. Additional information about that change is detailed below.
Second, as of Stardog 6, the web console is DEPRECATED. It is still available, but it is NOT supported. We encourage you to use Stardog Studio instead.
The first thing you might notice is that some naming conventions differ from those of traditional Java libraries. Most notably, the Java Bean-style conventions of set prefixes are abandoned in favor of shorter, more concise method names. Similarly, you’ll notice exceptions are not postfixed with Exception, and instead have names such as InvalidRDF. We don’t think the Exception postfix adds anything; it’s clear from usage that it’s an Exception, so there’s no need to add noise to the name.
Additionally, you will not find null returned by any method in Stark. Where a method may have no value to return, you get an Optional instead of null. This includes the optional context of a Statement; instead of using null to denote the default context, there’s a specific constant to indicate this, namely Values#DEFAULT_GRAPH, and utility methods on Values for checking if a Statement corresponds to the default graph. If you’re using an IDE that will leverage the JSR-305 annotations, such as @Nonnull, we’ve used these throughout the interface to document the behavior, and you should see warnings if you’re misusing the API.
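The Optional-instead-of-null convention can be illustrated with plain java.util.Optional. The SimpleLiteral class below is a made-up stand-in, not a Stark type, and its method names are assumptions chosen only to mirror the short, prefix-free naming style described above.

```java
import java.util.Optional;

// Hypothetical types for illustration; these are not the Stark classes.
final class OptionalDemo {

    /** A made-up literal whose language tag may be absent. */
    static final class SimpleLiteral {
        private final String label;
        private final String lang; // may be null internally, never exposed as null

        SimpleLiteral(String label, String lang) {
            this.label = label;
            this.lang = lang;
        }

        // Short accessor names, without a "get" prefix.
        String label() { return label; }

        // An absent tag is reported as an empty Optional rather than null.
        Optional<String> lang() { return Optional.ofNullable(lang); }
    }

    /** Callers handle the absent case explicitly instead of null-checking. */
    static String describe(SimpleLiteral lit) {
        return lit.lang()
                  .map(tag -> lit.label() + "@" + tag)
                  .orElse(lit.label());
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleLiteral("chat", "fr"))); // prints "chat@fr"
        System.out.println(describe(new SimpleLiteral("42", null)));   // prints "42"
    }
}
```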
There’s no longer a Graph class, so for cases where it’s appropriate to return a collection of Statement, such as the result of parsing a file, we’re simply using Set<Statement>. If you need to select subsets of Statement objects, such as all of the rdf:type assertions, there are utility methods provided from Statements, or you can simply get a Stream from the Set and do the filtering like you would with any other
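The Stream-based filtering mentioned above might look like the following sketch. SimpleStatement is a made-up triple holder, not the Stark Statement interface, so the field access shown here is an assumption; only the java.util.stream calls are standard.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: SimpleStatement stands in for the real Statement type.
final class StatementFilterDemo {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Made-up triple holder with plain String terms. */
    static final class SimpleStatement {
        final String subject;
        final String predicate;
        final String object;

        SimpleStatement(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Selects the rdf:type assertions from a set of statements. */
    static Set<SimpleStatement> typeAssertions(Set<SimpleStatement> statements) {
        return statements.stream()
                         .filter(s -> RDF_TYPE.equals(s.predicate))
                         .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<SimpleStatement> parsed = new LinkedHashSet<>();
        parsed.add(new SimpleStatement("urn:alice", RDF_TYPE, "urn:Person"));
        parsed.add(new SimpleStatement("urn:alice", "urn:knows", "urn:bob"));
        parsed.add(new SimpleStatement("urn:bob", RDF_TYPE, "urn:Person"));
        System.out.println(typeAssertions(parsed).size()); // prints "2"
    }
}
```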
Many of the core APIs have been cleaned up from their original counterparts. For example, Literal was trimmed down to just the basics, and if you need to get its value as a different type, like an int, there are static methods available from the
In addition to the changes already mentioned, one thing to look out for is Value#stringValue on the older, Sesame-based API. It returned the label of a Literal, the ID of a BNode, and an IRI as a String. Generally, the correct replacement for this behavior is Literal#str, but in some usages toString is sufficient. Value#toString in Stark returns the complete value of the Value object, e.g., for a Literal it includes the lang/datatype, whereas stringValue did not.
This is a list of commonly used classes from the previous API, and their new counterparts:
The IRIs used to assess the quality of machine learning models have been renamed as follows:
|Stardog 5||Stardog 6|
See the examples in the Automatic Evaluation section for the usage of these terms.