Managing Databases

This page discusses managing Stardog databases.

Page Contents
  1. Creating a Database
    1. Configuring a Database at Creation Time
      1. Database Creation Templates
    2. Bulk Loading Data at Creation Time
      1. Tuning Bulk Loading Performance
    3. Archetypes
  2. Listing Databases
  3. Database Status
    1. Offline/Online a Database
  4. Namespaces
  5. Dropping a Database
  6. Repairing a Database
    1. Dictionary corruption
    2. Index corruption
    3. Physical corruption

Creating a Database

Stardog databases may be created locally or remotely. All data files, indexes, and server metadata for the new database will be stored in $STARDOG_HOME.

  • Stardog will not let you create a database with the same name as an existing database. Stardog database names must start with an alpha character followed by zero or more alphanumeric, hyphen or underscore characters, as is given by the regular expression [A-Za-z]{1}[A-Za-z0-9_-]*.

There are four reserved words that may not be used for the names of Stardog databases: system, admin, docs, and watchdog.
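
The naming rules above can be checked on the client before calling db create. The following is a minimal sketch, not part of Stardog: the class and method names are hypothetical, and treating the reserved words as case-sensitive exact matches is an assumption. It assumes Java 9 or later.

```java
import java.util.Set;
import java.util.regex.Pattern;

class DatabaseNames {

    // Pattern quoted in the text above: one letter, then letters, digits, '_' or '-'
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z]{1}[A-Za-z0-9_-]*");

    // Reserved words listed above; exact, case-sensitive matching is an assumption
    private static final Set<String> RESERVED = Set.of("system", "admin", "docs", "watchdog");

    static boolean isValid(String name) {
        return name != null
            && VALID_NAME.matcher(name).matches()
            && !RESERVED.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isValid("myDatabase")); // true
        System.out.println(isValid("1db"));        // false: must start with a letter
        System.out.println(isValid("system"));     // false: reserved word
    }
}
```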

  • Minimally, the only thing you must know to create a Stardog database is a database name; additionally, you may customize some other database parameters and options depending on anticipated workloads, data modeling, and other factors.
  • Data can be bulk loaded at database creation time.
    • Bulk loading performance is better if data files don’t have to be transferred over a network during creation and initial loading.

To create an empty database:

  • At a minimum, a database name needs to be passed into the db create CLI command using the -n/--name option.

    stardog-admin db create -n myDatabase
    
  • In Stardog Studio:

    1. Navigate to the “Databases” tab
    2. Click on the “Create Database” button
    3. Enter a name for the new database
    4. Click the “Create” button

  • See the HTTP API documentation for more information.

    curl -u username:password -X POST \
    -F root='{"dbname":"myDatabase"}' \
    http://localhost:5820/admin/databases
    
  • import com.complexible.stardog.api.admin.AdminConnection;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class CreateDatabaseBasic {
    
        public static void main(String[] args){
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            try(AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();){
                adminConnection.newDatabase("myDatabase").create();
            }
        }
    }
    

    See com.complexible.stardog.api.admin.AdminConnection#newDatabase for more information.

Configuring a Database at Creation Time

It’s possible to provide configuration options to a database at creation time. While you can provide any type of configuration option at creation time regardless of their mutability, you must declare immutable database configuration options at creation time (e.g. edge.properties).

  • Database configuration options can be passed into the db create CLI command using the -o/--options option. Each option is a key=value pair; multiple options are separated by whitespace, e.g., -o option1=value1 option2=value2. When -o is the last option on the command line, its values should be followed by --.

    # enables search and edge properties for the database
    stardog-admin db create -n myDatabase -o search.enabled=true edge.properties=true --
    
  • In Stardog Studio:

    1. Navigate to the “Databases” tab
    2. Click on the “Create Database” button
    3. Enter a name for the new database
    4. Select the “Manually Set Immutable Properties” radio button and select which properties to set

      The only way to set mutable database options at database creation time in Studio is by providing a Java properties file containing configuration options.

    5. Click the “Create” button

  • See the HTTP API documentation for more information.

    curl -u username:password -X POST \
    -F root='{"dbname":"myDatabase","options":{"search.enabled":true,"edge.properties":true}}' \
    http://localhost:5820/admin/databases
    
  • import com.complexible.stardog.api.admin.AdminConnection;
    import com.complexible.stardog.db.DatabaseOptions;
    import com.complexible.stardog.metadata.Metadata;
    import com.complexible.stardog.search.SearchOptions;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class CreateDatabaseWithConfigOptions {
    
        public static void main(String[] args){
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            Metadata metadata = Metadata.create()
                                        .set(SearchOptions.SEARCHABLE, true)
                                        .set(DatabaseOptions.EDGE_PROPERTIES, true);
    
            try(AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();){
                adminConnection.newDatabase("myDatabase").setAll(metadata).create();
            }
        }
    }
    

Database Creation Templates

The Stardog CLI and Studio allow you to pass a Java properties file containing database configuration options at database creation time. If the configuration option database.name is provided in the properties file it will override the name passed in at creation time.

  • A properties file can be provided to the db create command using the --config/-c option.

    $ cat database.properties
    database.name=myDatabase
    search.enabled=true
    edge.properties=true
    
    # -n/--name option can be omitted because 'database.name' is contained in database.properties
    stardog-admin db create -c database.properties
    Successfully created database 'myDatabase'.
    
  • In Stardog Studio:

    1. Navigate to the “Databases” section
    2. Click on the “Create Database” button
    3. Enter a name for the database if database.name is not defined in the properties file being used to configure the database
    4. Select the “Use File” radio button
    5. Select the properties file on your filesystem.
    6. Click the “Create” button
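
A properties file like the one shown above can also be generated programmatically with the JDK's java.util.Properties. The following is a minimal sketch: the class and method names are hypothetical, and the option values simply mirror the database.properties example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class DatabaseTemplate {

    // Collects the same options as the database.properties example above
    static Properties templateFor(String dbName) {
        Properties props = new Properties();
        props.setProperty("database.name", dbName);
        props.setProperty("search.enabled", "true");
        props.setProperty("edge.properties", "true");
        return props;
    }

    // Renders the options in Java properties format (key=value lines plus header comments)
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "Database creation template");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(templateFor("myDatabase")));
    }
}
```

Saving the rendered text as database.properties gives a file that can be passed to db create with the -c option described above.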

Bulk Loading Data at Creation Time

Stardog tries hard to do bulk loading at database creation time in the most efficient and scalable way possible.

To load data at creation time:

  • Files to be added to the database may be passed as final arguments to the db create command.

    • If a directory is passed as one of the final arguments, all the files in that directory and its child directories will be recursively loaded to the database.
    • Zip files will be uncompressed and the RDF files they contain will be loaded.
    • Files with unrecognized extensions, or that produce parse errors, will be (silently) ignored.
    • Named graphs can be specified with an @ sign preceding the graph IRI. All files after that graph will be loaded into that graph until another @graph is encountered. A single @ can be used to switch back to the default graph.

    By default, files are not copied to the remote server; only the paths are sent. If the files do not exist on the remote server, the --copy-server-side flag should be specified in order to copy them before creating the database and bulk loading the data.

    # load input0.ttl to the default graph, input1.ttl and input2.ttl to urn:stardog:graph:1, switch back to the default graph and load input3.ttl to it
    stardog-admin db create -n myDatabase input0.ttl @urn:stardog:graph:1 input1.ttl input2.ttl @ input3.ttl
    
  • Create a geospatial enabled database, and bulk load labels_en.nq.bz2 to the named graph some:graph and geo_coordinates_en.nq.bz2 to the default graph. Both of these files exist on the client machine and will be shipped to the server.

    curl -u username:password -X POST \
    -F root='{
      "dbname":"spatialDB",
      "options":{"spatial.enabled":true},
      "files":[{"filename":"labels_en.nq.bz2","context":"some:graph"},{"filename":"geo_coordinates_en.nq.bz2"}]
    }' \
    -F "geo_coordinates_en.nq.bz2"=@/path/to/geo_coordinates_en.nq.bz2 \
    -F "labels_en.nq.bz2"=@/path/to/labels_en.nq.bz2 \
    http://remote-server:5820/admin/databases
    

    Create a search enabled database, and bulk load /path/to/data1.ttl to the named graph some:graph and /path/to/data2.ttl to the default graph. Both of these files exist on the same machine Stardog is running on.

    curl -u admin:admin -X POST \
    -F root='{
      "dbname":"myDatabase",
      "options":{"search.enabled":true},
      "files":[{"filename":"/path/to/data1.ttl","context":"some:graph"},{"filename":"/path/to/data2.ttl"}]
    }' \
    http://localhost:5820/admin/databases
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.admin.AdminConnection;
    import com.google.common.collect.ImmutableMap;
    import com.stardog.stark.Resource;
    import com.stardog.stark.Values;
    
    import java.nio.file.Path;
    import java.nio.file.Paths;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class CreateDatabaseBulkLoad {
        
        public static void main(String[] args){
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            try(AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();){
    
                Resource g1 = Values.iri("urn:g1");
                Resource g2 = Values.iri("urn:g2");
                Path f1 = Paths.get("/path/to/data1.ttl");
                Path f2 = Paths.get("/path/to/data2.ttl");
                Path f3 = Paths.get("/path/to/data3.ttl");
    
                ImmutableMap<Path, Resource> contexts = ImmutableMap.of(f1, g1, f2, g2);
    
                // f1 is loaded into g1
                // f2 is loaded into g2
                // f3 is loaded into the default graph
                adminConnection.newDatabase("myDatabase").create(contexts::get, f1, f2, f3);
            }
        }
    }
    

    See com.complexible.stardog.api.admin.DatabaseBuilder#create for more information.

    If the files to be bulk loaded do not exist on the same machine as Stardog, use com.complexible.stardog.api.admin.DatabaseBuilder#copyServerSide to specify that the files should be first copied to the server.

It’s not currently possible to bulk load data at creation time via Stardog Studio.
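
The @ graph markers accepted by db create, described earlier in this section, can be modeled in a few lines. The following is an illustrative sketch of the documented behavior only, not Stardog's implementation: the class name, method name, and the "(default graph)" label are hypothetical. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GraphMarkers {

    static final String DEFAULT_GRAPH = "(default graph)";

    // Maps each target graph to the files assigned to it, in argument order
    static Map<String, List<String>> assign(List<String> args) {
        Map<String, List<String>> filesByGraph = new LinkedHashMap<>();
        String current = DEFAULT_GRAPH;
        for (String arg : args) {
            if (arg.startsWith("@")) {
                // "@<iri>" switches to that graph; a bare "@" switches back to the default graph
                current = arg.length() == 1 ? DEFAULT_GRAPH : arg.substring(1);
            } else {
                filesByGraph.computeIfAbsent(current, g -> new ArrayList<>()).add(arg);
            }
        }
        return filesByGraph;
    }

    public static void main(String[] args) {
        // Same argument list as the CLI example above
        System.out.println(assign(List.of(
            "input0.ttl", "@urn:stardog:graph:1", "input1.ttl", "input2.ttl", "@", "input3.ttl")));
        // prints {(default graph)=[input0.ttl, input3.ttl], urn:stardog:graph:1=[input1.ttl, input2.ttl]}
    }
}
```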

Tuning Bulk Loading Performance

Data loading time can vary widely depending on characteristics of the data to be loaded, such as the number of unique resources. Below are some tips to help you achieve the best bulk loading times:

  1. Copy or move the files to be loaded onto the same machine as Stardog. Copying the files from a client over a network will introduce overhead.
  2. In your stardog.properties file, set the memory.mode configuration option to a value of bulk_load:

     memory.mode=bulk_load
    

    Be sure to disable this option after bulk loading is complete. See Memory Configuration for more information.

  3. Load compressed data (GZIP, BZ2, ZIP) since compression minimizes disk access.
  4. Use a multicore machine since bulk loading is highly parallelized and database indexes are built concurrently.
  5. Load many files together at creation time, since different files will be parsed and processed concurrently, improving the load speed. Files using NTriples or NQuads format are parsed in multiple threads automatically without the need to split them into multiple files.

    The file split CLI utility can be used to split RDF files into smaller files.

  6. With caution, turn off the database configuration option strict.parsing.
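
As an illustration of tip 3, an RDF file can be gzip-compressed with the JDK alone before it is loaded. This is a minimal sketch, assuming Java 9 or later; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

class GzipRdfFile {

    // Writes a gzip-compressed copy of source to target
    static void gzip(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out);
        }
    }
}
```

For example, gzip(Paths.get("data.ttl"), Paths.get("data.ttl.gz")) would produce a compressed copy next to the original; both file names here are placeholders.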

Archetypes

A database archetype is a simple templating mechanism for bundling a set of namespaces, schemas and constraints to populate a newly created database. Archetypes are an easy way to register the namespaces, schemas and constraints for standardized vocabularies and ontologies with a database. Archetypes are composable, so multiple archetypes can be specified at database creation time to load all the defined namespaces, schemas and constraints into the database. Archetypes are intended to be used alongside your domain data, which may include as many other schemas and constraints as are required.

The only way of using archetypes is via the Stardog Archetype Repository which comes with archetypes for FOAF, SKOS and PROV. Follow the instructions on the GitHub repository for setting up and using archetypes.

Once the archetypes have been set up, you can use the following command to create a new database that will load the namespaces, schemas and constraints associated with an archetype:

stardog-admin db create -o database.archetypes="cim" -n myDatabase

Archetypes can be used as a predefined way of loading a schema and a set of constraints to the database just like any RDF data can be loaded to a database. The contents of the archetype will appear in the database under predefined named graphs, as explained next. These named graphs that are automatically created by archetypes can be queried and modified by the user as any other named graph after the database has been created.

Each archetype has a unique IRI identifying it and the schema contents of archetypes will be loaded into a named graph with that IRI. To see an example, follow the setup instructions to download the archetypes to ${STARDOG_HOME}/.archetypes and create a new database with the FOAF archetype:

stardog-admin db create -o database.archetypes="foaf" -n myDatabase

If you query the database you will see a named graph automatically created:

$ stardog query myDatabase "select distinct ?g { graph ?g { } }"
+----------------------------+
|             g              |
+----------------------------+
| http://xmlns.com/foaf/0.1/ |
+----------------------------+

Listing Databases

To list all of the databases in the Stardog server:

  • stardog-admin db list
    

    Output:

    +-------------+
    |  Databases  |
    +-------------+
    | db1         |
    | db2         |
    +-------------+
    

    See the db list command for more information.

  • In Stardog Studio:

    1. Navigate to the “Databases” section. The list of all databases the user has access to will appear in the left pane.

  • curl -u username:password http://localhost:5820/admin/databases
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.admin.AdminConnection;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class ListDatabases {
    
        public static void main(String[] args) {
    
            String serverUrl = "http://localhost:5820";
            String username = "admin";
            String password = "admin";
    
            try (AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();) {
                System.out.println(adminConnection.list());
            }
        }
    }
    

    See com.complexible.stardog.api.admin.AdminConnection#list for more information.

Database Status

One can obtain a status report for any database in the server. The status report contains the following information:

  • Database: name of the database
  • Status: whether the database is online/offline
  • Approx. Size: the approximate number of triples in the database
  • Queries: number of queries currently running
  • Open Connections: number of open connections to the database
  • Open Transactions: number of open transactions to the database
  • Query Avg. Time: average query execution time
  • Query Rate: number of queries executed per second
  • Plans Cached: number of query plans cached for the database
  • Plan Cache Hit Ratio: ratio to monitor hits/misses on the plan cache for the database

To obtain a status report for a database:

  • stardog-admin db status db1
    

    Output:

    Database             : db1
    Status               : Online
    Approx. size         : 0 triples
    Queries              : None running
    Open Connections     : 0
    Open Transactions    : 0
    Query Avg. Time      : 0.00 s
    Query Rate           : 0.00 queries/sec
    Plans Cached         : 3
    Plan Cache Hit Ratio : 57.14%
    

    See the db status command for more information.

  • In Stardog Studio:

    1. Navigate to the “Databases” section
      • Not all information obtainable via the db status CLI command is available in Studio. The listing of databases in the left pane shows the approximate number of triples in each database. If you click on a specific database in the listing and open the “Admin” tab, the number of running queries and the database’s status (offline/online) will be displayed.

Offline/Online a Database

Databases are either online or offline; this allows database maintenance to be decoupled from server maintenance.

  • Databases are put online or offline synchronously: these operations block until other database activity is completed or terminated.
  • If the Stardog server is shut down while a database is offline, the database will be offline when the server restarts.
  • Some database configuration options (e.g. search.enabled) require the database to be offline when the configuration option is set. See Getting and Setting Database Options for more information.

To offline a database:

  • stardog-admin db offline myDatabase
    

    See the db offline command for more information.

  • In Stardog Studio:

    1. Navigate to the “Databases” section.
    2. Select the database you wish to offline
    3. Toggle the switch from the online position to the offline position. The green dot that was previously displayed to the right of the database name should now be orange.

  • curl -u username:password -X PUT \
    http://localhost:5820/admin/databases/myDatabase/offline
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.admin.AdminConnection;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class OfflineDatabase {
    
        public static void main(String[] args) {
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            try (AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();) {
                adminConnection.offline("myDatabase");
            }
        }
    }
    

    See com.complexible.stardog.api.admin.AdminConnection#offline for more information.

To online a database:

  • stardog-admin db online myDatabase
    

    See the db online command for more information.

  • In Stardog Studio:

    1. Navigate to the “Databases” section.
    2. Select the database you wish to online
    3. Toggle the switch from the offline position to the online position. The orange dot that was previously displayed to the right of the database name should now be green.

  • curl -u username:password -X PUT \
    http://localhost:5820/admin/databases/myDatabase/online
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.admin.AdminConnection;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class OnlineDatabase {
    
        public static void main(String[] args) {
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            try (AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();) {
                adminConnection.online("myDatabase");
            }
        }
    }
    

    See com.complexible.stardog.api.admin.AdminConnection#online for more information.
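
The online and offline HTTP calls shown above can also be issued without curl. The following is a minimal sketch using the JDK's java.net.http client (Java 11 or later) against the same endpoints as the curl examples. The class and method names are hypothetical, and sending the credentials as an HTTP Basic Authorization header is an assumption modeled on curl's -u option.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DatabaseStateRequest {

    // Builds PUT {server}/admin/databases/{db}/online or .../offline
    static HttpRequest build(String serverUrl, String db, boolean online, String user, String password) {
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        String action = online ? "online" : "offline";
        return HttpRequest.newBuilder()
            .uri(URI.create(serverUrl + "/admin/databases/" + db + "/" + action))
            .header("Authorization", "Basic " + token)
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // Sends the request and returns the HTTP status code
    static int execute(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For example, execute(build("http://localhost:5820", "myDatabase", false, "username", "password")) would request that myDatabase be taken offline.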

Namespaces

Stardog allows database administrators to persist and manage custom namespace prefix bindings.

At database creation time, if data is loaded to the database that has namespace prefixes, then those are persisted for the life of the database. This includes setting the default namespace to the default that appears in the file. Any subsequent queries to the database may simply omit the PREFIX declarations.

If no files are used during database creation, or if the files do not define any prefixes (e.g. NTriples), then the following prefixes are stored:

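+------------------+---------------------------------------------+
|      Prefix      |                     IRI                     |
+------------------+---------------------------------------------+
| (default prefix) | http://api.stardog.com/                     |
| rdf              | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs             | http://www.w3.org/2000/01/rdf-schema#       |
| xsd              | http://www.w3.org/2001/XMLSchema#           |
| owl              | http://www.w3.org/2002/07/owl#              |
| stardog          | tag:stardog:api:                            |
+------------------+---------------------------------------------+
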
  • When executing queries in the CLI, the default table format for SPARQL SELECT results will use the bindings as qnames. SPARQL CONSTRUCT query output (including export) will also use the stored prefixes. To reiterate, namespace prefix bindings are per database, not global.

    Suppose you have a database movies in which the prefix n is bound to the IRI http://www.imdb.com/name/. Query results then use that prefix:

    $ stardog query execute movies "select * { ?s rdf:type :Person } limit 5"
    +-------------+
    |      s      |
    +-------------+
    | n:nm0000001 |
    | n:nm0000002 |
    | n:nm0000003 |
    | n:nm0000004 |
    | n:nm0000005 |
    +-------------+
    

    The result set above uses the binding n as the qname for each result.


    To add new bindings, use the namespace add command:

    stardog namespace add movies --prefix n --uri 'http://www.imdb.com/name/'
    

    To change the default binding, use an empty quoted prefix ("") when adding a new one:

    stardog namespace add movies --prefix "" --uri 'http://new.default'
    

    To change an existing binding, remove the existing one using the namespace remove command and then add a new one:

    stardog namespace remove movies --prefix ex && stardog namespace add movies --prefix "ex" --uri 'http://another.iri'
    

    To list all namespace prefix bindings use the namespace list command:

    $ stardog namespace list movies
    +---------+---------------------------------------------+
    | Prefix  |                  Namespace                  |
    +---------+---------------------------------------------+
    |         | http://schema.org/                          |
    | ex      | http://some.iri                             |
    | foaf    | http://xmlns.com/foaf/0.1/                  |
    | geo     | http://www.w3.org/2003/01/geo/wgs84_pos#    |
    | n       | http://www.imdb.com/name/                   |
    | owl     | http://www.w3.org/2002/07/owl#              |
    | rdf     | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
    | rdfs    | http://www.w3.org/2000/01/rdf-schema#       |
    | sfn     | tag:stardog:api:functions:                  |
    | spf     | tag:stardog:api:property:                   |
    | stardog | tag:stardog:api:                            |
    | t       | http://www.imdb.com/title/                  |
    | xml     | http://www.w3.org/XML/1998/namespace        |
    | xsd     | http://www.w3.org/2001/XMLSchema#           |
    +---------+---------------------------------------------+
    

    To export all namespace prefix bindings in the database, use the namespace export command:

    # saving the namespaces (exported in Turtle by default) to a file prefixes.ttl
    stardog namespace export movies > prefixes.ttl
    

    To import namespace prefixes from an RDF file that contains prefix declarations into the database, use the namespace import command:

    stardog namespace import -- newDatabase /path/to/prefixes.ttl
    

    Any prefix imported will override any previous mappings for the prefix sharing the same name.

  • In Stardog Studio:

    1. Navigate to the “Databases” section.
    2. Select the database you wish to see all namespace prefix bindings for
    3. Select the “Namespaces” tab
      • From this view, you can edit, add, or remove any prefix binding. Be sure to click the “Save” button in the top right corner after making any changes you wish to persist.
      • There are two buttons to import new prefix bindings and export the existing ones.

  • To retrieve the namespaces stored in the database:

    curl -u username:password http://localhost:5820/movies/namespaces
    

    See the HTTP API for more information.


    To import namespaces stored in an RDF file:

    curl -u username:password -X POST \
    -F name=@path/to/prefixes.ttl \
    http://localhost:5820/movies/namespaces
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.Connection;
    import com.complexible.stardog.api.ConnectionConfiguration;
    import com.stardog.stark.Namespace;
    import com.stardog.stark.io.turtle.TurtleUtil;
    
    import java.io.PrintStream;
    import java.util.Optional;
    
    public class ManagingNamespaces {
    
        public static void main(String[] args) {
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
    
            try (Connection connection = ConnectionConfiguration.to("movies").server(serverUrl).credentials(username, password).connect()){
    
                // Add a namespace
                connection.namespaces().add("somePrefix", "http://some.iri");
    
                // Given an IRI, get the corresponding prefix
                Optional<String> thePrefix = connection.namespaces().prefix("http://some.iri");
                System.out.println(thePrefix.get());
    
                // Given a prefix, get the corresponding IRI
                Optional<String> theIRI = connection.namespaces().iri("somePrefix");
                System.out.println(theIRI.get());
    
                // Remove a namespace
                connection.namespaces().remove("somePrefix");
    
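                // List/export namespaces in Turtle (@prefix declarations).
                // System.out is not opened in try-with-resources, which would close standard output.
                PrintStream out = System.out;
                for(Namespace ns : connection.namespaces()){
                    out.print("@prefix ");
                    out.print(ns.prefix());
                    out.print(": <");
                    out.print(TurtleUtil.encodeURIString(ns.iri()));
                    out.print("> .");
                    out.println();
                }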
            }
        }
    }
    

    See com.complexible.stardog.api.Connection#namespaces for more information.
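
The qname abbreviation seen in the query results earlier in this section can be pictured with a small, self-contained model. This is an illustrative sketch, not Stardog's implementation: the class and method names are hypothetical, and preferring the longest matching namespace is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

class QNames {

    // Abbreviates an IRI using the longest matching namespace, or returns it unchanged
    static String abbreviate(Map<String, String> prefixToNamespace, String iri) {
        String bestPrefix = null;
        String bestNamespace = "";
        for (Map.Entry<String, String> e : prefixToNamespace.entrySet()) {
            String ns = e.getValue();
            if (iri.startsWith(ns) && ns.length() > bestNamespace.length()) {
                bestPrefix = e.getKey();
                bestNamespace = ns;
            }
        }
        return bestPrefix == null ? iri : bestPrefix + ":" + iri.substring(bestNamespace.length());
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new TreeMap<>();
        bindings.put("n", "http://www.imdb.com/name/");
        bindings.put("t", "http://www.imdb.com/title/");
        System.out.println(abbreviate(bindings, "http://www.imdb.com/name/nm0000001")); // n:nm0000001
        System.out.println(abbreviate(bindings, "http://example.org/x"));               // unchanged
    }
}
```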

Dropping a Database

Dropping a database deletes the database, all associated files, and metadata. This means all files on disk related to the database will be deleted, so please use with caution.

  • Provide the database name as the only argument to the db drop command:

    stardog-admin db drop myDatabase
    
  • In Stardog Studio:

    1. Navigate to the “Databases” section
    2. Select the database to be dropped
    3. Click on “Drop Database”. Confirm you do indeed want to drop this database.

  • curl -u username:password -X DELETE \
    http://localhost:5820/admin/databases/myDatabase
    

    See the HTTP API for more information.

  • import com.complexible.stardog.api.admin.AdminConnection;
    
    import static com.complexible.stardog.api.admin.AdminConnectionConfiguration.toServer;
    
    public class DropDatabase {
    
        public static void main(String[] args) {
    
            String serverUrl = "http://localhost:5820";
            String username = "username";
            String password = "password";
            String db = "myDatabase";
    
            try (AdminConnection adminConnection = toServer(serverUrl).credentials(username, password).connect();) {
                if(adminConnection.list().contains(db)){
                    adminConnection.drop(db);
                }
            }
        }
    }
    

    See com.complexible.stardog.api.admin.AdminConnection#drop for more information.

Repairing a Database

Stardog data storage has been designed to be resilient to software and hardware failures. If a transaction fails or the server crashes, the integrity of the stored data should not be affected. However, unforeseen issues might leave database storage in a state where it needs to be repaired manually. This section explains the high-level storage components, the corruption issues that might occur, and the ways to resolve them.

Stardog database storage has two main components: dictionary and index. The dictionary is a bidirectional mapping between the RDF values (IRIs, bnodes, literals) and 64-bit integers that are used as IDs. The index component uses these IDs to store the RDF statements as quads (triple plus the named graph ID). So the dictionary is essential to turn the quads stored in the index to user-visible RDF statements.

There are two categories of database corruption: logical and physical. Physical corruption is the case where files stored on disk for the dictionary or the index have been deleted or damaged in some way. Logical corruption is when the files on disk are valid but the contents of the dictionary or the index have issues. These cases are described in more detail in the following sections.

Dictionary corruption

There are two different ways a dictionary might be corrupted:

  1. Incomplete: If there is an ID in the index that does not exist in the dictionary the dictionary is called incomplete. The quads in the index using those IDs cannot be recovered.
  2. Inconsistent: Since the dictionary is bidirectional, the entries in both directions of the mapping should be exactly the same. For example, if we have the urn:example → 1234 mapping in one direction and the 2345 → urn:example mapping in the other direction, the dictionary is inconsistent. Sometimes one side of the mapping might be missing an entry completely. Inconsistent dictionaries can be repaired fully.

If a dictionary is both incomplete and inconsistent it is called invalid.
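
The three states can be made concrete with a toy model in which the dictionary is a pair of in-memory maps. This is an illustrative sketch only and says nothing about how Stardog stores or verifies its dictionary. The class, enum, and method names are hypothetical, and treating an ID as known when it appears in either direction is an assumption. It assumes Java 9 or later.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DictionaryCheck {

    enum Status { VALID, INCOMPLETE, INCONSISTENT, INVALID }

    static Status check(Map<String, Long> valueToId, Map<Long, String> idToValue, Set<Long> idsInIndex) {
        // Incomplete: the index uses an ID that neither direction of the dictionary knows
        Set<Long> knownIds = new HashSet<>(idToValue.keySet());
        knownIds.addAll(valueToId.values());
        boolean incomplete = !knownIds.containsAll(idsInIndex);

        // Inconsistent: the two directions do not mirror each other exactly
        boolean inconsistent = valueToId.size() != idToValue.size();
        for (Map.Entry<String, Long> e : valueToId.entrySet()) {
            if (!e.getKey().equals(idToValue.get(e.getValue()))) {
                inconsistent = true;
            }
        }

        if (incomplete && inconsistent) return Status.INVALID;
        if (incomplete) return Status.INCOMPLETE;
        return inconsistent ? Status.INCONSISTENT : Status.VALID;
    }

    public static void main(String[] args) {
        // The example from the text: urn:example -> 1234 one way, 2345 -> urn:example the other
        System.out.println(check(
            Map.of("urn:example", 1234L), Map.of(2345L, "urn:example"), Set.of(1234L)));
        // prints INCONSISTENT
    }
}
```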

Symptoms

The most common symptom of dictionary corruption is a NullPointerException during a dictionary lookup (this would happen during query execution). The definitive way to check for dictionary corruption is the db verify command. Example outputs in the case of dictionary corruption look like this:

    Database test is not valid
    Index: Valid Count=5 Hash=bb5d5605b6b91a20
    Dictionary: Inconsistent
    The database can be repaired fully using the SPO index order.

    Database test is not valid
    Index: Valid Count=5 Hash=38c97a47f0f26ecf
    Dictionary: Incomplete
    The database can be repaired partially using the SPO index order.

Recovery

If you have a recent backup of the database, restoring the database from the backup is the simplest resolution. After the restore is complete, double check the database validity by running db verify again to make sure your backup was created from a valid state. The changes that have happened since the last backup will be lost as a result of this step.

If you do not have a recent backup or you would like to recover the updates that occurred since the last backup, you can try the manual recovery steps as outlined here:

  1. Back up the database. The db backup command has a --repair option that can repair the corruption while creating a backup from an invalid state. If the db verify output indicates that the dictionary is inconsistent, use the following command to create a new repaired backup:

     stardog-admin db backup --repair DB-NAME
    

    When the --repair argument is used, a special backup process resolves the dictionary inconsistency. If the dictionary is incomplete, the command above fails with an error message like the following:

     Error creating the backup, use partial backup option: ...
    

    The correct command to use in this case is:

     stardog-admin db backup --partial --repair DB-NAME
    

    When this command completes it will print output that looks as follows:

     Database test backed up X triples (Y errors) to <backup-location>
    

    This output means that Y triples were not recoverable and only X triples are included in the backup. For dictionaries that are inconsistent but complete, Y is expected to be 0, so that part of the message will be omitted.

  2. Manual Inspection [Optional - Recommended] After the repaired backup is created it is recommended to restore it under a new name as a dry-run and optionally inspect database contents manually. Restoring a backup under a new name can be done with the following command:

     stardog-admin db restore -n NEW-NAME <backup-location>
    

    This temporary database can be dropped after manual inspection:

     stardog-admin db drop NEW-NAME
    
  3. Restore the database The backup can now be restored to overwrite the corrupted database:

     stardog-admin db restore --overwrite <backup-location>
    

    The restore process drops the existing database first so the database will not be available until the restore operation finishes.
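The choice of backup command in step 1 can be sketched as a small helper. This is an illustration only: the function name `backup_command_for` is hypothetical, the state strings are assumed to match the Dictionary line of the db verify output, and the function only prints the command from this page instead of running it.

```shell
# Print the backup command from step 1 that matches a dictionary state.
# Arguments: database name, dictionary state as reported by db verify.
backup_command_for() {
  db_name=$1
  state=$2
  case "$state" in
    Inconsistent)
      # Inconsistent but complete: a repaired backup is enough.
      echo "stardog-admin db backup --repair $db_name" ;;
    Incomplete)
      # Incomplete: the plain --repair backup fails, so --partial is needed.
      echo "stardog-admin db backup --partial --repair $db_name" ;;
    *)
      # Any other state is left to the operator (assumption of this sketch).
      echo "unrecognized dictionary state: $state" >&2
      return 1 ;;
  esac
}

backup_command_for test Incomplete
# prints: stardog-admin db backup --partial --repair test
```

Printing the command keeps the sketch free of side effects; the operator can review it before running the backup.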

Index corruption

Stardog maintains eight different indexes to allow efficient answering of different query patterns. Each index contains the same set of triples (or, more correctly, quads) sorted in a different order. The indexes are named by their sort order: SPO, PSO, POS, OSP, SPOC, PSOC, POSC, OSPC. If these index orders get out of sync with each other for any reason, the index is considered to be corrupted.

Symptoms When indexes are out of sync, users may see different results for queries that are supposed to return the same results, because the query optimizer might choose a different index order for each query. Running a clear graph query might not clear a graph, because the clear operation first uses one of the indexes to read which triples will be removed.

As with dictionary corruption, the definitive way to confirm index corruption is to use the db verify command. The following is an example output showing index corruption:

Database is not valid
Index:
SPO    : Count=180,145,289 Hash=99ebbce4d2933432
PSO    : Count=180,145,289 Hash=99ebbce4d2933432
POS    : Count=180,145,289 Hash=99ebbce4d2933432
OSP    : Count=180,145,289 Hash=99ebbce4d2933432
SPOC   : Count=180,145,289 Hash=99ebbce4d2933432
PSOC   : Count=180,145,482 Hash=ab2172475f647001
POSC   : Count=180,145,289 Hash=99ebbce4d2933432
OSPC   : Count=180,145,289 Hash=99ebbce4d2933432
Dictionary: Valid
The database can be repaired partially. See the report above to choose an index order.

The command outputs the number of quads found in each index and a hash value (checksum) computed from the index contents. Two index orders can contain the same number of quads but not the same set of quads; in that case their hash values differ and the index is considered corrupted.
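Comparing eight hashes by eye is error-prone, so the comparison can be scripted. The following is a minimal sketch, assuming the per-index line format shown in the example output; the function name `outlier_indexes` is hypothetical. It prints the index orders whose hash differs from the hash shared by a strict majority, and fails when no such majority exists.

```shell
# Read db verify output on stdin and print the index orders whose hash
# differs from the hash reported by a strict majority of index orders.
outlier_indexes() {
  awk '
    /Hash=/ {
      name = $1
      hash = ""
      # Find the Hash=... field and strip the "Hash=" prefix.
      for (i = 2; i <= NF; i++) if ($i ~ /^Hash=/) hash = substr($i, 6)
      hash_of[name] = hash
      votes[hash]++
      order[++n] = name
    }
    END {
      for (h in votes) if (votes[h] > top) { top = votes[h]; best = h }
      if (top * 2 <= n) {
        print "no clear majority among index hashes" > "/dev/stderr"
        exit 1
      }
      for (i = 1; i <= n; i++) if (hash_of[order[i]] != best) print order[i]
    }
  '
}

# Demo using the example output from this page.
outlier_indexes <<'EOF'
SPO    : Count=180,145,289 Hash=99ebbce4d2933432
PSO    : Count=180,145,289 Hash=99ebbce4d2933432
POS    : Count=180,145,289 Hash=99ebbce4d2933432
OSP    : Count=180,145,289 Hash=99ebbce4d2933432
SPOC   : Count=180,145,289 Hash=99ebbce4d2933432
PSOC   : Count=180,145,482 Hash=ab2172475f647001
POSC   : Count=180,145,289 Hash=99ebbce4d2933432
OSPC   : Count=180,145,289 Hash=99ebbce4d2933432
EOF
# prints: PSOC
```

The sketch compares hashes only; matching counts with differing hashes still show up as outliers, which is consistent with the definition of corruption above.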

Recovery The process to repair a database with index corruption follows similar steps to fixing the dictionary corruption problem explained above but it is important to choose a correct index for recovery.

  1. Choose Index The repair process will completely recover all the triples from the index the user specifies. Repairing index corruption is always considered a partial repair, because a user most likely cannot tell which index order has the correct set of triples. There are two strategies for choosing an index order:

    1. If the majority of the indexes have the same size and hash, it is safe to assume that set of triples is the correct one, and any of those indexes can be used for repair. In the above example, most indexes agree with each other and only PSOC differs so using any other index would be acceptable.
    2. In cases where indexes have wildly different numbers of results and no clear consensus, then it might be preferable to pick the index with the largest number of triples and then use a manual process to get rid of extra triples.
  2. Back up the Database Once the index order for recovery is selected, you need to follow similar steps to what was outlined above to do a backup and restore. The only parameter that needs to be supplied is the --index parameter:

     stardog-admin db backup --index INDEX-NAME DB-NAME
    

    where INDEX-NAME is one of SPO, PSO, POS, OSP, SPOC, PSOC, POSC, OSPC and is selected based on the instructions from the previous step. If no index order is specified, SPOC is used by default.

    The command output should look like this:

     Database DB-NAME backed up X triples to <backup-location>
    
  3. Manual Inspection [Optional - Recommended] Again it is recommended that the backup is first restored under a new name as a dry-run and database contents inspected manually. Restoring a backup under a new name can be done with the following command:

     stardog-admin db restore -n NEW-NAME <backup-location>
    
  4. This temporary database can be dropped after manual inspection and the backup can be restored to overwrite the corrupted database:

     stardog-admin db restore --overwrite <backup-location>
    

Physical corruption

This is the case where files stored on disk have been deleted or damaged in some way. Physical corruption can happen due to hardware malfunction or a user manually deleting files by mistake. If two Stardog instances run against the same STARDOG_HOME this might also happen. Stardog uses filesystem-level locks to prevent this, and there are two lock files used for this purpose: STARDOG_HOME/system.lock and STARDOG_HOME/data/LOCK. If the user manually deletes these files while a server is running, the second Stardog instance would be able to start and potentially corrupt the files on disk.

Symptoms The database might be completely missing from the list of available databases. If this is the case, during server start-up one of the following errors will be logged in stardog.log:

Database X will not be present in the system because of an initialization error. The data is not deleted.
X is put offline due to an initialization error

In some cases, the database will be available but trying to read from or write to the database will trigger errors as follows:

Caused by: com.stardog.starrocks.CorruptionException: Corruption: Can't access /007226.sst: IO error: while stat a file for size: /var/opt/stardog/data/007226.sst: No such file or directory
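A quick way to look for these symptoms is to search the log for the messages quoted above. The following is a minimal sketch; the function name `scan_stardog_log` is hypothetical, the log path is supplied by the caller, and it is an assumption that CorruptionException errors are also written to stardog.log.

```shell
# Print log lines (with line numbers) that match the physical-corruption
# messages quoted on this page. Returns non-zero when nothing matches.
scan_stardog_log() {
  log_file=$1
  grep -n -E \
    -e 'will not be present in the system because of an initialization error' \
    -e 'is put offline due to an initialization error' \
    -e 'CorruptionException' \
    "$log_file"
}
```

For example, `scan_stardog_log "$STARDOG_HOME/stardog.log"` would scan the server log, assuming it lives directly under STARDOG_HOME in your installation.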

Recovery Recovering from physical corruption is difficult. Restoring a recent server backup is typically the best course of action. Reach out to Stardog Support to discuss other possibilities.