Frequently Asked Questions
This page provides answers to some frequently asked questions.
Page Contents
- Why can’t I load DBpedia (or other RDF) data?
- Why doesn’t search work?
- Why don’t my queries work?!
- Why is Stardog Cluster acting weird or running slowly?
- Update Performance
- Public Endpoint
- Remote Bulk Loading
- Canonicalized Literals
- Cluster Isn’t Working
- Client Connection Isn’t Working
- Logging
- Loading Compressed Data
- Working with RDF Files
- Virtual Graph Identifier Quoting
- Virtual Graph Table not Found
- Virtual Graphs over MarkLogic
Why can’t I load DBpedia (or other RDF) data?
Question
I get a parsing error when loading DBpedia or some other RDF. What can I do?
Answer
First, it’s not a bad thing to expect data providers to publish valid data. Second, it is, apparently, a very naive thing to expect data providers to publish valid data…
Stardog supports a loose parsing mode that will ignore certain kinds of data invalidity and may allow you to load invalid data. See the strict.parsing option in the Database Configuration Options.
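For example, loose parsing can be enabled at bulk loading time (a sketch; the database name and file are placeholders):
$ stardog-admin db create -n myDb -o strict.parsing=false -- invalid-data.rdf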
Why doesn’t search work?
Question
I created a database but search doesn’t work.
Answer
Search is disabled by default; you can enable it at database creation time when using db create. You can also enable it at any subsequent time by setting the search.enabled database property using the metadata set CLI command.
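For example (a sketch; the database name and data file are placeholders):
$ stardog-admin db create -n myDb -o search.enabled=true -- data.ttl
$ stardog-admin metadata set -o search.enabled=true myDb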
Why don’t my queries work?!
Question
I’ve got some named graphs and my queries don’t work!
Answer
A query with FROM NAMED naming a graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query. Stardog will only evaluate queries over data that has been loaded into it.
SPARQL queries without a context or named graph are executed against the default, unnamed graph. In Stardog, the default graph is not the union of all the named graphs and the default graph. This behavior is configurable via the query.all.graphs configuration parameter.
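For example, to have queries evaluate over the default graph and all named graphs (a sketch; the database name is a placeholder):
$ stardog-admin metadata set -o query.all.graphs=true myDb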
Why is Stardog Cluster acting weird or running slowly?
Question
Should I put Stardog HA and Zookeeper on the same hard drives?
Answer
Never do this! Zookeeper is disk-intensive, and its I/O contends badly with Stardog query evaluation. Running both Zookeeper and Stardog on the same disks will result in bad performance and, in some cases, intermittent failures.
Update Performance
Question
I’m adding one triple at a time, in a tight loop, to Stardog; is this the ideal strategy with respect to performance?
Answer
The answer is “not really”… Update performance is best if there are fewer transactions that each modify a larger number of triples. If you are using the Stardog Java API, the client will buffer changes in large transactions and flush the buffer periodically, so you don’t need to worry about memory problems. If you need transactions with a small number of triples, you may need to experiment to find the sweet spot with respect to your data, database size, the size of the differential index, and update frequency.
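The same principle applies on the CLI: passing many files to a single data add invocation uses one transaction, while invoking it once per file creates one small transaction each time (a sketch; database and file names are placeholders):
$ stardog data add myDb file1.ttl file2.ttl file3.ttl
$ for f in *.ttl; do stardog data add myDb "$f"; done    # one transaction per file; slower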
Public Endpoint
Question
I want to use Stardog to serve a public SPARQL endpoint; is there some way I can do this without publishing user account information?
Answer
We don’t necessarily recommend this, but it’s possible. Simply pass --disable-security to stardog-admin when you start the Stardog Server. This completely disables security in Stardog, which will let users access the SPARQL endpoint, and all other functionality, without needing authorization.
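For example:
$ stardog-admin server start --disable-security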
Remote Bulk Loading
Question
I’m trying to create a database and bulk load files from my machine to the server and it’s not working, the files don’t seem to load, what gives?
Answer
Stardog does not transfer files to the server during database creation; sending big files over a network kind of defeats the purpose of blazing fast bulk loading. If you want to bulk load files from your machine to a remote server, copy them to the server first and bulk load them there.
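For example (a sketch; the host, paths, and database name are placeholders):
$ scp data.ttl.gz user@stardog-server:/tmp/
$ ssh user@stardog-server stardog-admin db create -n myDb /tmp/data.ttl.gz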
Canonicalized Literals
Question
Why doesn’t my literal look the same as when I added it to Stardog?
Answer
Stardog performs literal canonicalization by default. This can be turned off by setting index.literals.canonical to false.
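Since this is an index option, it is typically set at database creation time (a sketch; the database name is a placeholder):
$ stardog-admin db create -n myDb -o index.literals.canonical=false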
Cluster Isn’t Working
Question
I’ve set up Stardog Cluster, but it isn’t working and I have NoRouteToHostException exceptions all over my Zookeeper log.
Answer
Typically, and especially on Red Hat Linux and its variants, this means that iptables is blocking one, some, or all of the ports that the Cluster is trying to use. You can disable iptables or, better yet, configure it to unblock the ports the Cluster is using.
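For example, Zookeeper’s default ports are 2181, 2888, and 3888, and Stardog’s default port is 5820; the following sketch opens them (adjust to the ports in your configuration):
$ sudo iptables -I INPUT -p tcp --dport 2181 -j ACCEPT    # Zookeeper client port
$ sudo iptables -I INPUT -p tcp --dport 2888 -j ACCEPT    # Zookeeper peer port
$ sudo iptables -I INPUT -p tcp --dport 3888 -j ACCEPT    # Zookeeper leader election
$ sudo iptables -I INPUT -p tcp --dport 5820 -j ACCEPT    # Stardog server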
Client Connection Isn’t Working
Question
I’m getting a ServiceConfigurationError saying that SNARLDriver could not be instantiated.
Answer
Make sure that your classpath includes all Stardog JARs and that the user executing your code has access to them.
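A quick sanity check when launching your application (a sketch; the JAR directory is an assumption, so use wherever your Stardog client JARs actually live):
$ java -cp "/path/to/stardog/client/*:myapp.jar" com.example.MyApp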
Logging
Question
Why doesn’t Stardog implement our corporate logging scheme?
Answer
Stardog will log to $STARDOG_HOME/stardog.log by default, but you can put a log4j2 config file in $STARDOG_HOME so that Stardog will log wherever and however you want. The default configuration file can be found at $STARDOG/server/dbms/log4j2.xml.
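A common approach is to start from the default config (a sketch; assumes the default locations given above):
$ cp $STARDOG/server/dbms/log4j2.xml $STARDOG_HOME/log4j2.xml
$ # edit $STARDOG_HOME/log4j2.xml to match your logging scheme, then restart:
$ stardog-admin server stop
$ stardog-admin server start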
Loading Compressed Data
Question
How can I load data from a compressed format that Stardog doesn’t support without decompressing the file?
Answer
Stardog supports several compression formats by default (zip, gzip, bzip2), so files compressed with those formats can be passed as input directly, without decompression. Files compressed with other formats can also be loaded into Stardog by decompressing them on the fly using named pipes on Unix-like systems. The following example shows a named pipe where the decompressed data is sent directly to Stardog without being written to disk.
$ mkfifo some-data.rdf                       # create a named pipe
$ xz -dc some-data.rdf.xz > some-data.rdf &  # decompress into the pipe in the background
$ stardog-admin db create -n test some-data.rdf
Working with RDF Files
Question
I have some RDF files that I need to process without loading into Stardog. What can I do?
Answer
Stardog provides some CLI commands that work directly over files. These commands exist under the stardog file command group. For example, you can use the file cat command to concatenate multiple RDF files into a single file and the file split command to split a single RDF file into multiple RDF files. These commands are similar to their *nix counterparts but can handle RDF formats and perform compression/decompression on the fly. There is also the file obfuscate command, which can create an obfuscated version of the input RDF files, similar to the data obfuscate command.
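For example (a sketch; file names are placeholders, and the exact output options may differ, so check the command’s help output):
$ stardog file cat part1.ttl part2.ttl.gz > all.ttl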
Virtual Graph Identifier Quoting
Question
How do I quote field and table names in mappings and when should I do it?
Answer
Interpretation of identifiers follows that of the database system backing the virtual graph. For example, Oracle interprets unquoted identifiers as uppercase, while PostgreSQL interprets unquoted identifiers as lowercase. In general, if you need to quote the identifier in a query, then you should quote it in a mapping.
Quoting is done using the native quoting character of the database: double quotes for Oracle, PostgreSQL, and other SQL standard-compatible systems; backquotes for MySQL; and square brackets for SQL Server. This setting can be overridden by adding parser.sql.quoting=ANSI to your virtual graph properties file, which allows the use of double quotes to quote identifiers. This is commonly done to write mappings using the R2RML convention of double quotes and to support mappings generated by other systems.
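For example (a sketch; the properties file name is a placeholder):
$ cat myvg.properties
parser.sql.quoting=ANSI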
Virtual Graph Table not Found
Question
Why am I getting an error when I try to create a virtual graph?
Unable to parse logical table [[some_table]]: From line 1, column 15 to line 1, column 18: Object 'SOME_TABLE' not found
Answer
The virtual graph subsystem maintains a set of metadata, including a list of tables and the types of their fields. If a table is not found, it’s likely that it either needs to be quoted or the schema needs to be added to the search path by adding sql.schemas to your virtual graph properties file. This setting enables Stardog to see the table metadata in the named schemas. The table/query still needs to be qualified with the schema name when referring to it.
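For example (a sketch; the file and schema names are placeholders, and a comma-separated list of schemas is an assumption):
$ cat myvg.properties
sql.schemas=HR,SALES
The table would then be referred to as HR.SOME_TABLE in mappings and queries.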
The virtual source_metadata command can be used to inspect the metadata returned by the JDBC driver.
Virtual Graphs over MarkLogic
Question
How do I create a virtual graph over MarkLogic? An error is returned:
org.postgresql.util.PSQLException: ERROR: XDMP-UNDFUN: (err:XPST0017) Undefined function current_schema()
Answer
Stardog requires the schema to be provided. This cannot be done automatically with MarkLogic and should be set using the sql.default.schema option for the virtual graph. The schema name is defined when creating the view in MarkLogic.
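For example (a sketch; the file name is a placeholder, and the schema name should be the one used when creating the view in MarkLogic):
$ cat myvg.properties
sql.default.schema=my_schema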