Frequently Asked Questions
This page provides answers to some frequently asked questions.
Page Contents
- Why can’t I load DBpedia (or other RDF) data?
- Why doesn’t search work?
- Why don’t my queries work?!
- Why is Stardog Cluster acting weird or running slowly?
- Update Performance
- Public Endpoint
- Remote Bulk Loading
- Canonicalized Literals
- Cluster Isn’t Working
- Client Connection Isn’t Working
- Logging
- Loading Compressed Data
- Working with RDF Files
- Virtual Graph Identifier Quoting
- Virtual Graph Table not Found
- Virtual Graphs over MarkLogic
Why can’t I load DBpedia (or other RDF) data?
Question
I get a parsing error when loading DBpedia or some other RDF. What can I do?
Answer
First, it’s not a bad thing to expect data providers to publish valid data. Second, it is, apparently, a very naive thing to expect data providers to publish valid data…
Stardog supports a loose parsing mode that will ignore certain kinds of data invalidity and may allow you to load invalid data. See the strict.parsing option in the Database Configuration Options.
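For example, loose parsing can be enabled at bulk loading time (a sketch; the database name and file are placeholders):
$ stardog-admin db create -n myDb -o strict.parsing=false -- invalid-data.rdf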
Why doesn’t search work?
Question
I created a database but search doesn’t work.
Answer
Search is disabled by default; you can enable it at database creation time when using db create. You can also enable it at any subsequent time by setting the search.enabled database property using the metadata set CLI command.
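For example (a sketch; the database name and data file are placeholders):
$ stardog-admin db create -n myDb -o search.enabled=true -- data.ttl
$ stardog-admin metadata set -o search.enabled=true myDb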
Why don’t my queries work?!
Question
I’ve got some named graphs and my queries don’t work!
Answer
A query with FROM NAMED naming a graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query. Stardog will only evaluate queries over data that has been loaded into it.
SPARQL queries without a context or named graph are executed against the default, unnamed graph. In Stardog, the default graph is not the union of all the named graphs and the default graph. This behavior is configurable via the query.all.graphs configuration parameter.
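For example, to have queries evaluate over the default graph and all named graphs (a sketch; the database name is a placeholder):
$ stardog-admin metadata set -o query.all.graphs=true myDb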
Why is Stardog Cluster acting weird or running slowly?
Question
Should I put Stardog HA and Zookeeper on the same hard drives?
Answer
Never do this! Zookeeper is disk-intensive, and its I/O contends badly with Stardog query evaluation. Running both Zookeeper and Stardog on the same disks will result in bad performance and, in some cases, intermittent failures.
Update Performance
Question
I’m adding one triple at a time, in a tight loop, to Stardog; is this the ideal strategy with respect to performance?
Answer
The answer is “not really”… Update performance is best if there are fewer transactions that each modify a larger number of triples. If you are using the Stardog Java API, the client will buffer changes in large transactions and flush the buffer periodically, so you don’t need to worry about memory problems. If you need transactions with a small number of triples, you may need to experiment to find the sweet spot with respect to your data, database size, the size of the differential index, and update frequency.
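The same principle applies on the CLI: passing many files to a single data add invocation uses one transaction, while invoking it once per file creates one small transaction each time (a sketch; database and file names are placeholders):
$ stardog data add myDb file1.ttl file2.ttl file3.ttl
$ for f in *.ttl; do stardog data add myDb "$f"; done    # one transaction per file; slower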
Public Endpoint
Question
I want to use Stardog to serve a public SPARQL endpoint; is there some way I can do this without publishing user account information?
Answer
We don’t necessarily recommend this, but it’s possible. Simply pass --disable-security to stardog-admin when you start the Stardog Server. This completely disables security in Stardog, which will let users access the SPARQL endpoint, and all other functionality, without needing authorization.
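For example:
$ stardog-admin server start --disable-security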
Remote Bulk Loading
Question
I’m trying to create a database and bulk load files from my machine to the server and it’s not working, the files don’t seem to load, what gives?
Answer
Stardog does not transfer files to the server during database creation; sending big files over a network kind of defeats the purpose of blazing fast bulk loading. If you want to bulk load files from your machine to a remote server, copy them to the server first and bulk load them there.
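For example (a sketch; the host, paths, and database name are placeholders):
$ scp data.ttl.gz user@stardog-server:/tmp/
$ ssh user@stardog-server stardog-admin db create -n myDb /tmp/data.ttl.gz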
Canonicalized Literals
Question
Why doesn’t my literal look the same as when I added it to Stardog?
Answer
Stardog performs literal canonicalization by default. This can be turned off by setting index.literals.canonical to false.
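Since this is an index option, it is typically set at database creation time (a sketch; the database name is a placeholder):
$ stardog-admin db create -n myDb -o index.literals.canonical=false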
Cluster Isn’t Working
Question
I’ve set up Stardog Cluster, but it isn’t working and I have NoRouteToHostException exceptions all over my Zookeeper log.
Answer
Typically, and especially on Red Hat Linux and its variants, this means that iptables is blocking one, some, or all of the ports that the Cluster is trying to use. You can disable iptables or, better yet, configure it to unblock the ports the Cluster is using.
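For example, Zookeeper’s default ports are 2181, 2888, and 3888, and Stardog’s default port is 5820; the following sketch opens them (adjust to the ports in your configuration):
$ sudo iptables -I INPUT -p tcp --dport 2181 -j ACCEPT    # Zookeeper client port
$ sudo iptables -I INPUT -p tcp --dport 2888 -j ACCEPT    # Zookeeper peer port
$ sudo iptables -I INPUT -p tcp --dport 3888 -j ACCEPT    # Zookeeper leader election
$ sudo iptables -I INPUT -p tcp --dport 5820 -j ACCEPT    # Stardog server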
Client Connection Isn’t Working
Question
I’m getting a ServiceConfigurationError saying that SNARLDriver could not be instantiated.
Answer
Make sure that your classpath includes all Stardog JARs and that the user executing your code has access to them.
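A quick sanity check when launching your application (a sketch; the JAR directory is an assumption, so use wherever your Stardog client JARs actually live):
$ java -cp "/path/to/stardog/client/*:myapp.jar" com.example.MyApp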
Logging
Question
Why doesn’t Stardog implement our corporate logging scheme?
Answer
Stardog will log to $STARDOG_HOME/stardog.log by default, but you can put a log4j2 config file in $STARDOG_HOME so that Stardog will log wherever and however you want. The default configuration file can be found at $STARDOG/server/dbms/log4j2.xml.
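A common approach is to start from the default config (a sketch; assumes the default locations given above):
$ cp $STARDOG/server/dbms/log4j2.xml $STARDOG_HOME/log4j2.xml
$ # edit $STARDOG_HOME/log4j2.xml to match your logging scheme, then restart:
$ stardog-admin server stop
$ stardog-admin server start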
Loading Compressed Data
Question
How can I load data from a compressed format that Stardog doesn’t support without decompressing the file?
Answer
Stardog supports several compression formats by default (zip, gzip, bzip2), so files compressed with those formats can be passed as input directly, without decompression. Files compressed with other formats can also be loaded into Stardog by decompressing them on the fly using named pipes on Unix-like systems. The following example shows a named pipe where the decompressed data is sent directly to Stardog without being written to disk.
$ mkfifo some-data.rdf                       # create a named pipe
$ xz -dc some-data.rdf.xz > some-data.rdf &  # decompress into the pipe in the background
$ stardog-admin db create -n test some-data.rdf
Working with RDF Files
Question
I have some RDF files that I need to process without loading into Stardog. What can I do?
Answer
Stardog provides some CLI commands that work directly over files. These commands exist under the stardog file command group. For example, you can use the file cat command to concatenate multiple RDF files into a single file and the file split command to split a single RDF file into multiple RDF files. These commands are similar to their *nix counterparts but can handle RDF formats and perform compression/decompression on the fly. There is also the file obfuscate command, which can create an obfuscated version of the input RDF files, similar to the data obfuscate command.
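For example (a sketch; file names are placeholders, and the exact output options may differ, so check the command’s help output):
$ stardog file cat part1.ttl part2.ttl.gz > all.ttl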
Virtual Graph Identifier Quoting
Question
How do I quote field and table names in mappings and when should I do it?
Answer
Interpretation of identifiers follows that of the database system backing the virtual graph. For example, Oracle interprets unquoted identifiers as uppercase, while PostgreSQL interprets unquoted identifiers as lowercase. In general, if you need to quote the identifier in a query, then you should quote it in a mapping.
Quoting is done using the native quoting character of the database: double quotes for Oracle, PostgreSQL, and other SQL standard-compatible systems; backquotes for MySQL; and square brackets for SQL Server. This setting can be overridden by adding parser.sql.quoting=ANSI to your virtual graph properties file, which allows the use of double quotes to quote identifiers. This is commonly done to write mappings using the R2RML convention of double quotes and to support mappings generated by other systems.
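For example (a sketch; the properties file name is a placeholder):
$ cat myvg.properties
parser.sql.quoting=ANSI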
Virtual Graph Table not Found
Question
Why am I getting an error when I try to create a virtual graph?
Unable to parse logical table [[some_table]]: From line 1, column 15 to line 1, column 18: Object 'SOME_TABLE' not found
Answer
The virtual graph subsystem maintains a set of metadata, including a list of tables and the types of their fields. If a table is not found, it’s likely that it either needs to be quoted or the schema needs to be added to the search path by adding sql.schemas to your virtual graph properties file. This setting enables Stardog to see the table metadata in the named schemas. The table/query still needs to be qualified with the schema name when referring to it.
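For example (a sketch; the file and schema names are placeholders, and a comma-separated list of schemas is an assumption):
$ cat myvg.properties
sql.schemas=HR,SALES
The table would then be referred to as HR.SOME_TABLE in mappings and queries.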
The virtual source_metadata command can be used to inspect the metadata returned by the JDBC driver.
Virtual Graphs over MarkLogic
Question
How do I create a virtual graph over MarkLogic? An error is returned:
org.postgresql.util.PSQLException: ERROR: XDMP-UNDFUN: (err:XPST0017) Undefined function current_schema()
Answer
Stardog requires the schema to be provided. This cannot be done automatically with MarkLogic and should be set using the sql.default.schema option for the virtual graph. The schema name is defined when creating the view in MarkLogic.
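For example (a sketch; the file name is a placeholder, and the schema name should be the one used when creating the view in MarkLogic):
$ cat myvg.properties
sql.default.schema=my_schema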