This page provides answers to some frequently asked questions.
- Why can’t I load DBpedia (or other RDF) data?
- Why doesn’t search work?
- Why don’t my queries work?!
- Why is Stardog Cluster acting weird or running slowly?
- Update Performance
- Public Endpoint
- Remote Bulk Loading
- Canonicalized Literals
- Cluster Isn’t Working
- Client Connection Isn’t Working
- Loading Compressed Data
- Working with RDF Files
- Virtual Graph Identifier Quoting
- Virtual Graph Table not Found
- Virtual Graphs over MarkLogic
I get a parsing error when loading DBpedia or some other RDF. What can I do?
First, it’s not a bad thing to expect data providers to publish valid data. Second, it is, apparently, a very naive thing to expect data providers to publish valid data…
I created a database but search doesn’t work.
Search is disabled by default; you can enable it at database creation time when using `db create`. You can also enable it at any subsequent time by setting the `search.enabled` database property using the `metadata set` CLI command.
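Assuming a database named `mydb` and a data file `data.ttl` (both hypothetical names), the two approaches look like this:

```shell
# enable search when creating the database
$ stardog-admin db create -o search.enabled=true -n mydb data.ttl

# or enable it later on an existing database
$ stardog-admin metadata set -o search.enabled=true mydb
```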
I’ve got some named graphs and my queries don’t work!
FROM NAMED with a named graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query. Stardog will only evaluate queries over data that has been loaded into it.
SPARQL queries without a context or named graph are executed against the default, unnamed graph. In Stardog, the default graph is not the union of all the named graphs and the default graph. This behavior is configurable via the `query.all.graphs` configuration parameter.
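For example, you can name the graph explicitly in the query, or change the default-graph behavior for the whole database (database and graph names here are hypothetical):

```shell
# query only the named graph <urn:mygraph>
$ stardog query mydb "SELECT * { GRAPH <urn:mygraph> { ?s ?p ?o } }"

# or make queries over the default graph see all named graphs too
$ stardog-admin metadata set -o query.all.graphs=true mydb
```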
Should I put Stardog HA and Zookeeper on the same hard drives?
Never do this! ZooKeeper is disk-intensive and will contend with Stardog query evaluation for I/O. Running both ZooKeeper and Stardog on the same disks will result in poor performance and, in some cases, intermittent failures.
I’m adding one triple at a time, in a tight loop, to Stardog; is this the ideal strategy with respect to performance?
The answer is “not really”…Update performance is best if there are fewer transactions that each modify a larger number of triples. If you are using the Stardog Java API, the client will buffer changes in large transactions and flush the buffer periodically, so you don’t need to worry about memory problems. If you need transactions with a small number of triples, then you may need to experiment to find the sweet spot with respect to your data, database size, the size of the differential index, and update frequency.
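As a sketch of the batching advice with the CLI: passing several files to a single `data add` invocation loads them in one transaction, rather than issuing one tiny transaction per file (file names hypothetical):

```shell
# one transaction covering all three files
$ stardog data add mydb part1.ttl part2.ttl part3.ttl
```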
I want to use Stardog to serve a public SPARQL endpoint; is there some way I can do this without publishing user account information?
We don’t necessarily recommend this, but it’s possible. Simply pass `--disable-security` to `stardog-admin` when you start the Stardog Server. This completely disables security in Stardog, which will let users access the SPARQL endpoint, and all other functionality, without needing authorization.
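For example:

```shell
# WARNING: this disables ALL security, not just for the SPARQL endpoint
$ stardog-admin server start --disable-security
```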
I’m trying to create a database and bulk load files from my machine to the server and it’s not working, the files don’t seem to load, what gives?
Stardog does not transfer files to the server during database creation; sending big files over a network kind of defeats the purpose of blazing-fast bulk loading. If you want to bulk load files from your machine into a remote server, copy them to the server first and bulk load them there.
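A sketch of that workflow, assuming SSH access to the server (host name and paths are hypothetical):

```shell
# copy the file to the server, then bulk load it there
$ scp big-data.ttl user@stardog-server:/tmp/
$ ssh user@stardog-server "stardog-admin db create -n mydb /tmp/big-data.ttl"
```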
Why doesn’t my literal look the same as it did when I added it to Stardog?
Stardog performs literal canonicalization by default. This can be turned off by setting the `index.literals.canonical` database option to `false` at database creation time.
I’ve setup Stardog Cluster, but it isn’t working and I have
NoRouteToHostException exceptions all over my Zookeeper log.
Typically, and especially on Red Hat Linux and its variants, this means that `iptables` is blocking one, some, or all of the ports that the Cluster is trying to use. You can disable `iptables` or, better yet, configure it to unblock the ports the Cluster is using.
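For example, to unblock Stardog's default port (5820) and ZooKeeper's default client port (2181); adjust for the ports your deployment actually uses:

```shell
# allow inbound TCP on the Stardog and ZooKeeper ports
$ iptables -I INPUT -p tcp --dport 5820 -j ACCEPT
$ iptables -I INPUT -p tcp --dport 2181 -j ACCEPT
```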
I’m getting a `ServiceConfigurationError` saying that `SNARLDriver` could not be instantiated.
Make sure that your classpath includes all Stardog JARs and that the user executing your code has access to them.
Why doesn’t Stardog implement our corporate logging scheme?
Stardog will log to `$STARDOG_HOME/stardog.log` by default, but you can use a log4j2 config file in `$STARDOG_HOME` so that Stardog will log wherever and however you want. The default configuration file can be found at
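A minimal, illustrative log4j2 configuration that sends logs to a custom file; the path, pattern, and level here are assumptions, not Stardog's defaults:

```
<!-- log4j2.xml placed in $STARDOG_HOME -->
<Configuration>
  <Appenders>
    <File name="custom" fileName="/var/log/stardog/stardog.log">
      <PatternLayout pattern="%d %-5p [%t] %c - %m%n"/>
    </File>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="custom"/>
    </Root>
  </Loggers>
</Configuration>
```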
How can I load data from a compressed format that Stardog doesn’t support without decompressing the file?
Stardog supports several compression formats by default (zip, gzip, bzip2), so files compressed with those formats can be passed as input directly without decompression. Files compressed with other formats can also be loaded into Stardog by decompressing them on-the-fly using named pipes on Unix-like systems. The following example shows using a named pipe where the decompressed data is sent directly to Stardog without being written to disk.
$ mkfifo some-data.rdf
$ xz -dc some-data.rdf.xz > some-data.rdf &
$ stardog-admin db create -n test some-data.rdf
I have some RDF files that I need to process without loading into Stardog. What can I do?
As of Stardog 5.0, Stardog provides some CLI commands that work directly over files. These commands exist under the `stardog file` command group. For example, you can use the `file cat` command to concatenate multiple RDF files into a single file and the `file split` command to split a single RDF file into multiple RDF files. These commands are similar to their *nix counterparts but can handle RDF formats and perform compression/decompression on-the-fly. There is also the `file obfuscate` command that can create an obfuscated version of the input RDF files, similar to the `data obfuscate` command.
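An illustrative invocation; the exact flags may differ between versions, so check `stardog help file cat` for your installation:

```shell
# concatenate two Turtle files into one (file names hypothetical)
$ stardog file cat part1.ttl part2.ttl > combined.ttl
```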
How do I quote field and table names in mappings and when should I do it?
Interpretation of identifiers follows that of the database system backing the virtual graph. For example, Oracle interprets unquoted identifiers as uppercase, while PostgreSQL interprets them as lowercase. In general, if you need to quote an identifier in a query, then you should quote it in a mapping.
Quoting is done using the native quoting character of the database: double quotes for Oracle, PostgreSQL, and other SQL standard-compatible systems; backquotes for MySQL; and square brackets for SQL Server. This default can be overridden by adding `parser.sql.quoting=ANSI` to your virtual graph properties file, which allows the use of double quotes to quote identifiers. This is commonly done to write mappings using the R2RML convention of double quotes and to support mappings generated by other systems.
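For example, with this fragment in the virtual graph properties file, mappings can use double-quoted identifiers even against MySQL or SQL Server (the table and column names below are hypothetical):

```
# virtual graph properties file
parser.sql.quoting=ANSI
```

With this setting, a mapping query such as `SELECT "EmpId" FROM "HR"."Employees"` can be written the same way regardless of the backing database, without backquotes or square brackets.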
Why am I getting an error when I try to create a virtual graph?
Unable to parse logical table [[some_table]]: From line 1, column 15 to line 1, column 18: Object 'SOME_TABLE' not found
The virtual graph subsystem maintains a set of metadata, including a list of tables and the types of their fields. If a table is not found, it likely either needs to be quoted or its schema needs to be added to the search path by adding `sql.schemas` to your virtual graph properties file. This setting enables Stardog to see the table metadata in the named schemas. The table/query still needs to be qualified with the schema name when referring to it. The `virtual source_metadata` command can be used to inspect the metadata returned by the JDBC driver.
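For example, to make tables in a hypothetical `HR` schema visible to the virtual graph subsystem (properties-file fragment):

```
# virtual graph properties file
sql.schemas=HR
```

Queries and mappings still refer to the tables with the schema prefix, e.g. `HR.EMPLOYEES`.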
How do I create a virtual graph over MarkLogic? An error is returned:
org.postgresql.util.PSQLException: ERROR: XDMP-UNDFUN: (err:XPST0017) Undefined function current_schema()
Stardog requires the schema to be provided. This cannot be determined automatically with MarkLogic, so it should be set using the `sql.default.schema` option for the virtual graph. The schema name is defined when creating the view in MarkLogic.
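For example (the schema name here is hypothetical; use the one defined for your MarkLogic view):

```
# virtual graph properties file
sql.default.schema=Main
```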