Data Sources

This page discusses Data Sources, how they are used by Virtual Graphs, and how to manage them.

Page Contents

Overview
Creating and Managing Data Sources
Private and Shared Data Sources
Managing Metadata
1. Refreshing Metadata
2. Refreshing Row-Count Estimates
Data Source and Virtual Graph Availability
Chapter Contents

Overview

Virtual Graphs use Data Sources for two things: connections to data resources that are external to Stardog, and the metadata (table names, columns, data types, keys, row counts, etc.) that describes those resources.

Managing Data Sources and Virtual Graphs as two separate resources has these advantages:

Multiple Virtual Graphs can share the same Data Source
Administrators can manage connections (e.g. the maximum connections limit) per Data Source
The management of the Data Source and Virtual Graphs can be granted to different security roles
Data Source metadata can be managed separately from individual Virtual Graphs

Creating and Managing Data Sources

To create a Data Source, register it with the data-source add CLI command:

$ stardog-admin data-source add dept.properties

This example uses the same properties file as the Virtual Graph example. Password files are supported for Data Sources as well.

Use the data-source list CLI command to view a list of registered data sources:

$ stardog-admin data-source list

+--------------------+--------+--------+
|    Data Source     | Shared | Online |
+--------------------+--------+--------+
| data-source://dept | true   | true   |
+--------------------+--------+--------+

The data-source options CLI command returns the properties that were used to create the Data Source. Sensitive properties such as passwords are masked with *** in the output. If you intend to reuse this output to recreate the Data Source (for example, when copying it to another server), replace any masked values with the real credentials first.

$ stardog-admin data-source options dept

jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=***
jdbc.driver=com.mysql.jdbc.Driver
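
Replacing the masked value can be scripted. The following is a minimal sketch, not an official tool: unmask_password is a hypothetical helper and DS_PASSWORD is a hypothetical environment variable name. It assumes that jdbc.password is the only masked property and that it appears exactly as shown in the output above.

```shell
# Hypothetical helper: reads `data-source options` output on stdin and
# replaces a masked jdbc.password value with the contents of DS_PASSWORD.
# ENVIRON is used instead of `awk -v` so that backslashes in the password
# are passed through unchanged.
unmask_password() {
  awk '
    /^jdbc\.password=\*+$/ { print "jdbc.password=" ENVIRON["DS_PASSWORD"]; next }
    { print }
  '
}

# Possible usage against a running server (not executed here):
#   export DS_PASSWORD='the-real-password'
#   stardog-admin data-source options dept | unmask_password > dept.properties
```

Any other masked properties would still need to be filled in by hand before the file is reused.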

To remove a Data Source, use the data-source remove command:

$ stardog-admin data-source remove dept

Cannot remove data source 'dept' without the force option because it is in use by virtual graph: dept

As illustrated, the data-source remove command will fail if the Data Source is in use by any Virtual Graphs. Running the data-source remove command with the --force option will remove the Data Source as well as all its dependent Virtual Graphs (use with caution):

$ stardog-admin data-source remove --force dept

Successfully removed data source dept

Private and Shared Data Sources

Every Virtual Graph uses one Data Source. The Stardog APIs allow a Virtual Graph to be created without explicitly naming a Data Source. In that case, Stardog automatically creates a “private” Data Source, with the same local name as the Virtual Graph, for the sole use of that Virtual Graph. The life cycle of a private Data Source is tied to that of its dependent Virtual Graph: when the Virtual Graph is removed, the private Data Source is removed automatically as well.

Private Data Sources can be converted to shared Data Sources with the data-source share CLI command:

$ stardog-admin data-source share dept

Stardog Studio always creates shared Data Sources.

Managing Metadata

When a Virtual Graph loads, it interrogates its Data Source for the names of all the visible schemas and tables, as well as the names and data types of columns, primary keys, and other constraints, and row count estimates for all the tables that are referenced in the mappings of the Virtual Graph. In a large enterprise, this process can take considerable time, so Stardog saves this metadata with the Data Source.

Stardog uses this saved metadata during query translation and optimization. If any of this metadata changes after the Data Source has saved it, those changes will not be visible to the loaded Virtual Graphs. This phenomenon is known as schema drift.

Refreshing Metadata

The data-source refresh-metadata CLI command is used to clear the saved metadata for a Data Source and reload all its dependent Virtual Graphs with fresh metadata.

$ stardog-admin data-source refresh-metadata dept

Data Sources load metadata lazily, that is, on demand. This keeps the load that Stardog places on the external data resources as low as possible.

Metadata caching is currently supported for JDBC Data Sources only.

The refresh-metadata command will not load tables from new databases because seeing them requires updating the sql.schemas data source configuration option. Use the data-source add command with the --overwrite option to refresh the data source configuration options.
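
The configuration update described above could be scripted as follows. This is a sketch under stated assumptions: set_property is a hypothetical helper, the schema name hr is invented for illustration, and the comma-separated value format for sql.schemas is assumed rather than taken from this page.

```shell
# Hypothetical helper: set (or replace) a key=value line in a properties file.
# Usage: set_property FILE KEY VALUE
set_property() {
  sp_file=$1
  sp_key=$2
  sp_value=$3
  sp_tmp="${sp_file}.tmp"
  # Copy every line except an existing entry for KEY, then append the new one.
  awk -v key="$sp_key" 'index($0, key "=") != 1 { print }' "$sp_file" > "$sp_tmp"
  printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_tmp"
  mv "$sp_tmp" "$sp_file"
}

# Possible usage against a running server (not executed here):
#   set_property dept.properties sql.schemas 'dept,hr'   # "hr" is a made-up schema
#   stardog-admin data-source add --overwrite dept.properties
```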

Refreshing Row-Count Estimates

One element of the metadata saved with a Data Source is more likely to change than the others: row-count estimates. Row-count estimates are expected to change whenever data is inserted into or deleted from a data resource. While this type of change is distinct from general schema drift, it can affect query optimization, so keeping the estimates up to date is important. The data-source refresh-counts CLI command satisfies this requirement:

$ stardog-admin data-source refresh-counts dept
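
When several Data Sources need fresh estimates, for instance after a bulk load, the command can be wrapped in a loop. The sketch below is illustrative only: refresh_counts_all and the DRY_RUN variable are hypothetical names, and hr is an invented Data Source name.

```shell
# Hypothetical helper: run refresh-counts for each named data source.
# With DRY_RUN=1 the commands are printed instead of executed.
refresh_counts_all() {
  for rc_ds in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "stardog-admin data-source refresh-counts $rc_ds"
    else
      # Stop at the first failure so the problem is not hidden by later runs.
      stardog-admin data-source refresh-counts "$rc_ds" || return 1
    fi
  done
}

# Possible usage (not executed here):
#   refresh_counts_all dept hr
```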

Data Source and Virtual Graph Availability

Both Virtual Graphs and Data Sources can encounter errors, either when initially created or when recreated at startup. A Data Source can fail to load as a result of connection problems. Likewise, a Virtual Graph can fail when the metadata for a Data Source is refreshed and the new schema is not compatible with the mappings. When these errors occur, Stardog will mark the resource as “Unavailable”, which is like an offline mode. It provides a way for the resource to continue to appear as a resource while at the same time providing an indication that there is a problem with it.

Once a Data Source or Virtual Graph is marked as unavailable, it will stay unavailable until Stardog is restarted or the data-source online CLI command or virtual online CLI command, respectively, is run:

$ stardog-admin data-source online dept

$ stardog-admin virtual online dept
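
To find which Data Sources need to be brought back online, the table printed by data-source list can be parsed. This is a rough sketch: offline_data_sources is a hypothetical helper, and it assumes that an unavailable Data Source shows a value other than true in the Online column and that the table layout matches the example shown earlier on this page.

```shell
# Hypothetical helper: reads `data-source list` output on stdin and prints
# the local name of every data source whose Online column is not "true".
offline_data_sources() {
  awk -F'|' '
    NF >= 5 {
      name = $2; online = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim padding around the cells
      gsub(/^[ \t]+|[ \t]+$/, "", online)
      if (name == "Data Source") next        # skip the header row
      if (online != "true") {
        sub(/^data-source:\/\//, "", name)   # keep only the local name
        print name
      }
    }
  '
}

# Possible usage against a running server (not executed here):
#   stardog-admin data-source list | offline_data_sources |
#     while read -r ds; do stardog-admin data-source online "$ds"; done
```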

Chapter Contents

Supported Data Sources
Supported Client Drivers
Data Source Configuration
Specific Data Source Considerations
Pass-Through Authentication
REST Connector Configuration
Secrets Integration - retrieving credentials with secret managers
