This page discusses Data Sources, how they are used by Virtual Graphs, and how to manage them.
Virtual Graphs use Data Sources both to connect to data resources external to Stardog and to obtain the metadata (table names, columns, data types, keys, row counts, etc.) describing those resources.
Managing Data Sources and Virtual Graphs as two separate resources has these advantages:
- Multiple Virtual Graphs can share the same Data Source
- Administrators can manage connections (e.g. max connections limit) per data source
- The management of the Data Source and Virtual Graphs can be granted to different security roles
- Data Source metadata can be managed separately from individual Virtual Graphs
To create a Data Source, register it with the data-source add CLI command:
$ stardog-admin data-source add dept.properties
In the above example, we are using the same properties file from the Virtual Graph example. The support for password files applies to Data Sources as well.
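For readers who do not have the Virtual Graph example at hand, a properties file for a JDBC Data Source might look like the sketch below. The JDBC URL, user name, password, and driver class are placeholder values and are not taken from this page. The helper names write_dept_properties and register_data_source are hypothetical, and STARDOG_ADMIN is an assumed environment variable that lets you point at a specific stardog-admin binary.

```shell
# Sketch only: every connection value below is a placeholder (assumption).
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# Write a minimal JDBC properties file to the given path
# (defaults to dept.properties in the current directory).
write_dept_properties() {
  cat > "${1:-dept.properties}" <<'EOF'
jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=admin
jdbc.password=admin
jdbc.driver=com.mysql.jdbc.Driver
EOF
}

# Register the properties file as a Data Source.
register_data_source() {
  "$STARDOG_ADMIN" data-source add "$1"
}
```

Calling write_dept_properties followed by register_data_source dept.properties runs the same data-source add command shown above.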
Use the data-source list CLI command to view a list of registered data sources:
$ stardog-admin data-source list
| Data Source | Shared | Online |
| data-source://dept | true | true |
The data-source options CLI command returns the properties that were used when creating the Data Source:
$ stardog-admin data-source options dept
To remove a Data Source, use the data-source remove command:
$ stardog-admin data-source remove dept
Cannot remove data source 'dept' without the force option because it is in use by virtual graph: dept
As illustrated, the data-source remove command will fail if the Data Source is in use by any Virtual Graphs. Running the data-source remove command with the --force option will remove the Data Source as well as all its dependent Virtual Graphs (use with caution):
$ stardog-admin data-source remove --force dept
Successfully removed data source dept
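Because --force also deletes dependent Virtual Graphs, it can help to make the escalation an explicit choice in scripts. The following is a minimal sketch, not an official tool: remove_data_source is a hypothetical helper, STARDOG_ADMIN is an assumed environment variable, and the script assumes that stardog-admin exits with a non-zero status when the removal is refused.

```shell
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# Try a plain removal first; escalate to --force only when the caller
# explicitly passes "force" as the second argument.
# Assumption: stardog-admin returns a non-zero exit status on failure.
remove_data_source() {
  name="$1"
  mode="${2:-}"
  if "$STARDOG_ADMIN" data-source remove "$name"; then
    return 0
  fi
  if [ "$mode" = "force" ]; then
    # Also removes every Virtual Graph that depends on the Data Source.
    "$STARDOG_ADMIN" data-source remove --force "$name"
  else
    echo "Not removed: '$name'. Pass 'force' to also drop its Virtual Graphs." >&2
    return 1
  fi
}
```

With this helper, remove_data_source dept behaves like the plain command, while remove_data_source dept force falls back to the forced removal only after the plain attempt has failed.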
Every Virtual Graph uses one Data Source. The Stardog APIs allow the creation of a Virtual Graph without explicitly naming a Data Source. However, when that is done, Stardog automatically creates a “private” Data Source with the same local name as the Virtual Graph for the sole use of that Virtual Graph. The life cycle of a private Data Source is tied to the life cycle of its dependent Virtual Graph – when the Virtual Graph is removed, the private Data Source is automatically removed as well.
Private Data Sources can be converted to shared Data Sources with the data-source share CLI command:
$ stardog-admin data-source share dept
Stardog Studio always creates shared Data Sources.
When a Virtual Graph loads, it interrogates its Data Source for the names of all the visible schemas and tables, as well as the names and data types of columns, primary keys, and other constraints, and row count estimates for all the tables that are referenced in the mappings of the Virtual Graph. In a large enterprise, this process can take considerable time, so Stardog saves this metadata with the Data Source.
Stardog uses this saved metadata during query translation and optimization. If any of this metadata changes after the Data Source has saved it, those changes will not be visible to the loaded Virtual Graphs. This phenomenon is known as schema drift.
The data-source refresh-metadata CLI command clears the saved metadata for a Data Source and reloads all its dependent Virtual Graphs with fresh metadata:
$ stardog-admin data-source refresh-metadata dept
Data Sources load metadata lazily, that is, on demand. This keeps the traffic between Stardog and the external data resources as low as possible.
Metadata caching is currently supported for JDBC Data Sources only.
The refresh-metadata command will not load tables from new databases because seeing them requires updating the sql.schemas data source configuration option. Use the data-source add command with the --overwrite option to refresh the data source configuration options.
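The configuration update described above could be scripted roughly as follows. This is a sketch under stated assumptions: update_schemas is a hypothetical helper, STARDOG_ADMIN is an assumed environment variable, and the comma-separated value format for sql.schemas is an assumption that should be checked against the Data Source Configuration page.

```shell
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# Replace (or add) the sql.schemas line in a properties file, then
# re-register the Data Source with the updated options.
# Assumption: sql.schemas takes a comma-separated list of schema names.
update_schemas() {
  file="$1"
  schemas="$2"
  tmp="$file.tmp"
  # Drop any existing sql.schemas line; grep exits 1 if nothing is left.
  grep -v '^sql\.schemas=' "$file" > "$tmp" || true
  printf 'sql.schemas=%s\n' "$schemas" >> "$tmp"
  mv "$tmp" "$file"
  "$STARDOG_ADMIN" data-source add --overwrite "$file"
}
```

For example, update_schemas dept.properties 'dept,hr' rewrites the file and then runs data-source add --overwrite dept.properties; the schema names dept and hr are purely illustrative.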
One element of the metadata saved with a Data Source is more likely to change than the others: row-count estimates. Row-count estimates are expected to change whenever data is inserted into or deleted from a data resource. While this type of change is distinct from general schema drift, it can affect query optimization, so keeping the estimates up to date is important. The data-source refresh-counts CLI command satisfies this requirement:
$ stardog-admin data-source refresh-counts dept
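Since row counts change with routine data loads, one option is to refresh them for several Data Sources in one pass, for instance from a scheduled job. The sketch below assumes a hypothetical helper name (refresh_counts_all), an assumed STARDOG_ADMIN environment variable, and that stardog-admin signals failure with a non-zero exit status.

```shell
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# Refresh row-count estimates for every Data Source named on the
# command line. Keeps going after a failure and reports it at the end.
# Assumption: stardog-admin returns a non-zero exit status on failure.
refresh_counts_all() {
  failed=0
  for name in "$@"; do
    if ! "$STARDOG_ADMIN" data-source refresh-counts "$name"; then
      echo "refresh-counts failed for $name" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as refresh_counts_all dept sales would run the command once per name; sales is an illustrative Data Source name that does not appear elsewhere on this page.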
Both Virtual Graphs and Data Sources can encounter errors, either when initially created or when recreated at startup. A Data Source can fail to load as a result of connection problems. Likewise, a Virtual Graph can fail when the metadata for a Data Source is refreshed and the new schema is not compatible with the mappings. When these errors occur, Stardog will mark the resource as “Unavailable”, which is like an offline mode. It provides a way for the resource to continue to appear as a resource while at the same time providing an indication that there is a problem with it.
Once a Data Source or Virtual Graph is marked as unavailable, it will stay unavailable until Stardog is restarted or the data-source online CLI command or virtual online CLI command, respectively, is run:
$ stardog-admin data-source online dept
$ stardog-admin virtual online dept
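When both resources are unavailable, the two commands can be chained so that the Virtual Graph is only brought online once its Data Source has recovered. This is a sketch, not a documented procedure: bring_online is a hypothetical helper, STARDOG_ADMIN is an assumed environment variable, and the script assumes the Data Source and the Virtual Graph share a name (as dept does here) and that a failed command returns a non-zero exit status.

```shell
STARDOG_ADMIN="${STARDOG_ADMIN:-stardog-admin}"

# Bring the Data Source online first, then the Virtual Graph of the
# same name. The second command is skipped if the first one fails.
# Assumption: both resources share the name passed as $1.
bring_online() {
  name="$1"
  "$STARDOG_ADMIN" data-source online "$name" &&
    "$STARDOG_ADMIN" virtual online "$name"
}
```

Running bring_online dept issues the same two commands shown above, in that order.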
- Supported Data Sources
- Supported Client Drivers
- Data Source Configuration
- Specific Data Source Considerations
- Pass-Through Authentication
- REST Connector Configuration
- Secrets Integration - retrieving credentials with secret managers