Data Catalog
This page discusses Stardog’s support for querying virtual graph metadata using SPARQL.
Overview
The Data Catalog allows users to query for virtual graph metadata using SPARQL. The Data Catalog feature is enabled by default. It watches for changes to virtual graphs and data sources and adds or updates the metadata for those graphs in a dedicated database. By default that database is expected to be named catalog.
The Data Catalog automatically creates a catalog database when it starts. Set the catalog.database property to use a different name or to prevent the Data Catalog from using an existing database named catalog.
To prevent the Data Catalog database from being created automatically, set catalog.auto.create.db to false.
Configuration Options
The Data Catalog can be configured with the following options in stardog.properties:
Option | Description | Value | Default |
---|---|---|---|
catalog.database | Name of the database to use for storing catalog metadata. The database is created automatically at server start unless catalog.auto.create.db is false, in which case the user must create it before metadata will be captured and stored. | string | catalog |
catalog.name | Name used in the IRI of the catalog named graph. The default yields tag:stardog:api:catalog:local. | string | local |
catalog.reload.onstart | If true, any existing catalog data will be dropped and all metadata providers will be rescanned on server start. If false, metadata will only be captured for change events occurring after server start. | true/false | false |
catalog.reload.auto | If false then metadata provider schedules will be ignored. | true/false | true |
catalog.auto.create.db | If false then the catalog database will not be created automatically at server start. | true/false | true |
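As a minimal sketch of how these options might be written to stardog.properties, the Python helper below renders only the settings that differ from the defaults in the table above. The helper name render_catalog_properties and the sample database name metadata are illustrative assumptions, not part of Stardog.

```python
# Defaults copied from the option table above.
CATALOG_DEFAULTS = {
    "catalog.database": "catalog",
    "catalog.name": "local",
    "catalog.reload.onstart": False,
    "catalog.reload.auto": True,
    "catalog.auto.create.db": True,
}


def render_catalog_properties(overrides):
    """Return stardog.properties lines for options that differ from the defaults."""
    unknown = set(overrides) - set(CATALOG_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown catalog option(s): {sorted(unknown)}")
    lines = []
    for key, default in CATALOG_DEFAULTS.items():
        value = overrides.get(key, default)
        if value == default:
            continue  # the server already uses this value
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines)


# Hypothetical setup: store metadata in a pre-created database named "metadata".
print(render_catalog_properties({
    "catalog.database": "metadata",
    "catalog.auto.create.db": False,
}))
```

With these sample overrides the helper prints two lines, catalog.database=metadata and catalog.auto.create.db=false, which can be pasted into stardog.properties.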
Usage
The capture and storage of Stardog metadata happens automatically and without user interaction. The user-facing features of the Data Catalog are SPARQL queries and Explorer, which supports visual exploration of metadata. The Explorer advanced query feature can also be used to query the metadata model.
To query only for metadata, run SPARQL queries against the catalog database. To query for metadata together with data in another database, use the local database service.
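The second case can be sketched as a query that is sent to another database and reaches into the catalog database through a service clause. The Python helper below only builds the query text. The db:// service scheme, the helper name, and the sample pattern are assumptions made for illustration, so check them against the local database service documentation for your release.

```python
def build_metadata_join_query(local_pattern, catalog_db="catalog", catalog_name="local"):
    """Combine a pattern over the queried database with a catalog lookup."""
    return (
        "prefix dcterms: <http://purl.org/dc/terms/>\n"
        "prefix dcat: <http://www.w3.org/ns/dcat#>\n"
        "\n"
        "select * where {\n"
        "  # Evaluated in the database the query is sent to.\n"
        f"  {local_pattern}\n"
        "  # Evaluated in the catalog database (db:// scheme is an assumption).\n"
        f"  service <db://{catalog_db}> {{\n"
        f"    graph <tag:stardog:api:catalog:{catalog_name}> {{\n"
        "      ?src a dcat:Catalog ;\n"
        "           dcterms:title ?lbl\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


# Hypothetical local pattern; replace it with patterns over your own data.
print(build_metadata_join_query("?s a ?type ."))
```

The catalog pattern inside the service clause is the same one used by the catalog listing query in the examples further down this page.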
Data Model
This table contains the classes used for modeling the Data Catalog metadata.
Class | Description |
---|---|
dcat:Catalog | Top level class for all metadata |
dcat:Dataset | A collection of data available for access in one or more representations |
dcat:Distribution | A specific representation of a dataset |
tag:stardog:api:catalog:DataSource | A distribution of a data source |
tag:stardog:api:catalog:Schema | The tables that are part of a data source |
tag:stardog:api:catalog:Table | A single table |
tag:stardog:api:catalog:Column | A table column |
tag:stardog:api:catalog:VirtualGraph | The configuration for a virtual graph |
tag:stardog:api:catalog:Mapping | Mappings of tables to RDF |
Example SPARQL Queries
The following are some query examples that demonstrate how virtual graph metadata can be queried.
- What catalogs are available?
  prefix dcterms: <http://purl.org/dc/terms/>
  prefix dcat: <http://www.w3.org/ns/dcat#>

  select ?src ?lbl where {
    graph <tag:stardog:api:catalog:local> {
      ?src a dcat:Catalog ;
           dcterms:title ?lbl
    }
  }
- What datasets are in the catalog?
  prefix dcterms: <http://purl.org/dc/terms/>
  prefix dcat: <http://www.w3.org/ns/dcat#>

  select ?ds where {
    graph <tag:stardog:api:catalog:local> {
      ?src a dcat:Catalog ;
           dcat:Dataset ?ds .
    }
  }
- Query for all table columns across all datasets
  prefix : <tag:stardog:api:catalog:stardog:>

  select * from stardog:context:local where {
    ?t a :Table ;
       :hasColumn ?c .
    ?c :columnName ?n .
  }
- Dump the contents of a catalog
  prefix : <tag:stardog:api:catalog:stardog:>
  prefix rr: <http://www.w3.org/ns/r2rml#>

  select * from stardog:context:local where {
    ?ds a :DataSource ;
        :hasSchema ?schema .
    ?schema :hasTable ?table .
    ?table :hasColumn ?column .
    ?vg :connectsTo ?ds ;
        :hasMapping ?map .
    ?map rr:predicateObjectMap [
           rr:objectMap ?omap ;
           rr:predicateMap ?predmap
         ] .
    ?omap rr:column ?col .
    ?omap rr:datatype ?dtype .
    ?predmap rr:constant ?con .
    ?fld rr:logicalTable [ rr:tableName ?tblname ] .
    ?fld rr:subjectMap [
           rr:template ?template ;
           rr:termType ?termtype
         ] .
  }
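Queries like these can also be submitted from a script. The following Python sketch prepares a SPARQL protocol request for the first example using only the standard library. The server address http://localhost:5820, the /catalog/query endpoint path, and the admin/admin credentials are assumptions about a default local installation, and the function name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request

CATALOGS_QUERY = """\
prefix dcterms: <http://purl.org/dc/terms/>
prefix dcat: <http://www.w3.org/ns/dcat#>

select ?src ?lbl where {
  graph <tag:stardog:api:catalog:local> {
    ?src a dcat:Catalog ;
         dcterms:title ?lbl
  }
}
"""


def build_query_request(query, server="http://localhost:5820", database="catalog",
                        user="admin", password="admin"):
    """Prepare (but do not send) a form-encoded SPARQL query request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{server}/{database}/query",  # endpoint path is an assumption
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_query_request(CATALOGS_QUERY)

# To run it against a live server:
#   import json
#   with urllib.request.urlopen(request) as response:
#       for row in json.load(response)["results"]["bindings"]:
#           print(row["src"]["value"], row["lbl"]["value"])
```

Building the request separately from sending it keeps the sketch usable without a running server and makes the target URL and headers easy to inspect.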
Explorer
After you have Data Catalog configured and running on your server you can log into Explorer and select your catalog database to begin exploring.
Explorer caches the catalog data when you log in. If you make virtual graph changes in Studio or on the command line you will need to refresh Explorer to see the changes.
Explorer Advanced Query
Properties have been added to enable the new Explorer advanced query functionality with Data Catalog.
Databricks Unity Catalog
If you have a Databricks account and would like to include your Unity Catalog metadata in the Stardog Catalog, you can add a Databricks metadata provider. The Databricks provider runs on a customizable schedule to pull down Unity Catalog information and write it into the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.
Adding a Databricks Provider
To add a provider, insert a provider statement into the Data Catalog.
insert data {
  graph stardog:catalog:providers {
    <tag:stardog:api:catalog:MetadataProvider:IDENTITY> a <tag:stardog:api:catalog:MetadataProvider> ;
      <tag:stardog:api:catalog:providerType> "DatabricksProvider" ;
      <tag:stardog:api:catalog:unity:dataSource> "DATA_SOURCE_HERE" ;
      <tag:stardog:api:catalog:unity:schedule> "SCHEDULE_HERE" .
  }
}
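One way to fill in the IDENTITY, DATA_SOURCE_HERE, and SCHEDULE_HERE placeholders is to render the statement from a template. This Python sketch does so. The function names, the sample values unity1 and my_databricks, and the conservative identity check are assumptions added for illustration.

```python
import re

PROVIDER_TEMPLATE = """\
insert data {{
  graph stardog:catalog:providers {{
    <tag:stardog:api:catalog:MetadataProvider:{identity}> a <tag:stardog:api:catalog:MetadataProvider> ;
      <tag:stardog:api:catalog:providerType> "DatabricksProvider" ;
      <tag:stardog:api:catalog:unity:dataSource> "{data_source}" ;
      <tag:stardog:api:catalog:unity:schedule> "{schedule}" .
  }}
}}
"""


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def render_provider_insert(identity, data_source, schedule):
    """Fill the provider statement; identity becomes part of an IRI."""
    # Conservative check (an assumption, not a Stardog rule): keep the IRI simple.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", identity):
        raise ValueError(f"identity not safe to embed in an IRI: {identity!r}")
    return PROVIDER_TEMPLATE.format(
        identity=identity,
        data_source=escape_literal(data_source),
        schedule=escape_literal(schedule),
    )


# Hypothetical values: provider "unity1" using a saved data source "my_databricks".
print(render_provider_insert("unity1", "my_databricks", "0 0 22 * * ?"))
```

The rendered text is an ordinary SPARQL update and can be run the same way as the statement above.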
This table details the property values that can be set for a Databricks provider.
Property | Description | Values |
---|---|---|
rdf:type | Metadata Provider | tag:stardog:api:catalog:MetadataProvider |
tag:stardog:api:catalog:providerType | Type of provider | DatabricksProvider |
tag:stardog:api:catalog:unity:dataSource | Data source to use for connecting | The name of a saved Data Source |
tag:stardog:api:catalog:unity:schedule | Frequency of updates | Quartz cron expression (for example, 0 0 22 * * ? runs every day at 10 PM) |
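Quartz cron expressions start with a seconds field, which is easy to miss when coming from Unix cron. As a reading aid, this Python sketch labels the fields of the example schedule. It only checks the field count and does not validate the values, and the function name is hypothetical.

```python
# Field order used by Quartz cron expressions; the year field is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day-of-month", "month", "day-of-week", "year",
)


def describe_quartz(expression):
    """Map each whitespace-separated field of a Quartz expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


for name, value in describe_quartz("0 0 22 * * ?").items():
    print(f"{name:>12}: {value}")
```

For 0 0 22 * * ? this shows seconds 0, minutes 0, and hours 22, with the day-of-month and month fields left open and ? standing in for the day-of-week field.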
Data Model
Below are the classes and properties used for modeling the Databricks Unity Catalog metadata.
## Databricks Classes
:bricks:Databricks a rdfs:Class ;
    rdfs:comment "The metadata from an external Databricks platform." ;
    rdfs:label "Databricks Unity" ;
    dct:title "Databricks Unity" ;
    rdfs:subClassOf <http://www.w3.org/ns/dcat#Dataset> .

:bricks:DatabricksCatalog a rdfs:Class ;
    rdfs:comment "A Databricks catalog." ;
    rdfs:label "Databricks Catalog" ;
    dct:title "Databricks Catalog" ;
    rdfs:subClassOf :Catalog .

:bricks:DatabricksColumn a rdfs:Class ;
    rdfs:comment "A Databricks column." ;
    rdfs:label "Databricks Column" ;
    dct:title "Databricks Column" ;
    rdfs:subClassOf :Column .

:bricks:DatabricksSchema a rdfs:Class ;
    rdfs:comment "A Databricks schema." ;
    rdfs:label "Databricks Schema" ;
    dct:title "Databricks Schema" ;
    rdfs:subClassOf :Schema .

:bricks:DatabricksTable a rdfs:Class ;
    rdfs:comment "A Databricks table." ;
    rdfs:label "Databricks Table" ;
    dct:title "Databricks Table" ;
    rdfs:subClassOf :Table .

## Databricks Properties
:bricks:position a owl:DatatypeProperty ;
    rdfs:range xsd:integer ;
    rdfs:domain :bricks:DatabricksColumn .

:bricks:dataSourceFormat a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksTable .

:bricks:catalogType a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksCatalog .

:bricks:nullable a owl:DatatypeProperty ;
    rdfs:range xsd:boolean ;
    rdfs:domain :bricks:DatabricksColumn .

:bricks:tableType a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksTable .

:bricks:precision a owl:DatatypeProperty ;
    rdfs:range xsd:integer ;
    rdfs:domain :bricks:DatabricksColumn .

:bricks:fullName a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksSchema , :bricks:DatabricksTable .

:bricks:owner a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksSchema , :bricks:DatabricksTable , :bricks:DatabricksCatalog .

:bricks:dataType a owl:DatatypeProperty ;
    rdfs:range xsd:string ;
    rdfs:domain :bricks:DatabricksColumn .

:bricks:scale a owl:DatatypeProperty ;
    rdfs:range xsd:integer ;
    rdfs:domain :bricks:DatabricksColumn .
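When writing queries against this model it helps to know which properties are declared for which class. The Python sketch below transcribes the property declarations above into a lookup table. The dictionary and function names are illustrative, and the class and property names are shown without the :bricks: prefix.

```python
# property name -> (range, classes listed as rdfs:domain), copied from the listing above
BRICKS_PROPERTIES = {
    "position": ("xsd:integer", {"DatabricksColumn"}),
    "dataSourceFormat": ("xsd:string", {"DatabricksTable"}),
    "catalogType": ("xsd:string", {"DatabricksCatalog"}),
    "nullable": ("xsd:boolean", {"DatabricksColumn"}),
    "tableType": ("xsd:string", {"DatabricksTable"}),
    "precision": ("xsd:integer", {"DatabricksColumn"}),
    "fullName": ("xsd:string", {"DatabricksSchema", "DatabricksTable"}),
    "owner": ("xsd:string", {"DatabricksSchema", "DatabricksTable", "DatabricksCatalog"}),
    "dataType": ("xsd:string", {"DatabricksColumn"}),
    "scale": ("xsd:integer", {"DatabricksColumn"}),
}


def properties_for(class_name):
    """Return the property names that list class_name as a domain, sorted by name."""
    return sorted(
        name
        for name, (_range, domains) in BRICKS_PROPERTIES.items()
        if class_name in domains
    )


print(properties_for("DatabricksTable"))
# prints ['dataSourceFormat', 'fullName', 'owner', 'tableType']
print(properties_for("DatabricksColumn"))
# prints ['dataType', 'nullable', 'position', 'precision', 'scale']
```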