
Data Catalog

This page discusses Stardog’s support for querying virtual graph metadata using SPARQL.

Page Contents
  1. Overview
  2. Configuration Options
  3. Usage
    1. Data Model
    2. Example SPARQL Queries
    3. Explorer
    4. Explorer Advanced Query

Overview

The Data Catalog allows users to query virtual graph metadata using SPARQL. The feature is enabled by default. It watches for changes to virtual graphs and data sources and adds or updates their metadata in a user-supplied database. By default, that database is expected to be named catalog.

The current version of the Data Catalog requires the user to create the catalog database. If no such database is found, the Data Catalog writes a warning to the log and does not track any metadata changes.
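
Once the catalog database exists, a quick way to check whether metadata is being captured is an ASK query run against that database. This is a sketch that assumes the metadata is stored in the named graph <tag:stardog:api:catalog:local>, the graph used by several of the example queries on this page. Because catalog.reload.onstart defaults to false, a false result may simply mean that no virtual graph or data source has changed since the server started.

```sparql
prefix dcat: <http://www.w3.org/ns/dcat#>

# Run against the catalog database.
# Returns true once at least one catalog resource has been recorded.
# Assumption: metadata lives in the named graph <tag:stardog:api:catalog:local>.
ask where {
  graph <tag:stardog:api:catalog:local> {
    ?catalog a dcat:Catalog .
  }
}
```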

Configuration Options

The Data Catalog can be configured with the following options in stardog.properties:

| Option | Description | Value | Default |
|---|---|---|---|
| catalog.database | Name of the database used to store catalog metadata. The user must create this database before any metadata is captured and stored. | string | catalog |
| catalog.reload.onstart | If true, any existing catalog data is dropped and virtual graph and data source metadata is rescanned on server start. If false, metadata is captured only for change events occurring after server start. | true/false | false |
| catalog.dcat.files | Loads static external catalog data into the catalog database. If the data complies with the DCAT specification, it can be queried along with the Stardog metadata. | filepath | none |
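
For illustration, a stardog.properties fragment that sets all three options might look like the following. The database name my_catalog and the file path are hypothetical placeholders. Whether catalog.dcat.files accepts more than one path is not covered here, so a single file is shown.

```properties
# Hypothetical name; this database must already exist (default: catalog)
catalog.database = my_catalog

# Drop existing catalog data and rescan all virtual graph and
# data source metadata every time the server starts (default: false)
catalog.reload.onstart = true

# Hypothetical path to static DCAT data to load into the catalog database
catalog.dcat.files = /path/to/external-dcat.ttl
```

With a non-default catalog.database, catalog queries are run against that database (my_catalog in this sketch) instead of catalog.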

Usage

The capture and storage of Stardog metadata happens automatically, without user interaction. Users interact with the Data Catalog through SPARQL queries and, optionally, through Explorer for visual exploration of the metadata. The new Explorer advanced query feature can also be used to query the metadata model.

To query only metadata, run SPARQL queries against the catalog database. To query metadata together with data in another database, use the local database service.
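
As a sketch of the second case, the query below runs against some other database and reaches into the catalog through a SERVICE clause. It assumes that the local database service addresses a database with a db:// IRI naming it, that the catalog database has the default name catalog, and that the metadata lives in the named graph <tag:stardog:api:catalog:local>.

```sparql
prefix dcterms: <http://purl.org/dc/terms/>
prefix dcat: <http://www.w3.org/ns/dcat#>

# Run against a database other than the catalog database.
select ?ds ?title where {
  # Assumption: <db://catalog> targets the local database named "catalog"
  service <db://catalog> {
    graph <tag:stardog:api:catalog:local> {
      ?ds a dcat:Dataset .
      # Titles may not be present on every dataset, so keep them optional
      optional { ?ds dcterms:title ?title }
    }
  }
  # Patterns over this database's own data can be added here
  # and joined with the catalog results on shared variables.
}
```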

Data Model

This table contains the classes used for modeling the Data Catalog metadata.

| Class | Description |
|---|---|
| dcat:Catalog | Top level class for all metadata |
| dcat:Dataset | A collection of data available for access in one or more representations |
| dcat:Distribution | A specific representation of a dataset |
| tag:stardog:api:catalog:DataSource | A distribution of a data source |
| tag:stardog:api:catalog:Schema | The tables that are part of a data source |
| tag:stardog:api:catalog:Table | A single table |
| tag:stardog:api:catalog:Column | A table column |
| tag:stardog:api:catalog:VirtualGraph | The configuration for a virtual graph |
| tag:stardog:api:catalog:Mapping | Mappings of tables to RDF |
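
To see which of these classes are populated in a particular catalog, a query along these lines counts the instances of each class. It is a sketch that assumes the metadata is stored in the named graph <tag:stardog:api:catalog:local>, and it keeps only classes from the two namespaces listed in the table.

```sparql
# Run against the catalog database.
select ?class (count(distinct ?s) as ?instances) where {
  # Assumption: catalog metadata lives in this named graph
  graph <tag:stardog:api:catalog:local> {
    ?s a ?class .
  }
  # Keep only DCAT classes and Stardog catalog classes
  filter (strstarts(str(?class), "http://www.w3.org/ns/dcat#") ||
          strstarts(str(?class), "tag:stardog:api:catalog:"))
}
group by ?class
order by ?class
```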

Example SPARQL Queries

The following are some query examples that demonstrate how virtual graph metadata can be queried.

  • What catalogs are available?
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      select ?src ?lbl where {
          graph <tag:stardog:api:catalog:local> {
          ?src a dcat:Catalog ; dcterms:title ?lbl
      }}
    
  • What datasets are in the catalog?
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      select ?ds where {
          graph <tag:stardog:api:catalog:local> {
          ?src a dcat:Catalog ; dcat:Dataset ?ds .
      }}
    
  • Query all distributions that are data sources
      prefix sdcat: <tag:stardog:api:catalog:>
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      select ?dist where {
        graph <tag:stardog:api:catalog:local> {
          ?src a dcat:Dataset ; 
               dcat:Distribution ?dist .
          ?dist a sdcat:DataSource .
      }}
    
  • Find how many REST data sources exist
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      select (count(?src) as ?num) 
      from stardog:context:local
      where {
          ?src a dcat:Distribution ; 
               dcterms:source <http://system.stardog.com/registry/vegaSqlDialect/db/REST> .
      }
    
  • Get all the database columns by table for a data source (replace myDataSource with the name of your data source)
      prefix sdc: <tag:stardog:api:catalog:>
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      select ?table ?column 
      from stardog:context:local
      where {
          ?src a dcat:Distribution ; 
               a <tag:stardog:api:catalog:DataSource> ;
               dcterms:conformsTo ?schema .
          ?schema a <tag:stardog:api:catalog:Schema> ;
             <tag:stardog:api:catalog:table> ?tbl .
          ?tbl a <tag:stardog:api:catalog:Table> ;
               rdfs:label ?table ;
               <tag:stardog:api:catalog:column> ?col .
          ?col a sdc:Column ;
               rdfs:label ?column .
        FILTER (?src = dcat:Distribution:myDataSource)
      } ORDER BY ?table ?column
    
  • List the mapping details of the virtual graphs that connect to a data source (replace myDataSource with the name of your data source)
      prefix sdcat: <tag:stardog:api:catalog:>
      prefix dcterms: <http://purl.org/dc/terms/>
      prefix dcat: <http://www.w3.org/ns/dcat#>
      prefix rr: <http://www.w3.org/ns/r2rml#>
      select * 
      from stardog:context:local
      where {
          # Bind ?dist to the data source of interest
          values ?dist { <http://www.w3.org/ns/dcat#Distribution:myDataSource> }
          ?dist a dcat:Distribution .
          ?vg sdcat:connectsTo ?dist ;
              dcterms:conformsTo ?map .
          ?map a sdcat:Map ;
               sdcat:Mapping ?fld .
          ?fld rr:predicateObjectMap [ rr:objectMap ?omap ;
                                       rr:predicateMap ?predmap ] .
          ?omap rr:column ?col ;
                rr:datatype ?dtype .
          ?predmap rr:constant ?con .
          ?fld rr:logicalTable [ rr:tableName ?tblname ] ;
               rr:subjectMap [ rr:template ?template ;
                               rr:termType ?termtype ] .
      }
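
A further query in the same style lists each virtual graph together with the data source it connects to. It is a sketch that assumes virtual graphs are typed as tag:stardog:api:catalog:VirtualGraph (as listed in the data model table) and are linked to their data sources through the tag:stardog:api:catalog:connectsTo property, all within the named graph <tag:stardog:api:catalog:local>.

```sparql
prefix sdcat: <tag:stardog:api:catalog:>

# Run against the catalog database.
select ?vg ?dist where {
  graph <tag:stardog:api:catalog:local> {
    # Assumption: virtual graphs carry this type and link to data sources via connectsTo
    ?vg a sdcat:VirtualGraph ;
        sdcat:connectsTo ?dist .
    ?dist a sdcat:DataSource .
  }
}
order by ?vg
```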
    

Explorer

After the Data Catalog is configured and running on your server, log in to Explorer and select your catalog database to begin exploring.

Explorer caches the catalog data when you log in. If you make virtual graph changes in Studio or on the command line, refresh Explorer to see them.

Explorer Advanced Query

Properties have been added to enable the new Explorer advanced query functionality with Data Catalog.