External Catalogs

This page discusses importing and using metadata from external catalog systems.

Page Contents
  1. Overview
    1. External Credentials
      1. Adding a credential
      2. Listing existing stored credentials
  2. Databricks Unity Catalog
    1. Configuration
    2. Data Model
  3. Collibra
    1. Configuration
    2. Data Model
  4. Microsoft Purview
    1. Configuration
    2. Exporting Stardog Metadata
    3. Data Model
  5. JDBC
    1. Configuration
    2. Data Model

Overview

The Knowledge Catalog can import metadata from external catalog systems to enable a single unified semantic layer over multiple catalogs.

External Credentials

Metadata providers that call external REST APIs cannot use Data Sources to hold credentials. Instead, the Catalog provides CLI commands that store encrypted usernames and passwords in the catalog credential store. Storing a credential returns a token, which you reference in provider configurations. The server exchanges the token for the stored credentials during processing.

The catalog credential store is a user-configurable location for storing credential strings. The store can be configured to keep these strings either as keyless hashes or as hashes produced with an encryption key.

The following options can be used to override the defaults.

Option | Description | Value | Default
catalog.key.store | The file path of the key store. Ignored for the database key store. | file path | $STARDOG_HOME
catalog.key.type | The type of hashing to use. | RSA, AES, or XOR | RSA
catalog.key.password | Password for an AES key. | string | (none)
catalog.credential.store | The type of key storage to use. | database or filepath | database

By default, the credential store expects a DER-encoded RSA key pair located in the Stardog home directory. The keys must be named catalog.priv and catalog.pub.

The following example uses openssl to generate an RSA key pair:

  • Create an RSA PEM encoded file
    openssl genrsa -out catalog.pem 2048
    
  • Extract the private key as a DER encoded file
    openssl pkcs8 -topk8 -nocrypt -in catalog.pem -outform der -out catalog.priv
    
  • Extract the public key as a DER encoded file
    openssl pkey -in catalog.pem -pubout -outform der -out catalog.pub
    

To add credentials to the store and list the stored entries, use the stardog-admin commands catalog credentials-add and catalog credentials-list.

Adding a credential

In the following example, the username my_user_name is added along with the corresponding password. The returned UUID, 04bee4c7-26cc-4c97-b817-dbb3298fa842, is the value to use in the accessKey property of the metadata provider that uses these credentials.

$ stardog-admin catalog credentials-add
Username: my_user_name
Password: 
Description (optional): My External System Credentials

04bee4c7-26cc-4c97-b817-dbb3298fa842

Listing existing stored credentials

In the following example, the credentials that were added are listed. Only the UUID and the user-provided description are returned. Once stored, the original credential values cannot be retrieved by users; only the catalog server can read them.

$ stardog-admin catalog credentials-list

Catalog Stored Credentials
+--------------------------------------+--------------------------------+
|              Access Key              |          Description           |
+--------------------------------------+--------------------------------+
| 04bee4c7-26cc-4c97-b817-dbb3298fa842 | My External System Credentials |
+--------------------------------------+--------------------------------+

Databricks Unity Catalog

The Knowledge Catalog can be configured to import Unity Catalog metadata using a Databricks account. You can configure the import to occur on a customizable schedule. Databricks Unity metadata is written to the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.

Configuration

To import Databricks Unity Catalog metadata, you insert a DatabricksProvider configuration into the Knowledge Catalog’s stardog:catalog:providers named graph. The configuration describes how Databricks can be accessed and how often the Knowledge Catalog should refresh the metadata.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            <tag:stardog:api:catalog:provider:dataSource> <DATA_SOURCE_HERE> ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Databricks metadata provider.

Property | Description | Values
rdf:type | Databricks metadata provider class | tag:stardog:api:catalog:DatabricksProvider
tag:stardog:api:catalog:provider:dataSource | Data Source to use for connecting to a Databricks account | The IRI of an existing Data Source, e.g. <data-source://myDatasource>
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10pm)
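As an illustration, the configuration template might be filled in as follows. This is only a sketch: the provider IRI is arbitrary, <data-source://myDatasource> is assumed to be an existing Data Source that connects to your Databricks account, and the schedule is the example expression from the table. Like the template, it assumes the stardog: prefix is resolved by the namespaces stored in the database.

```sparql
# Hypothetical values: adjust the provider IRI, Data Source IRI, and schedule.
insert data {
    graph stardog:catalog:providers
    {
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            # IRI of an existing Data Source (assumed name)
            <tag:stardog:api:catalog:provider:dataSource> <data-source://myDatasource> ;
            # Quartz fields: second minute hour day-of-month month day-of-week
            # "0 0 22 * * ?" runs every day at 10pm
            <tag:stardog:api:catalog:provider:schedule> "0 0 22 * * ?" .
    }
}
```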

After the configuration is inserted, a job is automatically created to run on the specified schedule. The job will import Databricks Unity metadata and load a general data model for viewing the metadata in Explorer.
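To confirm what has been configured, the provider definitions can be read back from the same named graph. The query below is a sketch that assumes it is run against the database holding the stardog:catalog:providers graph, and that the stardog: prefix resolves as it does in the insert statement.

```sparql
# List every configured provider with its properties.
# The rdf:type triple shows up as one of the property/value rows.
select ?provider ?property ?value
where {
    graph stardog:catalog:providers {
        ?provider ?property ?value .
    }
}
order by ?provider ?property
```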

Data Model

This table contains the classes used for modeling the Databricks metadata. Prefix bricks is namespace tag:stardog:api:catalog:databricks:.

Class Property Description
bricks:Databricks   The metadata from an external Databricks platform
bricks:DatabricksCatalog   A Databricks catalog
  bricks:owner The owner account
  bricks:catalogType The catalog type
bricks:DatabricksSchema   A Databricks schema
  bricks:owner The owner account
  bricks:fullName The full name of a schema
bricks:DatabricksTable   A Databricks table
  bricks:tableType The table type
  bricks:fullName The full name of the table
  bricks:dataSourceFormat The data source format
  bricks:owner The owner account
bricks:DatabricksColumn   A Databricks column
  bricks:position The column position
  bricks:precision The column precision
  bricks:nullable If the column is nullable
  bricks:dataType The column data type
  bricks:scale The column scale
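Once an import has run, the classes above can be queried with SPARQL. The following sketch lists Databricks tables with their type, format, and owner. It uses only the classes and properties from the table. The graph that holds the imported metadata is not specified here, so you may need to wrap the pattern in a graph clause or adjust the query dataset for your setup.

```sparql
prefix bricks: <tag:stardog:api:catalog:databricks:>

# List imported Databricks tables; optional properties may be absent.
select ?fullName ?tableType ?format ?owner
where {
    ?table a bricks:DatabricksTable ;
           bricks:fullName ?fullName .
    optional { ?table bricks:tableType ?tableType }
    optional { ?table bricks:dataSourceFormat ?format }
    optional { ?table bricks:owner ?owner }
}
order by ?fullName
```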

Collibra

The Knowledge Catalog can be configured to import data from a Collibra Data Intelligence Cloud account. Collibra is a data catalog product that collects business glossary, data governance, lineage and compliance metadata.

Configuration

The configuration for a Collibra provider requires that Collibra credentials be stored in the catalog credential store. See External Credentials for details.

Collibra Data Intelligence Cloud uses HTTP Basic Auth with a username and password for authentication. Add your account username and password to the catalog credential store and use the returned access key to configure the Collibra provider.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;  
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Collibra metadata provider.

Property | Description | Values
rdf:type | Collibra metadata provider class | tag:stardog:api:catalog:CollibraProvider
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Collibra account | URL
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10pm)

Data Model

This table contains the classes used for modeling the Collibra metadata. Prefix collibra is namespace tag:stardog:api:catalog:collibra:.

Class Property Description
collibra:Collibra   The metadata from an external Collibra platform
  collibra:community A Collibra community
collibra:Asset   A Collibra asset
  collibra:id An asset ID
  collibra:name An asset name
  collibra:domain An asset domain
  collibra:assetType An asset type
  collibra:tag An asset tag
  collibra:collibraUrl URL to Collibra asset page
collibra:AssetType   An asset type
  collibra:childOf A parent asset type
collibra:Domain   A Collibra domain
  collibra:id A domain ID
  collibra:name A domain name
  collibra:community A domain community
  collibra:domainType A domain type
collibra:Community   A Collibra community
  collibra:id A community ID
  collibra:community A parent community
  collibra:name A community name
collibra:Relation   A Collibra relation
  collibra:id A relation ID
  collibra:relationType A relation type
  collibra:targetAsset A relation target asset
  collibra:sourceAsset A relation source asset
collibra:DomainType   A Collibra domain type
collibra:RelationType   A Collibra relation type
  collibra:role A relation role
  collibra:coRole A relation co-role
collibra:Tag   A Collibra tag
collibra:Attribute   A Collibra attribute
  collibra:value An attribute value
  collibra:class An attribute class
  collibra:asset An attribute asset
  collibra:attributeType An attribute type
collibra:AttributeType   A Collibra attribute type
  collibra:attributeKind An attribute kind
  collibra:language An attribute language
  collibra:isInteger If attribute is an integer
  collibra:allowedValues The allowed values
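As a sketch of how this model can be queried, the following lists Collibra assets with their type and domain. It relies only on the classes and properties in the table. Whether collibra:assetType and collibra:domain point to resources or literal values is not stated here, so the query simply returns whatever values are present. The graph holding the imported data is likewise an open assumption.

```sparql
prefix collibra: <tag:stardog:api:catalog:collibra:>

# List imported Collibra assets by name; type and domain are optional.
select ?asset ?name ?assetType ?domain
where {
    ?asset a collibra:Asset ;
           collibra:name ?name .
    optional { ?asset collibra:assetType ?assetType }
    optional { ?asset collibra:domain ?domain }
}
order by ?name
limit 100
```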

Microsoft Purview

The Knowledge Catalog can be configured to both import data from a Microsoft Purview application running on Azure and export Stardog catalog data back into it. Purview is Microsoft’s data governance, cataloging, and protection product.

Configuration

The configuration for a Purview provider requires that Purview credentials be stored in the catalog credential store. See External Credentials for details.

Microsoft Purview on Azure uses the OAuth client_credentials grant type for authorization. Add your Azure client id and application client secret to the catalog credential store and use the returned access key to configure the Purview provider.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;  
            <tag:stardog:api:catalog:provider:tenantId> "AZURE_TENANT_ID_HERE" ;  
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Purview metadata provider.

Property | Description | Values
rdf:type | Purview metadata provider class | tag:stardog:api:catalog:PurviewProvider
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Purview application | URL
tag:stardog:api:catalog:provider:tenantId | Azure tenant ID for Purview application | UUID
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10pm)

Exporting Stardog Metadata

The Purview provider exports data source metadata into the configured Purview server. There is no extra configuration required to export Stardog metadata. When the provider’s scheduled job is run the export will automatically occur after the import is completed.

The following Purview asset types are added for the Stardog data source metadata:

Type Description
stardog_data_source The asset type for Stardog data sources
stardog_database The asset type for databases
stardog_schema The asset type for schemas
stardog_table The asset type for tables
stardog_column The asset type for columns
stardog_concept The asset type for mapped concepts

After the scheduled provider job has run you can log into your Purview account and view the Stardog metadata. It is located in a custom stardogcatalog collection.

Data Model

This table contains the classes used for modeling the Purview metadata. Prefix purview is namespace tag:stardog:api:catalog:purview: and the atlas prefix is namespace tag:stardog:api:catalog:atlas:.

Class Property Description
purview:Purview   The metadata from an external Purview platform
  purview:hasGlossary A Purview glossary
  purview:hasCollection A Purview collection
  purview:hasAsset Has a Purview asset
purview:Glossary   A glossary
  purview:hasTerm Has a glossary term
purview:GlossaryTerm   A glossary term
  purview:assignedTo Asset assigned to term
purview:Collection   A collection of assets and sources
  purview:hasSource A data source
  purview:hasAsset An asset
purview:Asset   An asset
  purview:scanId The Id of the last scan
  purview:lastScanned The time of the last scan
  purview:attribute An attribute
  purview:assetType The asset type
  purview:sourceId An Id of the source that generated this asset
purview:AssetType   An asset type
purview:Source   A data source
purview:Relationship   An asset relationship
  purview:head The head asset of a relationship
  purview:tail The tail asset of a relationship
purview:Attribute   An asset attribute
  purview:value The data value
  purview:attributeType The type of attribute
purview:AttributeType   An attribute type
  purview:typeName The type name
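For example, a query along the following lines could show the most recently scanned Purview assets. This is a sketch using only the classes and properties listed above. The datatype of purview:lastScanned is not documented here, so the ordering assumes its values sort chronologically. As elsewhere, the graph that holds the imported metadata may need to be specified for your setup.

```sparql
prefix purview: <tag:stardog:api:catalog:purview:>

# Show imported Purview assets, most recently scanned first.
select ?asset ?assetType ?lastScanned
where {
    ?asset a purview:Asset .
    optional { ?asset purview:assetType ?assetType }
    optional { ?asset purview:lastScanned ?lastScanned }
}
order by desc(?lastScanned)
limit 100
```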

JDBC

The Knowledge Catalog can be configured to use a JDBC driver to import database metadata into the Knowledge Catalog. Any JDBC compliant driver should work. Be sure to first add the JAR file to the classpath of the Stardog server.

Configuration

The configuration for a JDBC provider requires that credentials be stored in the catalog credential store. See External Credentials for details.

Currently, the JDBC provider expects a username and password for authentication. Add your database username and password to the catalog credential store and use the returned access key to configure the JDBC provider. When the provider import job runs, the standard jdbc.username and jdbc.password properties are injected into the JDBC connection string.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:jdbc> a <tag:stardog:api:catalog:JdbcProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:jdbcDriver> "DRIVER_CLASSNAME_HERE" ;  
            <tag:stardog:api:catalog:provider:jdbcURL> "JDBC_CONNECTION_STRING_HERE" ;  
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a JDBC metadata provider.

Property | Description | Values
rdf:type | JDBC metadata provider class | tag:stardog:api:catalog:JdbcProvider
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID
tag:stardog:api:catalog:provider:jdbcDriver | JDBC driver class name | The full class name
tag:stardog:api:catalog:provider:jdbcURL | A valid JDBC connection string | A valid connection string
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10pm)
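As an illustration, a JDBC provider for a PostgreSQL database might be configured as follows. This is a sketch under several assumptions: the provider IRI, host, port, and database name are hypothetical, the PostgreSQL driver JAR is already on the Stardog server classpath, and the access key is the UUID returned by the credentials-add example earlier on this page.

```sparql
# Hypothetical values: provider IRI, host, port, and database name.
insert data {
    graph stardog:catalog:providers
    {
        <urn:myJdbcProvider> a <tag:stardog:api:catalog:JdbcProvider> ;
            # UUID returned by "stardog-admin catalog credentials-add"
            <tag:stardog:api:catalog:provider:accessKey> "04bee4c7-26cc-4c97-b817-dbb3298fa842" ;
            # Full class name of the PostgreSQL JDBC driver
            <tag:stardog:api:catalog:provider:jdbcDriver> "org.postgresql.Driver" ;
            # Connection string without credentials; they come from the credential store
            <tag:stardog:api:catalog:provider:jdbcURL> "jdbc:postgresql://db.example.com:5432/sales" ;
            # Every day at 10pm
            <tag:stardog:api:catalog:provider:schedule> "0 0 22 * * ?" .
    }
}
```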

Data Model

This table contains the classes used for modeling the JDBC metadata. Prefix jdbc is namespace tag:stardog:api:catalog:jdbc: and the catalog prefix is namespace tag:stardog:api:catalog:.

Class Property Description
jdbc:DBMS   A database management system
  jdbc:databaseName The database name
  jdbc:databaseVersion The database version
  jdbc:driverName The driver name
  jdbc:driverVersion The driver version
  jdbc:user The accessing user
jdbc:PrimaryKey   A database table column designated to uniquely identify each record
jdbc:ForeignKey   A database table column used to link data between tables
catalog:DatabaseCatalog   A database catalog. Not all systems have a catalog; their highest-level object may be a schema
catalog:DatabaseSchema   Database schemas are containers for the tables of the database
catalog:Table   A table within a database
  catalog:tableName The name of a table within a database
  catalog:tableType The type of table
catalog:Column   A database table column
  catalog:name The name of a column within a table
  catalog:columnType The datatype of the column
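As a sketch of querying this model, the following counts imported columns by datatype, which gives a quick profile of a database. It uses only catalog:Column and catalog:columnType from the table. The graph holding the imported metadata is not specified here and may need to be added for your setup.

```sparql
prefix catalog: <tag:stardog:api:catalog:>

# Count imported columns per datatype, most common first.
select ?columnType (count(?column) as ?columns)
where {
    ?column a catalog:Column ;
            catalog:columnType ?columnType .
}
group by ?columnType
order by desc(?columns)
```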