
External Catalogs

This page discusses importing and using metadata from external catalog systems.

Page Contents
  1. Overview
    1. External Credentials
      1. Adding a credential
      2. Listing existing stored credentials
  2. Databricks Unity Catalog
    1. Configuration
    2. Data Model
  3. Collibra
    1. Configuration
    2. Data Model
  4. Azure Purview
    1. Configuration
    2. Data Model

Overview

The Knowledge Catalog can import metadata from external catalog systems to enable a single unified semantic layer over multiple catalogs.

External Credentials

Metadata providers that call external REST APIs cannot use Data Sources to hold credentials. Instead, the Catalog provides CLI commands that store encrypted usernames and passwords in the catalog credential store. Storing a credential returns a token, which you reference in the provider configuration. The server exchanges the token for the stored credentials during server processing.

The catalog credential store is a user-configurable location for storing credential strings. The stored strings can be hashed either without a key (keyless) or with an encryption key.

The following options can be used to override the defaults.

| Option | Description | Value | Default |
| catalog.key.store | The filepath of the key store. Ignored for database key store. | file path | $STARDOG_HOME |
| catalog.key.type | The type of hashing to use. | RSA, AES, or XOR | RSA |
| catalog.key.password | Password for an AES key. | string | |
| catalog.credential.store | The type of key storage to use. | database or filepath | database |
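
As a rough illustration of the options above, the following sketch checks values against the allowed sets listed in the table and renders them as key=value lines. The function name render_catalog_options is hypothetical, and the key=value output format is an assumption; this page does not state where the options are set.

```python
# Allowed values are copied from the options table; everything else is illustrative.
KNOWN_OPTIONS = {
    "catalog.key.store",
    "catalog.key.type",
    "catalog.key.password",
    "catalog.credential.store",
}
ALLOWED_VALUES = {
    "catalog.key.type": {"RSA", "AES", "XOR"},
    "catalog.credential.store": {"database", "filepath"},
}


def render_catalog_options(options):
    """Validate option names and values, then render key=value lines."""
    lines = []
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
        lines.append(f"{name}={value}")
    return "\n".join(lines)
```

For example, render_catalog_options({"catalog.key.type": "AES", "catalog.credential.store": "filepath"}) yields two lines, one per option, in the order given.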

By default, the credential store expects a DER-encoded RSA key pair located in the Stardog home directory. The key files must be named catalog.priv and catalog.pub.

The following example uses openssl to generate an RSA key pair:

  • Create an RSA PEM encoded file
    openssl genrsa -out catalog.pem 2048
    
  • Extract the private key as a DER encoded file
    openssl pkcs8 -topk8 -nocrypt -in catalog.pem -outform der -out catalog.priv
    
  • Extract the public key as a DER encoded file
    openssl pkey -in catalog.pem -pubout -outform der -out catalog.pub
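
The three steps above can also be scripted. The sketch below only wraps the same openssl invocations; the function names keygen_commands and generate_catalog_keys are hypothetical, and it assumes openssl is on the PATH and that the target directory is the Stardog home directory.

```python
import subprocess
from pathlib import Path


def keygen_commands(directory):
    """Return the three openssl invocations from the steps above as argv lists."""
    d = Path(directory)
    pem = str(d / "catalog.pem")
    return [
        # 1. Create an RSA PEM encoded file
        ["openssl", "genrsa", "-out", pem, "2048"],
        # 2. Extract the private key as a DER encoded file
        ["openssl", "pkcs8", "-topk8", "-nocrypt", "-in", pem,
         "-outform", "der", "-out", str(d / "catalog.priv")],
        # 3. Extract the public key as a DER encoded file
        ["openssl", "pkey", "-in", pem, "-pubout",
         "-outform", "der", "-out", str(d / "catalog.pub")],
    ]


def generate_catalog_keys(directory):
    """Run the commands in order; raises CalledProcessError if openssl fails."""
    for argv in keygen_commands(directory):
        subprocess.run(argv, check=True)
```

Calling generate_catalog_keys with the Stardog home directory would leave catalog.pem, catalog.priv and catalog.pub in that directory.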
    

To manage credentials in the store, use the stardog-admin commands catalog credentials-add (add a credential) and catalog credentials-list (list stored credentials).

Adding a credential

In the following example, the username my_user_name is added along with its password. The returned UUID, 04bee4c7-26cc-4c97-b817-dbb3298fa842, is the value to use for the accessKey property of any metadata provider that uses these credentials.

$ stardog-admin catalog credentials-add
Username: my_user_name
Password: 
Description (optional): My External System Credentials

04bee4c7-26cc-4c97-b817-dbb3298fa842
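
If the command is driven from a script, the access key has to be picked out of the captured output. The following is a minimal sketch that assumes the output looks like the transcript above, with the UUID on a line of its own; extract_access_key is a hypothetical helper, not part of the Stardog CLI.

```python
import re
import uuid

# A canonical 8-4-4-4-12 hexadecimal UUID on a line by itself.
UUID_LINE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def extract_access_key(output):
    """Return the last line of the output that is a well-formed UUID."""
    for line in reversed(output.splitlines()):
        candidate = line.strip()
        if UUID_LINE.match(candidate):
            # uuid.UUID normalizes the text to lowercase canonical form.
            return str(uuid.UUID(candidate))
    raise ValueError("no access key found in output")
```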

Listing existing stored credentials

In the following example, the credentials added above are listed. Only the UUID and the user-provided description are shown. The original credential values cannot be retrieved through the CLI; once stored, only the catalog server can read them.

$ stardog-admin catalog credentials-list

Catalog Stored Credentials
+--------------------------------------+--------------------------------+
|              Access Key              |          Description           |
+--------------------------------------+--------------------------------+
| 04bee4c7-26cc-4c97-b817-dbb3298fa842 | My External System Credentials |
+--------------------------------------+--------------------------------+
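
The table output above is straightforward to parse when the access keys are needed in a script. This sketch assumes the two-column layout shown and that descriptions do not themselves contain a | character; parse_credentials_table is a hypothetical helper.

```python
def parse_credentials_table(output):
    """Return (access_key, description) pairs from the credentials-list table."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        # Border lines start with '+', the title line has no leading '|'.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not a two-column row.
        if len(cells) == 2 and cells[0] != "Access Key":
            rows.append((cells[0], cells[1]))
    return rows
```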

Databricks Unity Catalog

The Knowledge Catalog can be configured to import Unity Catalog metadata using a Databricks account. You can configure the import to occur on a customizable schedule. Databricks Unity metadata is written to the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.

Configuration

To import Databricks Unity Catalog metadata, you insert a DatabricksProvider configuration into the Knowledge Catalog’s stardog:catalog:providers named graph. The configuration describes how Databricks can be accessed and how often the Knowledge Catalog should refresh the metadata.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            <tag:stardog:api:catalog:provider:dataSource> <DATA_SOURCE_HERE> ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Databricks metadata provider.

| Property | Description | Values |
| rdf:type | Databricks metadata provider class | tag:stardog:api:catalog:DatabricksProvider |
| tag:stardog:api:catalog:provider:dataSource | Data source to use for connecting to a Databricks account | The IRI of an existing Data Source, e.g. <data-source://myDatasource> |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? runs every day at 10pm) |
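
A Quartz cron expression has six fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh for the year, which is why the example schedule starts with two zeros. The sketch below builds a daily schedule in that form and performs a field-count check only; both function names are hypothetical, and the check is not a full Quartz validator.

```python
def daily_at(hour, minute=0):
    """Build a Quartz expression that fires once a day at hour:minute:00."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: seconds minutes hours day-of-month month day-of-week.
    # '?' means "no specific value" for day-of-week.
    return f"0 {minute} {hour} * * ?"


def has_quartz_field_count(expression):
    """Check only that the expression has six fields or seven (with a year)."""
    return len(expression.split()) in (6, 7)
```

Here daily_at(22) reproduces the example schedule from the table, 0 0 22 * * ?.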

After the configuration is inserted, a job is automatically created to run on the specified schedule. The job will import Databricks Unity metadata and load a general data model for viewing the metadata in Explorer.
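
When the provider configuration is generated by a script, the update shown earlier can be assembled from its three variable parts. This is a minimal sketch: databricks_provider_update is a hypothetical helper that only builds the SPARQL text, and how the update is submitted to Stardog is left out.

```python
NS = "tag:stardog:api:catalog:"


def databricks_provider_update(provider_iri, data_source_iri, schedule):
    """Return the insert-data update for a Databricks metadata provider."""
    for iri in (provider_iri, data_source_iri):
        # Reject characters that are not allowed inside <...> in SPARQL.
        if any(ch in iri for ch in '<>" {}|\\^`'):
            raise ValueError(f"not a valid IRI: {iri!r}")
    if any(ch in schedule for ch in '"\\\n'):
        raise ValueError("schedule must be a plain cron string")
    return (
        "insert data {\n"
        "    graph stardog:catalog:providers {\n"
        f"        <{provider_iri}> a <{NS}DatabricksProvider> ;\n"
        f"            <{NS}provider:dataSource> <{data_source_iri}> ;\n"
        f'            <{NS}provider:schedule> "{schedule}" .\n'
        "    }\n"
        "}\n"
    )
```

For example, databricks_provider_update("urn:myDBricksProvider", "data-source://myDatasource", "0 0 22 * * ?") produces the same statement as the configuration above with the placeholders filled in.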

Data Model

This table lists the classes and properties used for modeling the Databricks metadata. The bricks prefix denotes the namespace tag:stardog:api:catalog:databricks:.

| Class | Property | Description |
| bricks:Databricks | | The metadata from an external Databricks platform |
| bricks:DatabricksCatalog | | A Databricks catalog |
| | bricks:owner | The owner account |
| | bricks:catalogType | The catalog type |
| bricks:DatabricksSchema | | A Databricks schema |
| | bricks:owner | The owner account |
| | bricks:fullName | The full name of a schema |
| bricks:DatabricksTable | | A Databricks table |
| | bricks:tableType | The table type |
| | bricks:fullName | The full name of the table |
| | bricks:dataSourceFormat | The data source format |
| | bricks:owner | The owner account |
| bricks:DatabricksColumn | | A Databricks column |
| | bricks:position | The column position |
| | bricks:precision | The column precision |
| | bricks:nullable | If the column is nullable |
| | bricks:dataType | The column data type |
| | bricks:scale | The column scale |
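
To show how the model above might be queried, the following sketch builds a SPARQL query that lists imported tables with their type and owner. It assumes each property in the table is attached directly to instances of the class it is listed under; databricks_tables_query is a hypothetical helper, and the database or named graph to run the query against is not specified here.

```python
BRICKS = "tag:stardog:api:catalog:databricks:"


def databricks_tables_query(limit=100):
    """Return a SPARQL query listing Databricks tables, their type and owner."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"""prefix bricks: <{BRICKS}>

select ?table ?fullName ?tableType ?owner where {{
    ?table a bricks:DatabricksTable ;
        bricks:fullName ?fullName .
    # Assumption: tableType and owner may be missing on some tables.
    optional {{ ?table bricks:tableType ?tableType }}
    optional {{ ?table bricks:owner ?owner }}
}}
order by ?fullName
limit {limit}
"""
```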

Collibra

The Knowledge Catalog can be configured to import metadata from a Collibra Data Intelligence Cloud account. Collibra is a data catalog product that collects business glossary, data governance, lineage, and compliance metadata.

Configuration

The configuration for a Collibra provider requires that Collibra credentials be stored in the catalog credential store. See External Credentials for details.

Collibra Data Intelligence Cloud uses HTTP Basic Auth with a username and password for authentication. Add your account username and password to the catalog credential store and use the returned access token to configure the Collibra provider.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;  
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Collibra metadata provider.

| Property | Description | Values |
| rdf:type | Collibra metadata provider class | tag:stardog:api:catalog:CollibraProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Collibra account | URL |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? runs every day at 10pm) |
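
Because the access key must be a UUID and the server address a URL, it can help to check both before inserting the configuration. The sketch below does that and then builds the update text for the <urn:collibra> provider shown above; collibra_provider_update is a hypothetical helper, and the checks are basic sanity checks rather than anything Stardog itself enforces.

```python
import uuid
from urllib.parse import urlsplit

NS = "tag:stardog:api:catalog:"


def collibra_provider_update(access_key, server_address, schedule):
    """Validate the inputs and return the insert-data update for Collibra."""
    # uuid.UUID raises ValueError if the access key is not a well-formed UUID.
    key = str(uuid.UUID(access_key))
    parts = urlsplit(server_address)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("server address must be an http(s) URL")
    for text in (server_address, schedule):
        # Keep the values safe to embed in a double-quoted SPARQL literal.
        if any(ch in text for ch in '"\\\n'):
            raise ValueError("value cannot contain quotes, backslashes or newlines")
    return (
        "insert data {\n"
        "    graph stardog:catalog:providers {\n"
        f"        <urn:collibra> a <{NS}CollibraProvider> ;\n"
        f'            <{NS}provider:accessKey> "{key}" ;\n'
        f'            <{NS}provider:serverAddress> "{server_address}" ;\n'
        f'            <{NS}provider:schedule> "{schedule}" .\n'
        "    }\n"
        "}\n"
    )
```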

Data Model

This table lists the classes and properties used for modeling the Collibra metadata. The collibra prefix denotes the namespace tag:stardog:api:catalog:collibra:.

| Class | Property | Description |
| collibra:Collibra | | The metadata from an external Collibra platform |
| | collibra:community | A Collibra community |
| collibra:Asset | | A Collibra asset |
| | collibra:id | An asset ID |
| | collibra:name | An asset name |
| | collibra:domain | An asset domain |
| | collibra:assetType | An asset type |
| | collibra:tag | An asset tag |
| collibra:Domain | | A Collibra domain |
| | collibra:id | A domain ID |
| | collibra:name | A domain name |
| | collibra:community | A domain community |
| | collibra:domainType | A domain type |
| collibra:Community | | A Collibra community |
| | collibra:id | A community ID |
| | collibra:name | A community name |
| collibra:Relation | | A Collibra relation |
| | collibra:id | A relation ID |
| | collibra:relationType | A relation type |
| | collibra:targetAsset | A relation target asset |
| | collibra:sourceAsset | A relation source asset |
| collibra:DomainType | | A Collibra domain type |
| collibra:RelationType | | A Collibra relation type |
| | collibra:role | A relation role |
| | collibra:coRole | A relation co-role |
| collibra:Tag | | A Collibra tag |
| collibra:Attribute | | A Collibra attribute |
| | collibra:value | An attribute value |
| | collibra:class | An attribute class |
| | collibra:asset | An attribute asset |
| | collibra:attributeType | An attribute type |
| collibra:AttributeType | | A Collibra attribute type |
| | collibra:attributeKind | An attribute kind |
| | collibra:language | An attribute language |
| | collibra:isInteger | If attribute is an integer |
| | collibra:allowedValues | The allowed values |
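
Relations are the part of this model that links assets together, so a query over them is a useful first check after an import. The sketch below assumes that collibra:sourceAsset, collibra:targetAsset and collibra:relationType point at resources of the corresponding classes; that is inferred from the table rather than stated by it, and collibra_relations_query is a hypothetical helper.

```python
COLLIBRA = "tag:stardog:api:catalog:collibra:"


def collibra_relations_query(limit=100):
    """Return a SPARQL query listing source asset, role and target asset."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"""prefix collibra: <{COLLIBRA}>

select ?sourceName ?role ?targetName where {{
    ?relation a collibra:Relation ;
        collibra:sourceAsset ?source ;
        collibra:targetAsset ?target ;
        collibra:relationType ?type .
    ?source collibra:name ?sourceName .
    ?target collibra:name ?targetName .
    # Assumption: the role is recorded on the relation type, as in the table.
    optional {{ ?type collibra:role ?role }}
}}
limit {limit}
"""
```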

Azure Purview

The Knowledge Catalog can be configured to import metadata from an Azure Purview application. Purview is Microsoft’s data governance and protection product that combines an Apache Atlas server with Azure Data Lake metadata.

Configuration

The configuration for a Purview provider requires that Purview credentials be stored in the catalog credential store. See External Credentials for details.

Azure Purview uses the OAuth client_credentials grant type for authorization. Add your Azure client ID and application client secret to the catalog credential store (as the username and password, respectively) and use the returned access token to configure the Purview provider.

insert data {
    graph stardog:catalog:providers 
    { 
        <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;  
            <tag:stardog:api:catalog:provider:tenantId> "AZURE_TENANT_ID_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
} 

This table details the property values that need to be set for configuring a Purview metadata provider.

| Property | Description | Values |
| rdf:type | Purview metadata provider class | tag:stardog:api:catalog:PurviewProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Purview application | URL |
| tag:stardog:api:catalog:provider:tenantId | Azure tenant ID for Purview application | UUID |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? runs every day at 10pm) |
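
For readers unfamiliar with the client_credentials grant, the sketch below shows the general shape of such a token request and why the tenant ID is needed alongside the client ID and secret. How Stardog performs this exchange internally is not described on this page; the token endpoint and scope used here are assumptions based on the usual Microsoft identity platform conventions, and client_credentials_request is a hypothetical helper that builds the request without sending it.

```python
from urllib.parse import urlencode


def client_credentials_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for an OAuth client_credentials token request."""
    # Assumption: the v2.0 token endpoint, which is scoped by tenant ID.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: the default scope for the Purview resource.
        "scope": "https://purview.azure.net/.default",
    })
    return url, body
```

The body would be sent as application/x-www-form-urlencoded in a POST to the returned URL.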

Data Model

This table lists the classes and properties used for modeling the Purview metadata. The purview prefix denotes the namespace tag:stardog:api:catalog:purview: and the atlas prefix denotes the namespace tag:stardog:api:catalog:atlas:.

| Class | Property | Description |
| purview:Purview | | The metadata from an external Purview platform |
| | purview:hasGlossary | A Purview glossary |
| | purview:hasAsset | Has a Purview asset |
| atlas:Glossary | | A glossary |
| | atlas:hasTerm | Has a glossary term |
| atlas:GlossaryTerm | | A glossary term |
| | atlas:assignedTo | Asset assigned to term |
| atlas:Asset | | An asset |
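
As a sketch of how the glossary part of this model might be explored, the following builds a SPARQL query that lists glossary terms and any assets assigned to them. It assumes atlas:hasTerm links a glossary to its terms and atlas:assignedTo links a term to an asset, as the table suggests; glossary_terms_query is a hypothetical helper.

```python
ATLAS = "tag:stardog:api:catalog:atlas:"


def glossary_terms_query(limit=100):
    """Return a SPARQL query listing glossary terms and their assigned assets."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"""prefix atlas: <{ATLAS}>

select ?glossary ?term ?asset where {{
    ?glossary a atlas:Glossary ;
        atlas:hasTerm ?term .
    # Terms without an assigned asset are still returned.
    optional {{ ?term atlas:assignedTo ?asset }}
}}
limit {limit}
"""
```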