External Catalogs
This page discusses importing and using metadata from external catalog systems.
Overview
The Knowledge Catalog can import metadata from external catalog systems to enable a single unified semantic layer over multiple catalogs.
External Credentials
Metadata providers that call external REST APIs cannot use Data Sources to hold credentials. Instead, the Catalog provides CLI commands that store encrypted usernames and passwords in the catalog credential store. Adding a credential returns a token (access key) that you reference in provider configurations; the server exchanges the token for the stored credentials when it processes the provider.
The catalog credential store is a user-configurable location for storing credential strings. The stored strings can be configured to be stored either keyless or hashed with an encryption key.
The following options can be used to override the defaults.
Option | Description | Value | Default |
---|---|---|---|
catalog.key.store | The filepath of the key store. Ignored for database key store. | file path | $STARDOG_HOME |
catalog.key.type | The type of hashing to use. | RSA or AES or XOR | RSA |
catalog.key.password | Password for an AES key. | string | |
catalog.credential.store | The type of key storage to use. | database or filepath | database |
By default the credential store expects a DER-encoded RSA key pair located in the Stardog home directory. The keys must be named catalog.priv and catalog.pub.
The following example uses openssl to generate an RSA key pair:
- Create an RSA PEM encoded file
openssl genrsa -out catalog.pem 2048
- Extract the private key as a DER encoded file
openssl pkcs8 -topk8 -nocrypt -in catalog.pem -outform der -out catalog.priv
- Extract the public key as a DER encoded file
openssl pkey -in catalog.pem -pubout -outform der -out catalog.pub
To add credentials to the store and to list the stored credentials, use the stardog-admin commands catalog credentials-add and catalog credentials-list.
Adding a credential
In the following example the username my_user_name is added along with the corresponding password. The returned UUID, 04bee4c7-26cc-4c97-b817-dbb3298fa842, is the value to use in the accessKey property of the metadata provider that uses these credentials.
$ stardog-admin catalog credentials-add
Username: my_user_name
Password:
Description (optional): My External System Credentials
04bee4c7-26cc-4c97-b817-dbb3298fa842
Listing existing stored credentials
In the following example the credentials that were added are listed. Only the UUID and the user-provided description are returned. The original credential values cannot be retrieved by users; once stored, they can only be retrieved by the catalog server.
$ stardog-admin catalog credentials-list
Catalog Stored Credentials
+--------------------------------------+--------------------------------+
| Access Key | Description |
+--------------------------------------+--------------------------------+
| 04bee4c7-26cc-4c97-b817-dbb3298fa842 | My External System Credentials |
+--------------------------------------+--------------------------------+
Databricks Unity Catalog
The Knowledge Catalog can be configured to import Unity Catalog metadata using a Databricks account. You can configure the import to occur on a customizable schedule. Databricks Unity metadata is written to the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.
Configuration
To import Databricks Unity Catalog metadata, you insert a DatabricksProvider configuration into the Knowledge Catalog’s stardog:catalog:providers named graph. The configuration describes how Databricks can be accessed and how often the Knowledge Catalog should refresh the metadata.
insert data {
    graph stardog:catalog:providers
    {
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            <tag:stardog:api:catalog:provider:dataSource> <DATA_SOURCE_HERE> ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
    }
}
This table details the property values that need to be set for configuring a Databricks metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Databricks metadata provider class | tag:stardog:api:catalog:DatabricksProvider |
tag:stardog:api:catalog:provider:dataSource | Datasource to use for connecting to a Databricks account | The IRI of an existing Data Source, e.g. <data-source://myDatasource> |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10 pm) |
After the configuration is inserted, a job is automatically created to run on the specified schedule. The job will import Databricks Unity metadata and load a general data model for viewing the metadata in Explorer.
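As an illustration, the template above might be filled in as follows. This is only a sketch that reuses the example values from the property table: it assumes a Data Source named myDatasource already exists, and it schedules the import for 10 pm every day. The provider IRI urn:myDBricksProvider is simply the name used in the template.

```sparql
# Sketch only: assumes a Data Source <data-source://myDatasource> already exists.
insert data {
    graph stardog:catalog:providers
    {
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            <tag:stardog:api:catalog:provider:dataSource> <data-source://myDatasource> ;
            # Quartz cron expression: every day at 10 pm
            <tag:stardog:api:catalog:provider:schedule> "0 0 22 * * ?" .
    }
}
```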
Data Model
This table contains the classes used for modeling the Databricks metadata. The bricks prefix stands for the namespace tag:stardog:api:catalog:databricks: (including the trailing colon).
Class | Property | Description |
---|---|---|
bricks:Databricks | | The metadata from an external Databricks platform |
bricks:DatabricksCatalog | | A Databricks catalog |
bricks:DatabricksCatalog | bricks:owner | The owner account |
bricks:DatabricksCatalog | bricks:catalogType | The catalog type |
bricks:DatabricksSchema | | A Databricks schema |
bricks:DatabricksSchema | bricks:owner | The owner account |
bricks:DatabricksSchema | bricks:fullName | The full name of the schema |
bricks:DatabricksTable | | A Databricks table |
bricks:DatabricksTable | bricks:tableType | The table type |
bricks:DatabricksTable | bricks:fullName | The full name of the table |
bricks:DatabricksTable | bricks:dataSourceFormat | The data source format |
bricks:DatabricksTable | bricks:owner | The owner account |
bricks:DatabricksColumn | | A Databricks column |
bricks:DatabricksColumn | bricks:position | The column position |
bricks:DatabricksColumn | bricks:precision | The column precision |
bricks:DatabricksColumn | bricks:nullable | Whether the column is nullable |
bricks:DatabricksColumn | bricks:dataType | The column data type |
bricks:DatabricksColumn | bricks:scale | The column scale |
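Once an import has run, the model can be explored with ordinary SPARQL. The query below is a minimal sketch that lists imported tables with their type and owner, using only the classes and properties from the table above. This page does not state which named graph holds the imported metadata, so the query assumes it lives in some named graph and matches any of them; adjust the graph pattern to your setup.

```sparql
prefix bricks: <tag:stardog:api:catalog:databricks:>

# Sketch only: the graph pattern is an assumption, since the named graph
# that holds the imported metadata is not specified on this page.
select ?table ?fullName ?tableType ?owner
where {
    graph ?g {
        ?table a bricks:DatabricksTable ;
            bricks:fullName ?fullName .
        # Type and owner are optional so tables without them still appear
        optional { ?table bricks:tableType ?tableType }
        optional { ?table bricks:owner ?owner }
    }
}
order by ?fullName
```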
Collibra
The Knowledge Catalog can be configured to import metadata from a Collibra Data Intelligence Cloud account. Collibra is a data catalog product that collects business glossary, data governance, lineage and compliance metadata.
Configuration
The configuration for a Collibra provider requires that Collibra credentials be stored in the catalog credential store. See External Credentials for details.
Collibra Data Intelligence Cloud uses HTTP Basic Auth with a username and password for authentication. Add your account username and password to the catalog credential store and use the returned access key to configure the Collibra provider.
insert data {
    graph stardog:catalog:providers
    {
        <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
    }
}
This table details the property values that need to be set for configuring a Collibra metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Collibra metadata provider class | tag:stardog:api:catalog:CollibraProvider |
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Collibra account | URL |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10 pm) |
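For illustration, a filled-in Collibra provider configuration might look like the sketch below. The access key is the UUID returned in the credentials-add example earlier on this page, and the schedule is the example cron expression from the table. The server address https://example.collibra.com is a hypothetical placeholder; substitute the cloud URL of your own Collibra account.

```sparql
# Sketch only: the server address is a hypothetical placeholder.
insert data {
    graph stardog:catalog:providers
    {
        <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
            # UUID returned by "stardog-admin catalog credentials-add"
            <tag:stardog:api:catalog:provider:accessKey> "04bee4c7-26cc-4c97-b817-dbb3298fa842" ;
            <tag:stardog:api:catalog:provider:serverAddress> "https://example.collibra.com" ;
            # Quartz cron expression: every day at 10 pm
            <tag:stardog:api:catalog:provider:schedule> "0 0 22 * * ?" .
    }
}
```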
Data Model
This table contains the classes used for modeling the Collibra metadata. The collibra prefix stands for the namespace tag:stardog:api:catalog:collibra: (including the trailing colon).
Class | Property | Description |
---|---|---|
collibra:Collibra | | The metadata from an external Collibra platform |
collibra:Collibra | collibra:community | A Collibra community |
collibra:Asset | | A Collibra asset |
collibra:Asset | collibra:id | An asset ID |
collibra:Asset | collibra:name | An asset name |
collibra:Asset | collibra:domain | An asset domain |
collibra:Asset | collibra:assetType | An asset type |
collibra:Asset | collibra:tag | An asset tag |
collibra:Domain | | A Collibra domain |
collibra:Domain | collibra:id | A domain ID |
collibra:Domain | collibra:name | A domain name |
collibra:Domain | collibra:community | A domain community |
collibra:Domain | collibra:domainType | A domain type |
collibra:Community | | A Collibra community |
collibra:Community | collibra:id | A community ID |
collibra:Community | collibra:name | A community name |
collibra:Relation | | A Collibra relation |
collibra:Relation | collibra:id | A relation ID |
collibra:Relation | collibra:relationType | A relation type |
collibra:Relation | collibra:targetAsset | A relation target asset |
collibra:Relation | collibra:sourceAsset | A relation source asset |
collibra:DomainType | | A Collibra domain type |
collibra:RelationType | | A Collibra relation type |
collibra:RelationType | collibra:role | A relation role |
collibra:RelationType | collibra:coRole | A relation co-role |
collibra:Tag | | A Collibra tag |
collibra:Attribute | | A Collibra attribute |
collibra:Attribute | collibra:value | An attribute value |
collibra:Attribute | collibra:class | An attribute class |
collibra:Attribute | collibra:asset | An attribute asset |
collibra:Attribute | collibra:attributeType | An attribute type |
collibra:AttributeType | | A Collibra attribute type |
collibra:AttributeType | collibra:attributeKind | An attribute kind |
collibra:AttributeType | collibra:language | An attribute's language |
collibra:AttributeType | collibra:isInteger | Whether the attribute is an integer |
collibra:AttributeType | collibra:allowedValues | The allowed values |
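As a sketch of how this model can be queried, the following lists imported Collibra assets with their name, asset type and domain. It uses only the classes and properties from the table above. The graph pattern is an assumption: this page does not name the graph that holds the imported metadata, so the query matches any named graph.

```sparql
prefix collibra: <tag:stardog:api:catalog:collibra:>

# Sketch only: the graph pattern is an assumption, since the named graph
# that holds the imported metadata is not specified on this page.
select ?asset ?name ?assetType ?domain
where {
    graph ?g {
        ?asset a collibra:Asset ;
            collibra:name ?name .
        # Type and domain are optional so assets without them still appear
        optional { ?asset collibra:assetType ?assetType }
        optional { ?asset collibra:domain ?domain }
    }
}
order by ?name
```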
Azure Purview
The Knowledge Catalog can be configured to import metadata from an Azure Purview application. Purview is Microsoft’s data governance and protection product that combines an Apache Atlas server with Azure Data Lake metadata.
Configuration
The configuration for a Purview provider requires that Purview credentials be stored in the catalog credential store. See External Credentials for details.
Azure Purview uses the OAuth client_credentials grant type for authorization. Add your Azure client ID and application client secret to the catalog credential store and use the returned access key to configure the Purview provider.
insert data {
    graph stardog:catalog:providers
    {
        <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:tenantId> "AZURE_TENANT_ID_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
    }
}
This table details the property values that need to be set for configuring a Purview metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Purview metadata provider class | tag:stardog:api:catalog:PurviewProvider |
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Purview application | URL |
tag:stardog:api:catalog:provider:tenantId | Azure tenant ID for Purview application | UUID |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression, e.g. 0 0 22 * * ? (every day at 10 pm) |
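For illustration, a filled-in Purview provider configuration might look like the sketch below. The access key is the UUID returned in the credentials-add example earlier on this page, and the schedule is the example cron expression from the table. The server address and the all-zero tenant ID are hypothetical placeholders; replace them with the values for your own Purview application and Azure tenant.

```sparql
# Sketch only: the server address and tenant ID are hypothetical placeholders.
insert data {
    graph stardog:catalog:providers
    {
        <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
            # UUID returned by "stardog-admin catalog credentials-add"
            <tag:stardog:api:catalog:provider:accessKey> "04bee4c7-26cc-4c97-b817-dbb3298fa842" ;
            <tag:stardog:api:catalog:provider:serverAddress> "https://example.purview.azure.com" ;
            <tag:stardog:api:catalog:provider:tenantId> "00000000-0000-0000-0000-000000000000" ;
            # Quartz cron expression: every day at 10 pm
            <tag:stardog:api:catalog:provider:schedule> "0 0 22 * * ?" .
    }
}
```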
Data Model
This table contains the classes used for modeling the Purview metadata. The purview prefix stands for the namespace tag:stardog:api:catalog:purview: and the atlas prefix for the namespace tag:stardog:api:catalog:atlas: (both including the trailing colon).
Class | Property | Description |
---|---|---|
purview:Purview | | The metadata from an external Purview platform |
purview:Purview | purview:hasGlossary | A Purview glossary |
purview:Purview | purview:hasAsset | Has a Purview asset |
atlas:Glossary | | A glossary |
atlas:Glossary | atlas:hasTerm | Has a glossary term |
atlas:GlossaryTerm | | A glossary term |
atlas:GlossaryTerm | atlas:assignedTo | Asset assigned to term |
atlas:Asset | | An asset |
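As a sketch of how this model can be queried, the following lists each glossary with its terms and, where present, the assets assigned to a term. Two things are assumptions here: that atlas:assignedTo points from a glossary term to an asset (the table lists it under atlas:GlossaryTerm), and that the imported metadata lives in some named graph, since this page does not name it.

```sparql
prefix atlas: <tag:stardog:api:catalog:atlas:>

# Sketch only: assumes atlas:assignedTo links a term to an asset, and that
# the imported metadata is stored in a named graph.
select ?glossary ?term ?asset
where {
    graph ?g {
        ?glossary a atlas:Glossary ;
            atlas:hasTerm ?term .
        # Terms without assigned assets are still returned
        optional { ?term atlas:assignedTo ?asset }
    }
}
order by ?glossary ?term
```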