External Catalogs
This page discusses importing and using metadata from external catalog systems.
Overview
The Knowledge Catalog can import metadata from external catalog systems to enable a single unified semantic layer over multiple catalogs.
External Credentials
Metadata providers that call external REST APIs cannot use Data Sources to hold credentials. Instead, the Catalog provides CLI commands that store encrypted usernames and passwords in the catalog credential store. Storing a credential returns a token (an access key), which you reference in provider configurations. During server processing, the token is exchanged for the stored credentials.
The catalog credential store is a user-configurable location for storing credential strings. The stored strings can be protected either without a key or with an encryption key.
The following options can be used to override the defaults.
Option | Description | Value | Default |
---|---|---|---|
catalog.key.store | The file path of the key store. Ignored for a database key store. | file path | $STARDOG_HOME |
catalog.key.type | The type of encryption to use. | RSA or AES or XOR | RSA |
catalog.key.password | Password for an AES key. | string | |
catalog.credential.store | The type of key storage to use. | database or filepath | database |
By default, the credential store expects a DER-encoded RSA key pair located in the Stardog home directory. The keys must be named `catalog.priv` and `catalog.pub`.
The following example uses `openssl` to generate an RSA key pair:

```shell
# Create an RSA PEM-encoded file
openssl genrsa -out catalog.pem 2048
# Extract the private key as a DER-encoded file
openssl pkcs8 -topk8 -nocrypt -in catalog.pem -outform der -out catalog.priv
# Extract the public key as a DER-encoded file
openssl pkey -in catalog.pem -pubout -outform der -out catalog.pub
```
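A quick sanity check can catch a common mistake, such as copying the PEM file instead of the DER files. The following Python sketch is an illustration only: `check_catalog_keys` and `looks_like_der` are hypothetical helpers, and the script assumes the default file-based key location (`$STARDOG_HOME`) and the key names above. It checks only that each file exists and begins with the ASN.1 SEQUENCE tag (`0x30`) that DER-encoded private and public keys start with; it does not validate the keys themselves.

```python
import os
from pathlib import Path

# Every DER-encoded ASN.1 SEQUENCE (including PKCS#8 private keys and
# SubjectPublicKeyInfo public keys) begins with this tag byte.
DER_SEQUENCE_TAG = 0x30


def looks_like_der(first_bytes: bytes) -> bool:
    """True if the data starts with the DER SEQUENCE tag (a PEM file starts with '-')."""
    return len(first_bytes) > 0 and first_bytes[0] == DER_SEQUENCE_TAG


def check_catalog_keys(key_dir: str) -> list[str]:
    """Return problems found with catalog.priv and catalog.pub in key_dir (empty if none)."""
    problems = []
    for name in ("catalog.priv", "catalog.pub"):
        path = Path(key_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        with path.open("rb") as f:
            if not looks_like_der(f.read(1)):
                problems.append(f"{name}: not DER-encoded")
    return problems


if __name__ == "__main__":
    # Assumption: the keys live in $STARDOG_HOME (the default catalog.key.store)
    for problem in check_catalog_keys(os.environ.get("STARDOG_HOME", ".")):
        print(problem)
```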
Use the CLI administrator command `catalog credentials-add` to add credentials to the store, and `catalog credentials-list` to list them.
Adding a credential
In the following example, the username `my_user_name` is added along with its password. The returned UUID, `04bee4c7-26cc-4c97-b817-dbb3298fa842`, is the value to use in the access key property (`tag:stardog:api:catalog:provider:accessKey`) of any metadata provider that uses these credentials.
```shell
$ stardog-admin catalog credentials-add
Username: my_user_name
Password:
Description (optional): My External System Credentials
04bee4c7-26cc-4c97-b817-dbb3298fa842
```
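When scripting credential setup, you can capture the command output and confirm that it really is an access key before writing it into a provider configuration. A minimal Python sketch; `parse_access_key` is a hypothetical helper that assumes the output format shown above (the key on the last non-empty line), which may differ between Stardog versions.

```python
import uuid


def parse_access_key(cli_output: str) -> str:
    """Return the access key printed by `stardog-admin catalog credentials-add`.

    Assumes the key is the last non-empty line of the output, as in the
    example above; raises ValueError if that line is not a UUID.
    """
    lines = [line.strip() for line in cli_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID;
    # str() returns the canonical lowercase, hyphenated form.
    return str(uuid.UUID(lines[-1]))


if __name__ == "__main__":
    sample = """Username: my_user_name
Password:
Description (optional): My External System Credentials
04bee4c7-26cc-4c97-b817-dbb3298fa842
"""
    print(parse_access_key(sample))
```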
Listing existing stored credentials
In the following example, the stored credentials are listed. Only the access key (UUID) and the user-provided description are shown; there is no way to retrieve the original credential values. Once stored, the credential values can be read only by the catalog server.
```shell
$ stardog-admin catalog credentials-list
Catalog Stored Credentials
+--------------------------------------+--------------------------------+
| Access Key | Description |
+--------------------------------------+--------------------------------+
| 04bee4c7-26cc-4c97-b817-dbb3298fa842 | My External System Credentials |
+--------------------------------------+--------------------------------+
```
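The list output is meant for people, but a script can recover the key and description pairs from it, for example to look up the access key for a given description. A Python sketch, assuming the bordered table layout shown above; `parse_credentials_list` is hypothetical and not a Stardog API.

```python
def parse_credentials_list(output: str) -> dict[str, str]:
    """Map access key -> description from `catalog credentials-list` output.

    Assumes the bordered table layout shown above; a description that itself
    contains '|' would not be parsed correctly.
    """
    entries = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Header and data rows are framed by '|'; border rows start with '+'
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Access Key":
            continue  # skip the header row and anything unexpected
        entries[cells[0]] = cells[1]
    return entries
```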
Databricks Unity Catalog
The Knowledge Catalog can be configured to import Unity Catalog metadata using a Databricks account. You can configure the import to occur on a customizable schedule. Databricks Unity metadata is written to the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.
Configuration
To import Databricks Unity Catalog metadata, insert a `DatabricksProvider` configuration into the Knowledge Catalog's `stardog:catalog:providers` named graph. The configuration describes how Databricks can be accessed and how often the Knowledge Catalog should refresh the metadata.
```sparql
insert data {
  graph stardog:catalog:providers
  {
    <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
      <tag:stardog:api:catalog:provider:dataSource> <DATA_SOURCE_HERE> ;
      <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
  }
}
```
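Because the provider configurations on this page share one shape (a provider IRI, a provider class, and `tag:stardog:api:catalog:provider:` properties), you may prefer to generate them from a script. The following Python sketch is illustrative only; `provider_insert` is a hypothetical helper, not part of any Stardog client library. It only builds the update text; you would still submit it to the Knowledge Catalog as you would the example above.

```python
CATALOG_NS = "tag:stardog:api:catalog:"


def provider_insert(provider_iri: str, provider_class: str, properties: dict[str, str]) -> str:
    """Render an `insert data` update that adds one provider configuration.

    `properties` maps provider property local names (e.g. "schedule") to RDF
    terms that are already formatted, such as "<data-source://myDatasource>"
    or '"0 0 22 * * ?"'. Values are not escaped, so pass trusted input only.
    """
    statements = [f"    <{provider_iri}> a <{CATALOG_NS}{provider_class}>"]
    for local_name, term in properties.items():
        statements.append(f"      <{CATALOG_NS}provider:{local_name}> {term}")
    return (
        "insert data {\n"
        "  graph stardog:catalog:providers {\n"
        + " ;\n".join(statements)
        + " .\n  }\n}"
    )


if __name__ == "__main__":
    print(provider_insert(
        "urn:myDBricksProvider",
        "DatabricksProvider",
        {"dataSource": "<data-source://myDatasource>", "schedule": '"0 0 22 * * ?"'},
    ))
```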
This table details the property values that need to be set for configuring a Databricks metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Databricks metadata provider class | tag:stardog:api:catalog:DatabricksProvider |
tag:stardog:api:catalog:provider:dataSource | Data source to use for connecting to a Databricks account | The IRI of an existing Data Source, e.g. `<data-source://myDatasource>` |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. `0 0 22 * * ?` runs every day at 10 pm) |
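Quartz cron expressions differ from Unix crontab entries: they start with a seconds field, may end with an optional year field, and require `?` in exactly one of the day-of-month and day-of-week fields. The following Python sketch, using a hypothetical `split_quartz_cron` helper, splits a schedule into its named fields so you can check it before inserting a provider configuration. It is not a full Quartz validator.

```python
FIELD_NAMES = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def split_quartz_cron(expression: str) -> dict[str, str]:
    """Split a Quartz cron expression into named fields.

    Checks only the field count (6, or 7 with the optional year) and the Quartz
    rule that exactly one of day-of-month and day-of-week is '?'. Field values
    themselves are not validated.
    """
    fields = expression.split()
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(fields)}")
    named = dict(zip(FIELD_NAMES, fields))
    if (named["day_of_month"] == "?") == (named["day_of_week"] == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    return named


if __name__ == "__main__":
    print(split_quartz_cron("0 0 22 * * ?"))
```

A five-field Unix crontab entry such as `0 22 * * *` is rejected, because Quartz expects a leading seconds field.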
After the configuration is inserted, a job is automatically created to run on the specified schedule. The job will import Databricks Unity metadata and load a general data model for viewing the metadata in Explorer.
Data Model
This table contains the classes and properties used to model the Databricks metadata. The prefix `bricks` is the namespace `tag:stardog:api:catalog:databricks:`.
Class | Property | Description |
---|---|---|
bricks:Databricks | | The metadata from an external Databricks platform |
bricks:DatabricksCatalog | | A Databricks catalog |
bricks:DatabricksCatalog | bricks:owner | The owner account |
bricks:DatabricksCatalog | bricks:catalogType | The catalog type |
bricks:DatabricksSchema | | A Databricks schema |
bricks:DatabricksSchema | bricks:owner | The owner account |
bricks:DatabricksSchema | bricks:fullName | The full name of the schema |
bricks:DatabricksTable | | A Databricks table |
bricks:DatabricksTable | bricks:tableType | The table type |
bricks:DatabricksTable | bricks:fullName | The full name of the table |
bricks:DatabricksTable | bricks:dataSourceFormat | The data source format |
bricks:DatabricksTable | bricks:owner | The owner account |
bricks:DatabricksColumn | | A Databricks column |
bricks:DatabricksColumn | bricks:position | The column position |
bricks:DatabricksColumn | bricks:precision | The column precision |
bricks:DatabricksColumn | bricks:nullable | Whether the column is nullable |
bricks:DatabricksColumn | bricks:dataType | The column data type |
bricks:DatabricksColumn | bricks:scale | The column scale |
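Queries and scripts over the imported metadata need either full IRIs or matching `PREFIX` declarations for the prefixed names used in these tables. The following Python sketch expands prefixed names using the namespaces given in the Data Model sections on this page; `expand` and `sparql_prefixes` are hypothetical helpers. The example query assumes instances are typed as `bricks:DatabricksTable` with a `bricks:fullName` value, as the table above describes; depending on where the imported metadata is stored, you may need to add a `GRAPH` clause.

```python
# Namespaces for the prefixes used in the Data Model tables on this page.
NAMESPACES = {
    "bricks": "tag:stardog:api:catalog:databricks:",
    "collibra": "tag:stardog:api:catalog:collibra:",
    "purview": "tag:stardog:api:catalog:purview:",
    "atlas": "tag:stardog:api:catalog:atlas:",
    "jdbc": "tag:stardog:api:catalog:jdbc:",
    "catalog": "tag:stardog:api:catalog:",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'bricks:owner' to its full IRI."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local_name


def sparql_prefixes(*prefixes: str) -> str:
    """Return SPARQL PREFIX declarations for the given prefixes."""
    return "\n".join(f"PREFIX {p}: <{NAMESPACES[p]}>" for p in prefixes)


if __name__ == "__main__":
    # Query text listing Databricks tables by full name (a GRAPH clause may be needed).
    query = sparql_prefixes("bricks") + """
SELECT ?table ?name WHERE {
  ?table a bricks:DatabricksTable ;
         bricks:fullName ?name .
}"""
    print(query)
```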
Collibra
The Knowledge Catalog can be configured to import data from a Collibra Data Intelligence Cloud account. Collibra is a data catalog product that collects business glossary, data governance, lineage and compliance metadata.
Configuration
The configuration for a Collibra provider requires that Collibra credentials be stored in the catalog credential store. See External Credentials for details.
Collibra Data Intelligence Cloud uses HTTP Basic Auth with a username and password for authentication. Add your account username and password to the catalog credential store and use the returned access key to configure the Collibra provider.
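HTTP Basic Auth sends the username and password base64-encoded in an `Authorization` header, which is why the provider needs the original values from the credential store rather than a one-way hash. You do not have to build this header yourself to configure the provider; the following Python sketch, using a hypothetical `basic_auth_header` helper, only shows how Basic Auth (RFC 7617) encodes a credential pair. Base64 is an encoding, not encryption, so such a header should only be sent over HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617).

    The credentials are joined with ':' and base64-encoded; this is reversible,
    so it provides no secrecy on its own.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```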
```sparql
insert data {
  graph stardog:catalog:providers
  {
    <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
      <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
      <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
      <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
  }
}
```
This table details the property values that need to be set for configuring a Collibra metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Collibra metadata provider class | tag:stardog:api:catalog:CollibraProvider |
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Collibra account | URL |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. `0 0 22 * * ?` runs every day at 10 pm) |
Data Model
This table contains the classes and properties used to model the Collibra metadata. The prefix `collibra` is the namespace `tag:stardog:api:catalog:collibra:`.
Class | Property | Description |
---|---|---|
collibra:Collibra | | The metadata from an external Collibra platform |
collibra:Collibra | collibra:community | A Collibra community |
collibra:Asset | | A Collibra asset |
collibra:Asset | collibra:id | An asset ID |
collibra:Asset | collibra:name | An asset name |
collibra:Asset | collibra:domain | An asset domain |
collibra:Asset | collibra:assetType | An asset type |
collibra:Asset | collibra:tag | An asset tag |
collibra:Asset | collibra:collibraUrl | URL of the asset's Collibra page |
collibra:AssetType | | An asset type |
collibra:AssetType | collibra:childOf | A parent asset type |
collibra:Domain | | A Collibra domain |
collibra:Domain | collibra:id | A domain ID |
collibra:Domain | collibra:name | A domain name |
collibra:Domain | collibra:community | A domain community |
collibra:Domain | collibra:domainType | A domain type |
collibra:Community | | A Collibra community |
collibra:Community | collibra:id | A community ID |
collibra:Community | collibra:community | A parent community |
collibra:Community | collibra:name | A community name |
collibra:Relation | | A Collibra relation |
collibra:Relation | collibra:id | A relation ID |
collibra:Relation | collibra:relationType | A relation type |
collibra:Relation | collibra:targetAsset | The target asset of the relation |
collibra:Relation | collibra:sourceAsset | The source asset of the relation |
collibra:DomainType | | A Collibra domain type |
collibra:RelationType | | A Collibra relation type |
collibra:RelationType | collibra:role | A relation role |
collibra:RelationType | collibra:coRole | A relation co-role |
collibra:Tag | | A Collibra tag |
collibra:Attribute | | A Collibra attribute |
collibra:Attribute | collibra:value | An attribute value |
collibra:Attribute | collibra:class | An attribute class |
collibra:Attribute | collibra:asset | An attribute asset |
collibra:Attribute | collibra:attributeType | An attribute type |
collibra:AttributeType | | A Collibra attribute type |
collibra:AttributeType | collibra:attributeKind | An attribute kind |
collibra:AttributeType | collibra:language | An attribute's language |
collibra:AttributeType | collibra:isInteger | Whether the attribute is an integer |
collibra:AttributeType | collibra:allowedValues | The allowed values |
Microsoft Purview
The Knowledge Catalog can be configured both to import data from a Microsoft Purview application running on Azure and to export Stardog catalog data back into it. Purview is Microsoft's data governance, cataloging and protection product.
Configuration
The configuration for a Purview provider requires that Purview credentials be stored in the catalog credential store. See External Credentials for details.
Microsoft Purview on Azure uses the OAuth `client_credentials` grant type for authorization. Add your Azure client ID and application client secret to the catalog credential store and use the returned access key to configure the Purview provider.
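In the `client_credentials` grant, the client ID and secret are exchanged for an access token issued by a specific Azure tenant, which is presumably why the provider needs both the stored credential and the `tenantId` property. The following Python sketch only builds such a token request and does not send it. It assumes the standard Microsoft identity platform v2.0 token endpoint and the `https://purview.azure.net/.default` scope; `purview_token_request` is a hypothetical helper, and this is not necessarily how the Knowledge Catalog implements the exchange.

```python
from urllib.parse import urlencode

# Assumptions: Microsoft identity platform v2.0 token endpoint and Purview scope.
TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
PURVIEW_SCOPE = "https://purview.azure.net/.default"


def purview_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, body) for an OAuth 2.0 client_credentials token request.

    The body is form-encoded and would be POSTed with
    Content-Type: application/x-www-form-urlencoded.
    """
    url = TOKEN_URL.format(tenant=tenant_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": PURVIEW_SCOPE,
    })
    return url, body
```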
```sparql
insert data {
  graph stardog:catalog:providers
  {
    <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
      <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
      <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
      <tag:stardog:api:catalog:provider:tenantId> "AZURE_TENANT_ID_HERE" ;
      <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
  }
}
```
This table details the property values that need to be set for configuring a Purview metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | Purview metadata provider class | tag:stardog:api:catalog:PurviewProvider |
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Purview application | URL |
tag:stardog:api:catalog:provider:tenantId | Azure tenant ID for Purview application | UUID |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. `0 0 22 * * ?` runs every day at 10 pm) |
Exporting Stardog Metadata
The Purview provider exports data source metadata into the configured Purview server. No extra configuration is required to export Stardog metadata: when the provider's scheduled job runs, the export occurs automatically after the import completes.
The following Purview asset types are added for the Stardog data source metadata:
Type | Description |
---|---|
stardog_data_source | The asset type for Stardog data sources |
stardog_database | The asset type for databases |
stardog_schema | The asset type for schemas |
stardog_table | The asset type for tables |
stardog_column | The asset type for columns |
stardog_concept | The asset type for mapped concepts |
After the scheduled provider job has run, you can log in to your Purview account and view the Stardog metadata in a custom `stardogcatalog` collection.
Data Model
This table contains the classes and properties used to model the Purview metadata. The prefix `purview` is the namespace `tag:stardog:api:catalog:purview:` and the prefix `atlas` is the namespace `tag:stardog:api:catalog:atlas:`.
Class | Property | Description |
---|---|---|
purview:Purview | | The metadata from an external Purview platform |
purview:Purview | purview:hasGlossary | A Purview glossary |
purview:Purview | purview:hasCollection | A Purview collection |
purview:Purview | purview:hasAsset | A Purview asset |
purview:Glossary | | A glossary |
purview:Glossary | purview:hasTerm | A glossary term |
purview:GlossaryTerm | | A glossary term |
purview:GlossaryTerm | purview:assignedTo | An asset assigned to the term |
purview:Collection | | A collection of assets and sources |
purview:Collection | purview:hasSource | A data source |
purview:Collection | purview:hasAsset | An asset |
purview:Asset | | An asset |
purview:Asset | purview:scanId | The ID of the last scan |
purview:Asset | purview:lastScanned | The time of the last scan |
purview:Asset | purview:attribute | An attribute |
purview:Asset | purview:assetType | The asset type |
purview:Asset | purview:sourceId | The ID of the source that generated this asset |
purview:AssetType | | An asset type |
purview:Source | | A data source |
purview:Relationship | | An asset relationship |
purview:Relationship | purview:head | The head asset of the relationship |
purview:Relationship | purview:tail | The tail asset of the relationship |
purview:Attribute | | An asset attribute |
purview:Attribute | purview:value | The data value |
purview:Attribute | purview:attributeType | The type of attribute |
purview:AttributeType | | An attribute type |
purview:AttributeType | purview:typeName | The type name |
JDBC
The Knowledge Catalog can be configured to use a JDBC driver to import database metadata into the Knowledge Catalog. Any JDBC-compliant driver should work. First add the driver's JAR file to the classpath of the Stardog server.
Configuration
The configuration for a JDBC provider requires that credentials be stored in the catalog credential store. See External Credentials for details.
Currently, the JDBC provider expects a username and password for authentication. Add your database username and password to the catalog credential store and use the returned access key to configure the JDBC provider. When the provider import job runs, the standard `jdbc.username` and `jdbc.password` properties are injected into the JDBC connection string.
```sparql
insert data {
  graph stardog:catalog:providers
  {
    <urn:jdbc> a <tag:stardog:api:catalog:JdbcProvider> ;
      <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
      <tag:stardog:api:catalog:provider:jdbcDriver> "DRIVER_CLASSNAME_HERE" ;
      <tag:stardog:api:catalog:provider:jdbcURL> "JDBC_CONNECTION_STRING_HERE" ;
      <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE" .
  }
}
```
This table details the property values that need to be set for configuring a JDBC metadata provider.
Property | Description | Values |
---|---|---|
rdf:type | JDBC metadata provider class | tag:stardog:api:catalog:JdbcProvider |
tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
tag:stardog:api:catalog:provider:jdbcDriver | JDBC driver class name | The fully qualified class name |
tag:stardog:api:catalog:provider:jdbcURL | A valid JDBC connection string | A valid connection string |
tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. `0 0 22 * * ?` runs every day at 10 pm) |
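JDBC URLs share the general form `jdbc:<subprotocol>:<subname>`, where the subprotocol identifies the driver (for example, `postgresql` or `mysql`) and the rest is driver-specific. A quick check of this form can catch mistakes such as using a plain `postgresql://` URL without the `jdbc:` prefix. The following Python sketch uses a hypothetical `jdbc_subprotocol` helper and checks only the general form; driver-specific syntax is not validated.

```python
def jdbc_subprotocol(url: str) -> str:
    """Return the subprotocol of a URL of the form jdbc:<subprotocol>:<subname>.

    Raises ValueError if the URL does not have that general form. Driver-specific
    syntax in <subname> is not checked.
    """
    scheme, sep, rest = url.partition(":")
    if scheme != "jdbc" or not sep:
        raise ValueError("a JDBC URL must start with 'jdbc:'")
    subprotocol, sep, subname = rest.partition(":")
    if not subprotocol or not sep or not subname:
        raise ValueError("expected jdbc:<subprotocol>:<subname>")
    return subprotocol
```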
Data Model
This table contains the classes and properties used to model the JDBC metadata. The prefix `jdbc` is the namespace `tag:stardog:api:catalog:jdbc:` and the prefix `catalog` is the namespace `tag:stardog:api:catalog:`.
Class | Property | Description |
---|---|---|
jdbc:DBMS | | A database management system |
jdbc:DBMS | jdbc:databaseName | The database name |
jdbc:DBMS | jdbc:databaseVersion | The database version |
jdbc:DBMS | jdbc:driverName | The driver name |
jdbc:DBMS | jdbc:driverVersion | The driver version |
jdbc:DBMS | jdbc:user | The accessing user |
jdbc:PrimaryKey | | A database table column designated to uniquely identify each record |
jdbc:ForeignKey | | A database table column used to link data between tables |
catalog:DatabaseCatalog | | A database catalog. Not all systems have a catalog; in those systems the highest-level object may be a schema |
catalog:DatabaseSchema | | Database schemas are containers for the tables of the database |
catalog:Table | | A table within a database |
catalog:Table | catalog:tableName | The name of a table within a database |
catalog:Table | catalog:tableType | The type of table |
catalog:Column | | A database table column |
catalog:Column | catalog:name | The name of a column within a table |
catalog:Column | catalog:columnType | The datatype of the column |