Databricks Configuration
This page discusses how to configure Databricks as an external compute platform.
A Databricks Data Source added in Stardog using the data-source add CLI command or Stardog Studio can be registered as an external compute platform. To do so, add the properties described below to the data source definition.
Mandatory properties:
Property | Description | Example |
---|---|---|
external.compute | Boolean value specifying whether or not the data source is registered as an external compute platform. | true |
external.compute.host.name | Name of the Databricks workspace. | adb-XXXXXXXXXXXXXX.XX.azuredatabricks.net |
databricks.cluster.id | Databricks compute cluster ID. | 0704-XXXXXX-XXXXXdir |
stardog.host.url | Stardog URL to which Databricks should connect back to write the results. The URL should point to the same Stardog server from which the external compute operation is triggered. | https://myhost.stardog.cloud:5820 |
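
For illustration, below is a minimal sketch of a data source properties file with the mandatory external compute properties filled in. The property names and example values come from the table above; the workspace host, cluster ID, and Stardog URL are placeholders, and the standard Databricks connection properties the data source otherwise needs (JDBC URL, credentials, and so on) are omitted.

```properties
# Hypothetical data source definition (e.g. databricks-compute.properties),
# registered with the data-source add CLI command or Stardog Studio.
# Standard Databricks connection properties are omitted; only the
# external compute properties from the table above are shown.

# Register this data source as an external compute platform
external.compute=true

# Databricks workspace host name (placeholder value)
external.compute.host.name=adb-XXXXXXXXXXXXXX.XX.azuredatabricks.net

# Databricks compute cluster ID (placeholder value)
databricks.cluster.id=0704-XXXXXX-XXXXXdir

# Stardog server that Databricks connects back to with the results;
# must be the same server that triggers the external compute operation
stardog.host.url=https://myhost.stardog.cloud:5820
```
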
Optional properties:
Property | Description | Default |
---|---|---|
stardog.external.jar.path | Path of the released stardog-spark-connector jar file from which the file should be transferred to the Databricks cluster. By default it points to Stardog's public S3 bucket, where the latest released version is available. There are two options for overriding the default path: 1) The jar can be downloaded from another S3 bucket; in this case, the property should point to the custom S3 bucket path. 2) The jar can reside locally on the file system where the Stardog server is running; in this case, the property should point to the local file system path. The Stardog server will upload the jar to the Databricks cluster if it is not already present at the path specified by the stardog.external.jar.upload.path property. Download the latest jar from this link. | s3://stardog-spark/stardog-spark-connector-3.0.0.jar |
stardog.external.jar.upload.path | Path of the stardog-spark-connector jar file on the Databricks DBFS file system. Should be set both when the jar is uploaded manually by the user and when the jar is uploaded automatically by Stardog. | /FileStore/stardog/ |
stardog.external.mapping.upload.path | Path on DBFS where Stardog and the Spark job write temporary files. For example, in the case of a Virtual Graph Materialization operation, the mapping of the virtual graph is stored here. Stardog and the Spark job delete these temporary files after the process completes. | /FileStore/stardog/ |
stardog.external.databricks.job.timeout | Spark job timeout, in seconds. | 86400 |
stardog.external.databricks.task.timeout | Spark task timeout, in seconds. | 86400 |
stardog.external.databricks.task.retry.count | The number of retries to attempt before the Spark job fails. Set to zero for a single attempt with no retries. | 3 |
stardog.external.databricks.task.retry.interval.millis | Time interval, in milliseconds, after which the Spark job makes a retry attempt in case of an error. | 2000 |
stardog.external.databricks.is.retry.timeout | Boolean value specifying whether the Spark job makes a retry attempt in case of a timeout error. | false |
stardog.external.databricks.job.on.start.email.list | Comma-separated list of emails to be notified when the Spark job starts. | |
stardog.external.databricks.job.on.success.email.list | Comma-separated list of emails to be notified when the Spark job completes. | |
stardog.external.databricks.job.on.failure.email.list | Comma-separated list of emails to be notified when the Spark job errors out. | |
spark.dataset.repartition | Refer to the Spark documentation. Set this value to override the default partition behavior. | |
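
As a further sketch, optional properties can be added to the same data source definition to override the defaults. The property names come from the table above; the values below are illustrative placeholders only, e.g. a connector jar hosted on the Stardog server's local file system and adjusted retry settings.

```properties
# Hypothetical overrides for optional properties (illustrative values only)

# Use a copy of the stardog-spark-connector jar from the local file system
# of the Stardog server instead of the public S3 bucket
stardog.external.jar.path=/opt/stardog/jars/stardog-spark-connector-3.0.0.jar

# DBFS location where the jar is uploaded to (or expected at)
stardog.external.jar.upload.path=/FileStore/stardog/

# Timeout and retry behavior for the Spark job
stardog.external.databricks.job.timeout=43200
stardog.external.databricks.task.retry.count=5
stardog.external.databricks.task.retry.interval.millis=5000
stardog.external.databricks.is.retry.timeout=true

# Email notifications (placeholder addresses)
stardog.external.databricks.job.on.failure.email.list=data-team@example.com,ops@example.com
```
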