Setup and Usage

This page discusses how to set up the Stardog Spark connector for running graph analytics algorithms.

Page Contents
  1. Input Parameters
  2. What’s next?

Stardog supports graph analytics capabilities via integration with Apache Spark. The Stardog Spark connector is compatible with the Spark 3.x API and can be used with any Spark cluster, whether deployed via Azure Databricks, Amazon EMR, or any other means. The connector is compatible with Stardog versions 7 and above.

The Stardog Spark connector pulls data from a Stardog database into Spark, runs the selected graph analytics algorithm, and writes the results back to Stardog. To use this capability, you need a running Stardog server (single node or cluster) and a running Spark cluster. For testing purposes, you can also run Spark locally on your machine as explained below, but this option will not work for large-scale analytics.

In order to use the graph analytics capabilities, you need to complete the following steps:

  1. Install Spark on your machine. Create an environment variable called SPARK_HOME so that your shell can find Spark.
  2. Download the Spark-Connector jar and place it in the directory where you plan to submit the job. We’ll call this the Project Directory.
  3. (Mac only) Create an empty directory named spark_events in the Project Directory.
  4. Set up input parameters for the algorithm.
  5. Submit a Spark job using the connector and the algorithm parameters.

We explain these steps in more detail in the following sections.
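
As a rough pre-flight check for steps 1 and 3, the Python sketch below verifies that SPARK_HOME is set and creates the spark_events directory when running on macOS. The function name check_setup is hypothetical and is not part of the connector; it only illustrates the checks described in the list.

```python
import os
import platform
from pathlib import Path


def check_setup(project_dir, env=os.environ):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration, not part of the Stardog Spark connector.
    """
    problems = []

    # Step 1: SPARK_HOME should be set and point at the Spark installation.
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not Path(spark_home).is_dir():
        problems.append(f"SPARK_HOME is not a directory: {spark_home}")

    # Step 3 applies to macOS only; platform.system() reports "Darwin" there.
    if platform.system() == "Darwin":
        (Path(project_dir) / "spark_events").mkdir(exist_ok=True)

    return problems


# Check the current directory, treating it as the Project Directory.
print(check_setup(".") or "setup looks fine")
```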

Input Parameters

The input parameters to graph analytics specify information such as the algorithm that will be run, configuration options for the algorithm, Stardog connection parameters, and options to configure how the graph results should be saved to Stardog. The input parameters can be written in a Java-style properties file. An example file for parameters looks as follows:

# Algorithm parameters

# Stardog connection parameters

# Output parameters

Note that if you need multiline values in a Java properties file, you must end a line with a backslash (‘\’) to indicate that the next line is part of the value.
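
To illustrate the backslash continuation rule, the following Python sketch joins continued lines into a single value. It is a simplified reader written for this page, not the Java implementation and not part of the connector; it ignores escapes and the ‘:’ separator that real properties files allow. The key example.query is hypothetical and exists only to show a multiline value.

```python
def load_properties(text):
    """Parse a simplified subset of the Java properties format."""
    props = {}
    pending = ""  # accumulates one logical line across continuations

    def store(logical):
        key, _, value = logical.partition("=")
        props[key.strip()] = value.strip()

    for raw in text.splitlines():
        # Leading whitespace is dropped, including on continuation lines.
        line = raw.lstrip()
        if not pending and (not line or line[0] in "#!"):
            continue  # blank line or comment
        # An odd number of trailing backslashes marks a continuation.
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            pending += line[:-1]
            continue
        store(pending + line)
        pending = ""

    if pending:  # the text ended in the middle of a continuation
        store(pending)
    return props


# In this Python string literal, "\\" stands for one backslash in the file.
example = """\
# Stardog connection parameters
stardog.server=http://localhost:5820
stardog.database=testDB
# Hypothetical multiline value, for illustration only
example.query=SELECT * \\
    { ?s ?p ?o }
"""

props = load_properties(example)
print(props["example.query"])
```

The two physical lines of example.query are read as one value, with the leading indentation of the second line removed.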

Not all the parameters shown above are required. More detailed information about input parameters can be found in the Graph Analytics Algorithms section.

The input parameters can also be specified as CLI arguments to the graph analytics program. In this case, the ‘key=value’ pairs are passed separated by spaces:

algorithm.iterations=5 stardog.server=http://localhost:5820 stardog.database=testDB output.graph=example:analytics:graph
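
If the parameters are already held in a script, the key=value argument form can be generated rather than typed by hand. The Python sketch below assumes the parameters are kept in a dictionary; the helper name to_cli_args is hypothetical, and the values are the ones from the example above. shlex.join adds shell quoting only where a value needs it.

```python
import shlex


def to_cli_args(params):
    """Render parameters as a list of key=value tokens."""
    return [f"{key}={value}" for key, value in params.items()]


params = {
    "algorithm.iterations": 5,
    "stardog.server": "http://localhost:5820",
    "stardog.database": "testDB",
    "output.graph": "example:analytics:graph",
}

args = to_cli_args(params)
# Join the tokens with spaces, quoting any that contain shell-special characters.
print(shlex.join(args))
```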

What’s next?

See Usage to learn how to get started with the Stardog Spark connector.