Setup and Usage
This page discusses how to set up the Stardog Spark connector for running graph analytics algorithms.
Stardog supports graph analytics via integration with Apache Spark. The Stardog Spark connector is compatible with the Spark 3.X API and can be used with any Spark cluster, whether deployed via Azure Databricks, Amazon EMR, or any other means. The connector is compatible with Stardog versions 7 and above.
The Stardog Spark connector pulls data from a Stardog database into Spark, runs the selected graph analytics algorithm, and writes the results back to Stardog. To use this capability, you need both a Stardog server (single node or cluster) and a Spark cluster running. For testing purposes, you can also run Spark locally on your machine as explained below, but this option will not work for large-scale analytics.
In order to use the graph analytics capabilities, you need to complete the following steps:
- Install Spark on your machine. Create an environment variable called SPARK_HOME so that your shell can find Spark.
- Download the Spark-Connector jar and place it in the directory where you plan to submit the job. We’ll call this the Project Directory.
- (Mac only) Create an empty directory named spark_events in the Project Directory.
- Set up input parameters for the algorithm.
- Submit a Spark job using the connector and the algorithm parameters.
We explain these steps in more detail in the following sections.
Input Parameters
The input parameters to graph analytics specify information such as the algorithm that will be run, configuration options for the algorithm, Stardog connection parameters, and options to configure how the graph results should be saved to Stardog. The input parameters can be written in a Java-style properties file. An example file for parameters looks as follows:
# Algorithm parameters
algorithm.name=PageRank
algorithm.iterations=5
# Stardog connection parameters
stardog.server=http://example.com:5820
stardog.database=testDB
stardog.username=admin
stardog.password=admin
stardog.query.timeout=10m
stardog.reasoning=false
# Output parameters
output.property=example:analytics:rank
output.graph=example:analytics:graph
Note that a multiline value in a Java properties file requires a backslash (‘\’) at the end of each line that continues onto the next; the backslash indicates that the next line is part of the same value.
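The continuation rule can be checked with the standard java.util.Properties class, which is the parser that Java-style properties files are written for. The following is a minimal sketch; the key example.multiline is hypothetical and used only for illustration (it is not a connector parameter), and the class name is our own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class MultilineValueDemo {
    // Parses properties text with the standard Java parser,
    // the same way a properties file on disk would be read.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "example.multiline" is a hypothetical key, not a real connector parameter.
        // The trailing backslash on the second line joins the third line to the value.
        String text = String.join("\n",
            "algorithm.name=PageRank",
            "example.multiline=first part, \\",
            "    second part");
        Properties props = parse(text);
        System.out.println(props.getProperty("example.multiline"));
    }
}
```

Leading whitespace on the continuation line is dropped by the parser, so continuation lines can be indented for readability without changing the value.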
Not all the parameters shown above are required. More detailed information about input parameters can be found in the Graph Analytics Algorithms section.
The input parameters can also be specified as CLI arguments to the graph analytics program. In this case, each ‘key=value’ pair is passed as a separate argument, with the pairs separated by spaces:
algorithm.name=PageRank algorithm.iterations=5 stardog.server=http://localhost:5820 stardog.database=testDB output.property=example:analytics:rank output.graph=example:analytics:graph
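If the parameters already exist in a properties file, the equivalent argument list can be derived from it rather than typed by hand. The sketch below is one way to do that with java.util.Properties; the class and method names are hypothetical, and it assumes that the order of the arguments does not matter to the program.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

public class PropertiesToArgs {
    // Converts properties text into a list of key=value arguments.
    static List<String> toArgs(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> args = new ArrayList<>();
        // Properties is unordered, so sort the keys to get a stable result.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            args.add(key + "=" + props.getProperty(key));
        }
        return args;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# Algorithm parameters",
            "algorithm.name=PageRank",
            "algorithm.iterations=5",
            "stardog.server=http://localhost:5820",
            "stardog.database=testDB");
        // Comment lines are skipped by the parser; only key=value pairs remain.
        System.out.println(String.join(" ", toArgs(text)));
    }
}
```

Keeping the arguments as a list instead of one joined string avoids shell quoting problems if a value ever contains spaces.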
What’s next?
See Usage to learn how to get started with the Stardog Spark connector.