This page discusses how to set up the Stardog Spark connector for running graph analytics algorithms.
Stardog supports graph analytics capabilities via integration with Apache Spark. The Stardog Spark connector is compatible with the Spark 3.x API and can be used with any Spark cluster, whether deployed via Azure Databricks, Amazon EMR, or any other means. The connector is compatible with Stardog versions 7 and above.
The Stardog Spark connector pulls data from a Stardog database into Spark, runs the selected graph analytics algorithm, and writes the results back to Stardog. To use this capability, you need both a Stardog server (single node or cluster) and a Spark cluster running. For testing purposes, you can also run Spark locally on your machine as explained below, but this option will not work for large-scale analytics.
In order to use the graph analytics capabilities, you need to complete the following steps:
- Install Spark on your machine. Create an environment variable called SPARK_HOME so that your shell can find Spark.
- Download the Spark-Connector jar and place it in the directory where you plan to submit the job. We’ll call this the Project Directory.
- (Mac only) Create an empty directory named spark_events in the Project Directory.
- Set up input parameters for the algorithm.
- Submit a Spark job using the connector and the algorithm parameters.
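The preparation steps above can be sketched in Python as a small pre-flight helper. This is only an illustration under stated assumptions: the function names are hypothetical, the jar path is whatever file you downloaded, and the command built here passes only the application jar and its arguments. Your cluster may require additional `spark-submit` options (for example a master URL or a main class) that are not covered on this page.

```python
import os
import platform
from pathlib import Path


def prepare_project_dir(project_dir, system=None):
    """Create spark_events in the Project Directory when running on macOS."""
    system = system or platform.system()
    project = Path(project_dir)
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        (project / "spark_events").mkdir(parents=True, exist_ok=True)
    return project


def build_submit_command(jar_path, params, env=None):
    """Build a spark-submit argument list (hypothetical helper).

    jar_path is the downloaded connector jar; params is a list of
    'key=value' strings or a single properties file path.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    spark_submit = str(Path(spark_home) / "bin" / "spark-submit")
    # Assumption: no extra spark-submit options are needed for this run.
    return [spark_submit, str(jar_path), *params]
```

The returned list can be handed to `subprocess.run` once you have confirmed the exact options your deployment needs.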
We explain these steps in more detail in the following sections.
The input parameters to graph analytics specify information such as the algorithm that will be run, configuration options for the algorithm, Stardog connection parameters, and options to configure how the graph results should be saved to Stardog. The input parameters can be written in a Java-style properties file. An example file for parameters looks as follows:
    # Algorithm parameters
    algorithm.name=PageRank
    algorithm.iterations=5

    # Stardog connection parameters
    stardog.server=http://example.com:5820
    stardog.database=testDB
    stardog.username=admin
    stardog.password=admin
    stardog.query.timeout=10m
    stardog.reasoning=false

    # Output parameters
    output.property=example:analytics:rank
    output.graph=example:analytics:graph
Note that if you need multiline values in a Java properties file, you must end each line with a backslash (‘\’) to indicate that the next line is part of the value.
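To make the continuation rule concrete, here is a minimal Python sketch of a parser for a simplified subset of the Java properties format: `#` and `!` comments, `key=value` pairs, and backslash line continuation. It is not the connector's own parser and does not handle every feature of the format (such as `:` separators or Unicode escapes). The key `example.query` is a hypothetical name used only to show a multiline value.

```python
def _store(props, logical):
    """Split one logical line at the first '=' and record it."""
    key, _, value = logical.partition("=")
    props[key.strip()] = value.strip()


def parse_properties(text):
    """Parse a simplified subset of the Java properties format."""
    props = {}
    logical = None  # logical line being accumulated, or None
    for raw in text.splitlines():
        line = raw.lstrip()  # leading whitespace is dropped, also on continuations
        if logical is None:
            if not line or line.startswith(("#", "!")):
                continue  # skip blank lines and comments
            logical = ""
        # An odd number of trailing backslashes marks a continuation.
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            logical += line[:-1]
            continue
        _store(props, logical + line)
        logical = None
    if logical is not None:  # input ended on a continuation line
        _store(props, logical)
    return props


sample = r"""
# Algorithm parameters
algorithm.name=PageRank
example.query=SELECT * \
    WHERE { ?s ?p ?o }
"""
props = parse_properties(sample)
```

After parsing, the two physical lines of `example.query` are joined into a single value, with the leading whitespace of the continuation line removed.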
Not all the parameters shown above are required. More detailed information about input parameters can be found in the Graph Analytics Algorithms section.
The input parameters can also be specified as CLI arguments to the graph analytics program. In this case, each ‘key=value’ pair is passed as a separate argument, with the pairs separated by spaces:
algorithm.name=PageRank algorithm.iterations=5 stardog.server=http://localhost:5820 stardog.database=testDB output.property=example:analytics:rank output.graph=example:analytics:graph
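If the parameters already exist as key/value pairs, the CLI form can be derived mechanically. The sketch below is one possible way to do this in Python; `to_cli_args` is a hypothetical helper, not part of the connector. It uses `shlex.join` (Python 3.8+) so that any value containing spaces or shell metacharacters is quoted when the command line is printed.

```python
import shlex


def to_cli_args(params):
    """Turn a mapping of parameters into a list of 'key=value' arguments."""
    return [f"{key}={value}" for key, value in params.items()]


params = {
    "algorithm.name": "PageRank",
    "algorithm.iterations": "5",
    "stardog.server": "http://localhost:5820",
    "stardog.database": "testDB",
}
args = to_cli_args(params)

# shlex.join quotes arguments only where the shell would need it.
command_line = shlex.join(args)
```

Passing `args` as a list to a process launcher avoids shell quoting issues altogether; the joined string is mainly useful for logging or copy-pasting.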
See Usage to learn how to get started with the Stardog Spark connector.