Spark Programs
This page discusses how to write custom Spark applications using the Stardog Spark connector.
Overview
The Stardog Spark connector allows users to create Spark Dataset instances backed by a Stardog connection and a SELECT query. Once the dataset is created, all the regular Spark functionality can be used in custom Spark applications.
Java Example
Creating Stardog datasets requires setting connection parameters.
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.stardog.spark.datasource.StardogSource;
import com.stardog.spark.utils.Options;

// Set connection parameters for Stardog
Map<String, String> options = new HashMap<>();
options.put(Options.SERVER.getName(), "http://localhost:5820");
options.put(Options.DATABASE.getName(), "testDB");

// The query must be a SELECT query
options.put(Options.QUERY.getName(), "SELECT * { ... } ");

// Create a Spark session
SparkSession spark = ...

// Create a Stardog dataset using the connector as the data source format
Dataset<Row> dataset = spark.read()
    .format(StardogSource.class.getName())
    .options(options)
    .load();
Schema Inference
The Stardog Spark connector analyzes the input SELECT query and tries to infer a schema for the dataset. Schema inference inspects the properties used in the triple patterns of the query and queries the Stardog database to retrieve any rdfs:range triples defined for those properties. If the SELECT query uses variable predicates or complex SPARQL patterns, or if the database does not contain range definitions, schema inference fails and a generic type is assigned to the columns. In such cases the user can supply a schema for the dataset by calling the schema() function on the Spark reader (DataFrameReader.schema()) before load().
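To check up front whether inference is likely to succeed, it can help to look up the range definitions yourself. The sketch below builds a SELECT query that lists the rdfs:range values for a given set of properties. It illustrates the idea only and is not the query the connector actually issues; the class name RangeLookupQuery and the method forProperties are hypothetical, and the property IRIs are supplied by the caller.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Stardog Spark connector.
class RangeLookupQuery {

    /**
     * Builds a SPARQL SELECT query that returns the rdfs:range values
     * defined for each of the given property IRIs.
     */
    static String forProperties(List<String> propertyIris) {
        // Wrap each IRI in angle brackets for use in a VALUES block
        String values = propertyIris.stream()
                .map(iri -> "<" + iri + ">")
                .collect(Collectors.joining(" "));

        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
             + "SELECT ?property ?range {\n"
             + "  VALUES ?property { " + values + " }\n"
             + "  ?property rdfs:range ?range\n"
             + "}";
    }
}
```

Running the generated query against the database shows which properties have a range defined. A property that returns no rows is a candidate for a column that would receive the generic type, and therefore a reason to supply the schema explicitly.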
RDF types are mapped to Spark types as follows. IRIs and blank nodes (bnodes) are always mapped to strings on the Spark side. Literals are mapped to built-in Spark types according to the correspondence table below.
RDF Type | Spark Type |
---|---|
xsd:string | string |
xsd:boolean | boolean |
xsd:byte | byte |
xsd:short | short |
xsd:int | integer |
xsd:long | long |
xsd:integer | long |
xsd:decimal | decimal |
xsd:float | float |
xsd:double | double |
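
The mapping rules above can be written down as a small lookup, which is handy when deciding what column types to declare in a user-supplied schema. This is a minimal sketch of the table and the IRI/bnode rule, not the connector's own code; the class RdfToSparkTypes, the enum TermKind, and the method sparkTypeFor are hypothetical names. Datatypes not listed in the table yield an empty result, since the table does not say how they are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the Stardog Spark connector.
class RdfToSparkTypes {

    /** Kinds of RDF terms that can appear in a query result. */
    enum TermKind { IRI, BNODE, LITERAL }

    // Literal datatype -> Spark type name, copied from the correspondence table
    private static final Map<String, String> LITERAL_TYPES = new LinkedHashMap<>();
    static {
        LITERAL_TYPES.put("xsd:string", "string");
        LITERAL_TYPES.put("xsd:boolean", "boolean");
        LITERAL_TYPES.put("xsd:byte", "byte");
        LITERAL_TYPES.put("xsd:short", "short");
        LITERAL_TYPES.put("xsd:int", "integer");
        LITERAL_TYPES.put("xsd:long", "long");
        LITERAL_TYPES.put("xsd:integer", "long");
        LITERAL_TYPES.put("xsd:decimal", "decimal");
        LITERAL_TYPES.put("xsd:float", "float");
        LITERAL_TYPES.put("xsd:double", "double");
    }

    /**
     * Returns the Spark type name for an RDF term. IRIs and bnodes always
     * map to "string"; literals are looked up by datatype, and a datatype
     * missing from the table gives an empty Optional.
     */
    static Optional<String> sparkTypeFor(TermKind kind, String datatype) {
        if (kind != TermKind.LITERAL) {
            return Optional.of("string");
        }
        return Optional.ofNullable(LITERAL_TYPES.get(datatype));
    }
}
```

For example, sparkTypeFor(TermKind.LITERAL, "xsd:integer") returns an Optional containing "long", while any IRI or bnode returns "string" regardless of the datatype argument.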