Spark Programs

This page discusses how to write custom Spark applications using the Stardog Spark connector.

Page Contents

Overview
Java Example
Schema Inference

Overview

The Stardog Spark connector lets users create Spark Dataset instances backed by a Stardog connection and a SELECT query. Once the dataset is created, all regular Spark functionality can be used on it in custom Spark applications.

Java Example

Creating a Stardog dataset requires setting connection parameters: the server URL, the database name, and the SELECT query to run.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.stardog.spark.datasource.StardogSource;
import com.stardog.spark.utils.Options;

// Set connection parameters for Stardog
Map<String, String> options = new HashMap<>();
options.put(Options.SERVER.getName(), "http://localhost:5820");
options.put(Options.DATABASE.getName(), "testDB");

// The query must be a SELECT query
options.put(Options.QUERY.getName(), "SELECT * { ... } ");

// Create a Spark session
SparkSession spark = ...

// Create a Stardog dataset using the connector as the data source format
Dataset<Row> dataset = spark.read()
                            .format(StardogSource.class.getName())
                            .options(options)
                            .load();

Schema Inference

The Stardog Spark connector analyzes the input SELECT query and tries to infer a schema for the dataset. Schema inference inspects the properties used in the triple patterns of the query and queries the Stardog database for any rdfs:range triples defined for those properties. If the SELECT query uses variable predicates or complex SPARQL patterns, or if the database does not contain range definitions, schema inference fails and a generic type is assigned to the columns. In such cases, the user can supply a schema for the dataset using the DataFrameReader.schema() function, called on spark.read() before load().
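The range lookup described above can be pictured as one SPARQL query per property. The following is a minimal sketch, not the connector's actual implementation: the class name RangeQuerySketch, the helper rangeQuery, and the example property IRI are all hypothetical, and the exact query the connector issues is not documented on this page.

```java
class RangeQuerySketch {
    // Standard IRI of the rdfs:range property
    static final String RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range";

    // Hypothetical helper: builds a SPARQL query that asks for the
    // declared range of a single property, identified by its full IRI.
    static String rangeQuery(String propertyIri) {
        return "SELECT ?range { <" + propertyIri + "> <" + RDFS_RANGE + "> ?range }";
    }

    public static void main(String[] args) {
        // http://example.org/age is an illustrative property, not part of any real schema
        System.out.println(rangeQuery("http://example.org/age"));
    }
}
```

If such a query returns no rows for a property, there is no range definition to draw on, which is one of the situations in which the inference falls back to a generic column type.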

RDF types are mapped to Spark types as follows: IRIs and bnodes are always mapped to strings on the Spark side. Literals are mapped to built-in Spark types according to the correspondence table below.

RDF Type	Spark Type
xsd:string	string
xsd:boolean	boolean
xsd:byte	byte
xsd:short	short
xsd:int	integer
xsd:long	long
xsd:integer	long
xsd:decimal	decimal
xsd:float	float
xsd:double	double
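The mapping rules above can be expressed as a small lookup. This is only an illustrative sketch that mirrors the table, not code from the connector: the class XsdToSparkSketch, the TermKind enum, and the sparkType method are hypothetical names. Because this page does not say how datatypes outside the table are handled, the sketch returns an empty Optional for them rather than guessing.

```java
import java.util.Map;
import java.util.Optional;

class XsdToSparkSketch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Hypothetical classification of an RDF term
    enum TermKind { IRI, BNODE, LITERAL }

    // Literal datatypes and Spark type names, copied from the correspondence table
    static final Map<String, String> LITERAL_TYPES = Map.ofEntries(
        Map.entry(XSD + "string", "string"),
        Map.entry(XSD + "boolean", "boolean"),
        Map.entry(XSD + "byte", "byte"),
        Map.entry(XSD + "short", "short"),
        Map.entry(XSD + "int", "integer"),
        Map.entry(XSD + "long", "long"),
        Map.entry(XSD + "integer", "long"),
        Map.entry(XSD + "decimal", "decimal"),
        Map.entry(XSD + "float", "float"),
        Map.entry(XSD + "double", "double"));

    // IRIs and bnodes always map to string; literals are looked up by datatype IRI.
    static Optional<String> sparkType(TermKind kind, String datatypeIri) {
        if (kind != TermKind.LITERAL) {
            return Optional.of("string");
        }
        if (datatypeIri == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(LITERAL_TYPES.get(datatypeIri));
    }

    public static void main(String[] args) {
        System.out.println(sparkType(TermKind.LITERAL, XSD + "integer")); // Optional[long]
        System.out.println(sparkType(TermKind.IRI, null));                // Optional[string]
        System.out.println(sparkType(TermKind.LITERAL, XSD + "date"));    // Optional.empty
    }
}
```

Note that both xsd:long and xsd:integer land on the Spark long type, as in the table.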
