ETL Data Into Stardog

This page describes Stardog’s Apache NiFi integration, which you can use to extract, transform, and load (ETL) data from external sources into Stardog.

Page Contents
  1. Overview
  2. Installation
  3. Running NiFi
    1. Example NiFi Workflow

Overview

Stardog integrates with Apache NiFi, a scalable, robust ETL platform that can load and transform data from enterprise data sources into Stardog. The integration includes three processors for loading, querying, and updating a Stardog database.

Installation

To install NiFi and the Stardog connector:

  1. Go to http://nifi.apache.org/download.html and download the latest binary release (nifi-1.12.0-bin.zip as of this writing).
  2. Decompress the zip file to a local folder.
  3. Download http://downloads.stardog.com/extras/stardog-extras-STARDOG_VERSION_NUMBER.zip, where STARDOG_VERSION_NUMBER is the version of your Stardog server, and place the Stardog NiFi nar files it contains in the lib folder of the NiFi installation folder.
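
The last installation step can be scripted. The following is a minimal Python sketch, not an official installer: the function names are hypothetical, and it assumes the nar files may sit at any depth inside the extras archive, so it copies every entry ending in .nar into the lib folder.

```python
import zipfile
from pathlib import Path
from typing import List

# URL pattern taken from the installation step above.
EXTRAS_URL_TEMPLATE = "http://downloads.stardog.com/extras/stardog-extras-{version}.zip"


def stardog_extras_url(version: str) -> str:
    """Build the extras download URL for a given Stardog server version."""
    return EXTRAS_URL_TEMPLATE.format(version=version)


def install_nars(extras_zip: Path, nifi_home: Path) -> List[Path]:
    """Copy every .nar entry of the extras archive into <nifi_home>/lib.

    Assumption: nar files may be nested in subfolders of the archive,
    so only the file name is kept when writing into lib.
    """
    lib_dir = Path(nifi_home) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(extras_zip) as archive:
        for member in archive.namelist():
            if member.endswith(".nar"):
                target = lib_dir / Path(member).name
                target.write_bytes(archive.read(member))
                installed.append(target)
    return installed
```

To use it, download the archive first, for example with urllib.request.urlretrieve(stardog_extras_url(version), "stardog-extras.zip"), where version is the version of your own Stardog server, and then call install_nars with the path to the archive and the NiFi installation folder.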

Running NiFi

Start the NiFi server by running the command bin/nifi.sh start in the NiFi installation folder. The server can take up to a minute to start. Once it is running, open http://localhost:8080/nifi in your browser to see an empty workflow canvas. To add a Stardog processor, drag the processor icon from the top left onto the canvas:

NiFi Demo
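
Because the server can take up to a minute to come up, a script that starts NiFi may want to wait until the UI responds before continuing. The following Python sketch polls the URL mentioned above; the function name, the timeout, and the polling interval are illustrative assumptions, not part of NiFi or Stardog.

```python
import time
import urllib.request


def wait_for_nifi(url="http://localhost:8080/nifi", timeout=120.0, interval=5.0,
                  opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll the NiFi UI until it answers with HTTP 200 or the timeout expires.

    Returns True once the UI responds, False if the deadline passes first.
    The opener and sleep parameters exist only to make the function testable.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and urllib's URLError/HTTPError,
            # all of which are expected while the server is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A typical use is to run bin/nifi.sh start, call wait_for_nifi(), and open the browser only when it returns True.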

Once the processor is added, you can edit its parameters to specify the Stardog server to connect to, the credentials, and so on. The example workflow below shows these settings in more detail.

Example NiFi Workflow

An example NiFi workflow is provided in the Stardog Examples GitHub repository. The workflow loads the COVID-19 dataset published by the New York Times on GitHub into Stardog. It contains three processors:

  1. A built-in NiFi processor that retrieves the CSV file from GitHub.
  2. A StardogPut processor that ingests the CSV file into a staging graph in Stardog. It uses the Stardog mappings available in the examples repository.
  3. A StardogQuery processor that copies the staging graph to the default graph and updates the last modification time.
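
To illustrate what the third processor does, here is a hedged Python sketch that builds a SPARQL update of that shape. It is not the query shipped in the examples repository: the function name, the graph and dataset IRIs, and the choice of dcterms:modified as the timestamp predicate are all assumptions made for illustration.

```python
def build_promote_update(staging_graph, dataset,
                         modified_predicate="http://purl.org/dc/terms/modified"):
    """Build a SPARQL update that promotes a staging graph to the default graph.

    All IRIs are hypothetical placeholders. Note that SPARQL COPY replaces
    the existing contents of the destination graph with the source graph.
    """
    return "\n".join([
        # Step 1: copy the staging graph into the default graph.
        f"COPY <{staging_graph}> TO DEFAULT ;",
        # Step 2: record the current time as the last modification time.
        f"INSERT {{ <{dataset}> <{modified_predicate}> ?now }} "
        f"WHERE {{ BIND(NOW() AS ?now) }}",
    ])
```

For example, build_promote_update("urn:example:staging", "urn:example:covid19") returns a two-operation update string of the kind that could be entered in a StardogQuery processor's query setting.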

Follow these steps to upload this workflow to your NiFi instance (see the screencast below, and refer to the Apache NiFi user interface documentation for terminology):

  1. From the Operate Palette, click the “Upload Template” button and select the covid19-stardog.xml file.
  2. Drag the “Template” icon from the Components Toolbar onto the canvas.
  3. Deselect the processors by clicking an empty spot on the canvas, then select the StardogPut processor and configure its connection details. Point it to the correct location of the mappings file, nyt-covid.sms.
  4. Modify the connection details for the StardogQuery processor in a similar way.

NiFi Demo

The example is configured to run every hour, so as long as NiFi is running, the data is fetched, transformed, and loaded into Stardog hourly.

Instead of supplying the Stardog URL and credentials to every Stardog processor, you can configure the Stardog Connection Service once and then reference that service in each Stardog processor.