ETL Data Into Stardog
This page describes how to use Stardog’s Apache NiFi integration to ETL data from your data sources into Stardog.
Overview
Stardog supports integration with Apache NiFi, a scalable, robust ETL platform that can load and transform data from enterprise data sources into Stardog. Stardog includes three processors for loading, querying, and updating a Stardog database.
Installation
To install NiFi and the Stardog connector:
- Go to http://nifi.apache.org/download.html and download the latest binary release (nifi-1.12.0-bin.zip as of this writing).
- Decompress the zip file to a local folder.
- Download the Stardog NiFi nar files from http://downloads.stardog.com/extras/stardog-extras-STARDOG_VERSION_NUMBER.zip, where STARDOG_VERSION_NUMBER is the version of your Stardog server, and extract them into the lib folder of the NiFi installation folder.
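If you script the installation, the last step can be sketched in Python as follows. This is a minimal sketch that assumes the extras archive has already been downloaded and that it holds the Stardog NiFi .nar files somewhere inside it; `extras_url` and `install_nars` are hypothetical helper names, not part of any Stardog tool.

```python
import pathlib
import shutil
import zipfile

# URL pattern from the installation steps; you supply the Stardog server version.
EXTRAS_URL = "http://downloads.stardog.com/extras/stardog-extras-{version}.zip"


def extras_url(version: str) -> str:
    """Return the download URL of the Stardog extras archive for a server version."""
    return EXTRAS_URL.format(version=version)


def install_nars(zip_path: str, nifi_home: str) -> list[str]:
    """Copy every .nar file found in the extras archive into <nifi_home>/lib.

    Assumption: the archive contains the Stardog NiFi .nar files; any folder
    structure inside the archive is flattened away.
    """
    lib_dir = pathlib.Path(nifi_home) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = pathlib.PurePosixPath(member.filename).name
            if member.is_dir() or not name.endswith(".nar"):
                continue  # skip folders and anything that is not a NiFi archive
            with archive.open(member) as src, open(lib_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            installed.append(name)
    return sorted(installed)
```

For example, `urllib.request.urlretrieve(extras_url("X.Y.Z"), "extras.zip")` followed by `install_nars("extras.zip", "/path/to/nifi")`, where the version and path are placeholders for your own values. If NiFi is already running, restart it so that it loads the new nar files.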
Running NiFi
Start the NiFi server by running the command bin/nifi.sh start in the NiFi installation folder. The server can take up to a minute to start. Once it is running, open http://localhost:8080/nifi in your browser to see an empty workflow canvas. To add a Stardog processor, drag the Processor icon from the toolbar at the top left onto the canvas and select one of the Stardog processors in the Add Processor dialog.
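Because startup can take up to a minute, a script that drives NiFi should wait until the UI responds before doing anything else. Here is a minimal Python sketch, assuming the default unsecured URL above; `wait_for_nifi` and its timeout values are illustrative choices, and the `opener`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request

NIFI_UI = "http://localhost:8080/nifi"  # default URL from the text above


def wait_for_nifi(url=NIFI_UI, timeout=120.0, interval=5.0,
                  opener=urllib.request.urlopen, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll the NiFi UI until it answers with HTTP 200 or `timeout` seconds pass.

    Returns the number of attempts made; raises TimeoutError if NiFi never responds.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return attempts
        except OSError:
            pass  # URLError, refused connection, or socket timeout: not up yet
        if clock() >= deadline:
            raise TimeoutError(f"NiFi did not respond at {url} within {timeout} s")
        sleep(interval)
```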
Once the processor is added, configure its properties to specify the Stardog server to connect to, the credentials, and other settings. The example workflow in the next section shows a complete configuration.
Example NiFi Workflow
An example NiFi workflow is provided in the Stardog Examples GitHub repository. The workflow loads the COVID-19 dataset that the New York Times publishes on GitHub into Stardog. It contains three processors:
- A built-in NiFi processor that retrieves the CSV file from GitHub.
- A StardogPut processor that ingests the CSV file into a staging graph in Stardog, using the Stardog mappings available in the examples repository.
- A StardogQuery processor that copies the staging graph to the default graph and updates the last modification time.
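The last step corresponds to a SPARQL 1.1 Update along the lines of the one built below. This is a sketch, not the query shipped with the example: the staging graph IRI, the dataset IRI, and the use of `dct:modified` to hold the modification time are all assumptions.

```python
# Hypothetical IRIs: the example's real staging graph and metadata resource may differ.
STAGING_GRAPH = "urn:example:covid19:staging"
DATASET = "urn:example:covid19:dataset"
MODIFIED = "<http://purl.org/dc/terms/modified>"  # Dublin Core "modified" property


def publish_update(staging_graph: str = STAGING_GRAPH, dataset: str = DATASET) -> str:
    """Build a SPARQL 1.1 Update that replaces the default graph with the staging
    graph, then replaces the dataset's modification time with the current time."""
    return (
        f"COPY <{staging_graph}> TO DEFAULT ;\n"
        f"DELETE {{ <{dataset}> {MODIFIED} ?old }}\n"
        f"INSERT {{ <{dataset}> {MODIFIED} ?now }}\n"
        "WHERE {\n"
        f"  OPTIONAL {{ <{dataset}> {MODIFIED} ?old }}\n"  # ?old may be unbound
        "  BIND (NOW() AS ?now)\n"
        "}"
    )


print(publish_update())
```

`COPY` replaces everything in the default graph with the contents of the staging graph, and a store may report an error if the staging graph does not exist, which `COPY SILENT` suppresses.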
Follow these steps to upload this workflow to your NiFi instance (refer to the Apache NiFi User Interface documentation for terminology):
- From the Operate Palette, click the “Upload Template” button and select the covid19-stardog.xml file.
- Drag the “Template” icon from the Components Toolbar onto the canvas.
- Deselect the processors by clicking an empty spot on the canvas, and then select the StardogPut processor to configure its connection details. Point it to the correct location of the mappings file, nyt-covid.sms.
- Modify the connection details for the StardogQuery processor in the same way.
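The first two steps can also be scripted against the NiFi REST API. The sketch below assumes an unsecured NiFi 1.x instance at the default address and uses the `process-groups/{id}/templates/upload` and `process-groups/{id}/template-instance` endpoints, where `root` is NiFi's alias for the top-level process group. It only builds the requests; the helper names are hypothetical, and the processor configuration in the remaining steps is still done in the UI.

```python
import json
import urllib.request
import uuid

# Assumption: unsecured NiFi 1.x at the default address.
NIFI_API = "http://localhost:8080/nifi-api"


def upload_template_request(xml_bytes: bytes, filename: str = "covid19-stardog.xml",
                            group_id: str = "root") -> urllib.request.Request:
    """Build the multipart POST that uploads a template (the "Upload Template" step)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="template"; filename="{filename}"\r\n'
        "Content-Type: application/xml\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{NIFI_API}/process-groups/{group_id}/templates/upload",
        data=head + xml_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def instantiate_template_request(template_id: str, x: float = 0.0, y: float = 0.0,
                                 group_id: str = "root") -> urllib.request.Request:
    """Build the POST that places an uploaded template on the canvas at (x, y)
    (the equivalent of dragging the "Template" icon onto it)."""
    payload = {"templateId": template_id, "originX": x, "originY": y}
    return urllib.request.Request(
        f"{NIFI_API}/process-groups/{group_id}/template-instance",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send each request with `urllib.request.urlopen`. The upload response describes the new template, including its id, which is the value to pass to `instantiate_template_request`.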
The example is scheduled to run every hour, so once you start the processors and leave NiFi running, the data is fetched, transformed, and loaded into Stardog every hour.
Instead of supplying the Stardog URL and credentials to every Stardog processor, you can configure the Stardog Connection Service once and then reference that service in each Stardog processor.