
Block-Based Accelerated Query Engine

This page contains information on the Block-Based Accelerated Query Engine.

Page Contents
  1. Introduction
  2. Support
  3. Example Query
  4. Enabling the BARQ engine
  5. Analyzing BARQ engine performance

Introduction

BARQ - Block-Based Accelerated Query Engine - is the new block-based query execution engine for Stardog.

Stardog's default query engine was designed around a sophisticated optimizer that uses advanced selectivity statistics to minimize disk I/O. Execution is row- or tuple-based (i.e., the Volcano model), which works well for queries with selective patterns. It works less well for analytical, CPU-bound queries: for large joins in particular, the traditional tuple-at-a-time model does not achieve high throughput.

BARQ is inspired by systems like MonetDB/X100 (later commercialized as VectorWise and now Actian Vector) and, more recently, Velox, in which operators consume and produce a block of tuples at a time. This enables much higher throughput on CPU-bound query workloads.
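
The difference between the two execution models can be illustrated with a toy filter operator. The sketch below is not Stardog's implementation; the function names, the block size, and the sample rows are all hypothetical. It only shows that a tuple-at-a-time operator hands over one row per step, while a block-based operator processes a whole batch per step, which amortizes per-call overhead across many rows.

```python
from itertools import islice


def tuple_filter(rows, predicate):
    """Tuple-at-a-time (Volcano-style): one row is passed along per step."""
    for row in rows:
        if predicate(row):
            yield row


def to_blocks(rows, block_size):
    """Group an input stream of rows into lists of at most block_size rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def block_filter(blocks, predicate):
    """Block-at-a-time: each step consumes one block and emits one block."""
    for block in blocks:
        kept = [row for row in block if predicate(row)]
        if kept:
            yield kept


# Hypothetical (?person1, ?person3) bindings, filtered with ?person1 != ?person3.
rows = [("a", "b"), ("a", "a"), ("c", "d"), ("e", "e"), ("f", "g")]


def not_same(row):
    return row[0] != row[1]


tuple_result = list(tuple_filter(rows, not_same))
block_result = [
    row
    for block in block_filter(to_blocks(rows, 2), not_same)
    for row in block
]
```

Both variants return the same rows; only the unit of work passed between operators differs.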

BARQ is integrated into the query engine pipeline. Once it is enabled, the query translator can pick the appropriate execution model for each query, and queries that use supported operators may run on BARQ automatically.

Support

BARQ does not yet support every query operator, so plans can be executed in a hybrid way: Stardog may switch between the block-based and non-block-based query executors on the fly.

As of Stardog 10.1, the BARQ beta supports most key SPARQL query operators: joins, filters, simple aggregation, anti-joins (MINUS), and DISTINCT. Not yet supported are traversals (property paths and path queries), services (particularly for virtualized data), and deeper integration with Stardog's custom memory management layer, particularly for block-based hash table lookups.

Example Query

This query from the Labelled Subgraph Query Benchmark executes in only a few seconds with BARQ enabled, but may take 10-20 times as long with the default query engine:

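# Prefix assumed from the IRIs shown in the profiler output further down this page.
PREFIX lp: <http://ldbcouncil.org/>

SELECT (COUNT(*) AS ?count)
WHERE {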
 ?person1 (lp:Person_knows_Person | ^lp:Person_knows_Person ) ?person2 .
 ?person2 (lp:Person_knows_Person | ^lp:Person_knows_Person ) ?person3 .
 ?person3 lp:Person_hasInterest_Tag ?tag .
 FILTER ( ?person1 != ?person3 )
}

Enabling the BARQ engine

To enable BARQ globally for all SPARQL queries, add the following to stardog.properties. This causes BARQ to be used for suitable queries:

query.executor=AUTO

To use BARQ for a single query, add this pragma at the top of the query (the preamble):

#pragma executor AUTO
SELECT * { ?s ?p ?o }
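
When queries are built by an application, the pragma can be prepended programmatically. The following is a minimal sketch, assuming the query is held as a plain string; the helper name with_barq_pragma is hypothetical and not part of any Stardog client library. It only performs string handling and does not send the query anywhere.

```python
PRAGMA = "#pragma executor AUTO"


def with_barq_pragma(query: str) -> str:
    """Return the query text with the executor pragma as its first line.

    Hypothetical helper: it leaves the query unchanged (apart from leading
    whitespace) if the pragma is already the first line.
    """
    stripped = query.lstrip()
    first_line = stripped.split("\n", 1)[0].strip()
    if first_line == PRAGMA:
        return stripped
    return f"{PRAGMA}\n{stripped}"


example = with_barq_pragma("SELECT * { ?s ?p ?o }")
print(example)
```

Calling the helper on a query that already starts with the pragma returns it without adding a second copy.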

Analyzing BARQ engine performance

The query plan profiler output shows which parts of a plan were executed with BARQ and which were not.

When running stardog query explain --profile myquery.sparql, the text output contains the batched keyword for operators that used BARQ:

Projection(?count) [#1], results: 1, wall time: 0 ms (0.0%), batched
`─ Group(aggregates=[(COUNT(*) AS ?count)]) [#1], results: 1, wall time: 54 ms (0.5%), batched
   `─ Filter(?person1 != ?person3) [#100.9M], results: 285.5M, wall time: 4839 ms (41.8%), batched
      `─ MergeJoin(?person2) [#201.8M], results: 288.1M, wall time: 2745 ms (23.7%), batched
         +─ Union [#114K], results: 114K, wall time: 2 ms (0.0%), batched
            +─ Scan[POSC](?person1, http://ldbcouncil.org/Person_knows_Person, ?person2) [#57K], results: 57K, wall time: 12 ms (0.1%), batched
            `─ Scan[PSOC](?person2, http://ldbcouncil.org/Person_knows_Person, ?person1) [#57K], results: 57K, wall time: 12 ms (0.1%), batched