Aggregates
This page discusses implementing custom aggregate functions for SPARQL queries.
Overview
While the SPARQL specification has an extension point for value testing and allows for custom functions in FILTER/BIND/SELECT expressions, there is no similar mechanism for aggregates. The space of aggregates is closed by definition: all legal aggregates are enumerated in the spec itself.
However, as with custom functions, there are many use cases for creating and using custom aggregate functions. Stardog provides a mechanism for creating and using custom aggregates without requiring custom SPARQL syntax.
Implementing Custom Aggregates
To implement a custom aggregate, extend AbstractAggregate.
The rules regarding the constructor, “copy constructor”, and the copy method for Function apply to Aggregate as well.
Two methods must be implemented for custom aggregates: Value _getValue() throws ExpressionEvaluationException and void aggregate(final Value theValue, final long theMultiplicity) throws ExpressionEvaluationException. _getValue returns the computed aggregate value, while aggregate adds a Value to the current running aggregation. In terms of the COUNT aggregate, aggregate would increment the counter and _getValue would return the final count.
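The division of labor between the two methods can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Stardog API: CountSketch and its method names are hypothetical stand-ins that only mirror the shape of aggregate and _getValue, and the input is a plain Object rather than a Value.

```java
// Hypothetical stand-in for a COUNT-style aggregate; NOT a Stardog class.
// It only mirrors the two-method contract described above.
final class CountSketch {
    // Running state, updated once per incoming solution.
    private long count = 0;

    // Plays the role of aggregate(theValue, theMultiplicity): fold one value
    // into the running state. The value itself is irrelevant for a count.
    void aggregate(Object value, long multiplicity) {
        count += multiplicity;
    }

    // Plays the role of _getValue(): return the final result once every
    // value has been aggregated.
    long getValue() {
        return count;
    }
}
```

Calling aggregate once per solution and then getValue a single time at the end yields the count. The multiplicity parameter is 1 for an ordinary solution.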
The multiplicity argument to aggregate
corresponds to the fact that intermediate solution sets have a multiplicity associated with them. It’s most often 1, but joins and choice of the indexes used for the scans internally can affect this. Rather than repeating the solution N times, we associate a multiplicity of N with the solution. Again, in terms of COUNT
, this would mean that rather than incrementing the count by 1
, it would be incremented by the multiplicity.
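For aggregates other than COUNT, the multiplicity acts as a weight on the incoming value. As a rough sketch, here is how a geometric mean could honor it. This is plain Java, not the Stardog API: GeometricMeanSketch is a hypothetical class, it takes a double instead of a Value, and summing logarithms is one possible way to compute the mean, chosen here to avoid overflow from multiplying many values.

```java
// Hypothetical geometric-mean aggregate; NOT a Stardog class.
// Assumes strictly positive numeric inputs.
final class GeometricMeanSketch {
    private double logSum = 0.0; // sum of multiplicity * ln(value)
    private long n = 0;          // total number of values, counting multiplicity

    // A solution with multiplicity N contributes as if it had been seen N times.
    void aggregate(double value, long multiplicity) {
        if (value <= 0.0) {
            throw new IllegalArgumentException("geometric mean requires positive values");
        }
        logSum += multiplicity * Math.log(value);
        n += multiplicity;
    }

    // exp(mean of logs) equals the n-th root of the product of the values.
    double getValue() {
        if (n == 0) {
            throw new IllegalStateException("no values aggregated");
        }
        return Math.exp(logSum / n);
    }
}
```

With this approach, a single call aggregate(2.0, 3) leaves the aggregate in the same state as three calls to aggregate(2.0, 1), which is the equivalence the multiplicity is meant to preserve.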
Registering Custom Aggregates
Aggregates such as COUNT or SAMPLE are implementations of Function in the same way that sameTerm or str are, and they are registered with Stardog in exactly the same manner.
Using Custom Aggregates
You can use your custom aggregates just like any other aggregate function. Assuming we have a custom aggregate gmean defined in the tag:stardog:api: namespace, we can refer to it within a query as follows:
PREFIX : <http://www.example.org>
PREFIX stardog: <tag:stardog:api:>
SELECT (stardog:gmean(?O) AS ?C)
WHERE { ?S ?P ?O }