Stardog Geospatial
Learn how to manage geospatial data in Stardog.
Background
Stardog’s geospatial index is a powerful tool. Many users have augmented their knowledge graphs with spatial data to great success, adding another layer of utility to the enterprise. However, it is often one of the more troublesome features, having the potential to cause a few headaches when getting started. In this post, I intend to provide a detailed primer to help alleviate those headaches.
Stardog supports two geospatial specs: W3C’s WGS 84 and OGC’s GeoSPARQL. In this post I will, for the sake of clarity and readability, combine the two by using GeoSPARQL’s hasGeometry predicate to map locations and areas to nodes of type geo:Geometry. While this is technically unneeded for WGS 84 features, it makes the queries we will be running on the data much easier to follow.
We will be using a DC Landmarks data set. Feel free to load it yourself and play along!
Creating Geographical Data
By default, the spatial index is not enabled when creating a new database. It can be enabled by setting the database configuration option spatial.enabled=true.
Our toy data set has about 10 nodes representing various landmarks in the Washington, DC area. Besides any domain knowledge we wish to attach to these nodes, in order to perform any spatial operations on them we need to associate them with a Geometry entity.
Representing single points
WGS latitude and longitude
For a simple latitude/longitude pair, we have a couple of choices available, the simplest of which is to use WGS 84 to specify them in our Geometry:
@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://blog.stardog.com/geons/> .

# Create geo:Geometry
:WhiteHouseGeom a geo:Geometry ;
    wgs:lat "38.89761"^^xsd:float ;
    wgs:long "-77.03637"^^xsd:float .

# Link it to our entity
:WhiteHouse a :Location ;
    rdfs:label "The White House" ;
    geo:hasGeometry :WhiteHouseGeom .
WKT
Our second option is to define our point’s Geometry using the OGC’s Well-Known Text (WKT) format. While it’s a fair bit easier to make mistakes this way, representing points with WKT will be more congruous with the rest of our data set, not to mention others’ data sets, as WKT is very widely used.
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://blog.stardog.com/geons/> .

# geo:Geometry is still used as the type
# A big tripping point is that WKT points are expressed
# as (LONG LAT), with no comma!
:WashingtonMonGeom a geo:Geometry ;
    geo:asWKT "Point(-77.03525 38.88956)"^^geo:wktLiteral .

# Link this Geometry to an entity in our graph
:WashingtonMon a :Location ;
    rdfs:label "Washington Monument" ;
    geo:hasGeometry :WashingtonMonGeom .
More complicated shapes
For any shape more complex than a latitude/longitude point, WKT is our only option. Lots of shapes are supported; here are some of the ones we most commonly see:
- Point(LONG LAT): A single point as described above.
  - Note the lack of a comma.
- Linestring(LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN): A line connecting the specified points.
  - Note the commas between each point.
- Envelope(minLong, maxLong, maxLat, minLat): A rectangle with the specified corners.
  - Note the commas between each value.
  - Especially note the somewhat odd ordering of (min, max, max, min).
For more complex shapes, Stardog supports JTS. By downloading and enabling this library, you gain access to these shapes, most notably:
- Polygon((LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN, LONG1 LAT1)): A filled-in shape with the specified points.
  - Note the double parentheses: WKT wraps each ring of a polygon in its own pair.
  - Note that a polygon must start and end with the same point, i.e., be closed.
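The ordering rules above are easy to get wrong when WKT literals are assembled by hand. As a minimal sketch in Python (run outside Stardog; the helper names wkt_point, wkt_envelope, and wkt_polygon are hypothetical and not part of any Stardog API), the functions below take coordinates in the familiar latitude-first order and emit them in the order each shape expects:

```python
def wkt_point(lat, lon):
    """WKT point: longitude first, then latitude, separated by a space."""
    return f"POINT({lon} {lat})"


def wkt_envelope(min_lat, min_lon, max_lat, max_lon):
    """WKT envelope in the (minLong, maxLong, maxLat, minLat) order."""
    return f"ENVELOPE({min_lon}, {max_lon}, {max_lat}, {min_lat})"


def wkt_polygon(ring):
    """WKT polygon from (lat, lon) pairs; the ring is closed if needed."""
    ring = list(ring)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # repeat the first point to close the ring
    coords = ", ".join(f"{lon} {lat}" for lat, lon in ring)
    return f"POLYGON(({coords}))"


# Washington Monument point and the Arlington envelope used in this post
print(wkt_point(38.88956, -77.03525))
print(wkt_envelope(38.855, -77.111, 38.885, -77.052))
```

Keeping the argument order latitude-first in the helpers and letting them do the swap is one way to avoid transposed coordinates; the resulting strings still need the geo:wktLiteral datatype when written into Turtle or SPARQL.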
Querying Geographical Data
Now that we have inserted Geometries into Stardog’s spatial index, it would be nice to query them spatially. Stardog supports five of the major operators defined by GeoSPARQL. Several of these functions require units of measurement to be passed; we support the QUDT ontology for this, which our dataset binds to the prefix unit:.
geof:within
This will return true when a given Geometry is contained within another. It has a few accepted forms:
- <Geometry> geof:within <WKT Literal>: Specifying a WKT Literal for an area
- <Geometry> geof:within <Geometry>: Passing in another Geometry
- <Geometry> geof:within (LAT1 LONG1 LAT2 LONG2): Specifying Lat/Long of the lower-left and upper-right corner of a box
- <Geometry> geof:within (<WKT Literal> <WKT Literal>): Specifying the lower-left and upper-right corners as WKT Points
Imagine we wish to retrieve a list of DC landmarks in our dataset that are in the Arlington, VA area. We can do that in a few different ways:
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>

# All of these SPARQL queries are equivalent
# Pay special attention to the various ways the lat/long pairs are ordered

SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    ?geom geof:within "ENVELOPE(-77.111, -77.052, 38.885, 38.855)"^^geo:wktLiteral .
}

SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    # We define :ArlingtonGeom elsewhere in our data set as the envelope from above
    ?geom geof:within :ArlingtonGeom .
}

SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    ?geom geof:within (38.855 -77.111 38.885 -77.052) .
}

SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    ?geom geof:within ("POINT(-77.111 38.855)"^^geo:wktLiteral "POINT(-77.052 38.885)"^^geo:wktLiteral) .
}
| geom | feature |
|---|---|
| http://blog.stardog.com/geons/PentagonGeom | “The Pentagon” |
| http://blog.stardog.com/geons/TombOfUnknownGeom | “Tomb of the Unknown Soldier” |
We can also use geof:within as a filter by passing in our Geometry as the first argument and then using any of the accepted sets of parameters.
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>

SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    # We've expanded the box here to cover the entire DC metro area
    FILTER(geof:within(?geom, 38, -77.2, 39, -77.0))
}

| geom | feature |
|---|---|
| http://blog.stardog.com/geons/PentagonGeom | “The Pentagon” |
| http://blog.stardog.com/geons/TombOfUnknownGeom | “Tomb of the Unknown Soldier” |
| http://blog.stardog.com/geons/JeffMemGeom | “Jefferson Memorial” |
| http://blog.stardog.com/geons/LincolnMemGeom | “Lincoln Memorial” |
| http://blog.stardog.com/geons/WashingtonMonGeom | “Washington Monument” |
| http://blog.stardog.com/geons/CapitolGeom | “US Capitol Building” |
| http://blog.stardog.com/geons/NatlMallGeom | “National Mall” |
| http://blog.stardog.com/geons/NASAHQGeom | “NASA Headquarters” |
| http://blog.stardog.com/geons/VietnamMemGeom | “Vietnam Veterans’ Memorial” |
| http://blog.stardog.com/geons/WhiteHouseGeom | “The White House” |
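The box forms of geof:within take the lower-left corner first and the upper-right corner second, each as latitude then longitude. As a rough sanity check outside Stardog, the Python sketch below applies that same argument order to the White House coordinates from earlier in this post. The function within_box is a hypothetical helper, and it compares plain numbers, so it is only a simplified model of what the spatial index does:

```python
def within_box(lat, lon, ll_lat, ll_lon, ur_lat, ur_lon):
    """True if (lat, lon) lies in the box given as lower-left, upper-right.

    Simplification: plain numeric comparison, so boxes that cross the
    antimeridian are not handled.
    """
    return ll_lat <= lat <= ur_lat and ll_lon <= lon <= ur_lon


white_house = (38.89761, -77.03637)             # from the WGS 84 example
arlington = (38.855, -77.111, 38.885, -77.052)  # box used in the BGP queries
dc_metro = (38, -77.2, 39, -77.0)               # box used in the FILTER query

print(within_box(*white_house, *arlington))  # False
print(within_box(*white_house, *dc_metro))   # True
```

This agrees with the two result tables: the White House is missing from the Arlington results but present once the box is expanded to the DC metro area.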
geof:nearby
This will return all Geometries that are within a specified radius of a given point. It has two forms:
<Geometry> geof:nearby (<Geometry> <Number of units> <Unit>)
<Geometry> geof:nearby (LAT LONG <Number of units> <Unit>)
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>
prefix unit: <http://qudt.org/vocab/unit#>

# Get all features within 2km of the Kennedy Center
SELECT ?geom ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    ?geom geof:nearby (38.896004 -77.054995 2 unit:Kilometer) .
}

| geom | feature |
|---|---|
| http://blog.stardog.com/geons/LincolnMemGeom | “Lincoln Memorial” |
| http://blog.stardog.com/geons/WashingtonMonGeom | “Washington Monument” |
| http://blog.stardog.com/geons/VietnamMemGeom | “Vietnam Veterans’ Memorial” |
| http://blog.stardog.com/geons/WhiteHouseGeom | “The White House” |
geof:area
This returns the area of a given Geometry in the specified unit. It can be used either to bind a variable or as part of a filter.
geof:area(<Geometry|WKT Literal>, <Unit>)
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the area in km^2 of each shape in our dataset
SELECT ?feature ?area {
    ?f a :Area ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    BIND(geof:area(?geom, unit:Kilometer) as ?area)
}

| feature | area |
|---|---|
| “Arlington, VA” | 1.7038960485445116E1 |
| “DC Metro Area” | 5.77753702962143E2 |
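To get a feel for where a figure like the Arlington one comes from, the area of a latitude/longitude box can be approximated on a sphere. The Python sketch below is an independent estimate, not Stardog's implementation: the Earth radius constant and the function name envelope_area_km2 are assumptions made for illustration, and the arguments follow the Envelope ordering used in this post.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def envelope_area_km2(min_lon, max_lon, max_lat, min_lat):
    """Area of a latitude/longitude box on a sphere, in square kilometers."""
    d_lon = math.radians(max_lon - min_lon)
    # Height of the latitude band, expressed as a difference of sines
    band = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# The Arlington envelope from the geof:within examples
area = envelope_area_km2(-77.111, -77.052, 38.885, 38.855)
print(f"{area:.1f} km^2")
```

Under these assumptions the estimate lands at roughly 17 square kilometers, in line with the value reported for “Arlington, VA” above.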
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the shapes in our dataset that are bigger than 100 km^2
SELECT ?feature {
    ?f a :Area ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    FILTER(geof:area(?geom, unit:Kilometer) > 100)
}

| feature |
|---|
| “DC Metro Area” |
geof:distance
This returns the distance between two spatial objects in the specified unit. It can also be used as a variable binding or as a filter.
geof:distance(<Geometry|WKT Literal>, <Geometry|WKT Literal>, <Unit>)
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve each feature and its distance in yards from the White House
SELECT ?feature ?distance {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    BIND(geof:distance(?geom, :WhiteHouseGeom, unit:Yard) as ?distance)
}
ORDER BY DESC(?distance)

| feature | distance |
|---|---|
| “Tomb of the Unknown Soldier” | 4.27410351039E3 |
| “The Pentagon” | 3.70238119121E3 |
| “US Capitol Building” | 2.76937468038E3 |
| “NASA Headquarters” | 2.60127942558E3 |
| “Jefferson Memorial” | 2.01863079957E3 |
| “Lincoln Memorial” | 1.63871432741E3 |
| “National Mall” | 1.59761525629E3 |
| “Vietnam Veterans’ Memorial” | 1.28566137839E3 |
| “Washington Monument” | 9.8463529491E2 |
| “The White House” | 0.0E0 |
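A quick way to check that coordinates were entered in the right order is to recompute one of these distances independently. The Python sketch below uses the haversine great-circle formula with the two points defined earlier in this post. It is not how Stardog computes geof:distance internally; the Earth radius constant and the function name haversine_m are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters
METERS_PER_YARD = 0.9144


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


white_house = (38.89761, -77.03637)          # (lat, long)
washington_monument = (38.88956, -77.03525)  # (lat, long)

yards = haversine_m(*white_house, *washington_monument) / METERS_PER_YARD
print(f"{yards:.0f} yards")
```

The estimate comes out close to the 984.6 yards reported for the Washington Monument. A result that is wildly off usually means latitude and longitude were swapped somewhere.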
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the features in our dataset that are more than 2 miles from the White House
SELECT ?feature {
    ?f a :Location ;
        rdfs:label ?feature ;
        geo:hasGeometry ?geom .
    FILTER(geof:distance(?geom, :WhiteHouseGeom, unit:MileUSStatute) > 2)
}

| feature |
|---|
| “The Pentagon” |
| “Tomb of the Unknown Soldier” |
geof:relate
This returns the relationship between two Geometries. Possible results are geo:contains, geo:within, geo:intersects, geo:equals, and geo:disjoint.
This function has slightly different forms, depending on whether you’re using it as a BGP or a filter:
?relation geof:relate (<Geometry> <Geometry>)
FILTER(geof:relate(<Geometry>, <Geometry>, <desired result>))
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve each area and its relation to the others
SELECT ?feature1 ?feature2 ?rel {
    ?f a :Area ;
        rdfs:label ?feature1 ;
        geo:hasGeometry ?geom1 .
    ?f2 a :Area ;
        rdfs:label ?feature2 ;
        geo:hasGeometry ?geom2 .
    ?rel geof:relate (?geom1 ?geom2) .
}

| feature1 | feature2 | rel |
|---|---|---|
| “Arlington, VA” | “DC Metro Area” | http://www.opengis.net/ont/geosparql#within |
| “Arlington, VA” | “Arlington, VA” | http://www.opengis.net/ont/geosparql#equals |
| “DC Metro Area” | “Arlington, VA” | http://www.opengis.net/ont/geosparql#contains |
| “DC Metro Area” | “DC Metro Area” | http://www.opengis.net/ont/geosparql#equals |
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the areas in our dataset where one contains the other
SELECT ?feature1 ?feature2 {
    ?f a :Area ;
        rdfs:label ?feature1 ;
        geo:hasGeometry ?geom1 .
    ?f2 a :Area ;
        rdfs:label ?feature2 ;
        geo:hasGeometry ?geom2 .
    FILTER(geof:relate(?geom1, ?geom2, geo:contains))
}

| feature1 | feature2 |
|---|---|
| “DC Metro Area” | “Arlington, VA” |
The SERVICE syntax
All spatial operators listed above are internally supported through the SERVICE extension point in Stardog (see the Query Stardog chapter for the list of supported services). In most cases, the BGP syntax for spatial operators should suffice, but it’s also possible to use the SERVICE form explicitly (see below). The namespace prefix for all spatial service IRIs is defined as prefix geo_srv: <tag:stardog:api:geo:>.
geof:nearby
SERVICE geo_srv:nearby {
[] geo:result ?shape ; # the output variable
geo:inputs (input radius unit) # or
geo:inputs (lat lon radius unit)
}
input, lat, and lon can be variables or constants. radius and unit must be constants.
geof:within
SERVICE geo_srv:within {
[] geo_srv:result ?result ;
geo_srv:input (input1 input2) # or
geo_srv:input (input1 lower_left upper_right) # or
geo_srv:input (input1 lower_left_lat lower_left_lon upper_right_lat upper_right_top_lon)
}
As of Stardog 11, the ?result variable is always set to true. That is, it is not possible to query for shapes that are not within a given shape. The service does not return results when both inputs are constants and the first is not within the second. input1 and input2 can be variables or constants. lower_left and upper_right can be variables or WKT literals defining points. lower_left_lat, lower_left_lon, upper_right_lat, and upper_right_top_lon should be valid coordinates of the bounding box points.
geof:distance
SERVICE geo_srv:distance {
[] geo_srv:result ?distance ;
geo_srv:input (input1 input2 unit)
}
input1 and input2 can be variables or constants; unit must be a valid constant.
geof:area
SERVICE geo_srv:area {
[] geo_srv:result ?area ;
geo_srv:input (input unit)
}
geof:relate
SERVICE geo_srv:relate {
[] geo_srv:result ?relation ;
geo_srv:input (input1 input2)
}
input1 and input2 can be variables or constants.
Known Issues and Limitations
The spatial operators in Stardog 11 exhibit the following properties, which sometimes affect query results:
- geof:distance measures the distance between the center points of two objects. For points, the center is naturally the point itself. However, for polygons, the distance in Stardog is different from the distance between the two nearest points of the polygons (which is how GeoSPARQL defines it).
- geof:nearby is not a symmetric relation: for spatial objects A and B, B is nearby A (given a distance R) if B overlaps with the circle centered on the center of A with the radius R. This is only symmetric for points but not for polygons. One important consequence of this is that the results of
SELECT * {
:WhiteHouseGeom geof:nearby ( ?location 10 unit:Kilometer )
}
and
SELECT * {
?location geof:nearby ( :WhiteHouseGeom 10 unit:Kilometer )
}
are not necessarily the same. Furthermore, Stardog can only use its spatial index efficiently for the latter query and not the former. However, if the data only contains points, then Stardog’s optimizer will take advantage of geof:nearby being symmetric and will use the spatial index for both queries. Stardog automatically tracks which spatial objects are added to the database and maintains the spatial.features database property to decide how to optimize geof:nearby calls. Note that if polygons had been added to the data but later removed, one may need to perform db optimize to refresh the value of spatial.features.
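The asymmetry of geof:nearby can be illustrated with a toy model. The Python sketch below works in flat, unitless coordinates rather than on the globe, and models each spatial object as an axis-aligned box (a point is a box of zero size). The Box class and the nearby function are hypothetical names; they only mirror the rule stated above (B is nearby A if B overlaps the circle of radius R around A's center) and say nothing about how Stardog implements it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def center(self):
        return ((self.min_x + self.max_x) / 2, (self.min_y + self.max_y) / 2)

    def distance_to(self, x, y):
        """Shortest planar distance from (x, y) to the box (0 if inside)."""
        dx = max(self.min_x - x, 0.0, x - self.max_x)
        dy = max(self.min_y - y, 0.0, y - self.max_y)
        return math.hypot(dx, dy)


def nearby(b, a, radius):
    """True if b overlaps the circle of the given radius around a's center."""
    cx, cy = a.center
    return b.distance_to(cx, cy) <= radius


big_area = Box(0, 0, 100, 100)    # center at (50, 50)
landmark = Box(105, 50, 105, 50)  # a point just outside the area's edge

print(nearby(landmark, big_area, 10))  # False: 55 units from the area's center
print(nearby(big_area, landmark, 10))  # True: the area's edge is 5 units away
```

With two points the two calls always agree, which is why the optimizer can treat the relation as symmetric when the data contains only points.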
- Both geof:within and geof:nearby are binary relations between spatial objects that are represented as RDF resources in the data (that is, IRIs or blank nodes). For example, there are three indexed spatial objects in the following data:
@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://blog.stardog.com/geons/> .

:nationalZoo a geo:Feature ;
    rdfs:label "National Zoo" ;
    geo:hasGeometry :zooLocation .
:zooLocation a geo:Geometry ;
    geo:asWKT "Point(-77.0576387 38.9248554)"^^geo:wktLiteral .

:camdenArea a geo:Feature ;
    rdfs:label "Area around Camden Yards" ;
    geo:hasGeometry :camdenAreaGeo .
# Four comma-separated values in (minLong, maxLong, maxLat, minLat) order form an Envelope
:camdenAreaGeo a geo:Geometry ;
    geo:asWKT "Envelope(-76.63, -76.61, 39.29, 39.27)"^^geo:wktLiteral .

:lincolnMemorial a :Location ;
    rdfs:label "Lincoln Memorial" ;
    wgs:lat "38.889269"^^xsd:float ;
    wgs:long "-77.050176"^^xsd:float .
The objects are :zooLocation, :camdenAreaGeo, and :lincolnMemorial. The :zooLocation object, for example, is not the same as the WKT literal that defines it: those are different RDF terms (one is an IRI and the other a literal). Therefore, only spatial resources (and not literals) can appear in results returned by geof:nearby and geof:within operators.
Nonetheless, for convenience both geof:nearby and geof:within also accept WKT literals as constant inputs in SPARQL queries. For example,
SELECT * {
?location geof:nearby ( "Point(-77.0576387 38.9248554 )"^^geo:wktLiteral 1 unit:Kilometer )
}
is a valid query that will return the same results as
SELECT * {
?location geof:nearby ( :zooLocation 1 unit:Kilometer )
}
since :zooLocation is the spatial object that corresponds to the above WKT literal.
The subtle issue here is that one can also use literals for which no spatial resource exists in the data (i.e. they are not indexed). Such queries return well-defined results only if the literals occur directly in geof:nearby and geof:within calls. If they occur in other parts of the query, possibly joined with the results of geof:nearby or geof:within, the final query results are undefined. The following are some examples of queries with undefined spatial results:
SELECT * {
?location geof:nearby ( ?geo 1 unit:Kilometer )
VALUES ?geo { "Polygon(...)"^^geo:wktLiteral }
}
SELECT * {
BIND("Polygon(...)"^^geo:wktLiteral as ?geo)
?location geof:nearby ( ?geo 1 unit:Kilometer )
}
One way to think about this restriction is that it enables the query optimizer to choose the evaluation order for spatial patterns in all well-defined queries without changing the results. Since querying for arbitrary (including non-indexed) spatial objects is allowed, it wouldn’t be possible to guarantee that, for example,
SELECT * {
?location geof:nearby ( ?geo 1 unit:Kilometer )
VALUES ?geo { "Polygon(...)"^^geo:wktLiteral }
}
is equivalent to
SELECT * {
?location geof:nearby ( "Polygon(...)"^^geo:wktLiteral 1 unit:Kilometer )
}
under the standard SPARQL semantics because ?location geof:nearby ( ?geo 1 unit:Kilometer ) would never return results where ?geo is bound to a non-indexed spatial object (or any WKT literal for that matter). It is not difficult to imagine more complex queries where the optimization decisions are not obvious and could impact the results in a confusing way, which is something that we try to avoid.