Stardog Geospatial
Learn how to manage geospatial data in Stardog.
Background
Stardog’s geospatial index is a powerful tool. Many users have augmented their knowledge graphs with spatial data to great success, adding another layer of utility to the enterprise. However, it is often one of the more troublesome features, having the potential to cause a few headaches when getting started. In this post, I intend to provide a detailed primer to help alleviate those headaches.
Stardog supports two geospatial specs: W3C’s WGS 84 and OGC’s GeoSPARQL. In this post I will, for the sake of clarity and readability, combine the two by using GeoSPARQL’s hasGeometry predicate to link each location or area to a node of type geo:Geometry. While this is technically unnecessary for WGS 84 features, it makes the queries we will be running on the data much easier to follow.
We will be using a DC Landmarks data set. Feel free to load it yourself and play along!
Creating Geographical Data
By default, the spatial index is not enabled when creating a new database. It can be enabled by setting the database configuration option spatial.enabled=true.
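For a new database, one way to do this is to pass the option on the stardog-admin command line when the database is created. The Python sketch below only assembles that command as a list and does not run it. The database name dcLandmarks is a hypothetical placeholder, and the data set would be loaded into the empty database afterwards.

```python
import shlex
from typing import List


def db_create_command(db_name: str) -> List[str]:
    """Build a stardog-admin command that creates an empty database with the
    spatial index enabled. db_name is a placeholder chosen by the caller."""
    return [
        "stardog-admin", "db", "create",
        "-o", "spatial.enabled=true",  # database configuration option
        "-n", db_name,                 # name of the new database
    ]


cmd = db_create_command("dcLandmarks")  # hypothetical database name
print(shlex.join(cmd))
```

To actually execute the command, pass the list to subprocess.run(cmd, check=True) on a machine where stardog-admin is on the PATH.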
Our toy data set has about 10 nodes representing various landmarks in the Washington, DC area. Besides any domain knowledge we wish to attach to these nodes, in order to perform any spatial operations on them we need to associate them with a Geometry entity.
Representing single points
WGS latitude and longitude
For a simple latitude/longitude pair, we have a couple of choices available, the simplest of which is to use WGS 84 to specify them in our Geometry.
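As a rough illustration of the shape such data might take, the sketch below generates Turtle. A feature is linked through geo:hasGeometry to a geo:Geometry, and that Geometry carries wgs:lat and wgs:long from the W3C WGS 84 vocabulary. Several details are assumptions made for this sketch:

- The feature IRI is hypothetical.
- The coordinates are approximate values for the Lincoln Memorial, not taken from the data set.
- The xsd:float datatype is a guess; check which numeric datatype your Stardog version expects.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
WGS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def wgs84_point_turtle(feature_iri: str, geom_iri: str, lat: float, lon: float) -> str:
    """Return Turtle linking a feature to a geo:Geometry located by WGS 84 lat/long."""
    lines = [
        f"@prefix geo: <{GEO}> .",
        f"@prefix wgs: <{WGS}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        # The feature points at its Geometry via GeoSPARQL's hasGeometry...
        f"<{feature_iri}> geo:hasGeometry <{geom_iri}> .",
        # ...and the Geometry carries the coordinates (datatype is an assumption).
        f"<{geom_iri}> a geo:Geometry ;",
        f'    wgs:lat "{lat}"^^xsd:float ;',
        f'    wgs:long "{lon}"^^xsd:float .',
    ]
    return "\n".join(lines) + "\n"


ttl = wgs84_point_turtle(
    "http://blog.stardog.com/geons/LincolnMem",      # hypothetical feature IRI
    "http://blog.stardog.com/geons/LincolnMemGeom",
    38.8893, -77.0502,                                # approximate coordinates
)
print(ttl)
```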
WKT
Our second option is to define our point’s Geometry using the OGC’s Well-Known Text (WKT) format. While it’s a fair bit easier to make mistakes this way (WKT lists longitude before latitude, the reverse of the familiar lat/long order), representing points with WKT will be more consistent with the rest of our data set, not to mention others’ data sets, as WKT is very widely used.
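Because the longitude-first order is the usual source of mistakes, it can help to build WKT literals through a small helper that takes named latitude and longitude arguments. The sketch below is an illustration only. It attaches the literal with GeoSPARQL’s standard geo:asWKT property and geo:wktLiteral datatype, and the coordinates are approximate rather than taken from the data set. The range check catches some swapped arguments, but not all of them.

```python
GEO = "http://www.opengis.net/ont/geosparql#"


def wkt_point(lat: float, lon: float) -> str:
    """Format a WKT point. WKT puts longitude FIRST, separated by a space."""
    # A range check catches some (not all) swapped arguments.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"Point({lon} {lat})"


def wkt_geometry_turtle(geom_iri: str, wkt: str) -> str:
    """Attach a WKT literal to a geo:Geometry using GeoSPARQL's geo:asWKT."""
    return (
        f"@prefix geo: <{GEO}> .\n\n"
        f"<{geom_iri}> a geo:Geometry ;\n"
        f'    geo:asWKT "{wkt}"^^geo:wktLiteral .\n'
    )


point = wkt_point(38.8893, -77.0502)  # approximate, for illustration only
print(point)
print(wkt_geometry_turtle("http://blog.stardog.com/geons/LincolnMemGeom", point))
```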
More complicated shapes
For any shape more complex than a latitude/longitude point, WKT is our only option. Lots of shapes are supported; here are some of the ones we most commonly see:
- Point(LONG LAT): A single point, as described above. Note the lack of a comma.
- Linestring(LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN): A line connecting the specified points. Note the commas between points.
- Envelope(minLong, maxLong, maxLat, minLat): A rectangle with the specified bounds. Note the commas between values, and especially note the somewhat odd ordering of (min, max, max, min).
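The two punctuation rules above are easy to get wrong by hand: spaces inside a coordinate pair and commas between pairs, plus Envelope’s (min, max, max, min) ordering. The helpers below are a minimal sketch. They take coordinates in a natural order and emit the WKT forms exactly as listed, and the function names are illustrative only.

```python
from typing import Iterable, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude), WKT order


def wkt_linestring(points: Iterable[LonLat]) -> str:
    pts = list(points)
    if len(pts) < 2:
        raise ValueError("a linestring needs at least two points")
    # Space inside each coordinate pair, comma between pairs.
    return "Linestring(" + ", ".join(f"{lon} {lat}" for lon, lat in pts) + ")"


def wkt_envelope(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("min values must not exceed max values")
    # Envelope uses the (minLong, maxLong, maxLat, minLat) ordering.
    return f"Envelope({min_lon}, {max_lon}, {max_lat}, {min_lat})"


print(wkt_linestring([(-77.05, 38.889), (-77.035, 38.8895)]))
print(wkt_envelope(-77.1, 38.8, -77.0, 38.9))
```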
For more complex shapes, Stardog supports JTS (the JTS Topology Suite). By downloading and enabling this library, you gain access to these shapes, most notably:
- Polygon((LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN, LONG1 LAT1)): A filled-in shape with the specified points. Note the double parentheses around the ring, and note that a polygon must start and end with the same point, i.e., be closed.
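Forgetting to repeat the first point is a common way to produce an invalid polygon. Below is a minimal sketch of a helper that closes the ring when needed. It handles a single exterior ring only, without holes, and wraps the ring in its own parentheses as standard WKT requires. The name wkt_polygon is illustrative.

```python
from typing import List, Sequence, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude), WKT order


def wkt_polygon(ring: Sequence[LonLat]) -> str:
    """Format a single-ring WKT polygon, closing the ring if needed."""
    pts: List[LonLat] = list(ring)
    if pts and pts[0] != pts[-1]:
        pts.append(pts[0])  # a polygon must start and end with the same point
    if len(pts) < 4:
        raise ValueError("a ring needs at least three points before closing")
    body = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    return f"Polygon(({body}))"  # the ring sits inside its own parentheses


print(wkt_polygon([(0, 0), (1, 0), (1, 1)]))
```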
Querying Geographical Data
Now that we have inserted Geometries into Stardog’s spatial index, it would be nice to query them spatially. Stardog supports five of the major operators defined by GeoSPARQL. The ones that work with distances or areas (geof:nearby, geof:area, and geof:distance) require a unit of measurement. We support the QUDT ontology for this, referenced in our examples through the unit: prefix.
geof:within
This will return true when a given Geometry is contained within another. It has a few accepted forms:
- <Geometry> geof:within <WKT Literal>: Specifying a WKT literal for an area
- <Geometry> geof:within <Geometry>: Passing in another Geometry
- <Geometry> geof:within (LAT1 LONG1 LAT2 LONG2): Specifying the lat/long of the lower-left and upper-right corners of a box
- <Geometry> geof:within (<WKT Literal> <WKT Literal>): Specifying the lower-left and upper-right corners as WKT Points
Imagine we wish to retrieve a list of the DC landmarks in our data set that are in the Arlington, VA area. We can do that in a few different ways:
| geom | feature |
|---|---|
| http://blog.stardog.com/geons/PentagonGeom | “The Pentagon” |
| http://blog.stardog.com/geons/TombOfUnknownGeom | “Tomb of the Unknown Soldier” |
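In the (LAT1 LONG1 LAT2 LONG2) box form, latitude comes first, which is the reverse of WKT. The sketch below shows two things: how such an argument list is laid out, and what "within the box" means for points. Some assumptions apply:

- The boundary counts as inside here; Stardog’s exact boundary handling may differ.
- Boxes crossing the antimeridian are not handled.
- The coordinates are approximate public values used for illustration, not the data set’s values.
- The box is arbitrary and is not Arlington’s real extent.

```python
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude): the order this form uses


def box_args(lower_left: LatLon, upper_right: LatLon) -> str:
    """Argument list for ?geom geof:within (LAT1 LONG1 LAT2 LONG2)."""
    (lat1, lon1), (lat2, lon2) = lower_left, upper_right
    return f"({lat1} {lon1} {lat2} {lon2})"


def within_box(points: Dict[str, LatLon], lower_left: LatLon, upper_right: LatLon) -> List[str]:
    """Names of points inside the box; the boundary counts as inside here."""
    (lat1, lon1), (lat2, lon2) = lower_left, upper_right
    return sorted(
        name for name, (lat, lon) in points.items()
        if lat1 <= lat <= lat2 and lon1 <= lon <= lon2
    )


# Approximate coordinates for illustration only, not taken from the data set.
landmarks = {"The Pentagon": (38.8719, -77.0563), "The White House": (38.8977, -77.0365)}
ll, ur = (38.85, -77.10), (38.885, -77.05)  # illustrative box, not Arlington's real bounds
print(box_args(ll, ur))
print(within_box(landmarks, ll, ur))
```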
We can also use geof:within as a filter by passing in our Geometry as the first argument, followed by any of the accepted sets of parameters.
| geom | feature |
|---|---|
| http://blog.stardog.com/geons/PentagonGeom | “The Pentagon” |
| http://blog.stardog.com/geons/TombOfUnknownGeom | “Tomb of the Unknown Soldier” |
| http://blog.stardog.com/geons/JeffMemGeom | “Jefferson Memorial” |
| http://blog.stardog.com/geons/LincolnMemGeom | “Lincoln Memorial” |
| http://blog.stardog.com/geons/WashingtonMonGeom | “Washington Monument” |
| http://blog.stardog.com/geons/CapitolGeom | “US Capitol Building” |
| http://blog.stardog.com/geons/NatlMallGeom | “National Mall” |
| http://blog.stardog.com/geons/NASAHQGeom | “NASA Headquarters” |
| http://blog.stardog.com/geons/VietnamMemGeom | “Vietnam Veterans’ Memorial” |
| http://blog.stardog.com/geons/WhiteHouseGeom | “The White House” |
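The query behind these results isn’t shown here. The Python sketch below assembles a query of the filter shape just described, with the Geometry as the first argument to geof:within and a WKT literal for the area as the second. The geo: and geof: namespaces are the standard GeoSPARQL ones. Using rdfs:label for the feature name is an assumption about the data set, and the Envelope passed in the usage line is a hypothetical area.

```python
PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def within_filter_query(area_wkt: str) -> str:
    """SELECT features whose Geometry lies within area_wkt, using the FILTER form."""
    if '"' in area_wkt:
        raise ValueError("WKT text must not contain double quotes")
    return PREFIXES + (
        "SELECT ?geom ?feature WHERE {\n"
        "  ?f geo:hasGeometry ?geom ;\n"
        "     rdfs:label ?feature .\n"  # label predicate is an assumption
        f'  FILTER(geof:within(?geom, "{area_wkt}"^^geo:wktLiteral))\n'
        "}\n"
    )


print(within_filter_query("Envelope(-77.2, -76.9, 39.0, 38.8)"))  # hypothetical area
```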
geof:nearby
This will return all Geometries that are within a specified radius of a given point. It has two forms:
<Geometry> geof:nearby (<Geometry> <Number of units> <Unit>)
<Geometry> geof:nearby (LAT LONG <Number of units> <Unit>)
| geom | feature |
|---|---|
| http://blog.stardog.com/geons/LincolnMemGeom | “Lincoln Memorial” |
| http://blog.stardog.com/geons/WashingtonMonGeom | “Washington Monument” |
| http://blog.stardog.com/geons/VietnamMemGeom | “Vietnam Veterans’ Memorial” |
| http://blog.stardog.com/geons/WhiteHouseGeom | “The White House” |
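For point data, the (LAT LONG <Number of units> <Unit>) form amounts to a radius search around a center point. The sketch below models that locally in kilometers with the haversine great-circle formula. It assumes a spherical Earth with a 6371 km mean radius, which is not necessarily the distance model Stardog uses internally. The demo points are synthetic and lie on the equator, so one degree of longitude is about 111.19 km.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; Stardog's own model may differ


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/long points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(points: Dict[str, Tuple[float, float]], lat: float, lon: float,
           radius_km: float) -> List[str]:
    """Names of points within radius_km of (lat, lon), like the (LAT LONG n unit) form."""
    return sorted(
        name for name, (plat, plon) in points.items()
        if haversine_km(lat, lon, plat, plon) <= radius_km
    )


demo = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 0.5)}  # synthetic equator points
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
print(nearby(demo, 0.0, 0.0, 60.0))
```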
geof:area
This returns the area of a given Geometry in the specified unit. It can be used either to bind a variable or as part of a filter.
geof:area(<Geometry|WKT Literal>, <Unit>)
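The sketch below assembles a query showing both uses: binding a variable to the area, and calling geof:area directly inside a FILTER. Two parts of it are assumptions. The first is the QUDT namespace bound to unit:. The second is the default unit:SquareMeter, so confirm both against the units your data and Stardog version accept. Point geometries have no area, so in practice you may want to restrict ?geom to polygons.

```python
from typing import Optional

PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
    "PREFIX unit: <http://qudt.org/vocab/unit#>\n"  # assumed QUDT namespace
)


def area_query(unit: str = "unit:SquareMeter", min_area: Optional[float] = None) -> str:
    """Bind each Geometry's area; optionally keep only areas above min_area."""
    lines = [
        "SELECT ?geom ?area WHERE {",
        "  ?geom a geo:Geometry .",
        f"  BIND(geof:area(?geom, {unit}) AS ?area)",  # variable-binding use
    ]
    if min_area is not None:
        # Filter use: call the function directly inside FILTER.
        lines.append(f"  FILTER(geof:area(?geom, {unit}) > {min_area})")
    lines.append("}")
    return PREFIXES + "\n".join(lines) + "\n"


print(area_query(min_area=1000.0))
```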
geof:distance
This returns the distance between two spatial objects in the specified unit. It can also be used as a variable binding or as a filter.
geof:distance(<Geometry|WKT Literal>, <Geometry|WKT Literal>, <Unit>)
| feature | distance |
|---|---|
| “Tomb of the Unknown Soldier” | 4.27410351039E3 |
| “The Pentagon” | 3.70238119121E3 |
| “US Capitol Building” | 2.76937468038E3 |
| “NASA Headquarters” | 2.60127942558E3 |
| “Jefferson Memorial” | 2.01863079957E3 |
| “Lincoln Memorial” | 1.63871432741E3 |
| “National Mall” | 1.59761525629E3 |
| “Vietnam Veterans’ Memorial” | 1.28566137839E3 |
| “Washington Monument” | 9.8463529491E2 |
| “The White House” | 0.0E0 |
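These results look like distances from one reference Geometry, The White House, sorted farthest first. The original query is not shown. The sketch below builds a query of that shape using the variable-binding form of geof:distance. It rests on three assumptions: the unit (unit:Meter), the QUDT namespace bound to unit:, and rdfs:label as the feature-name predicate.

```python
PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX unit: <http://qudt.org/vocab/unit#>\n"  # assumed QUDT namespace
)


def distance_query(reference_geom: str, unit: str = "unit:Meter") -> str:
    """Distance from every feature's Geometry to one reference Geometry, farthest first."""
    return PREFIXES + (
        "SELECT ?feature ?distance WHERE {\n"
        "  ?f geo:hasGeometry ?geom ;\n"
        "     rdfs:label ?feature .\n"  # label predicate is an assumption
        f"  BIND(geof:distance(?geom, <{reference_geom}>, {unit}) AS ?distance)\n"
        "}\n"
        "ORDER BY DESC(?distance)\n"
    )


print(distance_query("http://blog.stardog.com/geons/WhiteHouseGeom"))
```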
geof:relate
This returns the relationship between two Geometries. Possible results are geo:contains, geo:within, geo:intersects, geo:equals, and geo:disjoint.
This function has slightly different forms, depending on whether you’re using it in a BGP or in a filter:
?relation geof:relate (<Geometry> <Geometry>)
FILTER(geof:relate(<Geometry>, <Geometry>, <desired result>))
| feature1 | feature2 | rel |
|---|---|---|
| “Arlington, VA” | “DC Metro Area” | http://www.opengis.net/ont/geosparql#within |
| “Arlington, VA” | “Arlington, VA” | http://www.opengis.net/ont/geosparql#equals |
| “DC Metro Area” | “Arlington, VA” | http://www.opengis.net/ont/geosparql#contains |
| “DC Metro Area” | “DC Metro Area” | http://www.opengis.net/ont/geosparql#equals |
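To make the five possible results concrete, here is a deliberately simplified sketch. It decides the relationship between two axis-aligned rectangles, checking the relations from most to least specific. Stardog’s geof:relate works on arbitrary geometries and returns geo: IRIs, whereas this sketch only returns the local names. The Box type and the stand-in coordinates for the two areas are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; a simplified stand-in for a Geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def _inside(a: Box, b: Box) -> bool:
    """True if a lies entirely inside b (boundaries included)."""
    return (b.min_x <= a.min_x and a.max_x <= b.max_x
            and b.min_y <= a.min_y and a.max_y <= b.max_y)


def relate(a: Box, b: Box) -> str:
    """Return the first matching relation, checked from most to least specific."""
    if a == b:
        return "equals"
    if _inside(a, b):
        return "within"
    if _inside(b, a):
        return "contains"
    overlaps = (a.min_x <= b.max_x and b.min_x <= a.max_x
                and a.min_y <= b.max_y and b.min_y <= a.max_y)
    return "intersects" if overlaps else "disjoint"


metro = Box(0, 0, 10, 10)    # hypothetical stand-in for "DC Metro Area"
arlington = Box(1, 1, 3, 3)  # hypothetical stand-in for "Arlington, VA"
for x, y in [(arlington, metro), (arlington, arlington), (metro, arlington), (metro, metro)]:
    print(relate(x, y))
```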
The SERVICE syntax
All spatial operators listed above are internally supported through the SERVICE extension point in Stardog (see the Query Stardog chapter for the list of supported services). In most cases, the BGP syntax for spatial operators should suffice, but it’s also possible to use the SERVICE form explicitly (see below). The namespace prefix for all spatial service IRIs is defined as prefix geo_srv: <tag:stardog:api:geo:>.
geof:nearby
input, lat, and lon can be variables or constants. radius and unit must be constants.
geof:within
As of Stardog 11, the ?result variable is always set to true. That is, it is not possible to query for shapes that are not within a given shape; when both inputs are constants and the first is not within the second, the service returns no results. input1 and input2 can be variables or constants. lower_left and upper_right can be variables or WKT literals defining points. lower_left_lat, lower_left_lon, upper_right_lat, and upper_right_top_lon should be valid coordinates of the bounding-box corners.
geof:distance
input1 and input2 can be variables or constants; unit must be a valid constant.
geof:area
geof:relate
input1 and input2 can be variables or constants.
Known Issues and Limitations
The spatial operators in Stardog 11 exhibit the following properties, which sometimes affect query results:
- geof:distance measures the distance between the center points of two objects. For points, the center is naturally the point itself. For polygons, however, the distance Stardog computes differs from the distance between the two nearest points of the polygons, which is how GeoSPARQL defines it.
- geof:nearby is not a symmetric relation. For spatial objects A and B, B is nearby A (given a distance R) if B overlaps the circle of radius R centered at the center of A. This is symmetric for points but not for polygons. One important consequence is that swapping the roles of A and B in a geof:nearby pattern does not necessarily produce the same results. Furthermore, Stardog can use its spatial index efficiently for only one of the two directions. However, if the data contains only points, Stardog’s optimizer takes advantage of geof:nearby being symmetric and uses the spatial index in both directions. Stardog automatically tracks which kinds of spatial objects are added to the database and maintains the spatial.features database property to decide how to optimize geof:nearby calls. Note that if polygons were added to the data but later removed, one may need to run db optimize to refresh the value of spatial.features.
- Both geof:within and geof:nearby are binary relations between spatial objects that are represented as RDF resources in the data (that is, IRIs or blank nodes). For example, there are three indexed spatial objects in the following data:
The objects are :zooLocation, :camdenAreaGeo, and :lincolnMemorial. The :zooLocation object, for example, is not the same as the WKT literal that defines it: those are different RDF terms (one is an IRI and the other a literal). Therefore, only spatial resources (and not literals) can appear in results returned by the geof:nearby and geof:within operators.
Nonetheless, for convenience both geof:nearby and geof:within also accept WKT literals as constant inputs in SPARQL queries. For example, a query that passes the WKT literal defining :zooLocation directly as an argument is valid. It returns the same results as the same query with :zooLocation in place of the literal, since :zooLocation is the spatial object that corresponds to that WKT literal.
The subtle issue here is that one can also use literals for which no spatial resource exists in the data (i.e., they are not indexed). Such queries return well-defined results only if the literals occur directly in geof:nearby and geof:within calls. If they occur in other parts of the query, possibly joined with the results of geof:nearby or geof:within, the final query results are undefined. The following are some examples of queries with undefined spatial results:
One way to think about this restriction is that it lets the query optimizer choose the evaluation order for spatial patterns in any well-defined query without changing the results. Since querying for arbitrary (including non-indexed) spatial objects is allowed, it wouldn’t otherwise be possible to guarantee under standard SPARQL semantics that two evaluation orders of the same query are equivalent. The reason is that ?location geof:nearby ( ?geo 1 unit:Kilometer ) would never return results where ?geo is bound to a non-indexed spatial object (or any WKT literal, for that matter). It is not difficult to imagine more complex queries where the optimization decisions are not obvious and could affect the results in confusing ways, which is something that we try to avoid.
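The two center-based behaviors from the first two limitations can be reproduced in a small planar model. This sketch uses flat x/y rectangles (a point is a zero-size box) rather than geographic coordinates, and all names are hypothetical; it is not Stardog’s implementation. It shows two things. First, a center-to-center distance can be much larger than the gap between the nearest edges. Second, "B is nearby A", defined as B overlapping the circle around A’s center, can hold in one direction but not the other.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in planar x/y units; a point is a zero-size box."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def center(self) -> Tuple[float, float]:
        return ((self.min_x + self.max_x) / 2, (self.min_y + self.max_y) / 2)


def center_distance(a: Box, b: Box) -> float:
    """Distance between centers: the behavior described for geof:distance."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def nearest_distance(a: Box, b: Box) -> float:
    """Gap between the closest points of a and b (0 if they touch or overlap)."""
    dx = max(b.min_x - a.max_x, a.min_x - b.max_x, 0.0)
    dy = max(b.min_y - a.max_y, a.min_y - b.max_y, 0.0)
    return math.hypot(dx, dy)


def is_nearby(b: Box, a: Box, radius: float) -> bool:
    """B is nearby A if B overlaps the circle of `radius` around A's center."""
    cx, cy = a.center()
    # Closest point of B to the circle's center, found by clamping.
    px = min(max(cx, b.min_x), b.max_x)
    py = min(max(cy, b.min_y), b.max_y)
    return math.hypot(px - cx, py - cy) <= radius


left, right = Box(0, 0, 2, 2), Box(3, 0, 5, 2)
print(center_distance(left, right), nearest_distance(left, right))

square, point = Box(0, 0, 10, 10), Box(9, 9, 9, 9)
print(is_nearby(point, square, 1.0), is_nearby(square, point, 1.0))
```

Here the point lies inside the square, yet it is not nearby the square, because it is far from the square’s center. The square, however, is nearby the point.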