Stardog Geospatial

Learn how to manage geospatial data in Stardog.

Page Contents

Background
Creating Geographical Data
1. Representing single points
  1. WGS latitude and longitude
  2. WKT
2. More complicated shapes
Querying Geographical Data
Known Issues and Limitations

Background

Stardog’s geospatial index is a powerful tool. Many users have augmented their knowledge graphs with spatial data to great success, adding another layer of utility to the enterprise. However, it is often one of the more troublesome features, having the potential to cause a few headaches when getting started. In this post, I intend to provide a detailed primer to help alleviate those headaches.

Stardog supports two geospatial specs: W3C’s WGS 84 and OGC’s GeoSPARQL. In this post I will, for the sake of clarity and readability, combine the two by using GeoSPARQL’s hasGeometry predicate to map locations and areas to nodes of type geo:Geometry. While this is technically unneeded for WGS 84 features, it makes the queries we will be running on the data much easier to follow.

We will be using a DC Landmarks data set. Feel free to load it yourself and play along!

Map of Washington DC

Creating Geographical Data

By default, the spatial index is not enabled when creating a new database. It can be enabled by setting the database configuration option spatial.enabled=true.

Our toy data set has about 10 nodes representing various landmarks in the Washington, DC area. Besides any domain knowledge we wish to attach to these nodes, in order to perform any spatial operations on them we need to associate them with a Geometry entity.

Representing single points

WGS latitude and longitude

For a simple latitude/longitude pair, we have a couple of choices available, the simplest of which is to use WGS 84 to specify them in our Geometry:

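@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://blog.stardog.com/geons/> .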

# Create geo:Geometry
:WhiteHouseGeom a geo:Geometry ;
  wgs:lat "38.89761"^^xsd:float ;
  wgs:long "-77.03637"^^xsd:float .

# Link it to our entity
:WhiteHouse a :Location ;
  rdfs:label "The White House" ;
  geo:hasGeometry :WhiteHouseGeom .

WKT

Our second option is to define our point’s Geometry using the OGC’s Well-Known Text (WKT) format. While it’s a fair bit easier to make mistakes this way, representing points with WKT will be more congruous with the rest of our data set, not to mention others’ data sets, as WKT is very widely used.

@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://blog.stardog.com/geons/> .

# geo:Geometry is still used as the type
# A big tripping point is that WKT points are expressed
# as (LONG LAT), with no comma!
:WashingtonMonGeom a geo:Geometry ;
  geo:asWKT "Point(-77.03525 38.88956)"^^geo:wktLiteral .

# Link this Geometry to an entity in our graph
:WashingtonMon a :Location ;
  rdfs:label "Washington Monument" ;
  geo:hasGeometry :WashingtonMonGeom .

More complicated shapes

For any shape more complex than a latitude/longitude point, WKT is our only option. Lots of shapes are supported; here are some of the ones we most commonly see:

Point(LONG LAT): A single point as described above
- Note the lack of a comma
Linestring(LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN): A line connecting the specified points
- Note commas between each point
Envelope(minLong, maxLong, maxLat, minLat): A rectangle with the specified corners
- Note the commas between each
- Especially note the somewhat odd ordering of (min, max, max, min).

For more complex shapes, Stardog supports JTS. By downloading and enabling this library, you gain access to these shapes, most notably:

Polygon((LONG1 LAT1, LONG2 LAT2, ..., LONGN LATN, LONG1 LAT1)): A filled-in shape with the specified points
- Note that a polygon must start and end with the same point, i.e., be closed
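
The ordering rules above are easy to get wrong by hand, so here is a minimal Python sketch of helpers that build the WKT strings from latitude/longitude values. The function names are hypothetical and not part of Stardog; the polygon helper wraps its ring in a second pair of parentheses, as standard WKT requires, and closes the ring if the caller did not.

```python
def wkt_point(lat, lon):
    """WKT puts longitude first and separates the pair with a space."""
    return f"Point({lon} {lat})"


def wkt_linestring(latlon_pairs):
    """Each point is 'LONG LAT'; points are separated by commas."""
    coords = ", ".join(f"{lon} {lat}" for lat, lon in latlon_pairs)
    return f"Linestring({coords})"


def wkt_envelope(min_lat, min_lon, max_lat, max_lon):
    """Envelope order is (minLong, maxLong, maxLat, minLat)."""
    return f"Envelope({min_lon}, {max_lon}, {max_lat}, {min_lat})"


def wkt_polygon(latlon_pairs):
    """Repeat the first point at the end if the ring is not already closed."""
    ring = list(latlon_pairs)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{lon} {lat}" for lat, lon in ring)
    return f"Polygon(({coords}))"


# The Washington Monument point used earlier, built from (lat, long)
monument = wkt_point(38.88956, -77.03525)
```

Taking arguments in the familiar (lat, long) order and emitting (LONG LAT) keeps the swap in one place instead of in every hand-written literal.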

Querying Geographical Data

Now that we have inserted Geometries into Stardog’s spatial index, it would be nice to query them spatially. Stardog supports five of the major operators defined by GeoSPARQL. These functions require units of measurement to be passed; we support the QUDT ontology for this, prefixed in our dataset by unit:.

`geof:within`

This will return true when a given Geometry is contained within another. It has a few accepted forms:

<Geometry> geof:within <WKT Literal>: Specifying a WKT Literal for an area
<Geometry> geof:within <Geometry>: Passing in another Geometry
<Geometry> geof:within (LAT1 LONG1 LAT2 LONG2): Specifying the Lat/Long of the lower-left and upper-right corners of a box
<Geometry> geof:within (<WKT Literal> <WKT Literal>): Specifying the lower-left and upper-right corners as WKT Points

Imagine we wish to retrieve a list of DC landmarks in our dataset that are in the Arlington, VA area. We can do that a few different ways:

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>

# All of these SPARQL queries are equivalent
# Pay special attention to the various ways the lat/long pairs are ordered

SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  ?geom geof:within "ENVELOPE(-77.111, -77.052, 38.885, 38.855)"^^geo:wktLiteral .
}

SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  # We define :ArlingtonGeom elsewhere in our data set as the envelope from above
  ?geom geof:within :ArlingtonGeom ;
}

SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  ?geom geof:within (38.855 -77.111 38.885 -77.052) ;
}

SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  ?geom geof:within ("POINT(-77.111 38.855)"^^geo:wktLiteral "POINT(-77.052 38.885)"^^geo:wktLiteral) .
}

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>

SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  ?geom geof:within ("POINT(-77.111 38.855)"^^geo:wktLiteral "POINT(-77.052 38.885)"^^geo:wktLiteral) .
}

geom	feature
http://blog.stardog.com/geons/PentagonGeom	“The Pentagon”
http://blog.stardog.com/geons/TombOfUnknownGeom	“Tomb of the Unknown Soldier”

We can also use geof:within as a filter by passing in our Geometry as the first argument, followed by any of the accepted sets of parameters.

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>

SELECT ?geom ?feature {
    ?f a :Location;
       rdfs:label ?feature;
       geo:hasGeometry ?geom .
    # We've expanded the box here to cover the entire DC metro area
    FILTER(geof:within(?geom, 38, -77.2, 39, -77.0))
}

geom	feature
http://blog.stardog.com/geons/PentagonGeom	“The Pentagon”
http://blog.stardog.com/geons/TombOfUnknownGeom	“Tomb of the Unknown Soldier”
http://blog.stardog.com/geons/JeffMemGeom	“Jefferson Memorial”
http://blog.stardog.com/geons/LincolnMemGeom	“Lincoln Memorial”
http://blog.stardog.com/geons/WashingtonMonGeom	“Washington Monument”
http://blog.stardog.com/geons/CapitolGeom	“US Capitol Building”
http://blog.stardog.com/geons/NatlMallGeom	“National Mall”
http://blog.stardog.com/geons/NASAHQGeom	“NASA Headquarters”
http://blog.stardog.com/geons/VietnamMemGeom	“Vietnam Veterans’ Memorial”
http://blog.stardog.com/geons/WhiteHouseGeom	“The White House”
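
To see why the different argument orders describe the same box, here is a small Python sketch that converts the corner form into ENVELOPE order and tests a point against it. It is only a planar illustration for points: the helper names are hypothetical, and treating the boundary as inside is an assumption, not a statement about how Stardog's index behaves.

```python
def corners_to_envelope(ll_lat, ll_lon, ur_lat, ur_lon):
    """Convert the (LAT1 LONG1 LAT2 LONG2) corner form to ENVELOPE order."""
    return f"ENVELOPE({ll_lon}, {ur_lon}, {ur_lat}, {ll_lat})"


def parse_envelope(wkt):
    """Return (min_lon, max_lon, max_lat, min_lat) from an ENVELOPE literal."""
    inner = wkt[wkt.index("(") + 1 : wkt.rindex(")")]
    min_lon, max_lon, max_lat, min_lat = (float(v) for v in inner.split(","))
    return min_lon, max_lon, max_lat, min_lat


def point_within_envelope(lat, lon, wkt):
    """Planar containment test; points on the boundary count as inside."""
    min_lon, max_lon, max_lat, min_lat = parse_envelope(wkt)
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


ARLINGTON = "ENVELOPE(-77.111, -77.052, 38.885, 38.855)"
DC_METRO = corners_to_envelope(38, -77.2, 39, -77.0)
WHITE_HOUSE = (38.89761, -77.03637)  # (lat, long) from the WGS 84 example

in_arlington = point_within_envelope(*WHITE_HOUSE, ARLINGTON)
in_dc_metro = point_within_envelope(*WHITE_HOUSE, DC_METRO)
```

With these values the White House falls outside the Arlington envelope but inside the larger box, which agrees with the two result tables above.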

`geof:nearby`

This will return all Geometries that are within a specified radius of a given point. It has two forms:

<Geometry> geof:nearby (<Geometry> <Number of units> <Unit>)
<Geometry> geof:nearby (LAT LONG <Number of units> <Unit>)

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix : <http://blog.stardog.com/geons/>
prefix unit: <http://qudt.org/vocab/unit#>

# Get all features within 2km of the Kennedy Center
SELECT ?geom ?feature {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  ?geom geof:nearby (38.896004 -77.054995 2 unit:Kilometer) ;
}

geom	feature
http://blog.stardog.com/geons/LincolnMemGeom	“Lincoln Memorial”
http://blog.stardog.com/geons/WashingtonMonGeom	“Washington Monument”
http://blog.stardog.com/geons/VietnamMemGeom	“Vietnam Veterans’ Memorial”
http://blog.stardog.com/geons/WhiteHouseGeom	“The White House”

`geof:area`

This returns the area of a given Geometry in the specified unit. It can be used either to bind a variable or as part of a filter.

geof:area(<Geometry|WKT Literal>, <Unit>)

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the area in km^2 of each shape in our dataset
SELECT ?feature ?area {
  ?f a :Area ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  BIND(geof:area(?geom, unit:Kilometer) as ?area)
}

feature	area
“Arlington, VA”	1.7038960485445116E1
“DC Metro Area”	5.77753702962143E2

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the shapes in our dataset that are bigger than 100 km^2
SELECT ?feature {
  ?f a :Area ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  FILTER(geof:area(?geom, unit:Kilometer) > 100)
}

feature
“DC Metro Area”
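
As a sanity check on the magnitude of these numbers, here is a Python sketch that computes the area of a latitude/longitude rectangle on a sphere. The source does not say how Stardog computes areas; the spherical formula, the mean Earth radius, and the function name are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumed, not taken from Stardog


def envelope_area_km2(min_lon, max_lon, max_lat, min_lat):
    """Area of a lat/long rectangle on a sphere, in square kilometres.

    Arguments follow the ENVELOPE order (minLong, maxLong, maxLat, minLat).
    """
    d_lon = math.radians(max_lon - min_lon)
    band = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# The Arlington envelope used in the geof:within examples
arlington_km2 = envelope_area_km2(-77.111, -77.052, 38.885, 38.855)
print(f"{arlington_km2:.2f} km^2")
```

Under these assumptions the result lands close to the value reported for “Arlington, VA” above.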

`geof:distance`

This returns the distance between two spatial objects in the specified unit. It can also be used as a variable binding or as a filter.

geof:distance(<Geometry|WKT Literal>, <Geometry|WKT Literal>, <Unit>)

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve each feature and its distance in Yards from the White House
SELECT ?feature ?distance {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  BIND(geof:distance(?geom, :WhiteHouseGeom, unit:Yard) as ?distance)
}
ORDER BY DESC(?distance)

feature	distance
“Tomb of the Unknown Soldier”	4.27410351039E3
“The Pentagon”	3.70238119121E3
“US Capitol Building”	2.76937468038E3
“NASA Headquarters”	2.60127942558E3
“Jefferson Memorial”	2.01863079957E3
“Lincoln Memorial”	1.63871432741E3
“National Mall”	1.59761525629E3
“Vietnam Veterans’ Memorial”	1.28566137839E3
“Washington Monument”	9.8463529491E2
“The White House”	0.0E0

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the features in our dataset that are more than 2 miles from the White House
SELECT ?feature  {
  ?f a :Location ;
    rdfs:label ?feature ;
    geo:hasGeometry ?geom .
  FILTER(geof:distance(?geom, :WhiteHouseGeom, unit:MileUSStatute) > 2)
}

feature
“The Pentagon”
“Tomb of the Unknown Soldier”
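
For readers who want to reproduce a distance by hand, here is a Python sketch of a great-circle (haversine) calculation between the two points defined earlier on this page. The source does not state which formula Stardog uses; the haversine formula, the mean Earth radius, and every name below are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed)
METRES_PER_YARD = 0.9144
METRES_PER_MILE = 1609.344


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


WHITE_HOUSE = (38.89761, -77.03637)          # (lat, long)
WASHINGTON_MONUMENT = (38.88956, -77.03525)  # (lat, long), from the WKT point

metres = haversine_m(*WHITE_HOUSE, *WASHINGTON_MONUMENT)
yards = metres / METRES_PER_YARD
miles = metres / METRES_PER_MILE
print(f"{yards:.1f} yd, {miles:.2f} mi")
```

Under these assumptions the White House to Washington Monument distance comes out within a few yards of the figure in the table above, and well under the 2-mile threshold used in the filter.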

`geof:relate`

This returns the relationship between two Geometries. Possible results are geo:contains, geo:within, geo:intersects, geo:equals, geo:disjoint.

This function has slightly different forms, depending on whether you’re using it as a BGP or a filter:

?relation geof:relate (<Geometry> <Geometry>)
FILTER(geof:relate(<Geometry>, <Geometry>, <desired result>))

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve each area and its relation to the others
SELECT ?feature1 ?feature2 ?rel {
  ?f a :Area ;
    rdfs:label ?feature1 ;
    geo:hasGeometry ?geom1 .
  ?f2 a :Area ;
    rdfs:label ?feature2 ;
    geo:hasGeometry ?geom2 .
  ?rel geof:relate (?geom1 ?geom2) .
}

feature1	feature2	rel
“Arlington, VA”	“DC Metro Area”	http://www.opengis.net/ont/geosparql#within
“Arlington, VA”	“Arlington, VA”	http://www.opengis.net/ont/geosparql#equals
“DC Metro Area”	“Arlington, VA”	http://www.opengis.net/ont/geosparql#contains
“DC Metro Area”	“DC Metro Area”	http://www.opengis.net/ont/geosparql#equals

prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix unit: <http://qudt.org/vocab/unit#>
prefix : <http://blog.stardog.com/geons/>

# Retrieve the areas in our dataset where one contains the other
SELECT ?feature1 ?feature2 {
  ?f a :Area ;
    rdfs:label ?feature1 ;
    geo:hasGeometry ?geom1 .
  ?f2 a :Area ;
    rdfs:label ?feature2 ;
    geo:hasGeometry ?geom2 .
  FILTER(geof:relate(?geom1, ?geom2, geo:contains))
}

feature1	feature2
“DC Metro Area”	“Arlington, VA”
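
To make the five possible results concrete, here is a Python sketch that classifies the relationship between two envelopes. It only handles axis-aligned boxes on a flat plane, and the Box type, the function name, and the stand-in boxes are hypothetical; the actual geometry of “DC Metro Area” in the data set is not shown in this page, so a larger box from the filter example is used in its place.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Fields follow the ENVELOPE order (minLong, maxLong, maxLat, minLat)."""
    min_lon: float
    max_lon: float
    max_lat: float
    min_lat: float


def covers(outer: Box, inner: Box) -> bool:
    return (outer.min_lon <= inner.min_lon and inner.max_lon <= outer.max_lon
            and outer.min_lat <= inner.min_lat and inner.max_lat <= outer.max_lat)


def relate(a: Box, b: Box) -> str:
    """Return the local name of the relation of a to b."""
    if a == b:
        return "equals"
    if (a.max_lon < b.min_lon or b.max_lon < a.min_lon
            or a.max_lat < b.min_lat or b.max_lat < a.min_lat):
        return "disjoint"
    if covers(a, b):
        return "contains"
    if covers(b, a):
        return "within"
    return "intersects"


ARLINGTON = Box(-77.111, -77.052, 38.885, 38.855)
DC_BOX = Box(-77.2, -77.0, 39.0, 38.0)  # stand-in for the DC metro area

print(relate(ARLINGTON, DC_BOX), relate(DC_BOX, ARLINGTON))
```

The two directions give inverse answers, which mirrors the within/contains pair in the first result table of this section.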

The `SERVICE` syntax

All spatial operators listed above are internally supported through the SERVICE extension point in Stardog (see the Query Stardog chapter for the list of supported services). In most cases, the BGP syntax for spatial operators should suffice, but it’s also possible to use the SERVICE form explicitly (see below). The namespace prefix for all spatial service IRIs is defined as prefix geo_srv: <tag:stardog:api:geo:>.

geof:nearby

     SERVICE geo_srv:nearby {
        [] geo:result ?shape ; # the output variable
           geo:inputs (input radius unit) # or
           geo:inputs (lat lon radius unit)
     }

input, lat, and lon can be variables or constants. radius and unit must be constants.

geof:within

     SERVICE geo_srv:within {
         [] geo_srv:result ?result ; 
            geo_srv:input (input1 input2) # or
            geo_srv:input (input1 lower_left upper_right) # or
            geo_srv:input (input1 lower_left_lat lower_left_lon upper_right_lat upper_right_top_lon)
     }

As of Stardog 11, the ?result variable is always set to true; that is, it is not possible to query for shapes that are not within a given shape. The service does not return results when both inputs are constants and the first is not within the second. input1 and input2 can be variables or constants. lower_left and upper_right can be variables or WKT literals defining points. lower_left_lat, lower_left_lon, upper_right_lat, and upper_right_top_lon should be valid coordinates of the bounding box points.

geof:distance

      SERVICE geo_srv:distance {
          [] geo_srv:result ?distance ;
             geo_srv:input (input1 input2 unit)
      }

input1 and input2 can be variables or constants; unit must be a valid constant.

geof:area

      SERVICE geo_srv:area {
          [] geo_srv:result ?area ;
             geo_srv:input (input unit)
      }

geof:relate

      SERVICE geo_srv:relate {
          [] geo_srv:result ?relation ;
             geo_srv:input (input1 input2)
      }

input1 and input2 can be variables or constants.

Known Issues and Limitations

The spatial operators in Stardog 11 exhibit the following properties, which sometimes affect query results:

geof:distance measures the distance between the center points of two objects. For points, the center is naturally the point itself. However, for polygons, the distance in Stardog is different from the distance between the two nearest points of the polygons (which is how GeoSPARQL defines it).
geof:nearby is not a symmetric relation: for spatial objects A and B, B is nearby A (given a distance R) if B overlaps with the circle of radius R centered on the center of A. This is symmetric for points but not for polygons. One important consequence of this is that the results of

SELECT * {
  :WhiteHouseGeom geof:nearby ( ?location 10 unit:Kilometer )
}

and

SELECT * {
  ?location geof:nearby ( :WhiteHouseGeom 10 unit:Kilometer )
}

are not necessarily the same. Furthermore, Stardog can only use its spatial index efficiently for the latter query and not the former. However, if the data only contains points, then Stardog’s optimizer will take advantage of geof:nearby being symmetric and will use the spatial index for both queries. Stardog automatically tracks which spatial objects are added to the database and maintains the spatial.features database property to decide how to optimize geof:nearby calls. Note that if polygons had been added to the data but later removed, one may need to perform db optimize to refresh the value of spatial.features.
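
The asymmetry described above can be illustrated with a small planar Python sketch. It models shapes as axis-aligned rectangles in arbitrary units and applies the stated rule (B is nearby A if B overlaps the circle of radius R around the center of A). The Rect type and function names are hypothetical, and the flat-plane geometry is a simplification rather than Stardog's implementation.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def center(self):
        return ((self.min_x + self.max_x) / 2, (self.min_y + self.max_y) / 2)


def point(x, y):
    """A point is a rectangle with zero width and height."""
    return Rect(x, y, x, y)


def is_nearby(b: Rect, a: Rect, radius: float) -> bool:
    """True if b overlaps the circle of the given radius around a's center."""
    cx, cy = a.center()
    # Closest point of b to the circle's center
    nearest_x = min(max(cx, b.min_x), b.max_x)
    nearest_y = min(max(cy, b.min_y), b.max_y)
    return math.hypot(cx - nearest_x, cy - nearest_y) <= radius


region = Rect(0, 0, 10, 10)  # a large area whose center is (5, 5)
spot = point(12, 5)          # a point just outside the region's edge

region_near_spot = is_nearby(region, spot, 3)
spot_near_region = is_nearby(spot, region, 3)
```

Here the region reaches into the circle around the point, but the point is too far from the region's center, so the two directions disagree. With two points the test reduces to a plain distance comparison and is symmetric.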

Both geof:within and geof:nearby are binary relations between spatial objects that are represented as RDF resources in the data (that is, IRIs or blank nodes). For example, there are three indexed spatial objects in the following data:

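@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://blog.stardog.com/geons/> .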

:nationalZoo a geo:Feature ;
    rdfs:label "National Zoo" ;
    geo:hasGeometry :zooLocation.

:zooLocation a geo:Geometry ;
    geo:asWKT "Point(-77.0576387 38.9248554 )"^^geo:wktLiteral .
    
:camdenArea a geo:Feature ;
    rdfs:label "Area around Camden Yards" ;
    geo:hasGeometry :camdenAreaGeo .

:camdenAreaGeo a geo:Geometry ;
    geo:asWKT "Envelope(-76.63, -76.61, 39.29, 39.27)"^^geo:wktLiteral .

:lincolnMemorial a :Location ;
    rdfs:label "Lincoln Memorial" ;
    wgs:lat "38.889269"^^xsd:float ;
    wgs:long "-77.050176"^^xsd:float .

The objects are :zooLocation, :camdenAreaGeo, and :lincolnMemorial. The :zooLocation object, for example, is not the same as the WKT literal that defines it: those are different RDF terms (one is an IRI and the other a literal). Therefore, only spatial resources (and not literals) can appear in results returned by geof:nearby and geof:within operators.

Nonetheless, for convenience, both geof:nearby and geof:within also accept WKT literals as constant inputs in SPARQL queries. For example,

SELECT * {
  ?location geof:nearby ( "Point(-77.0576387 38.9248554 )"^^geo:wktLiteral 1 unit:Kilometer ) 
}

is a valid query that will return the same results as

SELECT * {
  ?location geof:nearby ( :zooLocation 1 unit:Kilometer ) 
}

since :zooLocation is the spatial object that corresponds to the above WKT literal.

The subtle issue here is that one can also use literals for which no spatial resource exists in the data (i.e. they are not indexed). Such queries return well-defined results only if the literals occur directly in geof:nearby and geof:within calls. If they occur in other parts of the query, possibly joined with the results of geof:nearby or geof:within, the final query results are undefined. The following are some examples of queries with undefined spatial results:

SELECT * {
  ?location geof:nearby ( ?geo 1 unit:Kilometer )
  VALUES ?geo { "Polygon(...)"^^geo:wktLiteral } 
}

SELECT * {
  BIND("Polygon(...)"^^geo:wktLiteral as ?geo)
  ?location geof:nearby ( ?geo 1 unit:Kilometer ) 
}

One way to think about this restriction is that it enables the query optimizer to choose the evaluation order for spatial patterns in all well-defined queries without changing the results. Since querying for arbitrary (including non-indexed) spatial objects is allowed, it wouldn’t be possible to guarantee that, for example,

SELECT * {
  ?location geof:nearby ( ?geo 1 unit:Kilometer )
  VALUES ?geo { "Polygon(...)"^^geo:wktLiteral } 
}

is equivalent to

SELECT * {
  ?location geof:nearby ( "Polygon(...)"^^geo:wktLiteral 1 unit:Kilometer )
}

under the standard SPARQL semantics because ?location geof:nearby ( ?geo 1 unit:Kilometer ) would never return results where ?geo is bound to a non-indexed spatial object (or any WKT literal for that matter). It is not difficult to imagine more complex queries where the optimization decisions are not obvious and could impact the results in a confusing way, which is something that we try to avoid.
