Entity Extraction and Linking

Learn how to use entity extraction and linking in the Knowledge Graph.

Page Contents

Entity Recognition and Linking
Finding Celebrities in Movie News Articles
Preprocessing
Extracting Entities
Entity Linking

Stardog now lets you extract named entity mentions from text documents and link those mentions to existing entities in a knowledge graph. This tutorial covers the basics.

Entity Recognition and Linking

Named entity recognition is one of the best-known NLP tasks. The main idea is simple: given some text, can we locate the words that identify entities of certain categories? For example:

Stardog is the world’s leading Knowledge Graph platform for the enterprise.

An entity recognizer might notice that Stardog and Knowledge Graph are entities in some knowledge base.

There is an extensive body of research in this area, and most NLP libraries implement some technique to identify named entities of the most common categories (e.g., person, organization).
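
To make the idea of locating mentions concrete, here is a toy sketch that looks up a fixed list of known names in a sentence. It is only an illustration: the function name find_mentions and the name list are hypothetical, and a real recognizer such as the ones in NLP libraries relies on trained models rather than exact string lookup.

```python
import re

def find_mentions(text, names):
    """Return (start, end, name) for every occurrence of a known name in text."""
    mentions = []
    for name in names:
        # Match the name only on word boundaries, escaping any special characters.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            mentions.append((match.start(), match.end(), name))
    return sorted(mentions)

sentence = ("Stardog is the world's leading Knowledge Graph platform "
            "for the enterprise.")

# Hypothetical list of names taken from some knowledge base.
print(find_mentions(sentence, ["Stardog", "Knowledge Graph"]))
# prints [(0, 7, 'Stardog'), (31, 46, 'Knowledge Graph')]
```

Each result gives the character offsets of a mention, which is the kind of information a recognizer hands to the next step.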

When you have a knowledge graph, with its rich structure and detailed information, simply extracting named entity mentions falls short of what is possible. Why can’t we go a step further and assert that those mentions refer to actual entities in the knowledge graph? This task is commonly called entity linking, and Stardog supports a simple but effective pipeline to perform entity linking from any kind of text.

Stardog is the world’s leading Knowledge Graph platform for the enterprise.

With entity linking in Stardog, all mentions or occurrences of an entity are linked to the knowledge base item (a node or edge) they represent.

In this tutorial, we will show you how to use this new capability.

Finding Celebrities in Movie News Articles

Surprise! You now have access to a knowledge graph about movies.

t:tt1454468 a :Movie ;
    rdfs:label "Gravity" ;
    :description "Two astronauts work together to survive after an accident which leaves them alone in space." ;
    :actor n:nm0000123 , n:nm0000113 , n:nm0000438 , n:nm1241511 ;
    :director n:nm0190859 ;
    :author n:nm0190859 , n:nm0190861 ;
    :genre "Sci-Fi" , "Adventure" ;
    :copyrightYear 2013 .

n:nm0000123 a :Person ;
    rdfs:label "George Clooney" .

There are many amazing things you could do with this data. I personally like the idea of being able to find which celebrities are being talked about in all the juicy news articles about upcoming TV shows.

A drama titled Watergate is being developed by George Clooney and Bridge of Spies writer Matt Charman. Clooney’s Smokehouse Pictures will produce the eight-episode limited series, with the film star and his partner Grant Heslov serving as executive producers.

Let’s find out how to do this with Stardog.

Preprocessing

Named entity recognition in Stardog is based on OpenNLP, a well-known NLP library. As a configuration option, we need to tell Stardog which category of entities we want to extract.

OpenNLP provides several basic models for different languages. In this case, we are interested in finding people’s names in English-language documents, so we download en-ner-person.bin to a folder. Two extra models are always required: a sentence detector and a tokenizer. In this case, we will also download en-sent.bin and en-token.bin to the same folder.
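
Before pointing Stardog at the folder, it can help to confirm that all three model files are actually there. The following is a minimal sketch of such a check; the helper name missing_models is hypothetical and is not part of Stardog or OpenNLP.

```python
from pathlib import Path

# The three OpenNLP model files named in this tutorial.
REQUIRED_MODELS = ("en-ner-person.bin", "en-sent.bin", "en-token.bin")

def missing_models(folder):
    """Return the required model file names that are absent from folder."""
    root = Path(folder)
    return [name for name in REQUIRED_MODELS if not (root / name).is_file()]
```

Calling missing_models("/path/to/folder") returns an empty list when the folder is complete, and otherwise lists the files that still need to be downloaded.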

Next we need to tell Stardog where this folder is located. This is done through a database configuration option, docs.opennlp.models.path, which can be set during database creation.

stardog-admin db create -o search.enabled=true docs.opennlp.models.path=/path/to/folder -n movies person_movie.ttl

And that’s it! No extra configuration is required.

Extracting Entities

As an introduction, let’s simply extract named entity mentions, without actually linking them to the knowledge graph. This can be done by setting the RDF extractor to entities, giving the text content of the news article as an argument.

stardog doc put movies -r entities article.txt

The document is added to the database, and the extracted entities can be queried with SPARQL.

select ?mention where {
  graph <tag:stardog:api:docs:movies:article.txt> {
    ?s rdfs:label ?mention .
  }
}

+------------------+
|     mention      |
+------------------+
| "Matt Charman"   |
| "Grant Heslov"   |
| "George Clooney" |
+------------------+

Entity Linking

By setting the RDF extractor to linker, entities are not only extracted but also, whenever possible, automatically linked to entities in the knowledge graph.

stardog doc put movies -r linker article.txt

The links can then be retrieved alongside the mentions with a query such as the following, which assumes the linker records each link with the dc:references property:

select ?mention ?entity where {
  graph <tag:stardog:api:docs:movies:article.txt> {
    ?s rdfs:label ?mention ;
      <http://purl.org/dc/terms/references> ?entity .
  }
}

+------------------+--------------------------------------+
|     mention      |                entity                |
+------------------+--------------------------------------+
| "George Clooney" | <http://www.imdb.com/name/nm0000123> |
| "Matt Charman"   | <http://www.imdb.com/name/nm4131020> |
| "Grant Heslov"   | <http://www.imdb.com/name/nm0381416> |
+------------------+--------------------------------------+

All three named entity mentions were found to be present in the knowledge graph. Each link is made by heuristically matching the mention against the expected string representation of a resource. For this, Stardog looks at the similarity of the mention to things such as label properties (e.g., rdfs:label, foaf:name) and an IRI’s local name.
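
The matching idea can be sketched in a few lines. This is an illustration only, not Stardog's implementation: the function names, the use of difflib's similarity ratio, and the 0.9 threshold are all assumptions made for the example. The IRIs and labels are the ones from the result table above.

```python
from difflib import SequenceMatcher

def local_name(iri):
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]

def similarity(a, b):
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_mention(mention, candidates, threshold=0.9):
    """Return the IRI whose label or local name best matches mention, or None.

    candidates maps each IRI to a list of its label strings.
    threshold is an illustrative cutoff, not a Stardog setting.
    """
    best_iri, best_score = None, 0.0
    for iri, labels in candidates.items():
        # Compare against every label and against the IRI's local name.
        for text in list(labels) + [local_name(iri)]:
            score = similarity(mention, text)
            if score > best_score:
                best_iri, best_score = iri, score
    return best_iri if best_score >= threshold else None

candidates = {
    "http://www.imdb.com/name/nm0000123": ["George Clooney"],
    "http://www.imdb.com/name/nm4131020": ["Matt Charman"],
    "http://www.imdb.com/name/nm0381416": ["Grant Heslov"],
}

print(link_mention("George Clooney", candidates))  # http://www.imdb.com/name/nm0000123
print(link_mention("Watergate", candidates))       # None
```

A mention with no sufficiently similar label or local name is left unlinked, which corresponds to the "whenever possible" behavior described above.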
