OpenNLPDocumentParser (Stardog-7.8.3 API)

java.lang.Object
- com.complexible.stardog.docs.nlp.impl.OpenNLPDocumentParser

All Implemented Interfaces:

DocumentParser, Function<String,Document>
```
public class OpenNLPDocumentParser
extends Object
implements DocumentParser
```
A `DocumentParser` implementation based on OpenNLP.

Since:

5.2

Version:

5.2

Author:

Pedro Oliveira

Constructor Summary

Constructors
Constructor and Description
`OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer)`

Method Summary

Modifier and Type	Method and Description
`void`	`add(opennlp.tools.namefind.TokenNameFinder theNameFinder)`
`Document`	`apply(String theText)` `NameFinderME`s are not thread-safe and share internal state between calls, so this method is synchronized as a whole.
`static opennlp.tools.chunker.ChunkerME`	`chunker(File theModel)`
`static void`	`clearSharedModels()` Allows shared models to be garbage-collected, as they potentially have a large memory footprint.
`static opennlp.tools.lemmatizer.DictionaryLemmatizer`	`dictionaryLemmatizer(File theDictionary)`
`static opennlp.tools.namefind.DictionaryNameFinder`	`dictNameFinder(File theModel, String theType)`
`static OpenNLPDocumentParser`	`getDefault(Connection theConnection)` Lazily loads `OpenNLPDocumentParser` models from the given database configuration.
`static opennlp.tools.lemmatizer.LemmatizerME`	`lemmatizer(File theModel)`
`static OpenNLPDocumentParser`	`loadFrom(File theDirectory)` Loads OpenNLP models, in their default name formats, from the given directory.
`static opennlp.tools.namefind.NameFinderME`	`nameFinder(File theModel)`
`static opennlp.tools.postag.POSTaggerME`	`posTagger(File theModel)`
`static opennlp.tools.sentdetect.SentenceDetectorME`	`sentenceDetector(File theModel)`
`void`	`set(opennlp.tools.chunker.Chunker theChunker)`
`void`	`set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)`
`void`	`set(opennlp.tools.postag.POSTagger thePOSTagger)`
`static opennlp.tools.tokenize.TokenizerME`	`tokenizer(File theModel)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.complexible.stardog.docs.nlp.DocumentParser
apply

Methods inherited from interface java.util.function.Function
andThen, compose, identity

Constructor Detail

OpenNLPDocumentParser

public OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector,
                             opennlp.tools.tokenize.Tokenizer theTokenizer)

Method Detail

add

public void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)

set

public void set(opennlp.tools.postag.POSTagger thePOSTagger)

set

public void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)

set

public void set(opennlp.tools.chunker.Chunker theChunker)

apply
```
public Document apply(String theText)
```
`NameFinderME`s are not thread-safe and share internal state between calls. They can only be used safely from another thread after `clearAdaptiveData` is called. Because the method is essentially a loop, it is safer and simpler to synchronize it as a whole. The alternative would be to cache the `TokenNameFinderModel`, which is thread-safe, and create a new `NameFinderME` on each call, but those objects are heavy and create many other complex objects.
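The design choice described above can be illustrated with a minimal, library-free sketch. It is not the Stardog implementation: `AdaptiveFinder` and `SynchronizedParserSketch` are hypothetical stand-ins for a stateful finder and a parser that guards it, and the "name finding" rule (capitalized tokens) is only a placeholder. What the sketch shows is the pattern itself: one synchronized method wraps the whole loop and resets the shared state before returning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stateful, non-thread-safe name finder. */
class AdaptiveFinder {
    // State carried over between calls, analogous to the internal state described above.
    private final List<String> seen = new ArrayList<>();

    /** Placeholder rule: returns tokens starting with an upper-case letter and remembers them. */
    List<String> find(String[] tokens) {
        List<String> found = new ArrayList<>();
        for (String token : tokens) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                found.add(token);
                seen.add(token);
            }
        }
        return found;
    }

    /** Resets the carried-over state; loosely mirrors the clearAdaptiveData call mentioned above. */
    void clearAdaptiveData() {
        seen.clear();
    }

    int adaptiveSize() {
        return seen.size();
    }
}

/** Hypothetical parser that serializes access to its finder by synchronizing the whole method. */
class SynchronizedParserSketch {
    private final AdaptiveFinder finder = new AdaptiveFinder();

    /** Only one thread at a time runs the loop, so the finder's state is never shared mid-call. */
    synchronized List<String> apply(String text) {
        List<String> names = new ArrayList<>();
        try {
            // Split into sentences on periods, then into whitespace-separated tokens.
            for (String sentence : text.split("\\.\\s*")) {
                names.addAll(finder.find(sentence.trim().split("\\s+")));
            }
        } finally {
            // Leave the finder clean for whichever thread enters next.
            finder.clearAdaptiveData();
        }
        return names;
    }

    synchronized int adaptiveSize() {
        return finder.adaptiveSize();
    }
}
```

The cost of this approach is that concurrent callers queue behind one another. The alternative mentioned above, building a fresh finder per call from a shared thread-safe model, would avoid the lock at the price of more allocation.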

Specified by:

apply in interface Function<String,Document>

clearSharedModels
```
public static void clearSharedModels()
```
Allows shared models to be garbage-collected, as they potentially have a large memory footprint.

getDefault

public static OpenNLPDocumentParser getDefault(Connection theConnection)
                                        throws IOException

Lazily loads OpenNLPDocumentParser models from the given database configuration.

Throws:

IOException

loadFrom
```
public static OpenNLPDocumentParser loadFrom(File theDirectory)
                                      throws IOException
```
Loads OpenNLP models, in their default name formats, from the given directory. For example, a folder containing the files `en-sent.bin`, `en-token.bin`, `en-ner-organization.bin`, and `en-ner-person.bin`.
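Before pointing `loadFrom` at a directory, it can help to check that the expected files are present. The following is a minimal sketch using only the JDK. `ModelDirectoryCheck` and `missingModels` are hypothetical names, and the file list is simply copied from the example above; which of those files `loadFrom` actually requires is an assumption not confirmed by this page.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check for a directory of OpenNLP model files. */
class ModelDirectoryCheck {
    // File names copied from the example in the loadFrom description;
    // treating all of them as required is an assumption of this sketch.
    static final List<String> EXAMPLE_MODELS = Arrays.asList(
        "en-sent.bin", "en-token.bin", "en-ner-organization.bin", "en-ner-person.bin");

    /** Returns the names from EXAMPLE_MODELS that are not regular files in the directory. */
    static List<String> missingModels(Path directory) {
        List<String> missing = new ArrayList<>();
        for (String name : EXAMPLE_MODELS) {
            if (!Files.isRegularFile(directory.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingModels(directory);
        if (missing.isEmpty()) {
            // With the Stardog and OpenNLP jars on the classpath, this is where one could call:
            // OpenNLPDocumentParser parser = OpenNLPDocumentParser.loadFrom(directory.toFile());
            System.out.println("All example model files found in " + directory);
        } else {
            System.out.println("Missing model files: " + missing);
        }
    }
}
```

Since `loadFrom` declares `IOException`, a real caller would still need to handle that exception even when the check passes.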

Throws:

IOException

sentenceDetector

public static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(File theModel)
                                                                    throws IOException

Throws:

IOException

tokenizer

public static opennlp.tools.tokenize.TokenizerME tokenizer(File theModel)
                                                    throws IOException

Throws:

IOException

nameFinder

public static opennlp.tools.namefind.NameFinderME nameFinder(File theModel)
                                                      throws IOException

Throws:

IOException

dictNameFinder

public static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(File theModel,
                                                                         String theType)
                                                                  throws IOException

Throws:

IOException

posTagger

public static opennlp.tools.postag.POSTaggerME posTagger(File theModel)
                                                  throws IOException

Throws:

IOException

chunker

public static opennlp.tools.chunker.ChunkerME chunker(File theModel)
                                               throws IOException

Throws:

IOException

lemmatizer

public static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(File theModel)
                                                        throws IOException

Throws:

IOException

dictionaryLemmatizer

public static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(File theDictionary)
                                                                          throws IOException

Throws:

IOException
