java.lang.Object
- com.complexible.stardog.docs.nlp.impl.OpenNLPDocumentParser

All Implemented Interfaces:

DocumentParser, java.util.function.Function<java.lang.String,Document>
```
public class OpenNLPDocumentParser
extends java.lang.Object
implements DocumentParser
```
A DocumentParser implementation based on OpenNLP.

Since:

5.2

Version:

5.2

Author:

Pedro Oliveira

Constructor Summary

Constructors
Constructor	Description
`OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer)`

Method Summary

Modifier and Type	Method	Description
`void`	`add(opennlp.tools.namefind.TokenNameFinder theNameFinder)`
`Document`	`apply(java.lang.String theText)`	`NameFinderME`s are not thread-safe, and share internal state between calls.
`static opennlp.tools.chunker.ChunkerME`	`chunker(java.io.File theModel)`
`static void`	`clearSharedModels()`	Allows the shared models to be garbage-collected, as they can have a large memory footprint.
`static opennlp.tools.lemmatizer.DictionaryLemmatizer`	`dictionaryLemmatizer(java.io.File theDictionary)`
`static opennlp.tools.namefind.DictionaryNameFinder`	`dictNameFinder(java.io.File theModel, java.lang.String theType)`
`static OpenNLPDocumentParser`	`getDefault(Connection theConnection)`	Lazily load `OpenNLPDocumentParser` models from the given database configurations
`static opennlp.tools.lemmatizer.LemmatizerME`	`lemmatizer(java.io.File theModel)`
`static OpenNLPDocumentParser`	`loadFrom(java.io.File theDirectory)`	Loads OpenNLP models, in their default name formats, from the given directory.
`static opennlp.tools.namefind.NameFinderME`	`nameFinder(java.io.File theModel)`
`static opennlp.tools.postag.POSTaggerME`	`posTagger(java.io.File theModel)`
`static opennlp.tools.sentdetect.SentenceDetectorME`	`sentenceDetector(java.io.File theModel)`
`void`	`set(opennlp.tools.chunker.Chunker theChunker)`
`void`	`set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)`
`void`	`set(opennlp.tools.postag.POSTagger thePOSTagger)`
`static opennlp.tools.tokenize.TokenizerME`	`tokenizer(java.io.File theModel)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.complexible.stardog.docs.nlp.DocumentParser
apply

Methods inherited from interface java.util.function.Function
andThen, compose

Constructor Detail

OpenNLPDocumentParser

public OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector,
                             opennlp.tools.tokenize.Tokenizer theTokenizer)

Method Detail

add

public void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)

set

public void set(opennlp.tools.postag.POSTagger thePOSTagger)

set

public void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)

set

public void set(opennlp.tools.chunker.Chunker theChunker)

apply
```
public Document apply(java.lang.String theText)
```
NameFinderMEs are not thread-safe and share internal state between calls. They can only be used safely from another thread after clearAdaptiveData is called. Because this method is essentially a loop, it is safer and simpler to synchronize it as a whole. The alternative would be to cache the TokenNameFinderModel, which is thread-safe, and create a new NameFinderME on each call, but those objects are heavyweight and create many other complex objects.

Specified by:

apply in interface java.util.function.Function<java.lang.String,Document>
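
The design choice described for apply can be pictured with a small, library-free sketch. It is not Stardog's or OpenNLP's code: `StatefulFinder` and `SynchronizedParserSketch` are hypothetical stand-ins, and the toy "capitalized token" rule only exists so the finder carries state between calls. The sketch assumes adaptive state is cleared at the end of each call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a stateful, non-thread-safe finder (not an OpenNLP class).
class StatefulFinder {
    // Adaptive state carried over between calls.
    private final Set<String> seen = new HashSet<>();

    List<String> find(String[] tokens) {
        List<String> hits = new ArrayList<>();
        for (String token : tokens) {
            String key = token.toLowerCase();
            boolean capitalized = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
            // A token is a hit if it is capitalized or was a hit earlier.
            if (capitalized || seen.contains(key)) {
                hits.add(token);
                seen.add(key);
            }
        }
        return hits;
    }

    void clearAdaptiveData() {
        seen.clear();
    }
}

// Hypothetical parser that guards all finder use with a single lock.
class SynchronizedParserSketch {
    private final List<StatefulFinder> finders = new ArrayList<>();

    void add(StatefulFinder finder) {
        finders.add(finder);
    }

    // Synchronizing the whole method means only one thread at a time touches the finders.
    synchronized List<String> apply(String text) {
        List<String> names = new ArrayList<>();
        for (String sentence : text.split("\\.\\s*")) {
            String[] tokens = sentence.trim().split("\\s+");
            for (StatefulFinder finder : finders) {
                names.addAll(finder.find(tokens));
            }
        }
        // Assumption: reset adaptive state so the next caller starts clean.
        for (StatefulFinder finder : finders) {
            finder.clearAdaptiveData();
        }
        return names;
    }

    public static void main(String[] args) {
        SynchronizedParserSketch parser = new SynchronizedParserSketch();
        parser.add(new StatefulFinder());
        System.out.println(parser.apply("Alice met bob. Then alice left."));
    }
}
```

Locking the whole method trades some concurrency for simplicity: callers queue up, but no per-call finder objects need to be created.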

clearSharedModels
```
public static void clearSharedModels()
```
Allows the shared models to be garbage-collected, as they can have a large memory footprint.
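
As a rough illustration of why a clearing method is useful when models are loaded lazily and shared, here is a minimal sketch of a shared cache that loads on first use and can drop its references. `SharedModelCache` and its loader function are hypothetical names for illustration; the sketch makes no claim about how OpenNLPDocumentParser stores its models internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of expensive shared models, keyed by model path.
class SharedModelCache<M> {
    private final Map<String, M> models = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    SharedModelCache(Function<String, M> loader) {
        this.loader = loader;
    }

    // Load the model on first request, then hand out the same instance.
    M get(String path) {
        return models.computeIfAbsent(path, loader);
    }

    // Drop all references so the garbage collector can reclaim the models.
    void clear() {
        models.clear();
    }

    int size() {
        return models.size();
    }

    public static void main(String[] args) {
        // A byte array stands in for a large model.
        SharedModelCache<byte[]> cache = new SharedModelCache<>(path -> new byte[1024]);
        byte[] first = cache.get("en-sent.bin");
        byte[] again = cache.get("en-sent.bin");
        System.out.println(first == again);
        cache.clear();
        System.out.println(cache.size());
    }
}
```

After `clear()`, the next `get` simply reloads the model, so clearing costs load time later in exchange for memory now.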

getDefault

public static OpenNLPDocumentParser getDefault(Connection theConnection)
                                        throws java.io.IOException

Lazily load OpenNLPDocumentParser models from the given database configurations

Throws:

java.io.IOException

loadFrom
```
public static OpenNLPDocumentParser loadFrom(java.io.File theDirectory)
                                      throws java.io.IOException
```
Loads OpenNLP models, in their default name formats, from the given directory. E.g., folder with files ['en-sent.bin', 'en-token.bin', 'en-ner-organization.bin', 'en-ner-person.bin']

Throws:

java.io.IOException
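
Before calling loadFrom, it may help to check that the directory contains the expected files. The sketch below uses only the file names from the example above; `ModelDirectoryCheck` is a hypothetical helper, and treating the sentence and token models as required is an assumption based on the constructor taking a sentence detector and a tokenizer.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for a model directory.
class ModelDirectoryCheck {
    // Assumption: these two default-named models are the minimum needed.
    static final List<String> REQUIRED = Arrays.asList("en-sent.bin", "en-token.bin");

    // Returns the required file names that are absent from the given set of names.
    static List<String> missingFrom(Set<String> present) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!present.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Same check against a directory on disk.
    static List<String> missingIn(File directory) {
        // File.list() returns null if the path is not a readable directory.
        String[] names = directory.list();
        Set<String> present = names == null
                ? new HashSet<>()
                : new HashSet<>(Arrays.asList(names));
        return missingFrom(present);
    }

    public static void main(String[] args) {
        File directory = new File(args.length > 0 ? args[0] : "models");
        List<String> missing = missingIn(directory);
        if (missing.isEmpty()) {
            // With Stardog on the classpath, the next step would be:
            // OpenNLPDocumentParser parser = OpenNLPDocumentParser.loadFrom(directory);
            System.out.println("All required models found in " + directory);
        } else {
            System.out.println("Missing models: " + missing);
        }
    }
}
```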

sentenceDetector

public static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(java.io.File theModel)
                                                                    throws java.io.IOException

Throws:

java.io.IOException

tokenizer

public static opennlp.tools.tokenize.TokenizerME tokenizer(java.io.File theModel)
                                                    throws java.io.IOException

Throws:

java.io.IOException

nameFinder

public static opennlp.tools.namefind.NameFinderME nameFinder(java.io.File theModel)
                                                      throws java.io.IOException

Throws:

java.io.IOException

dictNameFinder

public static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(java.io.File theModel,
                                                                         java.lang.String theType)
                                                                  throws java.io.IOException

Throws:

java.io.IOException

posTagger

public static opennlp.tools.postag.POSTaggerME posTagger(java.io.File theModel)
                                                  throws java.io.IOException

Throws:

java.io.IOException

chunker

public static opennlp.tools.chunker.ChunkerME chunker(java.io.File theModel)
                                               throws java.io.IOException

Throws:

java.io.IOException

lemmatizer

public static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(java.io.File theModel)
                                                        throws java.io.IOException

Throws:

java.io.IOException

dictionaryLemmatizer

public static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(java.io.File theDictionary)
                                                                          throws java.io.IOException

Throws:

java.io.IOException
