Class OpenNLPDocumentParser
- java.lang.Object
  - com.complexible.stardog.docs.nlp.impl.OpenNLPDocumentParser
- All Implemented Interfaces: DocumentParser, java.util.function.Function<java.lang.String,Document>
public class OpenNLPDocumentParser extends java.lang.Object implements DocumentParser
A DocumentParser based on OpenNLP.
- Since: 5.2
- Version: 5.2
- Author: Pedro Oliveira
Constructor Summary
- OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer)
Method Summary
- void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)
- Document apply(java.lang.String theText): NameFinderMEs are not thread-safe and share internal state between calls.
- static opennlp.tools.chunker.ChunkerME chunker(java.io.File theModel)
- static void clearSharedModels(): Allows shared models to be garbage-collected, as they potentially have a large memory footprint.
- static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(java.io.File theDictionary)
- static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(java.io.File theModel, java.lang.String theType)
- static OpenNLPDocumentParser getDefault(Connection theConnection): Lazily loads OpenNLPDocumentParser models from the database configuration of the given connection.
- static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(java.io.File theModel)
- static OpenNLPDocumentParser loadFrom(java.io.File theDirectory): Loads OpenNLP models, in their default name formats, from the given directory.
- static opennlp.tools.namefind.NameFinderME nameFinder(java.io.File theModel)
- static opennlp.tools.postag.POSTaggerME posTagger(java.io.File theModel)
- static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(java.io.File theModel)
- void set(opennlp.tools.chunker.Chunker theChunker)
- void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)
- void set(opennlp.tools.postag.POSTagger thePOSTagger)
- static opennlp.tools.tokenize.TokenizerME tokenizer(java.io.File theModel)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.complexible.stardog.docs.nlp.DocumentParser
apply
Method Detail
add
public void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)
set
public void set(opennlp.tools.postag.POSTagger thePOSTagger)
set
public void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)
set
public void set(opennlp.tools.chunker.Chunker theChunker)
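The constructor takes the two stages every parse needs (sentence detection and tokenization). `add` accumulates name finders, and the `set` overloads install optional stages. Below is a minimal JDK-only sketch of that shape. It assumes optional stages stay null until set. The stand-in interfaces (`SentenceSplitter`, `WordTokenizer`, `PosTagger`, `NameFinder`) are hypothetical substitutes for the OpenNLP types. They are not the OpenNLP or Stardog API, and this is not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the OpenNLP stage types; these are NOT the OpenNLP API.
interface SentenceSplitter { String[] split(String text); }
interface WordTokenizer { String[] tokenize(String sentence); }
interface PosTagger { String[] tag(String[] tokens); }
interface NameFinder { List<String> find(String[] tokens); }

// Sketch of the constructor/add/set shape: two required stages, optional extras.
final class PipelineSketch {
    private final SentenceSplitter sentences;                       // required (constructor)
    private final WordTokenizer tokenizer;                          // required (constructor)
    private final List<NameFinder> nameFinders = new ArrayList<>(); // grows via add(...)
    private PosTagger posTagger;                                    // optional, null until set(...)

    PipelineSketch(SentenceSplitter sentences, WordTokenizer tokenizer) {
        this.sentences = sentences;
        this.tokenizer = tokenizer;
    }

    void add(NameFinder finder) { nameFinders.add(finder); }

    void set(PosTagger tagger) { this.posTagger = tagger; }

    boolean hasPosTagger() { return posTagger != null; }

    /** Splits sentences, tokenizes each one, and collects hits from every name finder. */
    List<String> names(String text) {
        List<String> found = new ArrayList<>();
        for (String sentence : sentences.split(text)) {
            String[] tokens = tokenizer.tokenize(sentence);
            for (NameFinder finder : nameFinders) {
                found.addAll(finder.find(tokens));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch(
                text -> text.split("(?<=\\.)\\s+"),   // naive: split after a period
                sentence -> sentence.split("\\s+"));  // naive: split on whitespace
        // Toy finder: capitalized tokens that are not sentence-initial.
        p.add(tokens -> {
            List<String> out = new ArrayList<>();
            for (int i = 1; i < tokens.length; i++) {
                if (!tokens[i].isEmpty() && Character.isUpperCase(tokens[i].charAt(0))) out.add(tokens[i]);
            }
            return out;
        });
        System.out.println(p.names("I met Ada today. Then Alan called.")); // [Ada, Alan]
    }
}
```

Keeping `add` list-valued lets several entity types (for example, person and organization finders) run over the same tokens. The single-valued `set` slots fit stages where only one implementation makes sense per parse.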
apply
public Document apply(java.lang.String theText)
NameFinderMEs are not thread-safe and share internal state between calls. They can only be used safely from another thread after clearAdaptiveData is called. Because the method is essentially a loop, it is safer and simpler to make the whole method synchronized. Another option would be to cache the TokenNameFinderModel, which is thread-safe, and create a new NameFinderME in each call, but those objects are heavy and create many other complex objects.
- Specified by: apply in interface java.util.function.Function<java.lang.String,Document>
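The comment above weighs two ways to make `apply` thread-safe. The JDK-only sketch below illustrates the first: one shared finder, with the entire per-document loop holding the object's monitor. `AdaptiveFinder` is a hypothetical toy stand-in for NameFinderME, whose "adaptive data" is simply the set of names already seen in the current document. Clearing that state at the end of every document is an assumption about usage, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy stand-in for a stateful, non-thread-safe finder such as NameFinderME.
// Its "adaptive data" is the set of names already seen in the current document.
final class AdaptiveFinder {
    private final Set<String> seen = new HashSet<>();

    List<String> find(String[] tokens) {
        List<String> hits = new ArrayList<>();
        for (String t : tokens) {
            if (t.isEmpty()) continue;
            String key = t.toLowerCase();
            // Capitalized tokens count as names; so do tokens seen earlier as names.
            if (Character.isUpperCase(t.charAt(0)) || seen.contains(key)) {
                hits.add(t);
                seen.add(key);
            }
        }
        return hits;
    }

    void clearAdaptiveData() { seen.clear(); }
}

// Option 1 from the comment: one shared finder, the whole per-document loop synchronized.
final class SynchronizedApplySketch {
    private final AdaptiveFinder finder = new AdaptiveFinder();

    synchronized List<String> apply(String[][] sentences) {
        try {
            List<String> names = new ArrayList<>();
            for (String[] tokens : sentences) {
                names.addAll(finder.find(tokens));
            }
            return names;
        } finally {
            // Assumed usage: reset per-document state before the next caller runs.
            finder.clearAdaptiveData();
        }
    }

    public static void main(String[] args) {
        SynchronizedApplySketch parser = new SynchronizedApplySketch();
        System.out.println(parser.apply(new String[][] {{"Paris", "is", "big"}, {"paris", "again"}}));
        System.out.println(parser.apply(new String[][] {{"paris", "only"}}));
    }
}
```

The second document does not inherit "paris" from the first because the state is reset inside the same critical section. The alternative described above (a cached thread-safe model plus a fresh finder per call) would remove the lock but pay the allocation cost of a heavy object on every call.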
clearSharedModels
public static void clearSharedModels()
Allows shared models to be garbage-collected, as they potentially have a large memory footprint.
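`clearSharedModels` implies that models are kept in some process-wide shared store, and `getDefault` says they are loaded lazily. Below is a minimal sketch of such a store. It assumes a map keyed by model location and a caller-supplied loader, both hypothetical; this is not the Stardog implementation. `computeIfAbsent` loads each model on first request and hands the same instance to later callers. `clear()` drops the cached references, so once no parser still holds a model, the garbage collector can reclaim it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical process-wide cache of expensive, thread-safe "models", keyed by location.
final class SharedModelCache<M> {
    private final Map<String, M> models = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    SharedModelCache(Function<String, M> loader) { this.loader = loader; }

    /** Loads the model on first request only; later callers share the same instance. */
    M get(String location) { return models.computeIfAbsent(location, loader); }

    /** Drops every cached reference so that otherwise-unused models become eligible for GC. */
    void clear() { models.clear(); }

    int size() { return models.size(); }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        SharedModelCache<byte[]> cache = new SharedModelCache<>(loc -> {
            loads.incrementAndGet();          // stands in for reading a large model file
            return new byte[1024];
        });
        cache.get("en-sent.bin");
        cache.get("en-sent.bin");
        cache.clear();
        cache.get("en-sent.bin");
        System.out.println(loads.get()); // 2: one load before clear(), one after
    }
}
```

Sharing only makes sense for objects that are safe to use concurrently. The documentation notes that TokenNameFinderModel is thread-safe while NameFinderME is not.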
getDefault
public static OpenNLPDocumentParser getDefault(Connection theConnection) throws java.io.IOException
Lazily loads OpenNLPDocumentParser models from the database configuration of the given connection.
- Throws: java.io.IOException
loadFrom
public static OpenNLPDocumentParser loadFrom(java.io.File theDirectory) throws java.io.IOException
Loads OpenNLP models, in their default name formats, from the given directory, e.g., a folder containing the files 'en-sent.bin', 'en-token.bin', 'en-ner-organization.bin' and 'en-ner-person.bin'.
- Throws: java.io.IOException
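The example file names suggest a convention of `<language>-sent.bin`, `<language>-token.bin` and `<language>-ner-<type>.bin`. The JDK-only sketch below classifies a directory listing under that inferred convention. The role labels and matching rules are assumptions for illustration. The real `loadFrom` may recognize other files (for example, POS, chunker or lemmatizer models) or match names differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical classifier for default model file names, inferred from the example listing;
// not the actual rules used by loadFrom.
final class DefaultModelNames {
    /** Maps each recognized file name to a role: "sentence", "token" or "ner:<type>". */
    static Map<String, String> classify(List<String> fileNames) {
        Map<String, String> roles = new TreeMap<>();
        for (String name : fileNames) {
            if (!name.endsWith(".bin")) continue;
            String stem = name.substring(0, name.length() - ".bin".length());
            if (stem.endsWith("-sent")) {
                roles.put(name, "sentence");
            } else if (stem.endsWith("-token")) {
                roles.put(name, "token");
            } else if (stem.contains("-ner-")) {
                // Everything after "-ner-" is taken as the entity type, e.g. "person".
                roles.put(name, "ner:" + stem.substring(stem.indexOf("-ner-") + "-ner-".length()));
            }
        }
        return roles;
    }

    /** Lists the plain files of a directory and classifies their names. */
    static Map<String, String> classify(File directory) {
        List<String> names = new ArrayList<>();
        File[] files = directory.listFiles(); // null if not a directory or on I/O error
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) names.add(f.getName());
            }
        }
        return classify(names);
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(
                "en-sent.bin", "en-token.bin", "en-ner-organization.bin", "en-ner-person.bin", "README.txt")));
    }
}
```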
sentenceDetector
public static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
tokenizer
public static opennlp.tools.tokenize.TokenizerME tokenizer(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
nameFinder
public static opennlp.tools.namefind.NameFinderME nameFinder(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
dictNameFinder
public static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(java.io.File theModel, java.lang.String theType) throws java.io.IOException
- Throws: java.io.IOException
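`dictNameFinder` takes a dictionary file and a type, which suggests that every dictionary match is labeled with `theType`. The sketch below illustrates dictionary-based name finding in general: a greedy left-to-right scan that keeps the longest entry starting at each token, then skips past it. It is a simplified, hypothetical illustration using exact, case-sensitive matching. It is not OpenNLP's DictionaryNameFinder, and it takes entries in memory rather than from a `File`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration of dictionary-based name finding; not OpenNLP's DictionaryNameFinder.
final class DictionaryMatcherSketch {
    /** A match covering tokens [start, end), labeled with the configured type. */
    static final class Match {
        final int start;   // first token index (inclusive)
        final int end;     // last token index (exclusive)
        final String type;

        Match(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() { return type + "[" + start + "," + end + ")"; }
    }

    private final Set<List<String>> entries = new HashSet<>();
    private final String type;
    private int maxLength = 0;

    DictionaryMatcherSketch(List<String[]> dictionary, String type) {
        for (String[] entry : dictionary) {
            if (entry.length == 0) continue;
            entries.add(Arrays.asList(entry.clone())); // defensive copy of the token sequence
            maxLength = Math.max(maxLength, entry.length);
        }
        this.type = type;
    }

    /** Greedy scan keeping the longest dictionary entry that starts at each position. */
    List<Match> find(String[] tokens) {
        List<String> tokenList = Arrays.asList(tokens);
        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int best = -1;
            for (int end = i + 1; end <= Math.min(tokens.length, i + maxLength); end++) {
                if (entries.contains(tokenList.subList(i, end))) best = end;
            }
            if (best > 0) {
                matches.add(new Match(i, best, type));
                i = best; // continue after the matched span
            } else {
                i++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        DictionaryMatcherSketch finder = new DictionaryMatcherSketch(
                List.of(new String[] {"New", "York"}, new String[] {"New", "York", "City"}, new String[] {"Boston"}),
                "location");
        System.out.println(finder.find(new String[] {"From", "New", "York", "City", "to", "Boston"}));
    }
}
```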
posTagger
public static opennlp.tools.postag.POSTaggerME posTagger(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
chunker
public static opennlp.tools.chunker.ChunkerME chunker(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
lemmatizer
public static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(java.io.File theModel) throws java.io.IOException
- Throws: java.io.IOException
dictionaryLemmatizer
public static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(java.io.File theDictionary) throws java.io.IOException
- Throws: java.io.IOException
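`dictionaryLemmatizer` builds a lemmatizer from a dictionary file rather than a trained model. The sketch below shows the general idea of a dictionary lemmatizer: look up each (token, POS tag) pair and fall back to a "no lemma" marker. Both the tab-separated `word<TAB>postag<TAB>lemma` line format and the `"O"` marker reflect our understanding of OpenNLP's DictionaryLemmatizer and should be treated as assumptions. The class itself is hypothetical and reads from a `Reader` instead of a `File` so it can run without files on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary lemmatizer sketch; NOT OpenNLP's DictionaryLemmatizer.
final class DictionaryLemmatizerSketch {
    static final String UNKNOWN = "O"; // assumed marker for "no lemma found"

    private final Map<List<String>, String> lemmas = new HashMap<>();

    /** Reads lines of the form word<TAB>postag<TAB>lemma; other lines are skipped. */
    DictionaryLemmatizerSketch(Reader dictionary) {
        try (BufferedReader in = new BufferedReader(dictionary)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t");
                if (cols.length == 3) {
                    lemmas.put(List.of(cols[0], cols[1]), cols[2]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up each (token, POS tag) pair, falling back to UNKNOWN. */
    String[] lemmatize(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = lemmas.getOrDefault(List.of(tokens[i], tags[i]), UNKNOWN);
        }
        return out;
    }

    public static void main(String[] args) {
        DictionaryLemmatizerSketch lemmatizer =
                new DictionaryLemmatizerSketch(new StringReader("ran\tVBD\trun\nmice\tNNS\tmouse\n"));
        String[] lemmas = lemmatizer.lemmatize(
                new String[] {"mice", "ran", "fast"}, new String[] {"NNS", "VBD", "RB"});
        System.out.println(String.join(" ", lemmas)); // mouse run O
    }
}
```

Keying on the (token, tag) pair rather than the token alone lets the same surface form map to different lemmas depending on its part of speech.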