public class OpenNLPDocumentParser extends Object implements DocumentParser
DocumentParser based on OpenNLP.

| Constructor and Description |
|---|
| OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer) |
| Modifier and Type | Method | Description |
|---|---|---|
| void | add(opennlp.tools.namefind.TokenNameFinder theNameFinder) | |
| Document | apply(String theText) | NameFinderME instances are not thread-safe and share internal state between calls. |
| static opennlp.tools.chunker.ChunkerME | chunker(File theModel) | |
| static void | clearSharedModels() | Allows shared models to be garbage-collected, as they potentially have a large memory footprint. |
| static opennlp.tools.lemmatizer.DictionaryLemmatizer | dictionaryLemmatizer(File theDictionary) | |
| static opennlp.tools.namefind.DictionaryNameFinder | dictNameFinder(File theModel, String theType) | |
| static OpenNLPDocumentParser | getDefault(Connection theConnection) | Lazily loads OpenNLPDocumentParser models from the configuration of the given database. |
| static opennlp.tools.lemmatizer.LemmatizerME | lemmatizer(File theModel) | |
| static OpenNLPDocumentParser | loadFrom(File theDirectory) | Loads OpenNLP models, stored under their default file names, from the given directory. |
| static opennlp.tools.namefind.NameFinderME | nameFinder(File theModel) | |
| static opennlp.tools.postag.POSTaggerME | posTagger(File theModel) | |
| static opennlp.tools.sentdetect.SentenceDetectorME | sentenceDetector(File theModel) | |
| void | set(opennlp.tools.chunker.Chunker theChunker) | |
| void | set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer) | |
| void | set(opennlp.tools.postag.POSTagger thePOSTagger) | |
| static opennlp.tools.tokenize.TokenizerME | tokenizer(File theModel) | |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer)
public void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)
public void set(opennlp.tools.postag.POSTagger thePOSTagger)
public void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)
public void set(opennlp.tools.chunker.Chunker theChunker)
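Judging from the constructor and the add/set signatures, a parser is assembled from a mandatory sentence detector and tokenizer, any number of name finders, and optional POS tagger, lemmatizer and chunker stages. The JDK-only sketch below models that shape; SentenceSplitter, TokenSplitter, NameTagger, CapitalizedTagger and SketchParser are hypothetical names, not Stardog or OpenNLP types, the optional set(...) stages are omitted, and a list of names stands in for the returned Document. Its apply is synchronized and clears each tagger's adaptive state before releasing the lock, which is one way to honour the thread-safety constraint described for apply.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the OpenNLP stages (not the real interfaces).
interface SentenceSplitter { String[] split(String text); }
interface TokenSplitter { String[] tokenize(String sentence); }
interface NameTagger {
    List<String> find(String[] tokens);
    void clearAdaptiveData();
}

// Toy stateful tagger: remembers names seen in the current document, loosely
// mimicking the adaptive data a NameFinderME accumulates between calls.
final class CapitalizedTagger implements NameTagger {
    private final Set<String> seen = new HashSet<>();

    public List<String> find(String[] tokens) {
        List<String> names = new ArrayList<>();
        for (String t : tokens) {
            if (!t.isEmpty() && Character.isUpperCase(t.charAt(0))) {
                names.add(t);
                seen.add(t);
            }
        }
        return names;
    }

    public void clearAdaptiveData() { seen.clear(); }

    int seenCount() { return seen.size(); }
}

final class SketchParser {
    private final SentenceSplitter sentences;
    private final TokenSplitter tokenizer;
    private final List<NameTagger> taggers = new ArrayList<>();

    SketchParser(SentenceSplitter sentences, TokenSplitter tokenizer) {
        this.sentences = sentences;
        this.tokenizer = tokenizer;
    }

    // Synchronized so a tagger cannot be added while apply iterates the list.
    synchronized void add(NameTagger tagger) { taggers.add(tagger); }

    // Synchronized as a whole: the taggers carry state across the sentence
    // loop, so two threads must never interleave inside it.
    synchronized List<String> apply(String text) {
        List<String> found = new ArrayList<>();
        try {
            for (String sentence : sentences.split(text)) {
                String[] tokens = tokenizer.tokenize(sentence);
                for (NameTagger tagger : taggers) {
                    found.addAll(tagger.find(tokens));
                }
            }
        } finally {
            // Reset per-document state before another thread can enter.
            for (NameTagger tagger : taggers) {
                tagger.clearAdaptiveData();
            }
        }
        return found;
    }
}
```

The lock trades throughput for simplicity: every call is serialized, but no tagger ever sees state left over from another thread's document.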
public Document apply(String theText)
NameFinderME instances are not thread-safe and share internal state between calls. They can only be used safely from another thread after clearAdaptiveData has been called. Because the method is essentially a loop, it is safer and simpler to make the whole method synchronized. The alternative would be to cache the TokenNameFinderModel, which is thread-safe, and create a new NameFinderME on each call, but those objects are heavy and create many other complex objects.
public static void clearSharedModels()
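The method summary describes clearSharedModels as allowing shared models to be garbage-collected because of their potentially large memory footprint. This fits the alternative discussed under apply: models such as TokenNameFinderModel are thread-safe and can be shared, while the ME wrappers built on them cannot. The sketch below shows one possible process-wide cache, assuming the static factory methods keep loaded models in a shared map keyed by file; ModelCache and its loader function are hypothetical, and this is not Stardog's implementation.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical process-wide cache of immutable, thread-safe model objects.
final class ModelCache<M> {
    private final Map<File, M> shared = new ConcurrentHashMap<>();
    private final Function<File, M> loader;

    ModelCache(Function<File, M> loader) { this.loader = loader; }

    // Loads a model the first time its file is requested; later calls reuse it.
    // ConcurrentHashMap.computeIfAbsent applies the loader at most once per key.
    M get(File modelFile) {
        return shared.computeIfAbsent(modelFile.getAbsoluteFile(), loader);
    }

    // Drops every cached reference so the models become eligible for
    // garbage collection, in the spirit of clearSharedModels().
    void clear() { shared.clear(); }

    int size() { return shared.size(); }
}
```

Clearing only releases the cache's references; any parser still holding a model keeps it alive until that parser is discarded.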
public static OpenNLPDocumentParser getDefault(Connection theConnection) throws IOException
Lazily loads OpenNLPDocumentParser models from the configuration of the given database.
Throws: IOException
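getDefault loads its models lazily. Assuming the parser it builds is reused after the first call (this page does not say so), a common Java idiom for that is double-checked locking on a volatile field. The hypothetical Lazy class below shows the idiom with a Supplier standing in for the database-driven loading; the real method takes a Connection and declares IOException, both of which this sketch omits.

```java
import java.util.function.Supplier;

// Hypothetical holder: builds the value on first use and reuses it afterwards.
final class Lazy<T> {
    private final Supplier<T> factory;
    // volatile guarantees other threads see the fully built instance.
    private volatile T value;

    Lazy(Supplier<T> factory) { this.factory = factory; }

    T get() {
        T result = value;
        if (result == null) {              // first check, without the lock
            synchronized (this) {
                result = value;
                if (result == null) {      // second check, under the lock
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }

    boolean isLoaded() { return value != null; }
}
```

The first check keeps the common path lock-free once the value exists; the second prevents two threads that raced past the first check from both running the expensive loader.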
public static OpenNLPDocumentParser loadFrom(File theDirectory) throws IOException
Throws: IOException
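loadFrom expects each model under a default file name inside a single directory, but this page does not list those names. The sketch below assumes the conventional names of OpenNLP's pre-trained English models purely for illustration, and further assumes that the sentence detector and tokenizer files are required (the constructor needs both) while other models are optional. DefaultModelFiles is hypothetical; it only shows how such a lookup could report a missing required model as an IOException.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DefaultModelFiles {
    // ASSUMED file names (OpenNLP's pre-trained English models); the names
    // loadFrom actually expects are not documented on this page.
    static final List<String> REQUIRED = List.of("en-sent.bin", "en-token.bin");
    static final List<String> OPTIONAL = List.of("en-pos-maxent.bin", "en-chunker.bin");

    // Maps each expected file name found in the directory to its File.
    // A missing required file raises FileNotFoundException (an IOException);
    // a missing optional file is simply skipped.
    static Map<String, File> resolve(File directory) throws IOException {
        Map<String, File> found = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            File f = new File(directory, name);
            if (!f.isFile()) {
                throw new FileNotFoundException("missing required model: " + f);
            }
            found.put(name, f);
        }
        for (String name : OPTIONAL) {
            File f = new File(directory, name);
            if (f.isFile()) {
                found.put(name, f);
            }
        }
        return found;
    }
}
```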
public static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.tokenize.TokenizerME tokenizer(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.namefind.NameFinderME nameFinder(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(File theModel, String theType) throws IOException
Throws: IOException
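dictNameFinder returns a DictionaryNameFinder, which tags token sequences found in a dictionary rather than relying on a statistical model; theType is presumably the entity type attached to each match. The JDK-only sketch below shows a greedy longest-match version of that idea. DictMatcher and TypedSpan are hypothetical, matching is case-sensitive, and it is not guaranteed to reproduce OpenNLP's DictionaryNameFinder exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical result type: tokens [start, end) matched an entry of this type.
final class TypedSpan {
    final int start;
    final int end;
    final String type;

    TypedSpan(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

final class DictMatcher {
    private final Set<List<String>> entries; // each entry is a token sequence
    private final String type;
    private final int maxLen;                // longest entry, bounds the search

    DictMatcher(Set<List<String>> entries, String type) {
        this.entries = entries;
        this.type = type;
        this.maxLen = entries.stream().mapToInt(List::size).max().orElse(0);
    }

    List<TypedSpan> find(String[] tokens) {
        List<String> tokenList = Arrays.asList(tokens);
        List<TypedSpan> spans = new ArrayList<>();
        int from = 0;
        while (from < tokens.length) {
            int bestEnd = -1;
            // Try every length up to the longest entry and keep the longest hit.
            int limit = Math.min(tokens.length, from + maxLen);
            for (int end = from + 1; end <= limit; end++) {
                if (entries.contains(tokenList.subList(from, end))) {
                    bestEnd = end;
                }
            }
            if (bestEnd > 0) {
                spans.add(new TypedSpan(from, bestEnd, type));
                from = bestEnd;   // continue after the match
            } else {
                from++;
            }
        }
        return spans;
    }
}
```

Preferring the longest match means "New York City" is tagged as one span rather than "New York" followed by an unmatched "City".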
public static opennlp.tools.postag.POSTaggerME posTagger(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.chunker.ChunkerME chunker(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(File theModel) throws IOException
Throws: IOException
public static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(File theDictionary) throws IOException
Throws: IOException
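dictionaryLemmatizer builds a lemmatizer from a dictionary file instead of a trained model. Assuming the tab-separated "word, POS tag, lemma" line format that OpenNLP's DictionaryLemmatizer documentation describes, lookup reduces to a map keyed on the (word, tag) pair. DictLemmas is a hypothetical JDK-only sketch of that idea, not OpenNLP's class; it returns "O" for unknown pairs, which we believe, but have not verified here, matches OpenNLP's convention.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class DictLemmas {
    private final Map<String, String> lemmas = new HashMap<>();

    // Reads ASSUMED "word<TAB>postag<TAB>lemma" lines; other lines are ignored.
    DictLemmas(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\t");
            if (cols.length == 3) {
                lemmas.put(key(cols[0], cols[1]), cols[2]);
            }
        }
    }

    // Tabs cannot occur inside a column, so they make an unambiguous separator.
    private static String key(String word, String tag) {
        return word + "\t" + tag;
    }

    // One lemma per token; "O" marks tokens with no dictionary entry.
    String[] lemmatize(String[] tokens, String[] tags) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = lemmas.getOrDefault(key(tokens[i], tags[i]), "O");
        }
        return out;
    }
}
```

Keying on the tag as well as the word lets the same surface form map to different lemmas depending on its part of speech.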
Copyright © 2010-2016 Stardog Union. All Rights Reserved.