Class OpenNLPDocumentParser
- java.lang.Object
 - com.complexible.stardog.docs.nlp.impl.OpenNLPDocumentParser
- All Implemented Interfaces:
 DocumentParser, java.util.function.Function<java.lang.String,Document>
public class OpenNLPDocumentParser extends java.lang.Object implements DocumentParser
DocumentParser based on OpenNLP
- Since:
 - 5.2
 - Version:
 - 5.2
 - Author:
 - Pedro Oliveira
 
 
Constructor Summary
OpenNLPDocumentParser(opennlp.tools.sentdetect.SentenceDetector theSentenceDetector, opennlp.tools.tokenize.Tokenizer theTokenizer)
- 
Method Summary
Modifier and Type, Method, Description
void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)
Document apply(java.lang.String theText)
 NameFinderMEs are not thread-safe, and share internal state between calls.
static opennlp.tools.chunker.ChunkerME chunker(java.io.File theModel)
static void clearSharedModels()
 Allow shared models to be GC'd as they potentially have a large memory footprint
static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(java.io.File theDictionary)
static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(java.io.File theModel, java.lang.String theType)
static OpenNLPDocumentParser getDefault(Connection theConnection)
 Lazily load OpenNLPDocumentParser models from the given database configurations
static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(java.io.File theModel)
static OpenNLPDocumentParser loadFrom(java.io.File theDirectory)
 Loads OpenNLP models, in their default name formats, from the given directory.
static opennlp.tools.namefind.NameFinderME nameFinder(java.io.File theModel)
static opennlp.tools.postag.POSTaggerME posTagger(java.io.File theModel)
static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(java.io.File theModel)
void set(opennlp.tools.chunker.Chunker theChunker)
void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)
void set(opennlp.tools.postag.POSTagger thePOSTagger)
static opennlp.tools.tokenize.TokenizerME tokenizer(java.io.File theModel)
- 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
- 
Methods inherited from interface com.complexible.stardog.docs.nlp.DocumentParser
apply 
Method Detail
- 
add
public void add(opennlp.tools.namefind.TokenNameFinder theNameFinder)
 
- 
set
public void set(opennlp.tools.postag.POSTagger thePOSTagger)
 
- 
set
public void set(opennlp.tools.lemmatizer.Lemmatizer theLemmatizer)
 
- 
set
public void set(opennlp.tools.chunker.Chunker theChunker)
 
- 
apply
public Document apply(java.lang.String theText)
NameFinderMEs are not thread-safe and share internal state between calls. They can only be safely used from another thread after clearAdaptiveData is called. Because the method is essentially a loop, it is safer and simpler to make it synchronized as a whole. The alternative would be to cache the TokenNameFinderModel, which is thread-safe, and create a new NameFinderME in each call, but those objects are heavy and create many other complex objects.
- Specified by:
 apply in interface java.util.function.Function<java.lang.String,Document>
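The trade-off described for apply can be illustrated with a minimal, library-free sketch. It is not the Stardog implementation: StatefulFinder and SynchronizedParserSketch are hypothetical stand-ins for NameFinderME and the parser, and the "name finding" is a toy capitalization check. The sketch only shows the pattern of guarding the whole loop with one lock and clearing per-finder adaptive state before the lock is released.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; nothing here comes from Stardog or OpenNLP.
class SynchronizedParserSketch {

    /** Stand-in for a stateful, non-thread-safe finder such as NameFinderME. */
    static class StatefulFinder {
        // State carried between calls; unsafe to share across threads unguarded.
        private final List<String> adaptiveData = new ArrayList<>();

        List<String> find(String[] tokens) {
            List<String> names = new ArrayList<>();
            for (String token : tokens) {
                // Toy rule: treat capitalized tokens as names.
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    names.add(token);
                    adaptiveData.add(token);
                }
            }
            return names;
        }

        void clearAdaptiveData() { adaptiveData.clear(); }

        int adaptiveSize() { return adaptiveData.size(); }
    }

    private final List<StatefulFinder> finders = new ArrayList<>();

    synchronized void add(StatefulFinder finder) { finders.add(finder); }

    // The whole loop is one critical section, so no finder is ever
    // observed mid-call by a second thread.
    synchronized List<String> apply(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<String> mentions = new ArrayList<>();
        for (StatefulFinder finder : finders) {
            mentions.addAll(finder.find(tokens));
            finder.clearAdaptiveData(); // reset state before releasing the lock
        }
        return mentions;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedParserSketch parser = new SynchronizedParserSketch();
        parser.add(new StatefulFinder());
        Runnable task = () -> System.out.println(parser.apply("Alice met Bob in Paris"));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
    }
}
```

The cost of this design is that concurrent callers are serialized. The alternative mentioned above, building a fresh finder per call from a shared thread-safe model, would avoid the lock at the price of repeated heavy object creation.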
 
- 
clearSharedModels
public static void clearSharedModels()
Allows shared models to be garbage-collected, as they can have a large memory footprint.
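How the shared models are held internally is not documented here. As an illustration only, the following sketch shows one common way to combine lazy loading with an explicit release: a static cache keyed by model path whose entries become eligible for garbage collection once the cache is cleared. SharedModelCacheSketch, getOrLoad, cachedCount and LOADS are hypothetical names, and a byte array stands in for a loaded model.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; not the actual Stardog caching code.
class SharedModelCacheSketch {
    // Strong references keep the loaded models alive until cleared.
    private static final Map<String, byte[]> SHARED = new ConcurrentHashMap<>();

    // Counts how often a loader actually ran (for demonstration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Loads the model on first request and reuses it afterwards. */
    static byte[] getOrLoad(String path, Function<String, byte[]> loader) {
        return SHARED.computeIfAbsent(path, p -> {
            LOADS.incrementAndGet();
            return loader.apply(p);
        });
    }

    /** Drops all cached references so the models can be garbage-collected. */
    static void clearSharedModels() { SHARED.clear(); }

    static int cachedCount() { return SHARED.size(); }

    public static void main(String[] args) {
        Function<String, byte[]> fakeLoader = p -> new byte[1024]; // stand-in for reading a model file
        getOrLoad("en-sent.bin", fakeLoader);
        getOrLoad("en-sent.bin", fakeLoader); // served from the cache
        System.out.println("cached: " + cachedCount() + ", loads: " + LOADS.get());
        clearSharedModels();
        System.out.println("cached after clear: " + cachedCount());
    }
}
```

After a clear, the next request for the same path simply loads the model again, so clearing trades memory for a later reload.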
- 
getDefault
public static OpenNLPDocumentParser getDefault(Connection theConnection) throws java.io.IOException
Lazily load OpenNLPDocumentParser models from the given database configurations
- Throws:
 java.io.IOException
 
- 
loadFrom
public static OpenNLPDocumentParser loadFrom(java.io.File theDirectory) throws java.io.IOException
Loads OpenNLP models, in their default name formats, from the given directory. E.g., a folder with files ['en-sent.bin', 'en-token.bin', 'en-ner-organization.bin', 'en-ner-person.bin']
- Throws:
 java.io.IOException
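The exact matching rules used by loadFrom are not spelled out above. As a rough sketch, assuming the names follow the patterns suggested by the example files (`<lang>-sent.bin`, `<lang>-token.bin`, `<lang>-ner-<type>.bin`), file names could be mapped to model roles as shown below. ModelDirectoryScanSketch, roleOf and groupByRole are hypothetical helpers, and the sketch works on plain file names rather than reading a real directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the patterns are assumptions inferred from the example file names.
class ModelDirectoryScanSketch {
    private static final Pattern SENTENCE = Pattern.compile("[a-z]+-sent\\.bin");
    private static final Pattern TOKENIZER = Pattern.compile("[a-z]+-token\\.bin");
    private static final Pattern NER = Pattern.compile("[a-z]+-ner-([a-z]+)\\.bin");

    /** Returns a role label for a model file name, or null when it is not recognized. */
    static String roleOf(String fileName) {
        if (SENTENCE.matcher(fileName).matches()) return "sentence-detector";
        if (TOKENIZER.matcher(fileName).matches()) return "tokenizer";
        Matcher ner = NER.matcher(fileName);
        if (ner.matches()) return "name-finder:" + ner.group(1);
        return null;
    }

    /** Groups recognized file names by role and skips everything else. */
    static Map<String, List<String>> groupByRole(List<String> fileNames) {
        Map<String, List<String>> byRole = new TreeMap<>();
        for (String name : fileNames) {
            String role = roleOf(name);
            if (role != null) {
                byRole.computeIfAbsent(role, k -> new ArrayList<>()).add(name);
            }
        }
        return byRole;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "en-sent.bin", "en-token.bin", "en-ner-organization.bin", "en-ner-person.bin");
        System.out.println(groupByRole(names));
    }
}
```

In real use the names would come from listing the directory passed to loadFrom, and each recognized file would be handed to the matching factory method (sentenceDetector, tokenizer, nameFinder).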
 
- 
sentenceDetector
public static opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
tokenizer
public static opennlp.tools.tokenize.TokenizerME tokenizer(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
nameFinder
public static opennlp.tools.namefind.NameFinderME nameFinder(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
dictNameFinder
public static opennlp.tools.namefind.DictionaryNameFinder dictNameFinder(java.io.File theModel, java.lang.String theType) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
posTagger
public static opennlp.tools.postag.POSTaggerME posTagger(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
chunker
public static opennlp.tools.chunker.ChunkerME chunker(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
lemmatizer
public static opennlp.tools.lemmatizer.LemmatizerME lemmatizer(java.io.File theModel) throws java.io.IOException
- Throws:
 java.io.IOException
 
- 
dictionaryLemmatizer
public static opennlp.tools.lemmatizer.DictionaryLemmatizer dictionaryLemmatizer(java.io.File theDictionary) throws java.io.IOException
- Throws:
 java.io.IOException
 