Stanford.NLP.NET

Simple CoreNLP

This package is deprecated and should not be used. Read more.

This page is a direct translation of the original Simple CoreNLP page.

In addition to the fully featured annotator pipeline interface to CoreNLP, Stanford provides a simple API for users who do not need much customization. The package is intended for CoreNLP users who want “just use NLP” to work as quickly and easily as possible, and who do not need to care about the details of the underlying algorithms.

An example usage is given below:

#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-corenlp-4.5.0.dll"

open System
open java.util
open edu.stanford.nlp.simple

// Path to the folder with models extracted from `stanford-corenlp-4.5.0-models.jar`
let jarRoot = (__SOURCE_DIRECTORY__) + @"/../../../data/paket-files/nlp.stanford.edu/"
                                     + @"stanford-corenlp-4.5.0/models/"
// Simple CoreNLP resolves model paths relative to the current directory
System.IO.Directory.SetCurrentDirectory(jarRoot)

// Custom properties for annotators
let props = Properties()
props.setProperty("ner.useSUTime", "0") |> ignore

let sent : Sentence = new Sentence("Lucy is in the sky with diamonds.")
let nerTags : List = sent.nerTags(props)
let firstPOSTag : string = sent.posTag(0)
// [fsi:> ]
// [fsi:val sent : Sentence = Lucy is in the sky with diamonds.]
// [fsi:val nerTags : List = seq ["PERSON"; "O"; "O"; "O"; ...]]
// [fsi:val firstPOSTag : string = "NNP"]

The API is included in CoreNLP releases from 3.6.0 onwards. Visit the download page to download CoreNLP, and make sure to set the current directory to the folder that contains the models.

If you use the Simple CoreNLP API, your current directory should always be set to the root folder of the unzipped models, because Simple CoreNLP loads models lazily, at the moment an annotation is first requested. [Read more about model loading](../faq.html#Stanford-NLP-CoreNLP-not-loading-models)

Advantages and Disadvantages

This interface offers a number of advantages (and a few disadvantages – see below) over the default annotator pipeline:

In exchange for these advantages, users should be aware of a few disadvantages:

Usage

There are two main classes in the interface: Document and Sentence. Tokens are represented as elements of per-sentence lists; for example, to get the lemma of a token, get the list of lemmas from the sentence and index it at the token's position. Both the Document and Sentence classes provide a constructor that takes text. For a Document, the text is treated as an entire document that may contain multiple sentences. For a Sentence, the text is always interpreted as a single sentence.
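To make the indexing convention concrete, here is a minimal sketch. It assumes the assembly references and the current directory from the first example are already in place; the input text and the value names are illustrative only.

```fsharp
open edu.stanford.nlp.simple

// Illustrative input text
let s = new Sentence("The cats were sleeping.")

// lemmas() returns one entry per token; through IKVM it is an
// untyped java.util.List, so each element is cast to string
let lemmas = s.lemmas()
let secondLemma = lemmas.get(1) :?> string

// The indexed accessor performs the same lookup in one call
let sameLemma = s.lemma(1)
```

Indices are zero-based, so index 1 refers to the second token.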

An example program using the interface is given below:

open System
open edu.stanford.nlp.simple

// Create a document. No computation is done yet.
let doc : Document = new Document("add your text here! It can contain multiple sentences.")
// sentences() returns an untyped java.util.List, so each element is cast below
let sentences = doc.sentences().toArray()
for sentObj in sentences do  // Will iterate over two sentences
    let sent : Sentence = sentObj :?> Sentence
    // We're only asking for words -- no need to load any models yet
    Console.WriteLine("The second word of the sentence '{0}' is {1}", sent, sent.word(1))
    // When we ask for the lemma, it will load and run the part of speech tagger
    Console.WriteLine("The third lemma of the sentence '{0}' is {1}", sent, sent.lemma(2))
    // When we ask for the parse, it will load and run the parser
    Console.WriteLine("The parse of the sentence '{0}' is {1}", sent, sent.parse())
// [fsi:> ]
// [fsi:The second word of the sentence 'add your text here!' is your]
// [fsi:The third lemma of the sentence 'add your text here!' is text]
// [fsi:The parse of the sentence 'add your text here!' is (ROOT (S (VP (VB add) (NP (PRP$ your) (NN text)) (ADVP (RB here))) (. !)))]
// [fsi:The second word of the sentence 'It can contain multiple sentences.' is can]
// [fsi:The third lemma of the sentence 'It can contain multiple sentences.' is contain]
// [fsi:The parse of the sentence 'It can contain multiple sentences.' is (ROOT (S (NP (PRP It)) (VP (MD can) (VP (VB contain) (NP (JJ multiple) (NNS sentences)))) (. .)))]

Supported Annotators

The interface is not guaranteed to support all of the annotators in the CoreNLP pipeline, but the most common annotators are supported. They are listed below together with how to invoke them. The first column gives a plain-English description of the task to be performed. The second column lists the analogous CoreNLP annotator for that task. The last two columns give the class and function used in this wrapper to perform the same task.

| Functionality | Annotator in CoreNLP | Implementation class | Function |
|---|---|---|---|
| Tokenization | tokenize | Sentence | .words() / .word(int) |
| Sentence Splitting | ssplit | Document | .sentences() / .sentence(int) |
| Part of Speech Tagging | pos | Sentence | .posTags() / .posTag(int) |
| Lemmatization | lemma | Sentence | .lemmas() / .lemma(int) |
| Named Entity Recognition | ner | Sentence | .nerTags() / .nerTag(int) |
| Constituency Parsing | parse | Sentence | .parse() |
| Dependency Parsing | depparse | Sentence | .governor(int) / .incomingDependencyLabel(int) |
| Coreference Resolution | dcoref | Document | .coref() |
| Natural Logic Polarity | natlog | Sentence | .natlogPolarities() / .natlogPolarity(int) |
| Open Information Extraction | openie | Sentence | .openie() / .openieTriples() |
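
As a rough illustration of how the functions in the table pair up, the sketch below combines the list-returning and the indexed forms for tokenization, part of speech tagging, and lemmatization. It assumes the assembly references and the current directory from the first example are already set up; the value names are illustrative and not part of the API.

```fsharp
open edu.stanford.nlp.simple

// Same sentence as in the first example
let s = new Sentence("Lucy is in the sky with diamonds.")

let words = s.words()     // tokenize
let tags = s.posTags()    // pos: loads the tagger on first use

for i in 0 .. words.size() - 1 do
    // List elements come back untyped through IKVM, hence the casts
    let word = words.get(i) :?> string
    let tag = tags.get(i) :?> string
    // lemma(int) is the indexed counterpart of lemmas()
    printfn "%d: %s/%s lemma=%s" i word tag (s.lemma(i))
```

The list forms are convenient when every token is needed, while the indexed forms avoid handling the untyped list when only one token matters.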

Miscellaneous Extras

Some potentially useful utility functions are implemented in the SentenceAlgorithms class. These can be called from a Sentence object with, e.g.:

open edu.stanford.nlp.ie.machinereading.structure

let sent2 : Sentence = new Sentence("your text should go here");
sent2.algorithms().headOfSpan(new Span(0, 2));  // Should return 1
// [fsi:> ]
// [fsi:val sent2 : Sentence = your text should go here]
// [fsi:val it : int = 1]

A selection of useful algorithms are: