Stanford.NLP.NET


Stanford CoreNLP for .NET

Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; and indicate which noun phrases refer to the same entities. Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

Stanford CoreNLP integrates all Stanford NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, and the sentiment analysis tools, and provides model files for analysis of English. The goal of this project is to enable people to quickly and painlessly get complete linguistic annotations of natural language texts. It is designed to be highly flexible and extensible. With a single option, you can choose which tools should be enabled and which should be disabled.
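
For example, a pipeline that performs only tokenization, sentence splitting and POS tagging is configured by listing just those annotators. The minimal F# sketch below relies on the references and opens shown in the full sample further down:

// Only the listed annotators are enabled; all other tools stay disabled.
// (Assumes the #r references and opens from the F# sample below.)
let props = Properties()
props.setProperty("annotators", "tokenize, ssplit, pos") |> ignore
let pipeline = StanfordCoreNLP(props)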

The Stanford CoreNLP code is licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses, but not its use in distributed proprietary software.

The Stanford CoreNLP library can be installed from NuGet:
PM> Install-Package Stanford.NLP.CoreNLP

F# Sample of text annotation

#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-corenlp-3.7.0.dll"

open System
open System.IO
open java.util
open java.io
open edu.stanford.nlp.pipeline

// Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
let jarRoot = __SOURCE_DIRECTORY__ + @"\..\..\data\paket-files\nlp.stanford.edu\stanford-corenlp-full-2016-10-31\models\"

// Text for processing
let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

// Annotation pipeline configuration
let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("ner.useSUTime","0") |> ignore

// We need to change the current directory so that StanfordCoreNLP can find all the model files automatically
let curDir = Environment.CurrentDirectory
Directory.SetCurrentDirectory(jarRoot)
let pipeline = StanfordCoreNLP(props)
Directory.SetCurrentDirectory(curDir)

// Annotation
let annotation = Annotation(text)
pipeline.annotate(annotation)

// Result - Pretty Print
let stream = new ByteArrayOutputStream()
pipeline.prettyPrint(annotation, new PrintWriter(stream))
printfn "%O" <| stream.toString()
stream.close()

The pretty-printed output:

Sentence #1 (9 tokens):
Kosgi Santosh sent an email to Stanford University.
[Text=Kosgi CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Kosgi NamedEntityTag=PERSON] ... 
(ROOT
  (S
    (NP (NNP Kosgi) (NNP Santosh))
    (VP (VBD sent)
      (NP (DT an) (NN email))
      (PP (TO to)
        (NP (NNP Stanford) (NNP University))))
    (. .)))

root(ROOT-0, sent-3)
nn(Santosh-2, Kosgi-1)
nsubj(sent-3, Santosh-2)
det(email-5, an-4)
dobj(sent-3, email-5)
nn(University-8, Stanford-7)
prep_to(sent-3, University-8)

Sentence #2 (7 tokens):
He didn't get a reply.
[Text=He CharacterOffsetBegin=52 CharacterOffsetEnd=54 PartOfSpeech=PRP Lemma=he NamedEntityTag=O] ... 
(ROOT
  (S
    (NP (PRP He))
    (VP (VBD did) (RB n't)
      (VP (VB get)
        (NP (DT a) (NN reply))))
    (. .)))

root(ROOT-0, get-4)
nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n't-3)
det(reply-6, a-5)
dobj(get-4, reply-6)

Coreference set:
	(2,1,[1,2]) -> (1,2,[1,3]), that is: "He" -> "Kosgi Santosh"
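
Besides pretty-printing, the annotation object can be queried directly. The fragment below is a sketch that continues the F# script above (it assumes the annotation value from that script and opens two additional namespaces); it prints each token together with its POS tag and NER label:

open edu.stanford.nlp.ling
open edu.stanford.nlp.util

// Walk the sentences and tokens of the `annotation` produced above and print
// each token's text, POS tag and NER label. The java.lang.Class key for an
// annotation type is obtained by calling getClass() on a throwaway instance.
let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> ArrayList
for sentence in sentences |> Seq.cast<CoreMap> do
    let tokens = sentence.get(CoreAnnotations.TokensAnnotation().getClass()) :?> ArrayList
    for token in tokens |> Seq.cast<CoreLabel> do
        let word = token.get(CoreAnnotations.TextAnnotation().getClass())
        let pos  = token.get(CoreAnnotations.PartOfSpeechAnnotation().getClass())
        let ner  = token.get(CoreAnnotations.NamedEntityTagAnnotation().getClass())
        printfn "%O\t[pos=%O; ner=%O]" word pos ner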

C# Sample of text annotation

using System;
using System.IO;
using java.util;
using java.io;
using edu.stanford.nlp.pipeline;
using Console = System.Console;

namespace Stanford.NLP.CoreNLP.CSharp
{
    class Program
    {
        static void Main()
        {
            // Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
            var jarRoot = @"..\..\..\..\data\paket-files\nlp.stanford.edu\stanford-corenlp-full-2016-10-31\models";

            // Text for processing
            var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

            // Annotation pipeline configuration
            var props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            props.setProperty("ner.useSUTime", "0");

            // We need to change the current directory so that StanfordCoreNLP can find all the model files automatically
            var curDir = Environment.CurrentDirectory;
            Directory.SetCurrentDirectory(jarRoot);
            var pipeline = new StanfordCoreNLP(props);
            Directory.SetCurrentDirectory(curDir);

            // Annotation
            var annotation = new Annotation(text);
            pipeline.annotate(annotation);

            // Result - Pretty Print
            using (var stream = new ByteArrayOutputStream())
            {
                pipeline.prettyPrint(annotation, new PrintWriter(stream));
                Console.WriteLine(stream.toString());
                stream.close();
            }
        }
    }
}

Read more about Stanford CoreNLP on the official page: https://stanfordnlp.github.io/CoreNLP/

