Stanford.NLP.NET


Stanford CoreNLP for .NET

Stanford CoreNLP provides a set of natural language analysis tools that take raw English text as input and give the base forms of words, their parts of speech, and whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; and indicate which noun phrases refer to the same entities. Stanford CoreNLP is an integrated framework, which makes it easy to apply a set of language analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

Stanford CoreNLP integrates all Stanford NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, and the sentiment analysis tools, and provides model files for analysis of English. The goal of this project is to enable people to quickly and painlessly get complete linguistic annotations of natural language texts. It is designed to be highly flexible and extensible. With a single option, you can choose which tools should be enabled and which should be disabled.

The Stanford CoreNLP code is licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses, but not its use in distributed proprietary software.

The Stanford CoreNLP library can be installed from NuGet:
PM> Install-Package Stanford.NLP.CoreNLP

F# Sample of text annotation

#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-corenlp-3.4.dll"

open System
open System.IO
open java.util
open java.io
open edu.stanford.nlp.pipeline

// Path to the folder with models extracted from `stanford-corenlp-3.4-models.jar`
let jarRoot = __SOURCE_DIRECTORY__ + @"\..\..\src\temp\stanford-corenlp-full-2014-06-16\stanford-corenlp-3.4-models\"

// Text for processing
let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

// Annotation pipeline configuration
let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("sutime.binders","0") |> ignore

// Temporarily change the current directory so that StanfordCoreNLP can find all the model files automatically
let curDir = Environment.CurrentDirectory
Directory.SetCurrentDirectory(jarRoot)
let pipeline = StanfordCoreNLP(props)
Directory.SetCurrentDirectory(curDir)

// Annotation
let annotation = Annotation(text)
pipeline.annotate(annotation)
    
// Result - Pretty Print
let stream = new ByteArrayOutputStream()
pipeline.prettyPrint(annotation, new PrintWriter(stream))
printfn "%O" <| stream.toString()
stream.close()

Sentence #1 (9 tokens):
Kosgi Santosh sent an email to Stanford University.
[Text=Kosgi CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Kosgi NamedEntityTag=PERSON] ... 
(ROOT
  (S
    (NP (NNP Kosgi) (NNP Santosh))
    (VP (VBD sent)
      (NP (DT an) (NN email))
      (PP (TO to)
        (NP (NNP Stanford) (NNP University))))
    (. .)))

root(ROOT-0, sent-3)
nn(Santosh-2, Kosgi-1)
nsubj(sent-3, Santosh-2)
det(email-5, an-4)
dobj(sent-3, email-5)
nn(University-8, Stanford-7)
prep_to(sent-3, University-8)

Sentence #2 (7 tokens):
He didn't get a reply.
[Text=He CharacterOffsetBegin=52 CharacterOffsetEnd=54 PartOfSpeech=PRP Lemma=he NamedEntityTag=O] ... 
(ROOT
  (S
    (NP (PRP He))
    (VP (VBD did) (RB n't)
      (VP (VB get)
        (NP (DT a) (NN reply))))
    (. .)))

root(ROOT-0, get-4)
nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n't-3)
det(reply-6, a-5)
dobj(get-4, reply-6)

Coreference set:
	(2,1,[1,2]) -> (1,2,[1,3]), that is: "He" -> "Kosgi Santosh"
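
The sample above enables seven annotators through the single `annotators` property. As a minimal sketch of switching tools off, the variant below keeps only tokenization, sentence splitting, POS tagging and lemmatization. It assumes the same referenced assemblies and the same extracted models folder as the sample above; the names `modelsDir`, `lightProps`, `lightPipeline`, `result` and `output` are illustrative and not part of the library.

```fsharp
#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-corenlp-3.4.dll"

open System.IO
open java.util
open java.io
open edu.stanford.nlp.pipeline

// Assumption: the models were extracted to the same folder as in the sample above
let modelsDir = __SOURCE_DIRECTORY__ + @"\..\..\src\temp\stanford-corenlp-full-2014-06-16\stanford-corenlp-3.4-models\"

// Enable only a subset of annotators; ner, parse and dcoref are left out
let lightProps = Properties()
lightProps.setProperty("annotators", "tokenize, ssplit, pos, lemma") |> ignore

// Switch to the models folder while the pipeline loads,
// and restore the previous directory even if loading fails
let previousDir = Directory.GetCurrentDirectory()
Directory.SetCurrentDirectory(modelsDir)
let lightPipeline =
    try
        StanfordCoreNLP(lightProps)
    finally
        Directory.SetCurrentDirectory(previousDir)

// `process` creates an Annotation from the text and runs the enabled annotators on it
let result = lightPipeline.``process``("He didn't get a reply.")

// Pretty-print whatever annotations were produced
let output = new ByteArrayOutputStream()
lightPipeline.prettyPrint(result, new PrintWriter(output))
printfn "%O" <| output.toString()
output.close()
```

With fewer annotators there are fewer models to load, so this configuration should start faster than the full pipeline. The trade-off is that the named entity tags, parse trees, dependencies and coreference sets shown above are no longer produced.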

Read more about Stanford CoreNLP on the official page.

