Stanford.NLP.NET


Stanford Named Entity Recognizer (NER) for .NET

Stanford NER is an implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION). The Stanford NLP Group also makes various other models for different languages and circumstances available on the original page, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance, but those models require considerably more memory.

Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task.
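
As a rough illustration of what training your own model involves, the fragment below sketches a properties file of the kind described in the Stanford NER FAQ. The file names train.tsv and ner-model.ser.gz are hypothetical placeholders, and the feature flags are only a plausible starting set, not a recommended configuration. The training file is assumed to be tab-separated with one token per line.

```properties
# Hypothetical input and output paths (placeholders, not shipped files)
trainFile = train.tsv
serializeTo = ner-model.ser.gz

# Column layout of the training file: token in column 0, label in column 1
map = word=0,answer=1

# Order of the CRF: features look at most one label to the left
maxLeft = 1

# A small set of feature extractors; see NERFeatureFactory for the full list
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
useDisjunctive = true
wordShape = chris2useLC
```

With the Java distribution, a file like this is typically passed to CRFClassifier through the -prop command-line option. The CRFClassifier constructor that accepts java.util.Properties is the likely entry point for the same settings from .NET, although that route is not shown in the samples on this page.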

For an overview, see the PowerPoint introduction to NER and the Stanford NER package (available as ppt and pdf) or the FAQ, which has some information on training models. Further documentation is provided in the README included with the download and in the javadocs.

Stanford NER is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, running as a server, and a Java API. Stanford NER code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. If you don't need a commercial license, but would like to support maintenance of these tools, Stanford NLP Group welcomes gifts.

The Stanford NER library can be installed from NuGet:
PM> Install-Package Stanford.NLP.NER

F# Sample of Named Entity Recognition

#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-ner.dll"

open edu.stanford.nlp.ie.crf

// Path to the folder with classifier models.
// __SOURCE_DIRECTORY__ has no trailing separator, so the relative part starts with one.
let classifiersDirectory =
    __SOURCE_DIRECTORY__ + @"\..\..\paket-files\nlp.stanford.edu\stanford-ner-2016-10-31\classifiers\"

// Load the 3-class classifier model (PERSON, ORGANIZATION, LOCATION)
let classifier =
    CRFClassifier.getClassifierNoExceptions(
        classifiersDirectory + "english.all.3class.distsim.crf.ser.gz")

let s1 = "Good afternoon Rajat Raina, how are you today?"
printfn "%s\n" (classifier.classifyToString(s1))
> 
Good/O afternoon/O Rajat/PERSON Raina/PERSON,/O how/O are/O you/O today/O?/O
val it : unit = ()

let s2 = "I go to school at Stanford University, which is located in California."
printfn "%s\n" (classifier.classifyWithInlineXML(s2))
> 
I go to school at <ORGANIZATION>Stanford University</ORGANIZATION>, which is 
located in <LOCATION>California</LOCATION>.
val it : unit = ()

printfn "%s\n" (classifier.classifyToString(s2, "xml", true))
> 
<wi num="0" entity="O">I</wi> <wi num="1" entity="O">go</wi> <wi num="2" entity="O">to</wi> 
<wi num="3" entity="O">school</wi> <wi num="4" entity="O">at</wi> 
<wi num="5" entity="ORGANIZATION">Stanford</wi> <wi num="6" entity="ORGANIZATION">University</wi>
<wi num="7" entity="O">,</wi> <wi num="8" entity="O">which</wi> <wi num="9" entity="O">is</wi>
<wi num="10" entity="O">located</wi> <wi num="11" entity="O">in</wi> 
<wi num="12" entity="LOCATION">California</wi><wi num="13" entity="O">.</wi>
val it : unit = ()
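
The inline XML form is convenient when only the entities themselves are needed. The following is a minimal sketch of one way to pull (label, text) pairs out of such a string with a regular expression. The helper name extractEntities is hypothetical and not part of Stanford NER, and the sketch assumes entity tags are never nested and that the tagged text needs no XML unescaping.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper (not part of Stanford NER): extracts (label, text)
// pairs from a string in the format produced by classifyWithInlineXML.
let extractEntities (xml: string) =
    // The backreference \1 requires the closing tag to match the opening tag.
    Regex.Matches(xml, @"<([A-Z_]+)>(.*?)</\1>", RegexOptions.Singleline)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value, m.Groups.[2].Value)
    |> List.ofSeq

// Sample input copied from the inline XML output shown above
let tagged =
    "I go to school at <ORGANIZATION>Stanford University</ORGANIZATION>, " +
    "which is located in <LOCATION>California</LOCATION>."

for (label, text) in extractEntities tagged do
    printfn "%s: %s" label text
```

For the sample string this prints one line per entity: ORGANIZATION: Stanford University, then LOCATION: California. In a real script the input would be the result of classifier.classifyWithInlineXML rather than a literal.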

C# Sample of Named Entity Recognition

using edu.stanford.nlp.ie.crf;
using Console = System.Console;

namespace Stanford.NLP.NER.CSharp
{
    class Program
    {
        static void Main()
        {
            // Path to the folder with classifier models
            var jarRoot = @"..\..\..\..\paket-files\nlp.stanford.edu\stanford-ner-2016-10-31";
            var classifiersDirectory = jarRoot + @"\classifiers";

            // Load the 3-class classifier model (PERSON, ORGANIZATION, LOCATION)
            var classifier = CRFClassifier.getClassifierNoExceptions(
                classifiersDirectory + @"\english.all.3class.distsim.crf.ser.gz");

            var s1 = "Good afternoon Rajat Raina, how are you today?";
            Console.WriteLine("{0}\n", classifier.classifyToString(s1));

            var s2 = "I go to school at Stanford University, which is located in California.";
            Console.WriteLine("{0}\n", classifier.classifyWithInlineXML(s2));

            Console.WriteLine("{0}\n", classifier.classifyToString(s2, "xml", true));
        }
    }
}

Read more about the Stanford Named Entity Recognizer on the official page.
