Stanford.NLP.NET


Stanford Named Entity Recognizer (NER) for .NET

Stanford NER is an implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text that are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. The download includes good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION). The Stanford NLP Group also makes various other models for different languages and circumstances available on the original page, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance, but those models require considerably more memory.

Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task.
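
For orientation, the first-order case of a linear chain CRF can be written as below. This is the generic textbook form, not necessarily the exact parameterization used inside CRFClassifier; the symbols (feature functions f_k, weights \lambda_k, label sequence y, observation sequence x of length T) are notation assumed here for illustration.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Each feature function looks at the current label, the previous label, and the observed words. Higher-order models let the feature functions depend on more than one previous label. For NER, the labels y_t are classes such as PERSON, ORGANIZATION, LOCATION, and O.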

You can look at a PowerPoint introduction to NER and the Stanford NER package (available in ppt and pdf formats) or the FAQ, which has some information on training models. Further documentation is provided in the included README and in the javadocs.

Stanford NER is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, running as a server, and a Java API. Stanford NER code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. If you don't need a commercial license, but would like to support maintenance of these tools, Stanford NLP Group welcomes gifts.

The Stanford NER library can be installed from NuGet:
PM> Install-Package Stanford.NLP.NER

F# Sample of Named Entity Recognition

#r "IKVM.OpenJDK.Core.dll"
#r "IKVM.OpenJDK.Util.dll"
#r "stanford-ner.dll"

open edu.stanford.nlp.ie.crf

// Path to the folder with classifier models
let classifiersDirectory =
    __SOURCE_DIRECTORY__ + @"..\..\data\paket-files\nlp.stanford.edu\stanford-ner-2016-10-31\classifiers\"

// Load the 3-class classifier model (PERSON, ORGANIZATION, LOCATION)
let classifier =
    CRFClassifier.getClassifierNoExceptions(
        classifiersDirectory + "english.all.3class.distsim.crf.ser.gz")

let s1 = "Good afternoon Rajat Raina, how are you today?"
printfn "%s\n" (classifier.classifyToString(s1))
> 
Good/O afternoon/O Rajat/PERSON Raina/PERSON,/O how/O are/O you/O today/O?/O
val it : unit = ()

let s2 = "I go to school at Stanford University, which is located in California."
printfn "%s\n" (classifier.classifyWithInlineXML(s2))
> 
I go to school at <ORGANIZATION>Stanford University</ORGANIZATION>, which is 
located in <LOCATION>California</LOCATION>.
val it : unit = ()

printfn "%s\n" (classifier.classifyToString(s2, "xml", true))
> 
<wi num="0" entity="O">I</wi> <wi num="1" entity="O">go</wi> <wi num="2" entity="O">to</wi> 
<wi num="3" entity="O">school</wi> <wi num="4" entity="O">at</wi> 
<wi num="5" entity="ORGANIZATION">Stanford</wi> <wi num="6" entity="ORGANIZATION">University</wi>
<wi num="7" entity="O">,</wi> <wi num="8" entity="O">which</wi> <wi num="9" entity="O">is</wi>
<wi num="10" entity="O">located</wi> <wi num="11" entity="O">in</wi> 
<wi num="12" entity="LOCATION">California</wi><wi num="13" entity="O">.</wi>
val it : unit = ()

C# Sample of Named Entity Recognition

using edu.stanford.nlp.ie.crf;
using Console = System.Console;

namespace Stanford.NLP.NER.CSharp
{
    class Program
    {
        static void Main()
        {
            // Path to the folder with classifier models
            var jarRoot = @"..\..\..\..\data\paket-files\nlp.stanford.edu\stanford-ner-2016-10-31";
            var classifiersDirectory = jarRoot + @"\classifiers";

            // Load the 3-class classifier model (PERSON, ORGANIZATION, LOCATION)
            var classifier = CRFClassifier.getClassifierNoExceptions(
                classifiersDirectory + @"\english.all.3class.distsim.crf.ser.gz");

            var s1 = "Good afternoon Rajat Raina, how are you today?";
            Console.WriteLine("{0}\n", classifier.classifyToString(s1));

            var s2 = "I go to school at Stanford University, which is located in California.";
            Console.WriteLine("{0}\n", classifier.classifyWithInlineXML(s2));

            Console.WriteLine("{0}\n", classifier.classifyToString(s2, "xml", true));
        }
    }
}
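
The samples above print tagged text, but in practice you often want the entities as data. The following is a minimal sketch of one way to pull (label, text) pairs out of the string returned by classifyWithInlineXML, using only a regular expression. The class name InlineXmlEntities and the method Extract are hypothetical helpers, not part of Stanford NER. The sketch assumes that labels consist of upper-case letters and that the input text itself contains no markup-like strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Stanford NER): extracts entities from
// inline-XML output such as "<LOCATION>California</LOCATION>".
static class InlineXmlEntities
{
    // Matches <LABEL>text</LABEL>; the closing tag must repeat the opening label.
    private static readonly Regex EntityPattern =
        new Regex(@"<(?<label>[A-Z_]+)>(?<text>.*?)</\k<label>>", RegexOptions.Singleline);

    // Returns (label, text) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Extract(string inlineXml)
    {
        var entities = new List<KeyValuePair<string, string>>();
        foreach (Match m in EntityPattern.Matches(inlineXml))
        {
            entities.Add(new KeyValuePair<string, string>(
                m.Groups["label"].Value, m.Groups["text"].Value));
        }
        return entities;
    }
}

class Demo
{
    static void Main()
    {
        // Text copied from the classifyWithInlineXML output shown in the samples.
        var tagged = "I go to school at <ORGANIZATION>Stanford University</ORGANIZATION>, " +
                     "which is located in <LOCATION>California</LOCATION>.";

        foreach (var entity in InlineXmlEntities.Extract(tagged))
            Console.WriteLine("{0}: {1}", entity.Key, entity.Value);
    }
}
```

In a real program you would pass the result of classifier.classifyWithInlineXML(s2) to Extract instead of the hard-coded string. A regular expression is used here only to keep the sketch free of extra dependencies; if your input may contain angle brackets, a more robust approach would be needed.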

Read more about Stanford Named Entity Recognizer on the official page.
