Safe Haskell | None |
---|---|
Language | Haskell2010 |
A parser for the Wiki NER work presented in:
@Article{nothman2012:artint:wikiner, author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}, title = {Learning multilingual named entity recognition from {Wikipedia}}, journal = {Artificial Intelligence}, publisher = {Elsevier}, volume = {194}, pages = {151--175}, year = {2012}, doi = {10.1016/j.artint.2012.03.006}, url = {http://dx.doi.org/10.1016/j.artint.2012.03.006} }
And provided here: http://schwa.org/projects/resources/wiki/Wikiner
The format does not appear to be documented, but it looks like:
- One sentence per line.
- Tagged tokens are separated by spaces.
- Items in a tagged token are separated by vertical bars ('|').
- Each line of n text tokens contains 3*n items: each token contributes its text, then a POS tag, then an IOB tag with one of the NER classes.
For example, the sentence: The Oxford Companion to Philosophy says, "there is no single defining position that all anarchists hold, and those considered anarchists at best share a certain family resemblance."
Is rendered as: The|DT|I-MISC Oxford|NNP|I-MISC Companion|NNP|I-MISC to|TO|I-MISC Philosophy|NNP|I-MISC says|VBZ|O ,|,|O "|LQU|O there|EX|O is|VBZ|O no|DT|O single|JJ|O defining|VBG|O position|NN|O that|IN|O all|DT|O anarchists|NNS|O hold|VBP|O ,|,|O and|CC|O those|DT|O considered|VBN|O anarchists|NNS|O at|IN|O best|JJS|O share|NN|O a|DT|O certain|JJ|O family|NN|O resemblance|NN|O .|.|O "|RQU|O
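The format described above can be read with a few lines of plain Haskell. The following is only a minimal sketch using `base`, not this module's actual parser: the names `TaggedToken`, `splitOn`, `parseToken`, `parseLine`, and `nerClass` are hypothetical and chosen for illustration, and the sketch assumes that the token text itself never contains a vertical bar.

```haskell
-- Hypothetical record for one tagged token (not a type from this library).
data TaggedToken = TaggedToken
  { tokText :: String  -- surface form, e.g. "Oxford"
  , tokPos  :: String  -- POS tag, e.g. "NNP"
  , tokNer  :: String  -- IOB tag, e.g. "I-MISC" or "O"
  } deriving (Eq, Show)

-- Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Parse one "text|POS|IOB" item.
-- Returns Nothing unless the item has exactly three fields.
parseToken :: String -> Maybe TaggedToken
parseToken item = case splitOn '|' item of
  [t, p, n] -> Just (TaggedToken t p n)
  _         -> Nothing

-- Parse one sentence line: items are separated by whitespace.
-- The whole line is rejected if any single item is malformed.
parseLine :: String -> Maybe [TaggedToken]
parseLine = traverse parseToken . words

-- Strip the IOB prefix to recover the NER class.
-- Tokens tagged "O" (outside any entity) yield Nothing.
nerClass :: TaggedToken -> Maybe String
nerClass tok = case tokNer tok of
  'I' : '-' : cls -> Just cls
  'B' : '-' : cls -> Just cls
  _               -> Nothing
```

Loaded into GHCi, `parseLine` applied to the rendered sentence above would yield one `TaggedToken` per item, and `map nerClass` would then separate the entity tokens from the `O` tokens. Rejecting the whole line on a single malformed item is a design choice of this sketch; a more forgiving reader could skip bad items instead.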
This module also provides a trained model for NER via the averaged perceptron chunker. This actually kind of works, which is a bit amazing. For example:

    import NLP.Corpora.WikiNer
    import NLP.POS
    import NLP.Chunk

    tgr <- defaultTagger
    chk <- wikiNerChunker
    chunkText tgr chk "Real World Haskell is a book created by Don Stewart, Bryan O'Sullivan, and Jon Goerzen."
    "[ORG Real/NNP] [MISC World/NNP] [PER Haskell/NNP] is/VBZ a/DT book/NN created/VBN by/IN [PER Don/NNP Stewart/NNP] ,/, [PER Bryan/NNP O'Sullivan/NNP] ,/, and/CC [PER Jon/NNP Goerzen/NNP] ./."
Documentation
Different classes of Named Entity used in the WikiNER data set.