Here is my summary and thoughts ...
In this paper the authors present an HMM-based chunk tagger and use it
to build a named entity recognition system which, they claim, performs
better than other machine learning systems and even better than
handcrafted rules.
In the introduction, they claim that "rule based systems lack the
ability of coping with the problems of robustness and portability".
They claim that their system just needs to be retrained on new data
and will then carry over to any new domain. I find this amusing
because it is the same argument that authors of unsupervised systems
make against supervised systems like this one.
They mention other techniques, specifically another HMM, Maximum
Entropy, Decision Tree, and a system based on transformation-based
rules. They claim that the HMM's performance is higher than the
others' and indicate that the reason is that it captures the locality
of phenomena better. I am surprised, because I would have expected
this of the Maximum Entropy system.
They also claim that the performance of a machine learning system is
always about 2% poorer than that of a rule based system. They claim
their system does better than rule based systems, though.
They mention two kinds of evidence that can be used to address the
ambiguity, robustness and portability issues: internal evidence
(within the word itself) and external evidence (the surrounding
context).
The HMM model that is proposed uses mutual information between the
tags and the tokens instead of the Bayes rule used in traditional
HMM models. This allows them to directly generate the named entity
tags instead of modeling the original process that generates the
NE-class annotated words from the original words. (I was confused
here; what does this mean?) Because of this, mutual information
independence instead of conditional probability independence needs
to be assumed, but otherwise the formulas are similar.
The tokens in the formula are a structure consisting of a word
sequence and a word feature sequence. The word feature sequence
consists of 4 feature types:
1) internal - capitalization, numeric, etc. (77.6% performance)
2) semantic classification - month, Weekend, Quarter, etc. (10% extra
performance)
3) gazetteer feature - drawn from lists of names, places,
organizations, etc. (1.2% extra performance)
4) external macro context? I don't understand this one, but I think it
might be how the word fits a syntactic template? (5.5% extra
performance)
What they don't explain is whether the order of these made a
difference. For instance, if the gazetteer feature had been
implemented alone, would it have given only a 1.2% gain? Would the
internal feature have had such a large effect if it had been measured
incrementally, after another feature was already in use?
Back-off modeling is used as the smoothing technique. There is a
complex ordering to the back-off strategy.
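The paper's actual back-off chain is more elaborate than I can
reproduce here, but the general idea is to estimate a probability from
the most specific context that has counts, and to fall back to
progressively coarser contexts otherwise. A toy sketch (my own
simplification, not the paper's exact scheme):

def backoff_prob(tag, contexts, counts, discount=0.5):
    # `contexts` is ordered most specific -> least specific;
    # `counts[context]` maps tags to observed counts.
    # Toy simplification of back-off smoothing, not the paper's scheme.
    for context in contexts:
        tag_counts = counts.get(context)
        if tag_counts:
            total = sum(tag_counts.values())
            # Discounting reserves probability mass for unseen events.
            return max(tag_counts.get(tag, 0) - discount, 0) / total
    return 1e-6  # floor when every back-off level is unseen

counts = {
    ("InitCap", "said"): {"PER": 8, "ORG": 2},
    ("InitCap",): {"PER": 30, "ORG": 25, "LOC": 20},
}
contexts = [("InitCap", "said"), ("InitCap",)]
print(backoff_prob("PER", contexts, counts))  # (8 - 0.5) / 10 = 0.75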
Another issue is that they compare their mutual information model to
the earlier HMM study and conclude that the mutual information
formulation is what makes for better performance. However, there are a
lot of differences in the feature classes, and it's not clear whether
the improvement was a result of the features or of the algorithm.
-Scott Frye