Reading Group format.

Scott Frye

Jul 7, 2009, 2:23:47 PM

to Natural Language Processing Virtual Reading Group

How should we format this group?

Here is a suggestion:
- Vote each week for a paper to read (maybe all votes by Monday)
- Read the paper by the following week (Monday?)
- Post a discussion thread and have people comment as desired?

Questions:
- Who tabulates the vote and picks the paper?
- Do we go right to another paper or discuss for a week?

Grant

Jul 7, 2009, 10:44:54 PM

to Natural Language Processing Virtual Reading Group

For me, I know once a week is going to be too steep of a curve. Of
course, that shouldn't stop others. I just don't want to lose out if
we pick papers that build on the previous weeks.

A couple of other things come to mind, and these are just suggestions:

1. It would be great if we can make sure we encourage people to ask
questions, regardless of the level of sophistication. For example, my
stats background could definitely use some shoring up, so I may ask
"dumb" questions related to that. I hope this will be a place where
newbies on through experts can join and participate and be welcome.
2. I'd suggest, when picking papers, that we try to establish some
foundation of knowledge that it's reasonable to expect people to have,
and then archive the discussions so others can fall back on them when
joining the group. So, maybe, in these early stages it makes sense to
read some of the earlier papers in the various fields first before
proceeding into newer stuff. Or is this too boring, and should we just
assume some basic knowledge?

Paul Kalmar

Jul 8, 2009, 12:46:13 AM

to Natural Language Processing Virtual Reading Group

On Jul 7, 7:44 pm, Grant <grant.ingers...@gmail.com> wrote:
> For me, I know once a week is going to be too steep of a curve. Of
> course, that shouldn't stop others. I just don't want to lose out if
> we pick papers that build on the previous weeks.

Perhaps to accommodate both people who want a weekly group and those
who want a monthly group, we can have one broad topic per month and
have the intermediate weekly topics build from that monthly topic.
That way everyone gets exposed to all of the main topics and the
people who are interested in going in depth can focus that way.

> 1. It would be great if we can make sure we encourage people to ask
> questions, regardless of the level of sophistication. For example, my

I agree that all questions should be encouraged.

> 2. I'd suggest, when picking papers, that we try to establish some
> foundation of knowledge that it's reasonable to expect people to have

I think it would be great to start with foundations -- we all come
from different backgrounds and have different ideas of what "basic"
knowledge is.

Mark A. Mandel

Jul 8, 2009, 1:35:31 AM

to Natural Language Processing Virtual Reading Group

Summary: Second everything Paul just said.

On Jul 8, 12:46 am, Paul Kalmar <pkal...@gmail.com> wrote:
> On Jul 7, 7:44 pm, Grant <grant.ingers...@gmail.com> wrote:
> > For me, I know once a week is going to be too steep of a curve.

I think that's likely to hold for me as well.

> Perhaps to accommodate both people who want a weekly group and those
> who want a monthly group, we can have one broad topic per month and
> have the intermediate weekly topics build from that monthly topic.
> That way everyone gets exposed to all of the main topics and the
> people who are interested in going in depth can focus that way.

Sounds good.

> > 1. It would be great if we can make sure we encourage people to ask
> > questions, regardless of the level of sophistication.

> I agree that all questions should be encouraged.

YES!

> > 2. I'd suggest, when picking papers, that we try to establish some
> > foundation of knowledge that it's reasonable to expect people to have
> I think it would be great to start with foundations -- we all come
> from different backgrounds and have different ideas of what "basic"
> knowledge is.

Agreed.

popo

Jul 8, 2009, 11:41:30 AM

to Natural Language Processing Virtual Reading Group

Agreed, as long as we start from basic knowledge.
I'm a newbie in NLP :)

Joan

Jul 8, 2009, 12:57:52 PM

to Natural Language Processing Virtual Reading Group

Is there a list of foundational or survey papers in this area that I
could use to begin familiarizing myself with it? Something like the
bibliography that resulted from the "Top 10 data mining algorithms"
paper (http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf).
A few suggestions (maybe 10) would be very helpful, or even one good
book.

-Nkechi

Scott Frye

Jul 8, 2009, 3:26:49 PM

to Natural Language Processing Virtual Reading Group

WOW. There's a lot of interest in this area, apparently. A lot of
people have joined the list in just 2 days.

Some thoughts:
1) I think I was the only one that recommended a paper every week.
Everyone else seems to favor one every month. In retrospect, every
week is more than I want to tackle as well. One month seems more
manageable.

2) The question of how we select which paper to review hasn't really
been addressed. I don't really have any ideas. I've never been part
of a virtual reading group before. Does anyone have any experience
with how others handle this?

3) A number of people have mentioned that we need to start with the
basics but how do we define "the basics"? Everyone is at a different
level apparently.

To address #3... I am relatively new to the field as well, and the
advice I got for entering it was to read the two most popular texts in
the field and then start in on papers in the area to see what is
happening. I checked out many courses that teach NLP at various
colleges and came to the conclusion that the most popular texts are:
- Jurafsky, D. and Martin, J. (2008). Speech and Language Processing,
2nd Edition. Prentice Hall, Englewood Cliffs, New Jersey 07632.
- Manning, C. and Schütze, H. (2000). Foundations of Statistical Natural
Language Processing. MIT Press.
- Another good foundation text would be: Russell, S. and Norvig, P.
(1995). Artificial Intelligence: A Modern Approach. Prentice Hall,
Englewood Cliffs, NJ.

I also feel that the best place for papers on the topic is
http://aclweb.org/anthology-new/

I also regularly visit the web sites of the authors of the two books
above to see which papers they have been involved in.

I've found the Jurafsky and Martin book to be more accessible than the
Manning and Schütze book (especially for the statistically
challenged). I would recommend as a definition of "basic" anything
that is covered in the Jurafsky and Martin book
(http://www.cs.colorado.edu/~martin/slp2.html), but that might be a bit
aggressive because there is a LOT in there. Because the 2nd edition
was just released, it also covers the important papers up to 2006
or so.

I offer this info just as a starting point for discussion...

Grant Ingersoll

Jul 9, 2009, 7:47:24 AM

to Scott Frye, Natural Language Processing Virtual Reading Group

On Jul 8, 2009, at 3:26 PM, Scott Frye wrote:

>
> WOW. A lot of interest in this area apparently. There have been a
> lot of people that joined the list in just 2 days.

Pent-up demand! I posted on the Apache Lucene and Mahout mailing
lists and blogged about it, and I assume others did the same.

>
> Some thoughts:
> 1) I think I was the only one that recommended a paper every week.
> Everyone else seems to favor one every month. In retrospect, every
> week is more than I want to tackle as well. One month seems more
> manageable.

+1, although I could likely handle once every two weeks, too. I'm
guessing that, over time, newcomers will look in the archives and ask
questions on older discussions, too.

>
> 2) The question on how we select which paper to review hasn't really
> been addressed. I don't really have any ideas. I've never been part
> of a virtual reading group before. Does anyone have any experience
> with how other's handle this?

I haven't participated in one either.

>
> 3) A number of people have mentioned that we need to start with the
> basics but how do we define "the basics"? Everyone is at a different
> level apparently.

Here are some classes:
http://en.wikipedia.org/wiki/User:Stevenbird/List_of_NLP_Courses

I'd suggest we need to walk before we can run, so, to me, the basics
start by looking at the lower levels of language like morphology and
syntax as well as Part of Speech (POS) and the idea of parsing out
sentences into a parse tree.

Perhaps, by a simple show of hands, people can indicate their level
of comfort with those areas. That is, do people understand what those
things are in terms of language, never mind NLP? I'm not saying we need
to know the algorithms for them, just the basic definitions. I'd say
most of it is covered by high school grammar, but that was a long time
ago, so it may be worth a refresher.

After that, is it safe to say people understand the basics of what
tokenization/segmentation and sentence detection (at least for English
and other whitespace-delimited languages) are? Again, it isn't
important that one knows how to actually implement them, just that one
is familiar with the concepts.
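To make those two concepts concrete, here is a deliberately naive sketch in Python. The regex rules are illustrative assumptions, not how any particular toolkit does it: sentence detection splits after `.`, `!` or `?` followed by whitespace, and tokenization separates runs of word characters from punctuation. Real sentence detectors have to cope with abbreviations like "Dr.", which this sketch gets wrong.

```python
import re

def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' when followed by whitespace.

    Known limitation: abbreviations such as "Dr." also trigger a split.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization for whitespace-delimited text: runs of word characters,
    or any single non-space punctuation character."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Tokenization splits text into words. Sentence detection finds boundaries! Easy?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Running it prints one token list per detected sentence; the last one is `['Easy', '?']`.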

If we can establish that foundation, then I'd suggest we start looking
at papers that actually discuss how to implement POS tagging, since
POS tagging is often a prerequisite for the higher-level stuff. From
there, I'd suggest looking at parsing. Given those two foundational
pieces, we can then start looking into deeper things like word sense
disambiguation, information extraction, emotion detection, etc.
Basically, wherever people want to go.

Thus, I'd suggest the following start:
1. Part of Speech Tagging
2. Parsing
3. Named Entity Recognition - identifying people, places, and other proper nouns.

From there, we'll have more of a sense of the group dynamic and how
all of this plays out.

As for a model, I'd _suggest_: At the beginning of the next reading
period (which may not be the first of the month depending on when we
start), we ask for a volunteer (the "Editor of the Month" - EOTM) to
spend 2-3 days researching the topic and then come back with a few
suggested papers (2-5). Then, the group votes on the list and the
paper with the most votes is selected. Votes are open for 3 days (72
hours) so that we can account for time differences, travel, etc.
Then, readers have two weeks to read the paper (and feel free to ask
questions as you go). Once it is assumed everyone has read it, more
discussion can follow. I'd also suggest that the EOTM be responsible
for coming up with 5-10 questions to help seed the discussion.

As for the EOTM job, this is often as simple as going to Google
Scholar or CiteSeer or some other scholar search tool and plugging in
the topic and then finding the most cited papers and doing a little
pre-reading and a little verification to come up with reasonable
results, plus maybe looking at some online syllabi, etc. I'd say it
is probably 1-2 hours of time, likely less. For example, for POS
tagging, Google Scholar suggests
(http://scholar.google.com/scholar?q=part+of+speech+tagging&hl=en&btnG=Search)
the Brill paper, which is one of the papers I had in mind. The other
trick is that the EOTM needs to pick papers that are freely available
and not locked up in journals.

Just a suggestion, please add your own. Also, we need not feel like
we have to solve it all now. The group can evolve as the membership
changes/grows.

Cheers,
Grant

Alexandre Rafalovitch

Jul 9, 2009, 9:17:11 AM

to Natural Language Processing Virtual Reading Group

On Wed, Jul 8, 2009 at 3:26 PM, Scott Frye<scott...@aol.com> wrote:
>
> 2) The question on how we select which paper to review hasn't really
> been addressed. I don't really have any ideas. I've never been part
> of a virtual reading group before. Does anyone have any experience
> with how other's handle this?

I have not been part of a reading group (virtual or otherwise) before,
but I wonder if the virtual format could actually take the supposed
benefits to the next level.

What if this was a reading _and writing_ group. And I mean writing code.

Say we are looking at a new algorithm. Let's create an open source
project to implement that algorithm. Then, as people read about
different aspects of the paper, they can put their new knowledge into
code, samples, and new algorithms for GATE/NLTK/Mahout/etc. At the end,
those bits can be contributed to the underlying project or kept
separately. And people at different levels of understanding can still
contribute at their own level. Having a centrally referenced
repository could also simplify the discussion, since we could just
point to the URL of the relevant part.

This does require a slower pace of reading, but I think may have a
stronger effect long term. It also has some network-effect benefits.

And we could have running code demonstrations on Google AppEngine.
That would support both Java and Python, so it would cover multiple
good libraries.

Just a thought!

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)

thnidu

Jul 9, 2009, 6:02:25 PM

to Natural Language Processing Virtual Reading Group

Alexandre Rafalovitch proposed:

> What if this was a reading _and writing_ group. And I mean writing code.

(Quote continued below comment.) I think writing code is fine, but
remember that some of us, possibly many of us, are not familiar with
GATE/NLTK/Mahout/etc. Heck, I'm barely one for three here: I've heard
of NLTK but I don't know it, and I've never heard of the other two.

Mark Mandel

Ozgur Yilmazel

Jul 10, 2009, 4:09:57 AM

to Natural Language Processing Virtual Reading Group

I found the following reading lists from CMU (thanks to Tom Mitchell,
William Cohen, Scott Fahlman and Eric Nyberg) very helpful in the area of
active learning and bootstrap learning for NLP. These lists might not be as
basic as some people on the list would like, but we can still keep them for
future use.

Bootstrap Learning:
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-21/www/semisupervised.html

Active Learning:
http://www.cs.cmu.edu/~ReadTheWeb/activelearning/activelearningbib.html



Ozgur

Scott Frye

Jul 10, 2009, 10:02:03 AM

to Natural Language Processing Virtual Reading Group

I think this is a great idea, but I also think that these efforts
should be kept partially separate from the efforts here. That way,
people can participate in these projects if they have the knowledge or
interest, but those who just want to partake in the "reading group"
don't get discouraged.

In other words, let's create the projects as interest dictates and
refer to them here, but make this group about the "reading" part and
the SourceForge sites about the "implementation" part.

IMHO.

On Jul 9, 9:17 am, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

Scott Frye

Jul 10, 2009, 10:19:36 AM

to Natural Language Processing Virtual Reading Group

This sounds great to me!

In the interests of maintaining momentum, I would recommend that you
take the first rotation as EOTM, Grant. Possibly targeting July 15th to
have 2-5 papers from POS or Parsing for us to vote on. I would be
willing to volunteer to take the second rotation, targeting August 1st to
have a second paper on POS or parsing. After about two rotations,
there should be more feedback in this area about pace and format.

To summarize what you presented (and other posts)

1) Editor of the Month (EOTM) will spend 2-3 days selecting 2-5 papers
for consideration
2) Everyone votes on papers for 3 days.
3) Everyone reads papers for 2 weeks.
4) At end of 2 weeks EOTM posts 5-10 seed questions.
5) Next EOTM starts process at #1 again.
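As a rough sanity check on the timeline, the sketch below (Python; `rotation_schedule` is a hypothetical helper, not something the group set up) adds up the durations from the summary. It assumes the 2-3 day selection period is taken as 3 days and that the two-week reading period starts when the vote closes. With the candidate list posted on July 15th, it puts the vote close on July 18th and the seed questions on August 1st.

```python
from datetime import date, timedelta

# Durations from the summary above. Taking "2-3 days" of selection as 3 days
# is an assumption made for this sketch.
SELECTION = timedelta(days=3)
VOTING = timedelta(days=3)
READING = timedelta(weeks=2)

def rotation_schedule(candidates_posted):
    """Key dates of one hypothetical EOTM rotation, given the day the candidate papers are posted."""
    vote_closes = candidates_posted + VOTING
    return {
        "selection_starts": candidates_posted - SELECTION,
        "candidates_posted": candidates_posted,
        "vote_closes": vote_closes,
        "seed_questions": vote_closes + READING,
    }

for step, day in rotation_schedule(date(2009, 7, 15)).items():
    print(step, day.isoformat())
```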

Some questions:
- How do we pick EOTM? I suggest it be on a volunteer basis. We can
start a discussion for people to volunteer with their topic. We could
also have the current EOTM responsible for tallying the votes or
selecting the next EOTM.
- How do we vote for papers? I suggest emails be sent directly to the
EOTM to avoid confusion on the list and make it easy to tally them. I
think we can rely on the honor of the EOTM for an honest tally.
- How long should we discuss a paper? I suggest we start a discussion
specifically for the paper, probably with the start read date as part
of the subject (as well as topic and paper), and then discussion can
continue as long as it takes. However we could start getting the next
paper ready immediately (or maybe 2 weeks after if we do one every
month?). This allows people that want to do biweekly papers to skip
every other one. In fact, members could set their own pace to
whatever they want, one every month, two months, every six weeks, etc.

-Scott Frye

> tagging, Google Scholar suggests (http://scholar.google.com/scholar?q=part+of+speech+tagging&hl=en&btnG...

thnidu

Jul 10, 2009, 10:28:58 AM

to Natural Language Processing Virtual Reading Group

I like your detailing of the proposal. I'd add a reminder to step 1
that the EOTM should give some description of each proposed paper,
including its level, so we can be informed voters. I guess that's
obvious, though.

Mark Mandel

Grant Ingersoll

Jul 10, 2009, 10:50:52 AM

to Scott Frye, Natural Language Processing Virtual Reading Group

On Jul 10, 2009, at 10:19 AM, Scott Frye wrote:

>
> This sounds great to me!
>
> In the interests of maintaining momentum, I would recommend that you
> take the first rotation as EOTM, Grant. Possibly targeting July 15th to
> have 2-5 papers from POS or Parsing for us to vote on. I would be
> willing to volunteer to take the second rotation, targeting August 1st to
> have a second paper on POS or parsing. After about two rotations,
> there should be more feedback in this area about pace and format.
>

I accept and the 15th sounds reasonable.

> To summarize what you presented (and other posts)
>
> 1) Editor of the Month (EOTM) will spend 2-3 days selecting 2-5 papers
> for consideration
> 2) Everyone votes on papers for 3 days.
> 3) Everyone reads papers for 2 weeks.
> 4) At end of 2 weeks EOTM posts 5-10 seed questions.
> 5) Next EOTM starts process at #1 again.
>
> Some questions:
> - How do we pick EOTM? I suggest it be on a volunteer basis. We can
> start a discussion for people to volunteer with their topic. We could
> also have the current EOTM responsible for tallying the votes or
> selecting the next EOTM.

I think volunteer basis is good. Like anything done in open source,
the group will only be viable if there are volunteers willing to
sustain it. If people don't volunteer, then it shows the group is not
viable and we can all move on.

> - How do we vote for papers? I suggest emails be sent directly to the
> EOTM to avoid confusion on the list and make it easy to tally them. I
> think we can rely on the honor of the EOTM for an honest tally.

I think the EOTM should just start a thread like:

Subject: [VOTE] Select paper on Part of Speech

Content:

Please place a [x] by the paper you would like to read:

[] POS Tagging using Magic
Abstract: .....
EOTM Comments: ....
[] POS Tagging using the Dark Arts
Abstract: .....
EOTM Comments: ....
[] POS Tagging using chemical reactions
Abstract: .....
EOTM Comments: ....

At the end of three days, the EOTM calls the vote.
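If a [VOTE] thread grows long, counting by hand could get tedious. Below is a minimal Python sketch of a tally, assuming each reply comes from a different member and marks its choice by putting an `x` inside the brackets of the template above. The function name `tally` and the sample replies are hypothetical.

```python
import re
from collections import Counter

# A reply marks its choice by putting an "x" (or "X") inside the brackets.
# The bracket must start the line, so quoted lines like "> [x] ..." are ignored.
CHOICE = re.compile(r"^\s*\[\s*[xX]\s*\]\s*(.+?)\s*$")

def tally(replies):
    """Count one vote per reply: the first line marked [x] in each reply."""
    counts = Counter()
    for reply in replies:
        for line in reply.splitlines():
            match = CHOICE.match(line)
            if match:
                counts[match.group(1)] += 1
                break  # ignore any extra marks so each reply counts once
    return counts

replies = [
    "[x] POS Tagging using Magic\n[] POS Tagging using the Dark Arts",
    "[] POS Tagging using Magic\n[x] POS Tagging using the Dark Arts",
    "[X] POS Tagging using Magic",
]
print(tally(replies).most_common(1))
```

The sketch does not break ties; that would still be up to the EOTM when calling the vote.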

By doing it on the list, we don't have to worry about spam filters,
etc. and there is a public record. I personally don't want any
private email. To me, much like open source, a group like this is all
about things happening in the open.

> - How long should we discuss a paper? I suggest we start a discussion
> specifically for the paper, probably with the start read date as part
> of the subject (as well as topic and paper), and then discussion can
> continue as long as it takes. However we could start getting the next
> paper ready immediately (or maybe 2 weeks after if we do one every
> month?). This allows people that want to do biweekly papers to skip
> every other one. In fact, members could set their own pace to
> whatever they want, one every month, two months, every six weeks, etc.

I'd suggest we try the month approach for the first couple, but after
that let's just see where the group goes. We're all volunteers here
and no one is paying any money, so we should feel free to refactor as
we see fit.

Scott Frye

Jul 10, 2009, 10:57:58 AM

to Natural Language Processing Virtual Reading Group

All sounds good to me.

On Jul 10, 10:50 am, Grant Ingersoll <grant.ingers...@gmail.com>
wrote:

Elmer Garduno

Jul 10, 2009, 11:32:19 AM

to Scott Frye, Natural Language Processing Virtual Reading Group

+1 for Grant's proposal

Paul Kalmar

Jul 10, 2009, 1:32:02 PM

to Natural Language Processing Virtual Reading Group

Sounds great. I can volunteer for the third rotation.

On Jul 10, 8:32 am, Elmer Garduno <gard...@gmail.com> wrote:
> +1 for Grant's proposal
>

Grant Ingersoll

Jul 14, 2009, 8:27:38 AM

to Natural Language Processing Virtual Reading Group

I like the idea, as it is often useful to put these ideas into
practice to really understand them, but I think it can be done as a
background task for those who are interested. Some implementations
may take months to complete in Open Source, which would likely remove
any momentum the group has from a reading/discussion standpoint.
Plus, it will likely be hard to decide which project to contribute
to. I'm partial to Mahout since I am a co-founder and others are
likely partial to some of the other projects listed. One thing that
is likely useful, though, is to say things during the discussion like:
"Try this out in GATE by doing X, Y, Z" or "Take a look at how this is
implemented in Mahout by looking here: ..."

That said, I'd personally welcome anyone who has an itch to scratch
when it comes to Machine Learning to contribute to Mahout, but that
isn't why I'm here.

I also think that discussions on papers are likely to go beyond just
the reading/discussion period as people come and go from the project.
This, to me, is one of the real benefits of a group like this over a
live discussion in some meeting place (which has its own merits).

-Grant

Alexandre Rafalovitch

Jul 14, 2009, 11:05:01 AM

to Natural Language Processing Virtual Reading Group

You are all correct of course, especially in terms of speed.

I like Grant's version of doing it by referencing where appropriate.
With most code bases on the web, that's probably the best way. Other
(shorter) things can be done on individual blogs.

Regards,
Alex.


lianiana

Jul 14, 2009, 2:41:00 PM

to Natural Language Processing Virtual Reading Group

Hello!
I am glad to join this group. I have read the whole discussion and
agree with the proposed ideas, so now I'm waiting for the list of
proposed papers.


Joan

Jul 15, 2009, 4:29:38 PM

to Natural Language Processing Virtual Reading Group

I also agree with the proposed format and I volunteer for the 4th
rotation.
Looking forward to starting!

-Nkechi
