Your advice will be greatly appreciated!!

newhorizon

Oct 30, 2009, 10:26:51 AM
to Natural Language Processing Virtual Reading Group
My background is pure linguistics, without much training in computer
science, math, or statistics. But since I left school, I have been
exposed to NLP systems a lot, because I have worked at several companies
doing natural language processing, information extraction and so on. I
usually work as a linguist writing linguistic rules, or as a
lexicographer.

Right now I want to redirect my career and become a REAL computational
linguist. I know I lack a lot of knowledge, including programming,
machine learning, computational linguistics techniques and so on. On
the one hand, I want to get into this field; on the other hand, I don't
know where to start, and I feel a little overwhelmed by so much material
that is outside my field and my comfort zone.

Any advice would be greatly valued. Thanks a lot.

Newhorizon

Jeremy K

Oct 30, 2009, 2:52:52 PM
to newhorizon, Natural Language Processing Virtual Reading Group
Newhorizon -- and other linguistics folk who are realizing that they
want to stretch into natural language processing --

For someone in your position (or for that matter, if you're a stats
person or a programmer who thinks that NLP might be cool) I think there
are really three things that you'll need to become an "NLP wizard" --
two of them are things you'll need to be a [programming] wizard of any
flavor (I use zero-based indexing below to put the CS peeps at ease):

[0] Learn the specialty: Make sure you actually understand what the
scientific/engineering NLP papers are talking about. A really fantastic
place to begin is to splurge the sixty or seventy bucks to buy a copy of
Manning and Schütze's "Foundations of Statistical Natural Language
Processing". This textbook has two or three introductory chapters for
basic linguistics concepts and two or three more for basic
math/statistical-learning concepts. In my experience working with both
linguists and statistics people, these are all really great introductory
chapters; you as a linguist will probably skim the linguistics
introduction (you know what a phrase is, what a part of speech is, etc.)
but the statistics intro will be most useful (what's an expectation,
what's a KL divergence, what do we mean by "distribution", some basic
set and probability theory, etc.). Stats people who are interested in NL
work should have the converse experience, with the linguistics chapters
being useful catch-ups and the stats chapters review.

[1] Learn the languages: learn at least two programming languages that
are used in NLP. Python should probably be one of them, because NLTK
(the Natural Language Toolkit) is a very accessible (if not always the
fastest) collection of libraries for doing the sort of natural language
processing research described in Manning and Schütze. I recommend the
"Learning XXX" series from O'Reilly; they seem to have found a pretty
good formula for working your way through a new programming language.
[I personally am most comfortable in Perl, but that's inertial, because
I learned Perl before Python even existed.] Learning a second
programming language -- like learning a second natural language -- gives
you perspective on the first and stretches your conception of what a
[programming] language can do, and it may also make you a more valuable
hire.

[2] Learning to program *with others*: Programming is not just making
the computer do what you want; it's actually a social activity: your
colleague, your manager, your user support team, or your QA team ---
among other candidates --- will want to share or modify or read or
improve the code you're writing. Or you'll want to share theirs, etc.
Steve McConnell's "Code Complete" book is a great resource for learning
the basics of how to write code for the sake of collaborating with
others, whether in a Free Software model or in a for-profit company.
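
To make the statistics vocabulary from [0] a bit more concrete
(distribution, expectation, KL divergence), here is a minimal sketch in
plain Python. It is not taken from the textbook; the function names and
the two toy "corpora" are made up for illustration, and the distributions
are simple relative frequencies with no smoothing.

```python
import math
from collections import Counter


def distribution(tokens):
    """Relative-frequency distribution over the tokens (no smoothing)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def expectation(dist, f):
    """Expected value of f(x) when x is drawn from dist."""
    return sum(prob * f(x) for x, prob in dist.items())


def kl_divergence(p, q):
    """D(p || q) in bits.

    Assumes q gives non-zero probability to every outcome that p does;
    otherwise the divergence is infinite and this raises an error.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


# Two toy corpora over the same three-word vocabulary (hypothetical data).
p = distribution("the the cat sleeps".split())
q = distribution("the cat cat sleeps".split())

mean_word_length = expectation(p, len)  # average word length under p
divergence = kl_divergence(p, q)        # how far q is from p, in bits

print(mean_word_length, divergence)
```

Note that kl_divergence(p, q) and kl_divergence(q, p) are generally
different, which is one reason it is called a divergence rather than a
distance.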

If I had to guess, I would say that you should take on #0 and #1 first,
and once you feel like you have made some progress, start working on
#2. But don't wait too long; you don't want to develop too many bad
collaboration habits.
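
As a possible first exercise for [1], here is a rough sketch of the kind
of thing you would write early on: counting words and word pairs
(bigrams) and estimating a conditional probability from the counts. It
uses only the Python standard library so it runs anywhere; NLTK provides
its own tokenizers and counting helpers that do this more carefully. The
tokenizer regex, the helper names, and the example text are all
assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Very crude tokenizer: lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Adjacent token pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def bigram_probability(pair_counts, first, second):
    """Relative-frequency estimate of P(second | first), no smoothing."""
    context_total = sum(n for (w1, _), n in pair_counts.items() if w1 == first)
    if context_total == 0:
        return 0.0
    return pair_counts[(first, second)] / context_total


text = "The dog saw the cat. The cat saw the dog."
tokens = tokenize(text)

word_counts = Counter(tokens)
# Note: this ignores sentence boundaries, so ("cat", "the") is counted
# even though the two words sit in different sentences.
pair_counts = Counter(bigrams(tokens))

print(word_counts.most_common(1))
print(bigram_probability(pair_counts, "the", "cat"))
```

Once something like this works, rewriting it in your second language is
a good way to see what each language makes easy or awkward.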

Good luck!

-Jeremy