How are words used within an NLP logit regression?


Benedict Holland

Jul 23, 2017, 11:12:08 PM
to nltk-users
Hi Everyone,

I don't know if this is an answered question, but I am trying to wrap my head around classification algorithms. I have used logit, MNL, mixed logit, and a bunch of other discrete choice models extensively when analyzing discrete choice surveys. I am trying to figure out how a logit model is applied in NLP to classify text. Are there papers (published articles, peer-reviewed journals, etc.) that detail this process, or does the NLTK book explain it? I am even open to <gasp> buying books. On real paper.

It seems like most of the literature starts with "logit models classify text" and moves on from there. It never bothers to address how logit models actually classify text, or which algorithms the toolkits actually implement.

Also, this seems like as good a place as any to ask. NLP seems to be at the forefront of machine learning. Who should I follow and read to get up to speed on the latest and greatest advancements? Which publications are hot right now with regard to NLP and machine learning in general?

I come from the applied statistics and applied econometrics world with a strong CS background, so NLP is about as perfect a fit as can be, but I don't know where to start learning the really good bits.

Thanks,
~Ben

Alex Rudnick

Aug 1, 2017, 7:55:10 PM
to nltk-...@googlegroups.com
Hey Ben,

Sort of the default approach to using a linear model to classify
text goes like this:

- Every word in your vocabulary maps onto a "feature" for your
classifier. When you consider your input text, you find the words that
are in your vocabulary and count them up (or maybe just check whether
they're present).

- Then the learned model for the classifier has a weight for each of
those features. Say, for example, you want to consider 10K different
features (which is to say, your vocabulary is limited to 10K words).
Now you've got 10K different weights, and maybe your input text
contains 10 of those words.

- The linear model then just adds up the weights for the words that
you found, and that score goes through the logistic (or softmax, for
multiple classes) function to become a class probability -- that's the
"logit" part. This is called the "bag of words" classification style --
it disregards word order completely. (There's a small sketch of this
right after the list.)
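
Here's a minimal sketch of that in Python -- the vocabulary, weights,
and bias are made-up numbers just for illustration, not anything NLTK
actually ships with:

    # Bag-of-words scoring with a toy vocabulary and made-up weights.
    vocab = ["good", "bad", "great", "boring"]
    weights = {"good": 1.2, "bad": -1.5, "great": 2.0, "boring": -0.8}
    bias = 0.1

    def extract_features(text):
        """Count how often each vocabulary word occurs in the text."""
        tokens = text.lower().split()
        return {w: tokens.count(w) for w in vocab}

    def score(text):
        """Add up the weights of the vocabulary words we found."""
        feats = extract_features(text)
        return bias + sum(weights[w] * count for w, count in feats.items())

    print(score("a good movie , maybe even a great one"))  # positive score
    print(score("boring and bad"))                          # negative score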

For the more mathematically inclined, you might want to think of the
weights as a big matrix (assuming a multiclass classifier) and the
input document as a vector (of length |V|, for vocabulary size). Then
the whole process is mostly just a matrix-vector multiplication.
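
In code, with made-up sizes and random weights just to show the shapes:

    import numpy as np

    V = 5          # vocabulary size |V|
    n_classes = 3  # e.g. positive / negative / neutral

    W = np.random.randn(n_classes, V)  # one row of weights per class
    b = np.zeros(n_classes)            # one bias per class

    x = np.array([1, 0, 2, 0, 1])      # word counts for one document

    scores = W @ x + b                             # matrix-vector multiplication
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax -> class probabilities
    print(probs.argmax())                          # predicted class index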

Most NLP/ML software you'll use nowadays abstracts over this stuff for
you! You can use classifiers in NLTK without thinking about vocabulary
size, for example.
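
For instance, I believe something like this works with NLTK's
MaxentClassifier (its maximum-entropy / logistic-regression
classifier) -- the tiny training set here is invented just to show the
API, and the GIS trainer needs numpy installed:

    from nltk.classify import MaxentClassifier

    # Featuresets are plain dicts; NLTK works out the feature space for you.
    train_set = [
        ({"good": 1, "great": 1}, "pos"),
        ({"bad": 1, "boring": 1}, "neg"),
        ({"great": 2}, "pos"),
        ({"bad": 2}, "neg"),
    ]

    classifier = MaxentClassifier.train(train_set, algorithm="gis",
                                        trace=0, max_iter=10)
    print(classifier.classify({"good": 1}))  # should come out as "pos"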
To see what's happening in current NLP research, check out ACL
Anthology! NLP conferences are basically all open access nowadays.
http://aclweb.org/anthology/

If you want to drink from the firehose, check out arXiv's cs.CL --
this is the latest up-to-the-minute stuff, not for the faint of heart:
https://arxiv.org/list/cs.CL/recent

Hope this helps!

--
-- alexr