Review: Natural Language Processing with Chang and John

JnBrymn

Feb 28, 2017, 10:08:47 AM

to Penny University

I had a fun discussion with Chang Lee about various Natural Language Processing topics.

The first thing I did was give Chang my copy of Taming Text. It's a quick read and really helped fill me in on the great variety of NLP methods that exist (especially those close to search technology).

On the list of possible things to discuss:

Entity extraction (the who-what-when-where mentioned in text)
The text processing that is used with search engines: bag of words, cosine similarity, TF*IDF
Topic modeling (Latent Semantic Analysis, Alternating Least Squares, Latent Dirichlet Allocation)
Hidden Markov Model
statistically significant strings
clustering (LSA vs carrot2)
document summarization
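
For anyone who wants a feel for the search-engine items on that list, here is a minimal sketch of bag of words, TF*IDF weighting, and cosine similarity in plain Python. It is not something we wrote during the session. The three example sentences, the whitespace tokenizer, and the smoothed IDF formula are all my own assumptions; real search engines use more refined analysis and scoring.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplest possible analyzer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} bag-of-words vector."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF so that no weight is ever zero or negative.
    idf = {t: math.log((1 + n_docs) / (1 + count)) + 1.0 for t, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up toy documents, purely for illustration.
docs = [
    "the starship left the station",
    "the starship docked at the station",
    "the wizard cast a spell",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_related, sim_unrelated)
```

The two starship sentences should score as more similar to each other than either does to the wizard sentence, because the rarer shared terms carry more weight than the common "the".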

We spent the first several minutes talking about Markov Models of text and then moved on to how Hidden Markov Models could be used to perform part-of-speech tagging. We then went into how search engines work for about 10 minutes.
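
To make the Hidden Markov Model idea concrete, here is a rough sketch of Viterbi decoding for part-of-speech tagging. This is an illustration only, not code from our conversation. The three tags and every probability below are invented by hand (a real tagger would estimate them from a tagged corpus), and the `floor` value for unseen words is an assumption to keep the toy example from collapsing to zero.

```python
def viterbi(words, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely sequence of hidden states (tags) for the words."""
    # best[i][s] = (probability of the best path ending in state s at word i,
    #               the previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], floor), None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(word, floor), p)
                for p in states
            )
        best.append(column)
    # Start from the best final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hand-made toy model (all numbers are assumptions for illustration).
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walks": 0.5, "walk": 0.3, "barks": 0.2},
}

tags = viterbi("the dog barks".split(), states, start_p, trans_p, emit_p)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

The interesting case is an ambiguous word like "walk", which this toy model can emit as either a noun or a verb. The transition probabilities let the preceding tag break the tie, which is the whole appeal of using an HMM here.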

Then, for the next hour, we did some work to index the SciFi Stack Exchange posts into Elasticsearch in order to build a k-Nearest Neighbors tagging algorithm. After an initial stalled attempt we ended up with a tagger that at least seems like a good starting point. See for yourself!
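
For readers without the dataset handy, the k-Nearest Neighbors tagging idea can be sketched in a few lines of plain Python. To be clear, this is not the code from our session: the search engine is swapped out for a simple in-memory Jaccard word-overlap score, and the posts, tags, and function names below are all made up for illustration.

```python
from collections import Counter


def tokens(text):
    # Crude tokenizer (an assumption): lowercase, whitespace split, unique words.
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity, standing in for a search engine's relevance score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_tags(query, tagged_posts, k=3, top_n=1):
    """Suggest tags by letting the k most similar tagged posts vote."""
    q = tokens(query)
    neighbors = sorted(
        tagged_posts,
        key=lambda post: jaccard(q, tokens(post["text"])),
        reverse=True,
    )[:k]
    votes = Counter()
    for post in neighbors:
        votes.update(post["tags"])
    return [tag for tag, _ in votes.most_common(top_n)]


# Invented example posts, not taken from the real Stack Exchange data.
posts = [
    {"text": "which starship class is the enterprise in star trek",
     "tags": ["star-trek", "starships"]},
    {"text": "how fast is warp nine in star trek", "tags": ["star-trek"]},
    {"text": "why did gandalf not use the eagles in lord of the rings",
     "tags": ["lord-of-the-rings", "tolkien"]},
    {"text": "who forged the one ring in lord of the rings",
     "tags": ["lord-of-the-rings"]},
    {"text": "how does the force work in star wars", "tags": ["star-wars"]},
]

suggested = knn_tags("how does warp drive work in star trek", posts, k=3, top_n=1)
print(suggested)  # ['star-trek']
```

Every neighbor gets an equal vote in this sketch. Weighting each vote by the neighbor's similarity score would be a natural next step.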

---------------------------------------

I look forward to someone else asking me to do this again. Even if I'm driving the conversation I always learn something new and interesting. And, as Chang said, I always meet such interesting people.

¢¢

John


Chang Lee

Mar 1, 2017, 2:32:35 PM

to Penny University

This was my first time seeing Elasticsearch in action, and it was an awesome pair programming session. For me the biggest takeaway was seeing how John chopped the tagging problem into smaller pieces and worked each one out with different tools. I had some Python experience, but seeing how John glued everything together in a short amount of time gave me ideas for things I can work on.

I'll be working on the dataset a bit more to see if I can come up with something else later this week. 1-on-1 coffee talks are awesome!!