Review: Natural Language Processing with Chang and John


JnBrymn

Feb 28, 2017, 10:08:47 AM
to Penny University
I had a fun discussion with Chang Lee about various Natural Language Processing topics.

The first thing I did was give Chang my copy of Taming Text. It's a quick read and really helped fill me in on the great variety of NLP methods that exist (especially those close to search technology).


We started with this list of possible things to discuss:
  • Entity extraction (the who-what-when-where mentioned in text)
  • The text processing used with search engines: bag of words, cosine similarity, TF*IDF (see the sketch after this list)
  • Topic modeling (Latent Semantic Analysis, Alternating Least Squares, Latent Dirichlet Allocation)
  • Hidden Markov Models
  • Statistically significant strings
  • Clustering (LSA vs. Carrot2)
  • Document summarization
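
To give a flavor of the bag-of-words / TF*IDF / cosine similarity item above, here's a tiny sketch using scikit-learn (my own toy illustration added for this write-up, not code from our session):

    # Bag of words + TF*IDF weighting + cosine similarity between documents.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "the wookiee flew the falcon",
        "the falcon is a fast ship",
        "hobbits walked to mordor",
    ]

    # Each document becomes a sparse vector of TF*IDF-weighted term counts.
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)

    # Pairwise cosine similarity: 1.0 on the diagonal, near 0 for unrelated docs.
    print(cosine_similarity(tfidf))

Cosine similarity between those weighted term vectors is the usual "how alike are these two documents" score that search-engine relevance builds on.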
We spent the first several minutes talking about Markov models of text and then moved on to how Hidden Markov Models can be used for part-of-speech tagging. After that, we spent about 10 minutes on how search engines work.
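
For anyone who wasn't there, here is roughly what a Markov model of text means, as a toy sketch (mine, not something we wrote that day): a bigram model records which words tend to follow each word and then samples from those counts to babble new text.

    import random
    from collections import defaultdict

    # Toy bigram Markov model: for each word, remember the words that follow it.
    corpus = "the force is strong with this one and the force will be with you".split()
    transitions = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        transitions[current_word].append(next_word)

    # Generate text by repeatedly sampling a plausible next word.
    word = "the"
    output = [word]
    for _ in range(8):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    print(" ".join(output))

A Hidden Markov Model adds a layer of hidden states (for example, part-of-speech tags) that emit the observed words, and tagging amounts to finding the most likely hidden state sequence for a sentence.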

Then, for the next hour, we worked on indexing the SciFi Stack Exchange posts into Elasticsearch in order to build a k-nearest neighbors tagging algorithm. After an initial stalled attempt, we ended up with a tagger that at least seems like a good starting point. See for yourself!

---------------------------------------
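
Roughly, the idea looks like the sketch below. This is illustrative only, with made-up index and field names (scifi_posts, body, tags), using the older Python client's body-style search and a more_like_this query as the nearest-neighbor lookup; the code we actually wrote may differ:

    from collections import Counter
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    def suggest_tags(post_body, k=10, n_tags=5):
        # Treat more_like_this as the "nearest neighbor" search: the k most
        # textually similar posts vote with their existing tags.
        response = es.search(index="scifi_posts", body={
            "size": k,
            "query": {
                "more_like_this": {
                    "fields": ["body"],
                    "like": post_body,
                    "min_term_freq": 1,
                    "min_doc_freq": 1,
                }
            },
        })
        votes = Counter()
        for hit in response["hits"]["hits"]:
            votes.update(hit["_source"].get("tags", []))
        return [tag for tag, _ in votes.most_common(n_tags)]

    print(suggest_tags("Why did Gandalf not simply fly the hobbits to Mordor?"))

The "k-NN" part is just the vote over the top k hits; Elasticsearch's own relevance scoring plays the role of the distance function.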

I look forward to someone else asking me to do this again. Even if I'm driving the conversation, I always learn something new and interesting. And, as Chang said, I always meet such interesting people.

¢¢
John

Chang Lee

Mar 1, 2017, 2:32:35 PM
to Penny University
This was my first time seeing Elasticsearch in action, and it was an awesome pair programming session. For me the biggest takeaway was seeing how John chopped the tagging problem into smaller pieces and figured each one out with different tools. I had some Python experience, but seeing how John glued everything together in a short amount of time gave me ideas about things I can work on.

I'll be working on the dataset a bit more and will see if I can come up with something else later this week. One-on-one coffee talks are awesome!!

Chang
