I had a fun discussion with Chang Lee about various Natural Language Processing topics.
The first thing I did was give Chang my copy of Taming Text. It's a quick read, and it really helped fill me in on the wide variety of NLP methods that exist (especially those close to search technology).
On the list of possible things to discuss:
- Entity extraction (the who, what, when, and where mentioned in text)
- The text processing that is used with search engines: bag of words, cosine similarity, TF*IDF
- Topic modeling (Latent Semantic Analysis, Alternating Least Squares, Latent Dirichlet Allocation)
- Hidden Markov Models
- Statistically significant strings
- Clustering (LSA vs Carrot2)
- Document summarization
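To make the search-engine item on that list a bit more concrete, here is a minimal sketch of bag of words, TF*IDF weighting, and cosine similarity in plain Python. This is not something we wrote during the session. The three example documents, the whitespace tokenizer, and the smoothed IDF formula are all my own assumptions for illustration, and real search engines use more refined variants.

```python
import math
from collections import Counter


def tokenize(text):
    # Bag of words: lowercase and split on whitespace, ignoring word order.
    return text.lower().split()


def tfidf_vectors(docs):
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(bags)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for bag in bags for term in bag)
    # Smoothed IDF (an assumed variant): rare terms get a higher weight,
    # and terms found in every document still keep a small nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]


def cosine(a, b):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up documents, purely for illustration.
docs = [
    "the spaceship jumped to warp speed",
    "the starship engaged its warp drive",
    "the wizard cast a spell",
]
vecs = tfidf_vectors(docs)
sim_scifi = cosine(vecs[0], vecs[1])
sim_fantasy = cosine(vecs[0], vecs[2])
```

The first two documents share the relatively rare term "warp", so `sim_scifi` comes out higher than `sim_fantasy`, where the only shared term is the very common "the".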
We spent the first several minutes talking about Markov models of text and then moved on to how Hidden Markov Models could be used to perform part-of-speech tagging. We then went into how search engines work for about 10 minutes.
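For readers who want to see the part-of-speech idea in code, here is a rough sketch of Viterbi decoding over a tiny Hidden Markov Model. It is only an illustration, not anything from our conversation: the three tags, every probability in the tables, and the `unseen` fallback value are numbers I made up, where a real tagger would estimate them from a tagged corpus.

```python
def viterbi(words, states, start_p, trans_p, emit_p, unseen=1e-6):
    # best[i][s] holds (probability of the best tag path that ends in
    # state s at word i, the previous state on that path).
    best = [{s: (start_p[s] * emit_p[s].get(words[0], unseen), None)
             for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score * emit_p[s].get(word, unseen), prev)
        best.append(column)
    # Start from the most probable final state and follow back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Toy model: all numbers below are invented for illustration.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "walks": 0.1},
    "VERB": {"walks": 0.6, "dog": 0.05},
}

tags = viterbi("the dog walks".split(), states, start_p, trans_p, emit_p)
```

The words are the observations and the tags are the hidden states. For longer sentences you would want to sum log probabilities instead of multiplying raw ones, since the products shrink toward zero quickly.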
Then, for the next hour, we did some work to index the SciFi Stack Exchange posts into Elasticsearch in order to build a k-Nearest Neighbors tagging algorithm. After an initial stalled attempt, we ended up with a tagger that at least seems like a good starting point.
See for yourself!
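If you want the gist of k-Nearest Neighbors tagging without setting up Elasticsearch, here is a minimal in-memory sketch. To be clear, this is not the code we wrote that evening. In our version Elasticsearch did the work of finding similar posts, while here a simple Jaccard word overlap stands in for its relevance scoring. The four posts, their tags, and the function names are hypothetical examples I invented for this illustration.

```python
from collections import Counter


def tokens(text):
    # Treat each post as a set of lowercase words.
    return set(text.lower().split())


def jaccard(a, b):
    # Overlap between two word sets: |intersection| / |union|.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_tags(query, tagged_posts, k=3, top_n=2):
    query_tokens = tokens(query)
    # Score every post against the query and keep the k most similar.
    neighbors = sorted(
        ((jaccard(query_tokens, tokens(text)), tags)
         for text, tags in tagged_posts),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Each neighbor votes for its own tags, weighted by its similarity.
    votes = Counter()
    for score, tags in neighbors:
        for tag in tags:
            votes[tag] += score
    return [tag for tag, weight in votes.most_common(top_n) if weight > 0]


# Hypothetical already-tagged posts, invented for this example.
posts = [
    ("who built the death star in star wars", ["star-wars"]),
    ("why did vader turn to the dark side in star wars",
     ["star-wars", "darth-vader"]),
    ("how does warp drive work in star trek", ["star-trek"]),
    ("what powers the enterprise in star trek", ["star-trek", "technology"]),
]

suggested = knn_tags("how was the death star destroyed in star wars", posts)
```

The untagged question borrows its tags from the posts that look most like it, so the quality of the tagger depends almost entirely on how good the similarity measure is. That is the part a search engine handles well.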
---------------------------------------
I look forward to someone else asking me to do this again. Even when I'm driving the conversation, I always learn something new and interesting. And, as Chang said, I always meet such interesting people.
¢¢
John