Natural Language Processing and Sentiment Analysis
Alexander Gelbukh
Instituto Politécnico Nacional (México)
(gel...@gelbukh.com)
1 General Description
This seminar will introduce you to fundamental concepts of natural
language processing and sentiment analysis. The main topics covered
will be:
• Introduction to Natural Language Processing
- Python and NLTK
• Text Similarity Metrics
• Opinion Mining and Sentiment Analysis
- Current trends and implementation with NLTK
No previous knowledge of natural language processing is assumed;
therefore, this seminar is meant as an introduction, focusing mainly
on sentiment analysis / opinion mining.
A tentative list of topics to be covered can be found at the end of
this file.
2 Schedule
7th and 9th June, 2016.
From 15h to 19h.
A5-104.
3 Evaluation
Students will be asked to solve some exercises at home and
submit them by e-mail, under arrangements to be agreed with the lecturer.
--------------------------------------------------------------------
APPENDIX
Tentative list of contents.
1. Introduction to Natural Language Processing
1.1. Definition
1.1.1. Natural Language Processing
1.1.2. Text Analysis
1.1.3. Text Generation
1.1.4. Speech Analysis
1.1.5. Speech Generation
1.2. Text Analysis
1.2.1. Levels of analysis
1.2.2. Tokenizer
1.2.3. Stemmer and Lemmatizer
1.2.4. POS tagger
1.2.5. Parser
1.2.5.1. Constituency tree
1.2.5.2. Dependency tree
1.2.6. Beyond grammar
1.2.6.1. Named entity recognition
1.2.6.2. Anaphora resolution
1.2.6.3. Co-reference resolution
1.2.7. Correlation between complexity and quality
1.2.8. Real-life applications
1.2.9. Example Pipeline 1 (Opinion mining)
1.2.10. Example Pipeline 2 (Information retrieval)
1.3. Vector Space Model
1.3.1. Definition
1.3.2. Examples
1.3.3. Text representation
1.3.3.1. N-grams
1.3.3.1.1. Characters
1.3.3.1.2. Words
1.3.3.2. LSA
1.3.3.3. Syntactic n-grams
1.3.4. Weighting Schemes
1.3.4.1. Boolean
1.3.4.2. Term frequency
1.3.4.3. Inverse document frequency
1.4. Current trends
1.4.1. Heuristics
1.4.2. Linguistic-based methods
1.4.3. Machine learning
1.4.4. Deep learning
1.4.5. Everything goes together in real life
1.5. Useful resources
2. Introduction to Python and NLTK
2.1. Quick intro to Python
2.2. Reading files
2.3. Useful string functions
2.4. Regular expressions
2.5. Resources found in NLTK
2.6. Tokenizer
2.7. Lemmatizer
2.8. POS tagger
2.9. Parser
2.9.1. Constituency tree
2.9.2. Dependency tree
3. Text Similarity Metrics
3.1. Cosine similarity
3.2. Jaccard score
3.3. Soft Cosine similarity
3.4. WordNet metrics
3.4.1. Wu-Palmer
3.4.2. Resnik
3.4.3. Path distance
3.4.4. Lin
3.4.5. Leacock-Chodorov
3.4.6. Jiang-Conrath
4. Opinion Mining and Sentiment Analysis
4.1. Definition
4.2. Formal definition
4.3. Opinion holders
4.4. Measuring sentiments
4.5. Identifying the target and its aspects
4.6. Time is important too!
4.7. Current trends
4.8. Useful resources
4.9. Unsupervised method (PMI)
4.9.1. Let’s implement it with NLTK! (Programming assignment)
4.10. Supervised method
4.10.1. Let’s implement it with NLTK! (Programming assignment)
4.11. Brief review of the real deal
4.11.1. How did we do it in the past?
4.11.2. How does Netflix do it?
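Sections 1.3 and 3 of the outline cover the vector space model, term weighting and text similarity metrics. The following is a minimal pure-Python sketch, not the seminar's own material, of tf-idf weighting (term frequency times inverse document frequency) with cosine similarity, plus the Jaccard score. The whitespace tokenizer, the idf = log(N / df) variant and the toy sentences are assumptions made for illustration; NLTK is deliberately not used so the snippet runs without downloading any NLTK data.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch;
    # with NLTK one would typically use nltk.word_tokenize instead).
    return text.lower().split()


def jaccard(a_tokens, b_tokens):
    # Jaccard score: size of the intersection over size of the union
    # of the two sets of word types.
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tfidf_vectors(docs):
    # docs: list of token lists. Each vector maps term -> tf * idf, using
    # raw term frequency and idf = log(N / df) (one of several variants).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts:
    # dot(u, v) / (|u| * |v|), defined as 0.0 if either vector is all zeros.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy sentences, made up for illustration only.
    docs = [tokenize(s) for s in [
        "the movie was great",
        "the movie was awful",
        "a great film",
    ]]
    print(jaccard(docs[0], docs[1]))  # 3 shared types out of 5 -> 0.6
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))
```

Note that with this idf variant a term occurring in every document gets weight 0, so very common words stop contributing to the cosine score; Jaccard, by contrast, ignores frequencies and weights entirely.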
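Section 4.9 lists an unsupervised, PMI-based method. One common formulation, assumed here and not necessarily the exact one taught in the seminar, is Turney-style semantic orientation: SO(w) = PMI(w, "excellent") - PMI(w, "poor"), where PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below simplifies the estimation to document-level co-occurrence counts; the seed words, the smoothing constant and the toy reviews are all hypothetical choices for illustration.

```python
import math


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def pmi(word, seed, docs, smoothing=0.01):
    # Pointwise mutual information, log2(p(x, y) / (p(x) * p(y))), estimated
    # from document-level co-occurrence counts. The small additive
    # smoothing constant (hypothetical value) avoids log(0).
    n = len(docs)
    c_word = sum(1 for d in docs if word in d)
    c_seed = sum(1 for d in docs if seed in d)
    c_both = sum(1 for d in docs if word in d and seed in d)
    p_joint = (c_both + smoothing) / n
    p_word = (c_word + smoothing) / n
    p_seed = (c_seed + smoothing) / n
    return math.log2(p_joint / (p_word * p_seed))


def semantic_orientation(word, docs, pos_seed="excellent", neg_seed="poor"):
    # SO(w) = PMI(w, positive seed) - PMI(w, negative seed):
    # a value > 0 leans positive, < 0 leans negative.
    return pmi(word, pos_seed, docs) - pmi(word, neg_seed, docs)


if __name__ == "__main__":
    # Hypothetical, unlabeled toy reviews; each document is a set of words.
    reviews = [
        "excellent acting and a brilliant plot",
        "brilliant direction and an excellent cast",
        "poor script and boring scenes",
        "boring pacing and poor dialogue",
    ]
    docs = [set(tokenize(r)) for r in reviews]
    for w in ["brilliant", "boring"]:
        print(w, round(semantic_orientation(w, docs), 2))
```

No labels are needed: the only supervision comes from the two seed words, which is what makes the approach unsupervised in contrast to the supervised method of section 4.10.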