Natural Language Processing and Sentiment Analysis


TALP Research Center

May 30, 2016, 9:40:18 AM
to semina...@googlegroups.com

Natural Language Processing and Sentiment Analysis

Alexander Gelbukh
Instituto Politécnico Nacional (México)
(gel...@gelbukh.com)


1 General Description

This seminar will introduce you to fundamental concepts in natural language processing and sentiment analysis. The main topics covered will be:

• Introduction to Natural Language Processing
      - Python and NLTK
• Text Similarity Metrics
• Opinion Mining and Sentiment Analysis
      - Current trends and implementation with NLTK

No previous knowledge of natural language processing is assumed; the seminar is meant as an introduction, focusing mainly on sentiment analysis / opinion mining.
A tentative list of topics to be covered can be found in the appendix at the end of this message.

2 Schedule

7th and 9th June, 2016.
From 15h to 19h.
A5-104.

3 Evaluation

Students will be asked to solve some exercises at home and submit them by e-mail, as agreed with the lecturer.


--------------------------------------------------------------------
APPENDIX
Tentative list of contents.

1. Introduction to Natural Language Processing

1.1. Definition
1.1.1. Natural Language Processing
1.1.2. Text Analysis
1.1.3. Text Generation
1.1.4. Speech Analysis
1.1.5. Speech Generation

1.2. Text Analysis
1.2.1. Levels of analysis
1.2.2. Tokenizer
1.2.3. Stemmer and Lemmatizer
1.2.4. POS tagger
1.2.5. Parser
1.2.5.1. Constituency tree
1.2.5.2. Dependency tree
1.2.6. Beyond grammar
1.2.6.1. Named entity recognition
1.2.6.2. Anaphora resolution
1.2.6.3. Co-reference resolution
1.2.7. Correlation between complexity and quality
1.2.8. Real life applications
1.2.9. Example Pipeline 1 (Opinion mining)
1.2.10. Example Pipeline 2 (Information retrieval)

1.3. Vector Space Model
1.3.1. Definition
1.3.2. Examples
1.3.3. Text representation
1.3.3.1. N-grams
1.3.3.1.1. Characters
1.3.3.1.2. Words
1.3.3.2. LSA
1.3.3.3. Syntactic n-grams
1.3.4. Weighting Schemes
1.3.4.1. Boolean
1.3.4.2. Term frequency
1.3.4.3. Inverse document frequency

1.4. Current trends
1.4.1. Heuristics
1.4.2. Linguistic-based methods
1.4.3. Machine learning
1.4.4. Deep learning
1.4.5. Everything goes together in real life

1.5. Useful resources

2. Introduction to Python and NLTK
2.1. Quick intro to Python
2.2. Reading files
2.3. Useful string functions
2.4. Regular expressions
2.5. Resources found in NLTK
2.6. Tokenizer
2.7. Lemmatizer
2.8. POS tagger
2.9. Parser
2.9.1. Constituency tree
2.9.2. Dependency tree

3. Text Similarity Metrics
3.1. Cosine similarity
3.2. Jaccard score
3.3. Soft Cosine similarity
3.4. WordNet Metrics
3.4.1. Wu-Palmer
3.4.2. Resnik
3.4.3. Path distance
3.4.4. Lin
3.4.5. Leacock-Chodorow
3.4.6. Jiang-Conrath

4. Opinion Mining and Sentiment Analysis
4.1. Definition
4.2. Formal definition
4.3. Opinion holders
4.4. Measuring sentiments
4.5. Identifying the target and its aspects
4.6. Time is important too!
4.7. Current trends
4.8. Useful resources
4.9. Unsupervised method (PMI)
4.9.1. Let’s implement it with NLTK! (Programming assignment)
4.10. Supervised method
4.10.1. Let’s implement it with NLTK! (Programming assignment)
4.11. Brief review of the real deal
4.11.1. How did we do it in the past?
4.11.2. How does Netflix do it?