Dear Friends,
Welcome to the Learning July 2026!
So far, we have explored Supervised Machine Learning for Text Analysis, where the dependent variable (DV) is known and guides the learning process. Now, it's time to move on to Unsupervised Machine Learning for Text Analysis, where no dependent variable is predefined. Instead, the goal is to uncover hidden patterns, structures, and relationships within textual data.
In this session, we'll begin by exploring the fascinating world of semantic vector spaces and develop an intuition for how machines learn and represent relationships between words.
Before we begin, let's understand two important terms:
- A vector is simply a list of numbers that represents an object. For example, the word "King" might be represented as [0.82, -0.15, 0.64, ...], while "Queen" has its own unique numeric representation.
- A vector space is a geometric space where these vectors are placed. Words with similar meanings, such as King–Queen or Doctor–Nurse, tend to appear closer together than unrelated words like King–Banana.
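To make these two terms concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented for illustration (real semantic spaces typically have hundreds of dimensions and are learned from text), and the names `toy_space` and `cosine_similarity` are hypothetical helpers. Closeness is measured with cosine similarity, one common choice that appears in the topic list for this session.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-made vectors, NOT taken from a real semantic space.
# The dimensions loosely mean: [royalty, person, fruit].
toy_space = {
    "king":   [0.90, 0.80, 0.00],
    "queen":  [0.85, 0.90, 0.05],
    "banana": [0.00, 0.10, 0.95],
}

king_queen = cosine_similarity(toy_space["king"], toy_space["queen"])
king_banana = cosine_similarity(toy_space["king"], toy_space["banana"])

print(f"king vs queen:  {king_queen:.2f}")
print(f"king vs banana: {king_banana:.2f}")
```

With these made-up numbers, King and Queen point in almost the same direction, while King and Banana are nearly perpendicular, which is the geometric sense in which related words sit "closer together".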
Some of the topics we'll cover include:
- The intuition behind semantic vector spaces: Learn how words are converted into vectors so that mathematical operations can capture semantic meaning.
- The famous vector algebra example: King − Man + Woman ≈ Queen. We'll see how adding and subtracting vectors in a high-dimensional space lets machines recover meaningful relationships and analogies using simple arithmetic.
- Navigating semantic spaces with cosine similarity: Learn how cosine similarity measures the angle between word vectors to determine how conceptually similar two words are, regardless of the vectors' lengths (magnitudes).
- Building vs. importing semantic spaces: Explore the two primary approaches for obtaining semantic vector spaces for your projects:
  - Training your own semantic space: Generate a domain-specific semantic space by applying Latent Semantic Analysis (LSA) to your own text corpus.
  - Using pre-trained semantic spaces: Leverage large, publicly available semantic spaces trained on billions of words from trusted repositories, such as the Semantic Spaces page on the Homepage of Fritz Günther.
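The vector algebra and cosine similarity topics above can be tried out on a toy scale. The sketch below uses hand-made four-dimensional vectors that are purely illustrative (they were chosen so the arithmetic works out cleanly, which real learned vectors only do approximately), and `analogy` is a hypothetical helper, not a function from any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, invented for this example.
# Dimensions loosely mean: [royalty, masculinity, personhood, fruitiness].
toy_space = {
    "king":   [0.9,  0.8, 0.7, 0.0],
    "queen":  [0.9, -0.8, 0.7, 0.0],
    "man":    [0.1,  0.8, 0.9, 0.0],
    "woman":  [0.1, -0.8, 0.9, 0.0],
    "banana": [0.0,  0.0, 0.0, 1.0],
    "apple":  [0.0,  0.0, 0.0, 0.9],
}

def analogy(a, b, c, space):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(space[a], space[b], space[c])]
    candidates = [word for word in space if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine_similarity(target, space[word]))

answer = analogy("king", "man", "woman", toy_space)
print(answer)  # prints "queen" for these toy vectors

# Cosine similarity depends on direction only: doubling a vector's length
# leaves its similarity to the original at (approximately) 1.0.
doubled_king = [2 * x for x in toy_space["king"]]
same_direction = cosine_similarity(toy_space["king"], doubled_king)
print(f"{same_direction:.2f}")
```

Excluding the three input words from the candidates is a common convention in analogy demonstrations; without it, the nearest neighbor of the result is often one of the inputs themselves.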
I've explained these concepts in detail in the following video:
Happy learning
Neeraj