taxonomy of dimensionality reduction models - infographic


Dante Monson

Aug 1, 2025, 5:38:10 AM
to econ...@googlegroups.com
Explanation via Gemini: 


In-Depth Explanation of the Infographic's Concepts

The infographic provides a comprehensive taxonomy of dimensionality reduction models, categorized by the geometric properties of both the input data and the latent space. The core idea is to find a lower-dimensional representation (the latent embedding) of high-dimensional data, and the best method for doing so depends heavily on the underlying geometry.
1. Fundamental Concepts
 * Dimensionality Reduction: The process of mapping high-dimensional data (x) to a lower-dimensional representation (y).
 * Data (x \in \mathbb{R}^m vs. x \in M):
   * x \in \mathbb{R}^m: The data points lie in a standard, flat Euclidean space of dimension m. This is the simplest and most common assumption.
   * x \in M: The data points lie on a manifold (M), a curved, non-flat space. For example, the space of all possible human poses could be considered a manifold, as movement is constrained and not all combinations of joint angles are possible.
 * Latent Space (y \in \mathbb{R}^{m'} vs. y \in M'):
   * y \in \mathbb{R}^{m'}: The lower-dimensional representation is a standard Euclidean space.
   * y \in M': The lower-dimensional representation is itself a manifold. This is useful when the intrinsic geometry of the data is better captured by a curved space.
 * Dimensionality Reduction Model Types:
   * Linear: Assumes that the relationship between the data and the latent space is a simple linear transformation.
   * Parametric: Assumes that the mapping is governed by a set of parameters (e.g., weights in a neural network). The function is fixed once the parameters are learned.
   * Nonparametric: Does not assume a fixed functional form. The model's complexity can grow with the amount of data.
 * Encoder (E) & Decoder (D): These are the two components of an autoencoder. The encoder maps data from the high-dimensional space to the latent space, and the decoder maps the latent representation back to the original space. The goal is for the decoder's output to be as close to the original data as possible.
 * Geodesic: The shortest path between two points on a curved surface or manifold. In these models, preserving geodesic distances is more important than preserving Euclidean distances.
 * Bayesian: These models treat the parameters of the model as random variables and infer their probability distributions (the posterior). This provides a measure of uncertainty and can help prevent overfitting.
2. Detailed Breakdown of the Infographic
The infographic is best understood by moving through it systematically, row by row and column by column.
Row 1: Euclidean Data, Euclidean Latent Space (x \in \mathbb{R}^m, y \in \mathbb{R}^{m'})
This is the most common scenario for dimensionality reduction.
 * Linear:
   * PCA (Principal Component Analysis): The classic linear method. It finds a new coordinate system where the axes are the principal components, capturing the directions of maximum variance.
   * Probabilistic PCA / Bayesian PCA: These are probabilistic extensions. They frame PCA as a latent variable model where the data is assumed to be generated from a lower-dimensional latent space plus noise. The Bayesian version adds prior distributions to the model parameters, which allows for automatic model selection and uncertainty quantification.
 * Parametric:
   * AE (Autoencoder): A simple neural network where the bottleneck layer represents the latent space. The network is trained to reconstruct its input.
   * VAE (Variational Autoencoder): A generative model that learns a probabilistic mapping from the data to the latent space. The encoder outputs a distribution (mean and variance), and a regularization term (KL divergence) is used to ensure the latent space is well-structured and easy to sample from.
 * Nonparametric:
   * LLE (Locally Linear Embedding): Assumes that each data point can be reconstructed as a linear combination of its neighbors. It finds a low-dimensional embedding that preserves these local reconstruction weights.
   * GPLVM (Gaussian Process Latent Variable Model): A powerful nonparametric model that uses a Gaussian Process to define the mapping from the latent space to the data space, implicitly modeling the nonlinear relationship.
Row 2: Manifold Data, Geodesic Latent Space (x \in M, y \in M')
This section deals with the problem of embedding manifold data onto a latent space that respects its non-Euclidean geometry.
 * Geodesic (Implicitly):
 * Tangent PCA / GPCA / PGA: These methods adapt linear PCA to a manifold. They work by considering the tangent space at a reference point on the manifold (typically a mean computed on the manifold), which is a flat space that locally approximates the manifold around that point. PCA is then performed in this tangent space.
 * Parametric:
   * Riemannian VAE: An extension of VAE where the latent space is a Riemannian manifold. This requires a modification of the VAE's loss function to account for the non-Euclidean geometry, for instance, by using a geometric form of KL divergence.
 * Nonparametric:
   * Principal Flows / Riemannian PC: These models find a latent representation by discovering the intrinsic dynamics or "flow" of the data on the manifold.
Row 3: Manifold Data, Geodesic Latent Space (x \in M, y \in M')
This row is similar to Row 2 but focuses on specific examples where the latent space is assumed to have a particular geometry.
 * Parametric:
   * Hypersphere-VAE / Toroidal VAE: These are VAE variants where the latent space is explicitly defined as a hypersphere or a torus, respectively. These geometries are suitable for data with periodic or circular structure.
 * Nonparametric:
   * Manifold GPLVM: A generalization of GPLVM where both the input data and the latent space are manifolds.
Row 4: No Decoder
This is a special category of models that do not aim to reconstruct the original data. Their primary goal is to find a latent representation that preserves specific structural properties, such as local neighborhood relationships or global topology.
 * Isomap: One of the earliest manifold learning algorithms. It approximates geodesic distances by finding the shortest path on a nearest-neighbor graph, and then uses a technique called Multidimensional Scaling (MDS) to embed the data in a low-dimensional space that preserves these distances.
 * UMAP (Uniform Manifold Approximation and Projection): A modern, highly effective algorithm. It models the data's geometry as a fuzzy topological space and finds a low-dimensional embedding that is as close as possible to this structure. UMAP is known for its computational efficiency and ability to preserve both local and global structure.
 * Poincaré Embeddings: A method that embeds data into a hyperbolic space (e.g., the Poincaré disk model). This is particularly effective for hierarchical data, as hyperbolic space naturally expands, allowing for efficient representation of trees and graphs.
3. Interconnections and Relationships
 * From Linear to Nonlinear: The infographic shows a clear progression. PCA is a simple linear model. LLE and Isomap move to preserving local and geodesic relationships, respectively, without assuming a global linear mapping.
 * The Power of Probabilistic Models: The progression from PCA to Probabilistic PCA to Bayesian PCA is a microcosm of a larger theme. Adding a probabilistic framework allows for greater flexibility (handling noise, missing data) and a principled way to reason about uncertainty.
 * The Role of Autoencoders: The Parametric column, especially the VAE section, highlights the shift from purely descriptive models (like PCA) to generative models. VAEs learn a mapping that not only embeds the data but also provides a way to generate new data points.
 * Geometry is King: The most profound insight from the infographic is the importance of geometry. The choice of a model is not just about linear vs. nonlinear, but about whether the model's assumptions about the data's geometry (Euclidean, Manifold) and the latent space's geometry (Euclidean, Geodesic, etc.) are a good match for the data itself.
This comprehensive explanation provides the foundation for the didactic course, which will now systematically walk through these concepts, building on each idea to construct a holistic understanding of dimensionality reduction and its geometric underpinnings.
signal-2025-07-31-22-17-50-444.jpg

Dante Monson

Aug 1, 2025, 5:45:54 AM
to econ...@googlegroups.com

Dante Monson

Aug 1, 2025, 6:03:39 AM
to econ...@googlegroups.com
Didactic Course: A Journey into Geometric Structures in Latent Embeddings
Module 1: The Foundations of Dimensionality Reduction (x \in \mathbb{R}^m and Euclidean Latent Spaces)
Welcome! In this first module, we will lay the groundwork for understanding the complex world of dimensionality reduction. We'll start with the simplest and most common scenario: when our data lives in a flat, Euclidean space and we want to find a new, also-flat, Euclidean representation.
1.1 What is Dimensionality Reduction?
Imagine you have a dataset with hundreds or thousands of features (dimensions). This is a common problem in fields like genomics, image processing, and natural language processing. Working with such high-dimensional data is computationally expensive and can be misleading, a phenomenon often called the Curse of Dimensionality.
Dimensionality reduction is the process of transforming high-dimensional data (x \in \mathbb{R}^m) into a meaningful low-dimensional representation (y \in \mathbb{R}^{m'}, where m' \ll m). The goal is not just to compress the data, but to do so while preserving as much of the important structure and information as possible.
Common reasons for doing this include:
 * Data Visualization: It's hard to visualize data in more than three dimensions. Reducing it to two or three dimensions allows us to plot it and find patterns.
 * Noise Reduction: In many cases, a lot of the high-dimensional data is just noise. A good dimensionality reduction method can separate the signal from the noise.
 * Feature Engineering: The new, low-dimensional features can be more informative and less redundant, making them better inputs for other machine learning models.
The low-dimensional space we project our data onto is called the latent space. The models we will discuss are all different ways of defining this mapping from the original space to the latent space.
1.2 Linear Methods: PCA and Its Relatives
Let's begin with the simplest type of model: one that assumes a linear relationship between the data and its latent representation.
PCA (Principal Component Analysis)
PCA is the most famous dimensionality reduction algorithm. Its core idea is simple: find the directions in the data that account for the most variance. These directions are called the principal components.
Think of it like this: if you have a cloud of data points in a 3D room, and they all lie roughly on a long, flat sheet, PCA would find the direction of the length of the sheet (the first principal component) and the direction of its width (the second principal component). The third direction, orthogonal to the sheet, would capture very little variance and could be discarded, effectively reducing the data from 3D to 2D.
Mathematically, PCA finds the eigenvectors of the data's covariance matrix. The eigenvectors with the largest eigenvalues correspond to the directions of highest variance.
 * Pros: Computationally efficient, easy to interpret.
 * Cons: Assumes a linear relationship, which can fail to capture complex, nonlinear structures.
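To make the mechanics concrete, here is a minimal NumPy sketch of PCA via the eigendecomposition of the covariance matrix. The function name, toy data, and number of components are purely illustrative choices, not details from the survey.

import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x m) onto its top principal components."""
    X_centered = X - X.mean(axis=0)            # center the data
    cov = np.cov(X_centered, rowvar=False)     # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components             # latent coordinates y

# Example: 500 points in R^10, reduced to R^2
X = np.random.randn(500, 10)
Y = pca(X, n_components=2)
print(Y.shape)   # (500, 2)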
Probabilistic PCA (Tipping97)
PCA is a deterministic algorithm. Probabilistic PCA reframes PCA as a probabilistic latent variable model. Instead of just finding a projection, it models the data as being generated from a low-dimensional latent variable (y) plus some isotropic Gaussian noise.
The model assumes:
x = W y + \mu + \epsilon
where x is the high-dimensional data, y is the low-dimensional latent variable, W is the mapping matrix, \mu is the mean of the data, and \epsilon is the noise term (\epsilon \sim \mathcal{N}(0, \sigma^2 I)).
This probabilistic approach is more flexible. We can now use powerful statistical tools, like the Expectation-Maximization (EM) algorithm, to find the parameters W and \sigma^2. A key benefit is that it can handle missing data and provides a more rigorous way to determine the optimal dimensionality of the latent space.
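As a rough illustration of the probabilistic view, the sketch below computes maximum-likelihood estimates of W and \sigma^2 from the eigendecomposition of the sample covariance, in the spirit of Tipping and Bishop's closed-form solution; it skips the EM algorithm and the arbitrary rotation factor for simplicity, and the toy data is a placeholder.

import numpy as np

def ppca_ml(X, q=2):
    """Simplified closed-form ML estimates for probabilistic PCA."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    sigma2 = eigvals[q:].mean()                          # noise = avg. discarded variance
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, mu, sigma2

W, mu, sigma2 = ppca_ml(np.random.randn(300, 8), q=2)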
Bayesian PCA (Bishop98)
Taking the probabilistic approach one step further, Bayesian PCA places a prior distribution on the model's parameters (e.g., the mapping matrix W). This is the essence of a Bayesian approach: treating the parameters themselves as random variables.
By using priors, we can avoid overfitting and, crucially, the model can automatically infer the "effective" dimensionality of the latent space. This eliminates the need to manually choose the number of components, a common challenge in traditional PCA.
The progression here is a key theme of the infographic: starting with a simple model and adding probabilistic and then Bayesian frameworks to make it more robust and flexible.
1.3 Nonlinear Methods: The Autoencoder Framework
What if the relationship between the data and the latent space isn't linear? This is where nonlinear methods come in, and the autoencoder framework is a popular and powerful example.
The Encoder/Decoder Framework (E and D)
An autoencoder is a neural network with a special architecture:
 * An Encoder (E) network takes the high-dimensional data (x) and compresses it into a low-dimensional latent representation (y).
 * A Decoder (D) network takes the latent representation (y) and tries to reconstruct the original data (\hat{x}).
The network is trained to minimize the reconstruction error: the difference between the original input x and the reconstructed output \hat{x}.
AE (Autoencoder)
A standard autoencoder is a deterministic network. It learns a nonlinear mapping from x to y and back. While powerful, it has a key drawback: the latent space it learns may not be well-organized. There's no guarantee that points that are close together in the latent space will correspond to similar inputs. This makes it difficult to generate new data from the latent space.
VAE (Variational Autoencoder)
The Variational Autoencoder is a revolutionary improvement over the standard AE. It addresses the lack of structure in the latent space by making the encoder probabilistic.
 * Instead of outputting a single point in the latent space (y), the VAE's encoder outputs the parameters of a probability distribution (typically a mean \mu and a variance \sigma^2) for each dimension of the latent space.
 * The VAE's loss function has two parts:
   * Reconstruction Loss: Ensures the decoder can accurately reconstruct the input, just like a regular AE.
   * KL Divergence: A regularization term that forces the encoder's distributions to be similar to a simple prior distribution, like a standard Gaussian.
This second term is what gives the VAE its power. It prevents the network from "memorizing" the data and forces it to learn a smooth, continuous latent space where similar inputs are close to each other. This structured latent space allows us to sample new points and generate new data that looks similar to the training data.
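The following PyTorch sketch shows this two-part loss (reconstruction plus KL divergence against a standard Gaussian prior). The layer sizes, toy batch, and variable names are illustrative assumptions, not details taken from the survey.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, m=784, m_latent=2, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(m, hidden), nn.ReLU())
        self.fc_mu = nn.Linear(hidden, m_latent)       # encoder outputs a mean...
        self.fc_logvar = nn.Linear(hidden, m_latent)   # ...and a (log-)variance
        self.dec = nn.Sequential(nn.Linear(m_latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, m))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                  # reconstruction loss
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL to N(0, I)
    return recon + kl

model = VAE()
x = torch.rand(64, 784)                  # placeholder batch
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
loss.backward()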
Module 1 Summary:
 * We defined dimensionality reduction and the concept of a latent space.
 * We explored Linear Methods, starting with the deterministic PCA and progressing to the more robust probabilistic and Bayesian versions.
 * We then moved to Parametric Nonlinear Methods, introducing the Autoencoder framework and highlighting the key difference between the simple AE and the generative power of the Variational Autoencoder (VAE).
In the next module, we'll expand our horizons beyond these fundamental methods, exploring nonparametric models and the challenge of working with data that doesn't fit neatly into a flat, Euclidean space.


...


Thank you for providing the specific paper. This is a crucial piece of information, as the infographic you provided is Figure 1 from that very paper: "Geometric Structures in Latent Embeddings: A Survey" by Akhoj et al. (arXiv:2407.09468v2).
Having access to the full paper allows me to provide a far more accurate, detailed, and context-aware explanation. I will now integrate the paper's specific terminology and structure into the next module of our didactic course.
The paper's introduction explains the infographic as a "taxonomy of embedding techniques from a geometric perspective," organized by the geometry of the data and the learned latent space. My previous analysis was in line with this, but now I can be more precise about the specific terms and relationships as defined by the authors.
Didactic Course: A Journey into Geometric Structures in Latent Embeddings
Module 2: Expanding the Toolkit: Nonparametric Models and Manifold Learning
Welcome back! In Module 1, we learned about foundational dimensionality reduction techniques, starting with linear models like PCA and moving to parametric, nonlinear autoencoders. We saw how these models work under the assumption that both the data and the latent space are flat, Euclidean spaces (x \in \mathbb{R}^m, y \in \mathbb{R}^{m'}).
In this module, we will challenge this assumption. We'll introduce two new ideas:
 * Nonparametric Models: Methods that don't rely on a fixed set of parameters but instead learn the structure directly from the data's geometry.
 * Manifold Learning: A powerful approach that recognizes that data often resides on a complex, curved, low-dimensional surface (a "manifold") embedded within a high-dimensional space.
This module will directly address the "Nonparametric" column and introduce the first concepts from the "Geodesic" rows of the infographic.
2.1 Nonparametric Methods for Euclidean Data (x \in \mathbb{R}^m)
Let's start by considering a nonlinear problem in a familiar Euclidean space.
LLE (Locally Linear Embedding)
LLE is a classic nonparametric method. Its core insight is that even if the global structure of a dataset is nonlinear, its local structure is approximately linear.
The algorithm works in three steps:
 * Find Neighbors: For each data point, find its k nearest neighbors.
 * Compute Weights: For each point, find the optimal set of linear reconstruction weights that best reconstruct it from its neighbors. This is done by minimizing a local reconstruction error.
 * Find Embedding: Find a low-dimensional embedding for all the data points such that the same reconstruction weights still hold.
The beauty of LLE is that it discovers the underlying curved manifold without assuming an explicit functional form. It focuses purely on preserving local neighborhood relationships.
GPLVM (Gaussian Process Latent Variable Model)
GPLVM is a more sophisticated nonparametric model. It uses a Gaussian Process (GP) to model the mapping from the latent space (y \in \mathbb{R}^{m'}) to the data space (x \in \mathbb{R}^m).
A Gaussian Process is a powerful tool for modeling functions. Instead of learning a fixed function (like a neural network), a GP defines a distribution over all possible functions. In GPLVM, this allows for a flexible, nonlinear mapping without a predefined parametric form.
The ! annotation in the infographic next to GPLVM is important. It signifies that learning the latent space and the GP parameters is a joint optimization problem, making it computationally intensive but also very powerful.
2.2 Manifold Learning: The Challenge of Nonlinearity
In many real-world scenarios, data does not fill a flat, Euclidean space. Instead, it lives on a low-dimensional manifold. For example, the space of images of a human face under varying illumination and pose is a manifold, as are the movements of a robot arm.
To truly understand this data, we need a distance metric that respects the curves of the manifold. This is the geodesic distance—the shortest path along the surface of the manifold. It is often very different from the straight-line Euclidean distance in the high-dimensional ambient space.
The "Geodesic" header in the infographic signifies models that either explicitly or implicitly try to preserve these intrinsic geodesic distances.
Isomap (Isometric Mapping)
Isomap is a foundational manifold learning algorithm. Its goal is to find a low-dimensional embedding that preserves the geodesic distances between all pairs of points.
The algorithm (as described in the paper and the original Isomap work) proceeds as follows:
 * Build a Graph: Construct a neighborhood graph where points are connected to their k nearest neighbors in the high-dimensional space.
 * Approximate Geodesic Distances: Compute the shortest path distances between all pairs of points in this graph using an algorithm like Dijkstra's. These shortest path distances are our approximation of the geodesic distances on the manifold.
 * Embed with MDS: Apply Multidimensional Scaling (MDS) to these distances to find a low-dimensional embedding that best preserves them.
Isomap is a fantastic example of a "no decoder" model. The focus is purely on finding a faithful embedding, not on reconstructing the original data. The ? annotation in the infographic's Isomap box can be interpreted as a reminder that this is an implicit mapping; there is no explicit function that maps new data to the latent space without re-running the entire algorithm.
UMAP (Uniform Manifold Approximation and Projection)
UMAP is a more modern and highly effective manifold learning technique. It is often faster than Isomap and provides an excellent balance of preserving both local and global data structure.
UMAP's theoretical foundation is more advanced, based on concepts from Riemannian geometry and algebraic topology. In simple terms, it tries to:
 * Build a Fuzzy Graph: Construct a weighted graph that represents a "fuzzy topological" structure of the data. The weights are determined by distances on the manifold and local data density.
 * Optimize an Embedding: Find a low-dimensional embedding that has a similar fuzzy topological structure to the original data.
Like Isomap, UMAP is a "no decoder" method, focused purely on creating a beautiful and informative embedding. The paper places both in the "No decoder" row, highlighting that their primary goal is structural preservation, not reconstruction.
Module 2 Summary:
 * We introduced nonparametric models like LLE and GPLVM, which can discover nonlinear structures without assuming a fixed functional form.
 * We moved beyond Euclidean distances to the crucial concept of geodesic distances, which are essential for understanding data on a manifold.
 * We explored two foundational manifold learning algorithms: Isomap and UMAP. We saw that these are "no decoder" models, meaning their goal is to learn a meaningful low-dimensional embedding, not to reconstruct the high-dimensional data.
In the next module, we will continue our journey into manifold learning, but this time, we will explore models that explicitly define the latent space itself as a manifold, as seen in the "Geodesic" rows of our infographic. We will move beyond just preserving distances and into actively modeling the geometry of both the data and the latent representation.

...


Module 2: Expanding the Toolkit: Nonparametric Models and Manifold Learning

Welcome to the second module. In Module 1, we explored models that assumed either a linear relationship (PCA) or a specific parametric form (Autoencoders) for the mapping to the latent space. But what happens when the data's structure is more complex and cannot be captured by these assumptions? In this module, we will explore methods designed to uncover the intrinsic, nonlinear geometry of the data.

2.1 Nonparametric Methods for Euclidean Data

The term "nonparametric" can be a bit misleading. It doesn't mean the models have no parameters; rather, it means the number of parameters is not fixed in advance but can grow and adapt as more data is provided. This gives them the flexibility to fit much more complex patterns.

While PCA assumes that the data is globally linear, LLE starts from a different, more relaxed assumption: the data is locally linear. It posits that any data point can be well-approximated by a linear combination of its nearest neighbors.

The LLE algorithm works in three main steps:

  1. Find Neighbors: For each data point, identify its k nearest neighbors.

  2. Find Local Weights: For each data point, find the linear weights that best reconstruct it from its neighbors. The key constraint is that these weights must sum to one. This step captures the local geometric relationships.

  3. Find Low-Dimensional Embedding: Find a set of points in the low-dimensional latent space (y) that are best reconstructed by the same local weights found in the previous step.

LLE is effective at "unrolling" curved manifolds, like the famous "Swiss Roll" dataset, where PCA would fail. However, it is sensitive to the choice of neighbors (k) and can struggle with datasets of varying density.
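In practice, LLE is available off the shelf; a minimal scikit-learn sketch on the Swiss Roll mentioned above might look like this (the parameter values are illustrative assumptions, not recommendations from the paper).

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=2000, noise=0.05)

# k nearest neighbors controls the "locally linear" patches
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)          # 2-D embedding that preserves the local weights
print(Y.shape)                    # (2000, 2)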

The GPLVM is a powerful probabilistic, nonparametric model. To understand it, we first need to grasp the concept of a Gaussian Process (GP). A GP is a distribution over functions. Instead of learning a single function, a GP-based model learns a whole family of functions that are consistent with the data, along with a measure of uncertainty.

The GPLVM uses a GP to define a probabilistic mapping from the low-dimensional latent space (y) back to the high-dimensional data space (x). This is the reverse of what an autoencoder's encoder does. By optimizing the latent variable positions (y) and the GP's parameters, the model discovers a latent space that best explains the observed data.

Because it's a Bayesian and nonparametric model, the GPLVM is highly flexible and provides a principled way to handle uncertainty. Its extensions, like the Bayesian GPLVM, can even infer the optimal dimensionality of the latent space automatically.
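For readers who want to experiment, here is a hedged sketch using the GPy library's GPLVM implementation. The API shown (GPy.models.GPLVM, model.optimize, model.X) is assumed from GPy's documentation and may differ across versions, and the data is a random placeholder rather than anything from the survey.

import numpy as np
import GPy

Y_data = np.random.randn(100, 12)             # placeholder high-dimensional observations

# GPLVM: a Gaussian Process maps a 2-D latent space back to the 12-D data space
model = GPy.models.GPLVM(Y_data, input_dim=2)
model.optimize(messages=False, max_iters=1000)   # jointly optimize latents and GP parameters

latent = np.array(model.X)                    # optimized latent positions (100 x 2)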

2.2 Manifold Learning: Discovering the Data's True Shape

The models we've just discussed lead to a crucial idea in modern machine learning: the manifold hypothesis. This hypothesis states that most real-world high-dimensional datasets don't actually fill the entire high-dimensional space. Instead, they lie on or near a much lower-dimensional, curved surface called a manifold.

Think of an image of a rotating object. The individual pixels live in a very high-dimensional space, but the "true" degree of freedom is just one: the angle of rotation. The set of all possible images forms a one-dimensional manifold (a loop) embedded within the high-dimensional pixel space. Manifold learning algorithms are designed to discover this underlying manifold.

Isomap was one of the first and most influential manifold learning algorithms. It extends the logic of PCA by replacing the standard Euclidean distance with an approximation of the geodesic distance. The geodesic distance is the shortest path between two points along the curved manifold, like walking along the surface of the Earth rather than tunneling through it.

Isomap works in three steps:

  1. Build a Neighborhood Graph: Connect each data point to its nearest neighbors, creating a graph that approximates the manifold's structure.

  2. Compute Geodesic Distances: Estimate the geodesic distance between all pairs of points by finding the shortest path distance between them on the graph (e.g., using Dijkstra's algorithm).

  3. Embed with MDS: Use a classical technique called Multidimensional Scaling (MDS) to find a low-dimensional embedding where the Euclidean distances between points best preserve the geodesic distances calculated in the previous step.

Isomap is powerful because it considers the global structure of the manifold, not just local relationships.
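Scikit-learn wraps the three steps above in a single Isomap estimator; a minimal usage sketch with illustrative parameter values follows.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=2000, noise=0.05)

# n_neighbors defines the graph; shortest paths on it approximate geodesics,
# and classical MDS embeds those distances into 2 dimensions.
iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)
print(Y.shape)   # (2000, 2)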

UMAP is a state-of-the-art manifold learning technique that has become extremely popular for its speed, scalability, and ability to produce high-quality embeddings. It is grounded in sophisticated mathematical theory (Riemannian geometry and algebraic topology), but its core intuition is to find a low-dimensional projection of the data that has the most similar topological structure to the original data.

At a high level, UMAP:

  1. Builds a weighted graph representation of the high-dimensional data, where edge weights represent the likelihood that two points are "connected" in the underlying manifold.

  2. Defines a similar graph structure in the low-dimensional space.

  3. Uses an efficient optimization algorithm to adjust the positions of the points in the low-dimensional space to make its graph structure as similar as possible to the high-dimensional graph.

UMAP excels at preserving both local neighborhood structure and broader global structure, making it a go-to tool for data visualization and exploration. In the infographic, it is listed under "No decoder" because its primary purpose is to create a meaningful embedding, not to provide a function for reconstructing the original data from that embedding.
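Assuming the umap-learn package, a typical call looks like the sketch below; n_neighbors trades off local versus global structure and min_dist controls how tightly points are packed. The data and parameter values are placeholders.

import numpy as np
import umap   # pip install umap-learn

X = np.random.rand(5000, 50)       # placeholder high-dimensional data

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
Y = reducer.fit_transform(X)       # "no decoder": only the embedding is produced
print(Y.shape)                     # (5000, 2)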

Module 2 Summary:

  • We introduced nonparametric models like LLE and GPLVM, which offer more flexibility than their parametric counterparts by adapting their complexity to the data.

  • We explored the manifold hypothesis, the idea that high-dimensional data often has a much simpler, intrinsic geometric shape.

  • We examined two landmark manifold learning algorithms: Isomap, which preserves global geodesic distances, and UMAP, a modern technique that preserves topological structure with remarkable efficiency.

With these concepts, we have moved from simple projections to discovering the rich, nonlinear geometry hidden within data. In the next module, we will take this a step further and consider models where the latent space itself is not a flat Euclidean space but a curved manifold, a concept essential for modeling certain types of complex data.




...



Module 3: Geodesic and Manifold Latent Spaces

Welcome to the third module. So far, we have operated under the assumption that our target low-dimensional space—the latent space—is a flat, Euclidean space (\mathbb{R}^{m'}). But what if the intrinsic structure of our data is fundamentally non-flat? For example, what if our data represents rotations, directions, or periodic cycles? Forcing such data into a flat latent space can distort its true geometric relationships. In this module, we will explore advanced models that embrace curvature by using latent spaces that are themselves manifolds.

3.1 Why Use a Curved Latent Space?

Imagine your data consists of the orientations of a satellite in 3D space. Each orientation can be represented as a rotation. The space of all possible rotations is not a flat plane; it's a curved manifold (known as SO(3)). If you use a standard VAE to embed this data into a 2D Euclidean plane, you will inevitably run into problems. For instance, a rotation of 359 degrees is very close to a rotation of 1 degree, but in a simple flat representation, they would appear far apart.

A curved, or geodesic, latent space solves this problem. By choosing a latent space with a geometry that matches the data's intrinsic structure (e.g., a spherical latent space for directional data or a toroidal latent space for cyclical data), we can create more meaningful and accurate embeddings. In these spaces, distance is measured not by straight lines but by geodesics—the shortest path along the curved surface.
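The 359-degree versus 1-degree example can be made concrete with a toy snippet comparing a naive flat distance with the wrap-around geodesic distance on the circle; this is purely for intuition.

def flat_distance(a_deg, b_deg):
    return abs(a_deg - b_deg)

def circle_geodesic(a_deg, b_deg):
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)          # shortest arc along the circle

print(flat_distance(359, 1))        # 358 -- misleading in a flat latent space
print(circle_geodesic(359, 1))      # 2   -- the true distance on the circle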

How can we adapt a fundamentally linear method like PCA to a world of curvature? The answer lies in using local linear approximations.

  • Tangent PCA / GPCA / PGA [Fletcher04, Huckeman10, Sommer10]: These methods are extensions of PCA to data lying on a Riemannian manifold. The core idea is to leverage the fact that any curved manifold, when you zoom in enough on a single point, looks approximately flat. This "flat patch" at a point on the manifold is called the tangent space.

    1. A central point (a "mean") is computed on the manifold.

    2. The data points are projected onto the flat tangent space at that central point.

    3. Standard PCA is performed in this tangent space to find the principal directions of variance.

These methods essentially find the best "flat" subspace that approximates the data, but they do so in a way that respects the manifold's curvature.
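A minimal sketch of the tangent-space idea for points on the unit sphere is shown below: it maps the data into the tangent space at a base point with the sphere's logarithm map and runs ordinary PCA there. Using the first sample as the base point, instead of a properly computed Fréchet mean, is a simplification made only for illustration.

import numpy as np

def sphere_log(p, x):
    """Log map on the unit sphere: the tangent vector at p pointing toward x."""
    cos_t = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (x - cos_t * p)

# Toy data: points on the 2-sphere in R^3
X = np.random.randn(200, 3)
X /= np.linalg.norm(X, axis=1, keepdims=True)

base = X[0]                                      # simplification: base point = first sample
V = np.stack([sphere_log(base, x) for x in X])   # tangent vectors at the base point

# Ordinary PCA in the (flat) tangent space
cov = np.cov(V, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
principal_dirs = eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # top tangent directions
scores = V @ principal_dirs                                  # 2-D "Tangent PCA" coordinates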

The Variational Autoencoder (VAE) is a powerful parametric model, and it can be generalized to work with manifold latent spaces.

  • Riemannian VAE [Miolane20]: A Riemannian VAE is a VAE where the latent space is a Riemannian manifold with a defined curvature. This requires two key changes to the standard VAE architecture from Module 1:

    1. Probabilistic Encoder on a Manifold: The encoder no longer outputs a mean and variance for a Gaussian distribution in Euclidean space. Instead, it outputs the parameters of a probability distribution defined on the manifold, such as a Riemannian Normal distribution.

    2. Geometric KL Divergence: The KL divergence term in the VAE loss function, which measures the "distance" between the encoder's output distribution and a prior, must be replaced with a version that correctly computes this divergence on a curved space.

This allows the model to learn a smooth, continuous latent space that has the same geometric properties as the data.

3.2 Models for Specific Latent Geometries

The general framework of a Riemannian VAE can be specialized by choosing specific manifolds for the latent space that are well-suited to particular types of data.

  • Hypersphere-VAE [Davidson18]: This is a VAE where the latent space is a hypersphere (the n-dimensional generalization of a sphere). This is extremely useful for data where direction is important but magnitude is not, such as unit vectors representing text embeddings or gene expression profiles.

  • Toroidal VAE [Mikulski19]: This VAE uses a torus (the shape of a donut) as its latent space. A torus is a product of circles, making it the natural choice for data that has multiple periodic or cyclical features, such as joint angles in robotics or time-of-day data.

  • Poincaré Embeddings [Nickel17]: While not an autoencoder, this influential method embeds data into a hyperbolic space (often visualized using the Poincaré disk). Hyperbolic space is a geometry where space "expands" exponentially as you move away from the origin. This makes it exceptionally well-suited for embedding hierarchical, tree-like data, such as social networks, family trees, or linguistic taxonomies. Distances in hyperbolic space naturally reflect the path length in the tree structure, a feat that is impossible to achieve efficiently in Euclidean space. This is why it's listed in the "No decoder" category—its primary goal is to learn this hierarchical embedding.
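The hyperbolic distance underlying Poincaré embeddings is simple to state; the snippet below implements the standard Poincaré-ball distance for points with norm less than 1. It is illustrative only and is not the training code of the original method.

import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

origin = np.zeros(2)
near_boundary = np.array([0.95, 0.0])
very_near_boundary = np.array([0.99, 0.0])

# A small Euclidean step near the boundary covers a large hyperbolic distance,
# which gives the space "room" for exponentially growing hierarchies.
print(poincare_distance(origin, near_boundary))
print(poincare_distance(near_boundary, very_near_boundary))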

The ultimate generalization combines the flexibility of nonparametric models with the geometric power of manifold spaces.

  • Manifold GPLVM [Jensen20]: This model extends the Gaussian Process Latent Variable Model (GPLVM) from Module 2. In a Manifold GPLVM, both the input data and the latent variables are assumed to lie on manifolds. This provides a highly flexible, probabilistic framework for nonlinear dimensionality reduction between curved spaces. It is one of the most general and powerful models shown in the infographic, capable of learning complex, non-Euclidean relationships in data.

Module 3 Summary:

  • We established the critical need for curved (geodesic) latent spaces when dealing with data that has an intrinsic non-flat geometry, such as rotations, directions, or periodicities.

  • We saw how linear methods like PCA can be adapted to manifolds by using tangent spaces (Tangent PCA).

  • We explored the Riemannian VAE, a powerful generalization of the VAE framework that allows for learning in latent spaces with defined curvature, and saw specific examples like the Hypersphere-VAE and Toroidal VAE.

  • We discussed methods like Poincaré Embeddings that use hyperbolic space to efficiently embed hierarchical data.

  • We touched on the Manifold GPLVM as a highly general nonparametric model for mappings between manifolds.

In this module, we have fully embraced the geometry of our data. We no longer see the latent space as a simple, flat canvas, but as a sophisticated modeling tool in its own right. In the final module, we will summarize our journey, revisit the "No Decoder" category, and discuss the overarching themes that connect all these diverse models.



...



Module 4: Synthesis, Special Cases, and The Big Picture

Welcome to our final module. We have journeyed from the flat, linear world of PCA to the rich, curved landscapes of Riemannian manifolds. We've seen how models can be probabilistic, Bayesian, parametric, and nonparametric. In this concluding module, we will synthesize these ideas, take a closer look at the important "No Decoder" category, and build a unified understanding of the entire framework presented in the infographic.

4.1 The "No Decoder" Philosophy: Embedding over Reconstruction

Throughout our discussion of autoencoders (AE, VAE), a central goal was reconstruction. The quality of the model was judged, in part, by how well the decoder could recreate the original input from the latent representation.

The models in the "No Decoder" category operate on a different philosophy. Their primary goal is not to create a reversible compression but to produce the most useful low-dimensional embedding possible. "Usefulness" is defined by an objective function, such as preserving local neighborhoods, global distances, or topological structure. These are primarily visualization and representation techniques.

Let's revisit some models through this lens:

  • Isomap & UMAP: As we saw in Module 2, Isomap's goal is to preserve geodesic distances, while UMAP's is to preserve topological structure. Neither is designed to provide a decoder to go from the embedding back to the high-dimensional space.

  • Poincaré Embeddings: Its goal is to arrange data in a hyperbolic space to reflect a hierarchy. The embedding is the final product; there is no expectation of reconstructing the original graph or tree from a point in the Poincaré disk.

  • Rie-SNE [Bergsson22]: This is the Riemannian manifold version of t-SNE, a famous visualization algorithm and a direct relative of UMAP. Its purpose is to create a low-dimensional visualization of data that lives on a manifold, making it an essential tool for exploring the geometry of complex datasets. Like t-SNE, it is purely for embedding.

Other advanced methods in this category focus on discovering fundamental structures:

  • Barycentric Subspaces [Pennec18]: This method generalizes the concept of linear subspaces (like those found by PCA) to the world of manifolds. It seeks to find a "submanifold" that best represents the data, which is a descriptive task, not a generative one.

  • Principal Sub-bundles [Akhoj20]: This is a highly advanced technique for data that has even more complex geometric structure (a fiber bundle). It aims to find the most important "sub-structures" within this object, analogous to how PCA finds the most important directions in a dataset.

4.2 Understanding the Annotations: A Guide to the Model's Nature

The infographic uses several symbols (E, D, !, ?) to provide a quick reference to the nature of each model. Let's decode them.

  • E (Encoder) & D (Decoder): As we've seen, this signifies an autoencoder-like framework. These models explicitly learn a parametric function to map data into the latent space (Encoder) and a function to map it back out (Decoder). VAEs and their Riemannian cousins are prime examples.

  • ! (Optimization): This symbol denotes models where the latent embedding y is found directly by optimizing an objective function, rather than being the output of a learned encoder function. UMAP, LLE, and Isomap are classic examples. They define what a "good" embedding looks like (e.g., one that preserves local neighborhoods) and then use optimization to find the coordinates y that satisfy this goal.

  • ? (Implicit vs. Explicit Mapping): The question mark highlights a crucial practical limitation of some models: the lack of an explicit encoder for new data.

    • Explicit Mapping: Models like PCA and VAEs learn a function that acts as an encoder. Once the model is trained, you can take a new, unseen data point and instantly compute its latent representation.

    • Implicit Mapping (Transductive): Models like Isomap and the original LLE are transductive. They operate on the entire dataset at once to compute the embedding. They do not naturally produce a function to embed a new point without re-running the algorithm on the entire dataset (including the new point). This makes them less suitable for dynamic, real-time applications where new data arrives continuously.

4.3 The Grand Synthesis: How to Choose a Model

We can now view the infographic as a decision map for any dimensionality reduction task. To choose the right tool, you should ask yourself a series of questions:

  1. What is the Nature of My Data and My Goal?

    • Is my data best represented in a simple vector space (x ∈ ℝᵐ)? Or does it have inherent constraints or curvature, suggesting it lies on a manifold (x ∈ M)?

    • Do I need to generate new data or perfectly reconstruct my input? If so, I need a model with a Decoder (D), like a VAE.

    • Is my primary goal visualization and discovering latent structure? A "No Decoder" model like UMAP or Rie-SNE might be best.

  2. What Assumptions Can I Make About the Mapping?

    • Is a Linear assumption (PCA) sufficient? It's always a good first baseline.

    • Do I need a flexible, nonlinear model? A Parametric model (VAE) learns a fixed function, while a Nonparametric model (GPLVM) offers more adaptability at a higher computational cost.

  3. What Should the Latent Space Look Like?

    • Is a standard, flat latent space (y ∈ ℝᵐ') okay? This is the default.

    • Does my data have periodic, directional, or hierarchical properties? If so, I should choose a model with a specific geodesic latent space, like a Toroidal VAE, Hypersphere-VAE, or Poincaré Embedding.

  4. How Much Uncertainty Do I Need to Model?

    • If I just need a point estimate, a standard model (PCA, AE) will do.

    • If I need to handle noise and get a measure of uncertainty, I should use a Probabilistic model (Probabilistic PCA, VAE).

    • If I want the most robust model that can avoid overfitting and automatically determine its own complexity, a Bayesian approach (Bayesian PCA, Bayesian GPLVM) is the most powerful choice.


Course Conclusion

Over these four modules, we have journeyed through the landscape of dimensionality reduction. We began with the simple, linear world of PCA and ended with sophisticated models that learn mappings between curved manifolds.

The central lesson is that geometry is not an afterthought; it is the core of the problem. The most powerful and insightful results come from choosing a model whose geometric assumptions align with the intrinsic structure of the data. The infographic you provided is a brilliant map of this landscape, and by understanding its axes and categories, you are now well-equipped to navigate it and select the right tool to uncover the hidden patterns in your own data.

Dante Monson

Aug 1, 2025, 6:04:15 AM
to econ...@googlegroups.com


Real-World Applications: From Image Search to Drug Discovery

The true power of these models is revealed when they are applied to complex, real-world problems. The choice of model is dictated by the nature of the data and the specific goal of the task.

Applications on Euclidean Data (The "Flat World" View)

These methods are the workhorses of modern data science and are used when the data can be reasonably treated as points in a standard vector space.

  • Computer Vision and Image Search: Imagine you want to build a "search by image" feature. You can pass a massive dataset of images through a pre-trained deep neural network to get a high-dimensional feature vector for each image. Using UMAP, you can then create a 2D or 3D embedding of these vectors. Images that are visually similar (e.g., all pictures of cats, all pictures of sunsets) will form tight clusters in the UMAP plot. When a user uploads a new image, you can embed it into this space and find its nearest neighbors to return visually similar results.

  • Generative Art and Data Augmentation: Variational Autoencoders (VAEs) are masters of generation. Artists use them to explore a latent space of creative possibilities, generating novel images or musical phrases that blend styles learned from the training data. In a more practical setting, VAEs are used for data augmentation. If you have a small dataset of medical images, you can train a VAE and then sample from its latent space to generate new, realistic-looking images to expand your dataset, improving the performance of other machine learning models.

  • Anomaly Detection: A standard Autoencoder (AE) trained on normal operational data (e.g., sensor readings from a healthy jet engine) will become very good at reconstructing that specific type of data. If it is then fed a new sensor reading from a malfunctioning engine, its reconstruction error will be high, as it has never seen this pattern before. This high error can be used to trigger an anomaly alert, forming the basis of predictive maintenance systems.

Applications on Manifold and Geodesic Data (Embracing the Curvature)

These advanced methods shine when the data has a complex, non-flat underlying structure.

  • Robotics and Motion Analysis: The set of all possible physical poses of a robotic arm or a human body is a highly complex manifold. Using a Riemannian VAE, researchers can learn a low-dimensional latent space that respects this geometry. By tracing a smooth path (a geodesic) in this latent space, they can generate a smooth and physically plausible sequence of movements, which is crucial for animation, motion planning, and human-robot interaction. A minimal geodesic-interpolation sketch follows this list.

  • Drug Discovery and Molecular Design: The space of possible drug-like molecules is vast and complex. By representing molecules as graphs and using models from Geometric Deep Learning, scientists can learn a latent space of molecular structures. A Hypersphere-VAE or other manifold-based generative model can then be used to explore this space, generating novel molecular candidates with desired properties (e.g., high binding affinity to a target protein, low toxicity) far more efficiently than through manual trial and error.

  • Natural Language Understanding and Knowledge Graphs: Large language models often represent words as vectors in a high-dimensional space. However, language is inherently hierarchical ("a poodle is a dog," "a dog is a mammal"). Poincaré Embeddings are used to embed entire knowledge graphs or taxonomies (like WordNet) into a hyperbolic space. The geometric properties of this space naturally capture these hierarchical relationships, allowing the model to perform logical inference and understand semantic relationships with much greater fidelity than Euclidean embeddings.
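To connect this back to the robotics example at the top of this list, the sketch below interpolates along a geodesic of a hyperspherical latent space using spherical linear interpolation (slerp). The latent codes are placeholders rather than outputs of a trained Riemannian VAE.

import numpy as np

def slerp(p, q, t):
    """Point at fraction t along the sphere geodesic from p to q (unit vectors)."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    if theta < 1e-12:
        return p
    return (np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)

# Placeholder latent codes of a "start pose" and an "end pose" on a spherical latent space
z_start = np.array([1.0, 0.0, 0.0])
z_end = np.array([0.0, 1.0, 0.0])

# Decoding each intermediate code would yield a smooth, plausible motion sequence
path = [slerp(z_start, z_end, t) for t in np.linspace(0.0, 1.0, 10)]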

The Future: What Comes Next?

The trends in the infographic point towards an exciting future where the geometry of data is a first-class citizen in AI. The field is rapidly evolving beyond static analysis to dynamic, causal, and deeply integrated geometric models.

  1. Dynamic Geometries and Temporal Manifolds: Current models assume the manifold structure of the data is fixed. The next frontier is to model data whose underlying geometry changes over time. Imagine analyzing brain activity from an fMRI scan, where the functional connections—and thus the data's manifold—evolve as a person performs different tasks. Future models will learn not just the shape of the data, but the "flow" and evolution of that shape, giving us unprecedented insight into complex dynamic systems.

  2. Causal Latent Spaces: A VAE can generate a picture of a person smiling, but it doesn't understand why they are smiling. A major goal is to infuse latent spaces with causality. Future models won't just learn correlations; they will learn the causal mechanisms that generate the data. This would allow us to ask counterfactual questions directly in the latent space: "Show me what this person would look like if they were sad instead of happy," or in medicine, "How would this patient's cells respond if we administered this drug?" This would represent a monumental leap from pattern recognition to true digital reasoning.

  3. Disentanglement on Manifolds: A "disentangled" representation is one where each latent dimension corresponds to a single, interpretable factor of variation (e.g., for faces: one dimension for smile, one for head pose, one for lighting). Achieving this is already a challenge in Euclidean VAEs. The next step is disentanglement on curved manifolds. What does it mean to have independent "directions" of change on a sphere or a torus? Solving this will give us generative models that are not only powerful but also perfectly controllable and interpretable.

  4. The Rise of Geometric Deep Learning: The concepts we've discussed are part of a broader field called Geometric Deep Learning. This field aims to build neural networks that are fundamentally geometric. Instead of forcing data into standard vectors, it designs network architectures (like Graph Neural Networks) that operate directly on manifolds, graphs, and other geometric objects. This approach promises to create more powerful, efficient, and generalizable AI systems.

How This Will Change the Way We Process Data

These future developments will not just give us better tools; they will transform our fundamental relationship with data.

  • From Feature Engineering to Geometry Engineering: Today, data scientists spend enormous effort on "feature engineering"—manually creating input variables for models. In the future, this will be replaced by "geometry engineering." The crucial modeling decision will be identifying the correct geometric prior for the problem: Does my data live on a sphere? A torus? A hyperbolic plane? A dynamic manifold? Choosing the right geometry will be the key to unlocking insight.

  • Truly Robust and Generalizable AI: Models with the correct geometric "inductive bias" learn the underlying physics or logic of a system, not just its surface-level appearance. A self-driving car whose vision system understands the manifold of possible traffic scenes will be far more robust to novel situations than one trained on pixels alone. This leads to AI that is safer, more reliable, and generalizes better to unseen scenarios.

  • Intuitive and Interactive Scientific Discovery: Imagine a biologist exploring a latent space of cell types. Instead of looking at a static UMAP plot, they could "grab" a point and move it along a geodesic path in the manifold latent space. On their screen, they would see a video of one cell type smoothly transforming into another, revealing the developmental pathways that connect them. This interactive exploration will accelerate scientific discovery by making complex data intuitive to navigate.

In conclusion, the journey from PCA to Riemannian VAEs is more than just an academic exercise. It represents a fundamental shift in our understanding of data—from seeing it as a table of numbers to appreciating it as a geometric object with a rich, intrinsic shape. By learning to see, model, and manipulate this shape, we are poised to build the next generation of intelligent systems.

Dante Monson

Aug 1, 2025, 6:15:48 AM
to econ...@googlegroups.com
2025.08.01 Geometric, Topological, and Algebraic Structures - Course & Paper.txt
NotebookLM Mind Map Dimensionality Reduction A Geometric Taxonomy.png

Dante Monson

Aug 1, 2025, 7:02:57 AM
to econ...@googlegroups.com

Dante Monson

Aug 3, 2025, 8:54:48 AM
to econ...@googlegroups.com

Summary
https://docs.google.com/document/d/1ZqiKUe3UhnU4mCERN84alMn-GSyi7I4LRpKflfvyPvY/


Geometric Structures in Latent Embeddings: A Comprehensive Survey and Didactic Course

Abstract

This report provides a comprehensive examination of geometric structures in latent embeddings, building upon the taxonomy presented in the infographic "Geometric Structures in Latent Embeddings" and referencing the survey paper arXiv:2407.09468v2. The analysis highlights the evolving landscape of machine learning, transitioning from traditional Euclidean assumptions to sophisticated non-Euclidean geometries. A detailed explanation of each concept and model within the infographic is provided, elucidating their interrelationships, historical development, and broader implications for representation learning. Furthermore, the report outlines a structured, iterative didactic course designed to guide learners through the complexities of this field, emphasizing foundational principles, advanced models, and practical applications.

1. Introduction: The Geometric Revolution in Machine Learning

1.1 Beyond Euclidean: Why Geometry Matters in Latent Spaces

The field of machine learning is undergoing a profound transformation, moving beyond the conventional reliance on Euclidean spaces to embrace the rich mathematical frameworks of non-Euclidean geometry, topology, and abstract algebra.1 Historically, machine learning models have largely operated under the assumption that data points reside in flat, Euclidean vector spaces, a perspective that formed the "backbone" of classical methods for decades.1 However, contemporary datasets are often characterized by high dimensionality and intricate, non-linear relationships, exhibiting inherent geometric, topological, and algebraic structures that defy simple Euclidean descriptions.2 Examples of such data range from the curvature of spacetime to the complex electrical patterns of neurons and the symmetries observed in physical systems.2

This recognition of data's intrinsic non-Euclidean nature is driven by the Manifold Hypothesis, a fundamental concept positing that high-dimensional data frequently lies on or near a lower-dimensional, non-linear manifold embedded within the higher-dimensional ambient space.3 The limitations of Euclidean assumptions become apparent when attempting to model such data. For instance, in Variational Autoencoders (VAEs), a standard Euclidean latent space with a Gaussian prior can lead to "arbitrary Euclidean representations" and "mismapped latent manifolds".8 This can manifest as issues like "posterior collapse" or latent representations that fail to capture essential angular relationships, particularly for data with inherent cyclic or directional properties.8

The emergence of non-Euclidean machine learning directly stems from these challenges. This development parallels the 19th-century mathematical revolutions that expanded geometry beyond Euclid's postulates.1 By generalizing classical statistical theory and machine learning to incorporate geometric, topological, and algebraic structures, new subfields like Geometric Statistics, Geometric Deep Learning, and Topological Machine Learning have arisen.1 This shift is not merely an incremental refinement but a fundamental redefinition of how machine learning models data. By explicitly modeling the underlying geometry of data, these advanced approaches promise improved performance, enhanced interpretability, and more faithful representations.3 This paradigm shift is poised to revolutionize how complex phenomena are modeled with machines in the 21st century.1

1.2 Understanding Latent Embeddings: Data, Latent Space, and Dimensionality Reduction Paradigms

At the heart of representation learning is the transformation of raw, high-dimensional inputs into compact, informative representations that capture the underlying geometry, patterns, and semantic relationships within the data.10 This process involves mapping input data from its original space to a lower-dimensional latent space.

The infographic categorizes models based on the nature of both the Data and the Latent space:

  • Data (x \in \mathbb{R}^m or x \in M): This refers to the input observations. Data can reside in a high-dimensional Euclidean space (x \in \mathbb{R}^m), implying points are vectors in a flat space. Alternatively, data can inherently exist on a manifold (x \in M), such as rotations, shapes, or periodic signals, where its intrinsic structure is curved.

  • Latent Space (y \in \mathbb{R}^{m'} or y \in M'): This is the lower-dimensional space where the compressed, meaningful representations of the data reside. Similar to the input data, this space can be Euclidean (y \in \mathbb{R}^{m'}) or a manifold (y \in M'), such as a hypersphere, torus, or Lie group.

The Dimensionality Reduction Model is the function or process that performs this mapping. These models are further categorized by several characteristics:

  • Linear/Parametric/Nonparametric: These terms describe the nature of the mapping function.

    • Linear: The mapping is a simple linear transformation, such as a matrix multiplication.

    • Parametric: The mapping is defined by a fixed-form function with a finite set of learnable parameters. Neural networks, for example, are parametric models, where the weights and biases are the parameters.

    • Nonparametric: The mapping is more flexible, often learned directly from the data without assuming a rigid functional form. These methods typically rely on local relationships or flexible statistical models, such as Gaussian Processes.

  • Encoder (E) / Decoder (D): Many modern dimensionality reduction models, particularly autoencoder architectures, consist of two main components. An Encoder (E) maps the high-dimensional data to the lower-dimensional latent space. A Decoder (D) then attempts to reconstruct the original data from this latent representation. Some methods, however, focus solely on the embedding and do not require a decoder for reconstruction, as indicated in Row 5 of the infographic.

  • Optimization / No Posterior / w. Posterior / Bayesian: These labels relate to the learning paradigm, especially for probabilistic models.

    • Optimization: This typically denotes a non-probabilistic model where learning involves direct optimization of an objective function, such as minimizing reconstruction error or preserving local neighborhood structures.

    • no posterior: This signifies models that do not explicitly model or infer a posterior probability distribution over the latent variables given the data.

    • w. posterior: These models estimate or approximate a posterior distribution over the latent variables, providing a probabilistic understanding of the latent representation.

    • Bayesian: This indicates a fully probabilistic framework where priors are incorporated over both the model parameters and the latent variables. Bayesian approaches typically aim to infer full posterior distributions, allowing for robust uncertainty quantification.

1.3 Navigating the Landscape: Overview of the Infographic's Taxonomy

The infographic presents a graphical taxonomy that integrates recent advancements in geometric structures in latent embeddings.2 This framework organizes models based on the mathematical structure of both the input data and the target latent space.2 The arrangement of the infographic reveals a progression in the sophistication of geometric modeling, moving from simpler assumptions to more complex and faithful representations.

The rows of the infographic categorize models based on the nature of the input data and the latent space:

  • Row 1: Data in Euclidean space (x \in \mathbb{R}^m), Latent in Euclidean space (y \in \mathbb{R}^{m'}). This represents the classical setting where both input and latent representations are assumed to be in flat, Euclidean vector spaces.

  • Row 2: Data in Euclidean space (x \in \mathbb{R}^m), Latent on a Manifold (y \in M'). This category addresses scenarios where input data is observed in a Euclidean space, but its underlying intrinsic structure is assumed to be a lower-dimensional non-linear manifold. The goal is to embed this data into a latent space that itself is a manifold, thereby respecting the data's intrinsic geometry.

  • Row 3: Data on a Manifold (x \in M), Latent on a Manifold (y \in M'). This represents the most geometrically aligned scenario, where the input data is explicitly understood to reside on a manifold (e.g., rotations, shapes, periodic signals), and the goal is to embed this data into a latent space that also possesses a specific manifold structure (e.g., hypersphere, Lie group, torus).

  • Row 4: Data on a Manifold (x \in M), Latent in Euclidean space (y \in \mathbb{R}^{m'}). While less explicitly detailed in the main cells of the infographic, this category would typically involve projecting manifold data onto a Euclidean space, often implying a simplification or a linear approximation of the manifold.

  • Row 5: Models with No Decoder. These methods primarily focus on learning a low-dimensional embedding or representation of the data without an explicit decoder component for reconstruction. They are often used for visualization, clustering, or as a pre-processing step, leveraging geometric or topological principles to create meaningful embeddings.

The columns classify the Dimensionality Reduction Models by their functional form: Linear, Parametric, and Nonparametric, with specific categories for Geodesic approaches.

This structured arrangement of the infographic illustrates a conceptual journey through the evolution of dimensionality reduction techniques. Initially, methods were rooted in the simplicity of flat Euclidean spaces. As the understanding of complex data grew and the Manifold Hypothesis gained prominence, models adapted to learn or represent data on curved spaces. This progression signifies a continuous effort to better align mathematical models with the perceived reality of data structure. Row 1 represents "classical" methods. Row 2 introduces the idea that even if data appears Euclidean, its intrinsic structure might be manifold-like, leading to manifold-aware latent spaces. Row 3 represents the most geometrically "faithful" scenario where both input and latent spaces are explicitly modeled as manifolds. Row 5 highlights methods that prioritize embedding and visualization without the generative aspect of a decoder, often still leveraging geometric principles. This structure guides the exploration through increasing levels of geometric complexity and fidelity in modeling.

2. Foundations of Dimensionality Reduction: Euclidean Spaces (Infographic Row 1)

2.1 Data in Euclidean Space (x \in \mathbb{R}^m), Latent in Euclidean Space (y \in \mathbb{R}^{m'})

This foundational row of the infographic depicts the classical setting for dimensionality reduction. In this paradigm, both the high-dimensional input data (x \in \mathbb{R}^m) and its corresponding low-dimensional latent representation (y \in \mathbb{R}^{m'}) are assumed to reside in standard Euclidean vector spaces. In such spaces, distances are measured with the familiar Euclidean metric, and the shortest path between two points is a straight line. Data points are treated as coordinates in a flat space.2 While this assumption offers significant mathematical convenience and computational tractability, it often falls short in capturing the true, non-linear, or intrinsically curved geometry of many real-world datasets.1

2.2 Linear Models: PCA, Probabilistic PCA, Bayesian PCA

Linear dimensionality reduction techniques aim to find linear projections that capture the most significant variations within the data. These methods are computationally efficient and offer a clear interpretation of the principal directions.

  • PCA [Pearson01]: Principal Component Analysis is a cornerstone technique for linear dimensionality reduction. It operates by finding a new set of orthogonal axes, known as principal components, along which the variance of the projected data is maximized.11 Essentially, PCA rotates the coordinate system to align with the directions of greatest data spread, allowing for data compression by retaining only the components with the highest variance.

  • Probabilistic PCA: This method extends classical PCA by introducing a generative probabilistic model. It posits that data points are generated by a linear transformation of a lower-dimensional latent variable, corrupted by isotropic Gaussian noise. This probabilistic formulation provides a likelihood-based framework for PCA, which is advantageous for handling missing data and offers a more principled approach to model selection.

  • Bayesian PCA: Building upon Probabilistic PCA, Bayesian PCA incorporates Bayesian inference by placing prior distributions over the model parameters. This allows for a more robust estimation of the parameters, automatic determination of the intrinsic latent dimensionality from the data, and a comprehensive quantification of uncertainty in the latent representations.

The progression from deterministic PCA to Probabilistic PCA and then to Bayesian PCA illustrates a general trend in machine learning: the move from purely algorithmic, point-estimation approaches to more robust probabilistic and Bayesian frameworks. This evolution is driven by the desire to quantify uncertainty, handle incomplete data, and provide a more comprehensive generative model of the underlying data distribution, rather than just a transformation.
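
To ground the linear case, the following is a minimal sketch of PCA computed via the singular value decomposition in NumPy. The function interface and the synthetic data are illustrative assumptions, not part of the infographic.

```python
import numpy as np

def pca(X, n_components=2):
    """Project data onto the top principal components.

    X: (n_samples, n_features) array in Euclidean space.
    Returns latent scores, principal directions, and explained variances.
    """
    # Center the data so components capture variance around the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]          # principal axes
    scores = X_centered @ components.T      # latent coordinates y in R^{m'}
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return scores, components, explained_variance

# Example: 3-D observations that are essentially 2-dimensional plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(500, 3))
scores, components, var = pca(X, n_components=2)
print(scores.shape, var)
```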

2.3 Parametric Autoencoders: AE, VAE, Full-VAE

Autoencoders (AEs) and their variants represent a class of neural network-based models that learn a compressed representation of data in a bottleneck layer.

  • AE: An Autoencoder is a neural network designed to learn an identity function, meaning its output should ideally be identical to its input. It achieves dimensionality reduction by forcing the input through a bottleneck layer, which forms the lower-dimensional latent code. The network consists of an encoder, which maps the input to this latent code, and a decoder, which reconstructs the input from the latent code. The learned latent code serves as the compressed representation.

  • VAE [Kingma14]: The Variational Autoencoder marks a significant conceptual advancement over traditional AEs. VAEs are generative models that learn a probabilistic mapping between the data space and a latent space. Instead of encoding an input to a single fixed point in the latent space, the encoder of a VAE outputs parameters (typically the mean and variance) of a probability distribution (commonly a Gaussian) in the latent space. The decoder then samples from this distribution to reconstruct the input. VAEs are trained by maximizing a lower bound on the data log-likelihood, known as the Evidence Lower Bound (ELBO). The ELBO comprises two terms: a reconstruction loss, which encourages fidelity to the input, and a Kullback-Leibler (KL) divergence term, which regularizes the learned latent distribution towards a predefined prior distribution (e.g., a standard Gaussian).8

  • Full-VAE [Kingma14]: This term typically refers to the standard VAE architecture as originally introduced by Kingma and Welling, emphasizing its complete framework for both probabilistic encoding and decoding.

VAEs represent a crucial bridge from simple dimensionality reduction to probabilistic generative modeling. By learning a distribution over the latent space, VAEs enable the generation of new, unseen data samples and facilitate interpolation within the latent space (though interpolation in Euclidean VAEs can sometimes be problematic, as discussed later).14 The explicit regularization of the latent space through the KL divergence term encourages a desired structure, such as a Gaussian distribution.8 This framework became a cornerstone for exploring latent space geometry, as its probabilistic nature naturally led to considerations of how distributions behave in these spaces. This, in turn, directly motivated the development of Riemannian VAEs to address the limitations inherent in assuming a flat Euclidean prior.8
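
For readers who prefer code to prose, below is a minimal sketch of a Gaussian VAE and its ELBO loss in PyTorch, under the standard Gaussian-encoder and standard-normal-prior assumptions described above. The layer sizes, the Bernoulli reconstruction likelihood, and the random input batch are illustrative choices, not taken from any of the cited models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal Gaussian VAE: encoder outputs (mu, log_var), decoder reconstructs x."""

    def __init__(self, x_dim=784, h_dim=256, z_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.log_var = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return self.dec(z), mu, log_var

def elbo_loss(x, x_recon_logits, mu, log_var):
    # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I)).
    recon = F.binary_cross_entropy_with_logits(x_recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

# One illustrative training step on random data standing in for flattened images.
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)
x_recon, mu, log_var = model(x)
loss = elbo_loss(x, x_recon, mu, log_var)
loss.backward()
opt.step()
print(float(loss))
```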

2.4 Nonparametric Methods: PC, LLE, GPLVM

Nonparametric dimensionality reduction methods offer greater flexibility by not assuming a fixed parametric form for the mapping or the latent space. Instead, they infer relationships directly from the data, often by preserving local neighborhood structures or employing flexible models like Gaussian Processes.

  • PC [Hastie89]: Principal Curves are a non-linear generalization of PCA. Instead of finding linear subspaces, Principal Curves seek a smooth, one-dimensional curve that passes through the "middle" of the data, minimizing the squared distance from each data point to the curve.11 This allows them to capture curvilinear structures that linear PCA cannot.

  • LLE: Locally Linear Embedding is a manifold learning algorithm that focuses on preserving the local neighborhood structure of the data. It operates under the assumption that each data point can be accurately reconstructed as a linear combination of its nearest neighbors. LLE then seeks a low-dimensional embedding where these same linear reconstruction weights hold true, thereby preserving the local geometry.

  • GPLVM [Lawrence03]: The Gaussian Process Latent Variable Model is a probabilistic, non-linear dimensionality reduction technique. It employs Gaussian Processes to define a flexible, non-parametric mapping from a low-dimensional latent space to the high-dimensional data space. It is a generative model where the latent variables are inferred, and the non-parametric nature of the Gaussian Process prior allows it to discover complex, non-linear relationships without pre-defining a specific functional form for the mapping.15

  • Probabilistic PC [Chang01]: This is likely a probabilistic extension of Principal Curves, similar in spirit to Probabilistic PCA but adapted for non-linear curves, providing a generative framework for such structures.

  • Bayesian GP LVM [Lawrence03]: This is a Bayesian extension of the GPLVM. It places priors over the Gaussian Process hyperparameters and the latent variables, enabling robust uncertainty quantification and facilitating automatic model selection for the latent dimensionality.

Even within the "Euclidean latent space" category, these nonparametric methods implicitly acknowledge the manifold hypothesis. LLE explicitly aims to preserve local geometry, suggesting that the data, while residing in Euclidean space, might be sampled from an underlying curved manifold. Principal Curves directly seek a non-linear "backbone" for the data, moving beyond linear approximations. GPLVM's inherent flexibility allows it to discover complex, non-linear relationships without rigid parametric assumptions, effectively learning an implicit manifold. These methods thus laid crucial groundwork for later, more explicit geometric approaches by demonstrating the limitations of strictly linear or pre-defined parametric models when dealing with complex data.
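
As a practical illustration of the nonparametric setting, the snippet below applies scikit-learn's Locally Linear Embedding to the classic Swiss-roll dataset, which is observed in \mathbb{R}^3 but intrinsically two-dimensional. The parameter values are illustrative defaults rather than recommendations.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Data observed in R^3 but sampled from a 2-D manifold (the Swiss roll).
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# LLE preserves the local linear reconstruction weights in the 2-D embedding.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)
print(Y.shape)  # (1500, 2): Euclidean latent coordinates
```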

2.5 Interconnections and Limitations of Euclidean Approaches

The models within the Euclidean framework exhibit a clear evolutionary path and distinct functionalities. PCA serves as a fundamental linear baseline, with its probabilistic and Bayesian extensions adding robustness and uncertainty quantification. Autoencoders provide a neural network-based, often non-linear, approach to reconstruction. VAEs further enhance AEs by introducing probabilistic generative capabilities, enabling sampling and more principled representation learning. Nonparametric methods like LLE and GPLVM offer flexible ways to uncover non-linear structures, even if the latent space itself remains Euclidean.

Despite their utility, the primary limitation of these approaches is their inherent assumption of a flat, Euclidean latent space. This assumption can lead to several challenges:

  • Geometric Distortions: Euclidean distances in the latent space may not accurately preserve the true geodesic distances on an underlying, intrinsically curved manifold.17 This can lead to misleading interpretations of similarity or dissimilarity between data points.

  • Inefficient Representation: Particularly in high dimensions, standard Gaussian priors used in VAEs tend to concentrate probability mass on a thin shell, leading to "voids and sparsity" in the latent space.9 This sparsity can hinder the generation of meaningful new samples and limit the effective utilization of the latent space.

  • Lack of Interpretability: The latent dimensions learned by these models can often be entangled, making it difficult to attribute specific semantic meanings or disentangle underlying factors of variation.5

  • Inability to Capture Intrinsic Geometry: Data with inherent cyclic, directional, or hierarchical structures are poorly represented by standard Gaussian distributions or flat Euclidean spaces.9 For example, a Euclidean latent space struggles to model rotations or angles naturally.

The collective limitations of Euclidean approaches underscore the necessity for more geometrically informed methods that can explicitly model and leverage the non-Euclidean structures inherent in complex data.
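
The "thin shell" behaviour of high-dimensional Gaussian priors mentioned above can be checked numerically in a few lines of NumPy; the sketch below simply samples from standard Gaussians of increasing dimension and prints how the norms concentrate around \sqrt{d}.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    z = rng.normal(size=(10_000, d))        # samples from a standard Gaussian prior
    norms = np.linalg.norm(z, axis=1)
    # In high dimensions the norms concentrate tightly around sqrt(d),
    # so almost no probability mass sits near the origin of the latent space.
    print(f"d={d:5d}  mean ||z|| = {norms.mean():7.2f}  "
          f"(sqrt(d) = {np.sqrt(d):7.2f})  std = {norms.std():.2f}")
```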

| Model Name | Type | Core Mechanism | Key Strengths | Key Limitations | Probabilistic/Bayesian Variants |
|---|---|---|---|---|---|
| PCA | Linear | Variance maximization (linear projection) | Simplicity, interpretability, computational efficiency | Only captures linear relationships, sensitive to outliers | Probabilistic PCA, Bayesian PCA |
| AE | Parametric (NN) | Reconstruction (bottleneck compression) | Non-linear compression, learns complex mappings | No probabilistic sampling, can lead to "dead" latent units | N/A |
| VAE | Parametric (NN) | Probabilistic generative model (KL regularization) | Generative capabilities, principled regularization | Interpolation issues in Euclidean space, posterior collapse, "voids" in latent space | Full-VAE |
| Principal Curves | Nonparametric | Non-linear curve fitting | Captures curvilinear trends, generalizes PCA | Computationally intensive, sensitive to initialization | Probabilistic PC |
| LLE | Nonparametric | Local linearity preservation | Handles complex topologies, preserves local structure | Global structure not guaranteed, sensitive to neighborhood size | N/A |
| GPLVM | Nonparametric (Gaussian Process) | Flexible non-linear mapping (generative) | Probabilistic, discovers non-linear manifolds | Scalability issues for large datasets, computationally demanding | Bayesian GP LVM |

Table 1: Comparison of Euclidean Dimensionality Reduction Models

3. Embracing Curvature: Geodesic Structures in Latent Embeddings (Infographic Row 2)

3.1 Data in Euclidean Space (x \in \mathbb{R}^m), Latent on a Manifold (y \in M')

This section of the infographic addresses a crucial shift in perspective: while input data may be observed in a Euclidean space (x \in \mathbb{R}^m), its underlying, intrinsic structure is assumed to be a lower-dimensional non-linear manifold. The objective here is to embed this data into a latent space that itself is a manifold (y \in M'), thereby respecting the data's true intrinsic geometry. This approach is motivated by the understanding that many real-world datasets, even when represented in a flat ambient space, inherently live on curved substructures.

A central concept in this context is that of geodesics. On a curved manifold, geodesics are the "straightest possible paths" or the shortest paths between two points, serving as the direct analogue to straight lines in Euclidean space.8 They are fundamental for defining meaningful distances, performing interpolations, and identifying "principal directions" that accurately reflect the geometry of the curved space. The primary problem that geodesic approaches aim to solve is the distortion introduced by standard Euclidean methods. A linear interpolation in a flat Euclidean latent space, for example, might traverse "voids" or low-density regions that hold no semantic meaning on the data's true underlying manifold.14 By explicitly modeling the latent space as a manifold and using geodesics, these methods ensure that relationships and transformations within the latent space are geometrically consistent with the data's intrinsic structure.
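
To make the contrast concrete, the toy sketch below compares linear interpolation, which leaves the unit sphere and passes through its low-density interior, with geodesic interpolation (slerp), which stays on the manifold. It is a generic illustration of the geodesic idea, not one of the infographic's models.

```python
import numpy as np

def slerp(p, q, t):
    """Geodesic (great-circle) interpolation between unit vectors p and q."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return (np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])

for t in np.linspace(0, 1, 5):
    linear = (1 - t) * p + t * q       # straight line in the ambient space
    geodesic = slerp(p, q, t)          # stays on the sphere
    print(f"t={t:.2f}  ||linear||={np.linalg.norm(linear):.3f}  "
          f"||geodesic||={np.linalg.norm(geodesic):.3f}")
```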

3.2 Geodesic Approaches: Tangent PCA, GPCA, PGA (and their Probabilistic/Bayesian variants)

These methods generalize Principal Component Analysis (PCA) to data that lies on or near Riemannian manifolds. Instead of projecting data onto linear subspaces, they seek "principal geodesics" or manifold-aware directions of variation.

  • Tangent PCA [Fletcher04]: This approach addresses the challenge of applying PCA to curved data by locally approximating the manifold. It identifies a reference point on the manifold (often the Fréchet mean of the data) and then projects nearby data points onto the tangent space at that point.11 Standard PCA is then performed within this flat tangent space. The mechanism involves estimating the tangent space at a point from local data samples and subsequently applying PCA to these tangent vectors.24 While computationally advantageous due to its reliance on linear operations, Tangent PCA has limitations. It can introduce distortion, particularly if the data is widely spread or if the manifold exhibits high curvature in the region of interest.21 Its effectiveness relies on the assumption that data points are clustered sufficiently close to the chosen intrinsic mean.26

  • GPCA [Huckeman10]: Generalized Principal Component Analysis, in the context of manifold learning, typically refers to methods that explicitly account for the manifold's curvature by finding geodesic principal components. These components minimize the variance of projection residuals, offering a more geometrically coherent approach than simple tangent space approximations.13 Some interpretations of GPCA also describe it as an algebro-geometric solution for segmenting multiple subspaces from data, extending PCA to scenarios with varying dimensions.28 In the infographic's context, the focus is on its geodesic generalization.

  • PGA: Principal Geodesic Analysis is a direct and widely used generalization of PCA for data residing on Riemannian manifolds. It aims to identify geodesic submanifolds that best capture the data's variability by maximizing projected variance or minimizing reconstruction error, effectively replacing the straight lines of PCA with geodesics.18

    • Exact PGA (EPGA): A specific variant of PGA that avoids linear approximations, striving for greater geometric fidelity. However, EPGA is known to be computationally expensive, requiring iterative non-linear optimization steps for both projection onto the geodesic submanifold and finding the optimal geodesic direction.26

  • Probabilistic PGA [Zhang13] & Bayesian PGA [Zhang14]: These extend PGA into a probabilistic framework, providing a latent variable model for factor analysis on manifolds.29 This allows for the principled handling of uncertainty and can facilitate automatic dimensionality reduction by suppressing irrelevant principal geodesics.30

The development of these geodesic PCA variants highlights a fundamental trade-off between geometric fidelity and computational tractability. Tangent PCA offers computational efficiency but at the risk of distorting data in regions of high curvature. In contrast, Exact PGA provides a more geometrically accurate solution but comes with a significantly higher computational cost. This ongoing tension drives research to develop more efficient algorithms for exact methods or more accurate, yet tractable, approximations. Probabilistic and Bayesian variants attempt to manage this balance by providing a principled framework for uncertainty quantification, which can inform the reliability of the learned geometric structures.
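
A minimal sketch of the Tangent PCA recipe on the unit sphere S^2 follows: data is lifted to the tangent plane at a base point via the log map, and ordinary PCA is performed there. For simplicity the base point is taken as the renormalized Euclidean mean rather than a true Fréchet mean (a Fréchet mean sketch follows Section 3.5); all names and data are illustrative.

```python
import numpy as np

def log_map_sphere(p, x):
    """Riemannian log map on the unit sphere: tangent vectors at p pointing to rows of x."""
    cos_theta = np.clip(x @ p, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Component of x orthogonal to p, rescaled to geodesic length theta.
    v = x - cos_theta[:, None] * p
    norm_v = np.linalg.norm(v, axis=1, keepdims=True)
    norm_v[norm_v == 0] = 1.0            # avoid division by zero when x == p
    return theta[:, None] * v / norm_v

# Synthetic data clustered around a point on S^2.
rng = np.random.default_rng(0)
base = np.array([0.0, 0.0, 1.0])
X = base + 0.3 * rng.normal(size=(300, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # project onto the sphere

# Base point: renormalized Euclidean mean (simple stand-in for the Frechet mean).
mu = X.mean(axis=0)
mu /= np.linalg.norm(mu)

# Tangent PCA: lift to the tangent space at mu, then run ordinary PCA there.
V = log_map_sphere(mu, X)                         # tangent vectors, orthogonal to mu
V_centered = V - V.mean(axis=0)
_, S, Wt = np.linalg.svd(V_centered, full_matrices=False)
print("principal tangent directions:\n", Wt[:2])
print("explained variance:", (S[:2] ** 2) / (len(X) - 1))
```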

3.3 Parametric Models: Riemannian VAE, Riemannian Full-VAE

These models represent a class of Variational Autoencoders where the latent space is explicitly structured as a Riemannian manifold, moving beyond the traditional flat Euclidean assumption. This design allows the VAE to learn and respect the intrinsic geometry of the data, leading to more meaningful representations and generative capabilities.

  • Riemannian VAE [Miolane20]: This model learns a Riemannian latent structure by defining a parametrized Riemannian metric, which dictates how distances and curvatures are measured in the latent space.31 A key innovation is the replacement of the standard Gaussian prior with a Riemannian Brownian motion prior. This prior is specifically designed to align with the manifold immersed by the decoder, ensuring that sampling from the latent space is meaningful and restricted to high-density, relevant regions of the manifold, thereby avoiding the "voids" or low-density areas that can plague standard Euclidean VAEs.8 The benefits include addressing issues like arbitrary Euclidean representation and mismapped latent manifolds, leading to more principled latent codes.8

  • Riemannian Full-VAE [Miolane20]: This term implies a comprehensive framework for Riemannian VAEs, encompassing the full generative and inference capabilities on a Riemannian manifold, leveraging the principles of Riemannian geometry for both encoding and decoding processes.

The evolution of VAEs from their Euclidean origins to Riemannian VAEs demonstrates their utility as a powerful testbed for exploring latent space geometry. Standard VAEs often struggle with Euclidean priors when the underlying data manifold is non-Euclidean, leading to issues such as posterior collapse or poor interpolation.8 Riemannian VAEs directly address these challenges by introducing Riemannian metrics and Brownian motion priors, showcasing that aligning the latent space's geometry with the data's intrinsic structure is crucial for improved generation, interpolation, and interpretability. This development underscores that VAEs are not merely generative models but also potent tools for learning and understanding the geometric structure of latent spaces.

3.4 Nonparametric Methods: Principal Flows, Riemannian PC, Riemannian LLE

These nonparametric methods extend their Euclidean counterparts to Riemannian manifolds, offering flexibility without assuming a fixed parametric form for the manifold or the mapping.

  • Principal Flows [Panaretos14]: This method generalizes PCA to Riemannian manifolds by defining curves, termed "principal flows," that are capable of capturing not only geodesic variations but also more complex, non-geodesic forms of variation.5 A principal flow is a curve on the manifold that passes through the mean of the data. At any point along this curve, its tangent velocity vector attempts to locally fit the first eigenvector of a tangent space PCA, subject to a smoothness constraint.32 This distinguishes Principal Flows from methods like PGA, which strictly focus on geodesics, allowing for a more flexible modeling of complex, possibly meandering, patterns of variation on a manifold.

  • Riemannian PC [Hauberg16]: This refers to Riemannian Principal Components, a broad category of generalizations of PCA for data on Riemannian manifolds. While specific implementations may vary, the core principle is to find principal components that respect the manifold's geometry, adapting the concept of variance maximization to curved spaces.12

  • Riemannian LLE [Maignant23]: This is a generalization of Locally Linear Embedding to Riemannian manifolds. Its objective is to preserve local neighborhood relationships and distances, but with respect to geodesic distances on the curved manifold rather than Euclidean distances.

The distinction between methods like PGA, which focus on geodesics (shortest paths on manifolds), and Principal Flows, which can capture non-geodesic forms of variation, highlights a nuanced understanding of "principal directions" on manifolds. While geodesics are the most direct generalization of straight lines, real-world data variation might not always follow the shortest path. Principal Flows offer a more flexible way to model complex, yet still geometrically informed, trajectories. This indicates a growing sophistication in modeling manifold data, moving beyond just shortest paths to more complex, yet geometrically constrained, patterns of variation. The choice between geodesic-focused methods and more flexible path-finding methods depends on the assumed nature of data variation, reflecting the increasing maturity of the field in adapting to diverse data characteristics.

3.5 How Geodesic Models Extend Euclidean Concepts

The transition from Euclidean to geodesic models involves a systematic generalization of fundamental mathematical concepts:

  • Generalization of Distance: The familiar Euclidean distance is replaced by the geodesic distance, which is the length of the shortest path between two points along the curved surface of the manifold.8

  • Generalization of Mean: The Euclidean mean is replaced by the Fréchet mean (also known as the intrinsic mean or Karcher mean), which is the point on the manifold that minimizes the sum of squared geodesic distances to all data points.18

  • Generalization of Linear Subspaces: Linear subspaces, which are central to PCA, are replaced by geodesic submanifolds (in PGA) or principal flows (in Principal Flows).13 These are curved analogues that capture modes of variation while respecting the manifold's curvature.

  • Generalization of Gaussian Priors: Standard Gaussian priors, commonly used in Euclidean VAEs, are replaced by Riemannian Brownian motion priors or other manifold-aware distributions that are naturally defined on curved spaces.8 This ensures that probability distributions and sampling processes are consistent with the underlying manifold geometry.

This systematic generalization is not merely a direct translation but a fundamental re-conceptualization of machine learning principles. It involves replacing Euclidean notions of distance, linearity, and probability distributions with their manifold-equivalent counterparts, requiring a deep understanding of differential geometry.8 This allows machine learning models to operate directly on the intrinsic geometry of data, rather than forcing data into an ill-fitting Euclidean mold. This capability opens up new possibilities for more accurate modeling of phenomena where curvature and non-linearity are inherent, such as in shape analysis, medical imaging, and neuroscience.
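
As a worked example of the generalized mean above, the sketch below computes a Fréchet mean on the unit sphere by Riemannian gradient descent: the data is log-mapped to the tangent space at the current estimate, averaged there, and the estimate is updated with the exponential map. This is a standard textbook recipe sketched under simple assumptions, not a specific model from the infographic.

```python
import numpy as np

def log_map(p, x):
    """Log map on the unit sphere: tangent vectors at p pointing towards rows of x."""
    cos_t = np.clip(x @ p, -1.0, 1.0)
    theta = np.arccos(cos_t)
    v = x - cos_t[:, None] * p
    n = np.linalg.norm(v, axis=1, keepdims=True)
    n[n == 0] = 1.0
    return theta[:, None] * v / n

def exp_map(p, v):
    """Exp map on the unit sphere: move from p along tangent vector v."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

def frechet_mean(X, n_iter=50, step=1.0):
    """Minimize the sum of squared geodesic distances by Riemannian gradient descent."""
    mu = X[0].copy()
    for _ in range(n_iter):
        grad = log_map(mu, X).mean(axis=0)   # Riemannian gradient direction
        mu = exp_map(mu, step * grad)
        if np.linalg.norm(grad) < 1e-10:
            break
    return mu

rng = np.random.default_rng(1)
X = np.array([0.0, 0.0, 1.0]) + 0.4 * rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
mu = frechet_mean(X)
print("Frechet mean:", mu)
print("Renormalized Euclidean mean:", X.mean(axis=0) / np.linalg.norm(X.mean(axis=0)))
```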

| Model Name | Generalization of PCA | Approach to Manifold Data | Key Strengths | Key Limitations | Probabilistic/Bayesian Variants |
|---|---|---|---|---|---|
| Tangent PCA | PCA | Local linearization (tangent space at mean) | Computationally efficient, conceptually simple | Distortion in high curvature/data spread, requires data clustering around mean | None explicitly cited |
| GPCA | PCA (generalized) | Explicitly accounts for curvature (geodesic principal components) | Geometrically coherent, robust to curvature | Computationally complex, multiple interpretations of "GPCA" | None explicitly cited |
| PGA | PCA | Geodesic submanifolds (finds shortest paths of variation) | Captures intrinsic non-linear variation | Computationally expensive (Exact PGA), sensitive to optimization | Probabilistic PGA, Bayesian PGA |
| Principal Flows | PCA | Non-geodesic curves (flexible paths of variation) | Captures flexible, non-geodesic patterns | More complex definition and solution, computationally demanding | None explicitly cited |

Table 2: Geodesic PCA Variants and their Manifold Adaptations

4. Intrinsic Manifold Data: Learning Curved Latent Spaces (Infographic Row 3)

4.1 Data on a Manifold (x \in M), Latent on a Manifold (y \in M')

This category represents the most geometrically aligned scenario in latent embedding, where the input data itself is explicitly understood to reside on a manifold (x \in M), and the objective is to embed this data into a latent space that also possesses a specific manifold structure (y \in M'). This "matching" of geometries, where the latent space's inherent structure mirrors that of the data, is expected to yield the most faithful, interpretable, and performant representations.9

Examples of data inherently residing on manifolds include:

  • Rotations and Orientations: Data representing 3D rotations (e.g., in robotics or computer graphics) naturally lie on Lie groups like SO(3).

  • Shapes: Collections of shapes (e.g., medical images, anatomical structures) can be modeled as points on shape manifolds.

  • Periodic Signals: Data with cyclic or periodic characteristics, such as time of day, angles, or biological cycles, are naturally represented on a torus.34

  • Hierarchical Structures: Data exhibiting tree-like or hierarchical relationships (e.g., taxonomies, ontologies) can be effectively embedded in hyperbolic spaces.

  • Neural Activity: In neuroscience, latent states like head direction or spatial position are hypothesized to reside on specific geometric structures like rings or spheres.16

When the model's latent space geometry is specifically designed to match these intrinsic data structures, it leads to representations that are not only more accurate but also more semantically meaningful. This informed design approach leverages prior knowledge about the data's underlying properties to create latent spaces that inherently capture and preserve these characteristics.

4.2 Parametric Models: Hypersphere-VAE, LieVAE, Toroidal VAE

These are specialized Variational Autoencoder (VAE) architectures engineered to explicitly enforce specific non-Euclidean geometries, such as hyperspheres, Lie groups, or tori, within their latent spaces.

  • Hypersphere-VAE: This class of VAEs models the latent space as a hypersphere (a unit sphere in higher dimensions). The motivation for this design stems from the limitations of standard Gaussian VAEs, which struggle with directional or cyclic data. Gaussian distributions do not naturally capture angular relationships, and in high dimensions, their probability mass tends to concentrate on a thin shell, failing to align with the intrinsic geometry of many datasets.9 Hypersphere-VAEs address these issues by naturally respecting and preserving the directional and cyclic structure inherent in the data, and by preventing problems like posterior collapse.9

    • Early work, such as [Davidson18], established the concept. More recent advancements utilize distributions like the spherical Cauchy (spCauchy) distribution, which is defined directly on the unit sphere. The spCauchy-VAE offers geometric consistency, improved numerical stability (by avoiding Bessel functions used in von Mises-Fisher (vMF) distributions), and efficient sampling.9 Other approaches involve parameterizing latent variables using hyperspherical coordinates, which can compress latent vectors towards a specific region on the hypersphere, thereby reducing sparsity and improving generation ability.14

  • LieVAE [Falorsi18]: These VAEs model their latent space as a Lie group. Lie groups are smooth manifolds that also possess a group structure, allowing for continuous transformations and symmetries.7 This makes them particularly well-suited for modeling data that exhibits inherent symmetries, such as rotations, translations, or other transformations observed in images or physical systems.7 LieVAEs operate by learning Lie group operators that model infinitesimal transformations on the manifold.7 A variational distribution over coefficients provides a generative model of the manifold, enabling the generation of feature augmentations that preserve identity and disentangle factors of variation.7 This framework provides a geometric description of how high-dimensional data can vary with a few degrees of freedom, formally aligning with the manifold hypothesis.7

  • Toroidal VAE [Mikulski19]: These VAEs employ a torus as their latent space. This architecture is ideal for data characterized by periodic features, such as time of day, angles of rotation, or biological cycles, where Euclidean or spherical spaces are topologically unsuitable.34 Toroidal VAEs typically use a "circular spring loss" to enforce a uniform distribution on the periodic latent variables, often working with polar coordinates.35 A significant benefit of the toroidal topology is its enablement of "multiple-path morphing," where interpolations between two points in the latent space can follow different geodesic paths around the torus, potentially crossing through various semantic regions.35

The deliberate choice of specific latent manifold geometries (hypersphere, Lie group, torus) for these VAEs, rather than a generic manifold, is driven by the intrinsic nature of the data. If data inherently exhibits directional, rotational, or periodic properties, imposing a latent space with the corresponding geometry yields representations that are more faithful, more interpretable, and often better-performing.9 For example, interpolating on a sphere yields smooth angular transitions, while on a torus it enables multi-path morphing.9 This trend signifies a move towards informed design of latent spaces, where prior knowledge about the data's intrinsic structure is leveraged to define the latent manifold. This not only enhances model performance but also improves the interpretability and controllability of the learned representations, which is critical for applications like style transfer in language models or generative molecular design.9
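
The multi-path morphing property of toroidal latent spaces is easy to see on a single circular latent dimension; the sketch below computes the two geodesic interpolation paths (short arc and long arc) between two angles. It is a toy illustration, not an implementation of the Toroidal VAE.

```python
import numpy as np

def circular_paths(theta_a, theta_b, n=5):
    """Two geodesic interpolation paths between angles on a circle (one latent
    dimension of a torus): the short arc and the complementary long arc."""
    delta = (theta_b - theta_a) % (2 * np.pi)
    short_step = delta if delta <= np.pi else delta - 2 * np.pi
    long_step = delta - 2 * np.pi if delta <= np.pi else delta
    t = np.linspace(0, 1, n)
    short = (theta_a + t * short_step) % (2 * np.pi)
    long_ = (theta_a + t * long_step) % (2 * np.pi)
    return short, long_

# Example: 350 degrees to 10 degrees. The short path crosses 0 degrees; the long
# path goes the other way around -- the "multiple-path morphing" enabled by the torus.
a, b = np.deg2rad(350), np.deg2rad(10)
short, long_ = circular_paths(a, b)
print("short arc (deg):", np.round(np.rad2deg(short), 1))
print("long arc  (deg):", np.round(np.rad2deg(long_), 1))
```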

4.3 Nonparametric Methods: Manifold GPLVM

  • Manifold GPLVM [Jensen20]: The Manifold Gaussian Process Latent Variable Model (mGPLVM) extends the traditional GPLVM to operate with non-Euclidean latent spaces. This probabilistic latent variable model is designed to discover latent states that reside on specific symmetric manifolds, such as spheres, tori, or rotation groups.15 The mGPLVM jointly infers these global latent states on the manifold and learns non-parametric tuning curves for each data dimension (e.g., neuronal activity in neuroscience) using Gaussian Processes.16 It employs variational inference, often leveraging frameworks like ReLie for reparameterization on Lie groups, to handle the complexities of probabilistic inference on curved spaces.16 A significant advantage of mGPLVM is its ability to perform principled model selection between candidate manifolds and to quantify uncertainty in the learned representations.16 For instance, in neuroscience applications, it has been shown to correctly recover the ring topology expected from neural populations encoding head direction.16

The probabilistic nature of mGPLVM allows for more than just assuming a manifold; it enables the discovery or selection of the most appropriate manifold geometry for a given dataset. The quantification of uncertainty provides robustness and a principled way to compare different geometric hypotheses about the data. This positions mGPLVM as a powerful tool for scientific discovery, particularly in fields where underlying "latent states" (e.g., head direction, spatial position) are hypothesized to reside on specific geometric structures. It provides a data-driven method to validate or refine these hypotheses, moving beyond mere modeling to actual scientific insight.

4.4 The Power of Matching Data and Latent Geometries

Aligning the geometry of the latent space with the intrinsic geometry of the data offers several significant advantages:

  • Enhanced Fidelity: When the latent space geometry matches the intrinsic geometry of the data, the resulting representations are more faithful and less prone to distortion. This ensures that distances and relationships in the latent space accurately reflect those in the original data manifold.

  • Improved Interpretability: Geometric structures in the latent space can directly correspond to meaningful variations in the data. For example, movement along a geodesic in a LieVAE's latent space might correspond to a specific rotation in an image, or a path in a Toroidal VAE might represent a smooth transition through a periodic cycle.9 This direct correspondence enhances the interpretability of the learned representations.

  • Better Generalization and Generation: By respecting the data's true manifold, models can generate more realistic and diverse samples. Furthermore, interpolations within such geometrically informed latent spaces become more meaningful, as they follow paths consistent with the data's natural variations, avoiding "voids" or unrealistic transitions.9

  • Addressing Challenges: Explicitly modeling manifold geometry helps mitigate common issues observed in standard Euclidean VAEs, such as posterior collapse (where the latent space becomes uninformative) and sparse latent spaces (where the model fails to utilize the full capacity of the latent space).8

| Model Name | Target Latent Manifold | Primary Use Case / Data Type | Key Mechanism | Advantages | Limitations |
|---|---|---|---|---|---|
| Hypersphere-VAE | Hypersphere (S^n) | Directional/cyclic data (word embeddings, image features) | Spherical Cauchy/vMF distributions, spherical coordinates | Geometric consistency, numerical stability (spCauchy), avoids posterior collapse | Potential for sparsity/voids (vMF), complexity of spherical math |
| LieVAE | Lie group | Data with inherent symmetries (rotations, images, transformations) | Lie group operators, variational inference | Captures symmetries, identity preservation, disentangles factors | Computational complexity, specific to Lie group structures |
| Toroidal VAE | Torus (T^d) | Periodic data (angles, time of day, biological cycles) | Circular spring loss, polar coordinates | Multi-path morphing, naturally handles periodic features | Can struggle with correlations in latent space, may create "gaps" |
| Manifold GPLVM | Symmetric manifolds (S^n, T^d, SO(3)) | Neuroscience (head direction), complex time series | Gaussian Processes on manifolds, variational inference | Probabilistic model selection, uncertainty quantification, discovers topology | Scalability for very large datasets, inference complexity on general manifolds |

Table 3: Specialized VAEs for Manifold Latent Spaces

5. Beyond Encoder-Decoder: Direct Embeddings (Infographic Row 5)

5.1 Models Without a Decoder: UMAP, Barycentric Subspaces, IFBD, Poincare Embeddings, Principal Sub-bundles, Rie-SNE

This final category of models in the infographic focuses on learning low-dimensional embeddings or representations of data without an explicit decoder component for reconstruction. These methods are primarily used for visualization, clustering, or as a preprocessing step for other machine learning tasks. Despite the absence of a decoder, they fundamentally leverage geometric or topological principles to create meaningful and structured embeddings.

  • UMAP [Leland18]: Uniform Manifold Approximation and Projection is a non-linear dimensionality reduction technique widely used for visualization and general-purpose manifold learning. It operates by constructing a high-dimensional graph that represents the data's topological structure (its connectivity and neighborhood relationships). Subsequently, it optimizes a low-dimensional graph to be as structurally similar as possible to the high-dimensional graph. UMAP's theoretical foundation draws from both Riemannian geometry and algebraic topology, making it effective at preserving both local and global data structures.

  • Barycentric subspaces [Pennec18]: This refers to a class of methods that represent data points as barycentric combinations within a geometric space, often specifically on manifolds. This approach is valuable for tasks such as computing averages (Fréchet means) or performing interpolations on curved spaces, where standard linear averaging is inappropriate.

  • IFBD: The infographic does not spell out this abbreviation; "IFBD" likely refers to methods such as an "Intrinsic Fréchet Barycentric Discriminant" or similar techniques that leverage intrinsic means and barycentric coordinates on manifolds for tasks such as classification, clustering, or dimensionality reduction, by considering the intrinsic geometry.

  • Poincare Embeddings [Nickel17]: These methods embed data into a hyperbolic space, often using the Poincaré disk model. Hyperbolic geometry is particularly well-suited for representing data with hierarchical or tree-like structures, as it can embed trees with minimal distortion. Distances in hyperbolic space naturally reflect the hierarchical relationships, making it effective for tasks involving classification or knowledge graph representation.

  • Principal Sub-bundles [Akhoj20]: This likely refers to methods that identify principal structures within fiber bundles. Fiber bundles are sophisticated geometric constructions used to model spaces that locally resemble a product of two simpler spaces. Identifying "principal sub-bundles" could be relevant for analyzing data with complex, multi-layered dependencies or structures where local variations are organized around a global framework.

  • Rie-SNE: Riemannian t-Distributed Stochastic Neighbor Embedding is a generalization of the popular t-SNE algorithm 12 to Riemannian manifolds. While t-SNE focuses on preserving local relationships for effective data visualization in Euclidean space, Rie-SNE extends this by ensuring that the preservation of local neighborhoods is performed with respect to geodesic distances on the curved manifold, thereby providing a more geometrically accurate visualization of manifold data.
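
As a usage illustration for UMAP above, the snippet below shows a typical call to the umap-learn package; the random input matrix and parameter values are illustrative stand-ins rather than recommendations.

```python
import numpy as np
import umap  # pip install umap-learn

# Stand-in for high-dimensional data; in practice this would be real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))

# UMAP builds a fuzzy topological graph of the data and optimizes a 2-D layout
# that preserves its neighborhood structure.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
embedding = reducer.fit_transform(X)
print(embedding.shape)  # (2000, 2)
```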

5.2 Their Role in Geometric Representation Learning and Visualization

Models without decoders play a vital role in geometric representation learning and visualization:

  • Visualization: These methods are crucial for understanding the complex structure of high-dimensional data. By projecting data into 2D or 3D, they allow human interpretation while striving to preserve meaningful relationships and the intrinsic geometry of the data.12

  • Topology Preservation: Algorithms like UMAP and Rie-SNE are specifically designed to preserve the topological structure of the data, including its connectivity and neighborhood relationships.1 This is paramount for accurately representing complex relationships and for tasks like anomaly detection or cluster analysis.

  • Clustering and Downstream Tasks: The learned embeddings, being compact and semantically meaningful, are frequently used as input for clustering algorithms or as features for other machine learning tasks, often leading to improved performance due to the better-structured representation.9

  • Exploration of Latent Spaces: These methods empower researchers to explore and manipulate representations within discrete or continuous latent spaces. This exploration can target fine-grained semantic control, allowing for a deeper understanding of the underlying data generation process or semantic properties.10

Although these models lack a decoder and thus are not "generative" in the autoencoder sense (i.e., they don't reconstruct the input), a high-quality embedding implicitly creates a meaningful low-dimensional representation that reflects the underlying data structure. For visualization purposes, this means that proximity in the embedding space should correspond to similarity in the original data, suggesting an underlying generative process or inherent structure. For instance, Poincaré embeddings implicitly "generate" a hierarchical structure in hyperbolic space that mirrors the data's hierarchy. The absence of a decoder, therefore, does not signify an absence of geometric principles or a model of the data's intrinsic organization. Instead, these methods are powerful tools for discovering and visualizing hidden geometries, which can then inform other generative or analytical tasks, emphasizing that representation learning is not solely about reconstruction but also about revealing latent structures.

6. Interrelationships and Building Blocks: A Unified Perspective

6.1 Evolution from Euclidean to Non-Euclidean: A Conceptual Journey

The progression of geometric structures in latent embeddings represents a profound conceptual journey in machine learning. It commences with the mathematical simplicity and computational tractability of Euclidean spaces, which formed the traditional "backbone" of classical machine learning.1 This initial phase (Row 1 of the infographic) relied on linear transformations and flat latent spaces.

However, the increasing complexity of real-world data and the growing understanding of the Manifold Hypothesis 3 revealed the limitations of these Euclidean assumptions. The recognition that data often resides on intrinsically non-linear, curved manifolds, even if observed in a high-dimensional Euclidean space, spurred the development of methods that acknowledge these intrinsic structures. This led to the paradigm represented in Row 2, where the latent space is explicitly modeled as a manifold, allowing for the application of geodesic concepts like Tangent PCA, GPCA, and PGA.

The journey culminates in Row 3, where both the input data and the latent space are explicitly understood to be manifolds. This represents the most geometrically aligned scenario, leading to the design of specialized latent spaces (e.g., hyperspheres, Lie groups, tori) that precisely match the inherent geometries of the data. This approach offers the highest fidelity and interpretability.

This evolution is not a series of disconnected advancements but an iterative refinement of geometric assumptions. Initial models were simple, reflecting the computational constraints and theoretical understanding of the time. As data complexity increased and computational power grew, researchers could afford to make more sophisticated geometric assumptions. This continuous effort aims to better align the mathematical models with the perceived reality of data structure. The trajectory from flat to increasingly curved and specialized geometries suggests that future advancements will likely involve even more complex and adaptive methods that can infer the optimal geometry for a given dataset, moving beyond fixed assumptions to dynamically discover the most appropriate underlying structure.

6.2 Parametric vs. Nonparametric: Flexibility, Assumptions, and Trade-offs

The choice between parametric and nonparametric models is a recurring theme in dimensionality reduction, each offering distinct advantages and trade-offs.

  • Parametric Models: These models, exemplified by Autoencoders, VAEs, Hypersphere-VAEs, LieVAEs, and Toroidal VAEs, rely on fixed functional forms, often implemented as neural networks, with a finite set of learnable parameters. They offer strong generalization capabilities, especially when the chosen functional form is a good approximation of the true underlying data generating process. Once trained, inference (encoding or decoding) is typically computationally efficient. However, their rigidity can be a limitation if the true underlying structure deviates significantly from the assumed parametric form.

  • Nonparametric Models: Methods such as LLE, GPLVM, Principal Curves, and Principal Flows do not assume a fixed functional form for the mapping or the latent space. Instead, they infer relationships directly from the data, often by preserving local neighborhood structures or employing flexible statistical models like Gaussian Processes. This offers greater flexibility to capture arbitrary non-linear structures without strong prior assumptions about their form. However, nonparametric models can be computationally more intensive, especially during training, and may require more data to accurately learn complex relationships.

The selection between parametric and nonparametric approaches depends on several factors: the extent of prior knowledge about the data's structure, the desired level of interpretability, available computational resources, and the specific requirements for explicit generative capabilities. Parametric models are often preferred for large-scale generative tasks where efficiency is critical, while nonparametric models excel in discovering complex, implicit structures where the underlying geometry is unknown or highly irregular.

6.3 The Role of Optimization, Priors, and Metric Learning

Effective geometric latent learning is a confluence of sophisticated optimization techniques, principled probabilistic frameworks, and often, the ability to learn the appropriate metric itself.

  • Optimization: All models, whether parametric or nonparametric, fundamentally involve optimization processes to learn their parameters or embeddings. For models operating on non-Euclidean spaces, this often necessitates the use of specialized Riemannian optimization algorithms. These algorithms, such as Riemannian gradient descent, are designed to perform optimization directly on the manifold, respecting its curvature and structure, rather than relying on Euclidean approximations that can lead to suboptimal solutions or invalid points.16

  • Priors: In probabilistic models like VAEs and GPLVMs, prior distributions over latent variables or model parameters are crucial for regularization and ensuring meaningful sampling. For geometric models, these priors are adapted to the manifold's structure. For instance, Riemannian VAEs utilize a Riemannian Brownian motion prior instead of a standard Gaussian, which is better suited for sampling from the manifold immersed by the decoder.8 Similarly, Manifold GPLVMs employ uniform priors for bounded manifolds, ensuring that the latent variables are distributed appropriately on the curved space.16 These manifold-aware priors help regularize the latent space, prevent issues like posterior collapse, and ensure that generated samples are geometrically consistent.

  • Metric Learning: A key aspect of advanced geometric deep learning involves not just operating on a predefined manifold but actively learning a metric that better reflects the data's intrinsic geometry. Instead of assuming a fixed metric (like the Euclidean metric), models can infer a metric tensor that adapts to the local curvature and density of the data manifold. This is evident in Riemannian VAEs, which learn a parametrized Riemannian metric 8, or in approaches that reconstruct the Fisher information metric to analyze latent space geometry in generative models.11 The ability to learn the metric allows the model to dynamically adapt to the nuances of the data's underlying manifold, leading to more accurate and robust representations.4

The strong interplay between optimization, priors, and metric learning is evident across the infographic's taxonomy. Effective geometric latent learning requires sophisticated optimization techniques that respect the manifold's structure, principled probabilistic frameworks that define distributions on these manifolds, and often, the ability to learn the appropriate metric itself. These three pillars are interdependent, and their tighter integration will likely define the future of geometric machine learning, leading to models that can dynamically adapt their geometric assumptions, learn optimal metrics, and perform robust inference on increasingly complex and diverse data.

6.4 Challenges and Future Directions in Geometric Latent Learning

Despite significant advancements, the field of geometric structures in latent embeddings faces several ongoing challenges and offers exciting avenues for future research:

  • Computational Scalability: Exact methods for computing geodesics or performing optimization on general manifolds can be computationally expensive, particularly for high-dimensional data or large datasets.26 Developing scalable algorithms that can handle real-world data efficiently remains a critical challenge.

  • Model Complexity and Interpretability: While geometric models can enhance interpretability by aligning latent structures with meaningful data variations, the underlying mathematics can be highly complex. Balancing this mathematical rigor with accessibility and intuitive understanding for practitioners is important.

  • Learning Arbitrary Topologies: Learning and representing data on manifolds with arbitrary or highly complex topologies is still an open problem.5 Approaches like mixture models of VAEs, where each component represents a chart of a manifold, offer one promising direction.36

  • Hybrid Geometries: Many real-world datasets may exhibit a combination of different geometric structures. Developing models that can seamlessly integrate and learn hybrid geometries (e.g., combining spherical and Euclidean latent spaces within a single VAE architecture) is an important area for future work.9

  • Uncertainty Quantification: Robustly quantifying uncertainty in latent representations and model predictions on non-Euclidean spaces is an active research area. While some probabilistic models like mGPLVM offer uncertainty quantification 16, extending this to more general manifold types and complex models remains challenging.

  • Semantic Representation Learning: A crucial frontier involves explicitly embedding symbolic, compositional, and semantic properties into latent spaces.10 This direction, termed "semantic representation learning," aims to bridge the gap between symbolic and distributional semantics. It includes learning structured latent spaces that represent topic attributes, hierarchical syntactic structures, and syntax-semantic relationships, which can enhance the controllability and interpretability of large language models.10 The "Linear Representation Hypothesis," suggesting that high-level concepts can be encoded within causally separated subspaces or directions of the high-dimensional latent space, provides a guiding principle for this effort.10

  • Security Implications: The ability to translate text embeddings from one vector space to another while preserving their geometry, even without paired data, has serious implications for the security of vector databases. Adversaries with access to embedding vectors could potentially extract sensitive information, sufficient for classification and attribute inference.37 This highlights the need for research into the security and privacy aspects of geometrically structured latent spaces.

The field is moving towards not just discovering geometry but engineering it to reflect human-understandable semantic properties. This is crucial for controlling generative models, particularly large language models, and for building truly interpretable and robust artificial intelligence systems.

7. Didactic Iterative Course: Mastering Geometric Latent Embeddings

7.1 Course Philosophy and Structure

This course is meticulously designed for technically proficient learners and educators, emphasizing a foundational understanding before progressing to advanced topics. The philosophy centers on an "iterative" approach, revisiting core concepts with increasing geometric sophistication. Practical exercises and case studies will be integral to reinforce theoretical knowledge and bridge the gap between abstract mathematics and real-world applications.

The course structure is modular, with each module building upon the previous one. It begins with familiar Euclidean concepts and gradually introduces manifold theory and its applications. Each module will include comprehensive theoretical explanations, relevant mathematical background, and illustrative examples of models.

7.2 Module 1: Euclidean Foundations (PCA, VAE basics, data in \mathbb{R}^m, latent in \mathbb{R}^{m'})

This module establishes the baseline understanding of dimensionality reduction in flat spaces.

  • Topics:

    • Review of linear algebra and vector spaces, essential for understanding Euclidean transformations.

    • Principal Component Analysis (PCA): A detailed derivation, its properties (variance maximization, orthogonality), and classical applications in data compression and visualization.11

    • Introduction to Autoencoders (AEs): Architecture (encoder-decoder), training objectives (reconstruction loss), and their limitations as non-probabilistic models.

    • Variational Autoencoders (VAEs): The core principles, including the Evidence Lower Bound (ELBO), the reparameterization trick, the role of the Gaussian prior, and their capabilities as generative models.8

    • Conceptual overview of nonparametric Euclidean methods like Locally Linear Embedding (LLE) and Gaussian Process Latent Variable Models (GPLVM), highlighting their ability to capture non-linear structures even in Euclidean spaces.12

  • Learning Objectives: Learners will gain a solid understanding of classical dimensionality reduction techniques, the fundamental concept of a latent space, and the basic generative principles underlying VAEs.

  • Practical Exercise: Implement PCA and a simple VAE on a standard dataset (e.g., MNIST, Fashion-MNIST). Analyze the resulting latent space visualizations to observe the effects of dimensionality reduction.
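
The following minimal sketch illustrates one way to carry out this exercise, assuming the scikit-learn and PyTorch libraries suggested later in the course. The dataset choice (scikit-learn's digits), layer sizes, and hyperparameters are illustrative assumptions, not prescriptions.

```python
# Minimal sketch (not a tuned implementation): PCA baseline plus a tiny VAE on digits.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data / 16.0                       # 1797 x 64 images, rescaled to [0, 1]
Z_pca = PCA(n_components=2).fit_transform(X)        # linear baseline: 2-D Euclidean latent space

class TinyVAE(nn.Module):
    def __init__(self, d_in=64, d_hidden=128, d_latent=2):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.mu = nn.Linear(d_hidden, d_latent)
        self.logvar = nn.Linear(d_hidden, d_latent)
        self.dec = nn.Sequential(nn.Linear(d_latent, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                 # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon + kl

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.tensor(X, dtype=torch.float32)
for epoch in range(20):                                           # short illustrative run
    opt.zero_grad()
    x_hat, mu, logvar = model(x)
    loss = elbo_loss(x, x_hat, mu, logvar)
    loss.backward()
    opt.step()
# Compare Z_pca with the VAE latent means (model(x)[1].detach()) in a 2-D scatter plot.
```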

7.3 Module 2: Introduction to Manifolds and Geodesics

This module provides the essential geometric intuition and mathematical tools for understanding curved spaces.

  • Topics:

    • What is a manifold? An intuitive understanding, using examples like the surface of a sphere or a torus, distinguishing them from Euclidean spaces.

    • Riemannian manifolds: Introduction to the metric tensor, which defines an inner product at each point, giving meaning to lengths and angles on curved surfaces. The concept of tangent spaces at points on a manifold will be explored.8

    • Geodesics: Formal definition as the "straightest possible paths" or shortest paths on a curved manifold, contrasting them with straight lines in Euclidean space. Their critical importance for defining meaningful distances and interpolations on curved spaces will be emphasized.8

    • Exponential and Logarithmic maps: These fundamental concepts connect points on a manifold to vectors in its tangent space, facilitating geometric operations.8

    • The Manifold Hypothesis: A deeper dive into why high-dimensional data is often assumed to lie on a lower-dimensional manifold, and its implications for representation learning.3

  • Learning Objectives: Develop a foundational geometric intuition for curved spaces and acquire the mathematical vocabulary necessary to describe them.

  • Practical Exercise: Visualize simple 2D manifolds (e.g., a sphere, a torus) and trace geodesics on them using interactive tools. Explore the concept of tangent spaces at different points on these surfaces.
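
A self-contained NumPy sketch of the geodesic-tracing part of this exercise: it traces the great-circle arc between two points on the unit sphere S^2 via the closed-form slerp formula and contrasts the geodesic (arc-length) distance with the straight-line (chordal) distance. The specific points are arbitrary; libraries such as geomstats provide the same operations for general manifolds.

```python
# Sketch: geodesics on the unit 2-sphere, using closed-form great-circle formulas.
import numpy as np

def geodesic_on_sphere(p, q, n_steps=50):
    """Points along the great-circle arc from p to q (both unit vectors)."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))   # geodesic distance = angle between p and q
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    # Spherical linear interpolation (slerp); every point stays exactly on the sphere.
    return (np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
path = geodesic_on_sphere(p, q)

geodesic_dist = np.arccos(np.dot(p, q))     # arc length on the manifold (pi/2 here)
euclidean_dist = np.linalg.norm(p - q)      # chord length in the ambient space (sqrt(2))
print(geodesic_dist, euclidean_dist, np.allclose(np.linalg.norm(path, axis=1), 1.0))
```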

7.4 Module 3: Manifold Learning with Tangent Spaces

This module explores how classical linear methods are adapted to curved spaces, introducing the trade-offs between local approximations and global geodesic approaches.

  • Topics:

    • Tangent PCA [Fletcher04]: The concept of approximating manifolds locally by their tangent spaces, and performing PCA within these flat tangent spaces.13 Discussion on its computational efficiency versus potential for distortion in high-curvature regions.

    • Principal Geodesic Analysis (PGA): Generalizing PCA to find geodesic modes of variation, replacing linear subspaces with geodesic submanifolds.18 Discussion of Exact PGA (EPGA) and its computational challenges.26

    • Generalized PCA (GPCA) in the manifold context [Huckeman10]: Exploring its role in finding geodesic principal components that account for manifold curvature.21

    • Probabilistic PGA [Zhang13] and Bayesian PGA [Zhang14]: Introducing probabilistic frameworks for factor analysis on manifolds, allowing for uncertainty quantification and automatic dimensionality reduction.29

    • Principal Flows [Panaretos14]: A method for capturing non-geodesic variations on manifolds, offering more flexibility than strict geodesic approaches.5

  • Learning Objectives: Understand the strategies for adapting classical linear dimensionality reduction to curved data, appreciating the trade-offs between computational cost and geometric fidelity.

  • Practical Exercise: Apply Tangent PCA or PGA to a dataset with a known manifold structure (e.g., synthetic data on a sphere or a shape dataset). Compare the results with standard PCA to observe the benefits of manifold-aware methods.
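
A NumPy-only sketch of Tangent PCA on synthetic sphere data: points are generated near a base point, lifted to the tangent space with the sphere's logarithm map, and ordinary PCA (via the SVD) is applied there. In practice the base point would be the Fréchet mean; here it is known by construction. geomstats provides an equivalent TangentPCA class, but the plain-NumPy version avoids version-specific APIs.

```python
# Sketch: Tangent PCA on synthetic data clustered around a base point of the unit sphere.
import numpy as np

rng = np.random.default_rng(0)

def log_map_sphere(base, x):
    """Logarithm map of the unit sphere: tangent vectors at `base` pointing toward each row of x."""
    cos_t = np.clip(x @ base, -1.0, 1.0)
    theta = np.arccos(cos_t)[:, None]                      # geodesic distance to the base point
    v = x - cos_t[:, None] * base                          # component orthogonal to the base point
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    return np.where(norm > 1e-12, theta * v / norm, 0.0)   # tangent vectors (zero at the base itself)

# Synthetic data: anisotropic tangent perturbations around a base point, mapped back with the exp map.
base = np.array([0.0, 0.0, 1.0])
tangent_noise = np.column_stack([0.4 * rng.normal(size=200),
                                 0.1 * rng.normal(size=200),
                                 np.zeros(200)])           # lies in the tangent plane at `base`
norms = np.linalg.norm(tangent_noise, axis=1, keepdims=True)
data = np.cos(norms) * base + np.sin(norms) * np.where(norms > 0, tangent_noise / norms, 0.0)

# Tangent PCA: lift the data to the tangent space at the base point, then ordinary PCA.
V = log_map_sphere(base, data)
V_centered = V - V.mean(axis=0)
_, s, components = np.linalg.svd(V_centered, full_matrices=False)
print("explained variance:", s**2 / (len(V) - 1))          # dominated by the first tangent direction
print("first principal direction in tangent space:", components[0])
```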

7.5 Module 4: Generative Models on Manifolds

This module delves into the design principles and benefits of VAEs specifically tailored for non-Euclidean latent geometries.

  • Topics:

    • Riemannian VAEs [Miolane20]: Adapting VAEs to Riemannian latent spaces, introducing Riemannian Brownian motion priors, and understanding the role of the pull-back metric in defining the latent geometry.8

    • Hypersphere-VAEs: Designing latent spaces on a hypersphere, utilizing spherical Cauchy or von Mises-Fisher distributions, and their effectiveness in handling directional or cyclic data.9

    • LieVAEs [Falorsi18]: Modeling latent spaces as Lie groups to capture symmetries and transformations inherent in data, and the use of Lie group operators.7

    • Toroidal VAEs [Mikulski19]: Structuring latent spaces as tori for periodic data, employing concepts like circular spring loss, and enabling multi-path morphing.34

  • Learning Objectives: Grasp the specialized design principles and unique benefits of VAEs tailored for specific manifold geometries, and how these models overcome the limitations of Euclidean VAEs.

  • Practical Exercise: Experiment with a Hypersphere-VAE on a directional dataset (e.g., word embeddings with angular relationships) or a Toroidal VAE on periodic data. Analyze the quality of interpolations and generated samples to observe the impact of the chosen latent geometry.
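
The sketch below is not a full Hypersphere-VAE (there is no von Mises-Fisher or spherical Cauchy posterior); it is a deliberately simplified, assumption-laden illustration of two geometric ingredients of this exercise: constraining latent codes to the unit hypersphere by normalization, and interpolating between codes along the sphere (slerp) rather than along a straight line.

```python
# Simplified sketch: spherical latent codes via normalization, plus geodesic (slerp) interpolation.
# A real Hypersphere-VAE would use a proper spherical posterior (e.g., von Mises-Fisher or spCauchy).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphericalEncoder(nn.Module):
    def __init__(self, d_in=64, d_latent=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_latent))

    def forward(self, x):
        z = self.net(x)
        return F.normalize(z, dim=-1)        # project latent codes onto the unit hypersphere

def slerp(z0, z1, t):
    """Geodesic interpolation on the hypersphere between two unit latent codes."""
    omega = torch.acos(torch.clamp((z0 * z1).sum(-1, keepdim=True), -1 + 1e-7, 1 - 1e-7))
    return (torch.sin((1 - t) * omega) * z0 + torch.sin(t * omega) * z1) / torch.sin(omega)

enc = SphericalEncoder()
x0, x1 = torch.rand(1, 64), torch.rand(1, 64)
z0, z1 = enc(x0), enc(x1)
mid = slerp(z0, z1, 0.5)                     # stays on the sphere, unlike (z0 + z1) / 2
print(mid.norm(dim=-1))                      # ~1.0
```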

7.6 Module 5: Advanced Nonparametric Manifold Learning

This module explores sophisticated nonparametric methods for discovering and modeling complex manifold structures without explicit parametric assumptions.

  • Topics:

    • Deeper dive into Principal Flows [Panaretos14]: Detailed formulation and applications, emphasizing their ability to capture flexible, non-geodesic variations.5

    • Manifold GPLVM (mGPLVM) [Jensen20]: Probabilistic modeling on symmetric manifolds, the use of Gaussian Process priors defined on manifolds, and advanced variational inference techniques for non-Euclidean spaces.15

    • Riemannian LLE [Maignant23] and other nonparametric manifold embedding techniques that preserve local geodesic distances.

  • Learning Objectives: Understand advanced nonparametric methods for discovering and modeling complex manifold structures, particularly when the underlying geometry is unknown or highly intricate.

  • Practical Exercise: Explore a pre-implemented mGPLVM on a neuroscience dataset (e.g., neural activity related to head direction) to observe how it recovers latent manifold structures like a ring topology.
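
Since mGPLVM itself is beyond a few lines, the sketch below only prepares the kind of data this exercise refers to: synthetic population activity of head-direction-like cells with cosine tuning curves. Even a plain linear projection of such data hints at the ring topology that mGPLVM would recover probabilistically; all tuning parameters are invented for illustration.

```python
# Sketch: synthetic "head-direction" population activity whose latent structure is a ring (1-D manifold).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_timepoints, n_neurons = 1000, 50
theta = rng.uniform(0, 2 * np.pi, size=n_timepoints)          # latent head direction over time
preferred = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)

# Cosine-like tuning curves plus noise: rate_i(t) = exp(kappa * cos(theta_t - phi_i)).
kappa = 2.0
rates = np.exp(kappa * np.cos(theta[:, None] - preferred[None, :]))
activity = rates + 0.5 * rng.normal(size=rates.shape)

# A linear 2-D projection already hints at the ring structure of the latent manifold.
Z = PCA(n_components=2).fit_transform(activity)
radius = np.linalg.norm(Z - Z.mean(axis=0), axis=1)
print("relative radius spread (small value suggests a ring):", radius.std() / radius.mean())
```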

7.7 Module 6: Decoder-less Embeddings and Specialized Geometries

This module covers alternative approaches that focus on embedding and visualization, demonstrating how they leverage diverse geometric and topological principles without requiring a decoder for reconstruction.

  • Topics:

    • UMAP [Leland18]: Its graph-based manifold approximation and projection technique, and its reliance on topological data analysis principles for preserving both local and global structures.

    • Poincaré Embeddings [Nickel17]: Conceptual understanding of hyperbolic geometry for effectively embedding hierarchical data structures.

    • Rie-SNE: The Riemannian extension of t-SNE, designed for local neighborhood preservation on manifolds using geodesic distances.

    • Barycentric subspaces [Pennec18] and related concepts for performing statistical operations like averaging and interpolation on manifold data.

  • Learning Objectives: Understand the principles behind methods primarily focused on learning meaningful low-dimensional embeddings for visualization and downstream tasks, and how they utilize various geometric and topological concepts.

  • Practical Exercise: Apply UMAP to a high-dimensional dataset (e.g., a complex image dataset or a large text embedding dataset) and analyze the resulting 2D/3D visualization for clusters, topological structures, and semantic relationships.
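
A short sketch of the UMAP part of this exercise, assuming the umap-learn package; the digits dataset and hyperparameter values are placeholders.

```python
# Sketch: UMAP embedding of a high-dimensional dataset (here: 64-D digit images).
import umap
from sklearn.datasets import load_digits

digits = load_digits()
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(digits.data)     # shape (1797, 2)

# Inspect cluster structure, e.g. with matplotlib:
# plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, s=4, cmap="tab10")
print(embedding.shape)
```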

7.8 Suggested Practical Exercises and Case Studies for Each Module

Each module will incorporate hands-on programming exercises, primarily using Python libraries such as PyTorch for deep learning models, scikit-learn for classical methods, and specialized libraries like geomstats for Riemannian geometry. These exercises will involve implementing or applying the discussed models to various datasets.

Case studies will be integrated throughout the course, focusing on analyzing real-world datasets to demonstrate the practical utility and impact of geometric approaches. Examples include:

  • Image Datasets: Applying VAEs to generate images or using manifold learning for image feature extraction.

  • Text Embeddings: Exploring how geometric structures in latent spaces can enhance the interpretability and controllability of language models, particularly for tasks like style transfer or compositional semantics.10

  • Biological Data: Analyzing neuroscience data (e.g., neural recordings) using mGPLVM to discover underlying manifold structures related to brain activity.16

  • Shape Analysis: Using PGA or other geodesic methods to analyze the variability of shapes in medical imaging or computer vision.

  • Periodic Data: Applying Toroidal VAEs to time-series data with inherent periodicity.
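
To make the periodic-data case study concrete, the short sketch below maps timestamps onto a 2-torus (hour-of-day angle x day-of-week angle) before any model is applied; a Toroidal VAE would represent such coordinates natively rather than through this sin/cos workaround. The feature choices and periods are assumptions for illustration.

```python
# Sketch: representing two periodic features as angles on a 2-torus.
import numpy as np

hours = np.array([23.5, 0.5, 12.0])          # hour of day, period 24
weekdays = np.array([6, 0, 3])               # day of week, period 7

phi = 2 * np.pi * hours / 24.0               # first torus angle
psi = 2 * np.pi * weekdays / 7.0             # second torus angle

# Embedding each angle as (cos, sin) avoids the artificial jump between 23:59 and 00:00.
features = np.column_stack([np.cos(phi), np.sin(phi), np.cos(psi), np.sin(psi)])
print(features.round(3))
```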

These practical components are designed to solidify theoretical understanding and equip learners with the skills to apply geometric machine learning techniques to complex problems.

Conclusion

The exploration of geometric structures in latent embeddings reveals a transformative shift in machine learning, moving decisively beyond the confines of traditional Euclidean assumptions. The journey, as depicted in the infographic, progresses from classical linear and parametric models in flat spaces to sophisticated techniques that explicitly embrace the intrinsic curvature and topology of data. This evolution is driven by the fundamental understanding that real-world, high-dimensional data often resides on lower-dimensional, non-linear manifolds.

The report has detailed how classical concepts like PCA and VAEs have been generalized to their Riemannian counterparts, such as Principal Geodesic Analysis and Riemannian VAEs. These advancements involve a re-conceptualization of distance, mean, and probability distributions, replacing Euclidean notions with their manifold-equivalent forms. The development of specialized VAEs (Hypersphere-VAE, LieVAE, Toroidal VAE) further underscores the power of aligning the latent space's geometry with the data's intrinsic structure, leading to more faithful, interpretable, and effective representations, particularly for data with inherent directional, symmetric, or periodic properties. Furthermore, methods without decoders, like UMAP and Poincaré Embeddings, highlight the importance of geometric and topological principles for visualization and discovery of hidden data structures.

The field continues to navigate challenges related to computational scalability, model complexity, and the ability to learn arbitrary topologies. However, the ongoing convergence of advanced optimization techniques, principled probabilistic frameworks, and the burgeoning field of metric learning promises to yield models that can dynamically adapt their geometric assumptions and perform robust inference on increasingly complex data. A particularly exciting frontier lies in "semantic representation learning," where the goal is to engineer latent geometries to reflect human-understandable semantic properties, thereby enhancing the controllability and interpretability of generative models, including large language models. This geometric revolution is not merely an academic pursuit but holds profound implications for building more robust, interpretable, and powerful machine learning systems across diverse scientific and engineering domains.

Dante Monson, Aug 3, 2025, to econ...@googlegroups.com

Bullet-point summary:


Geometric Structures in Latent Embeddings: A Comprehensive Survey and Didactic Course

  • Abstract:

    • Examines geometric structures in latent embeddings.

    • Builds upon the infographic "Geometric Structures in Latent Embeddings" and references arXiv:2407.09468v2.

    • Highlights the evolution of machine learning from Euclidean to non-Euclidean geometries.

    • Provides detailed explanations of concepts and models from the infographic.

    • Elucidates interrelationships, historical development, and implications for representation learning.

    • Outlines a structured, iterative didactic course on the field.

    • Emphasizes foundational principles, advanced models, and practical applications.


1. Introduction: The Geometric Revolution in Machine Learning

1.1 Beyond Euclidean: Why Geometry Matters in Latent Spaces

  • Machine Learning Transformation:

    • Moving beyond conventional reliance on Euclidean spaces.

    • Embracing non-Euclidean geometry, topology, and abstract algebra.

  • Historical Context:

    • Traditional models assumed data in flat, Euclidean vector spaces.

    • Euclidean perspective formed the "backbone" of classical methods.

  • Contemporary Data Characteristics:

    • High dimensionality and intricate, non-linear relationships.

    • Exhibit inherent geometric, topological, and algebraic structures.

    • Defy simple Euclidean descriptions.

  • Examples of Non-Euclidean Data:

    • Curvature of spacetime.

    • Complex electrical patterns of neurons.

    • Symmetries observed in physical systems.

  • Manifold Hypothesis:

    • Fundamental concept: high-dimensional data often lies on or near a lower-dimensional, non-linear manifold.

    • Manifold is embedded within a higher-dimensional ambient space.

  • Limitations of Euclidean Assumptions:

    • Apparent when modeling non-Euclidean data.

    • Variational Autoencoders (VAEs):

      • Standard Euclidean latent space with Gaussian prior can lead to "arbitrary Euclidean representations."

      • Can result in "mismapped latent manifolds."

      • Manifests as "posterior collapse" or failure to capture angular relationships (for cyclic/directional data).

  • Emergence of Non-Euclidean Machine Learning:

    • Directly stems from these challenges.

    • Parallels 19th-century mathematical revolutions expanding geometry beyond Euclid.

  • Generalization of Theories:

    • Classical statistical theory and machine learning incorporating geometric, topological, and algebraic structures.

    • Led to new subfields: Geometric Statistics, Geometric Deep Learning, Topological Machine Learning.

  • Paradigm Shift:

    • Fundamental redefinition of how machine learning models data.

    • Explicitly modeling underlying data geometry.

    • Promises improved performance, enhanced interpretability, and more faithful representations.

    • Poised to revolutionize modeling complex phenomena in the 21st century.

1.2 Understanding Latent Embeddings: Data, Latent Space, and Dimensionality Reduction Paradigms

  • Representation Learning Core:

    • Transformation of raw, high-dimensional inputs into compact, informative representations.

    • Captures underlying geometry, patterns, and semantic relationships in data.

    • Involves mapping input data from its original space to a lower-dimensional latent space.

  • Infographic Categorization:

    • Models categorized based on the nature of Data and Latent space.

  • Data (x \in \mathbb{R}^m or x \in M):

    • Refers to input observations.

    • Euclidean Space (x \in \mathbb{R}^m): Points are vectors in a flat space.

    • Manifold (x \in M): Data inherently exists on a curved structure (e.g., rotations, shapes, periodic signals).

  • Latent Space (y \in \mathbb{R}^{m'} or y \in M'):

    • Lower-dimensional space for compressed, meaningful data representations.

    • Euclidean (y \in \mathbb{R}^{m'}): Flat space.

    • Manifold (y \in M'): Curved space (e.g., hypersphere, torus, Lie group).

  • Dimensionality Reduction Model:

    • Function or process performing the mapping.

    • Categorized by several characteristics:

      • Linear/Parametric/Nonparametric: Describe the nature of the mapping function.

        • Linear: Simple linear transformation (e.g., matrix multiplication).

        • Parametric: Fixed-form function with learnable parameters (e.g., neural networks with weights/biases).

        • Nonparametric: Flexible mapping, learned directly from data, often relying on local relationships (e.g., Gaussian Processes).

      • Encoder (E) / Decoder (D):

        • Encoder (E): Maps high-dimensional data to lower-dimensional latent space.

        • Decoder (D): Attempts to reconstruct original data from latent representation.

        • Some methods focus solely on embedding, no decoder needed (Infographic Row 5).

      • Optimization / No Posterior / w. Posterior / Bayesian: Relate to the learning paradigm, especially for probabilistic models.

        • Optimization: Non-probabilistic, direct optimization of objective function (e.g., minimizing reconstruction error).

        • No Posterior: Models do not explicitly model or infer a posterior probability distribution over latent variables.

        • w. Posterior: Models estimate or approximate a posterior distribution, providing probabilistic understanding.

        • Bayesian: Fully probabilistic framework, incorporates priors over model parameters and latent variables, infers full posterior distributions for uncertainty quantification.

1.3 Navigating the Landscape: Overview of the Infographic's Taxonomy

  • Infographic Purpose:

    • Presents a graphical taxonomy of recent advancements in geometric structures in latent embeddings.

    • Organizes models based on mathematical structure of input data and target latent space.

    • Reveals progression in sophistication of geometric modeling.

  • Infographic Rows (Categorization by Data and Latent Space Nature):

    • Row 1: Data in Euclidean space (x \in \mathbb{R}^m), Latent in Euclidean space (y \in \mathbb{R}^{m'}).

      • Classical setting: both input and latent representations in flat, Euclidean vector spaces.

    • Row 2: Data in Euclidean space (x \in \mathbb{R}^m), Latent on a Manifold (y \in M').

      • Input data in Euclidean space, but underlying intrinsic structure is a non-linear manifold.

      • Goal: embed data into a latent manifold space to respect intrinsic geometry.

    • Row 3: Data on a Manifold (x \in M), Latent on a Manifold (y \in M').

      • Most geometrically aligned scenario.

      • Input data explicitly on a manifold (e.g., rotations, shapes).

      • Goal: embed into a latent space with specific manifold structure (e.g., hypersphere, Lie group).

    • Row 4: Data on a Manifold (x \in M), Latent in Euclidean space (y \in \mathbb{R}^{m'}).

      • Involves projecting manifold data onto Euclidean space.

      • Often implies simplification or linear approximation.

    • Row 5: Models with No Decoder.

      • Focus on learning low-dimensional embedding without explicit decoder for reconstruction.

      • Used for visualization, clustering, or pre-processing.

      • Leverage geometric or topological principles.

  • Infographic Columns (Classification by Dimensionality Reduction Model Functional Form):

    • Linear

    • Parametric

    • Nonparametric

    • Specific categories for Geodesic approaches.

  • Conceptual Journey Illustrated:

    • Evolution of dimensionality reduction techniques.

    • Started with simple flat Euclidean spaces.

    • Adapted to curved spaces as understanding of complex data and Manifold Hypothesis grew.

    • Represents continuous effort to align mathematical models with data structure reality.

    • Row 1: "Classical" methods.

    • Row 2: Data appears Euclidean, but intrinsic structure is manifold-like, leading to manifold-aware latent spaces.

    • Row 3: Most geometrically "faithful" scenario, both input and latent spaces explicitly modeled as manifolds.

    • Row 5: Methods prioritizing embedding and visualization without generative decoder aspect, often leveraging geometric principles.

    • Structure guides exploration through increasing levels of geometric complexity and fidelity.


2. Foundations of Dimensionality Reduction: Euclidean Spaces (Infographic Row 1)

2.1 Data in Euclidean Space (x \in \mathbb{R}^m), Latent in Euclidean Space (y \in \mathbb{R}^{m'})

  • Classical Setting:

    • Both high-dimensional input data (x \in \mathbb{R}^m) and low-dimensional latent representation (y \in \mathbb{R}^{m'}) reside in standard Euclidean vector spaces.

  • Euclidean Space Properties:

    • Distances measured using familiar Euclidean metric.

    • "Straight lines" are simply straight lines.

    • Data points treated as coordinates in a flat space.

  • Convenience vs. Reality:

    • Offers mathematical convenience and computational tractability.

    • Often fails to capture true, non-linear, or intrinsically curved geometry of many real-world datasets.

2.2 Linear Models: PCA, Probabilistic PCA, Bayesian PCA

  • Linear Dimensionality Reduction Aim:

    • Find linear projections capturing most significant data variations.

    • Computationally efficient with clear interpretation.

  • PCA [Pearson01]:

    • Cornerstone technique for linear dimensionality reduction.

    • Finds orthogonal axes (principal components) maximizing variance of projected data.

    • Rotates coordinate system to align with directions of greatest data spread.

    • Allows data compression by retaining the highest-variance components (a NumPy sketch follows this list).

  • Probabilistic PCA:

    • Extends classical PCA with a generative probabilistic model.

    • Posits data points generated by linear transformation of lower-dimensional latent variable, corrupted by isotropic Gaussian noise.

    • Likelihood-based framework, advantageous for missing data and model selection.

  • Bayesian PCA:

    • Builds on Probabilistic PCA, incorporates Bayesian inference.

    • Places prior distributions over model parameters.

    • Enables robust parameter estimation.

    • Automatic determination of intrinsic latent dimensionality.

    • Comprehensive quantification of uncertainty in latent representations.

  • Evolutionary Trend:

    • Deterministic PCA → Probabilistic PCA → Bayesian PCA.

    • Illustrates move from algorithmic, point-estimation to probabilistic/Bayesian frameworks.

    • Driven by desire to quantify uncertainty, handle incomplete data, and provide comprehensive generative model.
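
As a concrete counterpart to the PCA bullets above, here is a minimal NumPy sketch computing principal components via the singular value decomposition; the data matrix is a random placeholder.

```python
# Sketch: PCA of a data matrix X (n_samples x n_features) via the SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # placeholder correlated data

Xc = X - X.mean(axis=0)                       # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # rows: orthonormal principal directions
explained_variance = S**2 / (len(X) - 1)      # variance captured by each component
scores = Xc @ components[:2].T                # projection onto the top-2 components
print(explained_variance[:3], scores.shape)
```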

2.3 Parametric Autoencoders: AE, VAE, Full-VAE

  • Autoencoders (AEs) Class:

    • Neural network-based models.

    • Learn compressed data representation in a bottleneck layer.

  • AE:

    • Neural network designed to learn an identity function (output ≈ input).

    • Achieves dimensionality reduction by forcing input through a bottleneck layer (latent code).

    • Consists of an encoder (input to latent code) and a decoder (reconstructs input from latent code).

    • Learned latent code serves as compressed representation.

  • VAE [Kingma14]:

    • Significant conceptual advancement over traditional AEs.

    • Generative models learning probabilistic mapping between data and latent space.

    • Encoder outputs parameters (mean and variance) of a probability distribution (e.g., Gaussian) in latent space.

    • Decoder samples from this distribution to reconstruct input.

    • Trained by maximizing the Evidence Lower Bound (ELBO) on the data log-likelihood (written out after this list).

    • ELBO comprises:

      • Reconstruction loss (fidelity to input).

      • Kullback-Leibler (KL) divergence term (regularizes latent distribution towards predefined prior, e.g., standard Gaussian).

  • Full-VAE [Kingma14]:

    • Typically refers to the standard VAE architecture as originally introduced.

    • Emphasizes its complete framework for probabilistic encoding and decoding.

  • VAEs as a Bridge:

    • Crucial bridge from simple dimensionality reduction to probabilistic generative modeling.

    • Learning a distribution over latent space enables generation of new data and interpolation.

    • Explicit regularization via KL divergence encourages desired latent structure.

    • Became a cornerstone for exploring latent space geometry.

    • Probabilistic nature led to considerations of distribution behavior in latent spaces.

    • Motivated development of Riemannian VAEs to address flat Euclidean prior limitations.
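
For reference, the ELBO objective mentioned in the VAE bullets above can be written explicitly; using the standard VAE notation (encoder parameters \phi, decoder parameters \theta, prior p(z) = \mathcal{N}(0, I)):

\mathrm{ELBO}(x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)

The first term is the (negative) reconstruction loss; the second is the KL regularizer pulling the approximate posterior toward the prior.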

2.4 Nonparametric Methods: PC, LLE, GPLVM

  • Nonparametric Methods Flexibility:

    • Offer greater flexibility by not assuming fixed parametric form for mapping or latent space.

    • Infer relationships directly from data, often preserving local neighborhood structures.

  • PC [Hastie89] (Principal Curves):

    • Non-linear generalization of PCA.

    • Seeks a smooth, one-dimensional curve passing through the "middle" of the data.

    • Minimizes squared distance from each data point to the curve.

    • Captures curvilinear structures that linear PCA cannot.

  • LLE (Locally Linear Embedding):

    • Manifold learning algorithm preserving local neighborhood structure.

    • Assumes data point can be reconstructed as linear combination of its nearest neighbors.

    • Seeks low-dimensional embedding where same linear reconstruction weights hold true.

    • Preserves local geometry.

  • GPLVM [Lawrence03] (Gaussian Process Latent Variable Model):

    • Probabilistic, non-linear dimensionality reduction technique.

    • Uses Gaussian Processes to define flexible, non-parametric mapping from low-dimensional latent space to high-dimensional data space.

    • Generative model where latent variables are inferred.

    • Non-parametric nature allows discovery of complex, non-linear relationships without predefined functional form.

  • Probabilistic PC [Chang01]:

    • Likely a probabilistic extension of Principal Curves.

    • Similar to Probabilistic PCA but adapted for non-linear curves.

    • Provides a generative framework for such structures.

  • Bayesian GP LVM [Lawrence03]:

    • Bayesian extension of GPLVM.

    • Places priors over Gaussian Process hyperparameters and latent variables.

    • Enables robust uncertainty quantification and automatic model selection for latent dimensionality.

  • Implicit Manifold Hypothesis Acknowledgment:

    • Even within "Euclidean latent space" category, these methods implicitly acknowledge manifold hypothesis.

    • LLE explicitly preserves local geometry, suggesting data from underlying curved manifold.

    • Principal Curves seek non-linear "backbone" beyond linear approximations.

    • GPLVM's flexibility discovers complex, non-linear relationships, learning an implicit manifold.

    • Laid groundwork for explicit geometric approaches by showing limitations of strictly linear/parametric models.

2.5 Interconnections and Limitations of Euclidean Approaches

  • Model Interconnections:

    • PCA: Fundamental linear baseline, with probabilistic/Bayesian extensions for robustness/uncertainty.

    • Autoencoders: Neural network-based, often non-linear, for reconstruction.

    • VAEs: Enhance AEs with probabilistic generative capabilities, enabling sampling and principled representation.

    • Nonparametric methods (LLE, GPLVM): Flexible ways to uncover non-linear structures, even with Euclidean latent space.

  • Primary Limitation:

    • Inherent assumption of a flat, Euclidean latent space.

  • Challenges due to Euclidean Assumption:

    • Geometric Distortions: Euclidean distances in latent space may not accurately preserve true geodesic distances on underlying curved manifold.

      • Leads to misleading interpretations of similarity/dissimilarity.

    • Inefficient Representation: Standard Gaussian priors in VAEs concentrate probability mass on a thin shell.

      • Results in "voids and sparsity" in latent space.

      • Hinders meaningful sample generation and limits latent space utilization.

    • Lack of Interpretability: Latent dimensions often entangled, difficult to attribute semantic meanings or disentangle factors.

    • Inability to Capture Intrinsic Geometry: Data with cyclic, directional, or hierarchical structures poorly represented by standard Gaussian distributions/flat Euclidean spaces.

      • Example: Euclidean latent space struggles to model rotations or angles naturally.

  • Necessity for Geometrically Informed Methods:

    • Collective limitations underscore need for methods that explicitly model and leverage non-Euclidean structures.


3. Embracing Curvature: Geodesic Structures in Latent Embeddings (Infographic Row 2)

3.1 Data in Euclidean Space (x \in \mathbb{R}^m), Latent on a Manifold (y \in M')

  • Crucial Shift in Perspective:

    • Input data observed in Euclidean space (x \in \mathbb{R}^m).

    • Underlying, intrinsic structure assumed to be a lower-dimensional non-linear manifold.

  • Objective:

    • Embed this data into a latent space that is itself a manifold (y \in M').

    • Respects the data's true intrinsic geometry.

  • Motivation:

    • Many real-world datasets, even in flat ambient space, inherently live on curved substructures.

  • Central Concept: Geodesics:

    • On a curved manifold, geodesics are "straightest possible paths" or shortest paths between two points.

    • Analogue to straight lines in Euclidean space.

    • Fundamental for defining meaningful distances, performing interpolations, and identifying "principal directions" reflecting curved space geometry.

  • Problem Solved by Geodesic Approaches:

    • Distortion introduced by standard Euclidean methods.

    • Linear interpolation in flat Euclidean latent space might traverse "voids" or low-density regions lacking semantic meaning.

  • Benefit of Manifold Latent Space and Geodesics:

    • Ensure relationships and transformations within latent space are geometrically consistent with data's intrinsic structure.

3.2 Geodesic Approaches: Tangent PCA, GPCA, PGA (and their Probabilistic/Bayesian variants)

  • Generalization of PCA:

    • Methods generalize Principal Component Analysis (PCA) to data on or near Riemannian manifolds.

    • Seek "principal geodesics" or manifold-aware directions of variation instead of linear subspaces.

  • Tangent PCA [Fletcher04]:

    • Addresses applying PCA to curved data by locally approximating the manifold.

    • Identifies a reference point (e.g., Fréchet mean) on the manifold.

    • Projects nearby data points onto the tangent space at that point.

    • Standard PCA performed within this flat tangent space.

    • Mechanism: Estimates tangent space from local data, applies PCA to tangent vectors.

    • Limitations:

      • Can introduce distortion, especially for widely spread data or high manifold curvature.

      • Effectiveness relies on data points being clustered sufficiently close to chosen intrinsic mean.

    • Strengths: Computationally advantageous due to linear operations.

  • GPCA [Huckeman10] (Generalized Principal Component Analysis):

    • In manifold learning context: explicitly accounts for manifold's curvature.

    • Finds geodesic principal components.

    • Minimizes variance of projection residuals, offering geometrically coherent approach.

    • Some interpretations: algebro-geometric solution for segmenting multiple subspaces.

    • Infographic's focus: geodesic generalization.

  • PGA (Principal Geodesic Analysis):

    • Direct and widely used generalization of PCA for data on Riemannian manifolds.

    • Identifies geodesic submanifolds best capturing data variability.

    • Maximizes projected variance or minimizes reconstruction error.

    • Replaces straight lines of PCA with geodesics.

  • Exact PGA (EPGA):

    • Specific PGA variant avoiding linear approximations, striving for greater geometric fidelity.

    • Limitation: Computationally expensive, requires iterative non-linear optimization for projection and optimal geodesic direction.

  • Probabilistic PGA [Zhang13] & Bayesian PGA [Zhang14]:

    • Extend PGA into a probabilistic framework.

    • Provide latent variable model for factor analysis on manifolds.

    • Allows principled handling of uncertainty.

    • Facilitates automatic dimensionality reduction by suppressing irrelevant principal geodesics.

  • Trade-off in Geodesic PCA Variants:

    • Geometric fidelity vs. computational tractability.

    • Tangent PCA: efficient, but risks distortion in high curvature.

    • Exact PGA: more accurate, but higher computational cost.

    • Ongoing research to improve efficiency of exact methods or accuracy of tractable approximations.

    • Probabilistic and Bayesian variants: manage balance by providing principled uncertainty quantification, informing reliability of learned structures.

3.3 Parametric Models: Riemannian VAE, Riemannian Full-VAE

  • Model Class:

    • Variational Autoencoders with latent space explicitly structured as a Riemannian manifold.

    • Moves beyond traditional flat Euclidean assumption.

    • Learns and respects data's intrinsic geometry, leading to meaningful representations and generative capabilities.

  • Riemannian VAE [Miolane20]:

    • Learns Riemannian latent structure by defining a parametrized Riemannian metric.

    • Metric dictates how distances and curvatures are measured in latent space.

    • Key Innovation: Replaces standard Gaussian prior with a Riemannian Brownian motion prior.

      • Prior designed to align with manifold immersed by decoder.

      • Ensures meaningful sampling, restricted to high-density regions, avoiding "voids."

    • Benefits: Addresses arbitrary Euclidean representation and mismapped latent manifolds.

    • Leads to more principled latent codes.

  • Riemannian Full-VAE [Miolane20]:

    • Implies comprehensive framework for Riemannian VAEs.

    • Encompasses full generative and inference capabilities on a Riemannian manifold.

    • Leverages Riemannian geometry principles for both encoding and decoding.

  • Evolution of VAEs:

    • Demonstrates VAEs as powerful testbeds for latent space geometry.

    • Standard VAEs struggle with Euclidean priors for non-Euclidean data (posterior collapse, poor interpolation).

    • Riemannian VAEs address these challenges directly with Riemannian metrics and Brownian motion priors.

    • Aligning latent space geometry with data's intrinsic structure is crucial for improved generation, interpolation, and interpretability.

    • Underscores VAEs as tools for learning and understanding geometric structure of latent spaces.

3.4 Nonparametric Methods: Principal Flows, Riemannian PC, Riemannian LLE

  • Extension of Euclidean Counterparts:

    • Extend nonparametric methods to Riemannian manifolds.

    • Offer flexibility without assuming fixed parametric form for manifold or mapping.

  • Principal Flows [Panaretos14]:

    • Generalizes PCA to Riemannian manifolds.

    • Defines curves ("principal flows") capturing geodesic and more complex non-geodesic variations.

    • Curve on manifold passing through data mean.

    • Tangent velocity vector attempts to locally fit first eigenvector of tangent space PCA, with smoothness constraint.

    • Distinguishes from PGA (strictly geodesic-focused).

    • Allows more flexible modeling of complex, possibly meandering, patterns of variation.

  • Riemannian PC [Hauberg16] (Riemannian Principal Components):

    • Broad category of PCA generalizations for data on Riemannian manifolds.

    • Core principle: find principal components respecting manifold's geometry.

    • Adapts variance maximization to curved spaces.

  • Riemannian LLE [Maignant23]:

    • Generalization of Locally Linear Embedding to Riemannian manifolds.

    • Objective: preserve local neighborhood relationships and distances.

    • Uses geodesic distances on curved manifold, not Euclidean distances.

  • Nuanced Understanding of "Principal Directions":

    • Distinction between PGA (geodesics) and Principal Flows (non-geodesic variations).

    • Geodesics are direct generalization of straight lines, but real-world data variation may not follow shortest path.

    • Principal Flows offer more flexible modeling of complex, geometrically informed trajectories.

    • Indicates growing sophistication in modeling manifold data, moving beyond shortest paths to complex, constrained patterns.

    • Choice depends on assumed nature of data variation, reflecting field's increasing maturity.

Dante Monson, Aug 6, 2025, to econ...@googlegroups.com

The field of geometric structures in latent embeddings is characterized by a rich interplay of diverse approaches, each with its unique strengths and limitations. Understanding their differences, compatibilities, and the potential for hybrid capabilities is crucial for advancing the field. Furthermore, it's important to distinguish these representation learning techniques from visualization tools like tensor glyphs.

1. Differences, Compatibilities, and Hybrid Capabilities

The various approaches to geometric latent embeddings, as outlined in the infographic and the survey paper, are not mutually exclusive. Instead, they represent a spectrum of design choices that can often be combined to create more powerful and versatile models.

1.1 Key Differences in Approaches

  • Euclidean vs. Manifold Latent Spaces:

    • Differences: The most fundamental distinction lies in the geometric assumptions. Euclidean latent spaces are flat, where distances are standard straight-line measurements. Manifold latent spaces, conversely, are curved, and distances are measured along geodesics (the shortest paths on the curved surface).1 This impacts how similarity is defined, how interpolations behave (linear vs. geodesic), and how well the space can represent data with intrinsic non-linear structures (e.g., rotations, cycles).4 Euclidean spaces are simpler to optimize but can lead to "voids" or "mismapped latent manifolds" for complex data.2 (A small numeric illustration of the distance difference follows this list.)

    • Impact: A Euclidean latent space might struggle to capture angular relationships, leading to issues like posterior collapse or inefficient latent space utilization, especially in high dimensions where probability mass concentrates on a thin shell.5 Manifold latent spaces, by design, naturally respect and preserve directional, cyclic, or symmetric structures.8

  • Parametric vs. Nonparametric Models:

    • Differences: Parametric models (e.g., VAEs, LieVAEs) define the mapping using a fixed set of parameters (like neural network weights). They generalize well if the chosen functional form matches the data. Nonparametric models (e.g., LLE, GPLVM, Principal Flows) do not assume a fixed form; they learn the structure directly from data, often by preserving local relationships or using flexible statistical models like Gaussian Processes.9

    • Trade-offs: Parametric models are typically more computationally efficient for inference after training. Nonparametric models offer greater flexibility to capture arbitrary, complex non-linear structures without strong prior assumptions, but can be more computationally intensive during training.9

  • Encoder-Decoder (Generative) vs. No Decoder (Embedding-Focused):

    • Differences: Models with an encoder and decoder (like VAEs and their manifold variants) are primarily generative. They aim to learn a latent representation that can be used to reconstruct the original data, and often to generate new, similar data samples.5 Models with "no decoder" (e.g., UMAP, Isomap, Poincaré Embeddings) focus solely on learning a meaningful low-dimensional embedding for visualization, clustering, or as a feature representation, without explicitly reconstructing the input.11

    • Purpose: Generative models are useful for data synthesis, imputation, and understanding the data-generating process. Embedding-focused models excel at revealing hidden structures, preserving topology, and providing intuitive visualizations.11

  • Specific Manifold Geometries:

    • Differences: The choice of a specific manifold (e.g., hypersphere, Lie group, torus) for the latent space is driven by the intrinsic nature of the data.8

      • Hypersphere-VAEs: Ideal for directional or cyclic data, where angular relationships are paramount. They use distributions like the spherical Cauchy (spCauchy) to ensure geometric consistency and avoid issues like posterior collapse.5

      • LieVAEs: Suited for data exhibiting inherent symmetries or continuous transformations (e.g., rotations, translations). They model the latent space as a Lie group, which has both manifold and group structures.8

      • Toroidal VAEs: Designed for periodic features (e.g., time of day, angles). The torus topology allows for unique "multiple-path morphing" where interpolations can follow different geodesic paths around the torus.13

    • Benefit: Matching the latent space geometry to the data's intrinsic structure leads to more faithful, interpretable, and performant representations, and more meaningful interpolations.8

  • Geodesic vs. Non-Geodesic Paths of Variation:

    • Differences: Methods like Principal Geodesic Analysis (PGA) strictly focus on finding "principal geodesics" – the shortest paths on a manifold that capture maximal variation.3 In contrast, Principal Flows can capture "non-geodesic forms of variation," allowing for more flexible, meandering patterns of data change on a manifold.9

    • Flexibility: While geodesics are the most direct generalization of straight lines, real-world data variation might not always follow the shortest path. Principal Flows offer a more flexible way to model complex, yet still geometrically informed, trajectories.9
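
The small numeric illustration promised at the top of this list: for hour-of-day data, the geodesic (angular) distance on the circle correctly treats 23:00 and 01:00 as close, whereas a naive Euclidean distance on the raw values does not. The period and example values are arbitrary.

```python
# Sketch: geodesic distance on the circle S^1 vs. Euclidean distance on raw hour values.
import numpy as np

def angular_distance(a, b, period=24.0):
    """Shortest distance between two points on a circle with the given period."""
    d = np.abs(a - b) % period
    return np.minimum(d, period - d)

print(np.abs(23.0 - 1.0))            # Euclidean distance on raw hours: 22.0 (misleading)
print(angular_distance(23.0, 1.0))   # geodesic distance on the circle: 2.0 (correct)
```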

1.2 Compatibilities and Hybrid Capabilities

The field is actively exploring how to combine these different approaches to create more robust and versatile models:

  • Combining Different Latent Geometries:

    • Hybrid VAE Architectures: A promising future direction for Hypersphere-VAEs is to integrate spherical and Euclidean latent spaces within a single VAE architecture.5 This would allow encoding both cyclic/directional and unconstrained Euclidean structures simultaneously, catering to datasets with mixed geometric properties.

    • Mixture Models of VAEs: For data residing on manifolds with arbitrary or highly complex topologies that cannot be globally parameterized, mixture models of VAEs can be used. Each encoder-decoder pair in the mixture can represent one "chart" (a locally Euclidean patch) of the manifold, allowing the model to learn complex topologies.16

  • Probabilistic Frameworks Across Geometries:

    • Uncertainty Quantification: Probabilistic and Bayesian frameworks are highly compatible with various geometric models. For instance, Probabilistic PGA and Bayesian PGA extend the concept of factor analysis to manifolds, providing uncertainty quantification.17 Similarly, Manifold GPLVM (mGPLVM) is a probabilistic model that can infer latent states on symmetric manifolds (spheres, tori, rotation groups) and quantify uncertainty in these representations.10 This allows for principled model selection between candidate manifold topologies.10

  • Learning the Metric:

    • Adaptive Geometry: Instead of assuming a fixed metric (e.g., Euclidean or a predefined manifold metric), some advanced approaches can learn the metric itself. Riemannian VAEs, for example, can learn a parametrized Riemannian metric.2 This allows the model to dynamically adapt its understanding of distances and curvature to better reflect the data's intrinsic geometry.20 This is a powerful form of hybrid capability, as it allows the model to discover the most appropriate geometry from the data itself.

  • Geometric Augmentations for Learning Paradigms:

    • Lie Group Operators for Contrastive Learning: The Variational Lie Group Operator (VLGO) model, which learns Lie group operators to model latent manifold structures, can be integrated into contrastive learning frameworks (e.g., ManifoldCLR).8 By sampling coefficients from the learned prior network, VLGO generates "manifold feature augmentations" that are identity-preserving. These augmentations enrich the number of views for instance discrimination, improving self-supervised and semi-supervised learning performance.8 This is a prime example of how geometric understanding can enhance other machine learning paradigms.

  • Semantic Representation Learning:

    • Bridging Symbolic and Distributional Semantics: This emerging direction explicitly embeds symbolic, compositional, and semantic properties into latent spaces, often leveraging geometric structures.11 For instance, learning topic-aware latent codes (Topic-VQ-VAE), hierarchical syntactic spaces (HRQ-VAE), or factorizing sentence meaning into discrete syntactic and continuous semantic components.11 This is a high-level compatibility where geometric structures provide a framework for more interpretable and controllable language models.11

  • Equivariant Neural Networks:

    • Symmetry-Aware Latent Spaces: Equivariant neural networks are designed to enforce symmetries in their operations, meaning that if an input is transformed by a group action (e.g., rotation, translation), the output (or latent representation) transforms predictably.21 This is highly compatible with geometric learning, especially for data with inherent symmetries (like images of objects that can be rotated).

    • Interpreting Equivariant Latent Spaces: While equivariant latent representations can be ambiguous due to the group action, invariant projections can be used to extract unambiguous and informative representations.21 This allows for a principled way to analyze and utilize the geometrically structured latent spaces learned by equivariant models.

2. Differences with Tensor Glyphs and More Approaches

2.1 What are Tensor Glyphs?

Tensor glyphs are a visualization tool used in scientific visualization, particularly for tensor fields.23 In a tensor field, every point of a grid or space carries a data matrix (a tensor) that describes properties such as acceleration, twist, shear, stress, or diffusion.23

The core idea of a tensor glyph is to visually represent the complex, multivariate information contained within a tensor by mapping its properties onto the shape, size, orientation, and surface appearance of a base geometric primitive.23

  • How they work: For a symmetric second-order tensor (like those from Diffusion Tensor MRI), the tensor can be decomposed into its eigensystem: three real-valued eigenvalues and three mutually orthogonal eigenvectors.24

    • The eigenvectors define the orientation of the glyph.24

    • The eigenvalues are used to scale the glyph's lengths along the eigenvector directions, determining its shape.24 (A short numerical sketch of this decomposition follows this list.)

  • Common types of glyphs: Ellipsoids, cuboids, cylinders, and superquadrics are frequently used.23 Ellipsoids, for instance, are scaled by eigenvalues along their principal axes, which are aligned with the eigenvectors.24 Superquadric glyphs offer more tunability and can imitate cuboids and cylinders while maintaining necessary symmetry properties.23

  • Applications: Tensor glyphs are widely used in fields like medical imaging (e.g., visualizing diffusion tensors in the brain or heart from MRI data to understand fiber orientation and tissue anisotropy) and fluid dynamics (visualizing stress or velocity gradient tensors).24 They help identify regions of anisotropic structure and function.24
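
A minimal NumPy sketch of the eigendecomposition step described in the "How they work" bullet above: a symmetric 3x3 tensor is decomposed into eigenvalues and eigenvectors, which give the semi-axis lengths and orientation of an ellipsoid glyph. The example tensor is invented; a real pipeline would read it from, e.g., a diffusion MRI voxel.

```python
# Sketch: mapping a symmetric 3x3 tensor to ellipsoid-glyph parameters via eigendecomposition.
import numpy as np

D = np.array([[1.7, 0.2, 0.1],         # invented symmetric tensor (e.g., a diffusion tensor)
              [0.2, 0.6, 0.0],
              [0.1, 0.0, 0.3]])

eigvals, eigvecs = np.linalg.eigh(D)    # ascending eigenvalues; columns of eigvecs = directions
axis_lengths = eigvals[::-1]            # ellipsoid semi-axis lengths (largest first)
orientation = eigvecs[:, ::-1]          # corresponding principal directions (glyph orientation)

# Fractional anisotropy: a common scalar summary of how elongated the glyph is.
md = eigvals.mean()
fa = np.sqrt(1.5 * np.sum((eigvals - md) ** 2) / np.sum(eigvals ** 2))
print(axis_lengths, fa)
```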

2.2 Differences with Latent Embeddings

While both tensor glyphs and latent embeddings deal with "geometry" and "representation," their fundamental purposes and mechanisms are distinct:

  • Primary Purpose:

    • Tensor Glyphs: Visualization of pre-existing multivariate data (tensor fields) at specific spatial points.

    • Latent Embeddings: Representation learning (dimensionality reduction, feature extraction, generation) of high-dimensional data.

  • Input:

    • Tensor Glyphs: A tensor (e.g., a 3x3 matrix) at a specific point in space.

    • Latent Embeddings: A high-dimensional data point (e.g., an image, a document, a biological signal).

  • Output:

    • Tensor Glyphs: A visual geometric primitive (e.g., an ellipsoid, cuboid) whose shape, size, and orientation encode tensor properties.

    • Latent Embeddings: A low-dimensional vector or point on a manifold in a learned "latent space."

  • Mechanism:

    • Tensor Glyphs: Direct mapping of tensor eigenvalues/eigenvectors to visual attributes of a geometric shape.

    • Latent Embeddings: Learned mapping (via models like PCA, VAEs, UMAP) that transforms high-dimensional data into a compact, meaningful representation.

  • "Geometry" Role:

    • Tensor Glyphs: Displaying the local geometric properties (e.g., anisotropy, orientation) of a tensor.

    • Latent Embeddings: Learning the intrinsic, often non-Euclidean, geometry of the entire dataset to find a better representation.

  • Generative Capability:

    • Tensor Glyphs: None; they are purely for display.

    • Latent Embeddings: Many (e.g., VAEs) are generative, capable of synthesizing new data from the latent space.

  • Learning Component:

    • Tensor Glyphs: Minimal to none; primarily a rendering technique.

    • Latent Embeddings: Central to the process; models learn complex mappings and structures from data.

In essence, tensor glyphs are like a specialized "display screen" for complex local data, whereas latent embeddings are the "compressed files" or "semantic maps" that machine learning models learn about the entire dataset.

2.3 More Approaches in Geometric Machine Learning

The infographic and our discussion have focused heavily on dimensionality reduction and latent embeddings. However, the broader field of "Geometric Deep Learning" encompasses a much wider array of approaches that leverage non-Euclidean structures:

  • Geometric Deep Learning (GNNs and Beyond): This is an umbrella term for extending neural networks to handle data residing in non-Euclidean spaces like graphs, manifolds, and other irregular structures.27

    • Graph Neural Networks (GNNs): A cornerstone of geometric deep learning, GNNs are specifically designed for processing data represented as graphs (e.g., social networks, molecular structures). They learn representations through message passing and information propagation across graph structures.27

    • Equivariant and Gauge Equivariant Networks: These networks are designed to respect symmetries in the data. If the input undergoes a transformation (e.g., rotation), the output (or latent representation) transforms predictably. This is crucial for tasks where physical symmetries are important, such as in physics or molecular biology.21

  • Topological Data Analysis (TDA):

    • Focus: TDA is a field that focuses on extracting qualitative geometric properties of datasets, such as identifying topological features like connected components, holes, and cavities of the underlying manifold.12

    • Applications: TDA can be integrated into deep learning (Topological Machine Learning) for tasks like topological regularization of class distributions in latent spaces, enhancing generalization capabilities.12 It helps assess the relevance of observed features and can impose specific properties based on topological information.12

  • Normalizing Flows (Manifold Learning Flows):

    • Mechanism: Normalizing Flows (NFs) are a class of generative models that learn a bijective (invertible) mapping between a simple latent distribution (e.g., Gaussian) and the complex data distribution. When adapted for manifold learning (Manifold Learning Flows or MLFs), they can learn an injective mapping from a lower-dimensional latent space to a higher-dimensional manifold embedded in the data space.9

    • Benefits: MLFs can efficiently calculate probability densities on the learned manifold and can be optimized via maximum likelihood.9 Recent work focuses on learning a "canonical intrinsic basis" for the manifold, promoting sparsity and local orthogonality in the latent dimensions for more efficient representation.9

  • Explicit Metric Learning:

    • Beyond Fixed Geometries: While many geometric models assume a fixed manifold (e.g., sphere, torus), a more advanced approach involves actively learning the metric tensor that defines the geometry of the latent space. This allows the model to discover the most appropriate local curvature and distances from the data itself.20 This is a powerful extension because it lets the model dynamically adapt its geometric understanding to the nuances of the data; for instance, some Riemannian VAEs learn a parametrized Riemannian metric.2 (A small autograd sketch of a decoder-induced latent metric follows this list.)

    • Fisher Information Metric: One way to learn the metric is by reconstructing the Fisher information metric, which provides insights into the latent space geometry of generative models and can reveal complex structures like fractal phase transitions.3
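
Relating to the metric-learning bullet above (and to the pull-back metric mentioned for Riemannian VAEs), the hedged sketch below computes the metric that a decoder induces on its latent space, G(z) = J(z)^T J(z), using PyTorch's autograd Jacobian. The decoder here is an untrained placeholder; in practice it would be the decoder of a fitted generative model.

```python
# Sketch: the pull-back (induced) Riemannian metric of a decoder at a latent point z.
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

decoder = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 784))  # placeholder decoder

z = torch.zeros(2)                      # latent point at which to evaluate the geometry
J = jacobian(lambda v: decoder(v), z)   # Jacobian of the decoder at z, shape (784, 2)
G = J.T @ J                             # pull-back metric tensor at z, shape (2, 2)

# Squared length of a small latent displacement dz, as measured through the decoder:
dz = torch.tensor([1e-2, 0.0])
print(G, dz @ G @ dz)
```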

These additional approaches demonstrate the breadth and depth of the geometric revolution in machine learning, extending beyond just latent embeddings to encompass new ways of modeling data, designing neural networks, and interpreting complex systems.
