Title: Contrastive and neighbor embedding methods for data visualization
Abstract: In recent years, neighbor embedding methods like t-SNE and
UMAP have become widely used across several application fields, in
particular in single-cell biology. They are also commonly used for
visualizing large collections of documents and/or images used to train
modern deep learning architectures such as large language models or
diffusion models. Given this academic and public attention, it is
important to understand the possibilities, shortcomings, and trade-offs of
neighbor embedding methods. I am going to present our recent work on
the attraction-repulsion spectrum of neighbor embeddings and the
trade-offs involved. I am also going to explain how neighbor
embeddings are related to contrastive learning, a popular framework
for self-supervised learning on image data. This will lead to our
recent work on contrastive visualizations of image datasets. In the
second part of the talk, I will present our ongoing work on
visualization of scientific literature, in particular biomedical
research papers from the PubMed library.
Short bio: Dmitry Kobak is a research scientist and a group leader in
the Berens lab at the University of Tübingen, Germany. He is interested in
unsupervised and self-supervised learning, in particular contrastive
learning, manifold learning, and dimensionality reduction for 2D
visualization of biological datasets.