Introduction
Self-supervised representation learning has made significant progress in recent years, almost reaching the performance of supervised baselines on many downstream tasks. VICReg is the latest in a progression of self-supervised methods for image representation learning.
The paper's authors are Adrien Bardes, Jean Ponce, and Yann LeCun (Chief AI Scientist at Meta).
The key insight underpinning these new methods is simple: input images that are similar according to a human should be similar according to the model. By augmenting an image in some semantics-preserving way (meaning the pixel values are not necessarily the same, but a human would still register the results as versions of the same image), we can generate pairs of images that the model should encode as similar vectors.
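As a minimal sketch of this pairing idea, two views of the same toy image can be produced with simple semantics-preserving transforms. The specific augmentations below (a horizontal flip and a fixed crop) are illustrative assumptions; real pipelines also use random crops, color jitter, blur, and so on:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # a toy 32x32 RGB image

# Two semantics-preserving augmentations (a hypothetical minimal set):
view_a = image[:, ::-1, :]      # horizontal flip
view_b = image[4:28, 4:28, :]   # crop (fixed here for clarity, random in practice)

# The pixel values differ, but both views depict the "same" image,
# so the model should map them to nearby embedding vectors.
print(view_a.shape, view_b.shape)
```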
VICReg: Intuition
From the paper: "We introduce VICReg (Variance-Invariance-Covariance Regularization), a self-supervised method for training joint embedding architectures based on the principle of preserving the information content of the embeddings."
The basic idea is to use a loss function with three terms:
• Invariance: the mean squared distance between the embedding vectors of the two views.
• Variance: a hinge loss to maintain the standard deviation (over a batch) of each variable of
the embedding above a given threshold. This term forces the embedding vectors of samples
within a batch to be different.
• Covariance: a term that attracts the covariances (over a batch) between every pair of
(centered) embedding variables towards zero. This term decorrelates the variables of each
embedding and prevents an informational collapse in which the variables would vary
together or be highly correlated.
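The three terms above can be sketched in numpy. This is a simplified illustration of the loss, not the authors' implementation; the coefficient names, default weights, and the threshold gamma = 1 are assumptions chosen to mirror common practice:

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0,
                gamma=1.0, eps=1e-4):
    """Sketch of the VICReg loss for two batches of embeddings of
    shape (batch, dim). Coefficient values are illustrative defaults."""
    n, d = z_a.shape

    # Invariance: mean squared distance between paired embeddings.
    inv = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    # Variance: hinge loss keeping each dimension's std above gamma,
    # so embeddings within a batch cannot all collapse to one point.
    def variance_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))
    var = variance_term(z_a) + variance_term(z_b)

    # Covariance: push off-diagonal entries of the batch covariance
    # matrix toward zero, decorrelating the embedding variables.
    def covariance_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    cov = covariance_term(z_a) + covariance_term(z_b)

    return sim_coeff * inv + std_coeff * var + cov_coeff * cov
```

A batch whose embeddings match across views, spread out along each dimension, and are uncorrelated incurs near-zero loss, while a collapsed batch (all embeddings identical) is penalized by the variance term.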