Dear all,
Here is an announcement for two seminars that will be given at Inria, room F107, on Thursday, Oct. 24th, starting at 10:30am, by Romain Couillet and Arthur Mensch:
*** first talk by Romain Couillet at 10:30 ***
Title: Can random matrices change the future of machine learning?
Abstract: Many standard machine learning algorithms and intuitions are known to misbehave, if not collapse dramatically, when applied to large-dimensional data. In this talk, we will show that large-dimensional statistics, and particularly random matrix theory, not only elucidate this behavior but also provide a new set of tools to understand and (sometimes drastically) improve machine learning algorithms. Besides, we will show that our various theoretical findings are provably applicable to very realistic and not-so-large-dimensional data.
*** second talk by Arthur Mensch at 11:15 ***
Title: Predicting probability distributions from data: a smoothing perspective

Abstract: In this talk, I will present new approaches to train models that predict probability distributions over complex output spaces. To this end, we transform hard selection operators into probabilistic operators using entropy functions, thereby introducing a 'temperature' in the selection.

In a first part, we will seek to assign a probability to an exponential set of outputs (e.g. sequences of tags, alignment matrices between two time series). For this, we will consider dynamic programming algorithms, which are able to find the highest-scoring output among a combinatorial set by iteratively breaking down the selection problem into smaller subproblems. By smoothing the hard operations performed in the DP recursion, we force DP to output a (potentially sparse) probability distribution over the full set. This relaxes both the optimal value and the solution of the original highest-scoring selection problem, and turns a broad class of DP algorithms into differentiable operators that can be plugged into end-to-end trained models. We provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We showcase the new differentiable DP operators on several structured prediction tasks that benefit from sparse probability predictions.

In a second part, we will design models that output probability distributions over a potentially continuous metric space of outputs. For this, we will rely on optimal transport to define a cost-informed entropy function on the set of all distributions. We will use this entropy, instead of the classical Shannon entropy, in the highest-scoring selection layer of machine-learning models. This effectively replaces the celebrated softmax operator with a geometric-softmax operator that takes into account the geometry of the output space. We propose an adapted geometric logistic loss function for end-to-end model training. Unlike previous attempts to use optimal transport distances for learning, this loss is convex and supports infinite (or very large) class spaces. The geometric-softmax is suitable for predicting sparse and singular distributions, for instance supported on curves or hyper-surfaces. We showcase its use on two applications: ordinal regression and drawing generation.
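For those curious about the smoothing idea in the first part of the talk, below is a minimal Python sketch (illustrative only, not the speaker's code) of the temperature-smoothed maximum that can replace the hard max in a DP recursion: the smoothed value is a log-sum-exp, and its gradient is a softmax probability distribution over the candidate choices. The function name soft_max and the example scores are made up for illustration.

    import numpy as np

    def soft_max(scores, temperature=1.0):
        """Smoothed maximum: T * log(sum(exp(scores / T))).

        As the temperature goes to 0 this recovers the hard max; its gradient
        with respect to the scores is the softmax distribution, i.e. a
        probabilistic (and differentiable) selection among the candidates.
        """
        s = np.asarray(scores, dtype=float) / temperature
        m = s.max()  # subtract the max for numerical stability
        value = temperature * (m + np.log(np.exp(s - m).sum()))
        probs = np.exp(s - m)
        probs /= probs.sum()
        return value, probs

    # Example: the smoothed value upper-bounds the hard max, and `probs`
    # concentrates on the argmax as the temperature decreases.
    value, probs = soft_max([1.0, 3.0, 2.5], temperature=0.5)

Applying such an operator at every selection step of a DP recursion is what turns the whole algorithm into a differentiable operator that outputs a probability distribution over the combinatorial set of outputs.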