Daily TMLR digest for Jul 28, 2022


TMLR

Jul 27, 2022, 8:00:08 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: A Self-Supervised Framework for Function Learning and Extrapolation

Authors: Simon Segert, Jonathan Cohen

Abstract: Understanding how agents learn to generalize — and, in particular, to extrapolate — in
high-dimensional, naturalistic environments remains a challenge for both machine learning
and the study of biological agents. One approach to this has been the use of function
learning paradigms, which allow agents’ empirical patterns of generalization for smooth
scalar functions to be described precisely. However, to date, such work has not succeeded
in identifying mechanisms that acquire the kinds of general purpose representations over
which function learning can operate to exhibit the patterns of generalization observed in
human empirical studies. Here, we present a framework for how a learner may acquire
such representations, which then support generalization (and extrapolation in particular) in a
few-shot fashion in the domain of scalar function learning. Taking inspiration from a classic theory of visual processing, we
construct a self-supervised encoder that implements the basic inductive bias of invariance
under topological distortions. We show the resulting representations outperform those from
other models for unsupervised time series learning in several downstream function learning
tasks, including extrapolation.

URL: https://openreview.net/forum?id=ILPFasEaHA
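The inductive bias named in the abstract (invariance under topological distortions of the domain) can be illustrated with a toy example. This is not the paper's encoder; it is a minimal sketch showing that some summaries of a scalar function, such as its total variation, are unchanged when the domain is monotonically warped, which is the kind of invariance such an encoder would be trained to exploit. All function and warp choices below are hypothetical.

```python
import numpy as np

def total_variation(ys):
    # Total variation of a sampled curve: sum of |successive differences|.
    # Invariant under monotone reparameterizations of the domain.
    return np.abs(np.diff(ys)).sum()

ts = np.linspace(0.0, 1.0, 2001)
f = lambda x: np.sin(3.0 * np.pi * x)   # a smooth scalar function
warp = lambda t: t ** 2.5               # a monotone "topological distortion"

plain = f(ts)
distorted = f(warp(ts))

# Both curves trace the same values in the same order, so their total
# variation agrees (up to discretization error), even though the
# pointwise samples differ.
print(round(total_variation(plain), 2), round(total_variation(distorted), 2))
```

A learned encoder with this invariance would map `plain` and `distorted` to nearby representations, which is what a contrastive self-supervised objective over distorted pairs encourages.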

---


New submissions
===============


Title: Learning with Kan Extensions

Abstract: A common problem in machine learning is "use this function defined over this small set to generate predictions over that larger set." Extrapolation, interpolation, statistical inference and forecasting all reduce to this problem. The Kan extension is a powerful tool in category theory that generalizes this notion. In this work we explore applications of the Kan extension to machine learning problems. We begin by deriving a simple classification algorithm as a Kan extension and experimenting with this algorithm on real data. Next, we use the Kan extension to derive a procedure for learning clustering algorithms from labels and explore the performance of this procedure on real data.

Although the Kan extension is usually defined in terms of categories and functors, this paper assumes no knowledge of category theory. We hope this will enable a wider audience to learn more about this powerful mathematical tool.

URL: https://openreview.net/forum?id=eGhPQWO83w
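For readers who want a concrete feel for "extending a function from a small set to a larger one," one standard instance of Kan extensions in this setting (not necessarily the paper's algorithm) arises when points live in a metric space viewed as a Lawvere metric category: the right and left Kan extensions of a 1-Lipschitz labeling along the inclusion of the training set recover the classical smallest-from-above and largest-from-below Lipschitz extensions. A minimal numeric sketch, with hypothetical data:

```python
def right_kan(x, xs, ys):
    # Smallest 1-Lipschitz extension from above: min_i ( y_i + d(x, x_i) ).
    return min(y + abs(x - xi) for xi, y in zip(xs, ys))

def left_kan(x, xs, ys):
    # Largest 1-Lipschitz extension from below: max_i ( y_i - d(x, x_i) ).
    return max(y - abs(x - xi) for xi, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]   # labels of a 1-Lipschitz function on the small set

# On the training set both extensions agree with the labels; off it,
# they bracket every 1-Lipschitz function consistent with the data.
print(right_kan(0.5, xs, ys))  # -> 0.5
print(left_kan(0.5, xs, ys))   # -> 0.5
print(right_kan(3.0, xs, ys))  # -> 1.0  (extrapolation, upper bound)
print(left_kan(3.0, xs, ys))   # -> -1.0 (extrapolation, lower bound)
```

The gap between the two extensions at a query point measures how underdetermined the prediction is, which mirrors the "optimistic vs. pessimistic" flavor of left vs. right Kan extensions.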

---

Title: Data Budgeting for Machine Learning

Abstract: Data is the fuel powering AI and creates tremendous value for many domains. However, collecting datasets for AI is a time-consuming, expensive, and complicated endeavor. For practitioners, data investment remains a leap of faith in practice. In this work, we study the data budgeting problem and formulate it as two sub-problems: predicting (1) the saturating performance achievable given enough data, and (2) how many data points are needed to reach near-saturating performance. Unlike traditional dataset-independent methods such as PowerLaw, we propose a learning-based method to solve data budgeting problems. To support and systematically evaluate the learning-based method for data budgeting, we curate a large collection of 383 tabular ML datasets, along with their data-versus-performance curves. Our empirical evaluation shows that it is possible to perform data budgeting given a small pilot study dataset with as few as $50$ data points.

URL: https://openreview.net/forum?id=H8Ey4uFsIK
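The dataset-independent baseline the abstract contrasts with (PowerLaw) can be sketched in a few lines: fit a power law to the pilot study's learning curve and invert it to answer the two sub-problems. This is an illustration of that baseline, not the paper's learned method; the pilot accuracies and the assumed saturating accuracy below are hypothetical.

```python
import numpy as np

# Hypothetical pilot-study learning curve: (dataset size, validation accuracy).
ns = np.array([10, 20, 30, 40, 50], dtype=float)
accs = np.array([0.60, 0.70, 0.75, 0.78, 0.80])

sat = 0.90  # assumed saturating accuracy (sub-problem (1) is predicting this)

# Power-law model for the remaining error: sat - acc(n) ~ a * n**b, b < 0.
# Taking logs turns this into a line we can fit by least squares.
b, loga = np.polyfit(np.log(ns), np.log(sat - accs), 1)
a = np.exp(loga)

def predicted_acc(n):
    # Accuracy the fitted curve predicts at dataset size n.
    return sat - a * n ** b

def points_needed(target):
    # Sub-problem (2): invert the power law to find the size at which
    # the predicted accuracy reaches `target`.
    return (a / (sat - target)) ** (-1.0 / b)

print(round(predicted_acc(50), 3))       # should track the observed 0.80
print(round(points_needed(0.85)))        # extrapolated data requirement
```

A learning-based budgeting method, as studied in the paper, would replace this fixed functional form with a model trained across many datasets' curves.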

---
