Probabilistic Circuits Tutorial


Nguyet Edmondson

Jul 26, 2024, 2:42:28 AM
to advpl

In this section we motivate the need for tractable probabilistic inference, revisit past and present popular probabilistic models, and classify them w.r.t. the classes of probabilistic queries for which they guarantee tractable inference and their expressiveness.

We introduce probabilistic reasoning and modeling as the de facto frameworks for dealing with real-world decision-making scenarios where uncertainty arises at any step of the decision-making process. To this end, we motivate the adoption of probabilistic generative models through examples of decision-making applications in AI domains such as robotics, computer vision and speech recognition. We argue in favor of exact inference, as in many of the above scenarios approximations without guarantees would make the decision-making process brittle.

We review commonly used probabilistic generative models such as Bayesian Networks, Markov Random Fields, and the more recent Generative Adversarial Networks and Variational Autoencoders, as they enjoy a considerable amount of attention due to their expressiveness. However, their capability to perform exact probabilistic inference is limited to a small set of queries and otherwise requires resorting to approximation routines.

In parallel, we will introduce the notation required throughout the whole tutorial and classify probabilistic queries into classes that share the same computational effort to be answered. These query classes encompass arbitrary marginals, conditionals and MAP queries, as well as more advanced ones, for example involving counting, thresholding and constraints expressed as logical formulas.
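
To make these query classes concrete, here is a minimal sketch in Python (a toy example of our own, not part of the tutorial material) that answers a marginal, a conditional and a MAP query on a tiny joint distribution over two binary variables by brute-force enumeration:

```python
# Toy joint distribution over two binary variables (X1, X2).
# The numbers are illustrative, not taken from the tutorial.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

# Marginal query: p(X1 = 1) = sum over X2 of p(X1 = 1, X2).
p_x1 = sum(p for (x1, _), p in joint.items() if x1 == 1)

# Conditional query: p(X2 = 1 | X1 = 1) = p(X1 = 1, X2 = 1) / p(X1 = 1).
p_x2_given_x1 = joint[(1, 1)] / p_x1

# MAP query: the most likely complete assignment, argmax_x p(x).
map_state = max(joint, key=joint.get)

print(p_x1, p_x2_given_x1, map_state)  # p(X1=1)=0.5, p(X2=1|X1=1)=0.8, MAP=(1, 1)
```

Brute-force enumeration like this scales exponentially in the number of variables; tractable models are precisely those that answer such queries without enumerating all states.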

While doing so, we will introduce tractable probabilistic models (TPMs) as representations guaranteeing exact and polytime inference for certain classes of probabilistic queries. Among the reviewed TPMs we will show treewidth-bounded probabilistic graphical models (PGMs), trees and mixtures thereof.

At the end of this Section, the audience should have an understanding of the tension between tractability and expressiveness. Moreover, they should be familiar with the two main modeling ideas, factorized and mixture models, as these will be the building blocks of the next Section.

We introduce the framework of PCs by first principles, building in a bottom-up way a grammar for tractable probabilistic models. We show how structural properties cleanly map to tractable query classes and discuss PC expressiveness and applications.

PCs can be represented as computational graphs, out of which we will build a grammar for tractable models. We provide the production rules of this grammar, consisting of stacking factorizations (products), mixtures (sums) and input nodes (simple TPMs such as Bernoullis, Gaussians and trees).
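
As an illustration of this grammar, the following minimal sketch (class names and structure are ours, not a specific PC library) stacks a sum node over two product nodes with Bernoulli input nodes and evaluates the resulting density bottom-up:

```python
# A minimal, illustrative PC as a computational graph: sum nodes (mixtures),
# product nodes (factorizations) and simple input distributions (Bernoulli leaves).

class Bernoulli:
    def __init__(self, var, p):
        self.var, self.p = var, p

    def value(self, x):  # x maps variable name -> observed value (0 or 1)
        return self.p if x[self.var] == 1 else 1.0 - self.p

class Product:
    def __init__(self, children):
        self.children = children

    def value(self, x):  # factorization: multiply the values of the children
        out = 1.0
        for c in self.children:
            out *= c.value(x)
        return out

class Sum:
    def __init__(self, weights, children):  # weights sum to one (a mixture)
        self.weights, self.children = weights, children

    def value(self, x):
        return sum(w * c.value(x) for w, c in zip(self.weights, self.children))

# A mixture of two fully factorized distributions over variables X1 and X2.
pc = Sum([0.4, 0.6], [
    Product([Bernoulli("X1", 0.9), Bernoulli("X2", 0.2)]),
    Product([Bernoulli("X1", 0.1), Bernoulli("X2", 0.7)]),
])

print(pc.value({"X1": 1, "X2": 0}))  # p(X1=1, X2=0) under the mixture
```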

We will define when query classes become tractable through the introduction of key structural properties (sufficient conditions) over the computational graphs of PCs. Namely, we will introduce decomposability and smoothness, which enable tractable marginal inference, and determinism, which enables tractable MAP inference.
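
To give a hedged sketch of why these properties matter: in a smooth and decomposable circuit, a marginal query reduces to a single bottom-up pass in which the leaves of marginalized variables simply output 1. The toy circuit below mirrors the mixture from the previous sketch (names and numbers are again illustrative):

```python
# Illustrative only: marginal inference in a smooth and decomposable circuit
# by evaluating the leaves of marginalized variables at 1.

def bernoulli(var, p):
    # x[var] is 0/1 for evidence, or None when the variable is marginalized out
    return lambda x: 1.0 if x[var] is None else (p if x[var] == 1 else 1.0 - p)

def product(children):
    def f(x):
        out = 1.0
        for c in children:
            out *= c(x)
        return out
    return f

def mixture(weights, children):
    return lambda x: sum(w * c(x) for w, c in zip(weights, children))

pc = mixture([0.4, 0.6], [
    product([bernoulli("X1", 0.9), bernoulli("X2", 0.2)]),
    product([bernoulli("X1", 0.1), bernoulli("X2", 0.7)]),
])

# p(X1 = 1): marginalize X2 in one feedforward pass.
print(pc({"X1": 1, "X2": None}))  # 0.4*0.9 + 0.6*0.1 = 0.42 (up to float rounding)
```

With determinism, the analogous trick of replacing sums by maximizations in the same pass yields tractable MAP inference.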

We discuss how the operational PC framework clearly differs from the declarative one of PGMs. At the same time we will show how PCs can be cast as deep neural networks with peculiar constraints. In the next Section it will become clear in which ways these constraints require additional "care" when learning PCs compared to classical neural networks.

At the end of this section, the audience should be familiar with the syntax and semantics of PCs. Moreover, these ideas will provide them with the principles behind learning the structure and parameters of PCs, which will be the subject of the next section.

We provide the audience with the main algorithmic principles to learn PCs from data. We guide them through a high-level review of the burgeoning literature on parameter and structure learning, highlighting how the structural properties introduced in the previous section favor efficient learning.

In recent years, many different formalisms for TPMs have been proposed, and for each, several learning algorithms have been devised. We abstract away from their different implementations and syntactic details and provide the audience with the algorithmic principles to learn TPMs as PCs from data.

We provide an introduction to parameter learning by starting from simple TPMs encoding parametric distributions such as Bernoullis and Gaussians, reviewing how to perform maximum likelihood estimation. We then generalize this simple learning principle to the larger class of exponential families.
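
As a minimal illustration (with made-up data), maximum likelihood estimation for such input distributions reduces to computing simple sufficient statistics of the data:

```python
import numpy as np

# Illustrative MLE for the simple distributions used as PC input nodes.
data = np.array([1, 0, 1, 1, 0, 1, 1, 0])   # binary samples for a Bernoulli leaf
p_hat = data.mean()                          # MLE of the Bernoulli parameter

x = np.array([2.1, 1.8, 2.5, 2.0, 1.6])     # continuous samples for a Gaussian leaf
mu_hat, var_hat = x.mean(), x.var()          # MLEs of the mean and (biased) variance

print(p_hat, mu_hat, var_hat)
```

For exponential families more generally, maximum likelihood amounts to matching the expected sufficient statistics of the model to their empirical averages.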

We discuss how to translate parameter learning for mixture models to parameter learning for PCs. We go through the explicit introduction of the mixture's latent variables in the circuit to derive the update rules of a general expectation maximization (EM) scheme for PCs. Then, we show how, in the presence of determinism, the EM update rules can be cast as closed-form updates performing maximum likelihood estimation.
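
The sketch below shows EM for the simplest such circuit, a single sum node over two Gaussian leaves, i.e. a one-dimensional two-component Gaussian mixture (data and initialization are illustrative; this is the textbook special case, not the general PC update):

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture: a PC with one sum node
# over two Gaussian input nodes. Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

w = np.array([0.5, 0.5])            # sum-node weights
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each sample.
    resp = w * np.stack([gauss(x, m, s) for m, s in zip(mu, sigma)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: closed-form weighted maximum likelihood updates.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(w, mu, sigma)  # roughly recovers weights [0.4, 0.6], means [-2, 3], stds [1, 1]
```

Under determinism each sample activates exactly one child of every sum node, so the responsibilities become 0/1 indicators and the EM updates collapse into closed-form, count-based maximum likelihood estimates.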

We highlight several approaches to structure learning: generating random or handcrafted structures (see the sketch below for the random option) as well as learning them from data. For the latter, we compare local-search learning schemes for circuits that are deterministic and for those that are not.
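
As a rough illustration of the random option, the following sketch generates a random circuit structure by recursively splitting the variable scope (which gives decomposability by construction) and mixing a few random factorizations at each level. This is our own schematic, not any specific published algorithm:

```python
import random

# Generate a random sum-product structure over a list of variable names.
def random_structure(scope, num_mix=2):
    if len(scope) == 1:
        return ("leaf", scope[0])            # an input distribution
    components = []
    for _ in range(num_mix):                 # sum children share the scope (smoothness)
        shuffled = random.sample(scope, len(scope))
        split = random.randint(1, len(scope) - 1)
        left, right = shuffled[:split], shuffled[split:]
        components.append(("product", [      # disjoint child scopes (decomposability)
            random_structure(left, num_mix),
            random_structure(right, num_mix),
        ]))
    weights = [1.0 / num_mix] * num_mix
    return ("sum", weights, components)

print(random_structure(["X1", "X2", "X3", "X4"]))
```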

By this point, we expect the audience to be familiar with the concept that structural properties enable certain tractable query classes and as such also dictate which learning scheme to adopt to induce PCs from data.

In this section we will familiarize the audience with the theoretical foundations behind tractable inference and the PC framework, while showing advanced query classes and how to formally cast the TPMs introduced so far as PCs.

We will then provide a bridge between probabilistic circuits and their counterparts in propositional logic: logical circuits. Logical circuits have been extensively studied in automated reasoning and verification, as they enable efficient computations over Boolean formulas via structural properties analogous to those of PCs.
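
To give a flavor of this connection, here is a small hand-picked example (the formula and encoding are our own illustration): model counting on a smooth, decomposable and deterministic logical circuit propagates counts with sums at OR nodes and products at AND nodes, exactly as a PC propagates probabilities:

```python
# Count the models of (A and B) or (not A and C) over {A, B, C}, written as a
# circuit whose OR nodes are smooth and deterministic and whose AND nodes are
# decomposable, so counts can be propagated in a single bottom-up pass.

def lit(_name):        # a literal has exactly one satisfying assignment of its variable
    return ("lit", 1)

def AND(*children):    # decomposable: children over disjoint variables -> multiply
    count = 1
    for _, c in children:
        count *= c
    return ("and", count)

def OR(*children):     # smooth + deterministic: disjoint models, same scope -> add
    return ("or", sum(c for _, c in children))

circuit = OR(
    AND(lit("A"), lit("B"), OR(lit("C"), lit("~C"))),
    AND(lit("~A"), lit("C"), OR(lit("B"), lit("~B"))),
)
print(circuit[1])  # 4 satisfying assignments over {A, B, C}
```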

Inspired by the logical circuit literature, we will sketch a map of tractable representations with PCs w.r.t. different probabilistic query classes. We will then relate the model classes in the map to each other w.r.t. their expressive efficiency.

At the end of this Section, we hope people in the audience will have acquired a general overview of the theoretical foundations of the PC framework, as well as of the open questions surrounding them. This bridges to the conclusion section, where we delineate some key challenges for tractable probabilistic inference.

In several real-world scenarios, decision making involves complex reasoning, i.e., the ability to answer complex probabilistic queries. Moreover, in many sensitive domains like healthcare and economic decision making, the result of these queries is required to be exact, as approximations without guarantees would make the decision-making process brittle. In all these scenarios, tractable probabilistic inference and learning are becoming increasingly necessary. In this tutorial, we will introduce the framework of probabilistic circuits (PCs), under which one can learn deep generative models that guarantee exact inference in polynomial (often linear) time. Following recent algorithmic and theoretical results, which we will discuss in this tutorial, PCs have achieved impressive results in probabilistic modeling, sometimes outperforming intractable models such as variational autoencoders. We will present the syntax and semantics of PCs and show how several commonly used ML models -- from Gaussian mixture models to HMMs and decision trees -- can be understood as computational graphs within the PC framework. We will discuss how PCs are special cases of neural networks in which restricting the network with certain structural properties enables different tractability scenarios. This unified view of probabilistic ML models opens up a range of ways to learn PCs from data and use them in real-world applications. We will, in fact, provide a unifying view over several algorithms to learn both the structure and parameters of PCs and discuss modern approaches to scale them on GPUs. Lastly, we will showcase several successful application scenarios where PCs have been employed as an alternative to or in conjunction with intractable models, including robust image classification, lossless compression, prediction in the presence of missing values, fairness certification, and scene understanding.

Antonio Vegari is a Lecturer (Assistant Professor) in Machine Learning at the University of Edinburgh. In this tutorial, he introduces the framework of probabilistic circuits (PCs), tractable computational graphs in which imposing structure guarantees exact probabilistic reasoning in polynomial (often linear) time.

Exact and efficient probabilistic inference and learning are becoming increasingly necessary when we need to make complex decisions quickly under uncertainty in real-world scenarios where approximations are not a viable option. In this tutorial, we will introduce probabilistic circuits (PCs) as a unified computational framework to represent and learn deep probabilistic models that guarantee tractable inference. Unlike other deep neural estimators such as variational autoencoders and normalizing flows, PCs enable tractable inference for large classes of queries with little or no compromise in terms of model expressiveness. Moreover, after presenting a unified view of learning PCs from data and several real-world applications, we will cast many popular tractable models into the framework of PCs and leverage it to theoretically trace the boundaries of tractable probabilistic inference.

While deep learning has driven impressive progress, one of the toughest remaining challenges is generalization beyond the training distribution. Few-shot learning is an area of research that aims to address this, by striving to build models that can learn new concepts rapidly in a more "human-like" way. While many influential few-shot learning methods were based on meta-learning, recent progress has been made by simpler transfer learning algorithms, and it has been suggested in fact that few-shot learning might be an emergent property of large-scale models. In this talk, I will give an overview of the evolution of few-shot learning methods and benchmarks from my point of view, and discuss the evolving role of meta-learning for this problem. I will discuss lessons learned from using larger and more diverse benchmarks for evaluation and trade-offs between different approaches, closing with a discussion about open questions.

Link to slides: -5CDaqgzlrZcaGD/view?usp=sharing
