Hello all,
I don't want to step on anyone's toes, but I think the referenced analogy
between probabilistic programming and deep learning is quite thin. The big
thing about deep learning was that
it extended the scope of large, numerical, gradient-based function approximators
(aka neural networks) to allow deep composition of multiple layers in complex
configurations. This was made possible in large part by the adoption of automatic
differentiation methods. In probabilistic programming (referring now to the
whole field: functional, logical, discrete-valued, continuous-valued), 'deep' is what we got
from the outset - it just means that functions or predicates can be composed
in arbitrarily nested ways. The simplest recursion already implies unbounded
depth. The hard part is doing learning and inference while adhering to the correct
probabilistic composition semantics. Deep learning systems get away with simple
gradient propagation because their modular components are largely used as
function approximators, whereas in probabilistic programming, the modular
components are parameterised distributions, whose composition is governed
not by a simple dataflow model but by a more complex rule: an integral or sum
over a space of values (look up probability monads and their associated
'bind' operation). Whether a particular probabilistic programming language
computes these sums literally, or approximates them by MCMC or whatever,
carrying the process through all the twists and turns of a Turing-complete
language is hard. Also, in Bayesian inference, we are not trying to optimise
the parameters of a model to particular values, but rather trying to estimate
their posterior distribution given all the data we have. In deep learning,
Bayesian techniques are beginning to get some attention, but they are not yet
in widespread use.
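
To make the 'bind' point concrete, here is a minimal sketch of a discrete
probability monad in Haskell (a toy construction of my own for this email,
not any particular system's API): a distribution is a weighted list of
values, bind sums over the intermediate space, and a posterior over a small
hypothesis space can be computed by brute-force enumeration rather than by
optimising a point estimate.

    -- A distribution is a list of (value, weight) pairs.
    newtype Dist a = Dist { runDist :: [(a, Double)] }

    instance Functor Dist where
      fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

    instance Applicative Dist where
      pure x = Dist [(x, 1)]
      Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

    instance Monad Dist where
      -- bind: for each value x with weight p, run the continuation k
      -- and scale its weights by p - literally a sum over the space of x.
      Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

    -- A coin with bias theta.
    coin :: Double -> Dist Bool
    coin theta = Dist [(True, theta), (False, 1 - theta)]

    -- A uniform prior over three candidate biases.
    prior :: Dist Double
    prior = Dist [ (theta, 1 / 3) | theta <- [0.3, 0.5, 0.7] ]

    -- Composition via bind: the marginal over heads sums out the
    -- hidden bias (duplicate outcomes left unmerged for brevity).
    marginal :: Dist Bool
    marginal = prior >>= coin

    -- Posterior over the bias given observed flips, by enumeration:
    -- weight each bias by the likelihood of the data, then renormalise.
    posterior :: [Bool] -> Dist Double
    posterior obs = Dist [ (t, w / z) | (t, w) <- weighted ]
      where
        like t heads = if heads then t else 1 - t
        weighted     = [ (t, p * product (map (like t) obs))
                       | (t, p) <- runDist prior ]
        z            = sum (map snd weighted)

    main :: IO ()
    main = do
      mapM_ print (runDist marginal)
      mapM_ print (runDist (posterior [True, True, True, False]))

Real systems do not enumerate like this, of course - as above, they
approximate the sums by MCMC or other methods - but the compositional
semantics they have to respect is the same.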
Having said all of that, I do see a time when the difference between the two
approaches fades away. This is happening on a number of fronts:
firstly, current deep learning systems let you write loops corresponding to
mapping or reducing numerical operations over sequences of vectors, and they
also allow other constructs like conditionals - these are all operations
supported by automatic differentiation (see the sketch below);
secondly, there is a growing amount of work where neural modules (or layers)
are combined more 'algebraically', for example, in recursive neural networks
and neural network grammars;
thirdly, there is a body of work looking into how to represent algebraically
structured data (basically, terms like we have in Prolog) in fixed-dimensional
vector spaces, so that composite objects like lists, syntax trees, or
linguistic semantic descriptions can be manipulated by a fixed-size neural
network;
fourthly, probabilistic programming systems that work with real-valued
variables (e.g. Stan) already use some of the same techniques as deep
learning, e.g. automatic differentiation for doing Hamiltonian Monte Carlo.
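
To illustrate the first of these fronts, here is another toy sketch in the
same vein (again my own construction, not any library's actual API):
forward-mode automatic differentiation with dual numbers. Once each
arithmetic primitive carries its derivative, conditionals, recursion, and
map/reduce-style loops all differentiate with no extra machinery.

    -- A dual number carries a value together with its derivative.
    data Dual = Dual { val :: Double, deriv :: Double }

    -- Comparisons look only at the value part, so code can branch
    -- on intermediate results just as it would on plain doubles.
    instance Eq Dual  where Dual x _ == Dual y _ = x == y
    instance Ord Dual where compare (Dual x _) (Dual y _) = compare x y

    instance Num Dual where
      Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
      Dual x dx - Dual y dy = Dual (x - y) (dx - dy)
      Dual x dx * Dual y dy = Dual (x * y) (dx * y + x * dy)
      abs    (Dual x dx)    = Dual (abs x) (dx * signum x)
      signum (Dual x _)     = Dual (signum x) 0
      fromInteger n         = Dual (fromInteger n) 0

    -- Derivative of f at x: seed the tangent with 1 and read it off.
    diff :: (Dual -> Dual) -> Double -> Double
    diff f x = deriv (f (Dual x 1))

    -- A conditional on a computed value...
    relu :: Dual -> Dual
    relu x = if x > 0 then x else 0

    -- ...and a recursive loop: x^n by repeated multiplication.
    power :: Int -> Dual -> Dual
    power n x | n <= 0    = 1
              | otherwise = x * power (n - 1) x

    main :: IO ()
    main = do
      print (diff relu 2.0)                               -- 1.0
      print (diff relu (-1.0))                            -- 0.0
      print (diff (power 3) 2.0)                          -- 12.0 = 3 * 2^2
      print (diff (\x -> sum (map (x *) [1, 2, 3])) 4.0)  -- 6.0 = 1 + 2 + 3

Stan's automatic differentiation for Hamiltonian Monte Carlo (the fourth
front) is a production-grade, reverse-mode relative of this same idea.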
All in all, an exciting prospect, I think.
best wishes,
Samer