I'm reading through the iSAM2 paper and I don't really understand the motivation for using a Bayes tree instead of keeping things as a matrix (or, really, how it is any different).
My understanding is that the process essentially reduces the factor graph to a single variable, which you solve for by sequentially marginalizing out every other variable. Once a solution is found, you back-substitute to generate values for the rest of the variables.
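To make that description concrete, here is a minimal dense sketch of "eliminate one variable at a time, then back-substitute". It assumes a linear Gaussian problem, min ||A x - b||^2 with one row per measurement, and a fixed elimination order. It is not GTSAM's implementation; `eliminate`, `back_substitute` and the toy 1-D pose chain are hypothetical names and data chosen for illustration. Each stored conditional is one row of the square-root information matrix R, i.e. the Gaussian version of a conditional density p(x_j | later variables).

```python
import numpy as np

def eliminate(A, b, order, tol=1e-12):
    """Eliminate variables in `order` from min ||A x - b||^2.

    Returns conditionals (j, r, d) meaning r @ x == d, where r is zero on
    every variable eliminated before j (a Gaussian p(x_j | later vars)).
    """
    n = A.shape[1]
    # Each factor is one augmented row [a | b].
    factors = [np.append(A[i], b[i]) for i in range(A.shape[0])]
    conditionals = []
    for j in order:
        # Pull out every factor that touches x_j.
        involved = [f for f in factors if abs(f[j]) > tol]
        factors = [f for f in factors if abs(f[j]) <= tol]
        if not involved:
            raise ValueError(f"variable {j} is unconstrained")
        # Put column j first so QR makes row 0 the only row involving x_j.
        cols = [j] + [k for k in range(n) if k != j] + [n]
        R = np.linalg.qr(np.vstack(involved)[:, cols], mode="r")
        for i, row in enumerate(R):
            full = np.empty(n + 1)
            full[cols] = row  # undo the column permutation
            if i == 0:
                conditionals.append((j, full[:n], full[n]))
            elif np.max(np.abs(full[:n])) > tol:
                # New factor on the separator (x_j has been eliminated).
                factors.append(full)
            # Rows with only a right-hand side are residual error; drop them.
    return conditionals

def back_substitute(conditionals, n):
    """Solve the conditionals in reverse elimination order."""
    x = np.zeros(n)
    for j, r, d in reversed(conditionals):
        # x[j] is still 0 here and unsolved variables have zero coefficient,
        # so r @ x sums only the already-known terms.
        x[j] = (d - r @ x) / r[j]
    return x

# Toy 1-D pose chain: prior on x0, odometry steps, and a loop closure x2 - x0.
A = np.array([[ 1.0,  0.0,  0.0, 0.0],
              [-1.0,  1.0,  0.0, 0.0],
              [ 0.0, -1.0,  1.0, 0.0],
              [-1.0,  0.0,  1.0, 0.0],
              [ 0.0,  0.0, -1.0, 1.0]])
b = np.array([0.0, 1.0, 1.0, 2.1, 1.0])

conditionals = eliminate(A, b, order=[0, 1, 2, 3])
x = back_substitute(conditionals, A.shape[1])
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.round(x, 4), np.allclose(x, x_ref))
```

The result should match a direct least-squares solve. The ordered list of conditionals is exactly the "product of conditionals" that elimination leaves behind. The sketch recomputes everything from scratch and says nothing about how that structure is organized for incremental updates.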
What does creating the Bayes tree actually do? How is the product of unnormalized conditional densities used? I don't see where it comes into play when you compute the optimal state value.