I couldn't follow their discussion of automatic
sparsity detection and how it leads into equation (8),
so maybe we could talk about that when we're face to
face at some point. Or I can spend more time squinting
at it.
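For what it's worth, the only sparsity machinery I know is
CppAD's pattern propagation, and I can't tell whether that's
what feeds into their (8). The usual recipe (stock CppAD calls,
my own toy f): seed an identity pattern and push it forward to
get the Jacobian sparsity.

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {
    // Tape a small f : R^3 -> R^2 with obvious structure.
    std::vector<CppAD::AD<double>> ax(3), ay(2);
    ax[0] = 1.0; ax[1] = 1.0; ax[2] = 1.0;
    CppAD::Independent(ax);
    ay[0] = ax[0] * ax[1];   // depends on x0, x1
    ay[1] = ax[2];           // depends on x2 only
    CppAD::ADFun<double> f(ax, ay);

    size_t n = f.Domain(), m = f.Range();
    // Seed with the n x n identity pattern; the forward sweep
    // propagates it to the sparsity pattern of the Jacobian.
    CppAD::vectorBool r(n * n);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            r[i * n + j] = (i == j);
    CppAD::vectorBool s = f.ForSparseJac(n, r);  // size m * n
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++)
            std::cout << (s[i * n + j] ? 1 : 0) << " ";
        std::cout << "\n";   // expect rows "1 1 0" and "0 0 1"
    }
    return 0;
}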
And when they say "order 0 forward", do you think they mean
just building up the expression graph without doing any
evaluations? I found all their descriptions pretty confusing.
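My reading, for what it's worth: in CppAD at least, recording
the tape traces one real evaluation with AD<double>, and a
zero-order forward sweep then replays that tape numerically at
new argument values, so neither step is evaluation-free. A toy
sketch (my own function, but stock CppAD calls as far as I
know):

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {
    // Recording the tape: this traces one actual evaluation
    // with AD<double>, building the operation sequence.
    std::vector<CppAD::AD<double>> ax(2), ay(1);
    ax[0] = 1.0; ax[1] = 2.0;
    CppAD::Independent(ax);
    ay[0] = ax[0] * ax[1] + sin(ax[0]);
    CppAD::ADFun<double> f(ax, ay);

    // "Order 0 forward": replay the tape numerically at a new
    // point, no retaping.
    std::vector<double> x(2), y(1);
    x[0] = 3.0; x[1] = 4.0;
    y = f.Forward(0, x);            // f(3,4) = 12 + sin(3)
    std::cout << y[0] << "\n";

    x[0] = -1.0; x[1] = 5.0;
    y = f.Forward(0, x);            // same tape, another point
    std::cout << y[0] << "\n";
    return 0;
}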
And where do you need the gradient of the directional
derivative they talk about in (6)?
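One place you do need it is Hessian-times-vector products: if
g(x) = grad f(x) . v, then grad g(x) = H(x) v, i.e. the
gradient of the directional derivative. In CppAD, if I remember
the API right, that's a first-order forward sweep in direction
v followed by a second-order reverse sweep. Toy sketch with my
own f:

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {
    // Tape f(x) = x0^2 * x1 once.
    std::vector<CppAD::AD<double>> ax(2), ay(1);
    ax[0] = 1.0; ax[1] = 1.0;
    CppAD::Independent(ax);
    ay[0] = ax[0] * ax[0] * ax[1];
    CppAD::ADFun<double> f(ax, ay);

    std::vector<double> x(2), v(2), w(1);
    x[0] = 2.0; x[1] = 3.0;
    f.Forward(0, x);                 // values at x
    v[0] = 1.0; v[1] = 0.0;
    f.Forward(1, v);                 // grad f(x) . v
    w[0] = 1.0;
    std::vector<double> dw = f.Reverse(2, w);
    // dw[2*j + 1] is component j of H(x) v, the gradient of
    // the directional derivative.
    std::cout << dw[1] << " " << dw[3] << "\n";  // 6 and 4 here
    return 0;
}

The nice part is that it never forms the full Hessian, just one
extra forward/reverse pair per direction.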
You can see that they're taping once and reusing the tape,
which limits what their C++ code can do (the recorded
operation sequence is fixed, so no data-dependent branching),
but it's much faster for them because CppAD's taping is
relatively slow.
Their "cheap gradient principle" is misleading. It may take
only a few more arithmetic operators, but there's also memory
locality and essentially interpreter overhead.
a factor of 4 slowdown --- it totally depends on the kinds
of operations going on. We measured 32* and 16* slowdown
on sums and products for CppAD (though that included taping)
and about a 4* slowdown for pow().
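If you want to see how operation-dependent it is, this is the
kind of micro-benchmark I mean -- the size and the function are
mine here, not what we actually measured: a plain double sum
against taping plus one gradient sweep in CppAD.

#include <cppad/cppad.hpp>
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const size_t N = 1000000;        // made-up size
    std::vector<double> x0(N, 1.5);
    using clock = std::chrono::steady_clock;

    // Baseline: plain double sum.
    auto t0 = clock::now();
    double sum = 0.0;
    for (size_t i = 0; i < N; ++i) sum += x0[i];
    auto t1 = clock::now();

    // Tape the same sum once with CppAD.
    std::vector<CppAD::AD<double>> ax(N), ay(1);
    for (size_t i = 0; i < N; ++i) ax[i] = x0[i];
    CppAD::Independent(ax);
    ay[0] = 0.0;
    for (size_t i = 0; i < N; ++i) ay[0] += ax[i];
    CppAD::ADFun<double> f(ax, ay);  // taping happens here
    auto t2 = clock::now();

    // One zero-order forward plus one reverse sweep = gradient.
    std::vector<double> w(1, 1.0);
    f.Forward(0, x0);
    std::vector<double> grad = f.Reverse(1, w);
    auto t3 = clock::now();

    std::chrono::duration<double> plain = t1 - t0;
    std::chrono::duration<double> tape = t2 - t1;
    std::chrono::duration<double> gradient = t3 - t2;
    std::cout << "plain sum: " << plain.count()
              << "s, taping: " << tape.count()
              << "s, gradient: " << gradient.count()
              << "s (sum=" << sum
              << ", grad[0]=" << grad[0] << ")\n";
    return 0;
}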
And we really need to work on parallelization for matrix
ops! Maybe we can bring it up in the next big data thingy
we apply for.
- Bob