Custom differentiation for dummies?


Sam

Oct 8, 2019, 1:01:45 AM
to Swift for TensorFlow
Hi!

The current documentation assumes a lot of higher level calculus knowledge that I don't quite have. I can usually get through stuff fine, but when muddling through these docs I keep finding myself two or three levels deep googling terms I don't know. I hope that a more accessible version will come about at some point.

Perhaps this is me being a little thick, but after a few hours spent looking at it, I'm still not quite sure how to do custom differentiation. I've been trying to understand what to do with pullback, but I don't quite understand what the v argument is and how we're meant to use it. As I understand from reading up on general auto diff techniques, reverse mode differentiation means that we're effectively running the chain rule backwards and upside down, and v might be some kind of adjoint variable, but I'm not actually quite sure what that means. I think it's something to do with the rate of change of the input with respect to the output, which presumably means the parameters and return value of the function respectively.

I find myself in a situation where I know the conventional derivative of some expression, and I know that I have to incorporate v somehow, but I'm not sure quite how. By observation, multiplying by v seems to be the thing a lot of the time, but presumably not all of the time. Is there a plain language explanation of this somewhere? I think it would be a very valuable thing to add to the custom differentiation tutorial, which currently just shows an example of exp being multiplied by v without remarking why.

Let's say I'm creating a custom differential for ax^2 + bx + c.  I obviously know how to take that to the 2ax + b stage, but what I'm lacking is a general understanding of what that becomes in pullback: { v in ... } format.

Presumably most of the people using this at the moment have more maths education than I do, so sorry for being that guy, but I hope that it's considered a design goal that people with my level of ability should be able to do these things. Or maybe everyone else gets it instantly and I'm just being thick today, that's very plausible.

I'm having a lot of fun playing with all of this though, and I'm very impressed with it! While I'm experienced with general coding, Swift, TensorFlow, and machine learning are all new areas for me, so I'm throwing myself into the deep end with this.

Sam.

Marc Rasi

Oct 8, 2019, 4:08:03 PM
to Sam, Swift for TensorFlow
> The current documentation assumes a lot of higher level calculus knowledge that I don't quite have. I can usually get through stuff fine, but when muddling through these docs I keep finding myself two or three levels deep googling terms I don't know. I hope that a more accessible version will come about at some point.

A very accessible tutorial sounds like a pretty good idea. I don't know of any specific plans to write one, but someone should definitely work on one at some point :)

There's one improvement in the works that will hopefully make custom derivatives in the Swift AD system much easier, so it's probably not worth it to spend too much effort on a great accessible guide before that lands. The improvement is that you'll be able to write most custom derivatives as differentials instead of pullbacks, and differentials are usually easier to understand and write than pullbacks. The compiler will automatically figure out the correct pullback based on your custom differential.
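If it helps to see the distinction in the one-dimensional case, here's a rough conceptual sketch (not the actual API, and using a made-up f): both derivative functions multiply by f'(x), they just map in opposite directions.

// Conceptual sketch only, not real API. Take f(x) = x * x, so
// f'(x) = 2 * x, at some fixed point x:
let x: Float = 3

// The differential (forward mode) maps a change in the input, dx,
// to the resulting change in the output, df = f'(x) * dx.
let differentialF: (Float) -> Float = { dx in 2 * x * dx }

// The pullback (reverse mode) maps dL/d(output) back to dL/d(input).
let pullbackF: (Float) -> Float = { v in 2 * x * v }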

In the meantime, I can try to answer your specific questions about custom differentiation.

I always think of reverse mode autodiff in terms of calculating the gradient of a loss (L) with respect to some model parameters (M). The relationship between L and M might look like this:

A = f(M)
L = g(A)

For simplicity, let's assume that M and A are 1-dimensional.

Backpropagation starts with the fact that dL/dL = 1 and "backpropagates" that through all the relationships, by applying the chain rule:
  1. dL/dA = dg/dA * dL/dL = g' * dL/dL
  2. dL/dM = dA/dM * dL/dA = f' * dL/dA
You can think of each of these lines as functions that tell you how to transform dL/d<output of a function> into dL/d<input of the function>. This function is the pullback. So the pullbacks in this example are:
  • pullback_g(v) = g' * v // this function transforms dL/d<output of g> into dL/d<input of g>
  • pullback_f(v) = f' * v // this function transforms dL/d<output of f> into dL/d<input of f>
And the full backpropagated gradient is calculated as pullback_f(pullback_g(1)).

So the `v` you are asking about is dL/d<output of a function>, and the pullback is responsible for transforming it into dL/d<input of a function>. Similarly in the multivariable case, `v` is the gradient of L with respect to the output of a function, and the pullback is responsible for transforming it into the gradient of L with respect to the input of that function.
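To make that concrete, here's a hand-written Swift sketch of the composition above (the functions are made up for illustration: say f(m) = m * m, so f' = 2m, and g(a) = 3 * a, so g' = 3):

// L = g(f(M)) = 3 * M^2, so dL/dM should be 6 * M.
let m: Float = 4
let a = m * m                                   // A = f(M)

// pullback_g transforms dL/d<output of g> into dL/d<input of g>.
let pullbackG: (Float) -> Float = { v in 3 * v }
// pullback_f transforms dL/d<output of f> into dL/d<input of f>.
let pullbackF: (Float) -> Float = { v in 2 * m * v }

// Full gradient: pullback_f(pullback_g(1)) = 2 * 4 * (3 * 1) = 24 = 6 * m.
let dLdM = pullbackF(pullbackG(1))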

> Let's say I'm creating a custom differential for ax^2 + bx + c.  I obviously know how to take that to the 2ax + b stage, but what I'm lacking is a general understanding of what that becomes in pullback: { v in ... } format.

I think you mean to say "custom pullback," not "custom differential," because a pullback is what you need (for now) to customize gradients in Swift AD. Also, it's important to state which variables your pullback is with respect to. It looks like you're just doing it with respect to `x` in your example, so I'll go with that.

So in this example, you're looking for the pullback of f(x) = ax^2 + bx + c, wrt x. As explained above, this is a function whose input is dL/df and whose output is dL/dx. By the chain rule, dL/dx = df/dx * dL/df = (2ax + b) * dL/df. So pullback_f(v) = (2ax + b) * v. In Swift code:

// a, b, and c need to be in scope; placeholder values for illustration:
let a: Float = 1
let b: Float = 1
let c: Float = 1

@differentiable(vjp: vjpF)
func f(_ x: Float) -> Float {
  return a * x * x + b * x + c  // Swift has no ^ power operator for floats
}

func vjpF(_ x: Float) -> (Float, (Float) -> Float) {
  // Transforms v = dL/df into dL/dx = (2ax + b) * v.
  func pullback(_ v: Float) -> Float {
    return (2 * a * x + b) * v
  }
  return (f(x), pullback)
}

(I haven't compiled that so there may be typos).
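As a sanity check, on a Swift for TensorFlow toolchain (where gradient(at:in:) is available) you should be able to call it like this:

// With a = b = c = 1 as above, f'(x) = 2x + 1, so the gradient at x = 3 is 7.
let dfdx = gradient(at: 3, in: f)
print(dfdx)  // 7.0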

> By observation, multiplying by v seems to be the thing a lot of the time, but presumably not all of the time.

Since the chain rule relates derivatives of inputs and outputs in a linear fashion, it's always going to be some sort of multiplication.

For example, in the 1-dimensional case, `pullback_f(v) = f' * v` always.

In the multi-dimensional/multi-argument case, you need to think a bit carefully about the chain rule to figure out which components get multiplied by which other components. But it's always going to turn out to be some kind of matrix multiplication.
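For example, here's a hand-rolled sketch (not tied to any particular API) for a made-up two-input, two-output function f(x, y) = (x * y, x + y). Its Jacobian is [[y, x], [1, 1]], and the pullback multiplies the incoming gradient v by the transpose of that Jacobian:

// The pullback takes v = (dL/d<out1>, dL/d<out2>) and returns
// (dL/dx, dL/dy), i.e. the transposed-Jacobian-vector product.
func pullbackF(x: Float, y: Float) -> ((Float, Float)) -> (Float, Float) {
  return { v in
    let (v1, v2) = v
    let dLdx = y * v1 + 1 * v2   // dot v with (d<out1>/dx, d<out2>/dx)
    let dLdy = x * v1 + 1 * v2   // dot v with (d<out1>/dy, d<out2>/dy)
    return (dLdx, dLdy)
  }
}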




Brennan Saeta

Oct 8, 2019, 4:55:02 PM
to Marc Rasi, Sam, Swift for TensorFlow
Hey Sam!

First of all, no need to apologize about anything. It took me several passes through the problem before I became comfortable with it. Marc's thoughts are awesome (as usual). I would be interested in your thoughts about the autodiff notebook I wrote as part of our fast.ai (aka harebrain) collaboration: https://github.com/fastai/course-v3/blob/master/nbs/swift/02c_autodiff.ipynb

In that notebook, I walk through how to build up the Swift autodiff system assuming only the derivatives of sin (cos) and x^2 (2x). If you have any feedback on what could be done to make that easier to understand, please let me know!

All the best,
-Brennan

Sam

Oct 9, 2019, 8:10:14 PM
to Swift for TensorFlow, pruden...@gmail.com
Thanks so much for this detailed response, Marc! I haven't had the chance to go through and fully make sure I understand it all yet, but I've got the gist, and it looks like this should give me everything I need. I really appreciate you taking the time to put it all together. The change to writing differentials rather than pullbacks sounds like a fantastic one too; I'm excited for how lazy that will allow me to be with my calculus knowledge.

Sam

Oct 9, 2019, 8:30:54 PM
to Swift for TensorFlow, marc...@google.com, pruden...@gmail.com
That notebook looks great, Brennan! I've only gone through it once so far, and I'll need to take a more measured run at understanding it to be sure I've got it all, but it's definitely very helpful and it all seems clear. I'll let you know if I have any thoughts about clarifications that could be added, but my main reaction is that this should be linked from TensorFlow's custom differentiation tutorial, because yours does a much better job of explaining how to actually use it. Theirs tells us where to write the code, but yours actually tells us what code to write.
