Easiest way to estimate number of FLOPs for a Lasagne implementation


jram...@gmail.com

May 12, 2016, 3:40:10 PM
to lasagne-users
Hi all,

I was wondering what the easiest way is to estimate the number of FLOPs (multiplies and adds) for a deep pipeline implemented in Lasagne?

For example, in Figure 3 of the ResNet paper, they give the number of FLOPs needed for each model. What would be the best way to compute this (in this case, for a Lasagne implementation of ResNet, like this one)?

Thanks!
Joel

PS: To clarify, I think they mean the total number of multiply-adds (floating-point operations) needed for one forward pass of the model, not FLOPS (floating-point operations per second).

Jan Schlüter

May 13, 2016, 5:32:35 AM
to lasagne-users
Hey,

Theano offers some functionality for printing the FLOPs of an Op, but I don't think there's an easy way to print the FLOPs for a whole graph (it requires the shapes, which are not encoded in the graph). So probably the easiest approach will be to define the formulae for convolution, dot product, elementwise addition, global averaging and softmax, then go through the Lasagne layers (via lasagne.layers.get_all_layers) and add things up. You can probably ignore batch normalization, arguing that it could be integrated into the linear transformation of the previous layer at no extra cost.
Feel free to post your code here or as a gist if it works -- we might even think about including this in Lasagne/Recipes. Including it directly in Lasagne is probably not an option for now, as we'd need to maintain it.
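The per-layer formulae could be sketched like this (a minimal sketch, not code from the thread: the function names and the toy shapes are my own, and in a real Lasagne script you would walk lasagne.layers.get_all_layers(net), dispatch on the layer class, and read shapes from layer.output_shape instead of hardcoding them):

```python
# Rough FLOP formulae for common layer types, counting a multiply-add as
# two operations. Shapes are passed in by hand to keep the sketch
# self-contained; a real version would pull them from Lasagne layers.

def conv2d_flops(c_in, h_out, w_out, c_out, k_h, k_w):
    """Multiplies + adds for one image through a 2D convolution."""
    return 2 * c_out * h_out * w_out * c_in * k_h * k_w

def dense_flops(n_in, n_out):
    """Multiplies + adds for one item through a fully connected layer."""
    return 2 * n_in * n_out

def global_pool_flops(c, h, w):
    """One add per element plus one divide per channel for global averaging."""
    return c * h * w + c

# Toy example: a 3x3 conv with 16 filters on a 32x32 RGB image,
# global average pooling, then a 10-way dense layer.
total = (conv2d_flops(3, 32, 32, 16, 3, 3)
         + global_pool_flops(16, 32, 32)
         + dense_flops(16, 10))
print(total)  # 901456
```

As the numbers show, the convolution dominates; pooling and the classifier are rounding error by comparison, which matches the usual practice of reporting only conv and dense FLOPs.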

Best, Jan

jram...@gmail.com

Sep 11, 2016, 1:04:18 AM
to lasagne-users
Hi,

Thank you for your reply. I've gotten back to working on this issue, and will certainly put up a gist (or a PR at Lasagne/Recipes, if it's useful enough) as soon as it's done. Theano's flops seems to measure floating-point operations per second, as opposed to the number of floating-point operations a specific function takes. So how would one estimate the number of floating-point operations tanh or softmax (or the exp() operation) takes? As far as I can see, this may actually be implementation-dependent, so a rough estimate might have to be used (which I'm not sure how to obtain).
Regarding batch norm, wouldn't the computations of \mu and \sigma^2 have to be factored in as well? That would mean the cost is no longer expressed in floating-point operations per forward pass per element, but in floating-point operations per forward pass per batch.

Thanks again,
Joel

Jan Schlüter

Sep 12, 2016, 9:28:47 AM
to lasagne-users, jram...@gmail.com
Hey,


Thank you for your reply. I've gotten back to working on this issue, and will certainly put up a gist (or a PR at Lasagne/Recipes, if it's useful enough) as soon as it's done. Theano's flops seems to measure floating-point operations per second, as opposed to the number of floating-point operations a specific function takes.

I think many Theano Ops can report their number of floating-point operations, and Theano uses this to compute operations per second. I don't know the easiest way to obtain the total for a whole graph, though. You may want to try asking on the theano-users list.
 
Regarding batch norm, wouldn't the computations of \mu and \sigma^2 have to be factored in as well? That would mean the cost is no longer expressed in floating-point operations per forward pass per element, but in floating-point operations per forward pass per batch.

At inference time, batch normalization works per item (and can be merged completely into the preceding linear operator, such as a convolutional or dense layer). If you want the exact number of operations at training time, then yes, you need to compute per batch. Most of the cost is still per item rather than per batch, though.
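That last point can be made concrete with a rough training-time estimate (a sketch with my own approximate constants; exact counts depend on the implementation):

```python
# Rough training-time FLOP count for batch normalization over a batch of
# n items with d features each. The constants are approximate.

def batchnorm_train_flops(n, d):
    mean = n * d + d          # sum over the batch, then one divide per feature
    var = 3 * n * d + d       # subtract mean, square, sum; divide per feature
    normalize = 2 * n * d     # subtract mean, divide by std, per element
    scale_shift = 2 * n * d   # multiply by gamma, add beta, per element
    return mean + var + normalize + scale_shift

# The total grows linearly with batch size, so the per-item cost is nearly
# constant: only the per-feature divides (2*d in total) depend on n alone.
per_item_8 = batchnorm_train_flops(8, 256) / 8
per_item_128 = batchnorm_train_flops(128, 256) / 128
print(per_item_8, per_item_128)  # 2112.0 2052.0
```

In other words, even at training time, "FLOPs per item" remains a good approximation for batch norm; the genuinely per-batch terms are negligible.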

Best, Jan