Understanding the logic behind the implementation of Softmax Layer


Vishal Menon

Dec 6, 2021, 4:09:35 AM
to TensorFlow Lite
Hello everyone,
I'm keen to understand the implementation of the Softmax layer (especially the int8 version).
While looking into it I came across a paper (p. 11, Appendix A.1), which says that understanding the implementation given here makes the rest easier to follow.
As you can see, I'm finding it difficult to understand the math behind it.
Any help in understanding the implementation is highly appreciated.

Thank you,

Pulkit Bhuwalka

Dec 13, 2021, 3:58:55 AM
to TensorFlow Lite, vishal.m...@gmail.com

Hi Vishal,

This is a really broad question. To fully understand the implementation, you need to understand TFLite quantization (spec; the paper you referenced is also a good resource), the Softmax function, the gemmlowp interface, and the numerical rescaling/nudging that happens, not to mention the ARM intrinsics used for optimization.

There's really no shortcut; you'll have to sweat out the details. The contrib code you referenced is old. I'd recommend looking at the reference kernels for the math: they leave out the ARM NEON code, so they are a bit easier to understand.

Roughly, we know the zero point and scale of the input and the logged ranges of the output. The quantized multiplier uses these to map from one scale to the other. But there's a lot of numerical detail, and you'll have to write it out in full. If you work through it and come back with a specific question, it will be more answerable.
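To make the "quantized multiplier" idea a bit more concrete, here is a minimal Python sketch of how a real-valued rescaling factor (for example a ratio of scales) can be split into a 32-bit integer multiplier plus a shift, and then applied using integer arithmetic only. The function names are made up for illustration and the rounding is simplified, so treat it as an approximation of the idea rather than the actual kernel code.

```python
import math
from typing import Tuple


def quantize_multiplier(real_multiplier: float) -> Tuple[int, int]:
    """Split a positive real multiplier into (q31, shift).

    real_multiplier ~= q31 * 2**(shift - 31), with q31 in [2**30, 2**31).
    Hypothetical helper for illustration, not a TFLite API.
    """
    if real_multiplier <= 0.0:
        raise ValueError("multiplier must be positive")
    mantissa, shift = math.frexp(real_multiplier)  # mantissa in [0.5, 1)
    q31 = round(mantissa * (1 << 31))
    if q31 == (1 << 31):  # rounding pushed the mantissa up to 1.0
        q31 //= 2
        shift += 1
    return q31, shift


def apply_multiplier(value: int, q31: int, shift: int) -> int:
    """Integer-only approximation of value * real_multiplier (round half up)."""
    total_shift = 31 - shift
    product = value * q31
    if total_shift <= 0:
        return product << -total_shift
    return (product + (1 << (total_shift - 1))) >> total_shift


if __name__ == "__main__":
    q31, shift = quantize_multiplier(0.0123)
    print(q31, shift, apply_multiplier(1000, q31, shift))
```

A positive shift corresponds to a left shift (multiplier greater than one) and a negative shift to a right shift, which is where the "multiplier and left shift" pairs in the kernels come from.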


Vishal Menon

Jan 20, 2022, 5:48:08 AM
to TensorFlow Lite, Pulkit Bhuwalka, Vishal Menon
Hi Pulkit,
Thank you for your response.
But I have a few doubts: in softmax or logistic, what are the multiplier and left-shift values?

Also, we do have input and output scaling factors for the quantization, right? Where are they applied here? There seems to be no multiplication of the data by either the output (op) or the input (ip) scaling factor.

Vishal Menon

Jan 28, 2022, 4:40:03 AM
to TensorFlow Lite, Vishal Menon, Pulkit Bhuwalka

There are a few more questions I need some clarification on.
a. In the quantization scheme, every real number is represented as r = s(q - z),
          so softmax, exp(beta*x) / sum(exp(beta*x)),
          should be viewed as exp(beta*s*(q - z)) / sum(exp(beta*s*(q - z))).
But here, there is no input argument for the zero point or the scaling factor. Why is that? Can we neglect these quantization parameters? In the case of logistic, the zero point (ZP) does appear.

b. Does the 8 in #L2682 mean that the output scaling factor (op_scaling_factor) is taken care of? (256 = 2^8 should be multiplied into the result, right? Followed by adding the output zero point, op_zp = -128?)

c. How is the 'beta' value converted to the beta_multiplier and left_shift values? Only the float value is available from the .tflite file. Is there any preprocessing of the input parameters?

d. Shouldn't I convert the input scaling factor (ip_ScalF) into the same kind of multiplier and shift values? If so, how is it done?

e. The output from the source code seems to be in the range (0, 255), but according to the spec it should be in the range (-128, 127). Am I missing something?
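
Questions (a), (b) and (e) can be explored with a float reference before reading the fixed-point kernel. The Python sketch below is only an illustration, not the TFLite code: the function name is hypothetical, and it assumes the output parameters that the TFLite quantization spec fixes for int8 SOFTMAX (scale 1/256, zero point -128), plus an assumed zero point of 0 for the uint8 case. It dequantizes with r = s(q - z), subtracts the maximum for numerical stability, and requantizes the probabilities.

```python
import math
from typing import List


def softmax_quantized_reference(q_in: List[int], input_scale: float,
                                input_zero_point: int, beta: float = 1.0,
                                signed_output: bool = True) -> List[int]:
    """Float reference for a quantized softmax (hypothetical helper)."""
    # Dequantize: r = s * (q - z).
    reals = [input_scale * (q - input_zero_point) for q in q_in]
    # Subtract the max. r_i - r_max = s * (q_i - q_max), so z cancels here,
    # and beta and the input scale only ever appear as the product beta * s.
    r_max = max(reals)
    exps = [math.exp(beta * (r - r_max)) for r in reals]
    total = sum(exps)

    # Assumed output parameters: scale 1/256, zero point -128 (int8) or 0 (uint8).
    out_scale = 1.0 / 256.0
    out_zero_point = -128 if signed_output else 0
    lo, hi = (-128, 127) if signed_output else (0, 255)

    out = []
    for e in exps:
        q = round((e / total) / out_scale) + out_zero_point
        out.append(min(hi, max(lo, q)))  # a probability of 1.0 saturates
    return out


if __name__ == "__main__":
    print(softmax_quantized_reference([10, 20, 30, 40], 0.1, 0))
    print(softmax_quantized_reference([10, 20, 30, 40], 0.1, -128))
    print(softmax_quantized_reference([10, 20, 30, 40], 0.1, 0,
                                      signed_output=False))
```

Under these assumptions the int8 result is the 0..255 result shifted by -128, and the input zero point never influences the output because softmax depends only on differences between inputs.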

Christian Sánchez

Jun 24, 2022, 5:38:56 AM
to TensorFlow Lite, vishal.m...@gmail.com
Hi Vishal,

Did you get to the bottom of the implementation of the softmax function? I need to understand how it works, and your findings would be very helpful to me if you don't mind sharing them.

