intsimdmatrix: float vs double


Robin Watts

May 18, 2020, 8:19:40 AM
to tesseract-dev
Hi all.

[ I posted this over in tesseract-ocr the other day, and (understandably) didn't get much of a response. Someone pointed out that this would be a better forum, so I am reposting it (edited) here. In the meantime there has been some discussion of the idea on one of my pull requests, so I've folded that into the repost here. ]

I've been playing with integrating Tesseract with Ghostscript for the past couple of weeks. I have it working nicely, and I've started trying to pass back some of the tweaks I've done along the way as pull requests on github - thanks to all the reviewers/commentators that have helped to get those in.

The biggest of these is an implementation of matrixDotVector for NEON-equipped ARMs (intsimdmatrixneon.cpp). This makes a massive difference to the speed on ARM devices (such as my Raspberry Pi), but (depending on what language data I feed it) profiles still show 30-55% of runtime in this function.

I have a couple of ideas about ways to improve this a bit. Both spring from the final stage of the calculation. The initial calculations of parallel SIGMA(a*b) all happen in the SIMD registers, in the integer domain. At that point, for each output value, we do the following (a scalar sketch follows the list):

 1) Cast to double.
 2) Divide by 127
 3) Add the bias (as an integer that needs to be converted to a double each time we run through)
 4) Multiply by the scale.
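For concreteness, here is a minimal scalar sketch of those four steps for a single output value (acc, bias and scale are illustrative names, not the actual intsimdmatrix identifiers):

    #include <cstdint>

    // Current per-value epilogue, as described in the list above.
    static double EpilogueCurrent(int32_t acc, int8_t bias, double scale) {
      double result = static_cast<double>(acc);  // 1) cast the integer sum to double
      result /= 127.0;                           // 2) divide by 127 (the int8 range)
      result += static_cast<double>(bias);       // 3) add the bias, converted every call
      result *= scale;                           // 4) multiply by the per-output scale
      return result;
    }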

First idea: We could rejig this a bit (sketched after the list below) to be:

 1) Add the bias*127 (staying in the integer domain, so no conversion is required; no multiplication is even required on architectures where (bias<<7) - bias is cheaper than bias*127).
 2) Cast to double.
 3) Multiply by scale/127.  (And the /127 can be rolled into the scale values at deserialisation time, I think).
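A corresponding sketch of the reworked epilogue, assuming bias_x127 is the bias pre-multiplied by 127 (or formed as (bias << 7) - bias) and scale_over_127 is scale / 127 precomputed at deserialisation time (again, illustrative names only):

    #include <cstdint>

    // Reworked per-value epilogue: integer bias add, one conversion, one multiply.
    static double EpilogueReworked(int32_t acc, int32_t bias_x127,
                                   double scale_over_127) {
      acc += bias_x127;                   // 1) bias add stays in the integer domain
      return static_cast<double>(acc)     // 2) single int -> double conversion
             * scale_over_127;            // 3) one multiply; no divide, no double add
    }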

Straight off the bat, that saves us a floating-point divide, an int -> double conversion, and a floating-point add, at the cost of an integer multiply and add, for each value. That's probably a win, right?

More importantly, at least some of that can be done within the SIMD domain (for all three of the architectures we have SIMD implementations for, I believe).

Second idea: It'd be nice to do the whole thing in SIMD, but NEON (at least) doesn't support doubles. So can we use floats instead?
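As a rough illustration of what the two ideas combined could look like on NEON, here is a hypothetical epilogue for four output values at once, with the bias added in the integer domain and the scaling done in float32. Epilogue4, bias_x127 and scale_over_127 are made-up names, and this is a sketch rather than the code in my branch:

    #include <arm_neon.h>

    // Four SIGMA(a*b) sums in, four scaled float results out.
    static inline void Epilogue4(int32x4_t acc,              // integer dot products
                                 int32x4_t bias_x127,        // biases pre-multiplied by 127
                                 float32x4_t scale_over_127, // scales pre-divided by 127
                                 float *out) {
      acc = vaddq_s32(acc, bias_x127);     // bias add in the integer SIMD domain
      float32x4_t f = vcvtq_f32_s32(acc);  // int32 -> float32, still in SIMD
      f = vmulq_f32(f, scale_over_127);    // scale in the float domain
      vst1q_f32(out, f);                   // store four results
    }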

I was pointed at a previous discussion during which the idea of using floats instead of doubles came up, here: https://github.com/tesseract-ocr/tesseract/issues/943#issuecomment-303239929

The consensus at that time was that while it might be a win, it was a bad idea because numerical algorithms really shouldn't be doing SIGMA(a * b) in the float domain because the errors would add up.

That discussion predates the appearance of intsimdmatrix, in which the SIGMA(a * b) stage moved to being done in the int domain. So that objection, AIUI, no longer applies.

It may be that there are other good reasons why moving to float (at least on some architectures) is a bad idea, but I couldn't see any mentioned there.

Thanks in advance for any help/insight people can offer.

Robin Watts

May 18, 2020, 7:09:59 PM
to tesseract-dev
I spent the day playing with the first idea from above (rolling the scale into the weights, and doing the bias as part of the simd).
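For reference, folding the /127 into the scale values at load time is the trivial half of it; a sketch (scales is a hypothetical name for the deserialised per-output scale vector):

    #include <vector>

    // Fold the divide-by-127 into the scales once at deserialisation time,
    // so the per-value epilogue only needs a multiply.
    static void FoldQuantisationIntoScales(std::vector<double> &scales) {
      for (double &s : scales) {
        s /= 127.0;
      }
    }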

In a very quick test, this saves about 10% of runtime in both the NEON and the AVX2 cases for me.

The branch with this work is here, for those interested:

https://github.com/robinwatts/tesseract/commits/simdbias

Stefan Weil

May 19, 2020, 11:09:10 AM
to tesseract-dev
We thought about using float instead of double before Ray introduced the fast models with integer calculation.

In theory double should reduce errors when calculating the dot product, so we considered using float plus the Kahan algorithm, which improves the accuracy. Practical results showed that Kahan consumed resources but was not necessary to get the same OCR results as with double. So I simply don't know why Google decided to work with double instead of float.
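For readers unfamiliar with it, Kahan (compensated) summation keeps a running correction term to recover the low-order bits lost in each float addition; a generic sketch, not the code that was actually benchmarked:

    // Compensated dot product in float.
    static float KahanDotProduct(const float *a, const float *b, int n) {
      float sum = 0.0f;
      float c = 0.0f;                 // running compensation for lost low-order bits
      for (int i = 0; i < n; ++i) {
        float y = a[i] * b[i] - c;
        float t = sum + y;
        c = (t - sum) - y;            // the rounding error of this addition
        sum = t;
      }
      return sum;
    }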

Our implementation only covered recognition, not training. Ideally Tesseract would support training and recognition with well-structured code, for example templates, which would work not only with double and float but maybe also with fp16.
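Such a templated structure could be as simple as the following sketch, instantiated for double, float, or an fp16 type (purely illustrative, not an existing Tesseract interface):

    // Accumulation type is a template parameter.
    template <typename T>
    static T DotProduct(const T *u, const T *v, int n) {
      T total = T(0);
      for (int i = 0; i < n; ++i) {
        total += u[i] * v[i];
      }
      return total;
    }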

As far as I know, other LSTM-based OCR systems use float, not double.

Robin Watts

May 19, 2020, 11:46:13 AM
to tesseract-dev
Thanks.

Rolling the biasing into the SIMD is a clear win, so I have code that does that. It builds upon earlier tweaks I've made, so I'll wait for those to get through review first, then make it available as a pull request.

As to doing stuff using floats, that looks more involved, and having played a bit with it this morning, I think I'll need to build up strength before trying for that :)
