Tesseract lib: double vs float

Robin Watts

May 14, 2020, 1:56:36 PM

to tesseract-ocr

Hi all,

I've been playing with integrating Tesseract with Ghostscript for the past couple of weeks. I have it working nicely, and I've started passing back some of the tweaks I made along the way as pull requests on GitHub. Thanks to stweil for bearing with me!

The biggest of these is an implementation of matrixDotVector for NEON-equipped ARM processors (intsimdmatrixneon.cpp). This makes a massive difference to the speed on ARM devices (such as my Raspberry Pi), but, depending on what language data I feed it, profiles still show 30-55% of runtime in this function.

It'd be nice to do the whole thing in NEON, but NEON (on 32-bit ARM, at least) doesn't support doubles, so we have to drop back to "standard" scalar operations to add the biases and apply the scales. If we were using floats, we'd be golden.

It's possible that the calling code could be tweaked to use floats instead of doubles. So, before I dive into this, I thought I'd ask here. Presumably, there is a good reason why the existing code uses doubles rather than floats?

Am I doomed to damage the quality of the results I get out by moving to floats?

Thanks in advance for any help/insight people can offer.

Aaron Stewart

May 16, 2020, 3:25:13 PM

to tesseract-ocr

There is also a group for Tesseract development, and you might get a better answer there:
https://groups.google.com/forum/#!forum/tesseract-dev

Robin Watts

May 16, 2020, 7:27:27 PM

to tesseract-ocr

Thanks, I'll try there instead (but not right now :) ).

There has been some discussion of this on one of my PRs, so it's not been the wall of silence that you might think :)

Thanks again!
