tesseract 4 beta: OpenCL usage

Janpieter Sollie

Apr 27, 2018, 4:21:41 AM
to tesseract-ocr
Hello everyone,

I have a question about the openCL selection procedure of tesseract:

my output:

[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:Fiji score is 0.202927
[DS] Device[2] 1:Ellesmere score is 1.468799
[DS] Device[3] 1:Ellesmere score is 1.468799
[DS] Device[4] 1:Bonaire score is 1.533776
[DS] Device[5] 1:Tonga score is 0.184236
[DS] Device[6] 0:(null) score is 1.123015
[DS] Selected Device[5]: "Tonga" (OpenCL)

Ugh, this is weird... why does tesseract take my Tonga instead of my Fiji device? Can I force it to use the Fiji?
I understand the Ellesmere cards have slower access (they're behind a PCIe switch), but Fiji and Tonga are both directly connected via a PCIe 2.0 x16 bus. Do we need a better tesseract selection procedure?
If so, I'm quite skilled at OpenCL, and I'd be glad to help!

kind regards,

Janpieter
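
The log in this post is consistent with the device that has the lowest score being selected: Tonga (0.184236) narrowly beats Fiji (0.202927). The sketch below parses those lines and applies that rule. The "lowest score wins" rule is an assumption read off the output, not a statement about tesseract's code, and `parse_scores` and `pick_lowest` are hypothetical helper names.

```python
import re

# Device lines copied from the log in this post.
LOG = """\
[DS] Device[1] 1:Fiji score is 0.202927
[DS] Device[2] 1:Ellesmere score is 1.468799
[DS] Device[3] 1:Ellesmere score is 1.468799
[DS] Device[4] 1:Bonaire score is 1.533776
[DS] Device[5] 1:Tonga score is 0.184236
[DS] Device[6] 0:(null) score is 1.123015
"""

LINE = re.compile(r"\[DS\] Device\[(\d+)\] \d+:(.+) score is ([0-9.]+)")


def parse_scores(log):
    """Return a list of (index, name, score) tuples, one per device line."""
    devices = []
    for line in log.splitlines():
        match = LINE.match(line)
        if match:
            devices.append(
                (int(match.group(1)), match.group(2), float(match.group(3)))
            )
    return devices


def pick_lowest(devices):
    # Assumption: the lowest score wins, as the log suggests.
    return min(devices, key=lambda device: device[2])


devices = parse_scores(LOG)
index, name, score = pick_lowest(devices)
print(f'Selected Device[{index}]: "{name}" (score {score})')
```

Under that assumption, the choice of Tonga over Fiji comes down to a score difference of roughly 10%, which may be within the noise of a short benchmark run.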

Zdenko Podobny

Apr 27, 2018, 4:51:11 AM
to tesser...@googlegroups.com
If you have experience, your help will be warmly welcomed.
The OpenCL code is not maintained, and it is well on its way to being removed if no maintainer/contributor is found.
Anyway, it is not used extensively, so there is room for improvement.

Zdenko


On Fri, 27 Apr 2018 at 10:21, Janpieter Sollie <janpiet...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4568b2b8-532d-457c-920b-60407e7b278e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Janpieter Sollie

Apr 27, 2018, 4:56:09 AM
to tesser...@googlegroups.com
I'd be glad to help. Using tesseract 4, I am able to reach 90% accuracy with OpenCL. I do not have any experience with neural networks (I'm just a high-school-educated IT-support guy, no college, with some knowledge of OpenCL), so can you recommend some documentation to help me understand the engine of tesseract 4?

Janpieter Sollie

Apr 27, 2018, 5:18:10 AM
to tesser...@googlegroups.com
A thing which I could do quite easily is search for characters at a certain offset. If you can give me 2^16 offsets to analyze, it may be an advantage to do it via OpenCL. But does the network work this way?

Zdenko Podobny

Apr 27, 2018, 5:36:53 AM
to tesser...@googlegroups.com
The only documentation we have is the code itself ;-) But you can start by searching for OpenCL issues in the tesseract issue tracker on GitHub...

Zdenko


On Fri, 27 Apr 2018 at 10:56, Janpieter Sollie <janpiet...@gmail.com> wrote:

Janpieter Sollie

Apr 27, 2018, 7:55:17 AM
to tesseract-ocr
Okay, I took a quick look at the OpenCL code:
- you are only invoking at most 256 (16x16) OpenCL kernels at the same time. This is not enough, not even for a budget GPU.
- the code has some points to discuss (e.g. why does it keep the kernel and the tesseract dat file in the local directory and not in /usr/local/share/tessdata?)
- most of the issues that mention OpenCL favour removing the code.
- the code is deprecated, quite messy and quite slow.
Can we talk about this on IRC? I'm still willing to help, but it seems fairly complicated.

I will join the tesseract-ocr channel on freenode. We can talk about it when we're both present.

On Friday, 27 April 2018 at 09:36:53 UTC, zdenop wrote:

Janpieter Sollie

Apr 27, 2018, 10:18:49 AM
to tesser...@googlegroups.com
I had a quick thought about what you could offload to OpenCL. I will need some help from you people with the host code (I am a C programmer, not an experienced C++ one), but this algorithm can be optimized very well in OpenCL.
The way I'd do it:

Prerequisites:
- you can define 65k offsets (x,y) at which you want the OpenCL engine to look for dots (x,y); the optimal position and closest neighbour of each dot can be reported in the first part.
- you can make a RAW image of both the image and the characters. The size of the letters doesn't matter, but they must be trimmed properly.

1. You give me a matrix of 256*256 offsets (short, short) to analyze, with a max of 64 dots (char, char) (I assume these are neurons) to analyze at each offset.
So this gives you a starting memory usage of 2⁸ * 2⁸ * 4 + 64*2 = 256k + 128 bytes.
Each dot MUST contain a black pixel.
Then we add the image. This is a char image of a max size (to be discussed with you guys); I assume a 4096*4096 pixel image would be fine, especially when a character can contain a 4x4 matrix defining a 0/1 (black/white) value.
2. Then I follow these steps in the OpenCL engine:
- we analyze the neurons
    - draw a circle of x black points around each of them, such that the circle is completely black (x can be 0, in which case the neuron is white)
    - when we encounter one or more white points, a direction of the points is calculated. If there's no whitespace on the other side, the neuron offset is moved by x/2 in the opposite direction and 'analyze neuron' is restarted for x/2; else, quit the 'analyze neuron' part. This can be done in local memory, in which case it will cost you 256*2=512 bytes of local RAM to determine the optimal neuron position. Most graphics cards have a limit of 32k of local RAM, so this is no problem :-)
- determine the closest dot next to this one:
    for each dot != this one, draw a line of black points; if no line can be found, jump to the next dot.
    watch the distance. If it's smaller than for the previous neuron && this dot id doesn't already have a link pointing from the destination to this one, save the dot id.
So, at the end:
    - each neuron of each offset is optimally centered in a return matrix of 256*256*64*2 = 2²³ = 8M of memory
    - each neuron has a unique id to its closest neighbour, to which it's guaranteed to be attached. An id of -1 means no neighbour could be found. 256*256*64 = 4M of memory
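
The buffer sizes quoted in steps 1 and 2 can be checked with a few lines of arithmetic. This is only a sketch: it assumes the OpenCL C type sizes (a 2-byte short, a 1-byte char) and a 1-byte signed neighbour id (enough for -1..63), and all constant names are made up for illustration.

```python
# Sizes taken from the description in this post; names are hypothetical.
OFFSETS = 256 * 256        # 2^16 offsets
BYTES_PER_OFFSET = 2 * 2   # (short, short): two 2-byte values
MAX_DOTS = 64              # dots ("neurons") per offset
BYTES_PER_DOT = 2 * 1      # (char, char): two 1-byte values
BYTES_PER_ID = 1           # assumption: a signed char holds ids -1..63
LOCAL_LIMIT = 32 * 1024    # the 32k local memory limit mentioned in the post

offset_table = OFFSETS * BYTES_PER_OFFSET            # input offsets
dot_list = MAX_DOTS * BYTES_PER_DOT                  # input dot list
local_scratch = 256 * 2                              # per-neuron scratch space
positions_out = OFFSETS * MAX_DOTS * BYTES_PER_DOT   # centered positions
neighbour_ids = OFFSETS * MAX_DOTS * BYTES_PER_ID    # closest-neighbour ids

print(f"offset table : {offset_table // 1024} KiB")
print(f"dot list     : {dot_list} bytes")
print(f"local scratch: {local_scratch} bytes (limit {LOCAL_LIMIT})")
print(f"positions out: {positions_out // 2**20} MiB")
print(f"neighbour ids: {neighbour_ids // 2**20} MiB")
```

The totals agree with the figures in the post: 256 KiB + 128 bytes of input, 512 bytes of local memory, and 8 MiB + 4 MiB of output.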

3. We focus on the neuron list -> character mapping. This is a separate kernel. A "probability" factor is involved here, but I will think about it further. I suggest using a list of 64 character images at once, otherwise you need lots of memory :-)
- define the top, left and right neuron. Create a zoom factor for the image. Calculate the aspect ratio. The probability is 1-diff(aspect_ratio1, aspect_ratio2)
- analyze each link in the font character. total probability *= (found_link_length / total_link_length)
- report the probability.
On the PC: the character with the highest probability is the character you're looking for. Be aware that you need to compare the probabilities of the different offsets if they overlap.
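
The scoring in step 3 could look roughly like this on the host side. It is a sketch under assumptions: `diff` is read as an absolute difference, the result is clamped at 0, and the function names and the sample numbers are invented for illustration.

```python
def match_probability(zone_aspect, font_aspect, links):
    """Score one font character against one zone.

    links is a list of (found_link_length, total_link_length) pairs,
    one per link in the font character.
    """
    # Assumption: diff() is the absolute difference, clamped so the
    # probability never goes below 0.
    probability = max(0.0, 1.0 - abs(zone_aspect - font_aspect))
    for found, total in links:
        probability *= found / total
    return probability


def best_character(zone_aspect, candidates):
    """Return (character, probability) for the highest-scoring candidate.

    candidates maps a character to (font_aspect, links).
    """
    scores = {
        char: match_probability(zone_aspect, font_aspect, links)
        for char, (font_aspect, links) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up example: a zone with aspect ratio 0.5 and two candidates.
candidates = {
    "l": (0.25, [(10, 10)]),         # aspect mismatch, link fully found
    "I": (0.5, [(8, 10), (5, 5)]),   # aspect match, one link partly found
}
char, probability = best_character(0.5, candidates)
print(char, probability)
```

Because the per-link ratios are multiplied together, a single missing link (found length 0) drives the whole probability to 0, which may or may not be what is wanted.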

If the tesseract project can use this, please let me know.

ShreeDevi Kumar

Apr 27, 2018, 10:28:11 AM
to tesser...@googlegroups.com
Please see


For info about neural nets used by tesseract

Janpieter Sollie

Apr 27, 2018, 11:54:06 AM
to tesser...@googlegroups.com
If I understand correctly, a neural net is about the engine parts, not the image characterisation/rendering method, right? I ask because I see many presentations, and most of them talk about the history of tesseract, but that's not what I need.

Janpieter Sollie

Apr 28, 2018, 3:49:46 AM
to tesser...@googlegroups.com
Would it be a problem for you if I rewrite the OpenCL engine completely, and you people help me link the tesseract kernel -> OpenCL engine parts?
In the attachment, I already have a list of features I'd like to port to OpenCL. As this uses the GPU heavily, I will implement multi-card support on the host.
Is it a problem for you guys to think of tesseract 5.0 as a milestone?

ShreeDevi Kumar

Apr 28, 2018, 4:10:47 AM
to tesser...@googlegroups.com
@zdenko This discussion may be better suited to the tesseract-dev forum, or do you want to track it as an issue on GitHub?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Janpieter Sollie

Apr 28, 2018, 6:17:48 AM
to tesser...@googlegroups.com
Oops, I forgot the attachment. Here it is :-)
I believe it will help you decide. What it CAN do:
- find whitelines
- map a zone to a certain character probability
- train itself.
It does NOT decide whether it is a certain character or not; this needs to be decided on the host, not the GPU.

openclkernels.cl

shree

Apr 29, 2018, 11:55:35 AM
to tesseract-ocr

This discussion is better held there. 