Tesseract OpenCL Selects Wrong Compute Device

Tim Finnegan

Feb 19, 2020, 2:27:12 AM
to tesseract-ocr
I'm attempting to use GPU acceleration during training via the OpenCL libraries.

I have built Tesseract with OpenCL support and installed the NVIDIA compute driver 440 on my Ubuntu 19.10 installation.

Whenever I run tesstrain.sh, however, the program refuses to select the proper GPU. Rather than use my
NVIDIA GeForce GTX 1060 6GB device, it selects the CPU as my default OpenCL device, even though it detects my GPU
and gives it a higher score in the built-in benchmark.

Setting TESSERACT_OPENCL_DEVICE=1 seems to do nothing: nvidia-smi shows that the process is not using my GPU.

Here is my tesstrain.sh output:

=== Starting training for language 'eng'
[Tue 18 Feb 2020 04:55:13 PM PST] /usr/local/bin/text2image --fonts_dir=/usr/share/fonts --ptsize 12 --font=Chit --outputbase=/tmp/font_tmp.Hk8xAdjwI8/sample_text.txt --text=/tmp/font_tmp.Hk8xAdjwI8/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8
Rendered page 0 to file /tmp/font_tmp.Hk8xAdjwI8/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Chit
[Tue 18 Feb 2020 04:55:15 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp1 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.tif
Rendering using Chit
[Tue 18 Feb 2020 04:55:17 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=2 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp2 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.tif
Rendering using Chit
[Tue 18 Feb 2020 04:55:19 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=3 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp3 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.tif
Rendering using Chit
[Tue 18 Feb 2020 04:55:22 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=4 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp4 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.tif
Rendering using Chit
[Tue 18 Feb 2020 04:55:24 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=5 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp5 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.tif
Rendering using Chit
[Tue 18 Feb 2020 04:55:27 PM PST] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Hk8xAdjwI8 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=6 --outputbase=/tmp/eng-2020-02-18.Gfj/eng.Chit.exp6 --max_pages=0 --font=Chit --ptsize 12 --text=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.training_text
Stripped 35 unrenderable words
Rendered page 0 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.tif
Stripped 6 unrenderable words
Rendered page 1 to file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Tue 18 Feb 2020 04:55:28 PM PST] /usr/local/bin/unicharset_extractor --output_unicharset /tmp/eng-2020-02-18.Gfj/eng.unicharset --norm_mode 1 /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.box /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.box /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.box /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.box /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.box /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.box
Extracting unicharset from box file /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2020-02-18.Gfj/eng.unicharset
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/set_unicharset_properties -U /tmp/eng-2020-02-18.Gfj/eng.unicharset -O /tmp/eng-2020-02-18.Gfj/eng.unicharset -X /tmp/eng-2020-02-18.Gfj/eng.xheights --script_dir=/home/tim/PycharmProjects/RnD/OCR_Dataset/langdata
Loaded unicharset of size 102 from file /tmp/eng-2020-02-18.Gfj/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2020-02-18.Gfj/eng.unicharset
=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=/home/tim/PycharmProjects/RnD/OCR_Dataset/tessdata_best
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1 --psm 6 lstm.train
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3 --psm 6 lstm.train
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4 --psm 6 lstm.train
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2 --psm 6 lstm.train
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5 --psm 6 lstm.train
[Tue 18 Feb 2020 04:55:29 PM PST] /usr/local/bin/tesseract /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.tif /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6 --psm 6 lstm.train
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Page 1
Page 1
Page 1
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Page 1
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Tesseract Open Source OCR Engine v5.0.0-alpha with Leptonica
Page 1
Page 1
Page 2
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.lstmf
Page 2
Page 2
Page 2
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.lstmf
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.lstmf
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.lstmf
Page 2
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.lstmf
Page 2
Loaded 56/56 lines (1-56) of document /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.lstmf
=== Constructing LSTM training data ===
[Tue 18 Feb 2020 04:55:33 PM PST] /usr/local/bin/combine_lang_model --input_unicharset /tmp/eng-2020-02-18.Gfj/eng.unicharset --script_dir /home/tim/PycharmProjects/RnD/OCR_Dataset/langdata --words /home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.wordlist --numbers /home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.numbers --puncs /home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.punc --output_dir /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA --lang eng
Loaded unicharset of size 102 from file /tmp/eng-2020-02-18.Gfj/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/tim/PycharmProjects/RnD/OCR_Dataset/langdata/eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
=== Saving box/tiff pairs for training data ===
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.box to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.tif to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
=== Moving lstmf files for training data ===
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp1.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp2.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp3.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp4.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp5.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA
Moving /tmp/eng-2020-02-18.Gfj/eng.Chit.exp6.lstmf to /home/tim/PycharmProjects/RnD/OCR_Dataset/DATA

Created starter traineddata for LSTM training of language 'eng'


Run 'lstmtraining' command to continue LSTM training for language 'eng' 
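
The `[DS]` lines in the output above are interleaved across six parallel tesseract processes, which makes them hard to read. A minimal Python sketch for pulling out the device-selection events is given here; the function names are hypothetical and the pattern is written only against the line shapes visible in this log, not against any documented Tesseract format. On the excerpt below, the last event reported is the override to Device[1], so the log itself suggests the environment variable was honoured. Note also that the device initially selected is the one with the lower score (0.503215), which would be consistent with the score being a cost where lower is better; that reading is an assumption and is not confirmed by the log.

```python
import re

# Matches lines of the shape seen in the log above, e.g.
#   [DS] Selected Device[2]: "(null)" (Native)
#   [DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
DS_LINE = re.compile(
    r'^\[DS\] (Selected|Overridden) Device\[(\d+)\]: "(.*)" \((\w+)\)\s*$'
)


def summarize_device_selection(log_text):
    """Return (action, index, name, kind) tuples for each matching [DS] line."""
    events = []
    for line in log_text.splitlines():
        m = DS_LINE.match(line.strip())
        if m:
            action, index, name, kind = m.groups()
            events.append((action, int(index), name, kind))
    return events


def final_device(events):
    """Return the last Selected/Overridden event, or None if there were none."""
    return events[-1] if events else None


# Excerpt copied from the tesstrain.sh output in this post.
sample = """\
[DS] Device[1] 1:GeForce GTX 1060 6GB score is 1.846448
[DS] Device[2] 0:(null) score is 0.503215
[DS] Selected Device[2]: "(null)" (Native)
[DS] Overriding Device Selection (TESSERACT_OPENCL_DEVICE=1, 1)
[DS] Overridden Device[1]: "GeForce GTX 1060 6GB" (OpenCL)
"""

events = summarize_device_selection(sample)
print(final_device(events))
```

Because the six processes write to the same terminal, running this over the full log yields six Selected/Overridden pairs rather than one.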


And here is my nvidia-smi output during the training process:

Tue Feb 18 17:01:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 00000000:01:00.0  On |                  N/A |
| 22%   56C    P0    23W / 120W |    402MiB /  6072MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1254      G   /usr/lib/xorg/Xorg                            32MiB |
|    0      2003      G   /usr/lib/xorg/Xorg                           155MiB |
|    0      2223      G   /usr/bin/gnome-shell                          97MiB |
|    0      2724      G   ...p/pycharm-professional/183/jbr/bin/java     2MiB |
|    0      4932      G   /usr/bin/nvidia-settings                       0MiB |
|    0      5071      G   ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files    62MiB |
+-----------------------------------------------------------------------------+
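
Every row in the process table above has Type `G` (graphics); nvidia-smi marks compute contexts with `C` (or `C+G`), and no tesseract process is listed. The snapshot is also stamped 17:01:50, a few minutes after the last timestamp in the tesstrain.sh output shown (04:55:33 PM), so if both come from the same run the tesseract processes may already have exited when it was taken. A minimal Python sketch for checking a captured snapshot follows; the helper names are hypothetical, and the pattern assumes the table layout printed by the driver version in this post (440.33.01), which other versions may not share.

```python
import re

# One row of the "Processes" table as laid out in this post:
#   |  GPU  PID  Type  Process name  Usage |
PROC_ROW = re.compile(r'^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s+\|$')


def parse_processes(smi_text):
    """Parse process rows out of captured nvidia-smi text."""
    rows = []
    for line in smi_text.splitlines():
        m = PROC_ROW.match(line.strip())
        if m:
            gpu, pid, ptype, name, mib = m.groups()
            rows.append({"gpu": int(gpu), "pid": int(pid), "type": ptype,
                         "name": name, "mib": int(mib)})
    return rows


def processes_matching(rows, needle):
    """Rows whose process name contains the given substring."""
    return [r for r in rows if needle in r["name"]]


# Rows copied from the nvidia-smi output in this post.
sample = """\
|    0      1254      G   /usr/lib/xorg/Xorg                            32MiB |
|    0      2223      G   /usr/bin/gnome-shell                          97MiB |
|    0      5071      G   ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files    62MiB |
"""

rows = parse_processes(sample)
print(sorted({r["type"] for r in rows}))   # process types present
print(processes_matching(rows, "tesseract"))  # any tesseract rows
```

Capturing the snapshot while a tesseract process is still running (the lstm.train step in the log lasts only a few seconds) would make the check meaningful.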

Zdenko Podobny

Feb 19, 2020, 2:36:44 AM
to tesser...@googlegroups.com
Search the forum and the issue tracker for OpenCL topics.

Zdenko

