Train Tesseract 4.0 on Windows 8

cry...@gmail.com

Apr 19, 2018, 11:01:55 AM

to tesseract-ocr

I have installed the latest Tesseract 4.0 binary from UB Mannheim, along with Python, Git, and Java, on 64-bit Windows 8.

I am trying to run the "tesstrain.sh" script, but an error message appears. Any help?

ShreeDevi Kumar

Apr 19, 2018, 12:32:09 PM

to tesser...@googlegroups.com

tesstrain.sh is a bash shell script. You don't need Python for it.

Try the following (give the correct path):

bash ./tesstrain.sh

ShreeDevi

cry...@gmail.com

Apr 20, 2018, 4:47:29 AM

to tesseract-ocr

divana@divana-pc MINGW64 ~/Desktop/train
bash ./tesstrain.sh --fonts_dir ./fonts --lang eng --linedata_only --training_text ./txt/english.txt --noextract_font_properties --langdata_dir ./langdata --tessdata_dir ./tessdata --fontlist "Arial,"

./tesstrain_utils.sh: line 106: test: =: unary operator expected
=== Starting training for language 'eng'
[Fri, Apr 20, 2018 11:46:27 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=./fonts --font=Arial, --outputbase=/tmp/font_tmp.LJ02UhEA8L/sample_text.txt --text=/tmp/font_tmp.LJ02UhEA8L/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.LJ02UhEA8L
Rendered page 0 to file C:/Users/divana/AppData/Local/Temp/font_tmp.LJ02UhEA8L/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Arial,
[Fri, Apr 20, 2018 11:46:28 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.LJ02UhEA8L --fonts_dir=./fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.E3KrnucotP/eng/eng..exp0 --max_pages=3 --font=Arial, --text=./txt/english.txt
Rendered page 0 to file C:/Users/divana/AppData/Local/Temp/tmp.E3KrnucotP/eng/eng..exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Fri, Apr 20, 2018 11:46:29 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.E3KrnucotP/eng/eng.unicharset --norm_mode 1 /tmp/tmp.E3KrnucotP/eng/eng..exp0.box
Extracting unicharset from box file C:/Users/divana/AppData/Local/Temp/tmp.E3KrnucotP/eng/eng..exp0.box
ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.E3KrnucotP/eng/eng.unicharset does not exist or is not readable
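
A note on the first message in the log: "test: =: unary operator expected" is what bash typically prints when an unquoted variable expands to nothing inside a "test" comparison. The thread does not show line 106 of tesstrain_utils.sh, so which variable is empty there is not known. The sketch below only reproduces the same class of error with a hypothetical variable named "value", and it may well be unrelated to the later ICU error.

```shell
#!/usr/bin/env bash
# Hypothetical variable: it stands in for whatever is empty at line 106,
# which the thread does not show.
value=""

# Unquoted: the empty expansion disappears, so `test` receives only
# `= yes` and reports an error instead of comparing anything.
unquoted_status=0
unquoted_msg=$(test $value = "yes" 2>&1) || unquoted_status=$?

# Quoted: `test` receives an empty string as its left operand, so the
# comparison is simply false and no error message is printed.
quoted_status=0
quoted_msg=$(test "$value" = "yes" 2>&1) || quoted_status=$?

echo "unquoted: status=$unquoted_status message=$unquoted_msg"
echo "quoted:   status=$quoted_status message=$quoted_msg"
```

Run under bash, the unquoted case exits with status 2 and an error message, while the quoted case exits with status 1 and prints nothing.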
