Train Tesseract 4.0 on Windows 8


cry...@gmail.com

Apr 19, 2018, 11:01:55 AM
to tesseract-ocr
I have installed the latest Tesseract 4.0 binary from UB Mannheim, along with Python, Git, and Java, on 64-bit Windows 8.
I am trying to run the "tesstrain.sh" script, but an error message appears. Any help?


ShreeDevi Kumar

Apr 19, 2018, 12:32:09 PM
to tesser...@googlegroups.com
tesstrain.sh is a Bash shell script. You don't need Python for it.

Try the following (give the correct path):

bash ./tesstrain.sh
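
If the path is in doubt, a small wrapper can check it before handing over to bash. This is only a sketch: `run_script` is a hypothetical helper name, not something that ships with Tesseract.

```shell
#!/usr/bin/env bash
# run_script is a hypothetical helper, not part of Tesseract.
# It refuses to continue when the script path is wrong, then runs the
# script through bash explicitly and forwards all remaining arguments.
run_script() {
  script="$1"
  shift
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  bash "$script" "$@"
}
```

It would be called as `run_script ./tesstrain.sh` followed by whatever options the training run needs.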



ShreeDevi



cry...@gmail.com

Apr 20, 2018, 4:47:29 AM
to tesseract-ocr
divana@divana-pc MINGW64 ~/Desktop/train
$ bash ./tesstrain.sh --fonts_dir ./fonts --lang eng --linedata_only --training_text ./txt/english.txt --noextract_font_properties --langdata_dir ./langdata --tessdata_dir ./tessdata --fontlist "Arial,"
./tesstrain_utils.sh: line 106: test: =: unary operator expected

=== Starting training for language 'eng'
[Fri, Apr 20, 2018 11:46:27 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=./fonts --font=Arial, --outputbase=/tmp/font_tmp.LJ02UhEA8L/sample_text.txt --text=/tmp/font_tmp.LJ02UhEA8L/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.LJ02UhEA8L
Rendered page 0 to file C:/Users/divana/AppData/Local/Temp/font_tmp.LJ02UhEA8L/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Arial,
[Fri, Apr 20, 2018 11:46:28 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.LJ02UhEA8L --fonts_dir=./fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.E3KrnucotP/eng/eng..exp0 --max_pages=3 --font=Arial, --text=./txt/english.txt
Rendered page 0 to file C:/Users/divana/AppData/Local/Temp/tmp.E3KrnucotP/eng/eng..exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Fri, Apr 20, 2018 11:46:29 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.E3KrnucotP/eng/eng.unicharset --norm_mode 1 /tmp/tmp.E3KrnucotP/eng/eng..exp0.box
Extracting unicharset from box file C:/Users/divana/AppData/Local/Temp/tmp.E3KrnucotP/eng/eng..exp0.box
ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.E3KrnucotP/eng/eng.unicharset does not exist or is not readable
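
For anyone hitting the same first error: "test: =: unary operator expected" is the message Bash's `test` builtin prints when an unquoted expansion on the left-hand side of `=` turns out to be empty, so `test` only sees the operator and the right-hand operand. The sketch below reproduces that with a hypothetical variable named `value`. It is not the content of line 106 of tesstrain_utils.sh, which the log does not show, and it says nothing about why something would be empty in this particular run.

```shell
#!/usr/bin/env bash
# "value" is a hypothetical stand-in for whatever expands to nothing.
value=""

# Unquoted: the empty expansion disappears, test receives only `= --`,
# prints "unary operator expected" (silenced here) and fails with an
# error status (2 in Bash).
unquoted_status=0
test $value = "--" 2>/dev/null || unquoted_status=$?

# Quoted: test receives an empty string as its left operand and simply
# reports "not equal" (status 1) without any error message.
quoted_status=0
test "$value" = "--" || quoted_status=$?

echo "unquoted exit status: $unquoted_status"
echo "quoted exit status: $quoted_status"
```

The difference between the two statuses is the useful part: an error status means `test` could not parse its arguments at all, while status 1 is an ordinary "false" result.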
