$ tesseract eng.supercell-magic.exp0.tif eng.supercell-magic.exp0 box.train
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
row xheight=30, but median xheight = 37.5455
APPLY_BOXES:
   Boxes read from boxfile: 1559
   Found 1559 good blobs.
Generated training data for 34 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile: 1677
   Found 1677 good blobs.
Generated training data for 34 words
Page 3
APPLY_BOXES:
   Boxes read from boxfile: 1362
   Found 1362 good blobs.
Generated training data for 28 words
$ unicharset_extractor eng.supercell-magic.exp0.box
Extracting unicharset from eng.supercell-magic.exp0.box
Wrote unicharset file ./unicharset.

110
NULL 0 NULL 0
N 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N
Y 5 59,68,216,255,91,205,0,47,91,223 Latin 33 0 2 Y
1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9
a 3 58,65,186,198,85,164,0,26,97,185 Latin 56 0 5 a
...
Mine looks more like this:
74
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Broken
t 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # t [74 ]
h 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # h [68 ]
a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # a [61 ]
n 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # n [6e ]
P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # P [50 ]
o 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # o [6f ]
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # e [65 ]
: 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # : [3a ]
r 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # r [72 ]
l 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # l [6c ]
i 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # i [69 ]
1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # 1 [31 ]
N 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # N [4e ]
Why is that? Thanks in advance.
I'm using Ubuntu 16.04 with this tesseract version:
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
I have attached the box and TIFF files, the data file, and the unicharset file.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/cd052525-9eb7-4527-b75b-82e1a687997d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
/bin/bash ../libtool --tag=CXX --mode=link g++ -g -O2 -std=c++11 -o tesseract tesseract-tesseractmain.o libtesseract.la -lrt -lpthread
libtool: link: g++ -g -O2 -std=c++11 -o .libs/tesseract tesseract-tesseractmain.o ./.libs/libtesseract.so -lrt -lpthread
/usr/bin/ld: tesseract-tesseractmain.o: undefined reference to symbol 'lept_free'
//usr/local/lib/liblept.so.5: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:598: recipe for target 'tesseract' failed
make[2]: *** [tesseract] Error 1
make[2]: Leaving directory '/home/david/project/tesseract-3.05.01/api'
Makefile:489: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/david/project/tesseract-3.05.01'
Makefile:398: recipe for target 'all' failed
make: *** [all] Error 2
libtool: link: g++ -g -O2 -std=c++11 -o .libs/tesseract tesseract-tesseractmain.o ./.libs/libtesseract.so -lrt -lpthread
/usr/bin/ld: tesseract-tesseractmain.o: undefined reference to symbol 'lept_free'
/usr/local/lib//liblept.so.5: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:598: recipe for target 'tesseract' failed
make[2]: *** [tesseract] Error 1
make[2]: Leaving directory '/home/david/project/tesseract-3.05.01/api'
Makefile:489: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/david/project/tesseract-3.05.01'
Makefile:398: recipe for target 'all' failed
make: *** [all] Error 2
> Do you know why my tesseract isnt compiling ? I would really love a updated version on my ubuntu.

Not sure. I haven't built the 3.05 branch. For master, I follow the usual autotools method.

Have you also built Leptonica? Make sure you don't have an old Leptonica version already installed.

Use the same build system for both Tesseract and Leptonica: either autotools for both or cmake for both. Use the latest sources for both from GitHub.
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Jun 20, 2017 at 1:20 PM, David Barishev <davi...@gmail.com> wrote:
Thank you so much for your help. I found my error: I need to set the script dir to the langdata folder when running set_unicharset_properties.

Do you know why my tesseract isn't compiling? I would really love an updated version on my Ubuntu.

Thank you again.
On Tuesday, June 20, 2017 at 6:59:58 AM UTC+3, shree wrote:
See https://github.com/tesseract-ocr/tesseract/issues/318 regarding the unicharset format.

I was able to do regular Tesseract training (not LSTM) using Tesseract 4.00.00 from GitHub master and create a new unicharset and traineddata with your box/tiff pair. The output on the same tiff file is enclosed.

I think you will get better results if the training input text has interword spaces.
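
For anyone landing here with the same all-default unicharset, the fix reported above (pointing set_unicharset_properties at the langdata folder) might look roughly like the sketch below. The langdata location, the output file name, and the helper function count_unset_entries are assumptions made for illustration; they do not come from the thread. The helper only counts entries that still show the placeholder metrics and a NULL script, as in the dump in the original question.

```shell
# Sketch only. LANGDATA_DIR, OUT_UNICHARSET and count_unset_entries are
# hypothetical names chosen for this example.
LANGDATA_DIR="${LANGDATA_DIR:-./langdata}"   # assumed checkout of the langdata repository
IN_UNICHARSET="unicharset"                   # file written by unicharset_extractor
OUT_UNICHARSET="unicharset.with_props"       # assumed output name

# Count entries whose metrics are still the placeholder values and whose
# script field is NULL (field 3 and field 4 of each entry line).
count_unset_entries() {
    awk '$3 == "0,255,0,255,0,0,0,0,0,0" && $4 == "NULL" { n++ } END { print n + 0 }' "$1"
}

if command -v set_unicharset_properties >/dev/null 2>&1 && [ -f "$IN_UNICHARSET" ]; then
    echo "unset entries before: $(count_unset_entries "$IN_UNICHARSET")"
    # --script_dir is the option the poster had been missing
    set_unicharset_properties -U "$IN_UNICHARSET" -O "$OUT_UNICHARSET" \
        --script_dir="$LANGDATA_DIR"
    echo "unset entries after:  $(count_unset_entries "$OUT_UNICHARSET")"
else
    echo "set_unicharset_properties or $IN_UNICHARSET not found; nothing done" >&2
fi
```

If the second count is still high, the script directory is probably not the one that holds the per-script unicharset files.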
I got the same error building 3.05.01 and have filed it as an issue - https://github.com/tesseract-ocr/tesseract/issues/1000
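
Following the earlier advice in this thread to make sure no old Leptonica is lying around, a quick check could look like the sketch below. It is not a confirmed fix for the link error. The directory list is an assumption for a 64-bit Ubuntu system, and find_leptonica_libs is a hypothetical helper written for this example.

```shell
# Sketch only: the directories are assumed typical locations, and
# find_leptonica_libs is a made-up helper name.
find_leptonica_libs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # print every liblept file or symlink directly inside this directory
        find "$dir" -maxdepth 1 -name 'liblept*' -print
    done
}

# Copies in both the distro prefix and /usr/local suggest two installs.
find_leptonica_libs /usr/lib /usr/lib/x86_64-linux-gnu /usr/local/lib

# What the dynamic linker cache knows about
ldconfig -p 2>/dev/null | grep -i 'liblept' || echo "no liblept in linker cache"

# Version that pkg-config would hand to the build (module name "lept")
pkg-config --modversion lept 2>/dev/null || echo "pkg-config does not know lept"
```

If libraries show up under more than one prefix, the build may be compiling against one Leptonica and linking against another.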