Could you guide me through training Tesseract on Windows, please?

umcode

Apr 21, 2014, 11:23:37 AM
to tesser...@googlegroups.com
I have read the TrainingTesseract3 wiki page and installed Tesseract 3.02 on my Windows XP machine.
 1. I prepared the images as eng.arial.exp0.tif, eng.arial.exp1.tif, eng.arial.exp2.tif.
 2. In the cmd console:

tesseract eng.arial.exp0.tif eng.arial.exp0 batch.nochop makebox

tesseract eng.arial.exp1.tif eng.arial.exp1 batch.nochop makebox

tesseract eng.arial.exp2.tif eng.arial.exp2 batch.nochop makebox

 The program produces the eng.arial.exp0.box, eng.arial.exp1.box, eng.arial.exp2.box files.

 3. In the cmd console:
tesseract eng.arial.exp0.tif eng.arial.exp0.box nobatch box.train

tesseract eng.arial.exp1.tif eng.arial.exp1.box nobatch box.train

tesseract eng.arial.exp2.tif eng.arial.exp2.box nobatch box.train

 The program produces the eng.arial.exp0.tr, eng.arial.exp1.tr, eng.arial.exp2.tr files.

 4. In the cmd console:
 unicharset_extractor eng.arial.exp0.box eng.arial.exp1.box eng.arial.exp2.box
 The program produces the unicharset file.

 5. In the cmd console:
 shapeclustering -F unicharset eng.arial.exp0.tr eng.arial.exp1.tr eng.arial.exp2.tr
 The program fails with this error:
 Unable to open eng.arial.exp0.tr!
signal-termination-handler: error:signal-termination-handler called:code3000

Why is that? Which step did I get wrong?
 Thank you!

Quan Nguyen

Apr 21, 2014, 8:41:05 PM
to tesser...@googlegroups.com
Because the command is incorrect. It should be:

shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.tr
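
A likely reason for the "Unable to open eng.arial.exp0.tr" message is also visible in step 3 of the question: the output base there was given as `eng.arial.exp0.box`, and since Tesseract appends `.tr` to the output base, the file written would be `eng.arial.exp0.box.tr` rather than `eng.arial.exp0.tr`. The Python sketch below only builds the command lines (it does not run anything) for the box.train, unicharset_extractor and shapeclustering steps, assuming the Tesseract 3.02 syntax from the Training wiki. The helper name `training_commands` is hypothetical.

```python
def training_commands(lang, font, pages):
    """Build (not run) the command lines from box.train through shapeclustering."""
    # One base name per page image, e.g. "eng.arial.exp0".
    bases = ["{}.{}.exp{}".format(lang, font, n) for n in range(pages)]
    cmds = []
    for base in bases:
        # The output base is the name WITHOUT ".box"; ".tr" is appended to it.
        cmds.append(["tesseract", base + ".tif", base, "nobatch", "box.train"])
    # The unicharset is extracted from the box files.
    cmds.append(["unicharset_extractor"] + [b + ".box" for b in bases])
    # -F takes the font_properties file, -U the unicharset file.
    cmds.append(["shapeclustering", "-F", "font_properties", "-U", "unicharset"]
                + [b + ".tr" for b in bases])
    return cmds


for cmd in training_commands("eng", "arial", 3):
    print(" ".join(cmd))
```

Each printed line can be pasted into the cmd console from the folder that holds the .tif and .box files.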

umcode

Apr 22, 2014, 5:26:43 AM
to tesser...@googlegroups.com
Thank you for your answer.
  But this command produces another error: failed to load font_properties from font_properties
 What must I do?
 Thanks again!
On Tuesday, April 22, 2014 at 03:41:05 UTC+3, Quan Nguyen wrote:

Quan Nguyen

Apr 22, 2014, 10:01:38 PM
to tesser...@googlegroups.com
Create and supply a font_properties file.
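
As described in the Training wiki, font_properties is a plain text file with one line per font in the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, where each flag is 0 or 1 and the font name matches the middle part of the training file names (`arial` in `eng.arial.exp0.tr`). A minimal Python sketch that formats such lines follows. The function name `font_properties_line` is hypothetical, and the all-zero flags for Arial assume a regular, proportional, sans-serif face.

```python
def font_properties_line(font, italic=False, bold=False, fixed=False,
                         serif=False, fraktur=False):
    """Format one font_properties entry: name followed by five 0/1 flags."""
    flags = (italic, bold, fixed, serif, fraktur)
    return font + " " + " ".join("1" if flag else "0" for flag in flags)


# Arial as used in this thread: no flags set (assumption stated above).
print(font_properties_line("arial"))
# An italic serif font, as an illustration of non-zero flags.
print(font_properties_line("timesitalic", italic=True, serif=True))
```

Saving the first printed line in a file named font_properties (no extension) next to the .tr files is enough for the single-font case discussed here.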

umcode

Apr 23, 2014, 2:20:20 PM
to tesser...@googlegroups.com
:) Thank you, Quan Nguyen!
 I have run into trouble at almost every step with this program.
 Please help me keep going!
 What about this error?

C:\Program Files\Tesseract-OCR>mftraining -F unicharset -O eng.unicharset eng.arial.exp0.tr eng.arial.exp1.tr
Warning: No shape table file present: shapetable
Reading eng.arial.exp0.tr ...
Reading eng.arial.exp1.tr ...
Font id = -1/0, class id = 1/58 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

 I tried to debug the program on Windows XP with VS2008 Express,
 but building the shapeclustering project on its own produced the messages below:

------ Build started: Project: shapeclustering, Configuration: LIB_Debug Win32 ------
Linking...
shapeclustering.obj : error LNK2019: unresolved external symbol "void __cdecl tesseract::WriteShapeTable(class STRING const &,class tesseract::ShapeTable const &)" (?WriteShapeTable@tesseract@@YAXABVSTRING@@ABVShapeTable@1@@Z) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: void __thiscall tesseract::MasterTrainer::SetupMasterShapes(void)" (?SetupMasterShapes@MasterTrainer@tesseract@@QAEXXZ) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: void __thiscall tesseract::MasterTrainer::DebugCanonical(char const *,char const *)" (?DebugCanonical@MasterTrainer@tesseract@@QAEXPBD0@Z) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: void __thiscall tesseract::MasterTrainer::DisplaySamples(char const *,int,char const *,int)" (?DisplaySamples@MasterTrainer@tesseract@@QAEXPBDH0H@Z) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: __thiscall STRING::~STRING(void)" (??1STRING@@QAE@XZ) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "class tesseract::MasterTrainer * __cdecl tesseract::LoadTrainingData(int,char const * const *,bool,class tesseract::ShapeTable * *,class STRING *)" (?LoadTrainingData@tesseract@@YAPAVMasterTrainer@1@HPBQBD_NPAPAVShapeTable@1@PAVSTRING@@@Z) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: __thiscall STRING::STRING(void)" (??0STRING@@QAE@XZ) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "void __cdecl ParseArguments(int *,char * * *)" (?ParseArguments@@YAXPAHPAPAPAD@Z) referenced in function _main
shapeclustering.obj : error LNK2019: unresolved external symbol "public: char const * __thiscall STRING::string(void)const " (?string@STRING@@QBEPBDXZ) referenced in function "public: char const * __thiscall tesseract::StringParam::string(void)const " (?string@StringParam@tesseract@@QBEPBDXZ)
shapeclustering.obj : error LNK2019: unresolved external symbol "public: int __thiscall STRING::length(void)const " (?length@STRING@@QBEHXZ) referenced in function "public: bool __thiscall tesseract::StringParam::empty(void)" (?empty@StringParam@tesseract@@QAE_NXZ)
shapeclustering.obj : error LNK2019: unresolved external symbol "public: __thiscall tesseract::MasterTrainer::~MasterTrainer(void)" (??1MasterTrainer@tesseract@@QAE@XZ) referenced in function "public: void * __thiscall tesseract::MasterTrainer::`scalar deleting destructor'(unsigned int)" (??_GMasterTrainer@tesseract@@QAEPAXI@Z)
shapeclustering.obj : error LNK2019: unresolved external symbol "public: void __cdecl ERRCODE::error(char const *,enum TessErrorLogCode,char const *,...)const " (?error@ERRCODE@@QBAXPBDW4TessErrorLogCode@@0ZZ) referenced in function "public: virtual void __thiscall GenericVector<int>::remove(int)" (?remove@?$GenericVector@H@@UAEXH@Z)
shapeclustering.obj : error LNK2019: unresolved external symbol "struct tesseract::ParamsVectors * __cdecl GlobalParams(void)" (?GlobalParams@@YAPAUParamsVectors@tesseract@@XZ) referenced in function "void __cdecl `dynamic initializer for 'FLAGS_display_cloud_font''(void)" (??__EFLAGS_display_cloud_font@@YAXXZ)
shapeclustering.obj : error LNK2019: unresolved external symbol "public: class STRING & __thiscall STRING::operator=(char const *)" (??4STRING@@QAEAAV0@PBD@Z) referenced in function "public: __thiscall tesseract::StringParam::StringParam(char const *,char const *,char const *,bool,struct tesseract::ParamsVectors *)" (??0StringParam@tesseract@@QAE@PBD00_NPAUParamsVectors@1@@Z)
..\LIB_Debug\shapeclusteringd.exe : fatal error LNK1120: 14 unresolved externals
Build log was saved at "file://c:\Copy of BuildFolder1\tesseract-3.02\vs2008\shapeclustering\LIB_Debug\BuildLog.htm"
shapeclustering - 15 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========


 Please help me again! Thank you!

On Wednesday, April 23, 2014 at 05:01:38 UTC+3, Quan Nguyen wrote:

Quan Nguyen

Apr 23, 2014, 8:03:23 PM
to tesser...@googlegroups.com
It seems that you did not closely follow the instructions given in the Training wiki.

If possible, use a training tool, such as jTessBoxEditor.
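
For reference, the mftraining call quoted in the question passes the unicharset file to `-F` and omits `-U`, whereas the Training wiki gives `-F font_properties -U unicharset -O lang.unicharset`. That mismatch is a plausible cause of the "Font id = -1" assertion, since no font named arial can be found in the file given to `-F`. The Python sketch below builds (but does not run) the remaining wiki steps. The helper names `finishing_commands` and `rename_pairs` are hypothetical.

```python
def finishing_commands(lang, font, pages):
    """Build (not run) the mftraining, cntraining and combine_tessdata commands."""
    tr_files = ["{}.{}.exp{}.tr".format(lang, font, n) for n in range(pages)]
    return [
        # -F is the font_properties file, -U the input unicharset,
        # -O the output unicharset that goes into the traineddata.
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", lang + ".unicharset"] + tr_files,
        ["cntraining"] + tr_files,
        # Run only after the renames returned by rename_pairs() are done.
        ["combine_tessdata", lang + "."],
    ]


def rename_pairs(lang):
    """Files written by mftraining/cntraining and their lang-prefixed names."""
    names = ("inttemp", "normproto", "pffmtable", "shapetable")
    return [(name, lang + "." + name) for name in names]


for cmd in finishing_commands("eng", "arial", 2):
    print(" ".join(cmd))
for old, new in rename_pairs("eng"):
    print("rename", old, new)
```

The renames have to happen between cntraining and combine_tessdata, because combine_tessdata collects only the files that carry the language prefix.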

umcode

Apr 24, 2014, 2:32:03 AM
to tesser...@googlegroups.com
Thank you again!
 Now I want to compile the Tesseract code on Windows XP with VS2008, in order to find the bug in the source files.
 There are 14 LNK2019 "unresolved external symbol" errors. I searched the internet and learned that this problem comes from missing function definitions, but those functions are not in this project.

 I don't know what to do about this. Would you mind compiling the source code on your computer, please?
 I really need your help!
 Thank you.

On Thursday, April 24, 2014 at 03:03:23 UTC+3, Quan Nguyen wrote:

Nick White

Apr 24, 2014, 10:02:23 AM
to tesser...@googlegroups.com
Did you follow the guide referenced prominently in the wiki? It's:
http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources

That procedure should work fine. Given your sloppiness at closely
following the training instructions, I would advise you to read the
above carefully, and check that you have done every step.

Computers are pedantic beasts, and demand that you bend that way
also ;)

Nick

umcode

Apr 25, 2014, 10:26:10 AM
to tesser...@googlegroups.com
Quan Nguyen,
  thank you very much! That was a real wake-up call for me.

On Thursday, April 24, 2014 at 03:03:23 UTC+3, Quan Nguyen wrote:

umcode

Apr 25, 2014, 10:29:16 AM
to tesser...@googlegroups.com

Nick White,
  thank you. Why couldn't I pay better attention? I want to cry now; it still doesn't work properly.
On Thursday, April 24, 2014 at 17:02:23 UTC+3, Nick White wrote:

Nick White

Apr 28, 2014, 3:01:33 PM
to tesser...@googlegroups.com
On Fri, Apr 25, 2014 at 07:29:16AM -0700, umcode wrote:
> thank you. Why couldn't I pay better attention? I want to cry now; it still
> doesn't work properly.

Just a quick apology in case I hurt your feelings. I probably should
have been nicer, and a winking smiley isn't a good substitute for
communicating decently.

Nick

umcode

May 6, 2014, 9:05:14 AM
to tesser...@googlegroups.com

  Thank you, Nick!
   I used the traineddata provided by Tesseract for Arabic, and I got Tesseract 3.02 working. It really did convert the TIFF picture to text, but the text is not right; it is very bad.
 What could be the cause? Thank you again.
On Monday, April 28, 2014 at 22:01:33 UTC+3, Nick White wrote:

umcode

May 6, 2014, 9:10:07 AM
to tesser...@googlegroups.com
These are my input and output files (attached).

On Tuesday, May 6, 2014 at 16:05:14 UTC+3, umcode wrote:
input.TIF
output.txt

ben

May 15, 2014, 9:15:20 AM
to tesser...@googlegroups.com
 Hello Nick,
  how are you? Would you mind looking at my input and output data in the attachment? The Arabic characters are recognized, but the recognition rate is very low. Why? Thank you.

On Monday, April 28, 2014 at 22:01:33 UTC+3, Nick White wrote: