New application for training BOX files


mazluta

Feb 18, 2015, 10:09:29 AM
to tesser...@googlegroups.com
I wrote a new application for training BOX files.
It is very easy to use and has a lot of accessories.

Anyone can download it from http://hanibaal.co.il/tesseract/mztesseract.zip

It contains one exe file and one ini file.

Just run it.

I hope it will help the community.

mathieuav...@gmail.com

Feb 20, 2015, 10:04:33 PM
to tesser...@googlegroups.com
Thank you,
I am very interested. The training was too technical for me... So until now, to work around Tesseract's poor recognition in a few contexts, I was randomly trying Tesseract training files found on the internet...

The interface presentation is nice, and so is the idea.

Unfortunately, I am not yet able to write the Train Command :s. I believe you are very close to a very user-friendly program.
I will read more about training...

It would be nice if your program could generate a default value depending on the image we give it: that would attract an audience not familiar with training.

Or maybe a short video on YouTube with an example of using it would be very nice.

I cannot help you much in return, but if one day you need help translating it into French, just ask me; it will be a pleasure.

mazluta

Feb 23, 2015, 10:31:10 AM
to tesser...@googlegroups.com
Try clicking the button to the right of the command text and see what happens.

sriranga(79yrsold)

Feb 26, 2015, 12:23:21 PM
to tesser...@googlegroups.com
I could not understand. When I tried to click, nothing happened. Kindly give the step-by-step procedure to follow; screenshots are preferred.
Where should I click at the bottom right of the command text? By command text, do you mean the command prompt or something else?

mazluta

Mar 4, 2015, 3:15:52 PM
to tesser...@googlegroups.com
Hi,

1. Run mzTesseract.
2. Click "Train Image".
3. Select the file to train.
4. Fill in or select the data in the form, e.g. the Tesseract exe path, the Tesseract lang path...
5. Select the image to train.
6. To the right of the "train command" memo you will see a button with "..." on it.
7. Click this button. The memo will fill itself with the command to run.
8. You can change the command if you like, or even write a new one (any valid Tesseract command).
9. Click "start train".
10. After the training completes (the DOS window closes without an error), close the train form.
11. Open the image you trained by clicking "open image". You will now see the result of the training.
12. Click on any record or select any box.
13. Click the HELP button to see how to fix the box size.
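
The thread never shows the command that the "..." button fills in, so the following is only a sketch of what such a command usually looks like. It assumes the box-generation call described in the Tesseract 3.0x training documentation (`tesseract <image> <outputbase> batch.nochop makebox`). The function name `makebox_command` and the example file name are hypothetical, not taken from mzTesseract.

```python
from pathlib import Path


def makebox_command(tesseract_exe, image_path, lang=None):
    """Build a Tesseract 3.0x command line that writes a .box file.

    Assumption: this mirrors the documented makebox call, not necessarily
    the exact command that mzTesseract generates.
    """
    image = Path(image_path)
    # Tesseract appends ".box" to the output base on its own,
    # so only the last extension is dropped here.
    output_base = image.with_suffix("")
    cmd = [str(tesseract_exe), str(image), str(output_base)]
    if lang:
        # Optional: start from an existing language for better first guesses.
        cmd += ["-l", lang]
    cmd += ["batch.nochop", "makebox"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which would correspond to step 9 ("start train") above.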

mazluta

Mar 18, 2015, 12:58:53 PM
to tesser...@googlegroups.com
I added some new features.

1. Load bmp, png, jpeg, jpg, tif or tiff files.
2. Convert between those types.
3. Split a TIF file into as many files as it has pages.
4. Deskew.
5. Adjust one box around the char.
6. Adjust all boxes to the char dimensions.
7. Train an image to create a new box file.
8. Create tr files from an img/box file pair.
9. Use arrow+Shift+Ctrl to fix box dimensions.
10. Sort box file data by any of the columns.
More to come...
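
As an illustration of feature 10, here is a minimal sketch of sorting box file data by a column. It assumes the Tesseract 3.0x box file layout, where each line is `symbol left bottom right top page`. The names `Box`, `parse_box_lines`, `sort_boxes` and `format_box` are hypothetical and say nothing about how mzTesseract implements this.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(lines):
    """Parse box file lines, skipping blank or malformed ones."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # not a "symbol left bottom right top page" line
        symbol, *numbers = parts
        boxes.append(Box(symbol, *map(int, numbers)))
    return boxes


def sort_boxes(boxes, column, reverse=False):
    """Return the boxes sorted by one of the Box field names."""
    return sorted(boxes, key=lambda box: getattr(box, column), reverse=reverse)


def format_box(box):
    """Turn a Box back into a box file line."""
    return " ".join(str(field) for field in box)
```

For example, `sort_boxes(parse_box_lines(open("heb.arial.exp0.box", encoding="utf-8")), "left")` would order the boxes from left to right; the file name here is only a placeholder.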



mazluta

Mar 22, 2015, 6:26:30 AM
to tesser...@googlegroups.com
Now the app can help create a new traineddata language from start (creating the box file) to end (creating lang.traineddata).
All in one exe.

How can I add it to the list of add-on applications?

Yossi
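
For readers who want to see what "from box file to lang.traineddata" involves, here is a rough sketch of the command sequence described in the Tesseract 3.02 training documentation. It is not a description of what mzTesseract runs internally. The function name `training_plan`, the `.tif` extension, and the presence of a hand-written `font_properties` file in the working directory are all assumptions.

```python
def training_plan(lang, bases, image_ext="tif"):
    """Return ordered steps for a Tesseract 3.02-style training run.

    `bases` are file names without extension, e.g. ["heb.arial.exp0"].
    Each step is ("run", argv_list) or ("rename", (old_name, new_name)).
    Assumption: a font_properties file already exists next to the images.
    """
    tr_files = [f"{base}.tr" for base in bases]
    box_files = [f"{base}.box" for base in bases]

    steps = []
    # 1. Create a .tr feature file from every image/box pair.
    for base in bases:
        steps.append(("run", ["tesseract", f"{base}.{image_ext}", base, "box.train"]))
    # 2. Collect the character set from the box files.
    steps.append(("run", ["unicharset_extractor", *box_files]))
    # 3. Cluster shapes and prototypes from the .tr files.
    common = ["-F", "font_properties", "-U", "unicharset"]
    steps.append(("run", ["shapeclustering", *common, *tr_files]))
    steps.append(("run", ["mftraining", *common, "-O", f"{lang}.unicharset", *tr_files]))
    steps.append(("run", ["cntraining", *tr_files]))
    # 4. Prefix the generated files with the language code.
    for name in ("inttemp", "normproto", "pffmtable", "shapetable"):
        steps.append(("rename", (name, f"{lang}.{name}")))
    # 5. Pack everything into <lang>.traineddata.
    steps.append(("run", ["combine_tessdata", f"{lang}."]))
    return steps
```

A caller could walk the list and pass each "run" step to `subprocess.run(argv, check=True)` and each "rename" step to `os.replace(old_name, new_name)`. Keeping the plan as plain data makes it easy to log or review the commands before anything is executed.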



Sriranga(80yrs)

Mar 22, 2015, 9:44:38 AM
to tesser...@googlegroups.com
"Now the app can help create a new traineddata language from start (creating the box file) to end (creating lang.traineddata).
All in one exe."
Do you mean that a single exe file can generate traineddata for any language, starting from the box file stage? If so, where can I download your exe file for evaluation purposes?


mazluta

Mar 22, 2015, 1:34:39 PM
to tesser...@googlegroups.com

Barthelemy

Mar 25, 2015, 5:34:17 PM
to tesser...@googlegroups.com

 

Sorry for the late answer: your answer worked for the box.
I have tested your latest version; it is clearly better and very interesting.

I have Tesseract 3.01 and 3.02 on my computer.

I have already succeeded in making a traineddata file, but I had to switch between the two versions:
for the box file I only get good results with 3.01; with 3.02 I only get an empty file.

With 3.02 I get:

Tesseract Open Source OCR Engine v3.02 with Leptonica
Empty page!!
Empty page!!

and tst.mise.exp1790.box is an empty file...

What does RTL mode mean?
When we create a box file, is the language useful?

When we train a file, why not open it directly when it is done?

The ability to save the image file is useful: when I use Photoshop to make the picture, the file can be read by your software but not by Tesseract. After saving it from within your software, I have a correct version of my TIF for Tesseract.

For other files I need to use Tesseract 3.02; 3.01 does not work for all of them.

Maybe it would be interesting to have the correct version of the Tesseract training files for your software included directly with it, or a precise URL for the right version to install.

What is StdErr mode? In the DOS output result, it would be interesting to see the command line that was used.

At the moment I have a traineddata file, but when I use it, Tesseract never finishes the recognition. I have to press Ctrl+C.

It would be nice if the last folder used were remembered (the settings option that you added is a nice thing).

Thanks for the new version: it is a very nice evolution :)

Barthelemy

PS: I have seen some other problems, but I will send them in new messages when I see them again, so I can give you the context.




mazluta

Mar 26, 2015, 4:01:00 AM
to tesser...@googlegroups.com
Hi Barthelemy,

1. Thanks for checking.
2. I have Tesseract 3.02 and I developed this add-on for that version.
3. The "Empty page" message comes from Tesseract when training images. It would be better to create PNG files and work on them (instead of TIF).
4. RTL: for now it does nothing.
5. StdErr: it is another way to run Tesseract; see the documentation.
6. To see all the DOS commands and results, look for MzTesseract.log in the main mz... directory; it's all in there.
7. The mz... app just helps you understand the flow of creating the traineddata; you have to work closely with the Tesseract documentation.
This is just the first version. I have only tested some pages. I'm going to create a big Hebrew training data set, and I believe I will make more changes.

See the new "fixchar" button near the char view at the bottom left. Nice job :)
After you confirm, it changes the graphic image, so you can clean the image or correct the char: create a "big box" around the char, click fix, clean the char box area,
and then confirm the changes. Now you will have a better "area" for Tesseract to work on.

Yossi



Milan Lilić

Dec 14, 2015, 11:06:43 AM
to tesseract-ocr
Hello,

Your download link is not working.
Please give me a new link to download mztesseract.

Thanks
Milan

Teerion

Jan 25, 2016, 4:38:46 AM
to tesseract-ocr
Your download link isn't working. Can you please post a fresh link? Thank you!