I tried training Tesseract for handwritten mathematical expression recognition, but the trained model has a 100% error rate


Haris Sheikh

Dec 18, 2019, 11:39:55 PM
to tesseract-ocr
Hi, I'm using Linux (Ubuntu).
I tried Tesseract training by following https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 and used a dataset like this:
- '=' folder: 26,000 .jpg images of '=' written in different styles
- '+' folder: 30,000 .jpg images of '+' written in different styles
- and so on

I took all the images from each folder, put them into the ground-truth folder, converted them to .tif format, and created a label for each one in .gt.txt format.
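A minimal sketch of the labeling step described above, assuming each source folder is named after the single symbol it contains (Linux allows '=' and '+' in folder names). The function name `write_gt_labels` and the `sym000_` naming scheme are hypothetical; converting each .jpg to a .tif under the returned stem is left to whatever tool was already used. The point it illustrates is that every `<name>.tif` in the ground-truth folder needs a `<name>.gt.txt` with exactly the same stem.

```python
from pathlib import Path


def write_gt_labels(src_root: Path, gt_dir: Path) -> list[tuple[Path, str]]:
    """Write one <stem>.gt.txt file per source image.

    Hypothetical layout: every subfolder of src_root is named after the
    symbol it contains, e.g. src_root/"="/img001.jpg. Returns
    (source_image, target_stem) pairs so the .tif conversion step can save
    each image as gt_dir/<target_stem>.tif next to its .gt.txt file.
    """
    gt_dir.mkdir(parents=True, exist_ok=True)
    pairs: list[tuple[Path, str]] = []
    # Sort so the numbering is reproducible across runs.
    folders = sorted(p for p in src_root.iterdir() if p.is_dir())
    for idx, folder in enumerate(folders):
        label = folder.name  # the symbol itself, e.g. "=" or "+"
        for img in sorted(folder.glob("*.jpg")):
            # Prefix with the folder index so stems from different folders
            # never collide and contain no shell-unfriendly characters.
            stem = f"sym{idx:03d}_{img.stem}"
            # The transcription file holds just the label on one line.
            (gt_dir / f"{stem}.gt.txt").write_text(label + "\n", encoding="utf-8")
            pairs.append((img, stem))
    return pairs
```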
Then I ran the command "make training".
It worked fine and took 5-6 hours to train on the dataset. After that, I copied the data/foo.traineddata file into the /usr/local/share/tessdata/ directory and
ran "tesseract --list-langs", which showed that my file is there.

Here is the issue:

When I use a sample image with "x+y=0" written on it and run tesseract with my language, the output is "xxxx". Why?
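To check these steps in a repeatable way, here is a small sketch that confirms the new language is installed and then runs it on one image with an explicit page segmentation mode. `--list-langs`, `-l`, `--psm` and the `stdout` output name are standard tesseract CLI options; the helper names are hypothetical, and the parser assumes the header line starts with "List of available languages" (older versions print it to stderr, so capture both streams if needed). Passing `--psm 7` (single text line) or `--psm 10` (single character) makes it easy to compare whole-expression and one-character-at-a-time results.

```python
import shutil
import subprocess
from typing import List, Optional


def installed_langs(list_langs_output: str) -> List[str]:
    """Parse the text printed by `tesseract --list-langs`.

    Assumes a header like "List of available languages (3):" followed by
    one language name per line.
    """
    lines = [ln.strip() for ln in list_langs_output.splitlines() if ln.strip()]
    return [ln for ln in lines if not ln.startswith("List of available languages")]


def build_cmd(image: str, lang: str, psm: Optional[int] = None) -> List[str]:
    """Build a tesseract command that prints the recognized text to stdout."""
    cmd = ["tesseract", image, "stdout", "-l", lang]
    if psm is not None:
        # 7 = treat the image as a single text line,
        # 10 = treat the image as a single character.
        cmd += ["--psm", str(psm)]
    return cmd


def recognize(image: str, lang: str, psm: Optional[int] = None) -> str:
    """Run tesseract (must be on PATH) and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(build_cmd(image, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `recognize("expr.png", "foo", psm=7)` would run the trained `foo` model on a hypothetical `expr.png` treated as one text line.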

Please tell me where I went wrong!

Timothy Snyder

Dec 18, 2019, 11:45:39 PM
to tesser...@googlegroups.com
Could you provide sample images from the training and testing sets? I haven't tried training Tesseract on single characters, but you might want to try training on whole expressions like x+y=0.
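One way to act on this suggestion without collecting new data is to paste existing single-symbol images side by side, producing synthetic expression lines together with their transcriptions. As far as I know, the tesstrain ground truth is meant to be text-line images, so multi-symbol lines are closer to what it expects than isolated characters. This is a hypothetical, stdlib-only sketch on plain 2D pixel lists (255 = white); loading and saving real images is left out, and whether synthetic lines help enough would have to be measured.

```python
Bitmap = list[list[int]]  # rows of grayscale pixels, 255 = white background


def pad_to_height(img: Bitmap, height: int, bg: int = 255) -> Bitmap:
    """Add background rows above and below so the symbol is vertically centred."""
    width = len(img[0])
    extra = height - len(img)
    top = extra // 2
    bottom = extra - top
    return ([[bg] * width for _ in range(top)]
            + [row[:] for row in img]
            + [[bg] * width for _ in range(bottom)])


def compose_line(symbols: list[tuple[str, Bitmap]], gap: int = 2,
                 bg: int = 255) -> tuple[Bitmap, str]:
    """Concatenate symbol images left to right; return (line image, transcription)."""
    height = max(len(img) for _, img in symbols)
    padded = [pad_to_height(img, height, bg) for _, img in symbols]
    line: Bitmap = [[] for _ in range(height)]
    for i, img in enumerate(padded):
        for y in range(height):
            if i > 0:
                line[y].extend([bg] * gap)  # blank columns between symbols
            line[y].extend(img[y])
    text = "".join(label for label, _ in symbols)  # e.g. "x+y=0"
    return line, text
```

The returned text is what would go into the line's .gt.txt file.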

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9c24849b-69a5-4f6d-928f-da17420adfa3%40googlegroups.com.

Timothy Snyder

Dec 18, 2019, 11:46:51 PM
to tesser...@googlegroups.com
Also, what sort of results are you getting if you recognize one character at a time instead of an entire expression?

Haris Sheikh

Dec 19, 2019, 1:06:40 AM
to tesseract-ocr
It gives me this:

Even if I use an image from the dataset, such as a '+', the output is "x", which is wrong!
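To see whether every symbol collapses to "x" or only some do, the one-character-at-a-time results can be tallied per true label. A minimal sketch, assuming you already have (true label, tesseract output) pairs, e.g. from running each dataset image with `--psm 10`; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def confusion(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """For each true symbol, count how often each output string was produced."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for truth, predicted in pairs:
        # Strip the trailing newline/whitespace tesseract usually prints.
        table[truth][predicted.strip()] += 1
    return dict(table)


def per_label_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of images of each symbol whose output matched it exactly."""
    return {truth: counts[truth] / sum(counts.values())
            for truth, counts in confusion(pairs).items()}
```

If one output such as "x" dominates every row, that suggests the model emits the same class regardless of input, which would point at the training setup rather than the test image.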


Haris Sheikh

Dec 19, 2019, 1:12:45 AM
to tesseract-ocr

Also, this happens when I give it "x+y=0": it treats all of it as "x". Why?
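The "100% error rate" in the title can be made precise with the usual character error rate (CER): the edit distance between the expected and recognized text divided by the length of the expected text. A minimal stdlib sketch; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("x+y=0", "xxxx"))  # 0.8 (3 substitutions + 1 deletion over 5 characters)
```

Note that CER can exceed 1.0 when the output contains many extra characters.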