What do I need to do to fine-tune for only numbers and a specific font?


Yasin Nazlıcan

Aug 25, 2018, 4:47:31 AM
to tesseract-ocr
Hello everyone. I trained Tesseract 3.0 with jessbox tools three years ago, and now I see the game has changed with Tesseract 4.0; I have no idea how to train it. What I need is to recognize only numbers in a specific font. I read the documentation and saw that training from scratch would require more time and research, so I guess I should fine-tune the existing trained data, but I have no idea how to do that. I tried to follow the training documentation but had to stop at the first step. I don't even know whether I can train on a Mac. Is there someone who can point me in the right direction? Thank you very much.

Soumik Ranjan Dasgupta

Aug 25, 2018, 4:49:48 AM
to tesser...@googlegroups.com
You could try changing the training text so that it consists only of numbers. Tesseract v4 has an option for training with a custom font list; please refer to the wiki.
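
A digits-only training text, as suggested above, could be produced with a few lines of Python. This is only a minimal sketch: the function name `make_digit_lines`, the line and group sizes, and the idea of saving the result as `digits.training_text` are illustrative assumptions, not anything prescribed by the Tesseract documentation.

```python
import random


def make_digit_lines(n_lines=200, groups_per_line=8, max_group_len=6, seed=0):
    """Return lines made only of space-separated groups of digits.

    All sizes are arbitrary illustrative defaults; tune them to resemble
    the numbers you actually need to recognize.
    """
    rng = random.Random(seed)  # fixed seed so the text is reproducible
    lines = []
    for _ in range(n_lines):
        groups = []
        for _ in range(groups_per_line):
            length = rng.randint(1, max_group_len)
            groups.append("".join(rng.choice("0123456789") for _ in range(length)))
        lines.append(" ".join(groups))
    return lines


if __name__ == "__main__":
    # Show a small sample; write the full list to a file of your choice
    # (for example a hypothetical digits.training_text) when you are happy with it.
    for line in make_digit_lines(n_lines=3):
        print(line)
```

If your real data contains separators such as decimal points or dashes, they would have to be added to the character pool as well.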


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ac15383e-2499-4b35-a0fb-353732ba9c54%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yasin Nazlıcan

Aug 25, 2018, 9:30:03 AM
to tesseract-ocr
Hey Soumik Ranjan,

Thank you for the reply, mate. Like I said, I tried to follow the documentation, but I couldn't get any further: I couldn't find any information about macOS and had to stop. I assume I should create boxes for the font and text and then fine-tune. Do you have any links for macOS that I can follow? Also, if you don't mind, could you explain the process in a bit more detail?

Soumik Ranjan Dasgupta

Aug 28, 2018, 2:08:46 PM
to tesser...@googlegroups.com
Hey Yasin,
Sorry to reply so late. As far as I know, Tesseract doesn't work on macOS yet. Maybe you can install a Linux environment inside a VM and make do with that?
No, you don't have to create box files manually; tesstrain.sh will do that for you. In fact, it takes care of the entire training procedure.
If you want to fine-tune, you have to specify the modified architecture, written as a VGSL specification, as a CLI parameter.
To train Tesseract on a custom font list, you have to install the fonts and then list their names in two separate files: the font_properties file and the language-specific.sh file. Note that in both files the fonts must be listed in a particular format.
The traineddata for Tesseract 3 is not compatible with version 4, so it's better to train from scratch.
Do get back to me if you have any more queries. 
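
For reference, a tesstrain.sh invocation for a single custom font might be assembled as in the sketch below. Treat it as an assumption-laden illustration: the flag names are the ones I recall from the Tesseract 4 training wiki, so check them against the tesstrain.sh in your own checkout, and every path as well as the font name "My Digits Font" is a hypothetical placeholder. The Python code only builds and prints the command; it does not run anything.

```python
import shlex


def build_tesstrain_command(fonts, training_text, fonts_dir, langdata_dir,
                            tessdata_dir, output_dir, lang="eng",
                            script="tesstrain.sh"):
    """Assemble (but do not execute) a tesstrain.sh command line.

    The flags below are believed to match Tesseract 4's tesstrain.sh;
    verify them against your own copy of the script before use.
    """
    if not fonts:
        raise ValueError("at least one font name is required")
    return [
        script,
        "--lang", lang,
        "--linedata_only",               # produce line data for LSTM training
        "--noextract_font_properties",
        "--fonts_dir", fonts_dir,        # where the custom font is installed
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--training_text", training_text,
        "--output_dir", output_dir,
        "--fontlist", *fonts,            # one argument per font name
    ]


def format_command(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Every value here is a placeholder, not a real path or font.
    print(format_command(build_tesstrain_command(
        fonts=["My Digits Font"],
        training_text="digits.training_text",
        fonts_dir="/path/to/fonts",
        langdata_dir="/path/to/langdata",
        tessdata_dir="/path/to/tessdata",
        output_dir="/path/to/output",
    )))
```

Printing the command first makes it easy to inspect the quoting of font names that contain spaces before running it by hand.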

--
Regards,
Soumik Ranjan Dasgupta

Yasin Nazlıcan

Sep 2, 2018, 1:05:15 PM
to tesseract-ocr
Hello Soumik,

Thank you for replying. I found out that we can train Tesseract on macOS, but I couldn't make it work: when I run "make training", it gives me the error "Need to reconfigure project, so there are no errors". Also, I couldn't create ScrollView.jar. Do you know how I can track down the errors?

Soumik Ranjan Dasgupta

Sep 16, 2018, 6:19:54 PM
to tesser...@googlegroups.com
Hey Yasin,
It seems the error you are facing has not yet been resolved for Tesseract v4. Please follow https://github.com/tesseract-ocr/tesseract/issues/1453 for further details.

