TextRecognitionDataGenerator generates more realistic images than text2image

Des Bw

Nov 8, 2023, 12:51:51 AM
to tesseract-ocr
text2image is a great tool shipped with Tesseract. It generates synthetic training data by rendering text files into images, and it has a few control parameters to make the generated images resemble scanned pages.

But I have lately learned that the images generated by text2image are nowhere near as realistic as the ones generated by https://github.com/Belval/TextRecognitionDataGenerator. The latter tool has more powerful controls for producing the exact type of image you want.


- Has anyone found a way of making Tesseract work with other text image generation tools such as TextRecognitionDataGenerator?
- If so, what was your experience?
- And for the developers: is there any way to replace text2image with TextRecognitionDataGenerator?

Des Bw

Nov 8, 2023, 12:54:39 AM
to tesseract-ocr
Interestingly, TextRecognitionDataGenerator can also generate Tesseract-compatible box files. But I have found no easy way to produce training files (.lstmf, .tif and the like) from the images and box files it generates. I am pretty sure more experienced users already know how to do that.