How does Tesseract process multiple images listed in imagelist.txt?


mit

Apr 17, 2020, 6:42:25 AM
to tesseract-ocr
Hi,

I want to know how Tesseract manages memory internally when multiple images are listed in a text file for OCR.
Does it load all the images into RAM and then process them, or does it load and process them one by one?

Thanks

adamuk73

Apr 17, 2020, 8:57:38 AM
to tesseract-ocr
I ran a test with Tesseract 4 on CentOS 8 on about 260 TIFF images (around 67 MB in total). The process consistently used just over 100 MB of RAM, though it took a long time to finish. The machine was a VM with 2 GB of RAM, using 2 cores of an i7-6600U.
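
A test like the one described above could be reproduced roughly as follows. This is only a sketch under assumptions, not the script that was actually used: the helper names (`write_image_list`, `tesseract_command`, `run_and_measure`) are made up for illustration, it relies on the `tesseract` CLI accepting a text file with one image path per line as its input argument, and it reads peak memory through Python's Unix-only `resource` module.

```python
import resource
import subprocess
from pathlib import Path


def write_image_list(image_paths, list_path):
    """Write one image path per line (the list-file format passed to tesseract)."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{p}\n" for p in image_paths), encoding="utf-8")
    return list_path


def tesseract_command(list_path, output_base):
    """Build the CLI call: tesseract <list file> <output base name>."""
    return ["tesseract", str(list_path), str(output_base)]


def run_and_measure(command):
    """Run a command and return (exit code, peak RSS of waited-for children).

    ru_maxrss is reported in kilobytes on Linux (bytes on macOS). It is the
    maximum over all children this Python process has waited for so far,
    so run one measurement per Python process.
    """
    completed = subprocess.run(command, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak
```

With a hypothetical `scans/` directory of TIFF files, usage would be along the lines of `write_image_list(sorted(Path("scans").glob("*.tif")), "imagelist.txt")` followed by `run_and_measure(tesseract_command("imagelist.txt", "output"))`. If the peak stays roughly flat as more images are added to the list, that suggests the images are not all held in memory at once, but it is an observation from outside the process rather than a statement about the implementation.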

mit

Apr 17, 2020, 9:15:28 AM
to tesseract-ocr
Hi,

Thanks for answering, but my question was a bit different.
I want to know how Tesseract processes the image paths when we place them in a text file.
Does it load and process one image at a time, or does it load all the images into memory and then process them one by one?
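
To make the two alternatives in the question concrete, here is a toy sketch of both strategies. It says nothing about what Tesseract does internally: `load_image` and `process` are hypothetical stand-ins (a byte buffer instead of a decoded image, a length check instead of OCR), and `peak_bytes` uses Python's `tracemalloc` only to show how the peak allocation differs between the two loops.

```python
import tracemalloc


def load_image(path, size):
    """Stand-in for decoding an image: allocate a buffer of `size` bytes."""
    return bytearray(size)


def process(image):
    """Stand-in for OCR: return the number of bytes seen."""
    return len(image)


def ocr_all_at_once(paths, size):
    """Load every image first, then process: all buffers are alive together."""
    images = [load_image(p, size) for p in paths]
    return [process(img) for img in images]


def ocr_one_by_one(paths, size):
    """Load, process, and drop each image in turn: only a couple are alive."""
    results = []
    for p in paths:
        image = load_image(p, size)
        results.append(process(image))
    return results


def peak_bytes(func, *args):
    """Return the peak traced allocation (in bytes) while running func."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak
```

With the first strategy the peak grows with the number of listed images; with the second it stays close to the size of one or two images regardless of how long the list is.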

Thanks

adamuk73

Apr 17, 2020, 9:22:41 AM
to tesseract-ocr
Running the same test with double the DPI doubles the memory usage.

mit

Apr 17, 2020, 9:28:06 AM
to tesseract-ocr
I am not sure you understood my question.

Zdenko Podobny

Apr 17, 2020, 10:14:29 AM
to tesser...@googlegroups.com

On Fri, Apr 17, 2020 at 12:42, mit <kollol...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1ae57aec-c161-41bd-ac5c-aa7464ec17e7%40googlegroups.com.

mit

Apr 17, 2020, 11:59:31 AM
to tesseract-ocr
Thanks a lot for clarifying.

