Tesseract 3.04 Speed poll


viraf

Feb 17, 2016, 5:50:20 PM
to tesseract-ocr

I am exploring Tesseract 3.04 (using Tess4J 3.0) and wanted to poll the community on the performance (speed) they are observing for fax documents.

I am processing TIFF (CCITT T.6) images, 2509 x 3530 at 300 dpi (1-bit, i.e. black and white), that are in English (a mixture of forms, letters and diagrams). On an Intel i7-4800MQ @ 2.7 GHz I am observing approximately 6 pages per minute (PPM) using a single thread and no GPU. I observed similar results on both Windows and Linux.

Is this representative of the performance you are observing? What did you do to improve it?


Thanks


- viraf

Helmut Wollmersdorfer

Feb 19, 2016, 2:53:48 AM
to tesseract-ocr
That's a reasonable speed. On an i5 @ 1500 it's around 1 page per minute. AFAIK the only way to go faster is faster hardware or more of it.
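A minimal Python sketch of the "more hardware" route, assuming pages are independent so several Tesseract processes can run side by side, one per core. It assumes the `tesseract` executable is on PATH; `ocr_page_with_cli`, `ocr_in_parallel` and the worker count are hypothetical choices, not part of Tesseract or Tess4J.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def ocr_page_with_cli(image: Path) -> int:
    """Run one Tesseract CLI process on one page image.

    Assumes `tesseract` is on PATH. Output goes to <stem>.txt next to the
    image, because the CLI appends ".txt" to the output base name.
    """
    out_base = image.with_suffix("")
    completed = subprocess.run(
        ["tesseract", str(image), str(out_base)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def ocr_in_parallel(
    images: Iterable[Path],
    workers: int,
    run_page: Callable[[Path], int] = ocr_page_with_cli,
) -> List[int]:
    """Process pages concurrently; return one exit code per page, in input order.

    Threads suffice here: each thread only waits on an external process,
    so the OCR work itself is spread across CPU cores by the OS.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_page, images))
```

Usage might look like `ocr_in_parallel(sorted(Path("pages").glob("*.tif")), workers=4)`, where the `pages` directory is hypothetical. Running separate processes rather than threads inside one engine avoids any assumption about Tesseract's own thread safety.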

Tom Morris

Mar 14, 2016, 4:14:32 PM
to tesseract-ocr


I recently benchmarked Tesseract 3.04.01 on the UNLV 300 dpi bitonal document sets, running it from the command line with a new invocation for every page. The times below are for a single thread on a 2.6 GHz Core i7 MacBook Pro:

bus.3B - business letters, 200 files, 367 seconds
doe3.3B - tech docs, reports & other DOE images, 785 files, 1650 seconds
news.3B - newspaper articles, 200 files, 618 seconds
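Dividing each total by its file count shows where the per-page figures come from. This quick Python check only reworks the numbers listed above (nothing here is newly measured): about 1.84, 2.10 and 3.09 seconds per page, i.e. roughly 33, 29 and 19 pages per minute.

```python
from typing import List, Tuple

# (document set, number of files, total single-thread seconds), copied from the list above.
UNLV_RUNS: List[Tuple[str, int, float]] = [
    ("bus.3B", 200, 367),
    ("doe3.3B", 785, 1650),
    ("news.3B", 200, 618),
]


def per_page(files: int, total_seconds: float) -> Tuple[float, float]:
    """Return (average seconds per page, pages per minute) for one run."""
    seconds_per_page = total_seconds / files
    return seconds_per_page, 60.0 / seconds_per_page


for name, files, total in UNLV_RUNS:
    spp, ppm = per_page(files, total)
    print(f"{name:8} {spp:.2f} s/page, {ppm:.1f} pages/min")
```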

In summary, that is about 1.8-3.1 seconds per page. Performance is highly dependent on content: dense text, noisy scans, small fonts, etc., will all increase processing time.
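For anyone who wants to take the same kind of measurement on their own pages, here is a minimal Python sketch of this methodology: one fresh `tesseract` command-line invocation per page, run sequentially on a single thread. It assumes the `tesseract` executable is on PATH; `run_tesseract_once` and `benchmark` are hypothetical names, and the `run_page` hook exists only so the timing logic can be exercised without Tesseract installed.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_tesseract_once(image: Path) -> None:
    """Start a fresh Tesseract CLI process for a single page (assumes `tesseract` is on PATH)."""
    subprocess.run(
        ["tesseract", str(image), str(image.with_suffix(""))],
        check=True,  # raise if Tesseract reports an error for this page
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def benchmark(
    images: Iterable[Path],
    run_page: Callable[[Path], None] = run_tesseract_once,
) -> Dict[str, float]:
    """Time the pages one after another (single thread) and summarize throughput."""
    pages = 0
    start = time.perf_counter()
    for image in images:
        run_page(image)
        pages += 1
    elapsed = time.perf_counter() - start
    if pages == 0:
        raise ValueError("no images to benchmark")
    return {
        "pages": pages,
        "seconds": elapsed,
        "seconds_per_page": elapsed / pages,
        "pages_per_minute": 60.0 * pages / elapsed,
    }
```

Usage might look like `benchmark(sorted(Path("bus.3B").glob("*.tif")))`, where the directory layout and file extension are hypothetical. Because each page starts a new process, the timings include Tesseract's startup cost, just as a per-page command-line run would.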

The benchmark files are all available for download if anyone wants to do their own testing:
and there are some basic directions here:

Tom