Well, no statistics from me, but I have used "tesseract" (one of the top 3
engines in the 1995 UNLV Accuracy test): http://code.google.com/p/tesseract-ocr/
I receive about 500 faxes a day and wanted to process them. My problem
was that the files needed to be at a certain resolution for OCR to work
correctly. Here is the discussion:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/24bb53ae551eb38c/c25481d22fb10fe3?lnk=gst&q=szybalski#c25481d22fb10fe3
From my initial research I thought they were able to achieve 99%+ accuracy.
I didn't have time to look into how to pre-process fax images to
200x200, but if you will be looking into this please let me know, as I
would like to get the processing rolling.
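
For what it's worth, the scaling arithmetic for that pre-processing step is
simple. Below is a minimal sketch, assuming that "200x200" means dots per
inch and that the incoming pages are standard-mode faxes at roughly 204x98
dpi (both are assumptions on my part; check what your fax server actually
writes). target_size is a hypothetical helper name, and the resampling itself
would be done by whatever imaging tool you use.

```python
def target_size(width_px, height_px, src_dpi, dst_dpi=(200, 200)):
    """Return the pixel size an image needs at dst_dpi to keep its physical size."""
    src_x, src_y = src_dpi
    dst_x, dst_y = dst_dpi
    # The physical size in inches stays the same; only the pixel density changes.
    new_width = round(width_px * dst_x / src_x)
    new_height = round(height_px * dst_y / src_y)
    return new_width, new_height


if __name__ == "__main__":
    # Assumed example: a standard-mode fax page, 1728 x 1078 pixels at 204 x 98 dpi.
    print(target_size(1728, 1078, (204, 98)))
```

Note that the vertical axis is stretched far more than the horizontal one,
because the assumed source pixels are not square.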
Thanks,
Lucas
Just out of curiosity, what type of images are these? Are these fax
images received by a fax server (such as HylaFAX), or are these actual
paper faxes that are somehow scanned into images?
Lucas
Questions to answer here are:
What is the cost per 1000 faxes?
How many faxes can they receive at a time? (Just 1 line, and everybody else
waits in line?)
What file format will they email you? (The default, I think, is a
non-searchable PDF; a TIFF file might be an option.)
How will you transfer files from the email account to some kind of
database/management system? (I guess you could point it at some mailbox
and write a little program to extract the attachment and add it to the
database.)
Will they provide the incoming fax number, date, and time? Is that in the
email and available to you, or is it only in the fax image?
Are they using the Class 1 faxing standard that most fax machines
understand, or higher?
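
The "little program" for pulling attachments out of the mailbox could look
roughly like the sketch below. It only covers the part after a raw message
has been fetched; how you fetch it (POP, IMAP, a local spool) is left out.
The function name, the "faxes" table and its columns are all hypothetical,
and SQLite is just a stand-in for whatever database you end up using.

```python
import sqlite3
from email import message_from_bytes, policy
from email.message import EmailMessage


def store_fax_attachments(raw_email: bytes, db: sqlite3.Connection) -> int:
    """Save every attachment of one email into the faxes table; return the count."""
    # Hypothetical schema: one row per attachment.
    db.execute(
        "CREATE TABLE IF NOT EXISTS faxes ("
        "id INTEGER PRIMARY KEY, sender TEXT, received TEXT, "
        "filename TEXT, data BLOB)"
    )
    msg = message_from_bytes(raw_email, policy=policy.default)
    saved = 0
    for part in msg.iter_attachments():
        db.execute(
            "INSERT INTO faxes (sender, received, filename, data) "
            "VALUES (?, ?, ?, ?)",
            (
                str(msg.get("From", "")),
                str(msg.get("Date", "")),
                part.get_filename(),
                part.get_payload(decode=True),  # raw bytes of the PDF/TIFF
            ),
        )
        saved += 1
    db.commit()
    return saved


if __name__ == "__main__":
    # Build a fake message the way a fax-to-email service might send it.
    demo = EmailMessage()
    demo["From"] = "fax-service@example.com"
    demo.set_content("Fax attached.")
    demo.add_attachment(b"%PDF-demo", maintype="application",
                        subtype="pdf", filename="fax0001.pdf")
    conn = sqlite3.connect(":memory:")
    print(store_fax_attachments(demo.as_bytes(), conn))
```

This only records the email's own From and Date headers; whether the
service puts the caller's fax number anywhere in the email is exactly the
open question above.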
An alternative, if you are already managing fax lines, is to use the open
source HylaFAX fax server on a PC with a few cheap $30 modems. I guess it
really depends on how much control you want to have over incoming
faxes.
>
> To avoid managing the fax lines, pending further investigation. Let
> me know if this is a bad idea.
>
> Really, only the cover page has to be accurately text-decoded so that
> the response can automatically be associated with an account, and the
> requester notified.
What happens if there is no cover page?
> Perhaps this can be done using a keyword
> followed by a tracking number. We can provide the return cover page
> with the request, or ask that they include the keyword and tracking
> number in a printed cover page.
If you are sending a page and want to receive it back, I guess you could
use bar codes or a tracking number, as you mentioned.
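
For the keyword-plus-tracking-number idea, something along these lines might
be enough once the cover page has been OCR'd. This is only a sketch: the
keyword "TRACKING", the minimum of six digits, and the function name are all
made up for illustration, so substitute whatever convention you settle on.

```python
import re

# Hypothetical cover-page convention: the keyword "TRACKING", an optional
# separator (":", "#" or "-"), then at least six digits.
TRACKING_RE = re.compile(r"\bTRACKING\s*[:#-]?\s*(\d{6,})\b", re.IGNORECASE)


def find_tracking_number(ocr_text):
    """Return the tracking number found in OCR'd text, or None if absent."""
    match = TRACKING_RE.search(ocr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = "FAX COVER SHEET\nTRACKING: 00123456\nPages: 3\n"
    number = find_tracking_number(page)
    if number is None:
        print("no tracking number found, route by hand")
    else:
        print("tracking number:", number)
```

A None result covers both a missing cover page and a cover page the OCR
could not read, so those faxes would need to be looked at by a person.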
> Responses where the keyword isn't
> recognized can be routed by hand, by volunteers, but it's hoped that
> this is the minority case.
>
> If the rest of the sent documents can be OCR'd if the user decides to
> share them on the site, bonus.
Lucas