Mike and Patrick,
Thank you for the comments.
Mike, can you clarify what you mean by doing the preprocessing "well"?
Regards,
mw18888
On Jul 6, 7:07 am, "Lutz, Michael" <ML...@nds.com> wrote:
> If you are referring to http://www.abbyyusa.com/, then I think the biggest difference is that Tesseract is open source and ABBYY is not :).
> So with ABBYY you pay for the image preprocessing, and with Tesseract you don't.
> I totally agree with Patrick: if I do the preprocessing well, I always get perfect results with Tesseract, but I have never tried ABBYY.
>
> Mike
>
> From: tesser...@googlegroups.com [mailto:tesser...@googlegroups.com] On Behalf Of Patrick Questembert
> Sent: Wednesday, July 6, 2011 12:54
> To: tesser...@googlegroups.com
> Subject: Re: Tesseract vs Abbyy
>
> It's really a long list of approaches, including:
> - spacing: we don't trust Tesseract's spacing decisions. We re-evaluate every space it reports for possible removal, and consider every pair of adjacent letters for a possible space insertion.
> - obvious mistakes: this is by far the largest category of corrections we make. For example, VV is usually corrected back to W, but there are hundreds more cases.
> - ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton of incongruous mistakes that lead me to believe there is no feature analysis whatsoever. For example, a 'y' may get mapped to 'g', even though there is 0% chance of that given the wide-open gap at the top. For these types of mistakes we go back to the source image and apply our own OCR of sorts.
> - dictionaries: another big disappointment. From our testing we found that Tesseract applies the dictionary in less than 5% of the cases where it should (i.e. where the letter mistake is one listed in the ambigs file and the correct spelling is in the user dictionary), so we implemented our own dictionaries.
> - pattern matching: the regular expressions we use include wide tolerance for mistakes. Under the "protection" of a regular expression for a specific pattern, we have the flexibility to include hundreds of ambiguities, because these trigger only when they help complete a match, which makes it more likely that the substitution is valid.
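As a rough illustration of the dictionary and pattern-matching ideas above (not ScanBizCards' actual code), the sketch below applies ambiguity substitutions only under a guard: a word is changed only if a substituted variant appears in a dictionary, and digit look-alikes are replaced only inside text that matches a phone-number regex. The confusion table `AMBIGS`, the `DIGIT_LOOKALIKES` map, the phone pattern, and all function names are hypothetical; the table is only loosely inspired by the idea of Tesseract's ambigs files, not their format.

```python
import re

# Hypothetical confusion table: each key is a string OCR may emit, each
# value lists strings it may really have been. Illustrative entries only.
AMBIGS = {
    "VV": ["W"],
    "0": ["o"],
    "g": ["y"],
    "l": ["i"],
}


def variants(word):
    """Return every string reachable from `word` by optionally applying
    AMBIGS substitutions at non-overlapping positions, left to right."""
    if not word:
        return {""}
    # Option 1: keep the first character unchanged.
    results = {word[0] + rest for rest in variants(word[1:])}
    # Option 2: apply any substitution whose key matches at position 0.
    for wrong, rights in AMBIGS.items():
        if word.startswith(wrong):
            tails = variants(word[len(wrong):])
            results.update(right + tail for right in rights for tail in tails)
    return results


def correct_word(word, dictionary):
    """Keep a known word; otherwise return the alphabetically first variant
    found in `dictionary` (lowercase entries), or the word unchanged."""
    if word.lower() in dictionary:
        return word
    matches = sorted(v for v in variants(word) if v.lower() in dictionary)
    return matches[0] if matches else word


# Hypothetical look-alikes, accepted only inside the phone pattern below.
DIGIT_LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
PHONE = re.compile(r"\(?[0-9OolISB]{3}\)?[ -]?[0-9OolISB]{3}-[0-9OolISB]{4}")


def fix_phone_numbers(text):
    """Map look-alikes to digits, but only within spans matching PHONE."""
    def repair(match):
        return "".join(DIGIT_LOOKALIKES.get(c, c) for c in match.group(0))
    return PHONE.sub(repair, text)


print(correct_word("VVidget", {"widget"}))  # Widget
print(correct_word("bag", {"bay"}))         # bay
print(fix_phone_numbers("Sold: 10 boxes. Tel: (555) 12O-4S67"))
# Sold: 10 boxes. Tel: (555) 120-4567
```

Note that "Sold" keeps its "S" and "o" because those look-alikes only fire inside a phone-number match. Picking the alphabetically first dictionary match is an arbitrary tie-break; a real system would presumably rank candidates, for example by word frequency or OCR confidence.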
>
> Patrick
>
> On Mon, Jul 4, 2011 at 12:56 AM, Andres <andrej...@gmail.com> wrote:
>
> Hello Patrick,
>
> Could you elaborate a little on what you mean by Tesseract heuristics?
>
> Thanks,
>
> Andres
> 2011/7/3 patrickq <patrick.questemb...@gmail.com>
> The answer is (of course) "it depends":
> 1. If you compare Tesseract and ABBYY on the same image without applying
> any preprocessing to it, ABBYY wins (because Tesseract's image processing
> is very rudimentary, at best). Of course, if your test images are
> produced by (for example) a flatbed scanner, the lack of image
> processing is not an issue; refer to case 2 below.
> 2. If you compare Tesseract and ABBYY on a clean (processed) image,
> without applying any post-Tesseract heuristics, ABBYY may have an
> advantage.
> 3. However, if you compare Tesseract + image processing + heuristics &
> corrections, Tesseract actually beats ABBYY hands down.
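Neither Mike nor Patrick says which preprocessing steps they use. As one illustration of what "preprocessing" can mean, here is a minimal sketch of global binarization with Otsu's method, a standard technique for turning a grayscale scan into dark text on a light background before handing it to an OCR engine. It assumes the image is already a flat list of 8-bit grayscale values (0 to 255); the function names are hypothetical, real pipelines typically add steps such as deskewing, noise removal, or adaptive (local) thresholding, and nothing here reflects ScanBizCards' actual code.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises the between-class
    variance of {p <= t} (dark, usually ink) and {p > t} (light, paper)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0  # number of pixels with value <= t
    sum_dark = 0     # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels):
    """Map dark pixels to 0 (ink) and light pixels to 255 (paper)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


# A tiny made-up "scan": ink around 30-40 on paper around 200-230.
scan = [35, 40, 210, 220, 30, 200, 230, 38]
print(otsu_threshold(scan))  # 40
print(binarize(scan))        # [0, 0, 255, 255, 0, 255, 255, 0]
```

A single global threshold works well for evenly lit flatbed scans; for phone photos with shadows or uneven lighting, a local (adaptive) threshold is usually the safer choice.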
>
> ScanBizCards is case #3, built around Tesseract 3.01. If you want to test
> this combination, please do this:
> - go to http://www.scanbizcards.com/webdemo
> - upload an image (under Batch Actions). Warning: ScanBizCards is
> geared towards recognizing text on business cards, so it would be best
> to test on something *like* a business card (sparse text), not a
> full page with lots of text
> - click that image, then "Image Editor" at the top, and OCR it
> - when done testing, please delete the test images from this demo
> account (or get your own online account) ...
>
> You can also test on your Android or iPhone mobile device instead by
> installing the free version of ScanBizCards. ABBYY powers two iPhone
> apps made by German companies, Business Card Reader (by Shape Services)
> and Card Reader (by xRoot Software), and of course ABBYY's own
> iPhone / Android business card reader app.
>
> Patrick
>
> On Jul 3, 10:10 am, mw18888 <man_...@yahoo.com> wrote:
>
> > Can anyone comment on the accuracy of Tesseract vs Abbyy?
>
> > Regards,
>
> > mw18888
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesser...@googlegroups.com
> To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
>
> ________________________________
> This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postmas...@nds.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by NDS for employment and security purposes.