Have you had any good experience with OCR and furigana recently? Would Abbyy or Adobe Acrobat's OCR be able to handle it?
--
You received this message because you are subscribed to the Google Groups "Honyaku E<>J translation list" group.
Matt,

I seldom get image PDFs anymore, but I was wondering how Abbyy and ReadIris compare with Adobe's onboard OCR engine in terms of accuracy and ease of use. Any recommendations?
Matt,

I used Abbyy on my previous XP computer and was relatively happy with its performance. I particularly like the preconversion editing option, and it does pick up furigana.

The only thing I didn't like was that it tended to insert Chinese characters into the spots it considered to have poor recognition. I was unable to tweak the system (nonunicode programs set to Japanese, etc.) to make it use Japanese kanji automatically even though I had a Japanese font selected, and I had to make the corrections manually from the list of options provided.

I wrote that I seldom get "image PDFs" anymore. Almost all my documents are regular Word or text-extractable PDFs, so I usually don't have to mess with seals and other ugliness.

Thanks for your insight.

John

----------------

On Thu, May 29, 2014 at 11:23 AM, Matthew Schlecht <matthew.f...@gmail.com> wrote:
On Thu, May 29, 2014 at 11:03 AM, John Stroman <stromana...@gmail.com> wrote:
Matt, I seldom get image PDFs anymore, but I was wondering how Abbyy and ReadIris compare with Adobe's onboard OCR engine in terms of accuracy and ease of use. Any recommendations?

I don't have any experience with Abbyy.

When the legibility is good enough, the onboard PDF OCR engine works fine. Below a certain level of legibility, the PDF engine's accuracy drops off faster than ReadIRIS's does.
I really like OmniPage for European languages because of the spellchecking and dictionary options. A few years ago OP (Nuance) added an engine for Asian languages, but the accuracy was inferior to ReadIRIS. I know that OmniPage came out with a new version within the last year, but I don't know if it includes any upgrade to the kanji/kana capability.
When you write that you seldom receive image PDFs any longer, does that mean most of what you get is in DOC or RTF? I've noticed that some clients run whatever they get through OCR to generate what looks like an original in DOC or RTF. The kludgy formatting often gives it away, though: mid-sentence line or page breaks, an odd collection of font attributes, and the rendering of stamps into punctuation gibberish.
Matthew Schlecht, PhD