<http://img90.imageshack.us/img90/2418/badke2.png>
59,520 bytes
When I convert it to PDF (File->Create PDF->From Single File) and then use Acrobat to "Document->OCR Text Recognition->Recognize Text Using OCR", Acrobat always crashes.
Is this true for anyone else that could try it?
It kills my batch processing and is making this large conversion quite painful. Is there a way around it?
Here is a two page PDF with a different page that crashes Acrobat 9:
<http://rapidshare.com/files/152486163/TwoPages.pdf.html>
Acrobat 9 crashes when I try to OCR page 2, but works well on page 1. They are both the same DPI as far as I can tell.
It would be nice if someone with AA9 could confirm this.
This thread relates to crashes with OCR. Your post does not. Please
start your own thread. Don't hijack unrelated threads.
Mike
Could you please send it to me as an attachment at osat...@adobe.com?
Thanks,
Olga
Olga Satchouk,
Acrobat QE
Here are the current 8 pages that crash Acrobat 9 during OCR in a single PDF if you can test:
Thanks again for your help.
Thanks
Thanks.
When I try to OCR a .pdf document to perform searches, it always crashes with the following error:
AppName: acrobat.exe AppVer: 9.0.0.332 ModName: ocrlibraryinf.dll
ModVer: 2.0.0.1 Offset: 000206f1
For those of you who only have a few PDFs to OCR, try extracting the culprit pages as TIFFs, then importing them back in. It worked for me, but I can't do that for 1000+ documents.
Does anyone have any idea how to get around this problem now, or when the mystery version 9.1 that supposedly corrects it will be released? Acrobat is useless to our organization with this mission-critical bug, and it is frustrating that Adobe has not been more responsive.
Also, does anyone have the e-mail address of someone higher up at Adobe so that we can escalate this issue, or at least let them know of Adobe's failure to respond to all our requests?
I also found that AA8 crashed while running OCR. Huge shortcoming.... AA8 does everything in memory, stores nothing in temporary files, forgets everything it did upon crashing, and forces a mammoth, time-consuming restart from the beginning.
I finally got through them all by breaking each file into 2 files (a and b), then running OCR 100 pages at a time within each file, with constant attention.
A random error message, "Cannot find file", kept popping up and stopping the run until I clicked "OK", adding to the huge delays. I have to watch for this error constantly. Even though I click on "Ignore this message", it does not listen and keeps popping up.
But at least, as it successfully gets through 100 pages, I can save it, then carry on to next 100 pages. If it crashes, I can restart to the last successful 100 pages completed.
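For anyone who wants to plan the 100-page runs ahead of time, here is a minimal Python sketch of the bookkeeping only. It does not drive Acrobat; the OCR on each range still has to be run and saved by hand. The function names are my own, and the chunk size of 100 is simply the figure used above.

```python
def page_chunks(total_pages, chunk_size=100):
    """Yield inclusive, 1-based (first, last) page ranges covering the document."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def remaining_chunks(total_pages, last_saved_page, chunk_size=100):
    """Return the ranges still to be OCR'd, given the last page that was saved."""
    return [
        (first, last)
        for first, last in page_chunks(total_pages, chunk_size)
        if last > last_saved_page
    ]


if __name__ == "__main__":
    # Example: a 250-page file where the first 200 pages were saved before a crash.
    for first, last in remaining_chunks(250, last_saved_page=200):
        print(f"OCR pages {first}-{last}, then save")
```

After a crash, note the last page range that was saved and pass its final page as last_saved_page to see what is left.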
I conclude that AA8 absolutely hates large files. My computer is a 2.6 GHz dual core with 3.5 GB of memory, which is still not enough.
Took me an entire week to work my way through these 63 files, which, BTW, tied up my computer for that whole time, with insufficient memory to do other things.
That AA8 does not work with temporary files is a huge shortcoming when working with large PDF files. I find it astonishing that upon an unexpected failure, it has no way to remember where it was when the crash occurred.
This situation begs the question...... Is there any other utility out there that will make a PDF file searchable, that is not made searchable when first created by some utility other than Acrobat?
Regards,
Terry Smythe
Winnipeg, Canada
smy...@shaw.ca
Adobe's batch handling capability compounds the problem. Instead of loading, performing OCR on and saving one document at a time, it tries loading them all into memory at once. I am trying to perform OCR on 100,000 small documents, so that method is a disaster. One crash and everything is lost. As everyone has noted - a crash is certain, so Acrobat is basically useless in its current state for OCR. This has been true for Acrobat 7, 8 and 9.
This is all very frustrating.
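To make the "one document at a time" idea concrete, here is a rough Python sketch of the loop I would want a batch tool to run. It is not how Acrobat works and it does not call Acrobat. The process_one argument is a hypothetical placeholder for whatever loads, OCRs and saves a single document, and the log file is an assumption of mine for remembering what has already been finished.

```python
import os


def run_batch(paths, process_one, log_path):
    """Process files one at a time, logging each success so a restart can skip it."""
    done = set()
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            done = {line.strip() for line in log if line.strip()}

    processed = []
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            if path in done:
                continue  # finished in an earlier run
            process_one(path)  # hypothetical: load, OCR and save one document
            log.write(path + "\n")
            log.flush()  # record the success before moving on
            processed.append(path)
    return processed
```

If the run dies on one document, only that document is lost. Running it again with the same log picks up at the first file that was not recorded.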
I am able to perform OCR on these pages using OmniPage Pro 1.5 with no
problem, so it is not the pages.
Agreed. In every case where AA8 crashed, I was able to have AA8 run OCR against the offending page as if nothing was wrong, then carry on. Very mysterious, and extremely aggravating.
But the big question..... Is there another utility out there that will run OCR in such a way that the PDF file becomes searchable thereafter?
I have no trouble running OCR from any number of OCR packages, and all work just fine, but the OCR results are always external to the PDF file. The PDF file remains non-searchable even after running it with ABBYY, OmniPagePro, TextBridge, ScanSoft, etc.
So far, AA7 or better is the only utility I have found that when OCR is run, it leaves behind a PDF file that is searchable.
This is important in the case of a very large set of very large PDF files initially created by some utility other than Acrobat. 100% of these PDF files are not searchable.
In my case, some 50,000+ pages of a historical newspaper, 1881 to 1943, were scanned into TIFF format by some automated process, likely using an ADF. Then the TIFF files were converted by some automated utility into PDF files, all non-searchable.
I want to concatenate the TIFF files by year, then convert these yearly files into yearly PDF files. But such a process leaves them all non-searchable.
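The grouping step, at least, is easy to script. Below is a minimal Python sketch that sorts TIFF file names into yearly batches. It assumes each file name contains the four-digit year of the issue, for example a hypothetical name like 1881-03-12_p1.tif; I have not said how the real files are named, so the pattern would need adjusting. It only builds the lists and does not merge or convert anything.

```python
import re
from collections import defaultdict


def group_by_year(filenames):
    """Map each year (as a string) to the sorted file names that contain it."""
    groups = defaultdict(list)
    for name in filenames:
        # Assumption: the first 18xx or 19xx run in the name is the issue year.
        match = re.search(r"(18|19)\d{2}", name)
        if match:
            groups[match.group(0)].append(name)
    return {year: sorted(names) for year, names in sorted(groups.items())}


if __name__ == "__main__":
    sample = ["1882-01-05_p1.tif", "1881-03-12_p2.tif", "1881-03-12_p1.tif"]
    for year, names in group_by_year(sample).items():
        print(year, names)
```

Names with no recognizable year are left out, so it is worth comparing the counts against the original file list.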
I've basically done this by using AA8, but the process was incredibly time consuming and aggravating, requiring constant attention for all these dumb repetitive errors that keep popping up, ignoring my earlier selection to ignore all errors. GGrrrrr..................... Urge to kill........... :-)
Regards,
Terry
I was able to reproduce the crash in both cases. The fix will be available in
the next dot release of Acrobat - 9.1.
How about the same fix for AA8.1.3, for those of us volunteers who can't afford the high cost of version 9? We don't have a company budget to fall back on, even though the work we are doing clearly benefits society as a whole.
At least you'll have the PDF doc in one piece, even though certain pages in the doc won't have been OCR'd.
Then you can insert re-scanned pages into the appropriate spot.
This was a workaround that worked for me.
Curiously, if I took note of the offending page where it crashed, AA8 would OCR the 10 pages surrounding the offending page quite normally if I sent it to process just those pages.
As a consequence, rather than trust it to crash at same spot, I elected to break the files in half, then OCR process 100 pages at a time, saving the file at conclusion of each group. It might still crash occasionally, but at least I did not have to repeat what had already been done successfully.
Knitting the broken files together after successful OCR processing is really quite trivial, done in seconds, not a hardship.
But how nice it would be if the fix applied to Version 9 would also be applied to version 8.1.3. As a volunteer, I can't afford version 9, and my version 8.1.3 otherwise does the job, albeit with aggravation.
When the next big group of similar PDF files emerges, I won't waste so much time experimenting. I'll just repeat this process from the beginning.
Regards, and thank you for thinking of me, appreciated.
Terry Smythe
What kind of fix do you mean Adobe has applied to AA9? Since installing the boxed version, Adobe's Update application never found any updates for AA9 :-(
As far as I can tell the OCR engine is much more stable now! No more crashes so far...