It sort of works, but...
Problem One:
OmniPage thinks that all of my .pdf files are color images. They're not.
Once OmniPage is finished OCRing the pdf file, I save it with text embedded.
The resulting files are obese. My original 10 page document was 400k, and
the pdf with text embedded is now 4.4MB. This does not seem normal, as I
have similar pdf files (10 page image with text embedded, similar
resolution, but converted with other programs) that are around 1MB. My
suspicion is that OmniPage is putting in color data that it shouldn't. I
can't figure out a way to strip it out. Compression within Acrobat doesn't
give any significant reductions. Any ideas on how I can make these files
smaller?
Problem Two:
OmniPage has an automation 'feature' in that you can dump several files onto
the dock icon and the program will OCR the files accordingly. The catch is
that the program will STOP in between each file and prompt you to see if you
want to scan all the pages in the file. This prevents unattended operation
and makes the "feature" completely useless. I'd like to dump a few hundred
files to it, and just let it do its thing over a weekend, but I don't really
want to sit around and tell the program that I really, really do want to OCR
the entire document. (Maybe it's worried about the extra 4MB of disk space
it's going to take up--but I digress.) Anyway, is there some way to turn off
the annoying prompt "feature" so I can do this? Also, it wants to convert
all scanned batch files into one big file. I'd prefer to keep each file
separate and just have it save over the original source file. Any ideas?
Does someone have an AppleScript they'd like to share?
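
For what it's worth, the behaviour being asked for here (one output per input,
no prompts, and the original overwritten only when the OCR succeeded) can be
sketched independently of OmniPage. This is a minimal Python sketch, not an
OmniPage script: `batch_ocr` and `ocr_one` are hypothetical names, and
`ocr_one(src, dst)` is a placeholder for whatever actually performs the OCR.
Nothing here assumes OmniPage exposes such a call.

```python
import os
import tempfile
from pathlib import Path


def batch_ocr(folder, ocr_one):
    """Run ocr_one(src, dst) on every PDF in folder, keeping files separate.

    ocr_one is a hypothetical callable standing in for the real OCR step.
    It must read src and write the searchable PDF to dst.
    """
    done, failed = [], []
    # Build the file list up front so temp files are never picked up.
    for src in sorted(Path(folder).glob("*.pdf")):
        fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=src.parent)
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            ocr_one(src, tmp)
            # Overwrite the original only after the OCR step finished.
            os.replace(tmp, src)
            done.append(src.name)
        except Exception:
            # Leave the original untouched and move on to the next file.
            tmp.unlink(missing_ok=True)
            failed.append(src.name)
    return done, failed
```

Writing to a temporary file first means a failure halfway through a weekend
run leaves the source file as it was, and the returned lists show which files
need a second look.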
Problem Three:
When I take a pdf that has been OCRed with OmniPage and open it in Acrobat,
it appears normal. I hit the "touch up text" tool to make a quick edit.
The result? The page that I'm on and the next several pages of the document
turn completely white. I can't select text. The pages remain all white
until I switch tools and page around. I know the text is there, as I can
use the "text select" tool just fine. Anyone have this problem? Is there a
workaround?
--
David L. Leon
Law Office of David L. Leon, P.C. telephone: 214.696.0021
6500 Greenville Ave., Ste. 710 mailto:da...@leonlaw.com
Dallas, Texas, 75206 http://www.dallasbusinessattorneys.com
> I have hundreds of previously scanned .pdf files that are image only. I
> wanted to convert them to pdf with embedded text, so they can be searched
> with Sherlock. I have Acrobat 5 (full) and OmniPage X, running on OS X.2.4
> thinking I could just have OmniPage OCR the documents and then resave them
> as pdf, image with text embedded.
>
> It sort of works, but...
>
> Problem One:
>
> OmniPage thinks that all of my .pdf files are color images. They're not.
> Once OmniPage is finished OCRing the pdf file, I save it with text embedded.
> The resulting files are obese. My original 10 page document was 400k, and
> the pdf with text embedded is now 4.4MB. This does not seem normal, as I
I only have OmniPage 8, and it refuses to open greyscale TIFFs because
"this version does not support color images".
I'm guessing OmniPage X now supports greyscale and color images, but
still thinks greyscale = color. And it probably converts the image to
32-bit color, making it much bigger than it was.
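
To put rough numbers on that guess, here is a small Python sketch of the
uncompressed size of one page at different bit depths. The page size (US
Letter) and the 300 dpi resolution are assumptions, since neither is stated in
the thread, and the sizes inside a real PDF will be smaller because the image
data is compressed. Only the ratios between depths are the point.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one raster page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed values: 8.5 x 11 inch page scanned at 300 dpi.
for label, bpp in [("1-bit bilevel", 1),
                   ("8-bit greyscale", 8),
                   ("32-bit color", 32)]:
    size = raw_page_bytes(8.5, 11, 300, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB per page uncompressed")
```

Going from 8-bit greyscale to 32-bit color quadruples the raw data, and going
from 1-bit to 32-bit multiplies it by 32, so an unwanted conversion to color
would be consistent with the kind of growth described above.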