Any help would be much appreciated. Thanks!
Why limit yourself to MS Word? http://openoffice.org/ . Free speech,
free beer. It's about 110M in size, though.
> I can use, what appears to be, the built-in support XP has for
> scanners, but I can only scan to image files basically. I really need
> a way to scan using OCR to create text files that I can edit (e.g. in
> Word 97).
The documents that MS Word produces by default are *not* "text files".
They're Microsoft's proprietary invention, and they can and do contain a
bunch of stuff other than text. Take a look at a .DOC file and a .TXT
file in a hex editor and you'll see what I mean.
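If you don't have a hex editor handy, a few lines of Python give a similar view. This is a minimal sketch, not a file-format parser. It prints the first 64 bytes of each file and checks for the OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1), which is the container Word 97-2003 .doc files use (other legacy Office formats use it too). A plain .txt file has no such header and typically starts straight away with the text itself.

```python
# Minimal sketch: show the first bytes of a file roughly the way a hex
# editor would, and check for the OLE2 compound-file signature that
# Word 97-2003 .doc files start with.
import sys

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def hex_preview(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)

def looks_like_legacy_doc(data: bytes) -> bool:
    """True if the bytes begin with the OLE2 compound-document signature."""
    return data.startswith(OLE2_MAGIC)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            head = f.read(64)
        if looks_like_legacy_doc(head):
            kind = "OLE2 container (e.g. Word 97 .doc)"
        else:
            kind = "no OLE2 signature"
        print(f"{path}: {kind}")
        print(hex_preview(head))
```

Run it as `python peek.py letter.doc letter.txt` (file names are just examples) and compare the two dumps.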
> Any ideas how to do this, or do I have to go buy software to
> accomplish my goal?
Unfortunately, there aren't any Free OCR programs that can do decent
recognition right now. GNU OCRad is pretty sad, and it doesn't even
have an interface unless you count kooka. Sometimes, scanner
manufacturers ship scanners with a CD containing limited versions of
popular proprietary OCR packages. If you don't have one of those CDs,
ISTR hearing something about 'DozeXP having some kind of rudimentary OCR
capability built-in. I can't verify that since all I have here is 2K
and I'd have to reboot to use that. Try Googling and digging around on
your system.
If you really need good OCR, there are several commercial packages
available. Abbyy Finereader gets a lot of good press. Expervision
Typereader and Caere Omnipage are OK, but some people say that
Finereader works better. YMMV. If you have good original images
(typeset, high-contrast, few speckles, scannable at 300 DPI
black-and-white, no tiny/gigantic/broken/mangled type) then any of those
engines should do a decent job. No OCR engine is perfect though. HTH,
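To make "high-contrast, black-and-white" a bit more concrete, here's a rough sketch of fixed-threshold binarization on a grayscale page, plus the pixel arithmetic for a 300 DPI scan. The threshold of 128 and the function names are my own assumptions for illustration; real OCR packages do their own (smarter) preprocessing.

```python
# Hypothetical preprocessing sketch: turn a grayscale page (0 = black,
# 255 = white) into pure black-and-white with one fixed threshold, and
# work out how many pixels a page has at a given scan resolution.

def page_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at `dpi` dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)

def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map each pixel to 0 (black ink) or 1 (white paper).

    Pixels darker than `threshold` count as ink. A single global threshold
    only works well on clean, high-contrast originals; 128 is an assumed
    default, not a value taken from any OCR package.
    """
    return [[0 if px < threshold else 1 for px in row] for row in gray]

def ink_fraction(bw: list[list[int]]) -> float:
    """Share of pixels that are black; handy for spotting blank or very noisy pages."""
    total = sum(len(row) for row in bw)
    black = sum(row.count(0) for row in bw)
    return black / total if total else 0.0

if __name__ == "__main__":
    print(page_pixels(8.5, 11))  # US Letter at 300 DPI
    sample = [[250, 30, 240], [20, 200, 10]]
    bw = binarize(sample)
    print(bw, ink_fraction(bw))
```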
--
Matt G|There is no Darkness in eternity/But only Light too dim for us to see
Hire me! http://crow202.dyndns.org/~mhgraham/resume/
Programmers are playwrights, Computers are lousy actors,
Users are vicious drama critics, BOFHen burn down theatres!
If it's not included with the scanner software, then you'll have to
buy an OCR program, such as OmniPage Pro (which works pretty darn well).
Keep in mind these are machines reading text, so you'll always get
some errors you have to correct manually. Still, you can process a
scanned book several hundred pages long in about half an hour with
OmniPage, since the error rates are pretty low.
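One common way to put a number on those errors is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a hand-corrected version, divided by the length of the corrected text. A minimal sketch, assuming you already have both as plain strings:

```python
# Minimal sketch: character error rate = Levenshtein distance between the
# OCR output and a hand-corrected reference, divided by reference length.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]

def char_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance per reference character; 0.0 means a perfect read."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)

if __name__ == "__main__":
    # Two '1'-for-'l' confusions in an 11-character line.
    print(char_error_rate("He1lo wor1d", "Hello world"))
```

Multiply the CER by the number of characters on a page to get a rough idea of how many fixes you'll be making per page.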
That said, if you're doing high-volume OCR scanning, you've really
bought the wrong scanner. You'll need at least a Canon DR-2080C
continuous-feed duplex scanner that'll push through 20+ ppm and works
great at gobbling up hundreds of pages per hour (as it does here)
without a single jam. Highly recommended for this purpose!
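Rough arithmetic on what "20+ ppm" buys you, as a sketch. The efficiency factor is a hypothetical allowance for reloading the feeder and other stops, not a figure from Canon's spec sheet:

```python
# Back-of-the-envelope sketch: how long a batch takes at a rated feed speed.
# `efficiency` is a hypothetical fudge factor (1.0 = the scanner never stops).
import math

def _check(pages_per_minute: float, efficiency: float) -> None:
    if pages_per_minute <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed must be positive and efficiency in (0, 1]")

def scan_minutes(pages: int, pages_per_minute: float, efficiency: float = 1.0) -> float:
    """Minutes needed to feed `pages` at the given speed."""
    _check(pages_per_minute, efficiency)
    return pages / (pages_per_minute * efficiency)

def pages_per_hour(pages_per_minute: float, efficiency: float = 1.0) -> int:
    """Whole pages fed in one hour."""
    _check(pages_per_minute, efficiency)
    return math.floor(pages_per_minute * efficiency * 60)

if __name__ == "__main__":
    print(scan_minutes(300, 20))       # a 300-page job at a flat 20 ppm
    print(pages_per_hour(20, 0.5))     # same scanner, running half the time
```

Even at half efficiency that's several hundred pages an hour, which matches the "hundreds of pages per hour" above.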
=)
The latest version of VueScan, V82.02, can do OCR; I just tried it. Will
wonders never cease.
I don't agree. Try SimpleOCR from http://www.simpleocr.com/Info.asp.
I ran a plain-text document with basic formatting through it and found
hardly any difference from the result FineReader gave. But yes, anything with
more complex formatting will probably not be comparable. And you don't get
any bells and whistles, like the ability to read PDFs as input images or a
spell checker.
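If you want to run the same kind of comparison yourself, Python's standard difflib module will score and diff two engines' plain-text output. A minimal sketch; the engine_a.txt/engine_b.txt labels are placeholders:

```python
# Minimal sketch: compare two OCR engines' plain-text output for the same page
# using the standard-library difflib module.
import difflib

def similarity(text_a: str, text_b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two outputs are identical."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()

def changed_lines(text_a: str, text_b: str) -> list[str]:
    """Unified-diff lines showing where the two outputs disagree."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile="engine_a.txt", tofile="engine_b.txt", lineterm="",
    ))

if __name__ == "__main__":
    a = "The quick brown fox\njumps over the lazy dog."
    b = "The quick brown f0x\njumps over the lazy dog."
    print(f"similarity: {similarity(a, b):.3f}")
    print("\n".join(changed_lines(a, b)))
```

The similarity score gives a quick "hardly any difference" check; the diff shows exactly which lines one engine got wrong.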
Domestic licences for FR9/10 are somewhere in the range of GBP 55-75 IIRC,
which is not a huge sum if you are going to be using the product for years.
I've used it since 2003 and updated once. You can't download a trial version
of OmniPage, and TextBridge doesn't look like it is being as actively developed.
>have an interface unless you count kooka. Sometimes, scanner
>manufacturers ship scanners with a CD containing limited versions of
>popular proprietary OCR packages. If you don't have one of those CDs,
>ISTR hearing something about 'DozeXP having some kind of rudimentary OCR
>capability built-in. I can't verify that since all I have here is 2K
>and I'd have to reboot to use that. Try Googling and digging around on
>your system.
>
>If you really need good OCR, there are several commercial packages
>available. Abbyy Finereader gets a lot of good press. Expervision
>Typereader and Caere Omnipage are OK, but some people say that
>Finereader works better. YMMV. If you have good original images
>(typeset, high-contrast, few speckles, scannable at 300 DPI
>black-and-white, no tiny/gigantic/broken/mangled type) then any of those
>engines should do a decent job. No OCR engine is perfect though. HTH,
>
Here are a few more that I have NOT tried. Are these the ones you mentioned?
http://code.google.com/p/ocropus/
http://www.paperfile.net/
http://www.freeocr.net/