Can you convert printed words into text?

Tom

Feb 8, 2010, 5:23:59 PM

Hi,

Is there a software program that lets you scan printed words, convert the
characters into text and save them in a text file? If so, it would be one of
the most useful pieces of software.

Tom

Frank Slootweg

Feb 8, 2010, 5:37:00 PM

The software which came with my (EPSON) scanner does just that. AFAIK,
most scanners which can scan to PDF format have similar software.

Once you have scanned the document into a PDF file, you can do a
(File->) Save as Text... in Adobe Reader and you will get a text (.txt)
file.

Works wonders, especially if the quality/readability of the original
document is good.

idgat

Feb 8, 2010, 6:06:59 PM

OCR as mentioned above.

But beware - there's OCR software and there's OCR software. You get
what you pay for.

Some free OCR software is about as useful as tits on a bull. Often,
it's less trouble to retype the whole thing than to proofread the
OCR-converted document and correct the errors.

"i" for "l" and vice versa
"v v" for "w"
"m" for "n n"
... the list goes on.

And if there's any detailed formatting (images, font changes,
columns, tables, etc.) ... FORGET IT!

And occasionally, even the commercial software is not much better.
--
idgat
Compuglobalhypermeganet Inc.

son of a bitch

Feb 8, 2010, 6:58:45 PM

It's called OCR: Optical Character Recognition.

Most of these programs are not worth the cardboard box they came in.

They require a lot of proofreading and correction. You can expect to
fix at least 10% of any page, and it can go as high as 50%. If it has
tables, don't even think about it. If it has headers/footers with a logo,
these will be screwed up too. If the page has a font it hasn't seen before,
you have to train it by correcting every little mistake it makes. The font
name rarely matches the original font and size. If it's a copy of the
original document, then multiply these problems by 2 for each time the
document has been photocopied.

It depends on how many pages you want to do. If it's fewer than 4 pages,
it's actually quicker and easier to just retype it.

A higher price does not mean it will do a better job either. I've seen a
$150 program outperform (in the number of manual corrections) a $1,000
program. They have gotten better over the years, but they're still
rubbish, not to mention complicated to use.

Don McKenzie

Feb 8, 2010, 7:03:28 PM

http://tinyurl.com/y9svwbp

Cheers Don...

--
Don McKenzie

Site Map: http://www.dontronics.com/sitemap
E-Mail Contact Page: http://www.dontronics.com/email
Web Camera Page: http://www.dontronics.com/webcam
No More Damn Spam: http://www.dontronics.com/spam

Product Sellout: 15% OFF 4DSystems OLED Displays & modules.
http://www.dontronics-shop.com/micro-oled.html

Rod Speed

Feb 8, 2010, 8:57:12 PM

Tom wrote

> Is there a software program that lets you scan printed words, convert the characters into text and save them in a text
> file?.

Yes, usually called OCR.

> If so, this is one of the most useful software.

Not really.

Marts

Feb 8, 2010, 9:16:44 PM

Frank Slootweg wrote...

> The software which came with my (EPSON) scanner does just that. AFAIK,
> most scanners which can scan to PDF format have similar software.

I think that most of them use the OmniPage engine, don't they?

Certainly, our range of Canon scanner/printer MFCs does.

atom

Feb 8, 2010, 11:28:53 PM

Frank Slootweg <th...@ddress.is.invalid> writes:

Distributed Proofreaders, who now feed the largest proportion of new texts into Project Gutenberg
(typically 100-200 per month), use and recommend ABBYY FineReader. Adobe Acrobat also comes with
OCR, but its text recognition is not as good.

Atom Egoyan,
Melbourne, Australia

Mike

Feb 9, 2010, 1:01:48 AM

On 2010-02-09 14:28:53 +1000, atom <at...@vic.bigpond.net.au> said:

> Distributed Proofreaders who now feed the largest proportion of new
> texts into Project Gutenberg, typically 100-200 per month, use and
> recommend ABBYY FineReader. Adobe Acrobat also comes with OCR, but the
> text recognition is not as good.

I recently tried the ABBYY FineReader software that came bundled with a
scanner. It did a completely shit job, and it would have been faster to
retype the document.

SolomonW

Feb 9, 2010, 6:02:27 AM

I have used ABBYY for years and find it extremely good, even on lousy texts.

atom

Feb 9, 2010, 6:57:52 AM

SolomonW <Solo...@nospamMail.com> writes:

And just to follow up my own post, Distributed Proofreaders is doing
OCR on an industrial scale, thousands of pages a month. They've
tried every product on the market, and ABBYY was the clear winner.

Mike: It is possible that the ABBYY bundled with your scanner was
some kind of 'lite' version, although I have no data on that.

Atom Egoyan
Melbourne, Australia

Frank Slootweg

Feb 9, 2010, 10:53:25 AM

I've checked, and the software which comes with my (EPSON) scanner is
indeed ABBYY FineReader 6.0 Sprint.

Terryc

Feb 9, 2010, 9:34:33 PM

atom wrote:

> And just to follow up my own post, Distributed Proofreaders is doing
> OCR on an industrial scale, thousands of pages a month. They've
> tried every product on the market, and ABBYY was the clear winner.

Not unless they showed you the test documents, it isn't.
Particular scanners can excel in particular areas, and any scanner that
doesn't excel at recently printed black text on a white page with
straight, level lines is hard to find.

"Industrial scanning" implies auto feed of cut sheet paper, so something
optimised for that would excel.

Try scanning war diaries, old recipes, etc. and see how you go.

Just a heads up to be wary of horses for courses.

atom

Feb 10, 2010, 1:20:05 AM

Terryc <newsfour...@woa.com.au> writes:

>atom wrote:

Distributed Proofreaders are only scanning printed works, not manuscripts.
Some of those works date from the 17th century, but the majority are more
recent. Only a tiny proportion would be cut; the majority are bound
works. When I use the term 'on an industrial scale' in this context, I mean
that they are not just scanning and OCRing the occasional work; every month
another couple of hundred complete books find their way into Project
Gutenberg, fully proofread and formatted.

They have fine-tuned their operation over the past few years, including
selecting the best software currently available. As you know, for
OCR to be effective, it has to achieve at least 90% accuracy [which
would still be considered only barely acceptable]. The scanned works
come from a wide range of scanners, and DP specify the desirable
characteristics of the scanned file before putting it into ABBYY. Only
if a work were especially rare, fragile or important would they put
up with an inferior scan.
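One common way to put a number on an accuracy figure like the 90% above is to compare the OCR output against a hand-corrected copy and count how many single-character edits it takes to fix it (the Levenshtein edit distance). The sketch below is an editorial illustration, not what Distributed Proofreaders or any OCR package is known to use; the function names `edit_distance` and `char_accuracy` are hypothetical, and the sample text uses the "vv"-for-"w" misreading mentioned earlier in the thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Hypothetical metric: 1 - (edits needed / characters in the reference),
    clamped at 0 so very bad output does not go negative."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - edit_distance(reference, ocr_output) / len(reference))


if __name__ == "__main__":
    reference = "the new window"      # hand-corrected text (14 characters)
    ocr_output = "the nevv vvindovv"  # each "w" misread as "vv"
    print(edit_distance(reference, ocr_output))            # 6
    print(round(char_accuracy(reference, ocr_output), 3))  # 0.571
```

Three misread letters cost six edits here, so a 14-character line scores only about 57%, which gives some idea of why systematic confusions make proofreading OCR output so slow.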

I don't think there's any need to sneer. Since the original poster
wanted to know the 'best' OCR software, I've given him a full context
for my opinion, and he is at liberty to accept it or reject it.

Atom Egoyan
Melbourne, Australia

Tom

Feb 12, 2010, 6:42:47 PM

I am now all the wiser about the limitations of OCR after reading your
contributions. My thanks to you all.

Tom

"Tom" <tcl...@hotmail.com> wrote in message
news:3a0cn.6448$pv....@news-server.bigpond.net.au...