I've reached a limit of sorts through trial and error with
commercially available compression programs. For example, when I scan a
page of typewritten or printed text at 300 dpi as a bitmap file, it
comes out about 1.1Mb (a 200 dpi scan comes out about 500Kb). I use 300
dpi because some fonts on some of the scanned pages are small (6 or 7
point).
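As a rough sanity check on those figures, the uncompressed size of a black and white scan can be estimated from the page size and resolution. The sketch below assumes a US Letter page (8.5 x 11 inches) stored at 1 bit per pixel; both are assumptions, since the post does not state the page size or bit depth. The function name is illustrative only.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Bytes needed for an uncompressed 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # each row is padded to a whole byte
    return row_bytes * height_px


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (200, 300):
    size = raw_bilevel_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes")
```

Under these assumptions the results land close to the 500Kb and 1.1Mb figures quoted above, which suggests the scans are already 1 bit per pixel.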
The same 300 dpi scanned image comes out about 130Kb if I use
the .tif file format, about 110Kb if I use the jpeg format, and about
97Kb if I use the jbg format. The decompressed images from those
formats are all sufficiently readable.
When I do a second compression on the .tif file or the jpeg file
using AT&T's DejaVu compression software, the file reduces to about
55Kb. This is the best I've been able to achieve with readily available
utilities, but it isn't small enough for my purposes. Also, the DejaVu
engine and the overall 2 step compression process are very slow. Hence,
my challenge.
Can anyone offer any advice/guidance/suggestions? I'm willing
to look at beta software or pure research if available. Thanks in
advance for any help.
Jason Wallace
Voice: (310) 546-9430 (California)
>
> I've reached a limit of sorts through trial and error with
> commercially available compression programs. For example, when I scan a
> page of typewritten or printed text at 300 dpi as a bitmap file, it
> comes out about 1.1Mb (a 200 dpi scan comes out about 500Kb). I use 300
> dpi because some fonts on some of the scanned pages are small (6 or 7
> point).
>
Bitmap? Windows sucks :)
>
> The same 300 dpi scanned image comes out about 130Kb if I use
> the .tif file format, about 110Kb if I use the jpeg format, and about
> 97Kb if I use the jbg format. The decompressed images from those
> formats are all sufficiently readable.
TIFF/JPEG? Not smart, eh? TIFF/JPEG are meant for color photos, not b/w
images. Try using PNG, GIF, or even PCX, and make sure they are saved as
b/w.
>
> Can anyone offer any advice/guidance/suggestions? I'm willing
> to look at beta software or pure research if available. Thanks in
> advance for any help.
>
PNG (Portable Network Graphics) looks like the best choice to me.
> I have a practical compression problem that I can't find a solution
> to. I need to compress (quickly) 300 dpi scanned images of black and
> white typewritten or printed text down to about 20-25Kb or less. And on
> the other end, the decompressed image must be readable with a commercial
> image viewer (Windows utility or Internet browser).
Use PNG in grayscale mode, and make sure that filtering is
turned on in your preferences.
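For anyone who wants to see what a PNG of a black and white page looks like at the byte level, here is a minimal sketch using only the Python standard library. It writes a grayscale PNG at bit depth 1 and uses filter type 0 (None) on every row, which is my assumption rather than the filtered setting suggested above; filtering tends to matter more at 8 bits per sample. The helper names and the synthetic striped "page" are made up for illustration, and this is not a substitute for a real image library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_bilevel_png(rows):
    """Encode rows of 0/1 pixels (0 = black, 1 = white) as a 1-bit PNG."""
    height, width = len(rows), len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) for this scanline
        for x in range(0, width, 8):
            group = row[x:x + 8]
            byte = 0
            for bit in group:
                byte = (byte << 1) | bit  # pack pixels most significant bit first
            raw.append(byte << (8 - len(group)))  # pad a short final byte
    # IHDR: width, height, bit depth 1, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    idat = zlib.compress(bytes(raw), 9)
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))


# Synthetic test page: dashed black "text lines" on a white background.
width, height = 640, 480
page = [[0 if (y % 16 < 6 and 40 <= x < 600 and x % 8 < 5) else 1
         for x in range(width)] for y in range(height)]
png = encode_bilevel_png(page)
raw_size = ((width + 7) // 8) * height
print(len(png), "bytes as PNG vs", raw_size, "bytes uncompressed")
```

A real scan is far less regular than this synthetic page, so the ratio printed here says nothing about what a scanned page will achieve.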
-jc
--
* -jc IS *NOW* feld...@cryogen.com
* Home page: http://members.tripod.com/~afeldspar/index.html
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby
Hi!
I carried out research work on the topic "Development of lossless image
compression algorithm". As a result of this work, a new algorithm was
developed. It was tried on a scanned A4 text page and gave the best
results (I have no DejaVu compressor):
Format               Size (bytes)
TIFF Fax                  125,694
TIFF Fax + pkzip          114,146
PNG                       115,278
PNG + pkzip               111,495
My format                  99,046
My format + pkzip          98,970
It's not 20-25 Kbytes, but it's better than TIFF or PNG.
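To put those numbers in perspective, a short sketch of the arithmetic follows. The sizes are copied from the list above; the 25 KB target is taken as 25 * 1024 bytes, which is an assumption about how the original poster counts kilobytes, and the helper name is illustrative.

```python
sizes = {
    "TIFF Fax": 125_694,
    "TIFF Fax + pkzip": 114_146,
    "PNG": 115_278,
    "PNG + pkzip": 111_495,
    "My format": 99_046,
    "My format + pkzip": 98_970,
}


def percent_smaller(new, old):
    """How much smaller `new` is than `old`, as a percentage of `old`."""
    return 100.0 * (1.0 - new / old)


best = sizes["My format + pkzip"]
target = 25 * 1024  # assumed: upper end of the 20-25 KB goal, 1 KB = 1024 bytes

for name, size in sizes.items():
    print(f"{name:<20}{size:>9,}  best is {percent_smaller(best, size):4.1f}% smaller")
print(f"best result is still {best / target:.1f}x the target size")
```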
For details e-mail zh...@mail.ru
With respect
Sergey Zherzdev.
> I have a practical compression problem that I can't find a solution
>to. I need to compress (quickly) 300 dpi scanned images of black and
>white typewritten or printed text down to about 20-25Kb or less. And on
>the other end, the decompressed image must be readable with a commercial
>image viewer (Windows utility or Internet browser).
One possible solution is to use lossy JBIG2. Depending on the complexity
of your page images, 20-25K sounds feasible. One attribute of JBIG2 is
that you can compress *multiple pages* at once, improving the compression
ratio as more pages are added. For example, one 6-page document I've
experimented with (each page is a 300dpi scanned image of typewritten text)
compresses down to about 25K total - under 5K per page. Compressing each
page separately yields a 35K file.
Note that this is lossy compression - there is a chance (depending on how
good the compressor is) that you may notice problems in the decompressed
images. However, it may be the only way to get the pages down to 20-25K.
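The multi-page gain mentioned above can be illustrated by analogy. The sketch below is not JBIG2: it uses zlib on synthetic text as a stand-in, on the assumption that pages sharing a vocabulary behave loosely like scanned pages sharing letter shapes. The word list, page length, and seed are all made up for the demonstration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
letters = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical shared vocabulary, standing in for the glyph shapes
# that recur across the pages of one document.
vocabulary = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
              for _ in range(200)]

# Six synthetic "pages", each 400 words drawn from that vocabulary.
pages = [" ".join(rng.choice(vocabulary) for _ in range(400)).encode("ascii")
         for _ in range(6)]

# Compress each page on its own, then all pages as one stream.
separate = sum(len(zlib.compress(page, 9)) for page in pages)
together = len(zlib.compress(b"\n".join(pages), 9))

print("separately:", separate, "bytes")
print("together:  ", together, "bytes")
```

Compressed as one stream, later pages can refer back to material already seen in earlier ones, so the combined output comes out smaller than the sum of the separate outputs. The same idea, applied to character bitmaps rather than bytes, is what makes compressing several pages at once pay off.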
The time taken to encode JBIG2, for the compressor I use, is on the order
of 0.5-5 seconds per page, varying depending on the page complexity.
Decoding is much faster - as low as 0.1 seconds per page.
There aren't currently any commercial viewers that can read JBIG2 files,
but this situation is improving. Source code for a JBIG2 decoder is
available at http://spmg.ece.ubc.ca/jbig2/ .
Note that JBIG2 has not been officially approved as a standard yet, and
there's some chance that the format could change slightly between now and
approval (scheduled for March 2000). The JBIG committee will keep changes,
if any, to a minimum.
> When I do a second compression on the .tif file or the jpeg file
>using AT&T's DejaVu compression software, the file reduces to about
>55Kb. This is the best I've been able to achieve with readily available
>utilities, but it isn't small enough for my purposes. Also, the DejaVu
>engine and the overall 2 step compression process are very slow. Hence,
>my challenge.
AT&T's DjVu compressor uses a precursor to JBIG2, as does the compression
in ScanSoft's Pagis Pro. Neither of these compressors currently exploits
the cross-page capabilities of JBIG2, and neither of them is compatible
with the final form of JBIG2 (nor each other).
> Can anyone offer any advice/guidance/suggestions? I'm willing
>to look at beta software or pure research if available. Thanks in
>advance for any help.
> Jason Wallace
> Voice: (310) 546-9430 (California)
I hope that this has helped. As a member of the JBIG committee and one of
the people behind JBIG2, I hope that it meets your needs. If you like, I'd
be willing to compress some of your documents with JBIG2, let you know
how quickly they compress and how small they get, and send the decompressed
images back to you so you can tell if the loss introduced is acceptable.
--
William Rucklidge ruck...@parc.xerox.com
Xerox Palo Alto Research Center
Haiko - student at TU-Chemnitz, Germany