HIDATA compression

Roy Hann

Nov 30, 2008, 7:29:10 AM

What is the current thinking on HIDATA compression?

I have lately been converted to the performance benefits of using
COMPRESSION=DATA, but COMPRESSION=HIDATA seems unimpressive.

I don't feel like digging into the code just now; is it done with
Lempel-Ziv-Welch compression (which is what I was once told), or is it
done with a faster technique now? (Or will it one day?)

--
Roy

UK Ingres User Association Conference 2009 will be on Tuesday June 9, 2009
Go to http://www.iua.org.uk/join to get on the mailing list.

Laframboise, André

Nov 30, 2008, 2:09:21 PM

to Ingres and related product discussion forum

We used to use HIDATA a lot when storage was expensive, and also because of the old 2GB file limit.
I was also told it was LZW based, and I believed it when I saw the CPU overhead.

Now that storage is relatively low cost, the space savings no longer outweigh the CPU resources
required to support it.
It's better to let the CPUs do some real work than have them spend 80% of their time compressing and decompressing.

My $0.02

Andre

_______________________________________________
Info-Ingres mailing list
Info-...@kettleriverconsulting.com
http://www.kettleriverconsulting.com/mailman/listinfo/info-ingres

Karl & Betty Schendel

Dec 1, 2008, 7:37:19 AM

to Ingres and related product discussion forum

On Nov 30, 2008, at 7:29 AM, Roy Hann wrote:

> What is the current thinking on HIDATA compression?
>
> I have lately been converted to the performance benefits of using
> COMPRESSION=DATA, but COMPRESSION=HIDATA seems unimpressive.
>
> I don't feel like digging into the code just now; is it done with
> Lempel-Ziv-Welch compression (which is what I was once told), or is it
> done with a faster technique now? (Or will it one day?)

Yes, it's an LZW style symbol-replacement compressor. Or so
the code claims.

I am not impressed by HIDATA. The typical row is too short to
compress well with LZW because of the dictionary overhead.
HIDATA requires an extra leading byte that says whether or
not that particular row could be compressed, so in the worst
case, the table gets bigger rather than smaller. And yes,
it's extremely CPU intensive. For very long rows in archival
tables it might be a win.
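
To make the short-row point concrete, here is a minimal Python sketch. It is not the Ingres HIDATA code: it is a textbook LZW encoder, and the fixed 12-bit code width, the one-byte "compressed or not" flag layout, and the function names are all assumptions chosen for illustration. It only counts how many bytes a row would occupy, falling back to the raw row when LZW does not help.

```python
def lzw_codes(data: bytes, code_bits: int = 12) -> list[int]:
    """Textbook LZW: return the dictionary codes emitted for `data`."""
    # The dictionary starts with all 256 single-byte strings.
    table = {bytes([i]): i for i in range(256)}
    limit = 1 << code_bits          # assumed fixed code width
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate     # keep extending the current match
        else:
            codes.append(table[current])
            if len(table) < limit:  # stop growing once codes run out
                table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def stored_size(row: bytes, code_bits: int = 12) -> int:
    """Bytes to store one row: 1 flag byte plus the smaller of raw or LZW payload."""
    packed = (len(lzw_codes(row, code_bits)) * code_bits + 7) // 8
    return 1 + min(len(row), packed)


if __name__ == "__main__":
    short_row = b"Smith     42"       # 12 bytes, typical short row
    long_row = b"ABCD" * 500          # 2000 bytes, highly repetitive
    print(len(short_row), stored_size(short_row))
    print(len(long_row), stored_size(long_row))
```

Under these assumptions the 12-byte row produces 10 codes, which pack into 15 bytes, so it is stored raw and ends up at 13 bytes with the flag byte: one byte bigger than uncompressed. Only the long, repetitive row shrinks.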

I would like to see a simpler run-length compressor that
is more intelligent than standard trailing-blank compression.
I do have a code candidate, although it was written for
hash-join spill file compression rather than DMF row
compression. With a bit of fiddling to add NULL inspection,
which can be a big win when a table is defined WITH NULL,
the code might work OK for row compression. I have been
meaning to fool with that at some point.
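
As a rough illustration of the kind of run-length scheme meant here, the following Python sketch collapses runs of any repeated byte anywhere in the row, not just trailing blanks. It is not the hash-join spill code mentioned above; the control-byte layout, the constants, and the function names are assumptions made up for this example, and NULL inspection is left out.

```python
MIN_RUN = 3                 # shorter runs are cheaper to store as literals
MAX_RUN = MIN_RUN + 127     # longest run one control byte can describe
MAX_LIT = 128               # longest literal chunk per control byte


def rle_compress(row: bytes) -> bytes:
    """Encode `row` as literal chunks (control 0x00-0x7F) and runs (0x80-0xFF)."""
    out = bytearray()
    literal = bytearray()

    def flush_literal() -> None:
        # Emit pending literal bytes in chunks of at most MAX_LIT.
        for start in range(0, len(literal), MAX_LIT):
            chunk = literal[start:start + MAX_LIT]
            out.append(len(chunk) - 1)
            out.extend(chunk)
        literal.clear()

    i = 0
    while i < len(row):
        run = 1
        while i + run < len(row) and row[i + run] == row[i] and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            flush_literal()
            out.append(0x80 + run - MIN_RUN)   # run length in the control byte
            out.append(row[i])                 # the repeated byte itself
        else:
            literal.extend(row[i:i + run])
        i += run
    flush_literal()
    return bytes(out)


def rle_decompress(blob: bytes) -> bytes:
    """Inverse of rle_compress."""
    out = bytearray()
    i = 0
    while i < len(blob):
        control = blob[i]
        if control < 0x80:                     # literal chunk follows
            count = control + 1
            out.extend(blob[i + 1:i + 1 + count])
            i += 1 + count
        else:                                  # repeated byte follows
            out.extend(blob[i + 1:i + 2] * (control - 0x80 + MIN_RUN))
            i += 2
    return bytes(out)


if __name__ == "__main__":
    # A padded char column, a zeroed 8-byte field, then a short value.
    row = b"Smith" + b" " * 15 + b"\x00" * 8 + b"42"
    packed = rle_compress(row)
    print(len(row), len(packed), rle_decompress(packed) == row)
```

The per-row cost here is one linear pass with no dictionary to build, which is the appeal compared with LZW. Blank padding and zero-filled fields in the middle of a row collapse to two bytes each, while a row with no runs grows by one control byte per 128 bytes.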

Karl
