ColumnReader interface extension


David Gruzman

Mar 21, 2011, 9:59:28 AM
to opend...@googlegroups.com
Hi all,

As we discussed, making a method call for each value is inefficient, so we have added new fillXXXValues methods to the ColumnReaderInterface.
Each of these methods fills arrays of values, repetition levels, and isNull flags in a single call.
The current implementation is not high-performance, but it will be replaced with an efficient one. Please start using the new interface while we
work on that implementation.
So far only fillIntValues is implemented.
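
To make the idea concrete, here is a minimal sketch of what a batched fill could look like when it is simply layered on top of per-value calls, which is roughly what a first, non-optimized implementation would do. The signature of fillIntValues, the IntColumnReader interface, and the ArrayIntColumnReader helper are all assumptions made for illustration; they are not copied from the project's ColumnReaderInterface.

```java
import java.util.Arrays;

class BatchFillSketch {

    /** Hypothetical per-value reader; these method names are assumptions. */
    interface IntColumnReader {
        boolean next();           // advance to the next entry; false at end of column
        int getIntValue();        // value of the current entry (meaningless if isNull())
        int getRepetitionLevel(); // repetition level of the current entry
        boolean isNull();         // true if the current entry is NULL
    }

    /**
     * Naive batched fill: one call fills up to values.length entries.
     * Returns the number of entries written; 0 means the column is exhausted.
     */
    static int fillIntValues(IntColumnReader reader, int[] values,
                             int[] repetitionLevels, boolean[] isNull) {
        int limit = Math.min(values.length,
                Math.min(repetitionLevels.length, isNull.length));
        int n = 0;
        // Check the limit first so that no entry is consumed and then dropped.
        while (n < limit && reader.next()) {
            isNull[n] = reader.isNull();
            values[n] = isNull[n] ? 0 : reader.getIntValue();
            repetitionLevels[n] = reader.getRepetitionLevel();
            n++;
        }
        return n;
    }

    /** In-memory reader used only to exercise the sketch; null models NULL. */
    static final class ArrayIntColumnReader implements IntColumnReader {
        private final Integer[] data;
        private final int[] levels;
        private int pos = -1;

        ArrayIntColumnReader(Integer[] data, int[] levels) {
            this.data = data;
            this.levels = levels;
        }

        public boolean next() {
            if (pos + 1 >= data.length) {
                return false;
            }
            pos++;
            return true;
        }

        public int getIntValue() { return data[pos]; }
        public int getRepetitionLevel() { return levels[pos]; }
        public boolean isNull() { return data[pos] == null; }
    }

    public static void main(String[] args) {
        IntColumnReader reader = new ArrayIntColumnReader(
                new Integer[] {7, null, 9}, new int[] {0, 1, 1});
        int[] values = new int[2];
        int[] levels = new int[2];
        boolean[] nulls = new boolean[2];
        int n;
        // Only the first n slots of each array are valid after a call.
        while ((n = fillIntValues(reader, values, levels, nulls)) > 0) {
            System.out.println(n + " values: " + Arrays.toString(Arrays.copyOf(values, n))
                    + " nulls: " + Arrays.toString(Arrays.copyOf(nulls, n)));
        }
    }
}
```

Callers written against a batch method of this shape would not need to change when the per-value loop inside it is later replaced by something faster.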
With best regards,

David

Constantine Peresypkin

Mar 26, 2011, 3:01:11 PM
to dremel
I ran a lot of tests over the last two days, and I don't see how this will help.
The performance impact of calling a method for each byte/int stored or read
is negligible if measured correctly.
The performance problems of MappedByteBuffer run much deeper than
that: needless endianness checks, int-to-byte conversions, shuffling,
etc.
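
For reference, a minimal sketch of one standard java.nio way to sidestep per-value byte-order handling: view the buffer as an IntBuffer in native byte order and move ints in bulk. It uses a direct ByteBuffer instead of a real mapped file so that it is self-contained; MappedByteBuffer is a subclass of ByteBuffer, so the same calls apply to it. The class and method names are made up for illustration, and whether this is actually faster has to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class BulkIntReadSketch {

    /** Writes all ints starting at buf's current position, in native byte order. */
    static void writeInts(ByteBuffer buf, int[] values) {
        // duplicate() shares the bytes but keeps the caller's position untouched.
        buf.duplicate().order(ByteOrder.nativeOrder()).asIntBuffer().put(values);
    }

    /** Reads count ints starting at buf's current position with one bulk get. */
    static int[] readInts(ByteBuffer buf, int count) {
        IntBuffer ints = buf.duplicate().order(ByteOrder.nativeOrder()).asIntBuffer();
        int[] out = new int[count];
        ints.get(out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 * Integer.BYTES);
        writeInts(buf, new int[] {1, -2, 3, 4});
        for (int v : readInts(buf, 4)) {
            System.out.println(v);
        }
    }
}
```

Note that data written in native order is only portable between machines with the same endianness, so a real file format would have to fix one byte order.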

I have put together example classes (SimpleIntColumnReader/Writer) in
my last commit that can be studied for performance.
So far I get good ratios with bit + dictionary compression for
levels, but I still don't see any good improvement on data (apart
from null suppression, which is also implemented).
I also have a problem with measuring mmapped in-memory buffers
without reading/writing the actual file; do we really need it?
If the dataset is huge, I think we need to trade HDD speed
(~50-100 MB/s) for compression (CPU time); that way compression will
give us a real speedup.
Compressing/decompressing in-memory data will only slow things down,
no matter how well you do it.

P.S. Java does not have unsigned bytes. I will have nightmares
today...
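
The usual workaround is to widen each byte to an int and mask off the sign extension. A minimal sketch, with helper names that are made up for illustration:

```java
class UnsignedByteSketch {

    /** Interprets a Java (signed) byte as an unsigned value in 0..255. */
    static int toUnsigned(byte b) {
        // Widening to int sign-extends; the mask keeps only the low 8 bits.
        return b & 0xFF;
    }

    /** Stores a value in 0..255 into a byte; the bit pattern is preserved. */
    static byte fromUnsigned(int v) {
        if (v < 0 || v > 0xFF) {
            throw new IllegalArgumentException("out of range: " + v);
        }
        return (byte) v;
    }

    public static void main(String[] args) {
        byte stored = fromUnsigned(200);
        System.out.println(stored);             // signed view of the same bits
        System.out.println(toUnsigned(stored)); // unsigned view
    }
}
```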