File format


jerome

Apr 2, 2008, 11:04:32 AM
to Hypertable Development
Hi,
Do you have any papers or documentation on the internal file format
that you're using?
Is there any comparison with the HBase file format?
Thanks in advance,
Jerome.

Doug Judd

Apr 2, 2008, 12:54:06 PM
to hyperta...@googlegroups.com
Hi Jerome,

Currently, we don't have documentation of the various file formats, but this is an excellent idea.  I've filed an issue on it (#95).

Last we chatted with the HBase folks, they were using the MapFile class that comes with Hadoop, but they were having some trouble with it.  I think they may come up with their own format at some point.  We talked about trying to standardize, but our key format is sufficiently different from the HBase key format that we decided it was not worth it.

- Doug

jerome

Apr 2, 2008, 1:47:01 PM
to Hypertable Development
Even if there's no formal documentation, can you point me to the right
set of source files?
Thanks,
Jerome.

Doug Judd

Apr 2, 2008, 2:01:50 PM
to hyperta...@googlegroups.com
I'm assuming you're interested in the CellStore format.  As a brief overview, a CellStore file is laid out in this order:

compressed block of key/value pairs
compressed block of key/value pairs
compressed block of key/value pairs
[...]
fixed index
variable index
optional bloom filter
trailer

The code that writes this format is in src/cc/Hypertable/RangeServer/CellStoreV0.cc.  The methods that do the writing are add() and finalize().  Each compressed block starts with a header, which is written by the logic in src/cc/Hypertable/RangeServer/BlockCompressionHeaderCellStore.cc/h.  The trailer-writing logic is encapsulated in the class CellStoreTrailerV0, in the files src/cc/Hypertable/RangeServer/CellStoreTrailerV0.cc/h.
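
To make the ordering concrete, here is a rough sketch of a writer that emits the sections in the same sequence, with the same add()/finalize() split.  It is not the real CellStoreV0 code.  The names SketchCellStoreWriter, Trailer, put_u32, get_u32, put_str, and read_trailer are made up for illustration, as are the 4-byte length used as a stand-in block header, the contents of the two indexes, and the trailer fields.  Compression is skipped, the bloom filter is left out, and keys are assumed to arrive in sorted order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: every field width and section layout here is an
// assumption, not the actual on-disk CellStoreV0 format.
using Bytes = std::string;

// Append a 32-bit value in little-endian byte order.
void put_u32(Bytes &out, uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Read a little-endian 32-bit value starting at byte position pos.
uint32_t get_u32(const Bytes &in, size_t pos) {
  assert(pos + 4 <= in.size());
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  return v;
}

// Append a length-prefixed string.
void put_str(Bytes &out, const std::string &s) {
  put_u32(out, static_cast<uint32_t>(s.size()));
  out += s;
}

struct Trailer {                 // hypothetical trailer fields
  uint32_t fixed_index_offset;   // where the fixed index starts
  uint32_t var_index_offset;     // where the variable index starts
  uint32_t filter_offset;        // 0 means "no bloom filter"
  uint32_t block_count;
};
constexpr size_t kTrailerSize = 16;

class SketchCellStoreWriter {
 public:
  explicit SketchCellStoreWriter(size_t block_size) : block_size_(block_size) {}

  // Buffer one key/value pair; flush once the block reaches block_size_.
  void add(const std::string &key, const std::string &value) {
    put_str(block_, key);
    put_str(block_, value);
    last_key_ = key;
    if (block_.size() >= block_size_) flush_block();
  }

  // Flush the last block, then append fixed index, variable index, trailer.
  Bytes finalize() {
    if (!block_.empty()) flush_block();
    Trailer t;
    t.fixed_index_offset = static_cast<uint32_t>(file_.size());
    for (const auto &e : index_) put_u32(file_, e.first);   // block offsets
    t.var_index_offset = static_cast<uint32_t>(file_.size());
    for (const auto &e : index_) put_str(file_, e.second);  // last key per block
    t.filter_offset = 0;  // optional bloom filter omitted in this sketch
    t.block_count = static_cast<uint32_t>(index_.size());
    put_u32(file_, t.fixed_index_offset);
    put_u32(file_, t.var_index_offset);
    put_u32(file_, t.filter_offset);
    put_u32(file_, t.block_count);
    return file_;
  }

 private:
  void flush_block() {
    index_.emplace_back(static_cast<uint32_t>(file_.size()), last_key_);
    put_u32(file_, static_cast<uint32_t>(block_.size()));  // stand-in header
    file_ += block_;  // a real writer would compress the payload first
    block_.clear();
  }

  size_t block_size_;
  Bytes block_, file_;
  std::string last_key_;
  std::vector<std::pair<uint32_t, std::string>> index_;
};

// The trailer sits at a fixed distance from the end of the file.
Trailer read_trailer(const Bytes &file) {
  assert(file.size() >= kTrailerSize);
  size_t p = file.size() - kTrailerSize;
  return Trailer{get_u32(file, p), get_u32(file, p + 4),
                 get_u32(file, p + 8), get_u32(file, p + 12)};
}
```

In this sketch a reader would start from the end of the file: read_trailer() finds the index offsets, and the fixed index then gives the starting offset of each block.  That start-from-the-trailer idea is a common reason for putting a fixed-size trailer last, but whether the real reader works exactly this way is something to check in the source files named above.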

- Doug

jerome

Apr 2, 2008, 2:11:57 PM
to Hypertable Development
Thanks!
