Hi Jerome,
Currently, we don't have documentation of the various file formats, but this is an excellent idea. I've filed an issue on it (#95).
Last we chatted with the HBase folks, they were using the MapFile class that comes with Hadoop, but they were having some trouble with it. I think they may come up with their own format at some point. We talked about trying to standardize, but our key format is sufficiently different from the HBase key format that we decided it was not worth it.
- Doug
On Wed, Apr 2, 2008 at 8:04 AM, jerome <b.jero...@gmail.com> wrote:
> Hi,
> I wonder if you have some papers on the internal file format that
> you're using.
> Any comparison with HBase file format?
> Thanks in advance,
> Jerome.
I'm assuming you're interested in the CellStore format. As a brief overview, the format looks like this:
compressed block of key/value pairs
compressed block of key/value pairs
compressed block of key/value pairs
[...]
fixed index
variable index
optional bloom filter
trailer
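To make the ordering above concrete, here is a toy Python sketch of a writer that emits sections in that order. It is only an illustration under stated assumptions, not the CellStoreV0 encoding: the class name ToyCellStoreWriter, the use of zlib, the little-endian 32-bit length prefixes, the block header fields, the trailer fields, and the guess that the variable index holds one (last key, block offset) entry per block while the fixed index holds fixed-width offsets into the variable index are all hypothetical. The method names add() and finalize() are borrowed from this thread, but the bodies are not Hypertable's code. The optional bloom filter is left out.

```python
import struct
import zlib


class ToyCellStoreWriter:
    """Hypothetical writer that mirrors only the section ORDER of the overview.

    Every byte-level detail here is an assumption made for illustration.
    """

    def __init__(self, block_size=64):
        self.block_size = block_size  # flush threshold, in uncompressed bytes
        self.out = bytearray()        # stands in for the output file
        self.pending = bytearray()    # uncompressed pairs of the current block
        self.last_key = b""
        self.index = []               # (last key in block, block start offset)

    def add(self, key: bytes, value: bytes) -> None:
        # Length-prefixed key and value; callers are expected to add keys in
        # sorted order (this sketch does not check that).
        self.pending += struct.pack("<II", len(key), len(value)) + key + value
        self.last_key = key
        if len(self.pending) >= self.block_size:
            self._flush_block()

    def _flush_block(self) -> None:
        if not self.pending:
            return
        payload = zlib.compress(bytes(self.pending))
        self.index.append((self.last_key, len(self.out)))
        # Toy block header: uncompressed length, then compressed length.
        self.out += struct.pack("<II", len(self.pending), len(payload))
        self.out += payload
        self.pending.clear()

    def finalize(self) -> bytes:
        self._flush_block()
        fixed = bytearray()
        variable = bytearray()
        for key, offset in self.index:
            # Fixed index entry: where this block's variable entry starts.
            fixed += struct.pack("<I", len(variable))
            # Variable index entry: key length, block offset, key bytes.
            variable += struct.pack("<II", len(key), offset) + key
        fixed_offset = len(self.out)
        self.out += fixed
        var_offset = len(self.out)
        self.out += variable
        # Toy trailer: offsets of both indexes plus the block count.
        self.out += struct.pack("<III", fixed_offset, var_offset, len(self.index))
        return bytes(self.out)


writer = ToyCellStoreWriter(block_size=32)
for i in range(10):
    writer.add(b"row%02d" % i, b"value%02d" % i)
data = writer.finalize()

# A reader would start from the trailer at the end of the file.
fixed_offset, var_offset, block_count = struct.unpack("<III", data[-12:])
```

Putting the trailer last means a reader can seek to the end of the file, find both indexes from there, and then jump straight to the one compressed block that may contain a given key.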
The code that writes this format is in src/cc/Hypertable/RangeServer/CellStoreV0.cc. The methods that do the writing are add() and finalize(). Each compressed block starts with a header, which is written by the logic in src/cc/Hypertable/RangeServer/BlockCompressionHeaderCellStore.cc/h. The trailer-writing logic is encapsulated in the class CellStoreTrailerV0, in the files src/cc/Hypertable/RangeServer/CellStoreTrailerV0.cc/h.
- Doug
On Wed, Apr 2, 2008 at 10:47 AM, jerome <b.jero...@gmail.com> wrote:
> even if there's no formal documentation, can you point me to the right
> set of files?
> Thanks,
> Jerome.
> On Apr 2, 12:54 pm, "Doug Judd" <d...@zvents.com> wrote:
> > Hi Jerome,