page size constraints


Justin

Jul 28, 2010, 4:38:31 PM
to hawtdb
I am interested in using tiny keys and very large values (~40-50k).
I also cannot use page chaining. I see that the page size is a short
(max ~32k bytes). Is this going to be a showstopper for me?
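
For context, a minimal sketch of the constraint as I understand it (the
factory and method names below are my best recollection of the API, not
verified):

  import java.io.File;
  import org.fusesource.hawtdb.api.PageFileFactory;

  public class PageSizeDemo {
      public static void main(String[] args) {
          PageFileFactory factory = new PageFileFactory();
          factory.setFile(new File("values.db"));
          // The page size is carried as a short, so Short.MAX_VALUE
          // (32767 bytes) is the hard ceiling -- smaller than my
          // ~40-50k values.
          factory.setPageSize(Short.MAX_VALUE);
          factory.open();
          System.out.println("max page size: " + Short.MAX_VALUE);
          factory.close();
      }
  }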

Hiram Chirino

Jul 28, 2010, 11:04:55 PM
to haw...@googlegroups.com
Your best option is to allocate the values directly as pages in the
page file and then use the page locations as the values in the index.
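
A rough sketch of what I mean (all HawtDB method names below are from
memory; treat them as assumptions rather than a verified API):

  import org.fusesource.hawtbuf.Buffer;
  import org.fusesource.hawtdb.api.Index;
  import org.fusesource.hawtdb.api.Transaction;

  public class ValueStore {
      // Write a large value straight into page-file pages and record
      // its first page number as the value in the index.
      static void store(Transaction tx, Index<Long, Integer> index,
                        long key, byte[] value) {
          int pageCount = tx.pages(value.length);          // pages needed for a ~40-50k value
          int firstPage = tx.allocator().alloc(pageCount); // one call for a contiguous run
                                                           // (assuming the allocator hands back
                                                           // adjacent pages)
          tx.write(firstPage, new Buffer(value));          // assumes write() spans the run
          index.put(key, firstPage);                       // key -> page location
      }
  }

Reading a value back is the mirror image: look up the page number in
the index, then read the pages at that location.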

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://fusesource.com/

Justin

Jul 29, 2010, 11:46:14 AM
to hawtdb
Thank you, yes that's what I plan on. It seems that I can force pages
to be allocated contiguously using a custom allocator if I need to.

Q: How can I distinguish index pages created by the IndexFactory<Long,
Integer> from the data pages, which I allocate?
I want to do this simply by looking at one page from C++ (i.e. not
having to scan the index from the root to exclude its pages).

From what I can tell:
HashIndex pages use the magic: { 'h', 'a', 's', 'h' }
BTreeIndex pages use the magic: { 'x' }

I suppose I can just use a magic { 'c' } for custom and all will be
well.
Q: It is not yet clear to me what the simplest way is to ensure my
custom magic gets written into my custom pages -- any suggestions?

btw, it seems much has changed from KahaDB:
- this API looks a bit more sophisticated, though it comes with some
complexity; the NIO support seems really nice
- the checksum on each page is gone -- removing it seems necessary now
that you support slices
- there is no Page class (it has been replaced by Extent)

Justin

Jul 29, 2010, 11:50:52 AM
to hawtdb
I think I found it; I seem to want to extend
AbstractStreamPagedAccessor.
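
Assuming its contract is roughly an encode/decode pair over data
streams (I haven't verified this, and the package path below is a
guess), something like this might do it:

  import java.io.DataInputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import org.fusesource.hawtdb.api.Paged;
  import org.fusesource.hawtdb.internal.page.AbstractStreamPagedAccessor; // path assumed

  public class BlobAccessor extends AbstractStreamPagedAccessor<byte[]> {
      // Custom magic; this lands after whatever header the paging
      // layer itself writes.
      private static final byte MAGIC = 'c';

      @Override
      protected void encode(Paged paged, DataOutputStream os, byte[] data)
              throws IOException {
          os.writeByte(MAGIC);
          os.writeInt(data.length);
          os.write(data);
      }

      @Override
      protected byte[] decode(Paged paged, DataInputStream is)
              throws IOException {
          if (is.readByte() != MAGIC) {
              throw new IOException("not one of my data pages");
          }
          byte[] data = new byte[is.readInt()];
          is.readFully(data);
          return data;
      }
  }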

Hiram Chirino

Jul 29, 2010, 1:58:47 PM
to haw...@googlegroups.com
On Thu, Jul 29, 2010 at 11:46 AM, Justin <justin....@gmail.com> wrote:
> Thank you, yes that's what I plan on.  It seems that I can force pages
> to be allocated contiguously using a custom allocator if I need to.
>

perfect.

> Q: How can I distinguish index pages created by the IndexFactory<Long,
> Integer> from the data pages, which I allocate?
> I want to do this simply by looking at one page from C++ (i.e. not
> having to scan the index from the root to exclude its pages).
>
> From what I can tell:
>  HashIndex pages use the magic: { 'h', 'a', 's', 'h' }
>  BTreeIndex pages use the magic: { 'x' }
>

Actually, the 'x' means it's part of an eXtent, which can span multiple
contiguous pages. BTree pages add additional magic after the extent
header.
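
So a one-page check from the outside (sketched in Java here to keep the
thread in one language; anything past the first byte is an assumption
you'd want to verify against the Extent source) would be roughly:

  import java.io.IOException;
  import java.io.RandomAccessFile;

  public class PageInspector {
      // Classify a page by its leading magic bytes.
      static String classify(RandomAccessFile file, long pageOffset)
              throws IOException {
          byte[] head = new byte[4];
          file.seek(pageOffset);
          file.readFully(head);
          if (head[0] == 'h' && head[1] == 'a'
                  && head[2] == 's' && head[3] == 'h') {
              return "hash index page";
          }
          if (head[0] == 'x') {
              // An extent; the owner's magic (BTree vs. a custom 'c')
              // follows the extent header, so a second read at that
              // offset is needed to tell them apart.
              return "extent page";
          }
          return "raw data or free page";
      }
  }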

> I suppose I can just use a magic { 'c' } for custom and all will be
> well.
> Q: It is not yet clear to me what the simplest way is to ensure my
> custom magic gets written into my custom pages -- any suggestions?
>
> btw, it seems much has changed from KahaDB:
> - this API looks a bit more sophisticated, though it comes with some
> complexity; the NIO support seems really nice

yep. The paging layer is what changed drastically.

> - the checksum on each page is gone -- removing it seems necessary
> now that you support slices

I was hoping to just track the checksums in the metadata associated
with the redo records.

> - there is no Page class (it has been replaced by Extent)

That's one way to look at it.

Justin

Jul 29, 2010, 2:15:48 PM
to hawtdb
> I was hoping to just track the checksums in the metadata associated
> with the redo records.

Perhaps I don't fully understand how the paging works, but with
slices, it seems like enforcing/validating checksums requires full
serialization/deserialization of each page:

From the API it seems like you can just update part of a page -- for
example, let's say I have a page index and some data on each page:

pre-image on disk: [ [page-index][page-data-1][page-data-2][page-data-3] ]

now I do an update:

  MyIndex idx = tx.get(m_myIndexAccessor, pageNumber); // only reads the index
  idx.set(2, pageData2);                               // replace one entry
  m_writeAccessor.write(tx, pageNumber, idx);          // updates part of the index;
                                                       // only writes to the middle of the page
  tx.flush();

post-image on disk: [ [page-index*][page-data-1][page-data-2*][page-data-3] ] // ranges with * have been modified

Computing the new checksum would require you to fully read the
post-image of the page first.
I don't know if you're already doing full page reads when accessing any
page, but you will definitely have to if you want to have a checksum.

Hiram Chirino

Jul 29, 2010, 3:27:44 PM
to haw...@googlegroups.com

So I'm specifically talking about using checksums in the transactional
page file case. In that case the disk layout looks more like:

[[tx-file-header][page-index][page-data-1][page-data-2][page-data-3][page-index shadow][page-data-2 shadow][shadow-updates-list]]

The big difference with the transactional page file is that we never
directly update existing pages, since the updates can be rolled back
and there might be threads still doing read operations against the
original pages.

So the page updates are stored at new 'shadow' page locations. We
track all the committed updates and their associated shadow page
locations in an update list. That list basically tracks the shadow
page location and the original page location it's updating. When I
was talking about tracking checksums, I meant that the checksum of
the shadow page should get added as part of that info. That way,
during recovery, we can validate that all shadow updates are intact.
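
As a minimal illustration of one entry in that update list (the field
layout here is illustrative only, not the actual on-disk format):

  import java.util.zip.CRC32;

  // Pairs the original page with its shadow location, plus a checksum
  // of the shadow page contents so recovery can verify the update is
  // intact before applying it.
  public class ShadowUpdate {
      final int originalPage; // page the committed update logically replaces
      final int shadowPage;   // where the new version was physically written
      final long checksum;    // CRC of the shadow page contents

      ShadowUpdate(int originalPage, int shadowPage, byte[] shadowContents) {
          this.originalPage = originalPage;
          this.shadowPage = shadowPage;
          this.checksum = crcOf(shadowContents);
      }

      // During recovery: re-read the shadow page and confirm it matches.
      boolean verify(byte[] shadowContents) {
          return crcOf(shadowContents) == checksum;
      }

      private static long crcOf(byte[] contents) {
          CRC32 crc = new CRC32();
          crc.update(contents);
          return crc.getValue();
      }
  }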
