Sharding based on a generated id

davidp4web

Jan 6, 2009, 2:52:02 PM

to Hibernate Shards Dev

Hey All,

I'm new to Hibernate Shards and am looking for a sanity check or a
clue. :-)

I've implemented a sharding strategy based on the primary key of the
core object in my database schema, Member. All other objects/tables
in the schema are dependent on Member. To make the shard selection
and resolution efficient, I implemented a
ShardEncodingIdentifierGenerator that extends Hibernate's
SequenceHiLoGenerator and encodes Member ID mod 128 as the shard ID.

This all works except for one detail. All Members end up in shard 0. I
think that the problem arises because of the generated ID for Member.
The shard selection happens before the ID is actually generated.

That's my theory, anyway. Has anyone run into a similar situation?
Or is my diagnosis just plain wrong?

Any guidance and/or examples would be most appreciated.

Thanks!

David

Max Ross

Jan 18, 2009, 4:49:48 PM

to hibernate-...@googlegroups.com

Hi David, thanks for writing and sorry for the slow response.

Sorry if this is a silly question but I just want to make sure I understand the problem you're facing. When you say that your id generator encodes Member ID mod 128 as the shard ID, is the Member ID you're referring to the id returned by the HiLoGenerator or are you modding the value of the id field on your Member object?

Perhaps if you send me the code for your id generator I can be of greater assistance. If you're not comfortable posting it to the group, feel free to email it to me directly (my address is all over the Shards codebase).

Max

David Pellegrini

Jan 19, 2009, 2:23:31 PM

to hibernate-...@googlegroups.com

Hi Max,

Thanks for your response.

The same id generator is used for Member and all of its dependent
objects. The dependent objects will necessarily have their memberId set
before persisting to the database, so at the time that we generate an id
for those objects we can ask them for their memberId and encode it into
the sequence value. In the case of Member, though, the memberId has not
been set before persisting the object. And therein lies the rub, I
think, because the choice of shard has already been made by the time the
id generator is invoked (as I see it -- somebody please prove me wrong).

Anyway, here's the relevant code.

public class AcmeSequenceHiLoGenerator
        extends SequenceHiLoGenerator
        implements ShardEncodingIdentifierGenerator
{
    public synchronized Serializable generate(SessionImplementor session, Object obj)
            throws HibernateException
    {
        if (!(obj instanceof Pojo)) {
            throw new IllegalArgumentException("Object must be an acme.Pojo.");
        }
        long val = ((Number) super.generate(session, obj)).longValue();
        // ensures that getMemberId() returns a value below
        if (obj instanceof Member) ((Member) obj).setMemberId(val);
        return AcmeShardedIdentifier.createShardedId(((Pojo) obj).getMemberId(), val);
    }
    ...
}

public class AcmeShardedIdentifier {

    private static long LOWER7BITS = 0x000000000000007FL;

    public static Long createShardedId(long memberId, long sequence)
    {
        if (sequence > LOWER56BITS) {
            throw new IllegalArgumentException();
        }
        return sequence | ((memberId & LOWER7BITS) << 56);
    }
    ...
}
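
For illustration, here is a minimal, self-contained sketch of the same bit layout together with the decoding side that a shard resolver would need. The class name ShardedIdCodec, the methods shardOf and sequenceOf, and the value given to LOWER_56_BITS are assumptions made for this sketch; the post above does not show how LOWER56BITS is defined or how the shard id is read back out.

```java
// Hypothetical sketch, not the poster's actual code.
final class ShardedIdCodec {
    private static final int SHARD_SHIFT = 56;
    private static final long LOWER_7_BITS = 0x7FL;
    // Assumed definition: the low 56 bits hold the sequence value.
    private static final long LOWER_56_BITS = (1L << SHARD_SHIFT) - 1;

    // Packs (memberId mod 128) into bits 56..62 and the sequence into bits 0..55.
    static long encode(long memberId, long sequence) {
        if (sequence < 0 || sequence > LOWER_56_BITS) {
            throw new IllegalArgumentException("sequence does not fit in 56 bits: " + sequence);
        }
        return sequence | ((memberId & LOWER_7_BITS) << SHARD_SHIFT);
    }

    // Recovers the shard number (0..127) from an encoded id.
    static int shardOf(long shardedId) {
        return (int) ((shardedId >>> SHARD_SHIFT) & LOWER_7_BITS);
    }

    // Recovers the original sequence value from an encoded id.
    static long sequenceOf(long shardedId) {
        return shardedId & LOWER_56_BITS;
    }
}
```

Because only 7 bits are shifted into positions 56 through 62, the sign bit stays clear and encoded ids remain positive.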

Unless you see a flaw in my interpretation, or have an alternative, I
think the workaround is to change the hibernate mapping for Member to
use an application-generated id. Then every member instance has an id
assigned by the time we determine the shard id. This is obviously less
than ideal, as I have to write Member-specific application code to get a
sequence from the database, set the member id, etc. ... you know, the
stuff I had Hibernate doing for free. :-/
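
As a rough sketch of that workaround, the idea is that the application assigns memberId before the object is handed to the session, so the value is already available when the shard is chosen. Everything here is hypothetical: the AtomicLong stands in for a database sequence, and newMember and selectShard are illustrative names rather than Hibernate Shards API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an application-assigned member id.
final class AssignedMemberIdSketch {
    // Stand-in for a database sequence; real code would query the database.
    private static final AtomicLong SEQUENCE = new AtomicLong(1000);

    static final class Member {
        private Long memberId;
        Long getMemberId() { return memberId; }
        void setMemberId(Long memberId) { this.memberId = memberId; }
    }

    // The application, not the id generator, sets the id up front.
    static Member newMember() {
        Member m = new Member();
        m.setMemberId(SEQUENCE.incrementAndGet());
        return m;
    }

    // Shard selection can now rely on the id already being present.
    static int selectShard(Member m) {
        if (m.getMemberId() == null) {
            throw new IllegalStateException("memberId must be assigned before shard selection");
        }
        return (int) (m.getMemberId() & 0x7FL);
    }
}
```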

This reveals a larger issue, perhaps, with the way that shards are
resolved. One would like an independent entity and all of its dependent
entities to be stored in the same shard so that the FK refs are intact.
That implies the following:
1. The shard resolution for the independent entity must be identical to
the one for its dependent entities.
2. The resolution strategy must be based on immutable data in the
independent entity.
3. Whenever resolving shards for the dependent entity instance, it must
have access to its related instance of the independent entity.
4. The resolution strategy should be cheap to compute, yet provide good
distribution.
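
The four points above can be sketched as a single resolution function shared by the independent entity and its dependents. This is only an illustration: Pojo and getMemberId() follow the names used in the code earlier in this message, while Address, resolveShard, and SHARD_COUNT are hypothetical.

```java
// Hypothetical sketch: one resolution rule for Member and its dependents.
final class CoLocationSketch {
    static final int SHARD_COUNT = 128;

    interface Pojo {
        long getMemberId();
    }

    // Independent entity: its own id is the immutable sharding key.
    static final class Member implements Pojo {
        private final long memberId;
        Member(long memberId) { this.memberId = memberId; }
        public long getMemberId() { return memberId; }
    }

    // Dependent entity: it reaches the key through its owning Member.
    static final class Address implements Pojo {
        private final Member owner;
        Address(Member owner) { this.owner = owner; }
        public long getMemberId() { return owner.getMemberId(); }
    }

    // Cheap to compute, and identical for every entity that shares a memberId.
    static int resolveShard(Pojo entity) {
        return (int) Math.floorMod(entity.getMemberId(), (long) SHARD_COUNT);
    }
}
```

With sequential member ids, the modulus spreads members evenly across the 128 shards, and a dependent always lands on the same shard as its Member.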

If shard selection were done _after_ the id was generated, that would
make life easier.

Thanks!

David
