Hi Danny,
I'm answering your question using my personal gmail and cc-ing the dev group so that others can benefit from the discussion. Hope you don't mind.
Great question! Writing to a master and reading from a slave is a common technique that definitely increases the capacity of a db-backed system. I've seen it done with Hibernate with good results a number of times. Making it very easy to accomplish this with Hibernate sounds worthwhile, but my initial feeling is that Shards isn't necessarily the right place for this. Shards' focus is on increasing scalability using horizontal partitioning, and the feature you're proposing is more about increasing scalability using replication. Shards assumes that data only lives on a single shard. This feature would assume that all data lives on all shards. If you implemented this it could certainly be used within Shards, and no doubt some projects would benefit from it. Whether you'd add the logic at the SessionFactory, the Session, or even the Connection layer...I'm not sure. But no matter which one you chose it would still plug in to Shards pretty seamlessly.
That said, we do need to think about replication within Shards because of The Static Data Problem. This has been eating at me for awhile, and if you're interested in working on this I'd love to hear your thoughts. Basically, it's pretty common for an application to have what I call static data: enums, country codes, etc. Tables that contain slow-changing data that is basically there to provide referential integrity. These tables pose a problem for Shards because they either need to live on a single shard (no referential integrity), or they need to live on all shards (violates our assumption that data lives on only one shard). I think referential integrity trumps our design assumptions, so we need to figure out how to deal with slow-changing data that lives on all shards.
One thought I had was to allow users to annotate the mapping for a class as "replicated." This would tell us that instances of this class live on all shards instead of just one. There are a number of implementation details that need to be worked out:
Should we prevent cascades from non-replicated to replicated classes? If so, how? If not, how do we make sure the cascade is applied to an instance associated with the same shard as the one on which the upstream save is taking place?
When someone looks up an instance of a replicated class by id, which shard do we hit?
And so on.
Does this sound interesting to you?
Max
-----------------------------------------------------------------------------------------------------------------------
Danny wrote:
I had a question.
I was thinking about the following feature and
I was curious if it had been considered, or if there were other
fundamental problems with it.
In most production systems the
database is setup as a master/slave, so each shard would probably have
a master and one or more slaves.
So would it be possible to add a feature where the master is used for writing and the slave is used for reading.
The
only problem that I can think of would be if the slave was out of date
for a given object as it was written to the master, but this would
addressed by requiring (or highly suggesting) that every object has to
have a Version field, and or querying the slave to verify that it is up
to date.
I feel like I pretty much understand most of the code.
I think I am ready to take a small task.
I am not by any means an expert at sharded databases, so I may need some help working on these tasks.
Thanks
Danny