[shards-dev] ShardResolutionStrategy and Criteria

davidp4web

Apr 30, 2010, 12:40:20 PM
to Hibernate Shards Dev
In an effort to optimize access to data, I implemented a
ShardEncodingIdentifierGenerator along with a ShardSelectionStrategy
and a ShardResolutionStrategy. The ShardResolutionStrategy examines
IDs to determine which shard to query.
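
To make ID-based resolution concrete, here is a minimal sketch of the idea. It is not the actual Hibernate Shards code: it assumes, purely for illustration, that the shard number is packed into the top 16 bits of a 64-bit ID, and every name in it (ShardEncodedId, encode, shardOf, sequenceOf) is hypothetical.

```java
// Hypothetical sketch: the shard number lives in the top 16 bits of a long ID.
// None of these names come from Hibernate Shards.
final class ShardEncodedId {
    private static final int SHARD_BITS = 16;
    private static final int SEQUENCE_BITS = Long.SIZE - SHARD_BITS; // 48
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private ShardEncodedId() {}

    /** Generator side: combine a shard number and a per-shard sequence value. */
    static long encode(int shardId, long sequence) {
        if (shardId < 0 || shardId >= (1 << SHARD_BITS)) {
            throw new IllegalArgumentException("shardId out of range: " + shardId);
        }
        if (sequence < 0 || sequence > SEQUENCE_MASK) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        return ((long) shardId << SEQUENCE_BITS) | sequence;
    }

    /** Resolution side: recover the shard number from an ID without touching any database. */
    static int shardOf(long id) {
        return (int) (id >>> SEQUENCE_BITS);
    }

    /** Recover the per-shard sequence value from an ID. */
    static long sequenceOf(long id) {
        return id & SEQUENCE_MASK;
    }
}
```

With a layout like this, a resolution strategy only has to decode the ID to name the single shard that can hold the row.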

I have a DAO layer that is implemented using the Criteria API, since
the hibernate-shards documentation steered me in that direction.

This all "works" from a coarse functional perspective, but if I enable
hibernate's show_sql, I see that queries are being sent to all shards,
not to a single shard as expected. When I stepped through the queries
in a debugger, I found the following in ShardedCriteriaImpl:

/**
 * We don't support shard selection for criteria queries. If you want
 * custom shards, create a ShardedSession with only the shards you want.
 * We're going to concatenate all our results and then use our
 * criteria collector to do post processing.
 */

Is there a technical barrier preventing Criteria queries from using a
ShardResolutionStrategy? Or is this simply unfinished work? (BTW,
I'm presuming that "shard selection" in the above comment really meant
"shard resolution".)

I'm willing to work on getting Criteria using ShardResolutionStrategy,
but I need some assurance that it is not an exercise in futility. I
could also use a few pointers on how to get started and what to focus
on.

Thanks!

David

Emmanuel Bernard

May 5, 2010, 4:16:15 AM
to hibernate-...@googlegroups.com
Hey,
I think the technical barrier is that you cannot guess which shard to target for arbitrary queries. A query would have to be very specific (i.e. something like "where id in ( :listOfId )" ) for us to know that it can safely go to a subset of the shards.

Emmanuel

David Pellegrini

May 6, 2010, 12:30:26 PM
to hibernate-...@googlegroups.com
Hello Emmanuel,

Thanks for your reply.

The ShardResolutionStrategy, as I understand it, is exactly that
mechanism for deciding which shards to target for queries, which is why
I'm puzzled that it is not being used for Criteria queries.

To your second point, there is no need to formulate queries themselves
with a list of target shards. The fundamental behavior is that Queries
are executed in all shards, but that is because all shards are typically
in the collection of shards. Specifically, ShardedCriteriaImpl executes
the query by passing it, along with a collection of shards, to a
ShardAccessStrategy, which executes the query in each shard in the
collection.
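
The fan-out described here can be sketched roughly as follows. This is a simplified stand-in, not the real ShardAccessStrategy interface: the class name SequentialFanOut, the method queryAll, and the use of a plain Function for the per-shard operation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified stand-in for the fan-out described above; these are not the
// real Hibernate Shards interfaces.
final class SequentialFanOut {
    private SequentialFanOut() {}

    /** Runs the same operation on every shard in the list and concatenates the results. */
    static <S, R> List<R> queryAll(List<S> shards, Function<S, List<R>> operation) {
        List<R> combined = new ArrayList<>();
        for (S shard : shards) {
            combined.addAll(operation.apply(shard));
        }
        return combined;
    }
}
```

Seen this way, the set of shards queried is decided entirely by the list handed in, so narrowing a query means handing in a shorter list.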

The comment I highlighted reflects that, as it suggests that users
"create a ShardedSession with only the shards you want." However,
creating a ShardedSession is not a workable solution because it breaks
the API compatibility with non-sharded Hibernate. I'd have to pass
details of the query (or queries) I intend to run to the Session creator
(which would presumably use those details to determine which shards to
include in the Session).

The right place to determine the target shards is in
ShardedCriteriaImpl, which ought to be delegating the decision to
ShardResolutionStrategy. Since that seems logical from an architectural
point of view, why was it not designed that way? I'm totally fine with
"Because we were concentrating our efforts on other things and didn't
get back to it." That at least gives me confidence to jump in and start
coding. However, if the reason is more obscure and insurmountable, then
I'll save my effort.

Best Regards,

David

Emmanuel Bernard

May 10, 2010, 2:44:27 AM
to hibernate-...@googlegroups.com
ShardResolutionStrategy#selectShardIdsFromShardResolutionStrategyData is the mechanism used to decide which shard to look on when loading an object by id. That is different from deciding which shards to run a query on.
My point is that to do what you want we would need another resolution method, ShardResolutionStrategy#selectShardIdsFromQuery(CriteriaQuery query). But as I said, it is quite complex (as in super complex :) ) to recognize the structure of the query, detect a potentially safe "where id in ( :listOfId )", and delegate to selectShardIdsFromShardResolutionStrategyData for each of the ids in the where clause. At least in the general case.
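
The last step of that process, once the ids have somehow been pulled out of the where clause, might look like the sketch below. It covers only the easy part: recognizing the query structure is not attempted. The names IdListShardResolver and shardsFor are hypothetical, shard ids are plain integers rather than ShardId objects, and the per-id resolver is passed in as a function instead of calling selectShardIdsFromShardResolutionStrategyData.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: the union of the shards that hold a known list of ids.
final class IdListShardResolver {
    private IdListShardResolver() {}

    /** Returns the distinct shards for the given ids, using a caller-supplied per-id resolver. */
    static Set<Integer> shardsFor(Collection<Long> ids, Function<Long, Integer> shardOfId) {
        Set<Integer> shards = new LinkedHashSet<>();
        for (Long id : ids) {
            shards.add(shardOfId.apply(id));
        }
        return shards;
    }
}
```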

Alternatively, people could manually set some context (in a thread local?) to decide which shard to target.

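class MyShardResolutionStrategy implements ShardResolutionStrategy {
    // Proposed method; it does not exist on ShardResolutionStrategy today.
    public List<ShardId> selectShardIdsFromQuery(CriteriaQuery query) {
        // Shards chosen by the caller for this thread, or null if none were set.
        List<ShardId> restrictToShards = ShardContext.get();
        return restrictToShards != null ? restrictToShards : ALL_SHARDS;
    }
}

// Caller side: pick the shards, then run the criteria query as usual.
Session s = ...;
ShardContext.set( restrictToShards );
List<Address> results = (List<Address>) s.createCriteria( ... ).list();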
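
The ShardContext used above is not defined anywhere in the thread. One possible shape for it, as a sketch only, is a small thread-local holder. Everything here is an assumption: shard ids are plain integers rather than ShardId objects, and the callWith helper is an addition that is not part of the suggestion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical holder for a per-thread shard restriction.
final class ShardContext {
    private static final ThreadLocal<List<Integer>> RESTRICT_TO = new ThreadLocal<>();

    private ShardContext() {}

    /** Restricts the current thread to the given shards (a defensive copy is stored). */
    static void set(List<Integer> shardIds) {
        RESTRICT_TO.set(Collections.unmodifiableList(new ArrayList<>(shardIds)));
    }

    /** Returns the restriction for the current thread, or null if none is set. */
    static List<Integer> get() {
        return RESTRICT_TO.get();
    }

    /** Removes any restriction from the current thread. */
    static void clear() {
        RESTRICT_TO.remove();
    }

    /** Runs the work under a restriction and restores the previous state afterwards. */
    static <T> T callWith(List<Integer> shardIds, Supplier<T> work) {
        List<Integer> previous = RESTRICT_TO.get();
        set(shardIds);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                clear();
            } else {
                RESTRICT_TO.set(previous);
            }
        }
    }
}
```

Restoring the previous value in a finally block matters when threads are pooled: a restriction left behind by one request would otherwise silently narrow the next request that runs on the same thread.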

David Pellegrini

May 13, 2010, 8:34:53 PM
to hibernate-...@googlegroups.com
Emmanuel,

Thanks again for your reply.

I explored a bit along the lines of what you suggested. The
Criteria/Criterion APIs are distinctly unhelpful in discerning what's
inside. :-/ Rather than resorting to Java Reflection, I decided to go
with the approach suggested in the code: create a session with just the
shards I need.

The ShardedSessionImpl has no API to set the shards directly. The
constructors don't allow specifying shards; they use all the shards in
the ShardedSessionFactoryImpl (a parameter). Chasing that upstream, it
appears that the only point where I have any control over the shards
included is when creating the ShardedConfiguration, which is not
convenient at all. Perhaps I'm missing something, but it seems that the
clearest solution is to add a setter to the ShardedSessionImpl.

Given that ShardedSessionImplementor has getShards(), symmetry dictated
that it would also specify the corresponding setter. As it turns out,
it was convenient to add two methods for setting shards:

/**
 * Specify the shards that the session will use.
 * @param shards a list of Shards to use
 */
void setShards(List<Shard> shards);

/**
 * Specify by ShardId the shards that the session will use.
 * @param shardIds a list of ShardIds identifying the Shards to use
 */
void setShardsById(List<ShardId> shardIds);

The latter uses shardIdListToShardList internally to set the shards list.
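
The id-to-shard mapping step might look roughly like the sketch below. Only the name shardIdListToShardList comes from the message; the signature, the map-based lookup, the integer shard ids, and the fail-fast behaviour on unknown ids are all guesses made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of turning a list of shard ids into a list of shards.
final class ShardLookup {
    private ShardLookup() {}

    /** Maps each requested id to its shard, failing fast on an unknown id. */
    static <S> List<S> shardIdListToShardList(List<Integer> shardIds, Map<Integer, S> shardsById) {
        List<S> shards = new ArrayList<>(shardIds.size());
        for (Integer id : shardIds) {
            S shard = shardsById.get(id);
            if (shard == null) {
                throw new IllegalArgumentException("Unknown shard id: " + id);
            }
            shards.add(shard);
        }
        return shards;
    }
}
```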

One might argue for putting these into ShardedSession instead, since
that is "The main runtime interface between Java application and
Hibernate Shards." I'm open to opinions.

Having the ability to set the shards on the session allows me to do
"shard resolution" in the DAO methods, where the session is accessible
and the query parameters are readily examined. The downside is that I
had to create a shard-aware set of DAOs subclassed from my
hibernate-specific DAOs, then add that as another DAO variant in my
DAOFactory. But all-in-all that's a small price to pay for directing
queries to a single shard rather than dozens. :-)
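
The DAO arrangement described above could be sketched like this. None of these types are from the actual code base: NarrowableSession stands in for a session that has the proposed setShardsById method, shard ids are plain integers, and the function that maps a record id to a shard is supplied by the caller.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongToIntFunction;

// Hypothetical stand-in for a session that accepts the proposed setter.
interface NarrowableSession {
    void setShardsById(List<Integer> shardIds);
    Object load(long id);
}

// Plain DAO: knows nothing about shards.
class RecordDao {
    protected final NarrowableSession session;

    RecordDao(NarrowableSession session) {
        this.session = session;
    }

    Object findById(long id) {
        return session.load(id);
    }
}

// Shard-aware variant: resolves the shard from the id, narrows the session, then delegates.
class ShardAwareRecordDao extends RecordDao {
    private final LongToIntFunction shardOfId;

    ShardAwareRecordDao(NarrowableSession session, LongToIntFunction shardOfId) {
        super(session);
        this.shardOfId = shardOfId;
    }

    @Override
    Object findById(long id) {
        session.setShardsById(Collections.singletonList(shardOfId.applyAsInt(id)));
        return super.findById(id);
    }
}
```

In this sketch the narrowing stays in effect on the session after the call returns, so each shard-aware method has to set the shards it needs before it runs.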

I'm happy to contribute my changes if anyone finds this worthwhile.

Cheers!

David

Emmanuel Bernard

May 19, 2010, 1:03:06 PM
to hibernate-...@googlegroups.com
I really think it's cleaner to go the strategy approach.

That is, add ShardResolutionStrategy#selectShardIdsFromQuery, even if we don't pass it the query.
One could write a ShardResolutionStrategy that retrieves the list of shards from somewhere else, like a thread local.

Exposing the list of shards and mutating it seems wrong to me at the ShardedSession level, because that would mean all operations, including lookups, creates, updates and deletes, would be restricted to a subset of shards. That would be wrong.

One remark you've made puzzles me though. You said Criteria are too hard to introspect but that's pretty much what Hibernate Shards does to rewrite the query before executing it on each shard. Maybe their manipulation operations are simple enough.

Emmanuel

David Pellegrini

May 19, 2010, 2:08:58 PM
to hibernate-...@googlegroups.com
Comments inline ...

Emmanuel Bernard wrote:
> I really think it's cleaner to go the strategy approach.
>
I completely agree. However, it was turning into a much larger effort
than I have time for, so I elected to go with the approach that was
recommended in the code -- set the shards beforehand.

> ShardResolutionStrategy#selectShardIdsFromQuery even if we don't pass the query.
> One could write a ShardResolutionStrategy that retrieves the list of shards from somewhere else like a thread local.
>
Perhaps, but that meant much more rewriting of the hibernate-shards code
to ensure that the ShardResolutionStrategy was consulted for _all_
operations. In the interest of time and minimizing the code
perturbation, I elected to go with the approach recommended by the authors.

> Exposing the list of shards and mutating them seems wrong to me at the ShardedSession level because that would mean all operations including lookups, creates, updates and deletes would be restricted to a subset of shards. That would be wrong.
>
I respectfully disagree. I _want_ to be able to control which shards
are accessed for _all_ operations. If you know that all of the records
you want to update or delete are in a single shard, you want to execute
that operation on only that shard. When you're dealing with dozens or
hundreds of shards, concurrency/availability/scalability demands dictate
that you hit the minimal set of shards to get the job done. Granted,
some operations require hitting all shards, but a majority of them (at
least in my application domain) can be narrowed to a single shard.

BTW, the default behavior is still to hit all shards. The
setShards/setShardsById methods are simply a means to override when
appropriate; their use is optional.

> One remark you've made puzzles me though. You said Criteria are too hard to introspect but that's pretty much what Hibernate Shards does to rewrite the query before executing it on each shard. Maybe their manipulation operations are simple enough.
>
I did not say that Criteria are too hard to introspect. I said that
their APIs do not expose functionality that is useful to this purpose.
Philosophically, I try to work with a class's public API whenever I can,
so I didn't want to resort to introspection when an alternative approach
(recommended by the authors) was available.