--
You received this message because you are subscribed to the Google Groups "JSR 347 discussions" group.
Apologies for the delay in responding. Been a busy time.
Rick, in outing me you've also outed that I don't know much about Infinispan ;) I read a bit on the site. As I understand the groups operation from the wiki, it would be, in Hadoop MR parlance, analogous to a customized partitioner/shuffler: you try to optimize where data is stored by pre-defining semantics that indicate specific data relations. This would result in less network traffic/payload when pulling stage 1 results together in an MR job.
Hadoop per se is a big thing, and very heavyweight. Given Infinispan's, and presumably JSR 347's, leanings toward "high performance", it might be best to look to other tools for reference and inspiration. That is not to say Hadoop cannot be high-performance, but out of the box it's more about durability and scale than performance.
I would probably look toward HBase/Hypertable and CouchDB as key/value store examples, rather than MongoDB; however, I can see sharding as possibly relevant to 'groups'.
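To make the partitioner analogy concrete, here is a minimal sketch of group-aware placement: the owning node is derived from a key's group rather than from the whole key, so related entries end up together. The class name, `prefixGroup`, `nodeFor` and the "group:rest" key format are all hypothetical, made up for illustration; this is not the Infinispan or JSR 347 API, and real grids would typically use consistent hashing rather than a plain modulo.

```java
import java.util.function.Function;

public class GroupPartitionerSketch {

    /** Hypothetical convention: the group is the part of the key before ':'. */
    public static String prefixGroup(String key) {
        int idx = key.indexOf(':');
        return idx < 0 ? key : key.substring(0, idx);
    }

    /** Picks a node index from the key's group, not from the whole key. */
    public static <K> int nodeFor(K key, Function<? super K, String> groupOf, int nodeCount) {
        String group = groupOf.apply(key);
        // floorMod keeps the result in [0, nodeCount) even for negative hash codes.
        return Math.floorMod(group.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int a = nodeFor("order-42:item-1", GroupPartitionerSketch::prefixGroup, 4);
        int b = nodeFor("order-42:item-2", GroupPartitionerSketch::prefixGroup, 4);
        // Both keys share the group "order-42", so they map to the same node.
        System.out.println(a == b); // prints "true"
    }
}
```

Because both items of an order live on one node, a first-stage reduction over that order would not need to pull entries across the network, which is the saving described above.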
I suppose I would need a clearer definition of the goal of the groups API before commenting much beyond this.
I think the group API should address the use case of data stored locally. Multi-data center replication is a good feature to support. If we take a look at Hadoop, it saves data onto the nearest node and the farthest node, depending on the number of replicas it has configured. The same can be considered in a data grid to prevent data loss from communication issues or a catastrophic failure on a group of nodes. It would be interesting if we could provide an interface for grouping strategies, etc.
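As a rough illustration of what a pluggable strategy interface could look like, the sketch below keeps the first copy on the local node and prefers nodes in other sites for the remaining copies. Everything here (`Node`, `PlacementStrategy`, `LOCAL_THEN_REMOTE_SITE`, the site names) is an assumption made up for the example; it is not Hadoop's actual placement algorithm and not a proposed JSR 347 API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacementSketch {

    /** Hypothetical node descriptor: a name plus the site (data center) it lives in. */
    public static final class Node {
        public final String name;
        public final String site;
        public Node(String name, String site) { this.name = name; this.site = site; }
    }

    /** Hypothetical strategy interface: decides which nodes hold the copies of an entry. */
    public interface PlacementStrategy {
        List<Node> choose(Node local, List<Node> cluster, int replicas);
    }

    /** First copy stays local; further copies go to other sites first, then the local site. */
    public static final PlacementStrategy LOCAL_THEN_REMOTE_SITE = (local, cluster, replicas) -> {
        List<Node> chosen = new ArrayList<>();
        if (replicas <= 0) return chosen;
        chosen.add(local);
        // Prefer nodes outside the local site, so a site-wide failure cannot lose every copy.
        for (Node n : cluster) {
            if (chosen.size() >= replicas) break;
            if (!n.site.equals(local.site)) chosen.add(n);
        }
        // If more copies are still needed, fill up with any node not picked yet.
        for (Node n : cluster) {
            if (chosen.size() >= replicas) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    };

    public static void main(String[] args) {
        Node a1 = new Node("a1", "site-a");
        Node a2 = new Node("a2", "site-a");
        Node b1 = new Node("b1", "site-b");
        for (Node n : LOCAL_THEN_REMOTE_SITE.choose(a1, List.of(a1, a2, b1), 2)) {
            System.out.println(n.name); // prints a1, then b1
        }
    }
}
```

Keeping the policy behind an interface would let a deployment swap in a different strategy (rack-aware, site-aware, or purely local) without touching the code that stores entries.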