Group API


Rich Midwinter

Feb 2, 2013, 6:46:37 PM
to jsr...@googlegroups.com
All

To kick things off with the Group API: we already have an
example implementation in Infinispan [1], which seems broadly
consistent with other implementations I've seen, including
GridGain [2] [3].

Are there other implementations which follow a different pattern, or
which we should be considering for other reasons? In the absence of
any radically different implementations, I think it would be good
to explore extending mercury with support for this soon. It should
also provide an opportunity to touch on other features
(configuration, annotations, etc.).

Anything else we should consider up front?

Thanks
Rich

[1] https://docs.jboss.org/author/display/ISPN/The+Grouping+API
[2] http://www.gridgain.com/javadoc30E/org/gridgain/grid/cache/affinity/GridCacheAffinityMapped.html
[3] http://www.gridgain.com/javadoc30E/org/gridgain/grid/cache/affinity/GridCacheAffinityMapper.html

Rick Hightower

Feb 2, 2013, 7:07:07 PM
to jsr...@googlegroups.com
What about what Hadoop does?



When I think of groups, maybe wrongly so, I think of running map reduce on nodes where data is replicated. More replicated nodes mean I can run more map reduce procedures on the same sets of data.

Also MongoDB....

MongoDB talks more about shards (and probably replica sets within each shard) for map reduce.

Hadoop is built on top of a file system that is aware of rack setup, and proximity. Do we / should we cover things at this level?

I think a Java specification that defines how to handle map reduce in Java land might get more inspiration from Hadoop and MongoDB.



I am outing Chris Mathias. He knows a lot about data grids, caching and Hadoop as well.

Chris,
Having worked with Hadoop and MongoDB map reduce in production, how do you feel about the Infinispan and data grid APIs for grouping?

Do we have any Hadoop/HDFS, MongoDB/GridFS/MapReduce experts on this group?



--
You received this message because you are subscribed to the Google Groups "JSR 347 discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jsr347+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.





--
Rick Hightower
(415) 968-9037

Manik Surtani

Feb 5, 2013, 7:18:33 AM
to jsr...@googlegroups.com
That's different from Groups in this sense (maybe we need a better name). This feature refers to colocation of entries, and to letting an application exercise some control over that.
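
As a rough illustration of what application-controlled colocation can look like, here is a minimal Java sketch. All names in it (`GroupingSketch`, `Grouped`, `OrderKey`, `ownerOf`) are hypothetical and are not taken from Infinispan, GridGain, or any JSR 347 draft. The modulo-based owner selection is only a stand-in for whatever consistent hashing a real grid would use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: none of these names come from Infinispan, GridGain, or JSR 347.
final class GroupingSketch {

    // A key may name a group; keys that share a group should land on the same node.
    interface Grouped {
        String group();
    }

    // Example key: an order that is grouped by its customer id.
    static final class OrderKey implements Grouped {
        final String customerId;
        final int orderNo;

        OrderKey(String customerId, int orderNo) {
            this.customerId = customerId;
            this.orderNo = orderNo;
        }

        @Override
        public String group() {
            return customerId;
        }
    }

    // Route on the group when the key declares one, otherwise on the key itself.
    // Assumption: a plain hash modulo the node count replaces real consistent hashing.
    static String ownerOf(Object key, List<String> nodes) {
        Object routingValue = (key instanceof Grouped) ? ((Grouped) key).group() : key;
        int index = Math.floorMod(Objects.hashCode(routingValue), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        // Both orders belong to the group "alice", so both lines print the same node.
        System.out.println(ownerOf(new OrderKey("alice", 1), nodes));
        System.out.println(ownerOf(new OrderKey("alice", 2), nodes));
    }
}
```

With this shape, `new OrderKey("alice", 1)` and `new OrderKey("alice", 2)` resolve to the same node because both route on the group `alice` rather than on the full key.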

Chris Mathias

Feb 6, 2013, 5:29:03 PM
to Rick Hightower, jsr...@googlegroups.com

Apologies for the delay in responding.  Been a busy time.

Rick, in outing me you've also outed that I don't know much about Infinispan ;) I read a bit on the site. As I understand it from the wiki, the groups operation would, in Hadoop MR parlance, be analogous to a customized way of managing the partitioner/shuffler work: you optimize where data is stored by pre-defining some semantics that indicate specific data relations. This would result in less network traffic/payload when pulling stage 1 results together in an MR job.

Hadoop per se is a big thing, and very heavyweight. Given Infinispan's, and presumably JSR 347's, leanings toward "high performance", it might be best to look to other tools for reference and inspiration. That's not to say Hadoop cannot be high-performance, but out of the box it's more about durability and scale than performance.

I would probably look toward HBase/Hypertable and CouchDB as key/value store examples, rather than MongoDB; however, I can see sharding as possibly relevant to 'groups'.

I suppose I would need a clearer definition of the goal of the groups API before commenting much beyond this. 

Chris

Zacarias

Apr 18, 2013, 10:51:10 AM
to jsr...@googlegroups.com, Rick Hightower
I think the Group API should address the use case of data stored locally. Multi-data center replication is also a good feature to support.
If we take a look at Hadoop, it saves data onto the nearest node and the farthest node, depending on the number of replicas configured. The same could be considered in a data grid to prevent data loss from communication issues or a catastrophic failure of a group of nodes.
It would be interesting if we could provide an interface for grouping strategies, etc.
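
One way to picture such a pluggable placement strategy is the minimal Java sketch below. Everything in it (`BackupPlacementSketch`, `Node`, `BackupStrategy`, `OtherRackFirst`) is a hypothetical name made up for illustration, not an API from Hadoop, Infinispan, or JSR 347. The policy shown, preferring backups on a different rack, is a simplification and not a reproduction of Hadoop's actual replica placement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types are invented for illustration only.
final class BackupPlacementSketch {

    // A cluster member together with the rack it sits in.
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }
    }

    // Pluggable strategy: given the primary owner, decide where the backups go.
    interface BackupStrategy {
        List<Node> chooseBackups(Node primary, List<Node> cluster, int backups);
    }

    // Prefers nodes on other racks; uses same-rack nodes only if it runs out.
    static final class OtherRackFirst implements BackupStrategy {
        @Override
        public List<Node> chooseBackups(Node primary, List<Node> cluster, int backups) {
            List<Node> chosen = new ArrayList<>();
            // First pass: candidates on a different rack than the primary.
            for (Node candidate : cluster) {
                if (chosen.size() == backups) {
                    return chosen;
                }
                if (candidate != primary && !candidate.rack.equals(primary.rack)) {
                    chosen.add(candidate);
                }
            }
            // Second pass: fall back to nodes that share the primary's rack.
            for (Node candidate : cluster) {
                if (chosen.size() == backups) {
                    return chosen;
                }
                if (candidate != primary && candidate.rack.equals(primary.rack)) {
                    chosen.add(candidate);
                }
            }
            return chosen;
        }
    }

    public static void main(String[] args) {
        Node a1 = new Node("a1", "rack-a");
        Node a2 = new Node("a2", "rack-a");
        Node b1 = new Node("b1", "rack-b");
        List<Node> cluster = Arrays.asList(a1, a2, b1);
        for (Node backup : new OtherRackFirst().chooseBackups(a1, cluster, 2)) {
            System.out.println(backup.name + " on " + backup.rack);
        }
    }
}
```

Keeping the policy behind an interface means a data-center-aware or purely random strategy could be swapped in without touching the code that stores entries.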

Mircea Markus

Mar 28, 2014, 9:09:25 AM
to jsr...@googlegroups.com, Rick Hightower


On Thursday, April 18, 2013 3:51:10 PM UTC+1, Zacarias wrote:
> I think group API should attack data stored locally use case. Multi-data center replication is a good feature to support.
> If we take a look at Hadoop it saves data onto the nearest node and the farthest node depending on the number of replications it has configured. The same can be considered in a datagrid to prevent data loss by communication issues or any catastrophic failure on a group of nodes.
> It would be interesting if we can provide an interface for grouping strategies, etc.

Smart data placement (backup on a different machine/rack/datacenter) to increase fault tolerance is a pretty useful feature (Infinispan calls this server hinting [1]). I'm not entirely sure this should be targeted together with grouping, which aims to place related data together (on the same node) for efficient processing.

- grouping wants to make sure that related data will be collocated on the same node, for efficient processing
