Hi everybody,
I'd just like to introduce myself. My name's Ged Byrne and I'm a member of the London Java Community. I'm getting involved as part of the Adopt a JSR programme.
One thing I'm keen to work on is a bit of evangelism, and to start off I'd like to put together a State of the JSR blog post.
The idea is to put together a single post that sums up where we are, where we want to be and how we intend to get there.
I'd love to hear everybody's thoughts.
To start things off I'd just like to play devil's advocate with the Mission Statement as set out at
https://github.com/datagrids/spec/wiki/Mission-Statement. I'm just going to consider it line by line and try to challenge what it says. Could I ask everybody else to do the same? Don't look to defend it, but be as critical as you possibly can be. Let's see how it holds up to scrutiny.
- This specification (JSR-347) aims to specify APIs and behaviours necessary to build portable applications which store their data in a distributed data grid, including retrieving, storing and managing the data.
- Is it really about datagrids, or are they just one implementation method? Isn't this really about APIs for working with large distributed data sets? Isn't a datagrid an implementation detail, like using a hash table or a linked list?
- Do we need a logical model? The collections API has interfaces for Maps, Sets and Lists. Is there an equivalent set of abstract concepts for dealing with large data sets?
- The primary API will be built upon JSR-107, the JCACHE API. In addition to its generified Map-like API to access a Cache, JSR-107 defines APIs for spooling in-memory data to persistent storage, an API for obtaining a named Cache from a CacheManager and an API to register event listeners. It also offers a number of optional features such as annotation support and transaction integration.
- Do we need to build upon JSR-107? If we keep a separation of concerns, can we leave the JCACHE APIs as they are, to be used when caches are the chosen implementation?
- Why is JSR-107 being used as a starting point? What about the range of APIs that various products like Cassandra, Pig, MongoDB, Coherence and the like have defined? What commonalities are there? Where do they diverge, and why?
- JSR-347 proposes to add a number of abilities, including: an async API, support for distributed code execution (both arbitrary and based around map/reduce),
- Does distributed code execution really belong in this JSR? Shouldn't it have a JSR all to itself?
- a grouping (co-location) API,
- Is Infinispan the best model for this? Do any of the other products take a different approach?
- CDI (Contexts and Dependency Injection) support (building on the annotation support in JSR-107),
- Are there any API changes needed for this? Any new annotations? Isn't a datagrid simply an implementation strategy for Contexts?
- modes of operation (replication vs. distribution), varying levels of consistency for distribution (total vs. eventual) and both programmatic and XML based configuration.
- Isn't all this implementation detail? Isn't it better to leave stuff like this out of the standards?
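To make the Map/HashMap analogy from the first point a bit more concrete, here is a rough sketch of what separating the logical model from the implementation might look like. The names KeyedDataSet and LocalDataSet are made up purely for illustration; they are not taken from JSR-107, JSR-347 or any product, and a real abstraction would need far more than this.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical logical abstraction (an assumption for discussion only,
// not part of JSR-107 or JSR-347): what the application codes against.
interface KeyedDataSet<K, V> {
    void put(K key, V value);
    Optional<V> get(K key);
    long size();
}

// One possible implementation strategy: a plain in-JVM map.
// A data grid, a cache or a remote store would be other implementations
// of the same interface, much as HashMap and TreeMap both implement Map.
final class LocalDataSet<K, V> implements KeyedDataSet<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    @Override
    public void put(K key, V value) {
        entries.put(key, value);
    }

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public long size() {
        return entries.size();
    }
}
```

Application code would only ever see KeyedDataSet; whether the data sits in one JVM or is spread across a grid becomes a deployment decision. Whether that separation is actually workable for distributed data is exactly the question I'm asking.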
I'm being deliberately critical here and trying to push out some straw man arguments to get people talking.
Regards,
Ged