Feature list on GitHub... very early comments


Rick Hightower

Sep 27, 2011, 6:14:52 PM
to jsr...@googlegroups.com
 https://github.com/datagrids/spec/wiki/Proposed-features

Going through this list....

Seems like at least a simple Query API could be added.
Some subset that Infinispan already offers would be a great start. 

Also not 100% comfortable with Operations Mode. Seems we would/could/should tread lightly since these are more implementation details.
Not everyone has the same ideas of how to create elastic instances and/or what should happen if you add nodes dynamically.

I need to bone up on MapReduce. I think I have some comments there, but at the risk of sounding like a complete rube, I will hold off.

I really like the concepts of the Group API. 

Where are the JSR 347 discussions happening? There does not seem to be much on the group yet.

Of course, I found some small errors: JCache put does not return a value (it returns void), so it could be async.
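Because `put` hands nothing back to the caller, nothing forces the caller to wait for the write. A minimal sketch of what an asynchronous variant could look like, assuming a hypothetical `AsyncCache` interface (not part of JCache) backed by a local `ConcurrentHashMap`; a real grid would replicate the write to remote nodes inside `putAsync`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical async-put API; the names are illustrative, not from JCache.
interface AsyncCache<K, V> {
    // Completes when the write has been applied; carries no value, mirroring void put().
    CompletableFuture<Void> putAsync(K key, V value);

    V get(K key);
}

// Local stand-in: the write runs on a separate executor instead of the caller's thread.
class LocalAsyncCache<K, V> implements AsyncCache<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    private final ExecutorService writer;

    LocalAsyncCache(ExecutorService writer) {
        this.writer = writer;
    }

    @Override
    public CompletableFuture<Void> putAsync(K key, V value) {
        return CompletableFuture.runAsync(() -> store.put(key, value), writer);
    }

    @Override
    public V get(K key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        LocalAsyncCache<String, Integer> cache = new LocalAsyncCache<>(pool);
        cache.putAsync("answer", 42).join(); // block only when the caller actually cares
        System.out.println(cache.get("answer")); // 42
        pool.shutdown();
    }
}
```

Returning `CompletableFuture<Void>` keeps the "no return value" contract while still letting callers chain or wait on completion.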


Manik Surtani

Sep 28, 2011, 5:10:17 AM
to jsr...@googlegroups.com
Comments inline:

On 27 September 2011 23:14, Rick Hightower <richardh...@gmail.com> wrote:
 https://github.com/datagrids/spec/wiki/Proposed-features

Going through this list....

Seems like at least a simple Query API could be added.
Some subset that Infinispan already offers would be a great start. 

We could start discussing it, but what I find hard is that we need a query language suited for data grids in the first place. I don't see any existing work as a "good fit". SQL is too relational, and JP-QL is also too relational. And coming up with a query language is definitely out of the scope of this JSR. Perhaps some rudimentary filters may work. Infinispan's impl is, IMO, far too closely coupled with Lucene's query POJOs to become a standard.
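To make "rudimentary filters" concrete: a minimal sketch, assuming entries are held in a plain `Map` and that a filter is nothing more than a `java.util.function.Predicate` over the stored values. The `filterKeys` helper and the `Person` type are hypothetical, not taken from any grid product.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class RudimentaryFilter {
    // Hypothetical value type stored in the grid.
    static final class Person {
        final String lastName;
        final int height;

        Person(String lastName, int height) {
            this.lastName = lastName;
            this.height = height;
        }
    }

    // Returns the keys whose values satisfy the filter; no query language involved.
    static <K, V> List<K> filterKeys(Map<K, V> entries, Predicate<? super V> filter) {
        return entries.entrySet().stream()
                .filter(e -> filter.test(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Person> grid = new LinkedHashMap<>();
        grid.put("p1", new Person("Johnson", 40));
        grid.put("p2", new Person("Smith", 70));
        grid.put("p3", new Person("Johnson", 72));
        Predicate<Person> shortJohnsons = p -> p.lastName.equals("Johnson") && p.height < 42;
        System.out.println(filterKeys(grid, shortJohnsons)); // [p1]
    }
}
```

The filter is plain Java, so nothing relational leaks into the API; the cost is that every value must be visited unless an implementation adds indexes behind it.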
 

Also not 100% comfortable with Operations Mode. Seems we would/could/should tread lightly since these are more implementation details.
Not everyone has the same ideas of how to create elastic instances and/or what should happen if you add nodes dynamically.

Yes, this does need to be defined in a manner that does not force specific implementation techniques.  And this is where more input from other vendors will help.
 

I need to bone up on MapReduce. I think I have some comments there, but at the risk of sounding like a complete rube, I will hold off.

I really like the concepts of the Group API. 

Where are the JSR 347 discussions happening? There does not seem to be much on the group yet.

Here, on this mailing list. But yes, not much has happened so far; that doesn't mean we should hold back.
 
Of course, I found some small errors: JCache put does not return a value (it returns void), so it could be async.


Cheers
--
Manik


Nate McCall

Sep 28, 2011, 7:41:27 AM
to jsr...@googlegroups.com
For a query approach, I'd like to put out our "CQL" language in Apache
Cassandra:
http://www.datastax.com/docs/0.8/references/cql#cql-reference

Highlights:
- minimal typing
- no joins
- provides for per-statement consistency level

I'm on the fence about whether this is in scope though.

Another thing that sticks out to me is balancing eventual
consistency with transaction isolation. Transaction recovery becomes
extremely complicated with idempotent storage systems when a write may
or may not have been received (or received partially, but not by
enough hosts to satisfy the consistency level). It is going to be very
difficult to decide what guarantees to provide and when to provide
them.
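The partial-write problem can be made concrete with Cassandra-style consistency levels: a write only counts as successful if enough replicas acknowledge it. A minimal sketch, assuming a replication factor N and the levels ONE, QUORUM (a strict majority, N/2 + 1) and ALL; the `ConsistencyLevel` enum here is illustrative, not a class from Cassandra or its drivers.

```java
// Illustrative model of per-statement consistency levels.
enum ConsistencyLevel {
    ONE, QUORUM, ALL;

    // Number of replica acknowledgements required out of replicationFactor.
    int requiredAcks(int replicationFactor) {
        switch (this) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // strict majority
            default:
                return replicationFactor; // ALL
        }
    }
}

class ConsistencyCheck {
    // A write "succeeds" only if enough replicas acknowledged it. When it fails,
    // some replicas may still hold the new value, which is what makes recovery hard.
    static boolean writeSucceeded(int acks, int replicationFactor, ConsistencyLevel level) {
        return acks >= level.requiredAcks(replicationFactor);
    }

    // Read and write replica sets are guaranteed to overlap when R + W > N.
    static boolean readSeesLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println(writeSucceeded(2, n, ConsistencyLevel.QUORUM)); // true
        System.out.println(writeSucceeded(1, n, ConsistencyLevel.QUORUM)); // false: a partial write
        System.out.println(readSeesLatestWrite(2, 2, n));                  // true
    }
}
```

The second case is exactly the ambiguous state described above: the client sees a failure, yet one replica already holds the value.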

Regards,
-Nate

Joseph Ottinger

Sep 28, 2011, 7:46:11 AM
to jsr...@googlegroups.com
A query API... that's an interesting thought. To really address it, though, we need to talk about *specific* scope.

For a data grid, you do want to have a way of knowing what's in the grid, but... that goes back to the nature of what the grid can contain. From what I can see, we're discussing a data grid as a map?
--
Joseph B. Ottinger
http://enigmastation.com
Ça en vaut la peine.

Nate McCall

Sep 28, 2011, 8:11:53 AM
to jsr...@googlegroups.com
That's a good point. A lot of this will depend on the dimension of the
map as a query API won't make much sense for simple get/put
operations. Querying would be helpful to "slice" the nested values of
the map to limit the result set size.

Ideally I would like to see this have some more moving parts than a
'key/value' store, but I understand that any idiomatic approach beyond
that gets significantly more difficult given the number of vendors.

Just to be clear - range scanning for arbitrary keys in big systems is
for suckers, IMO. Too much depends on the cluster's partitioning
mechanics. If supported at all, it should be with a big, fat "do this
only in low-priority background jobs" warning sticker.
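Why range scans depend so heavily on partitioning: under hash partitioning, adjacent keys land on unrelated nodes, so even a short range query fans out to the whole cluster, while a lookup routed by key touches a single node. A minimal sketch, assuming a simple modulo-of-hash partitioner (real grids typically use consistent hashing or similar schemes).

```java
import java.util.Set;
import java.util.TreeSet;

class RangeScanCost {
    // Hypothetical partitioner: a key lives on node floorMod(hash, nodeCount).
    static int nodeFor(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Nodes a scan over integer keys [from, to) must contact under this partitioner.
    static Set<Integer> nodesTouchedByRange(int from, int to, int nodeCount) {
        Set<Integer> nodes = new TreeSet<>();
        for (int key = from; key < to; key++) {
            nodes.add(nodeFor(key, nodeCount));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Adjacent keys scatter, so a ten-key range already hits every node...
        System.out.println(nodesTouchedByRange(100, 110, 4)); // [0, 1, 2, 3]
        // ...whereas a single routed lookup contacts exactly one node.
        System.out.println(nodeFor(105, 4)); // 1
    }
}
```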

Joseph Ottinger

Sep 28, 2011, 8:15:53 AM
to jsr...@googlegroups.com
On Wed, Sep 28, 2011 at 8:11 AM, Nate McCall <na...@datastax.com> wrote:
That's a good point. A lot of this will depend on the dimension of the
map as a query API won't make much sense for simple get/put
operations. Querying would be helpful to "slice" the nested values of
the map to limit the result set size.

Ideally I would like to see this have some more moving parts than a
'key/value' store, but I understand that any idiomatic approach beyond
that gets significantly more difficult given the number of vendors.

Sure, but note that going beyond key/value is going to be beyond many potential implementations. A general query API to me would be limited to specific nodes, at best (i.e., "for THIS routing, give me the keys and/or values"); for a more encompassing query, you'd be looking more at map/reduce as a selection mechanism.
 
Just to be clear - range scanning for arbitrary keys in big systems is
for suckers, IMO. Too much depends on the cluster's partitioning
mechanics. If supported at all, it should be with a big, fat "do this
only in low-priority background jobs" warning sticker.

Fully agreed. And I worry that a query API would ALWAYS have this, and a "do this only if absolutely necessary" feature that requires a ton of implementation work is scary. The map/reduce query approach is more flexible for the implementations (and more scalable).

Rick Hightower

Sep 28, 2011, 1:18:52 PM
to jsr...@googlegroups.com
Give me all employees whose salary is greater than 5.
I would imagine this would be implemented via a MapReduce.
A master on each node runs slaves to collect values.

Anyway, it seems like we should have some query support.
The ability to query properties of objects stored as values (the v in the k/v).

Implementations could do indexing etc., but this would be an implementation detail.

The query could be expensive, although implementations that promote it as a feature would probably optimize it somehow.
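One way an implementation could keep "salary greater than 5" from becoming a full scan is a sorted secondary index. A minimal single-node sketch, assuming a hypothetical `SalaryIndex` that maps each salary to the keys holding it; a distributed grid would presumably keep one such index per partition and merge the per-partition answers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical secondary index: salary -> keys of the entries with that salary.
class SalaryIndex<K> {
    private final NavigableMap<Integer, Set<K>> bySalary = new TreeMap<>();

    void add(K key, int salary) {
        bySalary.computeIfAbsent(salary, s -> new HashSet<>()).add(key);
    }

    void remove(K key, int salary) {
        Set<K> keys = bySalary.get(salary);
        if (keys != null && keys.remove(key) && keys.isEmpty()) {
            bySalary.remove(salary);
        }
    }

    // Keys whose salary is strictly greater than the threshold, read from a sorted
    // view of the index instead of scanning every stored value.
    List<K> greaterThan(int threshold) {
        List<K> result = new ArrayList<>();
        for (Set<K> keys : bySalary.tailMap(threshold, false).values()) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        SalaryIndex<String> index = new SalaryIndex<>();
        index.add("alice", 3);
        index.add("bob", 7);
        index.add("carol", 9);
        System.out.println(index.greaterThan(5)); // [bob, carol]
    }
}
```

The index must be updated on every write, which is the usual trade-off: cheaper range reads in exchange for more expensive puts.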

(Consider this brainstorming)
--
Rick Hightower
(415) 968-9037

Rick Hightower

Sep 28, 2011, 1:20:15 PM
to jsr...@googlegroups.com
I am not suggesting a query language, more like a query API at this point.

Rick Hightower

Sep 28, 2011, 1:22:05 PM
to jsr...@googlegroups.com
Something more like this:



import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

// Get the Datastore Service
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// The Query interface assembles a query
// (lastNameParam and maxHeightParam are values supplied by the caller)
Query q = new Query("Person");
q.addFilter("lastName", Query.FilterOperator.EQUAL, lastNameParam);
q.addFilter("height", Query.FilterOperator.LESS_THAN, maxHeightParam);

// PreparedQuery contains the methods for fetching query results
// from the datastore
PreparedQuery pq = datastore.prepare(q);

for (Entity result : pq.asIterable()) {
    String firstName = (String) result.getProperty("firstName");
    String lastName = (String) result.getProperty("lastName");
    Long height = (Long) result.getProperty("height");

    System.out.println(lastName + " " + firstName + ", " + height + " inches tall");
}

Joseph Ottinger

Sep 28, 2011, 3:46:11 PM
to jsr...@googlegroups.com
*nod* I guess what I had in mind was something like this, if you'll pardon my entirely-made-up-on-the-fly names:

CacheService service = CacheServiceFactory.getService("people");
List<Person> people = service.map(
    new RetrievalMapper<Person>(
        new RetrieveFilter<Person>() {
            public boolean accept(Person p) {
                return p.lastName.equals("Johnson") && p.height < 42;
            }
        }),
    new ConcatReducer<Person>());

Holy cow that's ugly. However, let's pretend that we know we can have a much cleaner DSL or maybe a closure or something, and walk through it.

The service.map() distributes a mapper to every available node. Theoretically you could add a routing field of some kind to match @Group (from Infinispan) or @SpaceRouting (from GigaSpaces) and limit which datagrid nodes are used for the map phase.

The mapper in THIS case is designed to iterate over candidate entries (i.e., those of the Person type) and accept only the ones that fit the criteria. The mapper returns a known type (List<T>) because its sole function is to find candidate matches (thus a "RetrievalMapper" instead of something that calculates a result).

The reducer in this case is a generalized one that takes lists of results from mappers and concatenates them into a cohesive list.

No special query API - that'd be up to the DSL for building the mapper's accept mechanism. That's more freedom; it's not that a query DSL couldn't be specified, but this allows for more flexibility based on what the provider can come up with.
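A runnable version of the same idea, keeping the made-up names hypothetical: each "node" is modeled as a local list of values, the mapper applies the accept filter to its own node's data, and a concatenating reducer merges the per-node lists. `RetrieveFilter`, `retrievalMapper`, `concatReducer` and `mapReduce` are illustrative names, not an existing API; a real grid would ship the mapper to remote nodes and run them in parallel rather than sequentially as here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class RetrievalMapReduce {
    // Stand-in for the made-up RetrieveFilter in the example above.
    interface RetrieveFilter<T> {
        boolean accept(T candidate);
    }

    static final class Person {
        final String lastName;
        final int height;

        Person(String lastName, int height) {
            this.lastName = lastName;
            this.height = height;
        }

        @Override
        public String toString() {
            return lastName + "(" + height + ")";
        }
    }

    // Map phase: runs against one node's local values and keeps only the accepted ones.
    static <T> Function<List<T>, List<T>> retrievalMapper(RetrieveFilter<T> filter) {
        return local -> local.stream().filter(filter::accept).collect(Collectors.toList());
    }

    // Reduce phase: concatenates per-node result lists into one list.
    static <T> BinaryOperator<List<T>> concatReducer() {
        return (left, right) -> {
            List<T> merged = new ArrayList<>(left);
            merged.addAll(right);
            return merged;
        };
    }

    // Runs the mapper on every "node" (sequentially here) and folds the results.
    static <T> List<T> mapReduce(List<List<T>> nodes, Function<List<T>, List<T>> mapper,
                                 BinaryOperator<List<T>> reducer) {
        return nodes.stream().map(mapper).reduce(new ArrayList<>(), reducer);
    }

    public static void main(String[] args) {
        List<List<Person>> nodes = List.of(
                List.of(new Person("Johnson", 40), new Person("Smith", 30)),
                List.of(new Person("Johnson", 50), new Person("Johnson", 41)));
        RetrieveFilter<Person> shortJohnsons = p -> p.lastName.equals("Johnson") && p.height < 42;
        List<Person> people = mapReduce(nodes, retrievalMapper(shortJohnsons), concatReducer());
        System.out.println(people); // [Johnson(40), Johnson(41)]
    }
}
```

Because the reducer only concatenates, it is associative, which is what lets a grid combine node results in any grouping.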

Rick Hightower

Sep 28, 2011, 5:32:14 PM
to jsr...@googlegroups.com
Yes and no.

Yes, we need this, but this seems more like the MapReduce API from Infinispan than a query.

No, this is not enough. We also need some rudimentary query API IMO.

Joseph Ottinger

Sep 28, 2011, 5:39:05 PM
to jsr...@googlegroups.com
Okay. Two thoughts: 

First: let's not think "infinispan" or any other specific product *yet* - I'm trying to keep in mind how features from the JSR would map to existing products in the space, but I'm also not trying to say "let's use X from Y" if I can help it. I've been borrowing Infinispan's annotations mostly because the wiki points to them.

Second: What exactly would a query *return*? Given that we're more or less using a map abstraction, we have two sets to return values from, a keyset and a valueset; however, other query languages can return data that's *not* part of the original query set. Are we suggesting that we run selection criteria? If so, on the key set, or the value set?

How do we do so efficiently?

I guess my thought was that a query "API" could be bundled in as a workable abstraction *on top of* the map/reduce API, so you'd have the appearance of a query API without having the performance-crushing implications of a direct query mechanism.

Pete Muir

Sep 28, 2011, 5:43:44 PM
to jsr...@googlegroups.com
On a meta note, the wiki just references Infinispan as it's what Manik and I know best. As we come up with better names for concepts or annotations, feel free to swap them out.

Joseph Ottinger

Sep 28, 2011, 5:45:43 PM
to jsr...@googlegroups.com
Yeah, it's no problem - we just need to remember that references should be genericized as much as possible, because I don't want this to be a rubber stamp for *any* specific project, even my own. My goals in the JSR are to yield a generalized API from which we can all build working products and expect users to have a cross-product mechanism for understanding.

Pete Muir

Sep 28, 2011, 5:48:16 PM
to jsr...@googlegroups.com
Yes, we totally agree with this aim :-)

Rick Hightower

Sep 28, 2011, 5:51:26 PM
to jsr...@googlegroups.com
But the query API should be general enough that it does not require MapReduce, or rather so that it allows vendors to index properties of the objects being stored, à la Lucene or something along those lines.

inline

On Wed, Sep 28, 2011 at 2:39 PM, Joseph Ottinger <jo...@enigmastation.com> wrote:
Okay. Two thoughts: 

First: let's not think "infinispan" or any other specific product *yet* - I'm trying to keep in mind how features from the JSR would map to existing products in the space, but I'm also not trying to say "let's use X from Y" if I can help it. I've been borrowing Infinispan's annotations mostly because the wiki points to them.

I am not biased toward Infinispan. I don't work at JBoss. I was offered a job there once, but that was a while ago. Anyway... I digress. I only use Infinispan as an example because I have read their documents most recently (it is fresh in my mind).

 

Second: What exactly would a query *return*?

Dunno... A Set of key/value pairs I think.
 
Given that we're more or less using a map abstraction, we have two sets to return values from, a keyset and a valueset; however, other query languages can return data that's *not* part of the original query set. Are we suggesting that we run selection criteria? If so, on the key set, or the value set?

Hmmm... 

A Set of key/value pairs I think.

And then probably some sorts of projections (selects) like totals, avg, stdev, concat, etc. 
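A minimal sketch of "a Set of key/value pairs plus projections", assuming the query result is a list of entries and that projections such as total, average and standard deviation are computed over one numeric property of the values. `QueryResult` and its methods are hypothetical; the standard deviation shown is the population form.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical query result: the matching key/value pairs plus a few projections.
class QueryResult<K, V> {
    private final List<Map.Entry<K, V>> entries;

    QueryResult(List<Map.Entry<K, V>> entries) {
        this.entries = List.copyOf(entries);
    }

    List<Map.Entry<K, V>> entries() {
        return entries;
    }

    double total(ToDoubleFunction<? super V> property) {
        return entries.stream().mapToDouble(e -> property.applyAsDouble(e.getValue())).sum();
    }

    double average(ToDoubleFunction<? super V> property) {
        return entries.isEmpty() ? 0.0 : total(property) / entries.size();
    }

    // Population standard deviation: square root of the mean squared deviation.
    double stdev(ToDoubleFunction<? super V> property) {
        double mean = average(property);
        return Math.sqrt(entries.stream()
                .mapToDouble(e -> Math.pow(property.applyAsDouble(e.getValue()) - mean, 2))
                .average()
                .orElse(0.0));
    }

    public static void main(String[] args) {
        QueryResult<String, Integer> salaries = new QueryResult<>(List.of(
                Map.entry("a", 2), Map.entry("b", 4), Map.entry("c", 4), Map.entry("d", 4),
                Map.entry("e", 5), Map.entry("f", 5), Map.entry("g", 7), Map.entry("h", 9)));
        ToDoubleFunction<Integer> salary = v -> v;
        System.out.println(salaries.total(salary));   // 40.0
        System.out.println(salaries.average(salary)); // 5.0
        System.out.println(salaries.stdev(salary));   // 2.0
    }
}
```

Keeping projections as methods on the result, rather than in a query language, fits the "20/80" goal: a grid can compute them per node and combine totals and counts cheaply.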

Personally I am looking for a 20 / 80 rule. I think datagrids need queries. But I don't want a super complex one for the first release. 

Coherence, Infinispan, and EhCache (not a datagrid, but datagrid-like) all have queries. I think they comprise a large chunk of datagrid users (or datagrid-like users). So let's come up with something that is easy for all of these vendors to implement (easy for them to build, not just for us to use).

Minimal query API IMO.

 

How do we do so efficiently?

MapReduce + indexing. I don't care how you do it efficiently (not personal you, but the royal you as in all vendors). It can be done. Let's provide an API and let the vendors shoot it out if this is a feature customers care about. (Speaking as an independent consultant, not a vendor.)

If a customer wants some querying speed and vendor X provides the most elastic, parallel, distributed beast, so be it. Some customers won't care. Some will. Some vendors will be faster; some won't.

 

I guess my thought was that a query "API" could be bundled in as a workable abstraction *on top of* the map/reduce API, so you'd have the appearance of a query API without having the performance-crushing implications of a direct query mechanism.

Probably. MapReduce + indexing. I imagine that a lot of them would use Lucene internally, as EhCache Enterprise and Infinispan do.

Nate McCall

Sep 28, 2011, 6:23:46 PM
to jsr...@googlegroups.com
After thinking about this a bit, I would really like to see a
general-purpose query API.

Further, making the map-reduce component optional would be handy - a
couple of different folks have secondary indexing capability and can do
queries like this on-line. Maybe allowing it to be applied at an
arbitrary phase of the query to support pre-filtering via M/R and/or
projections on large results (or just querying directly)? Extending
Rick's example:

q.addReducer("lastName", Query.FilterOperator.EQUAL, Reducer.BEFORE, lastNameParam);

More thinking out loud here. I could see that getting clunky.



Manik Surtani

Oct 2, 2011, 8:06:15 PM
to jsr...@googlegroups.com
All good stuff. Let's try to have a chat about this on Monday (9pm at the JSR 347 BoF); I'll make sure I take notes for those unable to attend (like Joe O).

Cheers
Manik

Sent from my iPad 4 beta 3 with a telekinetic keyboard

Jim Bethancourt

Oct 20, 2011, 1:02:12 PM
to jsr...@googlegroups.com
One thought might be to allow for JPA queryability through a third-party mechanism, provided there is a well-defined interface that the third party can hook into. Specifically, I'm looking at DataNucleus and its ability to query HBase datastores and Google's BigTable datastores. HBase sits on top of Hadoop, and this spec seeks to provide Hadoop-like APIs, so if things could be worked out just right, JPA queryability could be a viable option. DataNucleus is Apache licensed, so having it embedded in an RI wouldn't introduce a license issue.

The DataGrid story could end up being something like DataGrid -> GridBase -> JPA (ok -- I know there isn't a GridBase...), where an intermediate interface is provided that existing JPA providers can hook into.

Having JPA queryability would also provide a drop-in layer between the querying program and the database -- the developer could write their code and have no clue whether they would be querying a datagrid or a database.  In addition, existing Java EE applications could be easily retrofitted and leverage data grid technology.

Although querying using JPA could cause slower performance than a more optimal query language, it allows for the use of a familiar metaphor and an already standardized technology.

Rick Hightower

Oct 20, 2011, 1:44:15 PM
to jsr...@googlegroups.com
The problem with swinging for the fences is the risk of failure is too high.
First version can have a simple query (I think).
Others will build frameworks and solutions on top (maybe do some stuff we did not think of... maybe do some stuff that sucks... maybe do some very cool stuff). 
With 20/20 clarity we can see what works and prune what does not work and enhance what does.

Standards
Adoption
Enhancements
Standards
More Adoption
More Enhancements
Standards

Manik Surtani

Oct 27, 2011, 12:18:30 PM
to jsr...@googlegroups.com

Manik Surtani

Oct 27, 2011, 12:19:44 PM
to jsr...@googlegroups.com
On 20 October 2011 18:44, Rick Hightower <richardh...@gmail.com> wrote:
The problem with swinging for the fences is the risk of failure is too high.
First version can have a simple query (I think).

+1.  
 
Others will build frameworks and solutions on top (maybe do some stuff we did not think of... maybe do some stuff that sucks... maybe do some very cool stuff). 
With 20/20 clarity we can see what works and prune what does not work and enhance what does.

Yes, this should be iterative.


Jim Bethancourt

Oct 27, 2011, 12:26:08 PM
to jsr...@googlegroups.com
Will this be addressed later in the lifecycle of JSR 347, or would you rather see it addressed in a separate JSR?

Manik Surtani

Oct 27, 2011, 12:30:07 PM
to jsr...@googlegroups.com
I would think as a separate jsr. 

Sent from my mobile phone

Jim Bethancourt

Oct 28, 2011, 9:14:56 AM
to jsr...@googlegroups.com
Nice!  Thanks for that!

Mircea Markus

Mar 26, 2014, 9:58:21 AM
to jsr...@googlegroups.com


On Wednesday, September 28, 2011 10:10:17 AM UTC+1, Manik Surtani wrote:
We could start discussing it, but what I find hard is that we need a query language suited for data grids in the first place. I don't see any existing work as a "good fit". SQL is too relational, and JP-QL is also too relational. And coming up with a query language is definitely out of the scope of this JSR. Perhaps some rudimentary filters may work. Infinispan's impl is, IMO, far too closely coupled with Lucene's query POJOs to become a standard.

Update: in the meantime, Infinispan has created a query DSL that does not depend on Lucene; a sample can be seen here.

Alex Snaps

Mar 26, 2014, 10:45:26 AM
to jsr...@googlegroups.com
In the example linked, what's "gender" though? a bean style accessor, i.e User.getGender(): User.Gender? A property, i.e. User.gender: User.Gender? 
Are there any limitations on what it might be/do? Say, in case of the former, "age" could it be getAge(): int { return now - dob } ? In the case of the latter, prop "company": Company, what operation does what? e.g. eq() == Object.equals()? .like() == like(Object.toString()) ?

Very fundamentally, I guess I'm asking what a query language/API would mean for the way data gets stored, both in terms of topology (P2P vs. client/server) and in terms of storage (heap vs. elsewhere, e.g. disk, off-heap)...

Don't get me wrong though, I think search (maybe even more) is desirable... I'm just not quite sure from what angle it is best tackled. Hope this makes some sense...
Alex


--
You received this message because you are subscribed to the Google Groups "JSR 347 discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jsr347+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Rick Hightower

Mar 26, 2014, 4:36:04 PM
to jsr...@googlegroups.com
It's sort of surreal to see comments you made three years ago become a topic of conversation. Is JSR 347 going down the JSR 107 path? Will this get completed in 2021?


So three years later....

List<Employee> employees = repo.query(
          and(startsWith("firstName", "Bob"), eq("lastName", "Smith"), between("salary", 190_000, 200_000)));

Pull in the sum of salaries of employees with the last name Smith.

int sum = repo.query(eq("lastName", "Smith")).stream().filter(emp -> emp.getSalary()>50_000)
          .mapToInt(b -> b.getSalary())
          .sum();


 List<User> results =
                userRepo.query ( eq ( EMAIL, "rick.hi...@foo.com") );

A simple API might have....


static Group        and(Criteria... expressions) 
static Criterion    between(java.lang.Class clazz, java.lang.Object name, java.lang.String svalue, java.lang.String svalue2) 
static Criterion    between(java.lang.Object name, java.lang.Object value, java.lang.Object value2) 
static Criterion    between(java.lang.Object name, java.lang.String svalue, java.lang.String svalue2) 
static Criterion    contains(java.lang.Object name, java.lang.Object value) 
static Criterion    empty(java.lang.Object name) 
static Criterion    endsWith(java.lang.Object name, java.lang.Object value) 
static Criterion    eq(java.lang.Object name, java.lang.Object value) 
static Criterion    eqNested(java.lang.Object value, java.lang.Object... path) 
static Criterion    gt(java.lang.Object name, java.lang.Object value) 
static Criterion    gt(java.lang.Object name, java.lang.String svalue) 
static Criterion    gte(java.lang.Object name, java.lang.Object value) 
static Criterion    implementsInterface(java.lang.Class<?> cls) 
static Criterion    in(java.lang.Object name, java.lang.Object... values) 
static Criterion    instanceOf(java.lang.Class<?> cls) 
static Criterion    isNull(java.lang.Object name) 
static Criterion    lt(java.lang.Object name, java.lang.Object value) 
static Criterion    lte(java.lang.Object name, java.lang.Object value) 
static Not          not(Criteria expression) 
static Criterion    notContains(java.lang.Object name, java.lang.Object value) 
static Criterion    notEmpty(java.lang.Object name) 
static Criterion    notEq(java.lang.Object name, java.lang.Object value) 
static Criterion    notIn(java.lang.Object name, java.lang.Object... values) 
static Criterion    notNull(java.lang.Object name) 
static Group        or(Criteria... expressions) 
static Criterion    startsWith(java.lang.Object name, java.lang.Object value) 
static Criterion    typeOf(java.lang.String className) 
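A few of those factory methods are easy to picture as composable predicates. A minimal sketch, assuming values are stored as `Map<String, Object>` documents and that a criterion is simply a `Predicate` over such a map; `CriteriaSketch` is hypothetical and only mirrors the shape of the `eq` / `startsWith` / `between` / `and` calls listed above, not Boon's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical criteria DSL over map-shaped documents; not Boon's implementation.
final class CriteriaSketch {
    private CriteriaSketch() {
    }

    static Predicate<Map<String, Object>> eq(String name, Object value) {
        return doc -> Objects.equals(doc.get(name), value);
    }

    static Predicate<Map<String, Object>> startsWith(String name, String prefix) {
        return doc -> doc.get(name) instanceof String && ((String) doc.get(name)).startsWith(prefix);
    }

    // Inclusive range check; missing values or values of another type never match.
    static <C extends Comparable<C>> Predicate<Map<String, Object>> between(String name, C low, C high) {
        return doc -> {
            Object value = doc.get(name);
            if (!low.getClass().isInstance(value)) {
                return false;
            }
            @SuppressWarnings("unchecked")
            C c = (C) value;
            return c.compareTo(low) >= 0 && c.compareTo(high) <= 0;
        };
    }

    // AND of all criteria; an empty AND matches everything.
    @SafeVarargs
    static Predicate<Map<String, Object>> and(Predicate<Map<String, Object>>... criteria) {
        return Arrays.stream(criteria).reduce(doc -> true, Predicate::and);
    }

    static List<Map<String, Object>> query(List<Map<String, Object>> docs,
                                           Predicate<Map<String, Object>> criteria) {
        return docs.stream().filter(criteria).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> employees = List.of(
                Map.of("firstName", "Bobby", "lastName", "Smith", "salary", 195_000),
                Map.of("firstName", "Bob", "lastName", "Jones", "salary", 199_000),
                Map.of("firstName", "Alice", "lastName", "Smith", "salary", 195_000));
        System.out.println(query(employees,
                and(startsWith("firstName", "Bob"), eq("lastName", "Smith"),
                        between("salary", 190_000, 200_000))).size()); // 1
    }
}
```

Because every criterion is just a predicate, a provider is free to evaluate it by scanning, or to inspect the criteria first and satisfy some of them from indexes.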

I'm just spitballing here, but you could get a result set and do something like this:

// The cast is needed because expectOne() is not generic
rick = (User) userRepo.results(eq(EMAIL, "rick.hi...@foo.com"))
        .expectOne().firstItem();


The repo object could be an interface to a grid somehow.


The grid could return a ResultSet that looks something like this:


public interface ResultSet<T> extends Iterable<T> {

    ResultSet expectOne();

    <EXPECT> ResultSet<EXPECT> expectOne(Class<EXPECT> clz);

    ResultSet expectMany();

    ResultSet expectNone();

    ResultSet expectOneOrMany();

    ResultSet removeDuplication();

    ResultSet sort(Sort sort);

    Collection<T> filter(Criteria criteria);

    ResultSet<List<Map<String, Object>>> select(Selector... selectors);

    int[] selectInts(Selector selector);
    float[] selectFloats(Selector selector);
    short[] selectShorts(Selector selector);
    double[] selectDoubles(Selector selector);
    byte[] selectBytes(Selector selector);
    char[] selectChars(Selector selector);
    Object[] selectObjects(Selector selector);
    <OBJ> OBJ[] selectObjects(Class<OBJ> cls, Selector selector);
    <OBJ> ResultSet<OBJ> selectObjectsAsResultSet(Class<OBJ> cls, Selector selector);

    Collection<T> asCollection();
    String asJSONString();
    List<Map<String, Object>> asListOfMaps();
    List<T> asList();
    Set<T> asSet();

    List<PlanStep> queryPlan();

    T firstItem();
    Map<String, Object> firstMap();
    String firstJSON();
    int firstInt(Selector selector);
    float firstFloat(Selector selector);
    short firstShort(Selector selector);
    double firstDouble(Selector selector);
    byte firstByte(Selector selector);
    char firstChar(Selector selector);
    Object firstObject(Selector selector);
    <OBJ> OBJ firstObject(Class<OBJ> cls, Selector selector);

    List<T> paginate(int start, int size);
    List<Map<String, Object>> paginateMaps(int start, int size);
    String paginateJSON(int start, int size);

    // Size can vary if you allow duplication.
    // The size can change after removeDuplication.
    int size();
}
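To show the intent of a few of those methods, a minimal sketch of `expectOne`, `firstItem` and `paginate` over an in-memory list. `ListResultSet` is hypothetical and implements only this subset, not the full interface; throwing `IllegalStateException` when an expectation fails is an assumption. Keeping the type parameter on `expectOne()` is one way to avoid the `(User)` cast in the earlier snippet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, partial ResultSet backed by an in-memory list.
class ListResultSet<T> implements Iterable<T> {
    private final List<T> items;

    ListResultSet(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    // Fails fast unless the query matched exactly one item.
    ListResultSet<T> expectOne() {
        if (items.size() != 1) {
            throw new IllegalStateException("expected one result but got " + items.size());
        }
        return this;
    }

    T firstItem() {
        if (items.isEmpty()) {
            throw new IllegalStateException("empty result set");
        }
        return items.get(0);
    }

    // Up to `size` items starting at `start`, clamped to the bounds of the result.
    List<T> paginate(int start, int size) {
        int from = Math.min(Math.max(start, 0), items.size());
        int to = (int) Math.min((long) from + Math.max(size, 0), items.size());
        return new ArrayList<>(items.subList(from, to));
    }

    int size() {
        return items.size();
    }

    @Override
    public Iterator<T> iterator() {
        return items.iterator();
    }

    public static void main(String[] args) {
        ListResultSet<String> users = new ListResultSet<>(List.of("rick"));
        String rick = users.expectOne().firstItem(); // no cast: the type parameter flows through
        System.out.println(rick); // rick
        System.out.println(new ListResultSet<>(List.of(1, 2, 3, 4, 5)).paginate(3, 10)); // [4, 5]
    }
}
```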


I've thought about this subject a bit in the last 3 years. :)

I think we can come up with a basic API.

It should support these basic operations....

public enum Operator {
    EQUAL,              // Indexed
    NOT_EQUAL,          // Not indexed

    // Not implemented
    NOT_NULL,           // Not indexed
    IS_NULL,            // Not indexed
    IS_EMPTY,           // Not indexed
    NOT_EMPTY,          // Not indexed

    LESS_THAN,          // Indexed
    LESS_THAN_EQUAL,    // Indexed
    GREATER_THAN,       // Indexed
    GREATER_THAN_EQUAL, // Indexed
    BETWEEN,            // Indexed for strings

    STARTS_WITH,        // Indexed for strings
    ENDS_WITH,          // Not indexed
    CONTAINS,           // Not indexed
    NOT_CONTAINS,       // Not indexed
    MATCHES,            // Not implemented yet
    IN,                 // Not indexed
    NOT_IN,             // Not indexed
    NOT,

    AND,
    OR
}

The rules / filters can use filters or indexes or both....

We can have a JSON file format to store the filters....

        {
            "user":
                ["Select users who are using high end android phones and are using LTE",
                    "AND", [
                        ["lastDeviceUsed", "EQUAL", "ANDROID_HIGH_END_PHONE"],
                        ["lastConnectionSpeed", "IN", ["LTE", "FOUR_G", "WIFI"]],
                        ["entitlement.types", "CONTAINS", ["PREMIUM"]]
                    ]
                ],

            "movie":
                ["East coast users where video is longer than 180 seconds and category is DRAMA or TV",
                    "AND", [
                        ["video.lengthInMinutes", "GREATER_THAN", 30],
                        ["video.category", "IN", ["DRAMA", "TV"]],
                        ["user.timeZone.type", "EQUAL", "EST"]
                    ]
                ],

            "movieScore": 888,
            "action": "parseVideoStreamsAndDeliver"
        }
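A minimal sketch of how one such rule could be evaluated once parsed, assuming the JSON has already been turned into plain Java lists and maps (no JSON library shown), that dotted names such as `user.timeZone.type` walk nested maps, and that only the AND group plus the EQUAL, IN, CONTAINS and GREATER_THAN operators are needed. Reading CONTAINS as "the stored collection contains every listed value" is an assumption, and `RuleEvaluator` is hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical evaluator for rules shaped like [description, "AND", [[path, op, value], ...]].
class RuleEvaluator {
    // Walks a dotted path such as "video.category" through nested maps.
    static Object resolve(Map<String, ?> doc, String path) {
        Object current = doc;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    static boolean test(Object actual, String op, Object expected) {
        switch (op) {
            case "EQUAL":
                return Objects.equals(actual, expected);
            case "IN":
                return ((Collection<?>) expected).contains(actual);
            case "CONTAINS": // every expected value must appear in the stored collection
                return actual instanceof Collection
                        && ((Collection<?>) actual).containsAll((Collection<?>) expected);
            case "GREATER_THAN":
                return actual instanceof Number
                        && ((Number) actual).doubleValue() > ((Number) expected).doubleValue();
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    static boolean matches(List<?> rule, Map<String, ?> doc) {
        if (!"AND".equals(rule.get(1))) {
            throw new IllegalArgumentException("only AND groups are sketched here");
        }
        for (Object c : (List<?>) rule.get(2)) {
            List<?> clause = (List<?>) c;
            if (!test(resolve(doc, (String) clause.get(0)), (String) clause.get(1), clause.get(2))) {
                return false; // short-circuit on the first failing clause
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<?> movieRule = List.of("Long DRAMA or TV videos for EST users", "AND", List.of(
                List.of("video.lengthInMinutes", "GREATER_THAN", 30),
                List.of("video.category", "IN", List.of("DRAMA", "TV")),
                List.of("user.timeZone.type", "EQUAL", "EST")));
        Map<String, Object> event = Map.of(
                "video", Map.of("lengthInMinutes", 45, "category", "TV"),
                "user", Map.of("timeZone", Map.of("type", "EST")));
        System.out.println(matches(movieRule, event)); // true
    }
}
```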

So the system gets the query and then decides if / how to use map/reduce and/or indexes.
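One way to picture that decision: use an index when the queried property has one, otherwise fall back to a scan (which a grid would run as a map/reduce across nodes). A minimal single-node sketch, assuming equality-only indexes; `PlannedRepo`, `queryEq` and the plan labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical repository that picks an index lookup or a full scan per query.
class PlannedRepo {
    private final List<Map<String, Object>> rows = new ArrayList<>();
    // property name -> (property value -> rows with that value)
    private final Map<String, Map<Object, List<Map<String, Object>>>> indexes = new HashMap<>();
    private String lastPlan = "";

    void addIndex(String property) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : rows) {
            index.computeIfAbsent(row.get(property), v -> new ArrayList<>()).add(row);
        }
        indexes.put(property, index);
    }

    void add(Map<String, Object> row) {
        rows.add(row);
        indexes.forEach((property, index) ->
                index.computeIfAbsent(row.get(property), v -> new ArrayList<>()).add(row));
    }

    List<Map<String, Object>> queryEq(String property, Object value) {
        Map<Object, List<Map<String, Object>>> index = indexes.get(property);
        if (index != null) {
            lastPlan = "INDEX(" + property + ")";
            return Collections.unmodifiableList(index.getOrDefault(value, Collections.emptyList()));
        }
        lastPlan = "SCAN"; // in a grid: fan a map/reduce out to every node
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (Objects.equals(row.get(property), value)) {
                hits.add(row);
            }
        }
        return hits;
    }

    String lastPlan() {
        return lastPlan;
    }

    public static void main(String[] args) {
        PlannedRepo repo = new PlannedRepo();
        repo.add(Map.of("lastName", "Smith", "dept", "HR"));
        repo.add(Map.of("lastName", "Jones", "dept", "HR"));
        repo.addIndex("lastName");
        System.out.println(repo.queryEq("lastName", "Smith").size() + " " + repo.lastPlan()); // 1 INDEX(lastName)
        System.out.println(repo.queryEq("dept", "HR").size() + " " + repo.lastPlan());        // 2 SCAN
    }
}
```

The caller's query is identical in both cases; only the plan differs, which is the property that lets vendors compete on indexing without changing the API.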

Anyway... some subset of the above. It is 2014. This has to move to be relevant. 

Alex Snaps

Mar 26, 2014, 5:37:53 PM
to jsr...@googlegroups.com
I certainly like the stream stuff. It also sort of addresses some of my questions:
Basically, I now know that a bunch of employees will be streamed to me for evaluation (that would certainly be the easiest way to implement it, at least) and I can then do whatever I want (if Employee.getSalary() does a WS call, all power to me). Nonetheless, how is the initial query "resolved"? What's "lastName" in your example there? It could be that I am missing something here: maybe JSON has been identified as the way to store things, JGrid is about document data grids (if so, sorry for missing that), and lastName "simply" is an attribute of the document of type string (in the JSON sense). I am certainly going to catch up on all that, but from the https://github.com/datagrids/spec/wiki/Proposed-features page I had not come to that conclusion, and it wasn't clear to me how data was to be stored (and looked up) other than as key/value.

As a side note, I really hope this won't take as long as 107... yet I feel I am lacking some background on some of the existing work. Hopefully Friday's meeting will put us all on par and ready to make some serious progress...

Rick Hightower

Mar 26, 2014, 8:02:00 PM
to jsr...@googlegroups.com
I've been working with/on Boon, which has an object criteria DSL of sorts. I wrote it in 2012 for a customer as an example of how to query / filter collections using indexes.

The criteria API was part of Crank (2005, which was based on Presto, 2003), which was just a wrapper over Hibernate or iBatis or straight SQL...

Boon works with Java objects, JSON, and Maps/Lists. It uses in-memory Java collections to index Java objects for quick access.


So lastName is either a field in an object, a property in an object, a key in a map, a property in a JSON object, or a field in a document (which is just a key in a map or a property in a JSON object anyway).

You can use property paths for queries as well.
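A minimal sketch of resolving a property path such as `department.name` against a mix of plain Java objects and maps, assuming JavaBean-style `getX()` getters with a fallback to public fields; `PropertyPaths` is hypothetical and ignores indexing syntax such as `phoneNumbers[0]`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical resolver: each path segment is a map key, a getX() getter, or a public field.
final class PropertyPaths {
    private PropertyPaths() {
    }

    static Object resolve(Object root, String path) {
        Object current = root;
        for (String name : path.split("\\.")) {
            if (current == null) {
                return null; // a missing intermediate value short-circuits the path
            }
            current = step(current, name);
        }
        return current;
    }

    private static Object step(Object target, String name) {
        if (target instanceof Map) {
            return ((Map<?, ?>) target).get(name);
        }
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (NoSuchMethodException noGetter) {
            try {
                Field field = target.getClass().getField(name);
                return field.get(target);
            } catch (ReflectiveOperationException noField) {
                throw new IllegalArgumentException("no property '" + name + "' on " + target.getClass(), noField);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Example types standing in for the Employee/Department used in the thread.
    static final class Department {
        private final String name;

        Department(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    static final class Employee {
        public final int salary;
        private final Department department;

        Employee(Department department, int salary) {
            this.department = department;
            this.salary = salary;
        }

        public Department getDepartment() {
            return department;
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee(new Department("HR"), 100);
        System.out.println(resolve(employee, "department.name")); // HR (via getters)
        System.out.println(resolve(employee, "salary"));          // 100 (via public field)
        System.out.println(resolve(Map.of("department", Map.of("name", "IT")), "department.name")); // IT (via map keys)
    }
}
```

Reflection per lookup is slow; an implementation would more likely cache the resolved accessors per class or resolve paths once while building an index.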


Repo<Integer, Employee> employeeRepo;

/** It builds indexes on properties. */
employeeRepo = Repos.builder()
        .primaryKey("id")
        .searchIndex("department.name")
        .searchIndex("salary")
        .build(int.class, Employee.class).init(employees);

List<Employee> results =
        employeeRepo.query(eq("department.name", "HR"));

/* Verify. */
Int.equalsOrDie(4, results.size());

Str.equalsOrDie("HR", results.get(0).getDepartment().getName());


Seems every GRID provider has some way to do the above. 

With varying degrees of support.

I figure there are three ways to store data in a GRID

  1. Opaque binary blob (maybe with some attributes for querying, at least an id for key/value lookup)
  2. Indexed by key binary blobs
  3. Full search JSON/BSON/SMILE binary or TEXT storage
  4. Some combination of the above.
  5. Oh yeah.. I can't count.

When/how/if indexes are used is up to the provider, but there should be a way to query the GRID regardless of whether the actual work is done via MAP/REDUCE, INDEXED, or INDEX+MAP/REDUCE.

It should be as easy as this.



list = employeeMapRepo.query(
        selects(
                selectAsTemplate("fullName", "{{firstName}} {{lastName}}", template),
                selectAs("contactInfo.phoneNumbers[0]", "ph"),
                selectAsTemplate("pay", "{{pay(salary)}}", template),
                selectAsTemplate("id", "{{DecryptionService.decrypt(id)}}", template)));
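A minimal sketch of what `selectAs` and `selectAsTemplate` might do, assuming rows are maps, `selectAs` copies a top-level value under an alias (no nested paths or `[0]` indexing), and templates only substitute `{{name}}` placeholders with row values (no function calls such as `pay(salary)`, and no separate template-engine argument). `Selects` and its methods are hypothetical, not Boon's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical projections: each selector produces one aliased column of the output row.
final class Selects {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    private Selects() {
    }

    static final class Selector {
        final String alias;
        final Function<Map<String, ?>, Object> extractor;

        Selector(String alias, Function<Map<String, ?>, Object> extractor) {
            this.alias = alias;
            this.extractor = extractor;
        }
    }

    // Copies one top-level value under a new name.
    static Selector selectAs(String key, String alias) {
        return new Selector(alias, row -> row.get(key));
    }

    // Fills {{name}} placeholders from the row; missing names become empty strings.
    static Selector selectAsTemplate(String alias, String template) {
        return new Selector(alias, row -> {
            Matcher matcher = PLACEHOLDER.matcher(template);
            StringBuilder out = new StringBuilder();
            while (matcher.find()) {
                Object value = row.get(matcher.group(1));
                matcher.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
            }
            matcher.appendTail(out);
            return out.toString();
        });
    }

    // Applies the selectors in order, producing one projected row.
    static Map<String, Object> project(Map<String, ?> row, Selector... selectors) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (Selector selector : selectors) {
            projected.put(selector.alias, selector.extractor.apply(row));
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> employee = Map.of("firstName", "Jane", "lastName", "Smith", "phone", "555-0100");
        System.out.println(project(employee,
                selectAsTemplate("fullName", "{{firstName}} {{lastName}}"),
                selectAs("phone", "ph")));
        // {fullName=Jane Smith, ph=555-0100}
    }
}
```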

Hazelcast has a query API.
Infinispan has one (referenced in the original email).
I am sure others have query APIs.


 One should be able to plug in Guava, Concurrent Trees or Trove if one desires to do so.

Boon's data repo makes doing index based queries on collections a lot easier.

It provides a simplified API for doing so. It allows linear search for the sake of completeness, but I recommend using it primarily for indexes and then using the streaming API for the rest (for type safety and speed).

You can use a wrapper class to wrap a collection into an indexed collection.

So.... anyway... this is not a complete thought or a clear set of ideas or direction, but I'm merely putting gas on the flames.


107 delivered a lot of what was in the original 347 charter.

I think query (some simplified version) should be on the table.


I actually really like the Hazelcast interfaces for a lot of this so....  maybe we can start there... 



Ben.Cotton

Mar 27, 2014, 4:21:35 PM
to jsr...@googlegroups.com
Musing Openly:   Another thing we might want to talk about is the notion of  JGRID's adaptability to an intra-node transport SPI.

In exactly the same way that Red Hat's JGroups serves as a "transport provider" to Infinispan (and others) for transportSet = {UDP,TCP}, there is now the very real consideration that the transportSet for Java technology has grown (and will continue to grow).

E.g., since Java 7 your transportSet can now be all of  {UDP,TCP, RDMA_SDP_IB}

With the ambitions of current OpenJDK JEPs to mainstream official, unrestricted access to a new API that delivers sun.misc.Unsafe-like capabilities, there is absolutely no reason not to also include native ZERO-COPY IPC over a /dev/shm transport in your set of choices:

transportSet = {IPC, UDP, TCP, RDMA_SDP_IB}

For Java JGRID deployments onto HPC supercomputing use cases, IPC providers will be a "must have" transport consideration.

One final thing,  having been in contact with Oracle's Alan Bateman (NIO and SDP lead) ... there is nothing (besides motivation for someone to proceed) to stop the building of Linux device drivers that will empower Java to enjoy an RDMA_SDP_10gE transport.  This does not exist, and is not currently being worked on, but it would deliver to Java an RDMA for the masses capability.  

Again, just musing openly on these thoughts and considerations. One thing's for sure: the days of {UDP,TCP} being your grid's only intra-JVM-node transport choices are over. Though these details are best sorted out in a separate JTRANSPORT JSR, maybe our join to such a thing should have a placeholder in our 347 plans?
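A minimal sketch of what such a pluggable transport placeholder could look like, assuming a hypothetical `GridTransport` SPI that providers implement and a selection rule that prefers the fastest transport available on the host. The names and the preference order are illustrative assumptions; only an in-process loopback provider is implemented, since real IPC, TCP or RDMA providers need native or network code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical SPI a grid could code against, independent of the wire technology.
interface GridTransport {
    // Declared in an assumed order of preference: earlier constants win.
    enum Kind { IPC, RDMA_SDP_IB, TCP, UDP }

    Kind kind();

    boolean isAvailable();

    void send(byte[] message);

    void onReceive(Consumer<byte[]> handler);
}

// In-process loopback provider that hands each message straight to local handlers.
class LoopbackTransport implements GridTransport {
    private final Kind kind;
    private final boolean available;
    private final List<Consumer<byte[]>> handlers = new ArrayList<>();

    LoopbackTransport(Kind kind, boolean available) {
        this.kind = kind;
        this.available = available;
    }

    @Override
    public Kind kind() {
        return kind;
    }

    @Override
    public boolean isAvailable() {
        return available;
    }

    @Override
    public void send(byte[] message) {
        for (Consumer<byte[]> handler : handlers) {
            handler.accept(message.clone()); // defensive copy so handlers cannot mutate the sender's buffer
        }
    }

    @Override
    public void onReceive(Consumer<byte[]> handler) {
        handlers.add(handler);
    }
}

class TransportSelector {
    // Picks the most preferred transport whose provider reports itself usable here.
    static Optional<GridTransport> choose(List<GridTransport> providers) {
        return providers.stream()
                .filter(GridTransport::isAvailable)
                .min(Comparator.comparing(GridTransport::kind));
    }

    public static void main(String[] args) {
        List<GridTransport> providers = List.of(
                new LoopbackTransport(GridTransport.Kind.TCP, true),
                new LoopbackTransport(GridTransport.Kind.IPC, false), // e.g. no shared memory on this host
                new LoopbackTransport(GridTransport.Kind.UDP, true));
        GridTransport chosen = choose(providers).orElseThrow(IllegalStateException::new);
        StringBuilder received = new StringBuilder();
        chosen.onReceive(bytes -> received.append(new String(bytes, StandardCharsets.UTF_8)));
        chosen.send("ping".getBytes(StandardCharsets.UTF_8));
        System.out.println(chosen.kind() + " " + received); // TCP ping
    }
}
```

The grid itself would only see `GridTransport`, so adding an IPC or RDMA provider later would not touch the data grid API.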


