Ebean, ManyToOne list of IDs (not entities)??


kraythe

May 1, 2015, 5:18:41 PM
to eb...@googlegroups.com
Greetings, I want to be able to relate two entities without actually including one entity in the other; essentially I want the member to have a list of IDs and not a list of entities. To illustrate, consider the following:

Given a standard database setup: 

CREATE TABLE `user` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);


CREATE TABLE `profile` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `owner_id` bigint(20),
  `full_name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `fk_profile$owner_id` FOREIGN KEY (`owner_id`) REFERENCES `user` (`id`)
);


And then a standard mapping: 

@Entity
@Table(name = "user")
class User {

  @Id
  @Column(name = "id")
  private Long id;

  @OneToMany(mappedBy = "owner")
  private List<Profile> profiles;

  @Column(name = "email")
  private String email;

  // getters, setters, etc
}


@Entity
@Table(name = "profile")
class Profile {

  @Id
  @Column(name = "id")
  private Long id;

  @ManyToOne
  @JoinColumn(name = "owner_id")
  private User owner;

  @Column(name = "full_name")
  private String fullName;

  // getters, setters, etc
}


This works, but there is a situation where both objects are in a memory cache (think of a distributed hash map) and they are in different caches. So what I'd really like to do is something like the following. The database stays identical, but the user only has a list of profile IDs rather than a list of profiles.

@Entity
@Table(name = "user")
class User {

  @Id
  @Column(name = "id")
  private Long id;

  @OneToMany(mappedBy = "owner")
  private List<Long> profileIds;

  @Column(name = "email")
  private String email;

  // getters, setters, etc
}


@Entity
@Table(name = "profile")
class Profile {

  @Id
  @Column(name = "id")
  private Long id;

  @ManyToOne
  @Column(name = "owner_id")
  private Long ownerId;

  @Column(name = "full_name")
  private String fullName;

  // getters, setters, etc
}


Is there any way to accomplish this elegantly with Ebean? Also, is it doable when there is a join table between the entities?

Rob Bygrave

May 3, 2015, 12:27:31 AM
to ebean@googlegroups
Hmmm, hard to say without knowing more, but it looks to me like you are actually making things harder for yourself.

 both objects are in a memory cache (think of a distributed hash map) and they are in different caches.
Not sure why you don't use the built-in L2 cache support ... and it is not clear why you think changing to List<Long> profileIds is a good idea here.

JPA2 introduced @ElementCollection ... which would support the mapping you desire, but Ebean does not support @ElementCollection yet. To me you would be using the element collection feature in the wrong manner here (not what that feature was introduced for), so I'm not convinced you are solving your problem the correct way.  Ebean has really good support for partial objects, so you don't have to change your model to support use cases where you only want to fetch the id values.  In general you should be wary of modifying your model to support a single use case.
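As a rough sketch of what that partial-object fetch could look like (assuming the Ebean 4.x query API and the User/Profile mapping from the first post; a sketch only, not tested against a real database):

```
import com.avaje.ebean.Ebean;
import java.util.List;

public class PartialFetchSketch {

    // Fetch users together with just the ids of their profiles,
    // without changing the model to List<Long>.
    public static List<User> usersWithProfileIds() {
        return Ebean.find(User.class)
                .select("id, email")       // partial User bean: only id and email
                .fetch("profiles", "id")   // profiles loaded as partial beans with only their id
                .findList();
    }
}
```

The profiles in the returned graph are partial beans, so reading anything beyond their id would trigger lazy loading.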

Anyway, no, Ebean doesn't support @ElementCollection, so you can't model it that way with Ebean (until the JPA2 support is added).


Cheers, Rob.


--

---
You received this message because you are subscribed to the Google Groups "Ebean ORM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ebean+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

kraythe

May 3, 2015, 11:38:24 AM
to eb...@googlegroups.com
The L2 cache will cache the queries but presents innumerable problems in a high-volume system where you want the majority of data to be read out of the memory caches. The question you ask there is a loaded one, specifically "What are the advantages of Hazelcast or Memcached over the L2 cache of your ORM tool?" This has been answered a number of times for Hibernate and applies equally well in this circumstance. With L2 caches you aren't in control of synchronization, eviction, distributed locking, and a number of other things. And still your read-through to the DB is thousands of times slower than a read from the memory cache. With Hazelcast I have the ability to invoke cluster-wide distributed locks around critical sections of code, customize eviction strategies per map, and cache objects instead of just tables (i.e. I can load a user with all of his roles and cache them as a unit). Furthermore, I have Map Reduce APIs and other abilities.

Developing an application "by the book" with a Spring template and L2 cache and so on will take you a certain distance, but when you put massive load on it, two things will happen. First, your cluster computing power needs will go way up, and second, your speed of processing will go way down. Putting a memory cache in is MONUMENTALLY harder than simple web request -> endpoint -> DB query web apps, but it's the only rational way to handle heavily concurrent systems. When a user is ordering pets from a pet store, they don't interact with other users much. In the systems I work on, each user has many interactions with other users. Trivial book-app implementation paradigms only take you so far. So we actually DO have an L2 cache, but we ALSO have Hazelcast caches, and 99% of all queries are against the caches, not the DB. For the models that we don't have a Hazelcast cache for, the performance is orders of magnitude worse, so I am picking up these extra models to refactor them into the cache.

I want to bisect the model at a one-to-many relationship so that models on one side are cached in a different memory cache than the other. To do that I need just the IDs of the Many instances to navigate through the cache. Currently the only real way I can think of doing this is to map two entities to the same table. I.e. you have a Customer entity that has all of the data for a customer but maps only the user_id as a simple type (long, int, etc), and you also have a CustomerId entity that maps to the same table but maps only the ID field and the back-reference to the user, using the OneToMany - ManyToOne paradigm with only the user field and id field. Barring something like @ElementCollection, I think this might be my only option.
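A minimal sketch of that double-mapping idea, using the profile table from the first post (names like ProfileId are hypothetical, and this assumes standard JPA annotations; whether two entities may share one table without conflicts depends on the ORM):

```
// Full entity: carries the data columns.
@Entity
@Table(name = "profile")
class Profile {
  @Id
  @Column(name = "id")
  private Long id;

  @Column(name = "full_name")
  private String fullName;
  // ...
}

// Slim entity mapped to the SAME table: only the id and the owner
// back-reference, so User can hold a collection of these and expose
// just the ids to the other cache.
@Entity
@Table(name = "profile")
class ProfileId {
  @Id
  @Column(name = "id")
  private Long id;

  @ManyToOne
  @JoinColumn(name = "owner_id")
  private User owner;
}
```

The slim entity should normally be treated as read-only so the two mappings never race on writes to the same rows.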

kraythe

May 3, 2015, 11:43:36 AM
to eb...@googlegroups.com
Oh, this is a great article on the limitations of L2 caches, by the way:


Contrast that with the abilities of Hazelcast and it should be pretty apparent what we are doing: basically taking a large ERD model, breaking it up into discrete models, and caching them in memory with read-through to the DB when the need arises; otherwise all modifications, changes, and searches run against the memory caches.

Rob Bygrave

May 3, 2015, 5:53:11 PM
to ebean@googlegroups
"What are the advantages of Hazelcast or Memcached over L2 cache for your ORM tool."

No, that is not the question, in that you should be able to use Hazelcast (or any other) caching implementation with EbeanORM - that is, use Hazelcast as the underlying implementation of the L2 cache.  There is an API provided so that anyone can plug in any caching implementation. Personally I am very keen on using Hazelcast and providing that integration with EbeanORM out of the box ... but time/priorities have not permitted that yet. That said, it wouldn't be much work at all; the API to implement is pretty straightforward. If that was done, might that solve all your problems?


I want to bisect the model at a one to many relationship so models on one side are cached in a different memory cache than the other. 

With the EbeanORM L2 cache, different beans go in different 'bean caches' (aka different maps) ... so it sounds like this matches what you want. It can be a lot of work to implement the management of these caches yourself (I think you know that), so it is probably worth seeing IF you can get EbeanORM to do this for you, because it seems like it could.


With L2 caches you aren't int control of synchronization, eviction, distributed locking an a number of other things. 

Maybe ... but maybe not, because I think you could have all the control you want. That is, the caches that EbeanORM would use could all be known/accessible Hazelcast caches, so you could also have complete access to all the caches (access external to EbeanORM) ... and all the calls EbeanORM makes to each cache could go through your own wrapper/delegate code (to control eviction etc).
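The wrapper/delegate idea can be sketched in plain Java. The SimpleCache interface below is hypothetical, standing in for whatever cache contract the ORM would call through; the point is only that every call passes through one place where eviction, metrics, or locking policy can be customized:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal cache contract standing in for the real cache API.
interface SimpleCache {
    Object get(Object key);
    void put(Object key, Object value);
    void remove(Object key);
}

// Backing cache: in real use this could be a Hazelcast IMap.
class MapCache implements SimpleCache {
    private final Map<Object, Object> map = new ConcurrentHashMap<>();
    public Object get(Object key) { return map.get(key); }
    public void put(Object key, Object value) { map.put(key, value); }
    public void remove(Object key) { map.remove(key); }
}

// Delegate wrapper: every call the ORM makes goes through here, so custom
// eviction policy, metrics, or distributed locking can be added in one place.
class EvictionAwareCache implements SimpleCache {
    private final SimpleCache delegate;
    private long puts; // simple metric demonstrating the interception point

    EvictionAwareCache(SimpleCache delegate) { this.delegate = delegate; }

    public Object get(Object key) { return delegate.get(key); }

    public void put(Object key, Object value) {
        puts++;                 // a custom eviction/locking hook would go here
        delegate.put(key, value);
    }

    public void remove(Object key) { delegate.remove(key); }

    public long putCount() { return puts; }
}
```

Handing the ORM the wrapper instead of the raw cache keeps the external code in control without the ORM knowing the difference.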



Cheers, Rob.

Rob Bygrave

May 3, 2015, 6:06:29 PM
to ebean@googlegroups
Oh this is a great article on the limitations of L2 caches by the way: 

You have to be careful what you read wrt Hibernate and translating that into a problem with EbeanORM.  The only real problem with the L2 query cache is that, yes, it can be quickly invalidated if you are using it on heavily updated beans/tables ... so yes, data grids and ElasticSearch can provide a good fallback mechanism in that case (to avoid hitting the DB).  I actually do have a good plan for the case where you get a miss on a query cache ... you can in some cases go transparently to a data grid (or ElasticSearch) instead of hitting the DB.


Cheers, Rob.

kraythe

May 4, 2015, 10:48:06 AM
to eb...@googlegroups.com
Thanks for the reply, Rob, interesting discussion. My comments are inline, with the proviso that these are my opinions. :)


On Sunday, May 3, 2015 at 4:53:11 PM UTC-5, Rob Bygrave wrote:
"What are the advantages of Hazelcast or Memcached over L2 cache for your ORM tool."

No that is not the question in that you should be able to use Hazelcast (or any other) caching implementation with EbeanORM - that is, use Hazelcast as the underlying implementation used by the L2 cache.  There is an API that is provided so that anyone can plugin in any caching implementation. Personally I am very keen on using Hazelcast and providing that integration with EbeanORM out of the box ... but time/priorities have not permitted that yet. That said it wouldn't be much work at all, the API to implement is pretty straight forward. If that was done that might solve all your problems?


Sure, we can use Hazelcast as the L2 cache, but that does not satisfy the use case. If you consider the Pet Store example, that is eminently effective: the write traffic on objects is not sufficient to merit much more than an L2 cache, and furthermore the concurrent synchronization is trivial. Users aren't typically placing orders at multiple times on 20 different nodes. Now if you consider something like an online game, the use case becomes far less trivial. Now you have objects that are shared among users, being updated on a microsecond interval, and the possibility of update collision is high. The only way to handle this with any degree of reliability is a pessimistic locking scenario where the critical sections of code are guarded by cluster-wide synchronization blocks. With caching mechanisms like Hazelcast, that is a very easy thing to manage.

There are also other scenarios that Ebean does .. with all due respect for Ebean ... not do very well. Ebean assumes everything is lazy loaded, and for the standard out-of-the-book application that is just grand. But from the point of view of more complex applications this hits a MASSIVE brick wall of performance, a cluster-killing, business-killing wall of lethality. The absolute last thing I need is some user navigating the model from, say, Level to Player and then recursively loading the entire model into memory in thousands of SQL calls. The permutations of SQL calls that can be executed in a large model are incredible, and caching them at an L2 cache is not going to be effective in stopping them. Furthermore, most of those searches shouldn't go against the DB at all but be performed in memory.

Consider another use case. Let's say we have some Player objects that have each made some moves, and a total score needs to be calculated for a group of players over an entire series of thousands of moves. We could, for example, do a "SELECT SUM(score) from move WHERE player.groupId = 3;" or the Ebean equivalent, but then that query would be instantly out of date on the very next move, and the query would have to hit the database again. Multiply that by several thousand players making hundreds of moves each, and the system now requires some very fancy database work to even limp along. On the other hand, if these objects are Hazelcast map-resident objects, all moves are written to the cache (which then writes them to the database, possibly using write-behind), and we are SURE they are in the cache for 24 hours, then the sum becomes an in-memory total using the Map Reduce API. In fact, if I can make the memory cache the primary source of information and the database merely a persistent backup, that is optimal, and the performance and horizontal scalability capabilities are significant; such is my intention.
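The in-memory aggregation described above can be sketched in plain Java. A local ConcurrentHashMap stands in for a Hazelcast IMap, and a stream reduction stands in for the Map Reduce API; the Move/score shape is illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: summing scores from cached moves instead of re-running
// "SELECT SUM(score) ..." against the DB after every move.
class ScoreBoard {
    // playerId -> scores of that player's moves; stands in for a distributed map.
    private final Map<Long, List<Integer>> movesByPlayer = new ConcurrentHashMap<>();

    void recordMove(long playerId, int score) {
        movesByPlayer
            .computeIfAbsent(playerId, id -> new CopyOnWriteArrayList<>())
            .add(score);
        // A write-behind hook would persist the move to the DB asynchronously.
    }

    // Total for a group of players, computed entirely in memory.
    long totalScore(List<Long> playerIds) {
        return playerIds.stream()
                .flatMap(id -> movesByPlayer.getOrDefault(id, List.of()).stream())
                .mapToLong(Integer::longValue)
                .sum();
    }
}
```

With a real data grid the reduction would run member-local on each partition, but the shape of the computation is the same.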

I guess the point I am trying to make is that books on writing web applications and database applications use trivial examples of easy-to-code use cases that are easily encapsulated by things like L2 caches. It's when those apps get developed in that manner and put out to the market that they often fail. Scalability and horizontal concurrency is a HARD problem, not solved by simply slapping on a Spring @Cacheable annotation or just configuring an L2 cache. The reality is that a little more thought must be put into the architecture than "slap an L2 cache on and call it good." I have been an observer of, or part of, dozens of projects that have failed with that view of software, and if little old me has seen dozens, you know there are thousands out there. :)

This is not to say techs like Ebean and L2 caches are bad. Not at all. They just don't provide a panacea that can be applied globally without thought, like a recipe for cornbread. Personally I have had success creating very high-concurrency web applications with a combination of many technologies. Code hits the memory caches for all the work it can possibly do; querying through to the database is made inconvenient to discourage it. The caches themselves query through to the database when necessary. The caches enforce the need for locks in cases where objects change, or use optimistic locking with reactive retry in cases where such things are possible. The caches will sometimes write through in low-volume situations or write behind in high-volume situations.

Anyway back to the point at hand and sorry about writing something of a novel. :) 

What would be really excellent, if it existed (I wish I had time to do it), would be a cache and an ORM with a cooperative querying architecture, where a map-reduce query could be passed to a caching layer and then transparently executed on the database without translation by the user or a complete rewrite of the query. That would be a tough problem to solve, especially considering the distributed nature of a memory cache and the relatively isolated world of a database, but it would be very cool.
 

I want to bisect the model at a one to many relationship so models on one side are cached in a different memory cache than the other. 

With the EbeanORM L2 cache different beans go in different 'bean caches' (aka different Maps) ... so it sounds like this matches what you want. This can be a lot of work to implement the management of these caches (I think you know that) so it is probably worth seeing IF you can get EbeanORM to do this for you because it seems like it could. 

Actually, that's not what I want. Close. That models all associations as aggregations rather than recognizing composition relationships. When I load a user, for example, to not load the user's roles would be to have a human without their arms. But even if Ebean obeyed cascade fetch annotations (I understand it might in the latest version?), that would still not remove the distributed concurrency problem I talked about above. There are times when critical-section pessimistic locking is the only way to do things rationally. Now, if your project is ENTIRELY reactive and re-entrant, where any transaction issue can simply be retried, that would be awesome, but that is impractical to propose in a business environment. Essentially the proposal becomes "I have to rewrite the entire application of half a million lines of code so that in 6 months, when I am done, you get the same functionality but with a much better architecture." Such a proposal would never fly in a real business. Even assuming you could do that from the ground up in a new project, you would still have to deal with critical-section problems, race conditions, concurrent modifications, and a dozen other non-trivial issues.

I guess the reality is that each solution has to be tailored to each business individually. But that's good because it keeps unemployment low. :)
 
With L2 caches you aren't int control of synchronization, eviction, distributed locking an a number of other things. 

Maybe ... but maybe not because I think you could have all the control you want. That is, the caches that EbeanORM would use could all the known/accessible Hazelcast caches so you could also have complete access to all the caches (access external to EbeanORM) ... and all the calls EbeanORM makes to each cache could go through your own wrapper/delegate code (to control eviction etc).

I think to make this a reality, Ebean would have to merge the relational world with the NoSQL object-based world. The real headache of an ORM is the mapping from object to relation. Other NoSQL features are less compelling to me personally; lack of a schema has its good points, but in my opinion it also introduces the possibility of nightmares, and the lack of ACID transactions is a serious issue for most business applications. If there were a way to implement a connector for Ebean that transparently interoperated with a caching mechanism like Hazelcast, supporting in-memory map-reduce with transparent queries, distributed locking, re-entrant tasks (EntryProcessors and distributed tasks) and so on, these things would be awesome. I could ask the caching layer "give me all the players with more than 1000 points" and it would automatically invoke the back-end Ebean mechanism to ensure those objects are all in cache before returning. I could also use a map-reduce call that would recognize that several objects needed for the query are not yet cached but should be, and load them before actually executing the query. Think of the query "Find all players who have clans of more than 100 people and have scored over 1000 points in the last 24 hours, and then give me a list of all of their equipment in common." A hybrid Hazelcast-Ebean solution could recognize via a query to the DB that some players are missing from that pool, auto-load those players, and then execute the rest of the map-reduce call. That would be wicked cool, but SERIOUSLY non-trivial to implement. They have gone part of the way with MapStore implementations, but the transaction problem is still there, not to mention the fact that you can't know which node the MapStore will be invoked on or when it will be invoked. That leads to cross-transactional purgatory.

Anyway, a very interesting discussion. For the purposes of this post I think I will just have to double-map entities so I can break up the model. What would be cool as a feature request is a new annotation like @IdCollection(targetEntity = Role.class) List<Integer> roleIds. This would map to a list of IDs of the dependent entity rather than the entity itself. JPA doesn't have that feature, but I think it would be a cool one to have. Of course, getting anything changed in JPA is a 2-year process that has to have sign-off from 40 organizations, so I won't hold my breath there. On the other hand, Ebean has less organizational inertia. :)
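The proposed annotation could be declared like this (entirely hypothetical; @IdCollection exists in neither JPA nor Ebean, and Role/UserSketch are illustrative names):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical annotation: marks a field as a collection of ids of a
// related entity, rather than a collection of the entities themselves.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IdCollection {
    Class<?> targetEntity();
}

class Role {}

// Usage sketch mirroring the example in the post above.
class UserSketch {
    @IdCollection(targetEntity = Role.class)
    List<Integer> roleIds;
}
```

The ORM would read targetEntity() at mapping time to know which table the ids refer to, much as @OneToMany uses its own targetEntity attribute.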

Rob Bygrave

May 5, 2015, 5:52:36 PM
to ebean@googlegroups
Now if you consider something like an online game, the use case becomes far less trivial

Thanks, that makes it clearer where you are coming from.


There are also other scenarios that Ebeans does .. with all due respect for Ebeans ... not do very well. 

Absolutely, the most useful feedback is when people describe a use case or behaviour that does not work well for them.  This type of feedback is the most important, and I encourage you and others to provide it.  The caveat is that the feedback has to have enough description of the use case and problem to provide insight. No detail = no insight = no improvement.


> Ebeans assumes everything is lazy loaded ...

This statement is not clear to me.  Can you describe it better?  At a guess, I think you mean Ebean does not honour FetchType.EAGER in annotations?  That might be an interesting discussion as to why that brings all manner of death and destruction to scalability :) ... but more detail would be good to explain what you mean here. In terms of eager fetch in annotations with EbeanORM, the expectation is that users specify in their query the desired object graph paths and properties to fetch.  I have come across cases of simple apps where there is literally only one use case for a given entity bean, and in that case the annotation-based FetchType.EAGER would have been nice - in this case Ebean can detect that the query has no detail (no fetch paths or properties) and could apply the annotations as a 'default fetch plan', so certainly I have pondered doing that.

... but I suspect I'm missing your point, as Ebean does very well with its fetch path / fetch properties approach (avoiding the problems that the FetchType annotations introduce), and I'd say eager fetching of complex graphs is something Ebean does better than any other ORM.


>  ... Level to Player and then recursively loading the entire model into memory in thousands of SQL calls. The permutations of SQL calls that can be executed in a large model are incredible and caching them at an L2 cache is not going to be effective in stopping them

Certainly I have experienced bad L2 cache and N+1 query scenarios with other ORMs and NoSQL datastores.  However, generally speaking, with EbeanORM 4.x we do very well here.  You should get very good cache hits on lazy loading, the model is loaded on demand (so it should not be incredibly large), and batch loading is easy to use (and on by default now) to mitigate N+1 queries and make loading complex graphs efficient.  Note that my second priority after documentation is providing a performance monitoring dashboard so we can all have good visibility on this for our applications.

The weaknesses in the current L2 cache are the fast invalidation of L2 query caches and the lack of links between query caches and content caches (JSON, HTML etc). I have a plan for both of those.

In the near future, lazy loading could hit an "L3" cache, which is initially going to be ElasticSearch but later could be a data grid solution. That is, the L2 cache can also be viewed as a set of behaviours, and those behaviours can be translated into 'local in-memory cache hits', 'remote memory cache hits / remote ElasticSearch or data grid query hits', or DB query hits.  The predicates used in lazy loading are pretty simple, so translating those into, say, ElasticSearch queries or data grid queries can be straightforward.
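The kind of predicate translation described above, where a typical lazy-load predicate like "id in (…)" becomes an ElasticSearch query, can be sketched as plain string building. The helper name is hypothetical; the JSON produced is the standard ElasticSearch terms-query shape:

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch: translate the common lazy-loading predicate "id in (ids...)"
// into an ElasticSearch terms-query request body.
class PredicateTranslator {

    static String toTermsQuery(String field, List<Long> ids) {
        StringJoiner values = new StringJoiner(",");
        for (Long id : ids) {
            values.add(String.valueOf(id));
        }
        // e.g. {"query":{"terms":{"id":[1,2,3]}}}
        return "{\"query\":{\"terms\":{\"" + field + "\":[" + values + "]}}}";
    }
}
```

Richer predicates (ranges, conjunctions) would need a fuller mapping, which is why the post notes the query-cache-miss cases are harder than the lazy-load cases.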

So yes, I'm bullish that EbeanORM can do very well here in terms of building object graphs from L2 and L3 caches.


They just don't simply provide a panacea that can be applied globally without thought like a recipe for cornbread. 

I'm not suggesting that.  Certainly an online gaming application is going to be vastly different from a stackoverflow-type website.

What I will say is that 'Hibernate's problems are not Ebean's problems'. LazyInitializationException is one example, but there are many cases where people mistakenly attribute Hibernate issues to ORM in general (or to EbeanORM in particular). In this sense I often find people going down the wrong path based on some assumption which in turn was based on some Hibernate or JPA experience.

Improvements to documentation / a video series should hopefully reduce that.


ORM had cooperative querying architecture

That might be close to what I'll be adding to EbeanORM in what will be its L3 cache.  Specifically, the first implementation will support ElasticSearch. That is, an ORM query or lazy load that misses the local in-memory L2 cache then gets translated into an ElasticSearch query.  I built the majority of this feature some time ago, but it was designed to use raw Lucene - I then met ElasticSearch and pulled all that effort out with a view to integrating ElasticSearch. Note that this support depends on the specific types of predicates used (and that those predicates can be translated appropriately).  For the lazy loading cases the query predicates translate easily, so there is a big/easy win there - it is more work/harder for the query cache miss cases.

The initial goal is not to provide transactional read consistency guarantees here, but instead 'most recent / best effort' results.  It is likely that the ORM query API will also be extended at the same time to allow adding ElasticSearch/text-search-specific predicates, and for these queries the intention is to never hit the DB - only ElasticSearch.

In terms of Hazelcast or other data grids, it seems reasonable that many ORM queries could be translated reasonably well into data grid queries (where the predicates match well), but that requires time/investigation. It could also be that you'd want to go the other way, with the data grid driving everything.

This L3 cache might still not fit your online gaming database requirements though. 
 

> For the purposes of this post I think I will just have to double map entities so I can break up the model

Sure.




Cheers, Rob.

kraythe

May 8, 2015, 12:43:59 PM
to eb...@googlegroups.com
On Tuesday, May 5, 2015 at 4:52:36 PM UTC-5, Rob Bygrave wrote:
In terms of Hazelcast or other data grids it seems reasonable that many ORM queries could be translated reasonably well into datagrid queries (that the predicates match well) but that requires time/investigation.

Absolutely they can. I guess the problem might be the inconsistencies in predicate APIs. I have a gut feeling they are resolvable with cooperation. 
 
It could also be that you'd want to go the other way with the datagrid driving everything.

For our purposes we need exactly this. We need the data loaded by one node to be available to the others. We need the DB to see 1% of the read traffic that other sites put on theirs, and the data grid needs to be the primary source of knowledge, with Ebean ORM persisting the data grid objects for historical, reporting, and memory management reasons. Any misses on the grid read through to the DB; that way we can manage clever eviction policies rather than having to keep the whole DB in memory.
 
This L3 cache might still not fit your online gaming database requirements though. 

Still worthwhile work, and it might be great for those cache misses. I just think it won't solve the essential problem of basically gluing a data grid to the database underneath. I am loath to recommend yet another JSR, but it seems there might be a good case for a data grid persistence specification: cooperation between caches like Hazelcast, Memcached, and Infinispan, ORMs like Hibernate and Ebean, and even object DBs like Mongo. One predicate API to rule them all. :)

 