Ebean Caching OR Redis Or Memcache to improve Read/Write performance


Suraj Mundada

Apr 4, 2016, 3:11:10 PM
to Ebean ORM
Hi,

We are using Ebean ORM with Java Play 2.4.6.

Ours is a client-server architecture, and customer transactions span multiple request-response cycles. Multiple requests need the same model (Ebean) objects to be fetched, and those objects are updated based on customer actions.

So we end up retrieving the same Ebean objects multiple times, and they may or may not be updated while a given request is processed.

As the application scales up, I believe performance will suffer from multiple DB hits for the same records. To mitigate this, I want to use the AWS Redis cache service to store these Ebean objects and reduce DB hits. The idea is to:
1. Check the cache for the bean.
2. If not found, retrieve the bean from the DB and put it in the cache.
3. If found, use it.
4. If the bean is updated, replace the bean in the cache.
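
The four steps above are the usual cache-aside pattern. A minimal sketch in Java, assuming a plain in-memory map as a stand-in for Redis and a hypothetical loader function as a stand-in for an Ebean find-by-id (none of these names come from Ebean or Redis):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch. The map is a stand-in for a remote cache such as Redis,
// and the loader is a stand-in for an Ebean find-by-id. Names are hypothetical.
final class CacheAsideStore<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CacheAsideStore(Function<K, V> loader) {
        this.loader = loader;
    }

    // Steps 1-3: check the cache, fall back to the loader on a miss, then cache the result.
    V get(K id) {
        V cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.apply(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    // Step 4: after the bean has been saved to the DB, replace the cached copy.
    void replace(K id, V updated) {
        cache.put(id, updated);
    }
}
```

With several nodes, step 4 only helps if the cache is shared; a per-node map like this one would leave the other nodes holding stale copies.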

Also, we will have a cluster of nodes running Play Framework applications. I want to centralize the cache for all nodes in the cluster, i.e. a clustered cache.

Before jumping to Redis, I want to explore the caching provided by Ebean ORM itself. I read http://ebean-orm.github.io/docs/features/l2caching to understand Ebean caching, but could not find a clear solution.

I have also been advised to look at ebean-orm.github.io/docs/features/elasticsearch/ but I am not sure whether ElasticSearch is the right answer to my problem.

Can someone explain whether Ebean provides a caching mechanism that solves my caching problem? Or should I use Redis?

Regards,
Suraj

Rob Bygrave

Apr 4, 2016, 4:33:24 PM
to ebean@googlegroups
Hi,

In terms of L2 caching there are two almost separate features: "Bean caching" and "Query caching".

"Bean caching":
---------------------
So "Bean caching" is to support "Find by Id" and "Find by unique natural key" queries.  It sounds to me like your application is only going to need "bean caching".


"Query caching"
---------------------
This is where you want to improve the performance of "findList" / "findPagedList" type queries that fetch many beans.  It sounds like your application has no need for query caching - your caching requirement is purely for bean get/put by id.

For applications that have a desire/need for "Query caching", ElasticSearch provides a very good solution (and currently there are no other options available out of the box).  An alternative would be a data-grid product where Ebean converts the ORM query into a data-grid query for the data-grid to execute.  The issue here is that the data-grid has to compete against the inverted indexes of ElasticSearch, and the prediction is that ElasticSearch will simply be way faster.

 
So at this point your application has no need for "Query caching", so what are the options?



"L2/Near cache and L3/Remote cache"
-----------------------------------------------------
This is somewhat Ebean-specific terminology, but it is relatively important to distinguish between the "Near cache" and the "Remote cache" parts of L2 caching. Specifically, the Near cache is in-process and requires no network hop, whereas a hit against the Remote cache has a network hop.

This brings up:
- Do we want a "Near cache"?  (More memory requirement on the Application server vs extra network hop)
- Do we want a "Remote cache" or simply hit the DB? (You want the remote cache part, but some might not choose that)
- What can we use as the "Remote cache" for "Bean caching"?  (Effectively any Key/value store is a good fit here but note that is not the case for L2 query cache)
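
To make the near/remote distinction concrete, here is a rough Java sketch of a two-tier lookup. It is not Ebean's cache API: the class name, the shared map standing in for a remote key/value store, and the database function are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Two-tier lookup sketch: near cache first, then remote cache, then the DB.
// All names are hypothetical; the "remote" map stands in for a networked store.
final class TwoTierLookup<K, V> {
    private final Map<K, V> near = new ConcurrentHashMap<>(); // in-process, no network hop
    private final Map<K, V> remote;                           // shared by all nodes
    private final Function<K, V> database;                    // stand-in for a find-by-id

    TwoTierLookup(Map<K, V> remote, Function<K, V> database) {
        this.remote = remote;
        this.database = database;
    }

    V get(K id) {
        V bean = near.get(id);
        if (bean != null) {
            return bean;                  // near hit: memory only
        }
        bean = remote.get(id);            // one network hop in a real deployment
        if (bean == null) {
            bean = database.apply(id);    // miss in both tiers: hit the DB
            if (bean != null) {
                remote.put(id, bean);
            }
        }
        if (bean != null) {
            near.put(id, bean);           // costs application-server memory
        }
        return bean;
    }
}
```

The sketch leaves out invalidation: once a bean is updated on one node, the near caches on the other nodes need to be told, which is part of the memory-versus-network trade-off in the questions above.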


"Ebean out of the box"
-----------------------------
Ebean does not have support for using AWS ElastiCache out of the box.
What Ebean has out of the box is its own Near Cache plus ElasticSearch, which supports both Bean caching and Query caching.

That is, ElasticSearch can also be seen as a key/value store.

There is a caching API that can be implemented, so people can implement that themselves, log an enhancement request, or sponsor a specific implementation.


"Updating the bean cache"
-----------------------------------
ElasticSearch supports updates where Ebean can just give it the part of the document that has changed.  That is, for an update Ebean does NOT need to give ElasticSearch all the values of the bean (the entire bean document) again but instead only needs to give ElasticSearch the delta (the changed properties).  What this means is that with ElasticSearch you can perform updates on a "partially loaded" entity bean.

The question when supporting each key/value store is: IF there is an update on a "partially loaded" entity bean, can Ebean just send the key/value store the changed properties, or does Ebean need to load all the unloaded properties (from the DB) to send the key/value store a new version of the bean?

If an application does not use "partial objects" this does not matter, as the beans being updated are always fully loaded. But Ebean supports "partial objects" and "stateless updates", so the question arises as to how well a key/value store supports that (and if it does not, we get the overhead of fetching the unloaded properties from the DB).
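
The difference between sending a delta and falling back to invalidation can be sketched as follows. This is an illustration only, assuming a bean is cached as a map of property names to values; the class and method names are hypothetical and are not Ebean's plugin interface.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a cache whose entries are property maps. Names are hypothetical.
final class BeanDocumentCache {
    private final Map<Long, Map<String, Object>> entries = new HashMap<>();

    // Store a fully loaded bean.
    void put(long id, Map<String, Object> fullBean) {
        entries.put(id, new HashMap<>(fullBean));
    }

    // Delta update: merge only the changed properties into the cached entry.
    // Works for a partially loaded bean because unloaded properties are not needed.
    boolean applyDelta(long id, Map<String, Object> changedProperties) {
        Map<String, Object> cached = entries.get(id);
        if (cached == null) {
            return false; // nothing cached, so nothing to merge into
        }
        cached.putAll(changedProperties);
        return true;
    }

    // Fallback for a store that only accepts whole values: drop the entry
    // and let the next read reload the bean from the DB.
    void invalidate(long id) {
        entries.remove(id);
    }

    Map<String, Object> get(long id) {
        return entries.get(id);
    }
}
```

A store that can only do `put` and `invalidate` forces the choice described above: either load the missing properties to build a full replacement value, or remove the entry.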


ebean provides caching mechanism which solves my caching problem? Or Should I use Redis? 

Yes, Ebean provides a caching mechanism.  It would be great if there were a Redis / AWS ElastiCache caching plugin, and then you could try both and compare the performance / networking / scaling directly.  Near caching is great but won't scale to massive volumes, and ElasticSearch vs Redis has operational cost differences etc.


Cheers, Rob.




Suraj Mundada

Apr 5, 2016, 1:21:17 AM
to Ebean ORM
Hi Rob,

This is excellent info. Thanks. 

My need is bean caching. I am going to have a cluster of nodes hosting my application, and each application instance will have its own Ebean cache. I am not implementing sticky sessions, so requests for the same customer may land on different nodes, and a node's cache may not have the data from previous requests. So I want to centralize the cache.

That implies hosting ElasticSearch on a remote server. Would ElasticSearch be a good choice in this case?

Or does ElasticSearch provide clustering so that different instances of ElasticSearch on my nodes can sync with each other?

Regards,
Suraj

Rob Bygrave

Apr 5, 2016, 1:53:59 AM
to ebean@googlegroups
That implies hosting ElasticSearch  on a remote server. Would ElasticSearch be a good choice in this case?

The ElasticSearch guys highly recommend running ElasticSearch in its own process/JVM (so not embedded in the application).  That is, technically we could embed an instance of ElasticSearch in our application (running on each node), but that isn't recommended.

So the recommendation is to run a separate cluster of ElasticSearch servers.

AWS actually has an ElasticSearch service (https://aws.amazon.com/elasticsearch-service/), so that might be the best starting point.

Note that with this AWS service the communication to ElasticSearch must be via HTTP (it does not support the ElasticSearch native client), but that is all good for Ebean. (Ideally Ebean supports the option of using the ElasticSearch native transport client - there are some benefits to that - but we won't be able to use that with the AWS ElasticSearch service. Refer to https://github.com/ebean-orm/avaje-ebeanorm-elastic/issues/3).



Would ElasticSearch be a good choice in this case?

Yes it would but there is a question.  

ElasticSearch indexes by default can be 1 second out of date (they batch changes and flush every second).  This is understandable because internally it is using Lucene/inverted indexes, and these do not like short transactions.  So by default ElasticSearch flushes changes to the index every second (and in the background it is also merging and optimising the indexes).

This can be changed and manually controlled but I wonder if this is a problem for this use case (where there is really only get/put load)?



does ElasticSearch provides clustering so that different instances of ElasticSearch on my nodes can sync with each other?

Yes. ElasticSearch has built in clustering.


 
Ok, I'm going to have a look at something.

Cheers, Rob.



Rob Bygrave

Apr 5, 2016, 5:11:15 PM
to ebean@googlegroups
Ok, so I'll add some more here.

ElasticSearch is great, but for pure get/put it is not perfect (an inverted index is not ideal for this task with its higher update overhead - and we don't get to use the benefits the inverted index brings).  There are plenty of 'knobs' to tweak in ElasticSearch to get this "good", and if there are not many updates and the 1 second flush delay isn't a problem then it would still be good ... plus we can do denormalisation.

Q: When the application is hitting the cache is it getting a "flat" bean (with just scalar properties like Strings, Longs etc) or does the bean have relationships?

For example, if we were getting a Customer, do we also want to get the customer's shipping address (a related bean)?  The point here is that with ElasticSearch we can denormalise such that the customer index also contains related information (e.g. the Customer index contains customer plus billing address; the Order index contains order plus some customer details plus order lines plus product details).

If denormalisation matches the application workload then ElasticSearch can really shine for this get/put workload as a single get also contains the related information. So an ElasticSearch "get customer" is 1 hit vs a normal L2 bean cache which requires more hits to get the related information (e.g. 1 hit for "get customer" plus a second hit for "get customer billing address" ... plus another hit for "get country" etc etc).
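
The hit counts in that comparison can be illustrated with a small Java sketch. The store, keys, and field names below are hypothetical and only count lookups; they do not model ElasticSearch or Ebean's bean cache.

```java
import java.util.HashMap;
import java.util.Map;

// Counts lookups so the two layouts can be compared. Keys and fields are hypothetical.
final class CountingStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    private int hits = 0;

    void put(String key, Map<String, Object> doc) {
        docs.put(key, doc);
    }

    Map<String, Object> get(String key) {
        hits++; // each get would be one network hop against a remote cache
        return docs.get(key);
    }

    int hits() {
        return hits;
    }
}

final class CountryLookup {
    // Normalised layout: customer -> billing address -> country, one get per bean.
    static String normalised(CountingStore store, String customerKey) {
        Map<String, Object> customer = store.get(customerKey);
        Map<String, Object> address = store.get((String) customer.get("billingAddressKey"));
        Map<String, Object> country = store.get((String) address.get("countryKey"));
        return (String) country.get("name");
    }

    // Denormalised layout: the customer document embeds the related data.
    @SuppressWarnings("unchecked")
    static String denormalised(CountingStore store, String customerKey) {
        Map<String, Object> customer = store.get(customerKey);
        Map<String, Object> address = (Map<String, Object>) customer.get("billingAddress");
        return (String) address.get("countryName");
    }
}
```

The trade-off is on the write side: a change to a country name in the denormalised layout means rewriting every customer document that embeds it.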

So denormalisation can make ElasticSearch a good choice (as long as the typically higher update costs related to inverted indexes are acceptable).




AWS Redis

The AWS ElastiCache Redis option as I see it currently has a problem IF we want to do partial updates (updates on a partially loaded bean).  That is, for an update on a partially loaded bean we really want to send the cache a 'delta' for it to apply rather than a 'replacement value', and I don't think we can do that with the AWS / Redis option.  We would instead have to invalidate/remove entries when we had partial updates.

This might be fine when the beans are small/simple (so not many columns, no big wide varchar(2000) type columns etc) and the application then forgoes partial updates (or accepts the extra costs associated with fully loaded beans).



Hazelcast and Infinispan

Both Infinispan and Hazelcast are purpose built for key/value caching plus both of these enable Ebean to update the cache using a 'delta'.  That is probably not news as both are Hibernate L2 cache options.


So with that I have logged:
... and the idea is to put both Hazelcast and ElasticSearch through a benchmark for the straight get/put workload.


Ebean's L2 cache has a plugin interface so it's a matter of implementing that for Hazelcast.  Watch that issue and expect an update there in 2 days and we can go from there.  

I'd probably recommend planning to build and execute some sort of benchmark with your known beans (with relationships) and some workloads (reads to updates etc). I expect you'll be able to run the benchmark using ElasticSearch and also using Hazelcast and do a direct comparison.



I think the series of questions to consider are:

Q: Would the application cache hits benefit from denormalisation? (ElasticSearch wins here)
Q: Can I forgo "partial updates" for these beans (always use fully populated beans, not many columns, no big fat columns, or just accept the extra costs of fully populated beans)?  This brings AWS Redis in as an option (but it would need an L2 cache plugin).
Q: For benchmarking: Build an estimate/plan for realistic mixed workload simulation ( read/write ratios, estimate cardinality etc )



Cheers, Rob.


Suraj Mundada

Apr 6, 2016, 12:47:48 PM
to Ebean ORM
Thanks Rob. Learnt so much about Ebean and caching :)
 
1. I am mostly using flat beans. Only a couple of beans have relationships. My DB schema has many relationships, but I have not modelled my beans to capture them, to avoid complexity.

2. I am not consciously doing anything to load beans partially. Until now I believed that beans are fully loaded when retrieved from the DB and that all attributes are updated when save/update is called on a bean. So partially vs fully loaded beans is not a factor for me.

3. I have never done benchmarking before. But I will try, and share the results in a week or so.

My decision criteria for choosing a caching framework:
1. It has to be centralized across nodes or have clustering support, and be scalable in either case.
2. It should have minimal overhead for marshalling/unmarshalling Ebean (Java) objects when storing them in the remote cache, so that I get performance similar to a local cache.


