Architectural advice on caching in a Verticle


N8

Dec 19, 2012, 9:42:13 AM12/19/12
to ve...@googlegroups.com
Hello All. 

I'm looking into doing some caching in a Vert.x verticle that serves both text and binary content of various sizes. I have not come across any third-party libraries that handle caching in an asynchronous manner (if you know of one, please speak up). I seem to be left with three options:

 1) Write my own framework that caches in memory and spills asynchronously to disk. When I need to fall through to the disk cache, use the Vert.x async APIs to load the contents. Maybe do the cache writes on a background thread or Bus Mod so as not to occupy the verticle's thread with disk writes when it could be serving network requests. 
 2) Use something standard like EHCache, wrapped in a BusMod. But I have seen several posts in the past about the event bus not being all that fast, and the point of the cache is to speed things up (i.e. not make the network calls that would otherwise be needed to fulfil a request). Intuition tells me that the overhead of the event bus makes it a less than ideal place to wrap a caching layer. Maybe my intuition is off; I would love to hear some other opinions on this topic. I also wonder about behavior in a cluster: would the cache requests get served from a locally running instance of the Bus Mod, or would any Bus Mod in the cluster be picked? 
 3) Spawn some more threads in my verticle that manage the cache, and use a memory queue to manage the communication between the verticle and the cache. Of course I realize this comes with the penalty of each verticle instance having a completely independent cache memory from any other instance, effectively wasting memory on duplicated (or triplicated or quadruplicated, etc.) entries. 
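For concreteness, here is a rough JDK-only sketch of the kind of per-verticle TTL cache option 3 implies (the names are hypothetical, not from any library; expired entries are evicted lazily, on read):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a verticle-local in-memory cache with per-entry TTLs.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;
        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    // Store a value that expires ttlMillis from now.
    void put(K key, V value, long ttlMillis) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlMillis * 1_000_000L));
    }

    // Return the value, or null if absent or expired; expired entries are removed lazily.
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.nanoTime() >= e.expiresAtNanos) {
            map.remove(key, e);
            return null;
        }
        return e.value;
    }
}
```

This is the shape of the thing, not a production design: there is no size bound and no background sweep, so a real version would also need eviction.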

If you were designing a system in Vert.x that uses memory but falls back to disk on a memory miss, how would you do it? Are there more options than the 3 I considered? 

Thank you in advance for any thoughts or experience you have on the topic. 

Nate McCall

Dec 19, 2012, 10:11:50 AM12/19/12
to ve...@googlegroups.com
I'm still coming up to speed on vert.x myself, but for something
standard like caching, encapsulate it in your own provider so you can
have a "bake-off" between implementations. Guava's cache plumbing
sounds like it hits most of your requirements, though [0].
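That provider seam might look something like this (a hypothetical sketch, not a real API; the LRU candidate shown uses only the JDK, but a Guava- or EHCache-backed implementation would slot in behind the same interface):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical provider interface so cache implementations can be
// swapped out for a bake-off without touching the calling verticle.
interface CacheProvider<K, V> {
    void put(K key, V value);
    V get(K key); // null on miss
}

// One candidate implementation: a tiny LRU cache over LinkedHashMap.
class LruCacheProvider<K, V> implements CacheProvider<K, V> {
    private final Map<K, V> map;

    LruCacheProvider(int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the bound is hit.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized void put(K key, V value) { map.put(key, value); }
    public synchronized V get(K key) { return map.get(key); }
}
```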

However, I think you should strike the sendfile requirements off your
feature list. That is orthogonal to cache (de)serialization as it's
only a price you pay once to load from disk and really has nothing to
do with writing the cached item to the response thereafter.

On the event bus question: do it in the initial event loop until you
have an architectural reason to do otherwise (or a deeper
understanding as to why not - perhaps I soon will as well). From a
quick reading of the code, the event bus is basically a ByteBuffer
slapped onto an NIO socket. Not much happening there, so I don't see
how it is "slow" outside of potentially mismatched config settings
for processes on the different verticles. Of course you add a hop and
thus some latency, but that is the tradeoff for decoupling, IME.

[0] http://code.google.com/p/guava-libraries/wiki/CachesExplained
> --
> You received this message because you are subscribed to the Google Groups
> "vert.x" group.
> To view this discussion on the web, visit
> https://groups.google.com/d/msg/vertx/-/KiDsbdY-3ScJ.
> To post to this group, send an email to ve...@googlegroups.com.
> To unsubscribe from this group, send email to
> vertx+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/vertx?hl=en-GB.

Pid

Dec 20, 2012, 11:09:14 AM12/20/12
to ve...@googlegroups.com
On 19/12/2012 14:42, N8 wrote:
> Hello All.
>
> I'm looking into doing some caching in a Vertx verticle that serves both
> text and binary content of various sizes. I have not come across any
> third party libraries that handle caching in an asynchronous manner (if
> you know of one, please speak up). I seem to be left with three options:
>
> 1) Write my own framework that works in memory and async to disk. When
> I need to fall through to disk cache, use the vertx async APIs to load
> the contents. Maybe do writing to the cache on a background thread or
> Bus Mod so as not to occupy the verticle's thread with disk writes when
> it could be serving network requests.

Eek.


> 2) Use something standard like EHCache, wrapped in a BusMod, but I have
> seen several posts in the past about the event bus not being all that
> fast, and the point of the cache is to speed things up (i.e. not make
> the network calls that would otherwise be needed to fulfil a request).
> Intuition tells me that the overhead of the event bus makes it a less
> than ideal choice in which to wrap a caching layer. Maybe intuition is
> off, I would love to hear some other opinions on this topic. I also
> wonder about behavior in a cluster, would the cache requests get served
> from a locally running instance of the Bus Mod, or would any Bus Mod in
> the cluster be picked?

Hazelcast is a dependency already. Why not investigate that?
Alternatively, look at the redis module.


> 3) Spawn some more threads in my verticle that manage the cache, and
> use a memory queue to manage the communication between the verticle and
> the cache. Of course I realize this comes with the penalty of each
> verticle instance having a completely independent cache memory from any
> other instances, effectively wasting memory on duplicated (or
> triplicated or quadruplicated, etc) entries.

This does not sound like a good solution.


> If you were designing a system in vertx that uses memory but falls back
> to disk on memory miss how would you do it? Are there more options other
> than the 3 I considered?

Use a sensible 3rd-party solution. What does falling back to disk gain
you for your use case? Where is the canonical source of data?


p


> Thank you in advance for any thoughts or experience you have on the topic.
>



N8

Dec 21, 2012, 9:40:59 AM12/21/12
to ve...@googlegroups.com


On Thursday, December 20, 2012 1:09:14 PM UTC-3, Pid wrote:
On 19/12/2012 14:42, N8 wrote:
> Hello All.
>
> I'm looking into doing some caching in a Vertx verticle that serves both
> text and binary content of various sizes. I have not come across any
> third party libraries that handle caching in an asynchronous manner (if
> you know of one, please speak up). I seem to be left with three options:
>
>  1) Write my own framework that works in memory and async to disk. When
> I need to fall through to disk cache, use the vertx async APIs to load
> the contents. Maybe do writing to the cache on a background thread or
> Bus Mod so as not to occupy the verticle's thread with disk writes when
> it could be serving network requests.

Eek.

Agreed, I don't want to do that either.  
 



>  2) Use something standard like EHCache, wrapped in a BusMod, but I have
> seen several posts in the past about the event bus not being all that
> fast, and the point of the cache is to speed things up (i.e. not make
> the network calls that would otherwise be needed to fulfil a request).
> Intuition tells me that the overhead of the event bus makes it a less
> than ideal choice in which to wrap a caching layer. Maybe intuition is
> off, I would love to hear some other opinions on this topic. I also
> wonder about behavior in a cluster, would the cache requests get served
> from a locally running instance of the Bus Mod, or would any Bus Mod in
> the cluster be picked?

Hazelcast is a dependency already.  Why not investigate that?
Alternatively, look at the redis module.

We did think about using Hazelcast, but I thought it only offers a shared map, not real caching with TTLs. We'll look into that a little more. The Redis module has the problem I was asking about initially: everything goes over the event bus. Since every request to every server in the cluster would require a cache check (thousands per second), it seems inefficient to pipe all of this over the event bus rather than just looking in local memory. 
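The local-first lookup I have in mind is roughly this "near cache" pattern, with the event-bus/Redis round trip stubbed out as a plain function (a hypothetical JDK-only sketch, not real Vert.x code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical near-cache sketch: check verticle-local memory first, and
// only fall through to the shared/clustered store (here a Function standing
// in for an event-bus or Redis round trip) on a local miss.
class NearCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> sharedStore; // the expensive remote lookup

    NearCache(Function<K, V> sharedStore) {
        this.sharedStore = sharedStore;
    }

    V get(K key) {
        V v = local.get(key);
        if (v != null) return v;          // served from local memory, no hop
        v = sharedStore.apply(key);       // one event-bus/remote round trip
        if (v != null) local.put(key, v); // populate the near cache
        return v;
    }
}
```

The open problem with this, of course, is invalidation: the local copies go stale unless the shared store broadcasts updates or the local entries carry their own TTLs.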

 


>  3) Spawn some more threads in my verticle that manage the cache, and
> use a memory queue to manage the communication between the verticle and
> the cache. Of course I realize this comes with the penalty of each
> verticle instance having a completely independent cache memory from any
> other instances, effectively wasting memory on duplicated (or
> triplicated or quadruplicated, etc) entries.

This does not sound like a good solution.

Admittedly.  


> If you were designing a system in vertx that uses memory but falls back
> to disk on memory miss how would you do it? Are there more options other
> than the 3 I considered?

Use a sensible 3rd-party solution.  What does falling back to disk gain
you for your use case?  Where is the canonical source of data?

The canonical source is one of many servers accessed via HTTPS (only), residing in any part of the world. Much of the content is immutable, some of it expires in days to weeks, and other parts expire quickly or can't be cached at all. The rationale for saving to disk is to keep the content closer to the requester and avoid going out to the origin servers and recalculating data that won't change, or won't change often. Since the volume of requests served will be high, things may get pushed out of the memory cache fairly quickly depending on its size. Additionally, some of the content may be large, so caching it in memory does not make sense, but keeping it on disk close to the requester does. 
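Roughly, the two tiers would fit together like this (a hypothetical JDK-only sketch; keys are assumed to be filesystem-safe, and the I/O is synchronous for brevity where a real verticle would use the async file-system API or a worker):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache: a small, size-bounded memory tier in front of
// a disk tier that holds everything, so large or memory-evicted entries are
// still served from local disk rather than re-fetched over HTTPS.
class TwoTierCache {
    private final Path dir;
    private final Map<String, byte[]> memory;

    TwoTierCache(Path dir, int maxMemoryEntries) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) { throw new UncheckedIOException(e); }
        // LRU memory tier; eviction here only drops the in-memory copy,
        // the disk copy survives.
        this.memory = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > maxMemoryEntries;
            }
        };
    }

    void put(String key, byte[] content) {
        try {
            Files.write(dir.resolve(key), content); // disk is the local source of truth
        } catch (IOException e) { throw new UncheckedIOException(e); }
        memory.put(key, content);
    }

    byte[] get(String key) {
        byte[] v = memory.get(key);
        if (v != null) return v;                    // memory hit
        Path p = dir.resolve(key);
        if (!Files.exists(p)) return null;          // full miss: caller goes to origin
        try {
            v = Files.readAllBytes(p);              // disk hit: repopulate memory
        } catch (IOException e) { throw new UncheckedIOException(e); }
        memory.put(key, v);
        return v;
    }
}
```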

Pid

Dec 22, 2012, 3:08:01 PM12/22/12
to ve...@googlegroups.com
If the two examples above are unable to address your requirement (I am
less familiar with them), then something like GemFire could certainly
ensure that cache data is localised in an optimal way.

This is a general problem for caching technology and one you should try
to avoid building yourself.


p

Tim Fox

Dec 22, 2012, 5:09:11 PM12/22/12
to ve...@googlegroups.com
I would like to support an async cache API in vert.x, but clearly this
isn't going to happen overnight.

Tim Fox

Dec 22, 2012, 5:10:50 PM12/22/12
to ve...@googlegroups.com
Hazelcast provides TTLs too.


Asher Tarnopolski

Dec 23, 2012, 5:39:29 AM12/23/12
to ve...@googlegroups.com
I've implemented a memcached client as a worker. The memcached protocol incorporates TTLs, so maybe this is what you need.
It's not merged into mods yet, so meanwhile you can see it here: https://github.com/ashertarno/vertx-memcached
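For reference, the TTL sits right in the protocol's set command: the third numeric field is exptime, a relative expiry in seconds. A hypothetical encoding sketch (not code from the module above):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the memcached text protocol's "set" command,
// showing where the TTL lives: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n".
class MemcachedSet {
    static byte[] encodeSet(String key, int flags, int exptimeSeconds, byte[] value) {
        String header = "set " + key + " " + flags + " " + exptimeSeconds
                + " " + value.length + "\r\n";
        byte[] h = header.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[h.length + value.length + 2];
        System.arraycopy(h, 0, out, 0, h.length);
        System.arraycopy(value, 0, out, h.length, value.length);
        out[out.length - 2] = '\r'; // data block is terminated by CRLF
        out[out.length - 1] = '\n';
        return out;
    }
}
```

So a cache entry with a five-minute TTL would be written as `set mykey 0 300 <bytes>`; the server handles expiry, which is exactly what was missing from a plain shared map.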