A Scala front-end to a smart memcached


David Pollak

Feb 23, 2008, 1:57:08 PM
to liftweb
Folks,

Over the last week, I've been diving into the world of memcached and how it helps high traffic sites scale.

For those unfamiliar with memcached, it's a mechanism for caching "stuff".  It's widely used at sites like Facebook to store sessions, partially rendered pages, etc.  memcached is a process that listens on a port for particular requests (add, set, get, incr, remove) and performs these operations on name/value pairs.  Names are up to 250 bytes.  Values can be any arbitrary sequence of bytes.  memcached is easily distributed (the clients select the memcached server based on a hash of the key) and supports failover (if the primary memcached server for a given hash is not available, secondaries will be consulted).
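
To make the key-hashing and failover behavior concrete, here's a Python sketch. The server names and the md5-modulo scheme are illustrative only; real clients use their own hashing strategies, but the shape is the same: hash the key, pick a server, fall back if it's down.

```python
# Sketch of client-side server selection in a memcached pool.
# Server names below are made up for the example.
import hashlib

servers = ["cache1:11211", "cache2:11211", "cache3:11211"]

def server_for(key, pool=servers):
    # Stable hash of the key; modulo picks the server.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return pool[h % len(pool)]

def server_with_failover(key, alive, pool=servers):
    # If the primary for this hash is down, consult secondaries
    # by walking the pool in hash order.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    for i in range(len(pool)):
        candidate = pool[(h + i) % len(pool)]
        if alive(candidate):
            return candidate
    raise RuntimeError("no memcached servers available")
```

Note that every client must use the same hashing scheme, or they will look for the same key on different servers.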

There are clients to memcached written for most languages (Perl, Python, Ruby, PHP, Java, .Net, etc.)

The C implementation of memcached is wicked fast and wicked stable.  There's a persistent version that stores records in a Berkeley DB backend, which is about 10 times slower than the in-memory C version (which is in and of itself wicked impressive).  There's a Java implementation of memcached that claims to be 50% as fast as the C version (pretty impressive).

Steve Yen and I were chatting a while back about how memcached is stupid.  It's a byte-store, and it's subject to a lot of problems... basically, knowing when to dirty a cache item is tough, and cache over-writes are a problem (2 or 3 processes generating the same results to cache means 2 or 3 simultaneous database accesses for the same data).

The semantics of memcached are similar to those of REST, and if the key is meaningful and can be parsed into a question, putting smarts (as well as the persistence that memcachedb provides) behind the memcached wire protocol might lead to a new way of looking at distributed, scalable applications.
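
As a toy illustration of a key that "can be parsed into a question": the `entity:id:attribute` scheme below is made up for the example, but it shows how a smart cache could treat a key as a query rather than an opaque byte string.

```python
# Hypothetical key scheme: "entity:id:attribute", e.g. "user:42:friend_count".
# A smart cache could parse such a key into a question it knows how to
# answer (and compute the answer itself) instead of just looking up bytes.
def parse_key(key):
    entity, ident, attribute = key.split(":", 2)
    return {"entity": entity, "id": int(ident), "question": attribute}
```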

Put another way, if the memcached wire protocol is the front end for an Actor-like mesh of computing, then there's a powerful abstraction that is unfrightening to the world of web developers who use memcached on a daily basis.  It would also mean that one could migrate a web application from PHP, Rails, etc. by moving the logic out of the web code and into this smart memcached thingy.  Then, migrate the front end to lift.

I'm thinking that there might be a very exciting project to add this memcached stuff to lift.  Would anyone out there be interested in spending time on such a project with me?  Would anyone out there see a use of such a thing?

Thanks,

David


--
lift, the secure, simple, powerful web framework http://liftweb.net
Collaborative Task Management http://much4.us

Viktor Klang

Feb 23, 2008, 2:07:44 PM
to lif...@googlegroups.com

Hmmm, so memcached is like an equivalent of a DHT (Distributed Hash Table)?


I know that I've heard a lot of buzz about Prevayler ( http://en.wikipedia.org/wiki/Prevayler ).
Is there a great enough need to go the "CORBA way" to have language/platform independence, or would something like Prevayler work? Or Terracotta?
What are the needs? What are the pros and cons of each of the listed offerings?

As memory storage gets larger and larger, RDBMSes (which are designed for systems with "little" main memory and large disks) become less and less optimal.
 
It's an interesting topic indeed.

Cheers,
-V







--
_____________________________________
/                                                                 \
       /lift/ committer (www.liftweb.net)
     SGS member (Scala Group Sweden)
 SEJUG member (Swedish Java User Group)
\_____________________________________/

David Pollak

Feb 23, 2008, 2:17:15 PM
to lif...@googlegroups.com
On 2/23/08, Viktor Klang <viktor...@gmail.com> wrote:



Hmmm, so memcached is like an equivalent of a DHT (Distributed Hash Table)?


I know that I've heard a lot of buzz about Prevayler ( http://en.wikipedia.org/wiki/Prevayler ).
Is there a great enough need to go the "CORBA way" to have language/platform independence, or would something like Prevayler work? Or Terracotta?

I was hoping that Terracotta and Actors would provide a solution.  This avenue has become less attractive for two reasons: the performance of Terracotta plus Actors has not materialized (the current max is 3,000 messages per second, and I need to see a mesh with at least 1M messages per second), and matching a wire protocol that's very common and has excellent client support has significant advantages.


Viktor Klang

Feb 23, 2008, 2:21:30 PM
to lif...@googlegroups.com
On Sat, Feb 23, 2008 at 8:17 PM, David Pollak <feeder.of...@gmail.com> wrote:



I was hoping that Terracotta and Actors would provide a solution.  This avenue has become less attractive for two reasons: the performance of Terracotta plus Actors has not materialized (the current max is 3,000 messages per second, and I need to see a mesh with at least 1M messages per second), and matching a wire protocol that's very common and has excellent client support has significant advantages.

Has the bottleneck been identified? (Is it Terracotta that is the bottleneck, or is it the Actors library?)
I agree that using already-present standards is a good idea.
Do you happen to know how memcached handles versioning?

Cheers,
-V
 

Blair Zajac

Feb 23, 2008, 2:29:05 PM
to lif...@googlegroups.com

I wouldn't call it a CORBA way.  There's no IDL file.  The memcached
wire protocol is extremely simple, I believe.  That's why it's easy to
implement a client in any language.

> Has the bottleneck been identified? (Is it Terracotta that is the
> bottleneck, or is it the Actors library?)
> I agree that using already-present standards is a good idea.
> Do you happen to know how memcached handles versioning?

There is no versioning in memcached. It's just a hash table.

We're using it for our internal deployment also.

Blair

--
Blair Zajac, Ph.D.
CTO, OrcaWare Technologies
<bl...@orcaware.com>
Subversion training, consulting and support
http://www.orcaware.com/svn/


Viktor Klang

Feb 23, 2008, 2:35:42 PM
to lif...@googlegroups.com

My CORBA analogy was more of an "it's language, platform and implementation independent".
 


>
> Has the bottleneck been identified? (Is it Terracotta that is the
> bottleneck, or is it the Actors library?)
> I agree that using already-present standards is a good idea.
> Do you happen to know how memcached handles versioning?

There is no versioning in memcached.  It's just a hash table.

Okay, so it's opaque-last-commit-wins.

 


We're using it for our internal deployment also.

Cool! Got any tips, tricks or wisdom to share?

Cheers,
-V
 






David Pollak

Feb 23, 2008, 2:38:26 PM
to lif...@googlegroups.com


Viktor Klang wrote:


> I agree that using already present standards is a good idea.
> Do you happen to know how memcached handles versioning?

There is no versioning in memcached.  It's just a hash table.

Okay, so it's opaque-last-commit-wins.
Yes and no.  There's an atomic "incr" operation that increments a value and returns the incremented value.  Using incr, there are some very exciting ways to version things.  It requires more client logic, but it's possible.
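
One way incr could support versioning (a sketch, not an established pattern from the thread): keep a version counter per name and embed the counter in the data key. The dict-backed `FakeMemcached` below stands in for a real client, which exposes the same get/set/incr verbs; the key naming is made up for the example.

```python
# Sketch of incr-based versioning on top of memcached's verbs.
# FakeMemcached is a stand-in; a real client's incr is atomic server-side.
class FakeMemcached:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def incr(self, key, delta=1):
        # Returns the incremented value, like memcached's incr.
        self.store[key] = int(self.store.get(key, 0)) + delta
        return self.store[key]

def put_versioned(mc, name, value):
    # Bump the version counter, then write under a versioned key.
    v = mc.incr(name + ":version")
    mc.set(f"{name}:v{v}", value)
    return v

def get_latest(mc, name):
    v = mc.get(name + ":version")
    return None if v is None else mc.get(f"{name}:v{v}")
```

Old versions stay addressable by their versioned keys until they expire, which is where the extra client logic (and the exciting part) comes in.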

Blair Zajac

Feb 23, 2008, 2:43:48 PM
to lif...@googlegroups.com

No, not yet.  We're still developing our app.

The only trick I've heard is that on boxes hosting application servers,
which are CPU intensive, you can also run memcached, since
memcached is memory intensive but not CPU intensive.  So we have 10
application servers and each Java process gets 2 GB of RAM, leaving
another 2 GB free.  So if we run a memcached process on each one and give
it a gig, then we have 10 GB of distributed memory.

Blair

TylerWeir

Feb 23, 2008, 2:44:05 PM
to liftweb
I more than likely can't help in terms of knowledge, but I'd love to
be involved and I'll offer my help for testing or something like that.

Tyler

Viktor Klang

Feb 23, 2008, 2:49:27 PM
to lif...@googlegroups.com

Cool, 'cause versioning dramatically increases the use-cases :)
 



Viktor Klang

Feb 23, 2008, 2:51:06 PM
to lif...@googlegroups.com

That's neat :)
How fail-safe is it? Is it replicated in some specific manner?

-Viktor
 




Blair Zajac

Feb 23, 2008, 2:53:49 PM
to lif...@googlegroups.com

On Feb 23, 2008, at 11:51 AM, Viktor Klang wrote:

> That's neat :)
> How fail-safe is it? Is it replicated in some specific manner?

No, it's just an LRU cache in front of a much slower backend database that
holds immutable data.  So I can safely cache data in memcached.  It
should be very fast.

Blair

Viktor Klang

Feb 23, 2008, 3:16:04 PM
to lif...@googlegroups.com

Nice. Is it possible to use LFU or any other eviction algorithm?

Cheers,
-V
 







Blair Zajac

Feb 23, 2008, 3:21:53 PM
to lif...@googlegroups.com

On Feb 23, 2008, at 12:16 PM, Viktor Klang wrote:

> Nice. Is it possible to use LFU or any other eviction algorithm?

I don't know for certain. But this page is worth checking out:

http://semanticvoid.com/pages/memcached.html

Regards,
Blair


Steve Jenson

Feb 23, 2008, 5:21:48 PM
to lif...@googlegroups.com
On Sat, Feb 23, 2008 at 10:57 AM, David Pollak
<feeder.of...@gmail.com> wrote:

> I'm thinking that there might be a very exciting project to add this
> memcached stuff to lift. Would anyone out there be interested in spending
> time on such a project with me? Would anyone out there see a use of such a
> thing?

Why not just plain REST? ETags and Last-Modified give you pretty good
cache semantics. I'll go ahead and say up front that I think memcached
is overused and overhyped.

Steve

Steve Jenson

Feb 23, 2008, 5:42:59 PM
to lif...@googlegroups.com
On Sat, Feb 23, 2008 at 11:07 AM, Viktor Klang <viktor...@gmail.com> wrote:
> Hmmm, so memcached is like an equivalent of a DHT (Distributed Hash Table)?

Memcached is not a DHT; it's just a remote hash table. Most people who
use it do the distributed part on top of the memcached client they
use. So it's their application that's the DHT, not memcached.

> I know that I've heard a lot of buzz about Prevayler (
> http://en.wikipedia.org/wiki/Prevayler ).
> Is there a great enough need to go the "CORBA way" to have
> language/platform independence, or would something like Prevayler work? Or
> Terracotta?
> What are the needs? What are the Pros and Cons of each listed offerings?
>
> As memory storage gets larger and larger, RDBMSes (which are designed for
> systems with "little" main memory and large disks) become less and less
> optimal.

Prevayler is built on the premise that "most applications will never
exceed the amount of RAM they can buy". I worked on one that did, and
I'm pretty much only interested in working on ones that will, since
those are the ones that are popular enough to make serious money.

Remember, even if you can buy a machine with enough memory, does it
have enough IO bandwidth and CPU cycles to compute all the useful
things you need with that data? If not, how many replicas do
you need to handle your peak traffic requirements?

You know what I would like to see? A simplified BigTable clone that
has the same scaling properties as BigTable itself (unlike many of the
current clones).

Steve

David Pollak

Feb 23, 2008, 7:50:11 PM
to lif...@googlegroups.com
Because HTTP is heavier weight than the memcached protocol, and HTTP keep-alive is harder to implement (especially in languages with crappy threading and no real concept of "global") than keeping the socket open to memcached.  memcached has client-managed timeout semantics.  And there are a ton of applications that use memcached already; being able to gently migrate them to a Scala/lift backend is easier if you can say, "you're already using memcached, just use this as a server and it will not only cache, but in some cases, compute your answer."

More generally, it's hard to write fault tolerant applications in Rails (and most other web frameworks), but it's easy in Erlang because in Erlang, failures are assumed and the "alternative in case of timeout" is almost always coded in.  memcached is a nice place to put the "buffer layer."

Let's just for a minute assign a name to this thing... let's call it smartcached.

Imagine for a moment that smartcached is based on Scala Actors.  You get a request for a cached item... the item is either not in cache or marked dirty.  You send a message to a computational unit to build a new cache entry.  If the computation doesn't come back within a certain period of time, you either return the old value (if that's a legal semantic for the data type) or you return a default value.  This means that the web front end always gets an answer.  That's a huge win, especially in a usage spike, because you want answers to go back so web requests don't get piled up.  It also means that people don't get the perception that a given service is down (serving 2 or 3 minute old pages is better than serving 500s).
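
A minimal single-process sketch of that flow, with Python's `concurrent.futures` standing in for the Actor mesh (function and key names are hypothetical):

```python
# Sketch of "answer within a deadline, stale if necessary":
# the rebuild runs in the background; if it misses the deadline,
# serve the old value (or a default) instead of making the caller wait.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=4)
cache = {}   # key -> (value, dirty_flag)

def fetch(key, rebuild, timeout, default=None):
    value, dirty = cache.get(key, (None, True))
    if not dirty:
        return value
    future = pool.submit(rebuild, key)
    try:
        fresh = future.result(timeout=timeout)
        cache[key] = (fresh, False)
        return fresh
    except TimeoutError:
        # Rebuild is slow: serve the stale value if we have one,
        # otherwise a default. The front end always gets an answer.
        return value if value is not None else default
```

The background rebuild keeps running after the timeout, so a slow build still refreshes the cache for the next request.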

The next win with smartcached is that you can serialize item building.  That means that if you have multiple requests for an item not in cache, you don't have multiple machines building the same cache item and contending for resources (one might argue that those resources are in memory on the database after the first request, but still, having the database do n times the work, especially under load, is not a good thing).  So, you wind up with the ability to serialize the building of an item.  This is a win.
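
In a single process the same idea looks like request coalescing: concurrent requests for a missing key share one in-flight build instead of each hitting the database. A sketch (a distributed smartcached would do the equivalent across nodes; all names here are illustrative):

```python
# Sketch of serialized item building: only one build per key runs at a
# time; everyone else waits for that build's result.
import threading

cache = {}
in_flight = {}         # key -> Event signalling "build finished"
lock = threading.Lock()

def get_or_build(key, build):
    with lock:
        if key in cache:
            return cache[key]
        done = in_flight.get(key)
        if done is None:
            # We are the builder; later callers wait on our event.
            done = in_flight[key] = threading.Event()
            builder = True
        else:
            builder = False
    if builder:
        value = build(key)        # exactly one build per key
        with lock:
            cache[key] = value
            del in_flight[key]
        done.set()
        return value
    done.wait()
    with lock:
        return cache[key]
```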

But you also wind up with the ability to throttle back the computation of items.  When the computation of cache items is distributed across n processes on m machines, there's no real way to "take the temperature" of the whole system (is the average database response time more than 100% of normal, is the queue length in the message queue more than n items, is the number of items processed per second less than 50% of normal) and throttle back the number of simultaneous calculations so that the system has a chance to right itself.  If the calculation is centralized *and* there are semantics built into the work allocator of the centralized calculator for doing this throttling, it just becomes part of the infrastructure rather than something that someone has to actively think about for each query.
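
The core of that work allocator is just a cap on simultaneous builds. The sketch below shows only the cap; in a real smartcached, the temperature checks described above would adjust the limit at runtime (everything here is illustrative):

```python
# Sketch of a centralized build throttle: at most max_concurrent cache
# builds run at once, so a struggling database gets a chance to recover.
import threading

class Throttle:
    def __init__(self, max_concurrent):
        self.sem = threading.BoundedSemaphore(max_concurrent)

    def run(self, build, key):
        with self.sem:            # blocks when the cap is reached
            return build(key)
```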

So... coming full circle, what does all this stuff have to do with memcached and why not just do it on top of Jetty and REST?

Doing internal infrastructure on REST means actively changing up all the places that you're calling memcached.  That's lots of work.  Doing stuff over HTTP and REST could mean the perception of SOA and the associated Technicolor Yawn from hip web 2.0 people that rage against the IBM mandated machine, dude. :-)

Plus, if we get it right on the memcached ABI side, there's no problems generalizing the stuff to REST.

steve.yen

Feb 23, 2008, 9:11:22 PM
to liftweb
Hi David,

Definitely interested!

Been playing with prototype ideas since we last spoke about this.
Crawled through memcached's C code and talked to folks who use it, trying
to understand their pain points.

And I've been trying to replace my lift project's RDBMS backend with
something like smart/memcached++, albeit written in Scala.  That
explains why I've been quiet on the lift group, as my questions
are no longer frontend-related right now.

Theoretically, with a mesh of Actors smeared across your cluster
nodes, distributed close to the distributed data, you should be able
to more easily do computations that RDBMSs have traditionally been bad
at: graph-structure/tree queries, matrix operations (eigenvalues,
anyone?), etc.  Somewhere in that pile of ideas, I figure, should be
another pony -- a compelling reason or two why, besides caching, you'd
want a new, non-relational layer.

Steve

Steve Jenson

Feb 23, 2008, 11:05:38 PM
to lif...@googlegroups.com
I hope you don't think of my comments as stop energy. I think this is
a very interesting experiment and my comments are meant to be
constructive.

On Sat, Feb 23, 2008 at 4:50 PM, David Pollak <d...@athena.com> wrote:

> Why not just plain REST? ETags and Last-Modified give you pretty good
> cache semantics. I'll go ahead and say up front that I think memcached
> is overused and overhyped.
>
> Because HTTP is heavier weight than the memcached protocol, HTTP keep-alive
> is harder to implement (especially in languages with crappy threading and no
> real concept of "global") than keeping the socket open to memcached,

I'll just throw this out there. Clearly your idea is predicated on
using the memcached protocol, but somebody else might find this useful:
there are other good protocols, like BEEP, that have multi-language
support, offer more semantics than get, put, and get_multi, and aren't
nearly as heavyweight as HTTP when it comes to persistent
connections. BEEP isn't so much a protocol as a toolkit for building
your own protocol. It has an RFC and clients in Java and C, and was
written by Marshall Rose.

> memcached has client-managed timeout semantics, and there are a ton of
> applications that use memcached already and being able to gently migrate
> them to a Scala/lift backend is easier if you say, "you're already using
> memcached, just use this as a server and it will not only cache, but in some
> cases, compute your answer."

Here's kind of what you're saying to them:

"Here's a memcached server that's way slower but if you write your
code in Scala, it has nicer caching."

Am I essentially right? That might seem harsh, but I'm trying to think
of potential reactions to this experiment. The hip web 2.0 crowd is a
cynical one. ;-)

> More generally, it's hard to write fault tolerant applications in Rails
> (and most other web frameworks), but it's easy in Erlang because in Erlang,
> failures are assumed and the "alternative in case of timeout" is almost
> always coded in. memcached is a nice place to put the "buffer layer."
>
> Let's just for a minute assign a name to this thing... let's call it
> smartcached.
>
> Imagine for a moment that smartcached is based on Scala Actors.  You get a
> request for a cached item... the item is either not in cache or marked
> dirty. You send a message to a computational unit to build a new cache
> entry.

Where is this computational unit?

> If the computation doesn't come back in a certain period of time,
> you either return the old value (if that's a legal semantic for the data
> type) or your return a default value. This means that the web front end
> always gets an answer. That's a huge win, especially in a usage spike
> because you want answers to go back so web requests don't get piled up. It
> also means that people don't have the perception that a given service is
> down (serving 2 or 3 minute old pages is better than serving 500s.)

What if the client didn't have an old value yet and a default value
isn't good enough? That's a 500.

If your cache is replicated across actors then you have a better
chance of serving a cached item even if stale which seems to be the
property you want.

> The next win with smartcached is that you can serialize item building.
> That means that if you have multiple requests for an item not in cache, you
> don't have multiple machines building the same cache item and contending for
> resources (one might argue that those resources are in memory on the
> database after the first request, but still, having the database do n times
> the work, especially in load situations is not a good thing.) So, you wind
> up with the ability to serialize the building of an item. This is a win.

So you have to build the item with Scala or smartcached calls back
into your application to have it build the item and cache the result?
I think this relates to my question of where the computational unit
is.

> But you also wind up with the ability to throttle back the computation of
> items. When you have the computation of cache items distributed across n
> processes on m machines, there's no real way to "take the temperature" of
> the whole system (is the average database response time > 100% of normal, is
> the queue length in the message queue more than n items or is the items
> processed per second < 50% of normal) and throttle back number of
> simultaneous calculations so that the system has a chance to right itself.
> If the calculation is centralize *and* there are semantics build into the
> work allocator of the centralized calculator for doing this throttling, it
> just becomes part of the infrastructure rather than something that someone
> has to actively think about for each query.

You can achieve this throttling with a master node that tracks the
health of the system. Clients (meaning cache actors in this case) send
messages to the master informing it of their status. You can certainly
"take the temperature" of a distributed system, it's just not as easy
as with a centralized system.

> So... coming full circle, what does all this stuff have to do with
> memcached and why not just do it on top of Jetty and REST?
>
> Doing internal infrastructure on REST means actively changing up all the
> places that you're calling memcached. That's lots of work. Doing stuff
> over HTTP and REST could mean the perception of SOA and the associated
> Technicolor Yawn from hip web 2.0 people that rage against the IBM mandated
> machine, dude. :-)

The lift motto seems to be: build great things for smart people.
That's why it's not simply Rails in Scala. Why move away from that?
Building things for sheep is Not Satisfying in my experience.

> Plus, if we get it right on the memcached ABI side, there's no problems
> generalizing the stuff to REST.

That's true.

I like that you're taking an incremental approach to this. Maybe this
is just me missing BigTable but I think that at some point, smarter
caches become more work than building better databases that can simply
be fed more machines as load increases. Then again, maybe your smarter
cache eventually just becomes the database.

Steve

Alexander Keiblinger

Feb 24, 2008, 9:38:54 AM2/24/08
to lif...@googlegroups.com
There is an implementation of memcached called cacherl, written in Erlang.
This implementation is easily clusterable in Erlang style and has the
additional benefit of data persistence via Mnesia. The implementation
follows the memcached protocol and can be used interchangeably with other
memcached implementations.

Now let's add AMQP (https://www.amqp.org/) and, for example, RabbitMQ (http://www.rabbitmq.com/
) to this equation. We get a nice setup for gracefully degrading data
serving, using cached data as a high-load fallback.

Example:
A cache lookup hits memcached (or cacherl). We configure a specified
timeout before the data gets delivered from the cache. Using AMQP
messaging, we put a message requesting a fresh calculation of the cache
entry into an AMQP queue. Now we can have different clients (written in
Scala, Erlang, C++, ...) capable of calculating the data listening on that
queue. Some code in the cloud (for example a Scala actor) calculates the
data and puts the result back into the memcached cache. If the result is
written into the cache before the timeout, the user gets up-to-the-second
accurate pages. If the timeout is missed, the user sees older data but no
page failure.
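A rough Scala sketch of that flow, with in-memory stand-ins for memcached/cacherl and the AMQP queue (all class and method names here are invented; a real setup would use a memcached client and a RabbitMQ channel):

```scala
import scala.collection.mutable

// In-memory stand-in for memcached / cacherl.
class Cache {
  private val store = mutable.Map[String, String]()
  def get(key: String): Option[String] = store.get(key)
  def set(key: String, value: String): Unit = store(key) = value
}

// In-memory stand-in for an AMQP queue of recalculation requests.
class RefreshQueue {
  private val pending = mutable.Queue[String]()
  def publish(key: String): Unit = pending.enqueue(key)
  // In reality, workers (Scala, Erlang, C++, ...) listen on the queue;
  // here we drain it synchronously with a recalculation function.
  def drain(recalc: String => String, cache: Cache): Unit =
    while (pending.nonEmpty) {
      val key = pending.dequeue()
      cache.set(key, recalc(key))
    }
}

// Request a fresh calculation, then serve whatever the cache holds:
// fresh data if a worker was fast enough, otherwise the older entry.
// Stale data beats a page failure.
def serve(key: String, cache: Cache, queue: RefreshQueue): String = {
  queue.publish(key)
  cache.get(key).getOrElse("(no cached data yet)")
}
```

The point is only the shape of the flow: the read path never blocks on recalculation, and freshness is a race between the worker and the timeout.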

Voilà, there we have enterprise-grade infrastructure (a clustered and
persisted data cache plus message store) built out of already available
(open source) system components that can handle high-load situations.

For using such a setup in a lift application, we need a timeout
extension to cacherl (or memcached) and a mechanism in /lift/ for easy
(maybe transparent?) use of memcached data. (*)

+1 for memcached interface protocol integration into /lift/

(*) The hardcore Scala boys can then sign up for the task of porting
rabbitmq and cacherl from Erlang to Scala ;)

David Pollak

Feb 24, 2008, 6:20:50 PM2/24/08
to lif...@googlegroups.com
Interesting idea, but I'm not sure how much it buys.

Erlang is going to be slower than a Scala implementation of the memcached ABI.

If there's need for a messaging system, yeah RabbitMQ's the one to use.

But, all in all, I'd implement most of the system in Scala and skip having to have yet another piece of technology (Erlang) in the mix.

Alexander Keiblinger

Feb 24, 2008, 7:50:15 PM2/24/08
to lif...@googlegroups.com
Hi David,

> Erlang is going to be slower than a Scala implementation of the
> memcached ABI.

> ...


> But, all in all, I'd implement most of the system in Scala and skip
> having to have yet another piece of technology (Erlang) in the mix.

what numbers is your opinion, to use Lift and a (still hypothetical) Scala
version of memcached, based on? I am a total Scala and Lift fanboy
and would also like to have such an infrastructure based on Scala/
Lift. But I do not see Erlang-like load tolerance for clustered
servers with Lift/Scala yet. Erlang server clusters have a load
tolerance of about 80k requests per second per cluster node (on a
mid-range Linux/PC server system). I have seen numbers in this range
for Erlang-based web page serving (yaws) and comet request serving
(erlycomet). Unfortunately I have no numbers for the Erlang version of
memcached. Does Scala/Lift play in that league already? Of course it
would be better to have only one technology involved in such a setup.

Regards,
Alex

David Pollak

Feb 24, 2008, 8:15:30 PM2/24/08
to lif...@googlegroups.com
My assertion is based on the amount of computation that I'm assuming would go into smartcached.

I have no doubt that Erlang is faster than anything other than hand-tuned C for serving static content.

However, once you start dealing with any moderately complex computations, Erlang's going to start losing.  If you have to do String manipulation with 16 bit characters, Erlang's sunk.  Any file IO and Erlang's sunk (look at some of the ways that CouchDB gets real slow.)  There are even complaints about RabbitMQ's performance when there are durable queues.

Erlang has a better model for fault tolerance than anything around.  I want to bring some of that to smartcached because I think that creating "shock absorbers" for complex systems is the best way to scale.

So, yes, Erlang is great at byte-moving, but not great at computations.  I believe that JVM-based systems can get in striking distance for raw byte-moving and will be far superior for computations.

Thanks,

David
David Pollak

Feb 25, 2008, 8:33:00 AM2/25/08
to lif...@googlegroups.com
Steve,

I wish I had about 2 hours to write a worthy response to this note.

A couple of things... I see a near-term need for a smartcached that uses the memcached protocol, but the larger project is not about wire protocols and should be pretty much independent of wire protocols.

Persistence mechanisms (RDBMS, BigTable, etc.) are great for "demand" based web applications.  However, I see a broader need for "proactive" web applications.  This is a hybrid of persistence, messaging (with smart, scalable routing rules), and some "live agent/Actor" thingy that sticks around and keeps some form of state fresh.
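One way to picture that "live agent" piece, as a minimal sketch (the class name and interval are invented for illustration): a background loop that proactively recomputes a value, so readers never wait on the computation.

```scala
// Hypothetical "live agent" sketch: a background thread that keeps a
// cached value fresh by recomputing it on a fixed interval.
import java.util.concurrent.atomic.AtomicReference

class FreshnessAgent[T](compute: () => T, intervalMillis: Long) {
  private val latest = new AtomicReference[T](compute())
  @volatile private var running = true

  private val worker = new Thread(() => {
    while (running) {
      Thread.sleep(intervalMillis)
      latest.set(compute())   // proactively refresh; no request needed
    }
  })
  worker.setDaemon(true)
  worker.start()

  def current: T = latest.get()   // readers never block on the computation
  def stop(): Unit = running = false
}
```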

I've been ranting on other threads about lift not just being about HTTP request/response and being more than just CRUD.  This discussion is bringing these ideas into clearer focus for me.

Okay... enough of a cryptic post for the morning.

Thanks,

David

David Bernard

Feb 25, 2008, 8:54:32 AM2/25/08
to lif...@googlegroups.com
David Pollak wrote:
> Steve,
>
> I wish I had about 2 hours to write a worthy response to this note.
>
> A couple of things... I see a near-term need for a smartcached that uses
> the memcached protocol, but the larger project is not about wire
> protocols and should be pretty much independent of wire protocols.
>
> Persistence mechanisms (RDBMS, BigTable, etc.) are great for "demand"
> based web applications. However, I see a broader need for "proactive"
> web applications. This is a hybrid of persistence, messaging (with
> smart, scalable routing rules), and some "live agent/Actor" thingy that
> sticks around and keeps fresh some form of state.

It sounds a lot like JavaSpaces (TupleSpaces, GigaSpaces).
With it you could do:
* messaging, persistence (pure memory, or as a cache for FS, RDBMS, ...)
* master/slave command pattern (divide and conquer)
* a space for data to be processed by actors

Maybe the API of the XxxSpaces could be a good source of inspiration ;), something like:
* take[T](template: T, tx: Option[Transaction], timeout: Duration): T
* takeAll[T](template: T, tx: Option[Transaction], timeout: Duration): Iterable[T]
* write[T](entry: T, tx: Option[Transaction], expiration: Option[Duration])
* read[T](template: T, tx: Option[Transaction], timeout: Duration): T
* notify (I don't remember the exact API: you need to create a NotificationListener, ...)
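That API sketch could be rendered in Scala roughly as follows (Transaction here is a placeholder case class, not the real Jini interface, and the toy implementation ignores transactions, timeouts, expiration, and template matching; notify is omitted):

```scala
// A hypothetical Scala rendering of the space API sketched above.
import scala.concurrent.duration.Duration

final case class Transaction(id: Long)

trait Space {
  // Remove and return an entry matching the template.
  def take[T](template: T, tx: Option[Transaction], timeout: Duration): T
  // Remove and return all matching entries.
  def takeAll[T](template: T, tx: Option[Transaction], timeout: Duration): Iterable[T]
  // Write an entry, optionally with an expiration (lease).
  def write[T](entry: T, tx: Option[Transaction], expiration: Option[Duration]): Unit
  // Return (without removing) a matching entry.
  def read[T](template: T, tx: Option[Transaction], timeout: Duration): T
}

// Toy single-slot implementation, just to make the call shapes concrete.
class TinySpace extends Space {
  private var slot: Option[Any] = None
  def write[T](entry: T, tx: Option[Transaction], expiration: Option[Duration]): Unit =
    slot = Some(entry)
  def read[T](template: T, tx: Option[Transaction], timeout: Duration): T =
    slot.get.asInstanceOf[T]
  def take[T](template: T, tx: Option[Transaction], timeout: Duration): T = {
    val v = read(template, tx, timeout)
    slot = None
    v
  }
  def takeAll[T](template: T, tx: Option[Transaction], timeout: Duration): Iterable[T] = {
    val vs = slot.toList.map(_.asInstanceOf[T])
    slot = None
    vs
  }
}
```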

IMO Actors + Spaces could provide a good environment (GigaSpaces provides a "pseudo" actor mechanism via workers).

my 2 cents (if you need info about JavaSpaces/GigaSpaces, what is cool and what is not cool, maybe I could help)

> I've been ranting on other threads about lift not just being about HTTP
> request/response and more than just CRUD. This discussion is bringing
> these ideas into clearer focus for me.
>
> Okay... enough of a cryptic post for the morning.

I'll read it ;)

/davidB

>
> Thanks,
>
> David
>
> On 2/23/08, *Steve Jenson* <ste...@gmail.com <mailto:ste...@gmail.com>>

Viktor Klang

Feb 25, 2008, 8:57:57 AM2/25/08
to lif...@googlegroups.com
On Mon, Feb 25, 2008 at 2:54 PM, David Bernard <david.be...@gmail.com> wrote:

> May be the api of XxxSpaces could be a good source of inspiration ;), something like
> [...]

XxxSpaces? The porn industry is always leading the technological advances.
 

David Pollak

Feb 25, 2008, 9:11:33 AM2/25/08
to lif...@googlegroups.com
Can you put 1B objects into JavaSpaces?  10B?

David Pollak

Feb 25, 2008, 9:17:46 AM2/25/08
to lif...@googlegroups.com
On 2/25/08, David Pollak <feeder.of...@gmail.com> wrote:
Can you put 1B objects into JavaSpaces?  10B?

Also... what kind of backing store does JavaSpaces use?  What kind of fail-over/fault tolerance?

David Bernard

Feb 25, 2008, 9:41:16 AM2/25/08
to lif...@googlegroups.com
David Pollak wrote:
>
>
> On 2/25/08, *David Pollak* <feeder.of...@gmail.com
> <mailto:feeder.of...@gmail.com>> wrote:
>
> Can you put 1B objects into JavaSpaces? 10B?
>
>
> Also... what kind of backing store does JavaSpaces use? What kind of
> fail-over/fault tolerance?

JavaSpaces is a spec.
About GigaSpaces (a commercial/pro implementation, more or less free for startups):
* 1B or 10B is possible; it depends on the size of the objects and the size of the cluster.
* For example, if you use the GUI admin, you can benchmark (write/take/read), but the objects are very small (2 fields).
* A better solution is to start a server (embedded or not) and push data (I could retrieve code if you want).
* Fail-over/fault tolerance is done by:
  * a cluster of nodes (several on the same host is possible)
  * support for partitioning of data, replication, or a combination of both between nodes
  * support for several load-balancing strategies between nodes
  * the possibility to create a mirror/backup
  * the possibility to store the data of (selected) nodes to a backend like FS, RDBMS, or custom
  * the possibility to use a backend to retrieve data not available in the space (like a cache miss)
Open source implementations are single-server/process (correct me if I am mistaken):
* Blitz uses Berkeley DB to persist state (for hard persistence, or for overflowing memory, a little like Ehcache does)

I didn't test with more than 1M objects (~10 fields each), simulating a classical Ask/Bid/Match (financial) workload.
What is interesting is the API and the transaction management (taken from Jini), which is the same for messaging and data access (and simple).

/davidB

