IdP Memcached StorageService implementation


Manuel Haim

Jul 20, 2011, 8:26:33 AM
to us...@shibboleth.net
Hi,

this may be an alternative to clustering the IdP 2.x with "Terracotta":

During the past weeks we have been evaluating several methods to cluster
the Shibboleth IdP 2.3, and we would like to share our results with the
community. We did some load testing and configuration optimizations on
our IdP nodes. Because of heavy resource load and performance issues
with "Terracotta", we decided to look for a better way to replicate
the IdP's state. We therefore tested technologies such as "Memcached",
"Infinispan" and "Hazelcast", and we believe we have found a solution.

Based on an old post [1], we have managed to implement a lightweight and
working IdP StorageService (this is the place where the IdP stores its
shared Objects) which connects to a Memcached server, along with a
servlet filter for post-processing each IdP request (see technical
details below). Our StorageService and servlet filter can easily be
added to an IdP by just copying two .jar files (i.e. our extension and
the spymemcached library) and modifying a few lines in the IdP's
internal.xml and web.xml files. Additionally, an HTTP load balancer with
session stickiness is needed (we use "pound", sticking to the
"JSESSIONID" cookie). In our test setup with two IdP nodes, the
Memcached solution turned out to be three times as fast as
Terracotta, and it consumed far less memory (Memcached needed only about
50 MB RAM per 10,000 logins, while Terracotta consumed 600 MB disk space
after 10,000 logins and permanently occupied 800 MB RAM).

Here are some technical details: As Java uses Object references and the
IdP 2.x does not explicitly tell the StorageService when it later
modifies an Object that has already been stored there, we cannot simply
map the StorageService's get() and put() methods to Memcached. Instead,
we keep all Objects in a local map and, each time the StorageService's
get() or put() method is called, compare their expirationTime with the
ones stored in Memcached (and synchronize the values depending on which
Object is newer). Additionally, we had to implement a servlet filter
which, after the session object has gone through the IdP, tells the
StorageService that the session object has changed (by calling the put()
method). Session indexes will only be stored as a reference to the
sessionId, and publicCredentials (if stored in the session's Subject)
are handled and synchronized as well.
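
The synchronization rule described above can be sketched roughly as
follows. This is not the code of the extension: Entry, RemoteCache,
InMemoryRemoteCache and SyncingStore are hypothetical names, the remote
cache is simulated in memory instead of talking to Memcached, and
updating the local Object in place when the remote copy is newer is an
assumption made here to keep local references stable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a stored IdP object; only value and expiration matter here. */
final class Entry {
    String value;
    long expirationTime; // a later expiration time is treated as "newer"

    Entry(String value, long expirationTime) {
        this.value = value;
        this.expirationTime = expirationTime;
    }

    Entry copy() { return new Entry(value, expirationTime); }
}

/** Hypothetical abstraction of the remote cache (Memcached in the post). */
interface RemoteCache {
    Entry get(String key);
    void set(String key, Entry entry);
}

/** In-memory remote cache; it stores copies, the way a serializing cache would. */
final class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, Entry> data = new HashMap<>();

    public synchronized Entry get(String key) {
        Entry e = data.get(key);
        return e == null ? null : e.copy();
    }

    public synchronized void set(String key, Entry entry) {
        data.put(key, entry.copy());
    }
}

/** Keeps a local map and reconciles it with the remote cache on every get() and put(). */
final class SyncingStore {
    private final Map<String, Entry> local = new HashMap<>();
    private final RemoteCache remote;

    SyncingStore(RemoteCache remote) { this.remote = remote; }

    synchronized Entry get(String key) { return sync(key); }

    synchronized void put(String key, Entry entry) {
        local.put(key, entry);
        sync(key);
    }

    private Entry sync(String key) {
        Entry mine = local.get(key);
        Entry theirs = remote.get(key);
        if (mine == null) {
            // Nothing local yet: adopt the remote copy, if there is one.
            if (theirs != null) local.put(key, theirs);
            return theirs;
        }
        if (theirs != null && theirs.expirationTime > mine.expirationTime) {
            // Remote is newer: update the local Object in place (assumption, see above).
            mine.value = theirs.value;
            mine.expirationTime = theirs.expirationTime;
            return mine;
        }
        if (theirs == null || mine.expirationTime > theirs.expirationTime) {
            // Local is newer, or the remote entry is missing: push the local state.
            remote.set(key, mine);
        }
        return mine;
    }
}
```

In this sketch, a change made to an Object after it was stored only
becomes visible to a second SyncingStore once put() is called again,
which is the job the post-processing servlet filter does for the session
object.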

Testing "Infinispan" and "Hazelcast" as storage services on the IdP 2.3
showed that, even though both technologies provide a native Java Map
which can be synchronized between different nodes, they too synchronize
Object modifications only if the modified Object is explicitly put back.
Thus, the same workarounds as in our Memcached StorageService would be
needed there, so we did not pursue this approach any further. (By the
way, we have stumbled upon an interesting comparison of distributed
memory systems, see page 35 of [2] for a performance report.)


As said, we would like to share our IdP Memcached StorageService
extension. What would be the next steps? Is the Contributions page at
the SHIB2 wiki the right place? And may we get a place in the Shibboleth
SVN extensions directory? I guess we would have to set up a maven-aware
Java project then?

Manuel Haim


[1]: IDP Memcached StorageService implementation,
http://groups.google.com/group/shibboleth-users/browse_thread/thread/228409e500d84ae5

[2]: Scale over the limits: an overview of modern distributed caching
solutions
http://www.snoopal.com/documents/GaSovOTchHCsOF5X7bG691/-Scale-over-the-limits-an-overview-of-modern-distributed-caching-solutions

--
To unsubscribe from this group, send email to
users+un...@shibboleth.net

Chad La Joie

Jul 20, 2011, 8:45:56 AM
to us...@shibboleth.net
Thanks Manuel for the write up.

The Contributions page is the right place to list your extension. We,
however, do not provide a project hosting service for people's
extensions, so you'll need to provide whatever infrastructure you want
people to use for that.
Chad La Joie
http://itumi.biz
trusted identities, delivered

Manuel Haim

Jul 20, 2011, 10:24:17 AM
to us...@shibboleth.net
Thank you Chad,

we will look for a public place to host it, and put a link and description
on the Contributions page in the next few days.

-Manuel

Nick Duan

Jul 20, 2011, 10:38:05 AM
to us...@shibboleth.net, d...@shibboleth.net
Does anyone know if there is an OpenSAML-based XACML PDP implementation?

Thanks!

ND

Manuel Haim

Jul 20, 2011, 11:34:52 AM
to us...@shibboleth.net, Martin B. Smith
On 20.07.2011 16:30, Martin B. Smith wrote:

> On 07/20/2011 08:26 AM, Manuel Haim wrote:
>> Additionally, a HTTP load balancer with
>> session stickiness is needed (we use "pound", sticking to the
>> "JSESSIONID" cookie).
>
> Hi Manuel,
>
> Could you elaborate on the need for a sticky session if the data is
> fully clustered using memcached? Why are sticky sessions required?
>
> Thanks,

Hi Martin,

for the following reason, we cannot distribute the short-lived
"loginContexts" within the cluster. When the IdP does some redirects
during a login attempt, it stores some information in a so-called
loginContext (which is like a short-lived session, so that information
from the first request is still available in the next request, until the
login succeeds). To do so, the IdP creates a random id, stores this id as a
cookie in the httpResponse and also puts a new loginContext Object
(under this id) into the StorageService. Then, the loginContext Object
is returned for further modification.

However, after the loginContext has been created and sent to the
StorageService, you only have access to the loginContext Object and do
not know its id anymore (you cannot read cookies from the
httpResponse, and you cannot retrieve the id elsewhere). Without the id,
the post-processing servlet filter cannot put the modified loginContext
into the StorageService again (in order to make it available to other
IdP nodes).

Instead of modifying the IdP to make the loginContext id available (or
iterate the local map till we find the key which maps to the current
loginContext), we decided to have the loginContext stored local-only, as
it is only used during the login process and we were already having a
load balancer with sticky sessions (as recommended by the Shibboleth
wiki; I guess, having the load balancer sticking to the cookie
"_idp_authn_lc_key" for just a few minutes would work, too). This
decision does not affect the synchronization of other objects, though.
If the IdP node you have been using dies, each other IdP node will
recognize you and let you continue using your session.
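
The "local-only" decision can be pictured with a small sketch. It is
only an illustration under assumptions, not the extension's code:
PartitionedStore is a hypothetical class, the partition names used in
the test are made up, and a plain Map stands in for the cluster-wide
cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical store that keeps some partitions on the local node only. */
final class PartitionedStore {
    private final Set<String> localOnlyPartitions;
    private final Map<String, Object> local = new HashMap<>();
    private final Map<String, Object> shared; // stand-in for the cluster-wide cache

    PartitionedStore(Set<String> localOnlyPartitions, Map<String, Object> shared) {
        this.localOnlyPartitions = new HashSet<>(localOnlyPartitions);
        this.shared = shared;
    }

    void put(String partition, String key, Object value) {
        String fullKey = partition + ":" + key;
        local.put(fullKey, value);
        // Entries of local-only partitions never leave this node.
        if (!localOnlyPartitions.contains(partition)) {
            shared.put(fullKey, value);
        }
    }

    Object get(String partition, String key) {
        String fullKey = partition + ":" + key;
        Object value = local.get(fullKey);
        if (value == null && !localOnlyPartitions.contains(partition)) {
            // Not known locally: another node may have stored it.
            value = shared.get(fullKey);
            if (value != null) local.put(fullKey, value);
        }
        return value;
    }
}
```

With such a split, a session written on one node can be read on
another, while a login context is only found on the node that created
it, which is why the load balancer has to stick to one node during the
login.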

William G. Thompson, Jr.

Jul 20, 2011, 12:51:22 PM
to us...@shibboleth.net
Hi Manuel,

Wondering if you considered using a distributed EhCache without Terracotta?

Best,
Bill

Manuel Haim

Jul 21, 2011, 12:25:59 PM
to us...@shibboleth.net
Hi again,

the extension is now available to the public, see:
https://wiki.shibboleth.net/confluence/display/SHIB2/Memcached+StorageService

Please feel free to try it out and send comments :)

@William:
I just had a look into EhCache and its clustering options. There is
indeed an option to replicate the cache without Terracotta (which we
must have missed, as Terracotta is heavily promoted as the default in
the documentation). However, with EhCache you need to wrap each Object
in a net.sf.ehcache.Element, there are no Java Generics as in Infinispan
or Hazelcast, and the setup process seems considerably more complex than
for our Memcached solution. If EhCache replicated Object changes (which
Infinispan and Hazelcast just don't), it could perhaps be an
alternative. We have not tested this yet, though.

-Manuel

Manuel Haim

Aug 1, 2011, 4:43:04 AM
to us...@shibboleth.net
On 31.07.2011 18:03, Peter Schober wrote:
> Are you replicating memcached's content between memcached instances
> somehow, e.g. using the repcached patches[1]? If not, how is this an
> alternative to Terracotta (as you'd only be shifting a SPOF from one
> IdP to one memcached instance)?
>
> I know SimpleSAMLphp can be set up to use replicated memcached
> instances (though I can't find the docs on the current site) but the
> replication code had to be added to SimpleSAMLphp, AFAIR.
> -peter
>
> [1] http://repcached.lab.klab.org/


Hi Peter,

thanks for the repcached link. (There seem to be various clustered cache
solutions which speak the memcached protocol, see [1]; we only tested an
Infinispan cluster for this, but then stayed with memcached.)

Up to now, we do not replicate memcached's contents, but we spread data
over multiple memcached instances (by means of a hash function, which is
the default of the spymemcached library we are using), so each item is
stored only once. If a memcached node fails, its data may be lost, but
new data for this node will be stored on another node and retrieved from
there, so the cluster keeps working.
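
A rough sketch of this kind of key distribution is given below. It is
not spymemcached's code, and the library's actual node locator and
failure handling are configurable and may differ in detail; NodeSelector
and its methods are hypothetical names, and plain modulo hashing over
String.hashCode() is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical selector: each key maps to exactly one node, skipping nodes marked as down. */
final class NodeSelector {
    private final List<String> nodes;
    private final Set<String> down = new HashSet<>();

    NodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    void markDown(String node) { down.add(node); }

    void markUp(String node) { down.remove(node); }

    /** Returns the node responsible for the key, or null if every node is down. */
    String nodeFor(String key) {
        // floorMod avoids a negative index for negative hash codes.
        int start = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            String candidate = nodes.get((start + i) % nodes.size());
            if (!down.contains(candidate)) {
                return candidate; // first live node at or after the hashed position
            }
        }
        return null;
    }
}
```

While all nodes are up, a key always lands on the same node. When that
node is marked as down, the key moves to the next live node, so new
writes and reads keep working, but whatever was stored on the failed
node is not found there.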

Additionally, with the IdP Memcached StorageService, each IdP keeps a
local cache for technical reasons (i.e. Java object references within
one IdP node must stay the same for each get() call). This also has the
intended side effect that, once the memcached entry is lost, the local
value will be used. As we use sticky sessions on the load balancer for
the user side, the user will always come back to the same IdP node and
keep their session even after a memcached node has failed. Once the user
authenticates to another SP, the selected IdP node
will write the local session to memcached again, so it will be available
to the other IdP nodes for back-channel requests.

A different solution which builds memcached replication into our IdP
Memcached StorageService is being discussed (such as a memcached
cluster with one active and several passive nodes, as in Terracotta, or
a memcached cluster where each item is stored on several but not all
nodes). We will probably not pursue this, however, as long as our
current solution keeps working.

-Manuel

[1] Memcached replication options?
http://www.quora.com/Memcached-replication-options
