You can exclude the hazelcast dependency from the relevant module, and provide your own exact version.
Disabling the fourth node doesn't change anything.
Profiling shows the highest CPU/time is spent in Hazelcast. Whether this is a result of the updated Hazelcast version or the new synchronous CAS code remains to be seen.
Is it possible to downgrade the Hazelcast version (say, to 3.6) on CAS 4.2.6, i.e. were any new Hazelcast version-specific changes made between roughly 4.2.[12] and 4.2.6?
Thanks.
Tom.
> On Oct 13, 2016, at 2:18 PM, Tom Poage <tfp...@ucdavis.edu> wrote:
> Afternoon,
>
> On moving from 4.2.1 to 4.2.6, our apparent system load increased dramatically.
>
> Run queue went from as high as 4 to nearly 30, with (Linux) load average jumping from a max of 0.2 to about 15 for a user base (TGT count) of 46k.
>
> A code diff doesn’t seem to show much, except perhaps for the addition of a synchronous ticketTransactionManager. The only other likely candidates are the bump in Hazelcast version, or that we went from 3 to 4 (single CPU) VMs in the cluster (point-to-point instead of multicast). CPU increased from a high of about 20% (usually 5-8%) to the 50% range. This is on all nodes. Ironically, response time doesn’t seem all that bad, though it is a bit sluggish.
>
> Anyone else experience something similar?
>
> Thanks!
> Tom.
> --
> CAS gitter chatroom: https://gitter.im/apereo/cas
> CAS mailing list guidelines: https://apereo.github.io/cas/Mailing-Lists.html
> CAS documentation website: https://apereo.github.io/cas
> CAS project website: https://github.com/apereo/cas
> ---
> You received this message because you are subscribed to the Google Groups "CAS Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to cas-user+u...@apereo.org.
> To post to this group, send email to cas-...@apereo.org.
> Visit this group at https://groups.google.com/a/apereo.org/group/cas-user/.
> To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/F67D31AA-2CFC-4DDA-8C5D-922E0B87798F%40ucdavis.edu.
> For more options, visit https://groups.google.com/a/apereo.org/d/optout.
Disabling the registry cleaner brought load average on our (4) servers back down to normal levels (0.01-0.20), a twenty-fold decrease in load average. We're currently using Hazelcast, which has its own TTL eviction mechanism, so this cleaner is not necessary. The same holds for Ehcache (which we used previously).
cas.properties:
ticket.registry.cleaner.startdelay=-1
(value could have been zero, but -1 seemed more mnemonic of the intent)
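For illustration, the "non-positive start delay means disabled" convention can be sketched in Java roughly as follows. This is a hypothetical sketch, not the CAS implementation: the class CleanerScheduling, its method names, and the use of seconds as the unit are all assumptions made for the example.

```java
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from CAS.
final class CleanerScheduling {

    private CleanerScheduling() {
    }

    /** A start delay of zero or less is treated as "do not run the cleaner". */
    static boolean isEnabled(long startDelaySeconds) {
        return startDelaySeconds > 0;
    }

    /**
     * Schedules the cleaner only when the start delay is positive.
     * Returns an empty Optional when the cleaner is disabled.
     */
    static Optional<ScheduledFuture<?>> scheduleIfEnabled(ScheduledExecutorService executor,
                                                          Runnable cleaner,
                                                          long startDelaySeconds,
                                                          long repeatIntervalSeconds) {
        if (!isEnabled(startDelaySeconds)) {
            return Optional.empty();
        }
        ScheduledFuture<?> future = executor.scheduleWithFixedDelay(
                cleaner, startDelaySeconds, repeatIntervalSeconds, TimeUnit.SECONDS);
        return Optional.<ScheduledFuture<?>>of(future);
    }
}
```

Under this convention both 0 and -1 leave the cleaner unscheduled, which matches the note above that either value would have worked.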
If someone does need to use this ticket cleaner, it seems to make sense to run it on only a single node, assuming the get-all-entries method (don't recall the exact name at the moment) has global cache semantics.
Tom.
On Oct 14, 2016, at 1:28 PM, Tom Poage <tfp...@ucdavis.edu> wrote:
Looks like we found the source of the load issue.
Best we can tell, somewhere about 4.2.5 the RegistryCleaner embedded in the DefaultTicketRegistry was refactored into a TicketRegistryCleaner that’s now automatically picked up and started for all registry types (*). This cleaner walks the entire cache map, by default every two minutes, and by chunks with an exclusive lock. Multiply that by 50k entries and several servers all competing to do the same thing and it’s no wonder there’s some load. :-)
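To put rough numbers on that, here is a back-of-envelope sketch in Java. It is not taken from the CAS code; the class and method names are made up for the estimate, and it assumes every node walks the full map on every pass, using the figures mentioned in this thread (about 50k entries, four nodes, a two-minute interval).

```java
// Back-of-envelope estimate only; names are hypothetical and not from CAS.
final class CleanerLoadEstimate {

    private CleanerLoadEstimate() {
    }

    /** Number of cleaner passes per hour for a given repeat interval in minutes. */
    static long runsPerHour(long intervalMinutes) {
        if (intervalMinutes <= 0) {
            throw new IllegalArgumentException("intervalMinutes must be positive");
        }
        return 60 / intervalMinutes;
    }

    /**
     * Entries visited per hour across the cluster, assuming each node
     * runs its own cleaner over the entire map on every pass.
     */
    static long entryVisitsPerHour(long entries, int nodes, long intervalMinutes) {
        return entries * nodes * runsPerHour(intervalMinutes);
    }
}
```

With those figures, entryVisitsPerHour(50_000, 4, 2) comes to 50,000 x 4 x 30 = 6,000,000 entry visits per hour, against 1,500,000 if only one node ran the cleaner. The estimate ignores lock contention between nodes, which would only add to the cost.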
Question now is whether to disable on all nodes, or enable on only one in the cluster. Caches like Hazelcast and Ehcache have a time-to-live eviction policy, so it seems to me the registry cleaner is unnecessary for this type of cache.
The CAS code suggests the cleaner can be disabled, albeit somewhat indirectly, by setting the “ticket.registry.cleaner.startdelay” property to less than or equal to zero.
Tom.
* https://github.com/apereo/cas/commit/c1cbde11c5722e1930357d3dc3bdb6d4cffa8214
From: Misagh Moayyed <mmoa...@unicon.net>
Date: Friday, October 14, 2016 at 10:43 AM
On Oct 15, 2016, at 11:23 AM, Tom Poage <tfp...@gmail.com> wrote:
This email I sent looks like it got stuck in Google yesterday for nearly 2-1/2 hours before delivery (cf. Received lines in mail header). List maintainers: Two followup emails I sent yesterday mid-day on this topic still have not been delivered.
X-Received: by 10.157.0.4 with SMTP id 4mr6139581ota.80.1476691968295; Mon, 17 Oct 2016 01:12:48 -0700 (PDT)
X-BeenThere: cas-...@apereo.org
Received: by 10.157.15.174 with SMTP id d43ls12332979otd.11.gmail; Mon, 17 Oct 2016 01:12:47 -0700 (PDT)
X-Received: by 10.157.50.165 with SMTP id u34mr6693969otb.45.1476691967039; Mon, 17 Oct 2016 01:12:47 -0700 (PDT)
Received: by 10.202.244.67 with SMTP id s64msoih; Fri, 14 Oct 2016 16:51:47 -0700 (PDT)
X-Received: by 10.99.113.25 with SMTP id m25mr18200910pgc.173.1476489107284; Fri, 14 Oct 2016 16:51:47 -0700 (PDT)
Received: from smtp3.ucdavis.edu (smtp3.ucdavis.edu. [128.120.32.129]) by mx.google.com with ESMTPS id i84si20253377pfi.299.2016.10.14.16.51.47 for <cas-...@apereo.org> (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 14 Oct 2016 16:51:47 -0700 (PDT)