Instance freezes by some clock leap


Alex Balleste

Jun 13, 2017, 8:04:07 AM
to sakai-pr...@apereo.org, Sakai Development
Hello everyone, since we moved to 11.3 we have been experiencing
random freezes on our servers. I couldn't find any pattern in when or
how they occur, but when one happens the service instance becomes unusable.

This is what I can see in the log at the time of the error:

13-de juny-2017 13:07:46.812 WARN [sakai housekeeper] com.zaxxer.hikari.pool.HikariPool.run sakai - Thread starvation or clock leap detected (housekeeper delta=2m46s73ms824µs227ns).
13-de juny-2017 13:07:49.853 WARN [ajp-nio-8009-exec-128] org.sakaiproject.portal.charon.handlers.SiteHandler.doSite Redirecting tool inline url: https://cv.udl.cat/portal/site/coordD406-1617/tool/ee69da30-b80a-4317-a24d-b103b81d6509
13-de juny-2017 13:07:50.571 INFO [QuartzScheduler_Worker-5] org.apache.commons.httpclient.auth.AuthChallengeProcessor.selectAuthScheme basic authentication scheme selected
13-de juny-2017 13:07:56.011 WARN [ajp-nio-8009-exec-178] org.sakaiproject.portal.charon.handlers.SiteHandler.doSite Redirecting tool inline url: https://cv.udl.cat/portal/site/coordD406-1617/tool/ee69da30-b80a-4317-a24d-b103b81d6509
13-de juny-2017 13:07:57.955 WARN [ajp-nio-8009-exec-179] org.sakaiproject.portal.charon.handlers.SiteHandler.doSite Redirecting tool inline url: https://cv.udl.cat/portal/site/coordD406-1617/tool/ee69da30-b80a-4317-a24d-b103b81d6509
13-de juny-2017 13:07:59.715 INFO [ajp-nio-8009-exec-136] org.sakaiproject.entitybroker.util.servlet.DirectServlet.dispatch Could not process entity: /site/pages (500)[null]: EntityEncodingException: Unable to handle output request for format json for this path (/site/101500-1617/pages.json) for prefix (site) for entity (/site/pages), request url (/site/101500-1617/pages.json): Failed to encode into output stream: /site/pages
WARN Error sending http servlet error code (500) and message (EntityEncodingException: Unable to handle output request for format json for this path (/site/101500-1617/pages.json) for prefix (site) for entity (/site/pages), request url (/site/101500-1617/pages.json): Failed to encode into output stream: /site/pages): java.lang.IllegalStateException: Cannot call reset() after response has been committed
13-de juny-2017 13:18:41.836 WARN [sakai housekeeper] com.zaxxer.hikari.pool.HikariPool.run sakai - Thread starvation or clock leap detected (housekeeper delta=10m54s891ms50µs730ns).
13-de juny-2017 13:18:43.390 WARN [SakaiClusterService.Maintenance] org.sakaiproject.cluster.impl.SakaiClusterService.updateOurStatus run(): server has been closed in cluster table, reopened: sakai01-1497319511273
13-de juny-2017 13:18:54.133 WARN [ajp-nio-8009-exec-164] org.sakaiproject.db.impl.BasicSqlService.dbRead Sql.dbRead: sql: select SITE_ID from SAKAI_SITE_TOOL where TOOL_ID = ? 102616-1617-presence
java.sql.SQLTransientConnectionException: sakai - Connection is not available, request timed out after 651136ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:99)


Once we receive this "Thread starvation or clock leap" warning, the
connections to the db are broken and subsequent queries fail with
connection timeouts. Has anyone else experienced this kind of
freeze/error?

Any help will be appreciated.

Alex.

James Scoble

Jun 13, 2017, 8:10:02 AM
to Alex Balleste, sakai-pr...@apereo.org, Sakai Development
Concerning "thread starvation" - how's the CPU usage level on the server? 
Because this:
Connection is not available, request timed out after 651136ms.

That's almost 11 minutes it tried to leave mysqld "on hold".

Is your MySQL on the same box?
Does your machine have enough CPU power and RAM?  Disk thrashing could also be happening and slowing things down.
And finally - does anything funny happen with your server's clock? 
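
For reference, the timeout and the housekeeper delta from the log can be compared directly. This is a minimal Python sketch, assuming the delta format seen in the two housekeeper warnings above; the helper name `delta_to_ns` and the `h` unit are assumptions, not something taken from the log or from HikariCP.

```python
import re

# Nanoseconds per unit. "m", "s", "ms", "µs" and "ns" appear in the log above;
# "h" is an assumption for longer stalls.
UNIT_NS = {
    "h": 3_600_000_000_000,
    "m": 60_000_000_000,
    "s": 1_000_000_000,
    "ms": 1_000_000,
    "us": 1_000,
    "ns": 1,
}

# Two-letter units come first so "ms" is not read as "m" followed by "s".
TOKEN = re.compile(r"(\d+)(ms|[µμ]s|ns|h|m|s)")


def delta_to_ns(delta: str) -> int:
    """Convert a string such as '10m54s891ms50µs730ns' to nanoseconds."""
    total = 0
    for value, unit in TOKEN.findall(delta):
        if unit[0] in "µμ":  # accept both spellings of the micro sign
            unit = "us"
        total += int(value) * UNIT_NS[unit]
    return total


stall_ns = delta_to_ns("10m54s891ms50µs730ns")  # second housekeeper warning
timeout_ns = 651136 * 1_000_000                 # "timed out after 651136ms"

minutes, millis = divmod(651136, 60_000)
print(f"timeout: {minutes}m{millis / 1000:.3f}s")
print(f"stall - timeout: {(stall_ns - timeout_ns) / 1e9:.3f}s")
```

The timeout works out to about 10m51s, within a few seconds of the 10m54s housekeeper delta. That would be consistent with the request having waited for roughly the whole stall, though the log alone does not prove it.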




--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.
To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.



Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/Pages/emaildisclaimer.aspx




Matthew Jones

Jun 13, 2017, 8:18:42 AM
to James Scoble, Alex Balleste, sakai-pr...@apereo.org, Sakai Development
I found a thread about this, with two suggestions from the developer. It seems the most likely problem is that the CPU was entering sleep mode because of no activity (a feature that should be disabled in the BIOS or OS), or that there was something wrong with time synchronization.


"2m36s starvation is significant. If the system did not enter sleep, and you are sure of that, there are only two possibilities that I can see.

One, this is a virtual machine that is configured to synchronize its clock from the host. In which case, the solution is to run ntpd (Linux) or w32time (windows) in the VM.

Two, there was actually a 2+ minute starvation event. In which case, you need to monitor the CPU and figure out what is creating the load."
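
The two possibilities in the quote can be told apart, at least roughly, by sampling a monotonic clock and the wall clock together. The following is a minimal Python sketch of that idea, not HikariCP's own check; `classify_tick`, `watch`, the 30-second period in the comments and the 0.5-second tolerance are all assumptions chosen for illustration.

```python
import time


def classify_tick(period_s, mono_elapsed_s, wall_elapsed_s, tolerance_s=0.5):
    """Label one sampling interval from how far each clock moved.

    period_s       -- how long the sampler intended to sleep
    mono_elapsed_s -- elapsed time according to the monotonic clock
    wall_elapsed_s -- elapsed time according to the wall clock
    """
    # The wall clock disagreeing with the monotonic clock suggests the
    # system time was stepped (for example by a time synchronization).
    if abs(wall_elapsed_s - mono_elapsed_s) > tolerance_s:
        return "clock-step"
    # Both clocks agree but the tick arrived late: the sampler was not
    # scheduled in time (CPU starvation, long pause, and so on).
    if mono_elapsed_s - period_s > tolerance_s:
        return "stall"
    return "ok"


def watch(period_s=30.0, ticks=3):
    """Sleep `period_s` seconds `ticks` times and classify each interval."""
    results = []
    mono_prev, wall_prev = time.monotonic(), time.time()
    for _ in range(ticks):
        time.sleep(period_s)
        mono_now, wall_now = time.monotonic(), time.time()
        results.append(
            classify_tick(period_s, mono_now - mono_prev, wall_now - wall_prev)
        )
        mono_prev, wall_prev = mono_now, wall_now
    return results
```

Running something like `watch()` on the affected server alongside Sakai would show whether late ticks line up with the housekeeper warnings. How a suspended or paused VM appears to the two clocks depends on the hypervisor and the OS, so the labels should be treated as hints rather than a diagnosis.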



Alexandre Ballesté

Jun 13, 2017, 8:29:24 AM
to James Scoble, sakai-pr...@apereo.org, Sakai Development

We are running Oracle on a different server. Each Sakai server has 4 (virtual) CPUs and 10 GB of RAM. It is probably something related to the clock.

Thanks for the advice.

Àlex.


-- 
Alexandre Ballesté Crevillén alexandre.balleste at udl.cat
====================
Universitat de Lleida

Àrea de sistemes d'Informació i Comunicacions

Analista/Programador


University of Lleida

Information and Communication Systems Service


Tlf: +34 973 702148

Fax: +34 973 702130

=====================

Avís legal / Aviso legal / Avertiment legal / Legal notice <http://www.imatge.udl.cat/avis_legal_lopd.html>

Alexandre Ballesté

Jun 13, 2017, 8:40:29 AM
to Matthew Jones, James Scoble, sakai-pr...@apereo.org, Sakai Development

I'll try to check the system status and the clock deviation with the sysadmins.

Thanks for the advice.

Àlex.



Neal Caidin

Jun 13, 2017, 9:01:15 AM
to Alexandre Ballesté, Matthew Jones, James Scoble, Sakai Production, Sakai Development
Please keep us posted with your results.

Good luck with the issue.

-- Neal



--
You received this message because you are subscribed to the Google Groups "Sakai Production" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-production+unsubscribe@apereo.org.
To post to this group, send email to sakai-pr...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-production/.
