[umlaut] background service timed out

Barnaby Alter

Sep 25, 2015, 11:43:22 AM
to umlaut-...@googlegroups.com
Hello Jonathan, and all,

It's been reported to me that a number of services in Umlaut are returning a series of "some content may not be included due to errors" messages, with the error "Exception: background service timed out (took longer than 30 to run); thread assumed dead. dispatched service id:"

It may be that we don't subscribe to some of these services, in which case the errors may be legitimate and those services need to be disabled. Others I know we do subscribe to, and on a reload, after the page has been cached, many of these services load successfully. Examples of failing services are HathiTrust, Internet Archive, and OCLC Worldcat.org, among others.

Are there options to increase the timeout, or is there another known config that I'm missing?

We used to have a guy around here who knew everything about Umlaut, but no longer. Hope you can help.

Thanks,

Barnaby Alter
Web Services
Division of Libraries
NYU

Jonathan Rochkind

Sep 28, 2015, 10:32:32 AM
to umlaut-...@googlegroups.com
Hi Barnaby,

You CAN increase the timeout in config. In your ./app/controllers/umlaut_controller.rb, add a configuration for 'background_service_timeout', which is in seconds and defaults to 30.
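For illustration only, roughly what that setting could look like, assuming your controller already configures Umlaut through an 'umlaut_config.configure' block (check your own generated file; 60 is an arbitrary example value, not a recommendation):

```ruby
# Inside the existing class UmlautController in
# ./app/controllers/umlaut_controller.rb (assumes the umlaut_config.configure
# block is already present there).
umlaut_config.configure do
  # Seconds before a background service is assumed dead; default is 30.
  background_service_timeout 60
end
```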

But I think this is unlikely to be a good solution to your problem. You probably don't want to make users wait more than 30 seconds for a response, and there's no good reason they should have to. There's probably some other problem causing this symptom, since the background services shouldn't take longer than 30 seconds to complete.

It may be that the services are dying entirely before completing, which means that no matter how high you set the timeout, they'll still time out, because they're actually dead and not proceeding anymore. (That is one reason the timeout exists.)
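To see why a higher timeout can't help a service that has died, here is a minimal, hypothetical model (not Umlaut's actual implementation): each service records its status in a shared table, and a watchdog marks anything still "in progress" after the timeout as failed. A service whose thread dies before recording completion stays "in progress" forever, so it can only end by timing out.

```ruby
# Hypothetical model, not Umlaut's actual code: each background service
# records its status in a shared table, and a watchdog marks any service
# still :in_progress after `timeout` seconds as failed.
class StatusTable
  def initialize
    @lock = Mutex.new
    @rows = {}
  end

  def set(id, status)
    @lock.synchronize { @rows[id] = status }
  end

  def get(id)
    @lock.synchronize { @rows[id] }
  end
end

# Start `work` in a thread; the status becomes :succeeded only if it finishes.
def run_in_background(table, id, &work)
  table.set(id, :in_progress)
  Thread.new do
    Thread.current.report_on_exception = false # keep the demo's stderr quiet
    work.call
    table.set(id, :succeeded) # never reached if `work` raises and kills the thread
  end
end

# Poll until no service is :in_progress, or mark stragglers failed at the deadline.
def wait_with_timeout(table, ids, timeout, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    pending = ids.select { |id| table.get(id) == :in_progress }
    return if pending.empty?

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      # Analogous to "background service timed out ... thread assumed dead"
      pending.each { |id| table.set(id, :timed_out) }
      return
    end
    sleep interval
  end
end

table = StatusTable.new
run_in_background(table, "healthy") { sleep 0.05 }
run_in_background(table, "dies_early") { raise "service crashed" }
wait_with_timeout(table, %w[healthy dies_early], 0.5)
p table.get("healthy")    # prints :succeeded
p table.get("dies_early") # prints :timed_out
```

Raising the timeout from 0.5 to 50 seconds in this sketch would only make the wait longer; "dies_early" would still end as :timed_out.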

For additional info, I'd check your application logs (normally in ./log/production.log) and grep for the strings "ERROR" and "FATAL". Do you get any more information about what might be going wrong?
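If grep isn't handy, a small Ruby script can do the same search. This is a sketch; the script name 'scan_log.rb' and the default markers are assumptions, and it simply does substring matching, roughly like 'grep -n -e ERROR -e FATAL'.

```ruby
# Return "line_number: text" for every log line containing any marker.
# Plain substring matching, roughly like `grep -n -e ERROR -e FATAL`.
def scan_log(io, markers = %w[ERROR FATAL])
  pattern = Regexp.union(markers) # escapes each marker string
  io.each_line.with_index(1)
    .select { |line, _n| line.match?(pattern) }
    .map { |line, n| "#{n}: #{line.chomp}" }
end

# Usage: ruby scan_log.rb path/to/production.log [MARKER ...]
if $PROGRAM_NAME == __FILE__ && ARGV.any?
  path, *markers = ARGV
  markers = %w[ERROR FATAL] if markers.empty?
  File.open(path) { |f| puts scan_log(f, markers) }
end
```

Passing "Exception" as an extra marker would also match lines such as "Background Service execution exception: #<Exception: ...>", which do not contain "ERROR" or "FATAL".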

Also, you can check the Umlaut admin interface's error reporting console to see if you can get any more information there. To use it, though, you have to turn on the admin console and set password(s) for it; information here: https://github.com/team-umlaut/umlaut/wiki/Admin-Functions

One thing that might be relevant is the "pool" setting in your ./config/database.yml. Do you have a pool setting in your database.yml, and if so, what is it set to? The pool setting controls how many simultaneous connections your app can make to the database. If you have a large number of services, some of which may be misbehaving, they may be running out of database connections. If so, you should see error lines suggesting this in your log file.
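One way to picture pool exhaustion is a small simulation with made-up numbers: when more services need a connection at once than the pool allows, the extra ones wait and eventually give up. In ActiveRecord itself a checkout that waits too long raises ActiveRecord::ConnectionTimeoutError; the sketch below only mimics that with Ruby's Timeout and a Queue, and 'simulate_pool' is a hypothetical helper.

```ruby
require "timeout"

# Hypothetical simulation, not ActiveRecord's implementation: `services`
# threads share a fixed pool of `pool_size` connections. A service that cannot
# check one out within `checkout_timeout` seconds gives up.
def simulate_pool(pool_size:, services:, hold:, checkout_timeout:)
  pool = Queue.new
  pool_size.times { |i| pool << "conn#{i}" }

  threads = Array.new(services) do
    Thread.new do
      begin
        conn = Timeout.timeout(checkout_timeout) { pool.pop }
      rescue Timeout::Error
        next :no_connection # pool exhausted: this service fails
      end

      begin
        sleep hold # stand-in for database work while holding the connection
        :ok
      ensure
        pool << conn # always return the connection to the pool
      end
    end
  end
  threads.map(&:value) # waits for every thread and collects its outcome
end

outcomes = simulate_pool(pool_size: 2, services: 4, hold: 0.5, checkout_timeout: 0.1)
p outcomes.count(:ok), outcomes.count(:no_connection)
```

With 2 connections and 4 services each holding one for 0.5 seconds, two services get no connection in time; with a pool of 4, all of them succeed.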

Hope this gives you some places to start looking. Please feel free to let us know what you find, and I'll provide additional advice if I can.

Jonathan
--
You received this message because you are subscribed to the Google Groups "Umlaut" group.
To unsubscribe from this group and stop receiving emails from it, send an email to umlaut-softwa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Barnaby Alter

Oct 5, 2015, 2:08:06 PM
to umlaut-...@googlegroups.com
Thanks for this advice, Jonathan. I will troubleshoot and let you know what comes up.

Barnaby Alter
Web Services
Division of Libraries
NYU

Barnaby Alter

Oct 26, 2015, 2:31:29 PM
to umlaut-...@googlegroups.com
Hi Jonathan, 

You were right that increasing the timeout was ill-advised and did nothing. We installed the admin panel but didn't see any useful errors come through. We tried upping the database pool and making the timeout absurdly high. We grepped for the strings FATAL and ERROR to no avail.

But by tediously following the logs and tracking when each background service finished, we spotted an Exception. Grepping for the word "Exception" then repeatedly showed us: Background Service execution exception: #<Exception: Third party service error: com.isinet.esti.AuthorizationException: Server.authorization - Not entitled for product 'JCR'.>

Disabling the "JCR" service made all the other services (presumably those loading after that point) load successfully and consistently. My guess is that the failures were inconsistent because of the asynchronous nature of the background responses.

It's possible we used to be registered with JCR and no longer are, but no one communicated this to me. Regardless, is this an error in the response from JCR to Umlaut? And if so, should Umlaut be able to catch that and ignore that background service? In short, one error shouldn't cause the remaining outstanding background services to fail, should it?

Thanks for all your help,

Barnaby Alter
Web Services
Division of Libraries
NYU

Jonathan Rochkind

Oct 26, 2015, 2:44:06 PM
to umlaut-...@googlegroups.com
I'm so glad you figured it out!

Yes, Umlaut _should_ be able to recover from that, and run all other services correctly.

My guess is that, yes, you are no longer authorized for JCR. However, whether it fails for that reason or any other, Umlaut is supposed to be able to mark JCR as failed but continue with the other services.

Additionally, it should list the JCR failure in the logs with an "ERROR" level, and it should ALSO display those JCR failures in the admin panel error listing.
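The intended behavior (mark the failing service, log it at ERROR level, let the rest finish) can be sketched as a dispatcher that rescues each service's error inside that service's own thread. This is a hypothetical illustration; 'dispatch' and the lambdas standing in for services are invented for the example and are not Umlaut's API.

```ruby
require "logger"
require "stringio"

# Hypothetical dispatcher sketch: each service runs in its own thread; a
# failure is rescued, recorded as :failed, and logged at ERROR level, so the
# other services still complete.
def dispatch(services, logger)
  threads = services.map do |name, work|
    Thread.new do
      begin
        work.call
        [name, :succeeded]
      rescue StandardError => e
        logger.error("Background service #{name} failed: #{e.class}: #{e.message}")
        [name, :failed]
      end
    end
  end
  threads.map(&:value).to_h # {service name => status}
end

log = StringIO.new
statuses = dispatch(
  { "HathiTrust" => -> { :ok },
    "JCR" => -> { raise "Not entitled for product 'JCR'" },
    "InternetArchive" => -> { :ok } },
  Logger.new(log)
)
p statuses
```

Note that 'rescue StandardError' here would not catch an error raised as a plain Exception.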

So it sounds like a bug in Umlaut that it's doing none of those things. However, I don't think I can reproduce it. In fact, the JCR API isn't authorized on my development machine, so when I run Umlaut there I get an access error from JCR all the time, but I see the errors in my logs, and they don't interfere with Umlaut's ability to complete the rest of the request.

However, prompted by your error report, I do see that the JCR code raises a plain Ruby Exception, when it really ought to raise a StandardError or a custom class descended from StandardError.
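The distinction matters because of how Ruby's rescue works: a bare 'rescue' (or 'rescue => e') only catches StandardError and its subclasses, so an error raised as a plain Exception passes straight through. A minimal illustration follows; 'run_service' and 'ThirdPartyServiceError' are hypothetical names, not the ones used in the Umlaut fix.

```ruby
# A bare `rescue` (same as `rescue StandardError`) catches StandardError and its
# subclasses only; an error raised as a plain Exception passes straight through.
def run_service
  yield
  :succeeded
rescue => e
  warn "rescued #{e.class}: #{e.message}"
  :failed
end

# Hypothetical error class for third-party failures, descended from
# StandardError so that ordinary rescue clauses catch it.
class ThirdPartyServiceError < StandardError; end

p run_service { raise ThirdPartyServiceError, "Not entitled for product 'JCR'" }
# prints :failed

begin
  run_service { raise Exception, "Not entitled for product 'JCR'" }
rescue Exception => e
  # Only an explicit `rescue Exception` stops it; a background thread without
  # one would simply die here, leaving its service "in progress".
  p e.class # prints Exception
end
```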

I am not sure if this caused your problem, but it is not right, so I will fix it and release an Umlaut patch release, 4.1.6, soon. If you could try 4.1.6 _with_ JCR enabled (probably not on your production machine!) and see whether Umlaut behaves as expected despite the JCR exception, that would be useful!

Jonathan

Barnaby Alter

Oct 26, 2015, 3:04:43 PM
to umlaut-...@googlegroups.com
Tested against master and it works as expected: it just reports JCR as an error, and all the rest resolve normally.

Thanks for the quick patch and I’ll be sure to update once it’s released.

Barnaby Alter
Web Development
Division of Libraries
NYU

Jonathan Rochkind

Oct 26, 2015, 3:14:38 PM
to umlaut-...@googlegroups.com
Awesome. Sorry about that. It is released!