SOLR workflow index connection timeouts


Kristof Keppens

Feb 17, 2015, 5:04:46 AM
to matterhorn-users@opencast.org >> Matterhorn Users
Hi,

We have been using an external SOLR server for the workflow index for
some time, and we frequently see connection timeout errors in the log:


2015-02-17 08:28:34 ERROR (AbstractFaultChainInitiatorObserver:101) - Error occurred during error handling, give up!
org.apache.cxf.interceptor.Fault: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 100 ms


Are there other people on this list who are using an external Solr
server for the workflow index and are seeing the same issues? And as an
extra question, is it possible to configure the timeout value that is
currently set at 100 ms?

We are also seeing that some workflows are not updated correctly,
mostly when going from scheduled -> capturing. The timeout errors in
the log occur at roughly the same time as the start time of the
recordings, so this seems related.

Thanks,

Kristof

James Perrin

Feb 17, 2015, 5:40:03 AM
to matterho...@opencast.org
Hi,

I'm afraid I can't offer a solution, but we did trial an external Solr
server on our staging cluster and ran into exactly the same problem. We
had to back out before we could investigate further as we needed to test
other features.

Regards
James

--
------------------------------------------------------------------------
James S. Perrin

Media Technologies Team
J20, Sackville Building
The University of Manchester
Oxford Road, Manchester, M13 9PL

t: +44 (0) 161 275 6945
e: james....@manchester.ac.uk
w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
"The test of intellect is the refusal to belabour the obvious"
- Alfred Bester
------------------------------------------------------------------------

Karen Dolan

Feb 17, 2015, 7:47:12 AM
to James Perrin, matterho...@opencast.org
James,

The org.opencast SolrServerFactory.java appears to deliberately set connectionTimeout to "100" in the code. That seems very low. The default for CXF is much higher. I don't know what the reason is for the low timeout.

-Karen

Lars Kiesow

Feb 17, 2015, 8:04:46 AM
to matterho...@opencast.org
Hi,
would it make sense to add a configuration key for this?
–Lars

Greg Logan

Feb 17, 2015, 10:21:37 AM
to matterho...@opencast.org
On 17/02/15 07:02 AM, Lars Kiesow wrote:
> Hi,
> would it make sense to add a configuration key for this?

+1 for that. We explicitly mention in the installation docs that this
is a supported configuration, and if it's broken because the Solr
connector assumes that we're dealing with an internal connection then we
should fix it.

G

Kristof Keppens

Feb 18, 2015, 2:50:56 AM
to matterho...@opencast.org
On 17-02-15 16:23, Greg Logan wrote:
> On 17/02/15 07:02 AM, Lars Kiesow wrote:
>> Hi,
>> would it make sense to add a configuration key for this?
> +1 for that. We explicitly mention in the installation docs that this
> is a supported configuration, and if it's broken because the Solr
> connector assumes that we're dealing with an internal connection then we
> should fix it.
>
> G
That would be great. For now I can change that value in
SolrServerFactory.java to something like 1000 and see if that fixes
the problems. But if the value is so low at the moment because it
assumes an internal connection, it would be great to be able to
configure it through a config file.

Kristof
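
For anyone following along, the configuration-key idea could look roughly like the sketch below. It is only an illustration in plain Java, not the actual Matterhorn code: the class name SolrTimeoutSettings and the key solr.connection.timeout.ms are hypothetical, and the fallback of 100 ms simply mirrors the hard-coded value mentioned above.

```java
import java.util.Properties;

/** Hypothetical helper: reads a timeout in milliseconds from configuration. */
final class SolrTimeoutSettings {

    // Hypothetical key name, chosen for this example only.
    static final String CONNECT_KEY = "solr.connection.timeout.ms";

    // Mirrors the hard-coded value reported in this thread.
    static final int DEFAULT_CONNECT_TIMEOUT_MS = 100;

    private SolrTimeoutSettings() {
    }

    /**
     * Returns the configured timeout for the given key, or the fallback when
     * the key is missing, not a number, or not a positive value.
     */
    static int timeoutMs(Properties props, String key, int fallbackMs) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallbackMs;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallbackMs;
        } catch (NumberFormatException e) {
            return fallbackMs;
        }
    }
}
```

The returned value would then be handed to whatever call the factory already uses to set the connection timeout on the Solr client. Falling back to the old default on a missing or malformed entry keeps existing installations behaving as before.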

Karen Dolan

Feb 18, 2015, 11:10:56 AM
to matterho...@opencast.org
Kristof,

If the setting is in milliseconds, which I think it is, 1000 = 1 second. Depending on your network, that might need to be higher.

Best of luck,
Karen

Kristof Keppens

Feb 19, 2015, 2:51:37 AM
to matterho...@opencast.org
I've set it to 1000 for now, and the errors seem to be gone. We will need to wait a bit longer to be sure that this fixes the problem, though.

Thanks for the help,

Kristof

James Perrin

Aug 27, 2015, 6:39:03 AM
to matterho...@opencast.org
Hi,

FYI

As part of our upgrade to 1.6 we have decided to try having the separate
Solr server again.

I set the connectionTimeout to 5000 ms (5 s) and this has worked well,
with no connection timeouts occurring. I did find that if you need to
set this higher (i.e. >20 s) then you will also need to increase the
timeout of the Tomcat server hosting the Solr instance, as its default
value seems to be 20 s.

However, I've also had a handful of read timeouts!

2015-08-27 00:04:49 ERROR (AbstractFaultChainInitiatorObserver:101) - Error occurred during error handling, give up!
org.apache.cxf.interceptor.Fault: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:155)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:121)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:133)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:82)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)


It seems the socket read timeout, as set by setSoTimeout, is 1000 ms.
Hopefully increasing this to 5000 ms will stop any more timeout occurrences.

Regards
James
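
The difference between the two timeouts discussed here can be reproduced with plain java.net sockets. The sketch below is only an illustration and is not the Matterhorn or SolrJ code: the class name TimeoutDemo and the silent local server are made up for the example. The connect timeout limits how long the client waits for the connection to be established, while the socket (SO) timeout limits how long a blocking read waits for data once connected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo of connect timeout versus read (socket) timeout. */
final class TimeoutDemo {

    private TimeoutDemo() {
    }

    /**
     * Connects to a local listener that never sends any data and then tries
     * to read one byte. Returns true if the read gave up with a
     * SocketTimeoutException, false if the read returned normally.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the listener never writes.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connect timeout: maximum wait for the TCP connection itself.
            client.connect(
                    new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Read timeout: maximum wait for data on a blocking read.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A connection that is accepted quickly can still hit the read timeout if the server is slow to answer, which would match seeing read timeouts after the connection timeouts had already gone away.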


Kristof Keppens

Sep 25, 2015, 7:43:57 AM
to matterho...@opencast.org
Thanks for this information. We are still having the read timeouts as
well, so we changed the setSoTimeout value to 5 s too; hopefully this
solves our problem.

Regards,

Kristof

James Perrin

Sep 25, 2015, 8:48:21 AM
to matterho...@opencast.org
Glad that it helped. Unfortunately, separate issues meant that we didn't
deploy 1.6 for the start of the academic year, so we've not seen how
this runs in production. We now plan to deploy mid-semester, so if it
works for you please let me know.

Cheers
James

Kristof Keppens

Oct 7, 2015, 2:36:25 AM
to matterho...@opencast.org
Hi James and list,

A quick update on this issue: since our change to the timeout settings
we haven't seen these errors on our production system.
So far it seems to have fixed the problem. Is there any chance that we
can get this change to these values into the main Opencast codebase,
maybe for the 1.7 release?

Kristof

Greg Logan

Oct 7, 2015, 4:36:18 PM
to matterho...@opencast.org
Absolutely! Can you either attach a diff directly to the list, or create a pull request? This is something that could conceivably get into 1.6.3, but can definitely be there for 1.7.0.

G

Kristof Keppens

Oct 8, 2015, 1:45:01 AM
to matterho...@opencast.org
Great! I created a pull request for this small change (#660) against r/1.6.x. I hope this is correct?

Kristof

Greg Logan

Oct 8, 2015, 11:54:57 AM
to matterho...@opencast.org
Yep, looks good.  There is another review in front of that one, but once I'm done with it we'll get 660 in.

G