Most important, I’ll be interested to hear if this may explain what you are seeing, or if you feel you are seeing something else. And even if you would say you had already solved your problem (and if so, please tell us what it was), I do hope again that the info above may help some readers.
/charlie
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of michael...@intergral.com
Sent: Wednesday, October 11, 2017 10:42 AM
To: FusionReactor <fusion...@googlegroups.com>
Subject: [fusionreactor] Re: Waiting on lock issue
Hi Alex,
I have seen before that servers can become unresponsive during garbage collection if a particular memory space has no available resource.
<snip>
Yes, this is what we found out.
We are using FR with WebLogic 11g. Our developers had a nightly indexing job that was “running fine for a year”. They had constantly told us that they had made no code changes at all. However, what they failed to mention was that they had increased the number of nightly indexes they were performing. In the past it was only 4; recently, however, they doubled it to 8. By indexing I mean they are pulling various artifacts out of an Oracle database and then populating 8 SOLR indexes.
Our JDBC configuration was set to time out idle connections at 2 minutes.
What was happening is that the additional indexes were pushing the idle time over 2 minutes, and Oracle was coming along and hanging up the connection, treating it as if it had leaked or as if the code was missing the close statement. Of course, I was chasing this issue without knowing the number of indexes had increased.
If I remove the JDBC timeout by setting it to zero, the issue goes away. However, I am not comfortable doing this (am I wrong in this?).
I have not had time yet to sit down and review exactly how the code runs this process.
- Alex
5) Finally, one reason I’m pressing this is you say, “the additional indexes were pushing the idle time over 2 minutes”. Well, “idle” time for a CF datasource definition refers to the duration of time a connection thread is detected to “not be doing anything anymore” and so is returned to the connection pool.
But since you are implying that “adding index jobs” means doing more work, I’m not seeing readily how that would lead to increased idle time?
Hope something there is helpful.
/charlie
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
Sent: Monday, October 16, 2017 06:34 AM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
Yes, this is what we found out.
<snip>
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of charlie arehart
Sent: Monday, October 16, 2017 02:28 PM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
Well, this just raises more questions (to me). :-)
<snip>
Sorry for the delay but have had other issues going on.
So again, this is NOT ColdFusion code or Solr within ColdFusion.
This is a Java-based application.
The index processes run in Java to pull records from Oracle and push them to SOLR. They recently doubled the number of indexes they were creating/rebuilding.
The job in question can run each index separately or ALL of them. If we run them separately, it completes without issue. If we run them all, it eventually fails with a “connection is closed” error. If I set the timeout to zero, the job always completes.
They are also using Hibernate, which is what throws the “connection is closed” error.
It seems that when the process starts it binds to a JDBC connection, and then when the last record is pushed to SOLR, they complete the transaction and try to close the connection. When they run the indexes separately it works fine. But if they run them ALL, the transaction goes to finish and can’t, because the JDBC connection it had was closed by Oracle due to the 120-second timeout value.
- Alex
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
Sent: Tuesday, October 17, 2017 01:51 PM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
Sorry for the delay but have had other issues going on.
<snip>
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of charlie arehart
Sent: Tuesday, October 17, 2017 02:54 PM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
OK, thanks for clarifying. And no problem on the delay. Stuff happens. It was just a friendly ping, as I am interested to see this resolved with you.
(Before proceeding to the substantive discussion of the issue, I would note that if you may mean to imply that it should have been obvious this “was not CF” because you said “We are using FR with Weblogic 11g”, I’ll just note that one can of course run CFML and CF on WLS and indeed most any JEE server. And as you acknowledge here, CF does also bundle Solr. And I think in the past you had been working with and talking here about CF, so I just all the more naturally assumed it.)
Anyway, the good news is that the concepts apply to any JEE server (and for any pure JEE folks here, note that CF is in fact by default a deployment of a CFML engine on Tomcat).
And yep, thanks for the specifics. So this seems to be a misalignment between the Oracle session timeout (the 120 seconds: would you agree that is the Oracle session timeout?) and this JDBC connection pool processing. What’s odd is that, again, you would not seem to be hitting an IDLE timeout, because your thread is not idle. Are you 100% positive it’s the idle timeout that you are setting to 0 on the WLS side? This feels more like a timeout on how long a connection can remain open (which is not the idle time), especially because you say it’s more likely to be hit the longer the request runs (talking to the DB). But I’ll admit there could be some subtlety here I’m not seeing or that’s not clear from what you’ve shared.
And whichever timeout you say you’re setting, are you doing it in code or configuration? (JEE servers support defining the equivalent of CF datasources, as Admin-level definitions of such JDBC connection info, or of course you can do it in code on the individual JDBC connection.)
I appreciate that the fact you can “set the timeout to 0” is a workaround, and you may be inclined to leave it at that. But if you are perhaps interested in understanding the problem, I am also, and perhaps others would learn from whatever is discovered here.
/charlie
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
Sent: Tuesday, October 17, 2017 01:51 PM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
Sorry for the delay but have had other issues going on.
<snip>
OK, to be clear (I may have misspoken before), it is the “Inactive Connection Timeout” value. If I set it to zero, which means never time out, the issue goes away. If I set the timeout to longer than the entire process takes to complete (20 minutes), the issue also goes away.
I am setting it in the config for the datasource in question via the WebLogic Admin Console.
Unfortunately I have had a number of other production-level issues that needed attention this week and have not had any time to work this issue or respond to this thread in a timely fashion since implementing the workaround.
From the WebLogic Admin Help:
Inactive Connection Timeout:
The number of inactive seconds on a reserved connection before WebLogic Server reclaims the connection and releases it back into the connection pool.
You can use the Inactive Connection Timeout feature to reclaim leaked connections - connections that were not explicitly closed by the application. Note that this feature is not intended to be used in place of properly closing connections.
- Alex
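For readers following along, here is a rough Java sketch of how an inactivity timeout on a reserved connection might behave, based only on the help text quoted above. It is a toy model, not WebLogic's actual implementation: the class name, the method, and the timeline of connection uses are all hypothetical, and it assumes that "inactive" means "time since the application last touched the connection" and that 0 disables the timeout. It does not settle what is really happening in the job discussed here.

```java
/** Toy model of an inactivity timeout on a reserved pool connection (NOT WebLogic code). */
public class InactiveTimeoutModel {

    /**
     * @param timeoutSeconds  inactivity limit in seconds; 0 is assumed to mean "never time out"
     * @param useTimesSeconds ascending moments (seconds since the connection was reserved)
     *                        at which the application touches the connection
     * @return the second at which the pool would reclaim the connection, or -1 if it never does
     */
    public static long reclaimedAt(long timeoutSeconds, long... useTimesSeconds) {
        if (timeoutSeconds == 0) {
            return -1; // timeout disabled
        }
        long lastUse = 0; // reserving the connection counts as the first use
        for (long use : useTimesSeconds) {
            if (use - lastUse > timeoutSeconds) {
                // The gap between two uses exceeded the limit, so the pool
                // took the connection back before the application came back to it.
                return lastUse + timeoutSeconds;
            }
            lastUse = use;
        }
        return -1; // every gap stayed within the limit
    }

    public static void main(String[] args) {
        // Hypothetical "ALL indexes" run: queries at 30 s and 60 s, then no
        // database calls until the final commit at 1200 s (20 minutes).
        long[] allIndexes = {30, 60, 1200};
        // Hypothetical single-index run: the commit comes 100 s after the last query.
        long[] oneIndex = {30, 60, 160};

        System.out.println("ALL, timeout 120:  " + reclaimedAt(120, allIndexes));
        System.out.println("ONE, timeout 120:  " + reclaimedAt(120, oneIndex));
        System.out.println("ALL, timeout 0:    " + reclaimedAt(0, allIndexes));
        System.out.println("ALL, timeout 1500: " + reclaimedAt(1500, allIndexes));
    }
}
```

In this model, a connection that is held while the application does something other than database work (such as pushing records elsewhere) can be reclaimed even though the job as a whole is busy, which would be consistent with the three observations reported above: the short run succeeds, the long run fails at 120 seconds, and both 0 and a value longer than the whole process avoid the failure.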
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
Sent: Thursday, October 19, 2017 02:19 PM
To: fusion...@googlegroups.com
Subject: RE: [fusionreactor] Re: Waiting on lock issue
OK, to be clear (I may have misspoken before), it is the “Inactive Connection Timeout” value. If I set it to zero, which means never time out, the issue goes away. If I set the timeout to longer than the entire process takes to complete (20 minutes), the issue also goes away.
<snip>