
Oracle post-failover transaction commit error


Wenji Tong

Feb 12, 2003, 2:44:48 PM
I have a problem with transactions after a database failover.

Solaris 8, WLS 6.1 sp3, Oracle 8.1.7 (2 Oracle instances, one is primary and
another is standby), JDK 1.3.1

We found this problem during an Oracle failover test (Net8 connect-time
failover). Here is what we did.

During the load, we shut down the primary Oracle server (box).
All in-flight transactions failed, which is OK.
When new requests came in, WebLogic began to refresh the connections.
Because the primary Oracle server was down, it took about 70 seconds to
refresh a connection (due to the socket timeout value) and redirect it to the
standby Oracle. This is fine.
After a while, all connections had been refreshed and were connected to the
standby server.
When I opened the WebLogic console to monitor in-flight transactions, I found
some transactions stuck in the committing state that never finished.
At this point most transactions went through, but a few of them threw
an exception (attached at the end). This error never went away, although the
frequency was not high. The strange thing is that when I checked the
database, the data had been committed.

I've tried the Oracle OCI driver and the thin driver; both had this problem.
Can anyone help me with this?

Thanks,
Wenji


<Jan 28, 2003 1:49:57 PM EST> <Error> <EJB> <Exception during commit of transaction Name=[EJB com.bankframe.bp.retail.solutionset.impl.customersearch.CustomerSearchSessionBean.processDataPacket(com.bankframe.bo.DataPacket)],Xid=28502:685f84a9ba5b1ed8(192232),Status=Committing,numRepliesOwedMe=0,numRepliesOwedOthers=0,seconds since begin=122,seconds left=60,ServerResourceInfo[weblogic.jdbc.jts.Connection]=(state=ended,assigned=prod-srv2),SCInfo[prod+prod-srv2]=(state=pre-prepared),properties=({weblogic.transaction.name=[EJB com.bankframe.bp.retail.solutionset.impl.customersearch.CustomerSearchSessionBean.processDataPacket(com.bankframe.bo.DataPacket)], weblogic.jdbc=t3://10.161.46.31:7101}),OwnerTransactionManager=ServerTM[ServerCoordinatorDescriptor=(CoordinatorURL=prod-srv2+10.161.46.31:7101+prod+, Resources={})],CoordinatorURL=prod-srv2+10.161.46.31:7101+prod+):
javax.transaction.SystemException: Timeout during commit processing
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:243)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:189)
at weblogic.ejb20.internal.BaseEJBObject.postInvoke(BaseEJBObject.java:272)
at


Joseph Weinstein

Feb 12, 2003, 3:55:58 PM
to Wenji Tong

Hi. It seems that Oracle failover is not perfect. Our transaction coordinator
is supposed to have control of the transaction. If a failover occurs while
a transaction is pending, the coordinator should still have the ability to roll
back the tx. Apparently there are cases where the failover causes an open
transaction to be committed, in such a way that even if the coordinator has
decided to roll it back, it can't. These may be when the failover occurs while
we are waiting for the commit call to return. We either get an exception or
we get no return from the commit() call so we try to roll back and fail. The
actual commit succeeded, but we never knew.
Joe
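
The failure mode Joe describes (the commit succeeds at the database, but the coordinator never receives the reply) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeResource` and `coordinatorCommit` are hypothetical names made up for illustration, not WebLogic or Oracle classes, and a delayed acknowledgement stands in for a connection that fails over mid-commit.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class CommitOutcomeSketch {

    /** Hypothetical stand-in for a database resource (assumption, not a real driver class). */
    static class FakeResource {
        final AtomicBoolean committed = new AtomicBoolean(false);
        final long ackDelayMillis;

        FakeResource(long ackDelayMillis) {
            this.ackDelayMillis = ackDelayMillis;
        }

        void commit() {
            committed.set(true); // the data is committed at this point
            try {
                Thread.sleep(ackDelayMillis); // the reply to the caller is delayed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // caller gave up waiting
            }
        }
    }

    /** Returns what the coordinator believes happened: "COMMITTED" or "TIMEOUT". */
    static String coordinatorCommit(FakeResource resource, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = resource::commit;
            Future<?> ack = pool.submit(task);
            ack.get(timeoutMillis, TimeUnit.MILLISECONDS); // wait for the reply
            return "COMMITTED";
        } catch (TimeoutException e) {
            return "TIMEOUT"; // no reply in time; the real outcome is unknown here
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        FakeResource slow = new FakeResource(2000); // reply arrives far too late
        String seen = coordinatorCommit(slow, 200);
        System.out.println("coordinator saw: " + seen
                + ", data committed: " + slow.committed.get());
    }
}
```

In this toy model the coordinator's view and the resource's state can disagree, which matches the symptom reported above: a timeout during commit processing in the log while the data is present in the database. It does not explain why the error persists after failover has completed.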

Wenji Tong

Feb 12, 2003, 4:11:58 PM
Thanks, Joe!

Things could be more complicated. I did some tests to pin down the details of
the problem. Here are the results.

1. In one test, I shut down Oracle and also stopped the load. After all
threads and connections had returned to the pool and all transactions were
done (rolled back or abandoned), I started the load again. I could still
find this error. That means the error is not related to any in-flight
transactions.

2. After all connections had failed over, this error still did not go away.
The frequency was not high, but it was always there.

3. In the WebLogic console, monitoring in-flight transactions, I saw some
transactions in the committing state that never finished if there was no
load. When an error was printed in the log, one of the committing
transactions disappeared, but another transaction appeared that was
committing and could not finish. I'm not sure whether this is related to a
bug in the WebLogic console.

4. Increasing the transaction timeout fixes this problem. Unfortunately, we
can't increase the transaction timeout any further because of our business
requirements.

I hope this information is helpful.

Thanks,

Wenji


Joseph Weinstein

Feb 12, 2003, 6:08:59 PM
to Wenji Tong

Wenji Tong wrote:

> Thanks, Joe!
>
> Things could be more complicated. I did some tests to pin down the details of
> the problem. Here are the results.

Hi,
This is developing into a bigger problem than I can solve informally in a newsgroup,
so I suggest that you open an official support case with this information.
Joe
