XADisk error during stress-test. How to debug?

348 views
Skip to first unread message

sger...@gmail.com

unread,
Jun 22, 2011, 5:25:22 AM6/22/11
to xad...@googlegroups.com
Hello,
 
During our stress test (handling of ~10k files in ~2h) we noticed that at some point (still investigating which), XADisk (deployed as a JCA adapter) became unavailable.
From that point on the stack trace of our code whenever it called XADisk was as seen below.
 
Is there a way to put XADisk in debug mode or somehow further investigate the problem?
 
Thank you,
S.

-----------------------

2011-06-22 10:09:07,088 ERROR [PPI-Executor-5] XADiskFileWrapper: - Error moving file

javax.resource.ResourceException: Failed to setup the Resource Adapter Connection for enlistment in the transaction, Pool = 'eis/XADisk/LocalConnectionFactory', javax.transaction.SystemException: start() failed on resource 'eis/XADisk/LocalConnectionFactory': XAER_RMFAIL : Resource manager is unavailable

javax.transaction.xa.XAException: Internal error: XAResource 'eis/XADisk/LocalConnectionFactory' is unavailable

at weblogic.transaction.internal.XAResourceDescriptor.checkResource(XAResourceDescriptor.java:946)

at weblogic.transaction.internal.XAResourceDescriptor.startResourceUse(XAResourceDescriptor.java:632)

at weblogic.transaction.internal.XAServerResourceInfo.start(XAServerResourceInfo.java:1200)

at weblogic.transaction.internal.XAServerResourceInfo.xaStart(XAServerResourceInfo.java:1134)

at weblogic.transaction.internal.XAServerResourceInfo.enlist(XAServerResourceInfo.java:275)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:516)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:443)

at weblogic.connector.transaction.outbound.XATxConnectionHandler.enListResource(XATxConnectionHandler.java:118)

at weblogic.connector.outbound.ConnectionHandlerBaseImpl.prepareHandle(ConnectionHandlerBaseImpl.java:670)

at weblogic.connector.outbound.ConnectionHandlerBaseImpl.createConnectionHandle(ConnectionHandlerBaseImpl.java:361)

at weblogic.connector.outbound.ConnectionInfo.createConnectionHandle(ConnectionInfo.java:423)

at weblogic.connector.outbound.ConnectionManagerImpl.getConnection(ConnectionManagerImpl.java:308)

at weblogic.connector.outbound.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:115)

at org.xadisk.connector.outbound.XADiskConnectionFactoryImpl.getConnection(XADiskConnectionFactoryImpl.java:31)

at xxx.XADiskFileWrapper.move(XADiskFileWrapper.java:84)

...

at java.lang.Thread.run(Thread.java:619)

.

Caused by: javax.transaction.SystemException: start() failed on resource 'eis/XADisk/LocalConnectionFactory': XAER_RMFAIL : Resource manager is unavailable

javax.transaction.xa.XAException: Internal error: XAResource 'eis/XADisk/LocalConnectionFactory' is unavailable

at weblogic.transaction.internal.XAResourceDescriptor.checkResource(XAResourceDescriptor.java:946)

at weblogic.transaction.internal.XAResourceDescriptor.startResourceUse(XAResourceDescriptor.java:632)

at weblogic.transaction.internal.XAServerResourceInfo.start(XAServerResourceInfo.java:1200)

at weblogic.transaction.internal.XAServerResourceInfo.xaStart(XAServerResourceInfo.java:1134)

at weblogic.transaction.internal.XAServerResourceInfo.enlist(XAServerResourceInfo.java:275)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:516)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:443)

at weblogic.connector.transaction.outbound.XATxConnectionHandler.enListResource(XATxConnectionHandler.java:118)

at weblogic.connector.outbound.ConnectionHandlerBaseImpl.prepareHandle(ConnectionHandlerBaseImpl.java:670)

at weblogic.connector.outbound.ConnectionHandlerBaseImpl.createConnectionHandle(ConnectionHandlerBaseImpl.java:361)

at weblogic.connector.outbound.ConnectionInfo.createConnectionHandle(ConnectionInfo.java:423)

at weblogic.connector.outbound.ConnectionManagerImpl.getConnection(ConnectionManagerImpl.java:308)

at weblogic.connector.outbound.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:115)

at org.xadisk.connector.outbound.XADiskConnectionFactoryImpl.getConnection(XADiskConnectionFactoryImpl.java:31)

at xxx.XADiskFileWrapper.move(XADiskFileWrapper.java:84)

...

at java.lang.Thread.run(Thread.java:619)

at weblogic.transaction.internal.XAServerResourceInfo.xaStart(XAServerResourceInfo.java:1158)

at weblogic.transaction.internal.XAServerResourceInfo.enlist(XAServerResourceInfo.java:275)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:516)

at weblogic.transaction.internal.ServerTransactionImpl.enlistResource(ServerTransactionImpl.java:443)

at weblogic.connector.transaction.outbound.XATxConnectionHandler.enListResource(XATxConnectionHandler.java:118)

... 21 more

Nitin Verma

unread,
Jun 22, 2011, 1:47:22 PM6/22/11
to XADisk
Hi Stelios,

Though the above weblogic logs are not showing up the call-stack from
the XADisk's XAResource class (org.xadisk.connector.XAResourceImpl),
but if I make a guess, the exception is being thrown from here in the
org.xadisk.connector.XAResourceImpl class:

_________________________________________________________________________________________________________________________

public void start(Xid xid, int flag) throws XAException {
...
...
try {
...
} catch (XASystemException xase) {
throw new
XAException(XAException.XAER_RMFAIL); <---- here.
}
...
...
}

_________________________________________________________________________________________________________________________


As no constructor for XAException accepts a "Throwable cause", the
above code had to throw XAException without attaching the "xase"
object (which
could have given us the actual reason for failure).

As you rightly said, the root/first occurrence of this
XASystemException has happened even before you saw above log messages
during "getConnection". To know at which point, you will have to print
the stackTrace in the exception handling of your application, so that
as soon as the XASystemException is seen by the application, it is
logged and we can then inspect what exactly is bringing XADisk down.

In version 1.1, I have implemented logging to cover only some very
specific case and not this one; but I understand this could be very
useful in situations like above - I will make a note.

Please let me know how does the investigation go and if I can help
further.

Thanks,
Nitin

Nitin Verma

unread,
Jul 6, 2011, 1:23:02 PM7/6/11
to xad...@googlegroups.com
Hi Stelios,

Could you figure it out. Ping me if I can be of any help.

Thanks,
Nitin

sger...@gmail.com

unread,
Jul 18, 2011, 11:00:36 AM7/18/11
to xad...@googlegroups.com
Hello,
 
I found some time to investigate this issue and here are the results.
 
I changed the code of XAResourceImpl, as per your suggestion, and added a quick exception logging statement "xase.printStackTrace()".
I rebuilt XADisk, redeployed and ran again the stress test both on my Windows machine as well as the RedHat servers.
 
On my machine, after a native error of "XADisk Error [Native Module] Directory flush failed, with error code 1, for z:\outbox" (mounted drive over Samba, JIRA issue 77), the new logging statement prints the attached exception1.txt.
 
On the TEST system of a cluster of 2 RedHat machines and Weblogic 10.3.2, after some time through the test I get the attached exception exception2.txt
 
To be honest, I am more interested in the 2nd error, as the system will rarely be used over Samba, even during development.
Any ideas/suggestions?
 
Thank you,
S.
 
 
exception1.txt
exception2.txt

Nitin Verma

unread,
Jul 18, 2011, 12:30:30 PM7/18/11
to Stelios Gerogiannakis, xad...@googlegroups.com
Hi Stelios,

Just to confirm, is the second error seen only during stress testing? Or, was it visible sometime under normal load also?
What kind of stress test do you have; a lot of transactions, a lot of operations etc?
I guess the first error is not related to high load. Do you see the first one only during high load?

Thanks,
Nitin


 

--
You received this message because you are subscribed to the Google Groups "XADisk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/xadisk/-/y6IDyPCQUnkJ.
To post to this group, send email to xad...@googlegroups.com.
To unsubscribe from this group, send email to xadisk+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/xadisk?hl=en.

Nitin Verma

unread,
Jul 18, 2011, 1:06:54 PM7/18/11
to XADisk
Hi Stelios,

I also tried to reverse think how the 2nd issue could arise. It must
be a bug in XADisk. Please make the small
change as below, compile and see if this fixes the issue.

In the class org.xadisk.filesystem.NativeXAFileSystem, method
getAllSessions(), change the following code from

return sessions.toArray(new NativeSession[sessions.size()]);

to

return sessions.toArray(new NativeSession[0]);


It would be great to hear if it fixes the issue and keeps you moving
further.


Thanks,
Nitin

sger...@gmail.com

unread,
Jul 21, 2011, 10:14:47 AM7/21/11
to xad...@googlegroups.com
Hello Nitin.
 
The 1st error (Windows-related inability to flush a mounted Samba folder) is not at all related to high load; it happens all the time. 
I assume it is a shortcoming of either the native library or an inherent problem of the stack on my machine (Windows XP SP3, Samba v.XYZ,...)
As I said, this is not so important, as Samba over Windows is not really reliable for us; file copies are NOT atomic, so we only use it for very basic testing.
 
The 2nd error appears after stress testing or more specifically a high number of files being processed (unsure what "high" actually means as it just appears at some point, after some 100s-1000s of files being processed).
The current setup of our stress environment is a throughput of roughly 1,5 files per second, XADisk only being involved in maximum 2 copies in either of 2 mounted NFS folders in different transactions.
So to give an example:
  • File of type A comes in
    • XADisk copies it on mounted folder X in the context of an XA transaction along with a DB commit
  • File of type B comes in
    • XADisk copies it on mounted folder X (transaction 1)
    • XADisk copies an auto-generated file on mounted folder Z (transaction 2)
 

sger...@gmail.com

unread,
Jul 21, 2011, 10:15:36 AM7/21/11
to xad...@googlegroups.com
OK, i will make the change and launch the stress test again
 
Thank you for the suggestion.

sger...@gmail.com

unread,
Jul 29, 2011, 6:00:18 AM7/29/11
to xad...@googlegroups.com
Hello,
 
I made the change you suggested and repeated the test with the patched XADisk.
 
After processing ~20k files there was no visible problem, i.e. no exceptions in the logs and all files eventually processed.
 
So I would say (subject to further testing of course), that this might have been it.
 
Thank you for your support! 

Nitin Verma

unread,
Sep 24, 2011, 8:03:11 AM9/24/11
to XADisk
The tracking bug is http://java.net/jira/browse/XADISK-78
The changes have been checked-in to /trunk.

Nitin Verma

unread,
Sep 24, 2011, 8:19:42 AM9/24/11
to XADisk
A minor correction regarding what I said earlier:

"As no constructor for XAException accepts a "Throwable cause", the
above code had to throw XAException without attaching the "xase"
object (which could have given us the actual reason for failure)."

Though there is no such constructor of XAException with "cause"
argument, but we can still work this around using the
initCause(Throwable cause) method. I am filing a tracing bug to fix
this issue in the XAResource objects of XADisk - http://java.net/jira/browse/XADISK-87

Thanks,
Nitin
Reply all
Reply to author
Forward
0 new messages