commit is extremly slow on certain environments

103 views
Skip to first unread message

Kevin

unread,
Apr 17, 2015, 1:16:30 PM4/17/15
to xad...@googlegroups.com
Hi all,

I encountered a problem during the commit of XADisk.
Following you find a wrap up of the problem.

Any help is much appreciated!


Initial situation

I want to store 100.000 DB entries in the file-system of my server running jBoss EAP 6.0
For this, I'm using a job who is triggered every day once.
To be able to rollback after an error, I'm using an transaction during writing to the filesystem and setting the "already saved"-flag in the DB.
If one part of this job fails, everything should be rolled back.

XADisk helps me to write transactional to the file-system and works in two steps:
  1. create 1000 files with 100 DB entries (file size is about 100kb each)
  2. commit the files and writing them effectively to the file-system
public void writeByteArrayToFile(String filename, byte[] fileToWrite) {
StandaloneFileSystemConfiguration config = createConfiguration("/sysDir/");
config.setTransactionTimeout(300); // 5 Minutes
XAFileSystem fileSystem = XAFileSystemProxy.getNativeXAFileSystemReference("xaDiskInstance");
fileSystem.waitForBootup(WAIT_FOR_BOOTUP_TIME);
session = fileSystem.createSessionForLocalTransaction();
File file = new File(filename);
try {
session.createFile(file, false); // false to create file, true for directory
XAFileOutputStream xafos = session.createXAFileOutputStream(file, true);
xafos.write(fileToWrite);
xafos.close();
LOGGER.info("Successfully written " + fileToWrite.length + " Bytes to " + file.getAbsolutePath());
} catch (Exception e) {
// handle exception
}
}

public void commit() {
try {
LOGGER.info("Start XADisk committing");
session.commit();
LOGGER.info("XADisk commit successful.");
} catch (NoTransactionAssociatedException e) {
// handle exception
}
}


Problem

The second step - the commit - takes extremely long on some environments (see analysis).
The same software release is running on all environments and the file size is always almost the same (per file about 100kb)


Analysis

The commit was tested on 4 environments:
  1. localhost (Windows)
  2. MGT (test server) (Linux)
  3. Technical Test (Linux)
  4. Development server (Linux)
Following times are measured (localhost isn't in the list due to windows):

MGT = 101,241 seconds for the XADisk commit 33,747 ms per file  
2015-04-17 10:52:16,474|INFO|XADiskUtil||Start XADisk committing 0,56 minutes per 100.000
2015-04-17 10:53:57,715|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-04-17 10:53:57,715|INFO|EticketDdsEntsorgungServiceBean||Copied 300010 database entries  
   
MGT = 25,0448 seconds for the XADisk commit 8,34933333 ms per file  
2015-04-17 12:27:11,035|INFO|XADiskUtil||Start XADisk committing 0,14 minutes per 100.000
2015-04-17 12:27:36,083|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-04-17 12:27:36,083|INFO|EticketDdsEntsorgungServiceBean||Copied 300010 database entries  
   
   
dev server = 1873,490 seconds for the XADisk commit 4683,725 ms per file  
2015-04-17 00:00:47,965|INFO|XADiskUtil||Start XADisk committing 78,06 minutes per 100.000
2015-04-17 00:32:01,455|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-04-17 00:32:01,456|INFO|EticketDdsEntsorgungServiceBean||Copied 40000 database entries  
   
dev server = 1913,239 seconds for the XADisk commit 4783,0975 ms per file  
2015-04-17 16:08:07,181|INFO|XADiskUtil||Start XADisk committing 79,72 minutes per 100.000
2015-04-17 16:40:00,420|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-04-17 16:40:00,421|INFO|EticketDdsEntsorgungServiceBean||Copied 40000 database entries  
   
   
technical test = 5073,385 seconds for the XADisk commit 3618,67689 ms per file  
2015-03-26 16:32:48,595|INFO|XADiskUtil||Start XADisk committing 60,31 minutes per 100.000
2015-03-26 17:57:21,980|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-03-26 17:57:21,980|INFO|EticketDdsEntsorgungServiceBean||Copied 140216 database entries  
   
technical test = 5878,897 seconds for the XADisk commit 3446,01231 ms per file  
2015-04-14 17:50:09,003|INFO|XADiskUtil||Start XADisk committing 57,43 minutes per 100.000
2015-04-14 19:28:07,900|INFO|XADiskUtil||XADisk commit successful.   database entries
2015-04-14 19:28:07,901|INFO|EticketDdsEntsorgungServiceBean||Copied 170583 database entries  


Summary 

Committing the files on the dev server and the technical test is about 196 times slower than on the MGT.




Does anyone have an idea why this is so slow?

Greetings,
Kevin 


Nitin Verma

unread,
Apr 19, 2015, 11:15:46 AM4/19/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hello Kevin,

As I understand, a single transaction is creating and writing data to 1000 files. And you are also using "heavyWrite" mode for writing to these files - which would result in the data getting directly written to backup files, and during commit the files (with its full content) will be moved from the system-directory (for xadisk), to the actual destination.

Such scenarios are quite general and should not take this much of time. I would suggest that you pick the last mentioned environment (the technical test server #2), where the time is very high, and take repeated stack-dump of the thread while it is stuck inside the commit. Doing this would enable us to find out the step(s) where the thread is stuck and then diagnose the problem. Please share the information you collect from there.

Good luck.

Thanks,
Nitin

wavemast...@googlemail.com

unread,
Apr 20, 2015, 8:36:55 AM4/20/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hi Nitin,

thanks for your response!

You understood right about the creating of 1000 files in a single transaction with "heavyWrite"-mode.

I've done the stack-dumping for the thread but I'm not very familiar with those dumps.
Please find the eight dumps in the attachment (ordered by the name). If you need more information about the case, please ask for it, I'll do everything to help you.

Thanks,
Kevin
thread-dump.2.log
thread-dump.3.log
thread-dump.4.log
thread-dump.5.log
thread-dump.6.log
thread-dump.7.log
thread-dump.0.log
thread-dump.1.log
Message has been deleted

wavemast...@googlemail.com

unread,
Apr 20, 2015, 11:36:53 AM4/20/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hi Nitin,


I've done a few analyses and found the following part:

"MTx-Server_Worker-1" prio=10 tid=0x0000000001223000 nid=0x1d58 waiting on condition [0x00007f9a853ca000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.xadisk.filesystem.utilities.FileIOUtility.doGCBeforeRetry(FileIOUtility.java:152)
at org.xadisk.filesystem.utilities.FileIOUtility.renameTo(FileIOUtility.java:29)
at org.xadisk.filesystem.DurableDiskSession.renameTo(DurableDiskSession.java:142)
at org.xadisk.filesystem.NativeSession.commitFileSpecialMove(NativeSession.java:961)
at org.xadisk.filesystem.NativeSession.commit(NativeSession.java:739)
at org.xadisk.filesystem.NativeSession.commit(NativeSession.java:1339)
at de.db.mtx.service.softwareverteilung.datenentsorgung.XADiskUtil.commit(XADiskUtil.java:122)
at de.db.mtx.service.softwareverteilung.datenentsorgung.EticketDdsEntsorgungServiceBean.copyEticketDdsFromDatabaseToFile(EticketDdsEntsorgungServiceBean.java:137)

in almost every dump (exept of one, #5).



It seems the FileIOUtility can't rename my files on the first try so it tries to call the GC and rename the file again:

    public static void renameTo(File src, File dest) throws IOException {
        if (!src.renameTo(dest)) {
            if (renamePossible(src, dest)) {
                int retryCount = 1;
                while (!src.renameTo(dest)) {
doGCBeforeRetry(retryCount++, src);
                    if (src.renameTo(dest)) {
                        break;
                    }
                    if (!src.isDirectory()) {
                        copyFile(src, dest, false);
                        dest.setLastModified(src.lastModified());
                        deleteFile(src);
                        break;
                    }
                }
            } else {
                throw new IOException("Rename not feasible from " + src + " to " + dest);
            }
        }
    }


Could you help me to avoid this behavior? (so that my files are renamed on the first try).

Interesting fact: Although it takes around 4 seconds to create one file, it never crashes with the 
if(retryCount == 5) {
    throw new IOException("The i/o operation could not be completed for "
     + "the file/directory with path [" + f.getAbsolutePath() + "] due "
     + "to an unknown reason.");
exception (until now at least).

So it seems to accomplish the renaming. But not on the first try. I assume it does it on the third of fourth try.


Thanks a lot for your help!
Greetings,
Kevin

Nitin Verma

unread,
Apr 20, 2015, 1:04:09 PM4/20/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hi Kevin,

Are the files being created/written on a different file-system than the xadisk system directory's file system (this is where the backup files reside). If yes, can you also check if these delays disappear if the files created/written lie on the same file-system as the xadisk system directory.

Thanks,
Nitin

Kevin Deffke

unread,
Apr 21, 2015, 4:56:10 AM4/21/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hi Nitin,

I've double checked: in both environments (dev server and technical test) is the create/write directory on the same file system as the XADisk system directory.
Moreover, as I moved the create/write directory to a mounted file system for test purposes, I did not see any speed gain or loss. So this delay is not caused by the file system itself!

Any other ideas?

Greetings,
Kevin

Nitin Verma

unread,
Apr 26, 2015, 12:12:34 PM4/26/15
to xad...@googlegroups.com, wavemast...@googlemail.com
Hi Kevin,

Sorry for my delay in response.

I am not getting a hint on why this could be happening. It may be environment specific and you will need to do some diagnosis - for example,
1. you can add a logging inside the rename method of FileIOUtility and see if it is calling the doGCBeforeRetry for every rename. You can also add the paths (being renamed from/to) in the logging to be sure of the context.
2. is it happening due to load? try reduce the number of files to 10.
3. just write a simple standalone java code (with same jdk) which would do rename of some files and see if all of them succeed in the first attempt. this would be to rule out any os/filesystem/jdk issue.

Thanks,
Nitin

pjrha...@gmail.com

unread,
Sep 24, 2018, 5:41:11 AM9/24/18
to XADisk
The problem is that File.renameTo will fail on some (all?) OS unless the source and destination are on the same filesystem. Looking at FileIOUtility renameTo method, it tries 3 times renameTo until it actually copies the file over to other filesystem. In the process it calls doGCBeforeRetry which, besides other useless things, does Thread.sleep(1000). So in the end, your commit will will take one second for each file that needs to be moved.

I'd be willing to fix this and submit a patch to the maintainer, but it seems the original java.net site is down, so it's a question where it can be sent to.
Reply all
Reply to author
Forward
0 new messages