I have recently run into the problem that when recursively iterating
through folders over a network, the performance is terrible. It's
30-50 times slower than when the folders are on the local machine.
I've put a small test program at the end of this post - just try it
out with a large/deep folder both locally and on a network drive.
The culprit in this case is the call to isDirectory(), but any attempt
to query a file attribute (e.g. lastModified()) results in the same
bad performance. And you really must call either isDirectory() or
isFile() before doing something with the File[] that listFiles() returned.
The problem is not unknown to Sun
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6483858) but it has
been deferred for JSR-203 to solve - sometime!
Does anyone have a workaround for this problem without invoking JNI or
using native code?
And does anyone know when JSR-203 will be implemented? Java 7?
John
<pre>
import java.io.File;

public class TstListFiles {

    // Recursively counts the files and folders below path.
    static int listPath(File path) {
        int ct = 0;
        File[] files = path.listFiles();
        if (files == null) {
            // listFiles() returns null if path is not a directory or cannot be read
            return 0;
        }
        for (int i = 0, n = files.length; i < n; i++) {
            ct++;
            // Any one of these attribute queries is enough to show the slowdown:
            // long l = files[i].lastModified();
            // boolean b = files[i].isDirectory();
            boolean b = files[i].isFile();
            if (files[i].isDirectory()) {
                ct += listPath(files[i]);
            }
        }
        return ct;
    }

    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            long start = System.currentTimeMillis();
            int numFiles = listPath(new File(args[i]));
            System.out.println(args[i] + " has " + numFiles + " files/folders and took "
                    + (System.currentTimeMillis() - start) + "ms");
        }
    }
}
</pre>
| Directory | Traversal (entries / ms elapsed) | Performance (entries per ms) |
| d:\apps\ | 7,094 / 532 | 13.335 |
| x:\ | 517 / 984 | 0.525 |
| l:\ | 1,417 / 2,391 | 0.593 |
"maybe you should look at your network performance as well as compare
JVM's times against a non-JVM program."
That's what started me off. I have a C program which does a folder
sync in 20 seconds and my Java counterpart takes 11 minutes! So it's
not down to the network. (And the disks are all NTFS not FAT32.)
John
I can't quite recall, but there may be some tuning you can do at the
network share level
Be interesting if you have the same issues over NFS.
- are there different native APIs?
- can they be accessed differently (does java force a network lookup
while C gets a cached version)?
I would suggest 20 secs vs 11 min means it is definitely "down to the
network" - round trips to the file server to get an attribute would
blow the time out like that.
On Jan 6, 11:40 pm, Christian Catchpole <christ...@catchpole.net>
wrote:
And I haven't written this sort of code in C for a while, so this is
an extra big guess.
When you get a dir list in Java it creates a list of File objects
which are just paths to the file. Calling isDirectory() forces it to
go back to the network because it doesn't have any kind of real
handle on the OS object which represents the file.
In C, when you get a directory list, do you get a list of "handles to
OS objects" which can be polled immediately for an attribute?
However, as the facts stand it's much slower over the network, so to
return to the original post - does anyone have a solution or
workaround?
On Jan 7, 12:09 am, Christian Catchpole <christ...@catchpole.net>
wrote:
/Casper
Hope this helps,
Mark
--
Mark Fortner
Ironically, when you get a File[] from Java, the attributes probably
come back to your machine in the OS objects, but aren't stored anywhere
by Java. It just pulls the file names and stores them in the File
objects. Each file.isDirectory() is a round trip to the server.
I doubt Sun would try to implement any caching in the File
object simply because it might break too many things. If you call
isDirectory() 1 min after creating the File object, a cached result could
be wrong. So I understand why they can't really solve it, because of
the contract that the File class has.
The only solution would be some kind of new API or extension.
Cheers, Paul.
> --
> You received this message because you are subscribed to the Google Groups "The Java Posse" group.
I've seen the Javadoc for the new NIO somewhere but could not
immediately find any references to better performance or network
latency. I think there was a new method (getAttributes() or something
similar). Do we have to use different methods to get the faster
performance, or can the program stay unchanged? Anyone know?
I'll try the beta out and report back...
@Christian
"A network file system will have the "standard" problem of round trip
latency time for any operations which can not be aggregated into one
call."
So how come the C program is so fast? The problem must surely lie in
the Java implementation of the NFS.
Thanks all, John
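On the question above of whether different methods are needed: in the API as it was eventually released in Java 7, the bulk call is Files.readAttributes(), which returns a BasicFileAttributes snapshot holding the type, size and timestamps from one read. The following is only a minimal sketch of that call. The class name BulkAttributes, the describe() helper and the temp file are made up for illustration, and nothing here measures how it behaves on a network share.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class BulkAttributes {

    // Hypothetical helper: reads type, size and timestamp with a single
    // readAttributes() call instead of one File method call per attribute.
    static String describe(Path p) throws IOException {
        BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
        String kind = a.isDirectory() ? "dir" : a.isRegularFile() ? "file" : "other";
        return kind + " size=" + a.size() + " modified=" + a.lastModifiedTime();
    }

    public static void main(String[] args) throws IOException {
        // Temp file used purely as a stand-in for a real path.
        Path p = Files.createTempFile("bulk", ".txt");
        Files.write(p, new byte[] {1, 2, 3});
        System.out.println(describe(p));
        Files.delete(p);
    }
}
```

So the old java.io.File code does need to change to benefit: the snapshot is only available through the java.nio.file classes.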
Alan Bateman was kind enough to send me a test program for the NIO.2
in Java 7 (see below). I downloaded the current JDK7 binaries and
tested on my system. The results are very impressive. On my first
test, file walking over the network was actually faster than on the
local machine! In subsequent repeats of the test, the local machine
was faster - due to local caching I presume. In any event it looks
like this problem has been solved for Java7 ;-)
Thank you Alan!
John
12:00 c:\temp
java TstJava7NIO c:\c <--- local
Found 4720 directories and 29622 files in 49688 ms
12:01 c:\temp
java TstJava7NIO j:\c <--- network
Found 5390 directories and 34197 files in 25812 ms
12:02 c:\temp
java TstJava7NIO c:\c
Found 4720 directories and 29622 files in 1515 ms
12:02 c:\temp
java TstJava7NIO j:\c
Found 5390 directories and 34197 files in 12937 ms
12:03 c:\temp
java TstJava7NIO c:\c
Found 4720 directories and 29622 files in 1516 ms
12:03 c:\temp
java TstJava7NIO j:\c
Found 5390 directories and 34197 files in 12843 ms
Alan's test program:
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.*;

public class TstJava7NIO {

    // Counts the directories and files seen during the walk.
    static class Counter extends SimpleFileVisitor<Path> {
        private int dirCount;
        private int fileCount;

        int dirCount() {
            return dirCount;
        }

        int fileCount() {
            return fileCount;
        }

        // The released Java 7 API also passes the directory's attributes here;
        // the version as originally posted took only the Path.
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
            dirCount++;
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
            fileCount++;
            return FileVisitResult.CONTINUE;
        }
    }

    // walkFileTree() declares IOException in the released API.
    public static void main(String[] args) throws IOException {
        Path top = FileSystems.getDefault().getPath(args[0]);
        Counter counter = new Counter();
        long start = System.currentTimeMillis();
        Files.walkFileTree(top, counter);
        long end = System.currentTimeMillis();
        System.out.format("Found %d directories and %d files in %d ms%n",
                counter.dirCount(), counter.fileCount(), (end - start));
    }
}
See also the timing information mentioned in the evaluation section of
this bug:
http://bugs.sun.com/view_bug.do?bug_id=6808590
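The point of the visitor approach is that visitFile() is handed the BasicFileAttributes along with each Path, so code that needs sizes or timestamps does not have to ask for them again per file. A rough sketch of that, written against the released Java 7 API: the class name TreeSize, the totalBytes() helper and the small temp tree in main() are invented for the example, and it makes no claim about timings on any particular share.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TreeSize {

    // Hypothetical helper: sums the sizes of all regular files below top,
    // using only the attributes that the walk passes to visitFile().
    static long totalBytes(Path top) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(top, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size(); // no separate attribute lookup here
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny throwaway tree so the sketch runs anywhere.
        Path top = Files.createTempDirectory("treesize");
        Path sub = Files.createDirectory(top.resolve("sub"));
        Files.write(top.resolve("a.bin"), new byte[3]);
        Files.write(sub.resolve("b.bin"), new byte[5]);
        System.out.println("Total bytes: " + totalBytes(top));
    }
}
```

To try it on a real folder, pass a path such as the c:\c or j:\c directories from the timings above to totalBytes() instead of the temp tree.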