Bad performance getting file attributes over the network


John Muir

Jan 6, 2010, 8:59:22 AM
to The Java Posse
Hi Folks,

I have recently run into the problem that when recursively iterating
through folders over a network, the performance is terrible. It's
30-50 times slower than when the folders are on the local machine.
I've put a small test program at the end of this post - just try it
out with a large/deep folder both locally and on a network drive.

The culprit in this case is the call to isDirectory(), but any attempt
to query a file attribute (e.g. lastModified()) results in the same
bad performance. And you really must call either isDirectory() or
isFile() before doing anything with the File[] that listFiles() returned.

The problem is not unknown to Sun
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6483858) but it has
been set aside for JSR-203 to solve - sometime!

Does anyone have a workaround for this problem without invoking JNI or
using native code?

And does anyone know when JSR-203 will be implemented? Java 7?

John

<pre>
import java.io.File;

public class TstListFiles {

    // Recursively counts the files and folders below path.
    static int listPath(File path) {
        int ct = 0;
        File[] files = path.listFiles();
        if (files == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        for (int i = 0, n = files.length; i < n; i++) {
            ct++;
            // Swap in either of these to time a different attribute call:
            // long l = files[i].lastModified();
            // boolean b = files[i].isDirectory();
            boolean b = files[i].isFile();
            if (files[i].isDirectory()) {
                ct += listPath(files[i]);
            }
        }
        return ct;
    }

    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            long start = System.currentTimeMillis();
            int numFiles = listPath(new File(args[i]));
            System.out.println(args[i] + " has " + numFiles
                    + " files/folders and took "
                    + (System.currentTimeMillis() - start) + "ms");
        }
    }
}

</pre>

Alexey Zinger

Jan 6, 2010, 11:58:06 AM
to java...@googlegroups.com
Were you able to verify that performing similar operations outside the JVM produces better results?  What gives me pause in your description is that you're doing this over a network, which has to be slower than local disk access.  Note that the bug filed with Sun that you linked to talks about all file access, not just networked access, and from the comments it seems to come down mostly to FAT32 file systems (think USB devices).

I ran your algorithm against directory trees both locally and over a network.  Predictably, local file traversal was faster than networked traversal, but I wasn't getting unacceptable performance in either case.  Also note that tools like Ant and Maven likely use Java's standard IO libraries and are routinely asked to do batch operations on files, and I don't often hear people complaining about that being the bottleneck.  I'm not saying you aren't experiencing a performance issue, but maybe you should look at your network performance as well as compare the JVM's times against a non-JVM program.
 
Alexey



From: John Muir <jm...@guskoeln.de>
To: The Java Posse <java...@googlegroups.com>
Sent: Wed, January 6, 2010 8:59:22 AM
Subject: [The Java Posse] Bad performance getting file attributes over the network
--
You received this message because you are subscribed to the Google Groups "The Java Posse" group.
To post to this group, send email to java...@googlegroups.com.
To unsubscribe from this group, send email to javaposse+unsub...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/javaposse?hl=en.



Alexey Zinger

Jan 6, 2010, 12:12:38 PM
to java...@googlegroups.com
Sorry to respond to myself, but wanted to post some data from my run of your algorithm:
 
Directory    Traversal (count / ms)    Performance (count per ms)
d:\apps\     7,094 / 532               13.335
x:\          517 / 984                 0.525
l:\          1,417 / 2,391             0.593

Above, d:\apps\ is a local directory, whereas x:\ and l:\ are over the network.



John Muir

Jan 6, 2010, 12:28:28 PM
to The Java Posse
Hi Alexey,

"maybe you should look at your network performance as well as compare
JVM's times against a non-JVM program."

That's what started me off. I have a C program which does a folder
sync in 20 seconds and my Java counterpart takes 11 minutes! So it's
not down to the network. (And the disks are all NTFS not FAT32.)

John

Steven Herod

Jan 6, 2010, 5:29:42 PM
to The Java Posse
Are the drives shared using CIFS? (Windows File share).

I can't quite recall, but there may be some tuning you can do at the
network share level

Be interesting if you have the same issues over NFS.

Christian Catchpole

Jan 6, 2010, 5:40:39 PM
to The Java Posse
I was surprised that there was a major difference between C and Java
as I thought they would just be hitting the same native file system
API anyway. But perhaps it's not that simple.

- Are there different native APIs?
- Can they be accessed differently (does Java force a network lookup
while C gets a cached version)?

I would suggest 20 secs vs 11 min means it is definitely "down to the
network" - round trips to the file server to get an attribute would
blow the time out like that.

John Muir

Jan 6, 2010, 5:50:39 PM
to The Java Posse
Have another look at the Sun/Bugs link I provided in the original post
- it seems to be a 'known' problem, but Sun is either not willing or
not able to solve the problem.


John Muir

Jan 6, 2010, 5:55:05 PM
to The Java Posse
Yes, these tests were using Windows file shares - client and server
were Windows XP and 2008 Server respectively. I can't give any info
about NFS performance and to be honest, that's not my ballpark.

Christian Catchpole

Jan 6, 2010, 6:09:28 PM
to The Java Posse
Here is my "guess" as to what is happening (I had a quick look at the
bug ticket but couldn't see an explanation).

And I haven't written this sort of code in C for a while, so this is
an extra big guess.

When you get a dir list in Java it creates a list of File objects
which are just paths to the files. Calling isDirectory() forces it to
go back to the network because it doesn't have any kind of real
handle on the OS object which represents the file.

In C, when you get a directory list, do you get a list of "handles to
OS objects" which can be polled immediately for an attribute?

John Muir

Jan 6, 2010, 6:46:33 PM
to The Java Posse
I think you are on to the correct explanation, at least from what I
read on the Sun/Bugs site. It seems to be related to caching the file
information.

However, as the facts stand it's much slower over the network, so to
return to the original post - does anyone have a solution or
workaround?


John Muir

Jan 6, 2010, 6:48:22 PM
to The Java Posse
And does anyone know when JSR-203 will be implemented? Java 7?

Casper Bang

Jan 6, 2010, 6:57:05 PM
to The Java Posse
Wow, sounds like java.io.File is broken in a similar fashion as
java.net.URL - unfortunately for you without having an alternative
like java.net.URI.

/Casper

Mark Fortner

Jan 6, 2010, 7:02:11 PM
to java...@googlegroups.com
I believe it will be implemented in Java 7. Carl would know for sure
since I believe he worked on it. You might try Commons VFS though. It
has support for a number of different file systems and does some
caching.

Hope this helps,

Mark

--
Mark Fortner

blog: http://feeds.feedburner.com/jroller/ideafactory

Christian Catchpole

Jan 6, 2010, 7:07:59 PM
to The Java Posse
Yeah, that's what's going to happen. Locally it's going to be quick
to get that file handle again. A network file system will have the
"standard" problem of round trip latency time for any operations which
can not be aggregated into one call.

Ironically, when you get a File[] from Java, the attributes probably
come back to your machine in the OS objects, but aren't stored anywhere
by Java. It just pulls the file names and stores them in the File
objects. Each file.isDirectory() is a round trip to the server.

I doubt Sun would try to implement any caching in the File
object, simply because it might break too many things. If you call
isDirectory() 1 min after creating the File object, a cached result
could be wrong. So I understand why they can't really solve it, given
the contract that the File class has.

The only solution would be some kind of new API or extension.
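
One shape such a new API could take, sketched here with the java.nio.file classes from JSR-203: list a directory, read each entry's basic attributes in a single bulk call, and keep the result as an explicit snapshot, so that staleness becomes the caller's decision rather than a hidden cache. The names SnapshotListing and Entry are made up for illustration, and how many network round trips this actually saves depends on the platform and the JDK implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: a directory listing whose
// attributes are captured once, at listing time.
public class SnapshotListing {

    // Immutable snapshot of one directory entry.
    public static final class Entry {
        public final Path path;
        public final boolean directory;
        public final long size;
        public final long lastModifiedMillis;

        Entry(Path path, BasicFileAttributes attrs) {
            this.path = path;
            this.directory = attrs.isDirectory();
            this.size = attrs.size();
            this.lastModifiedMillis = attrs.lastModifiedTime().toMillis();
        }
    }

    // Lists dir and reads each entry's basic attributes with one bulk
    // call per entry, instead of one call per attribute.
    public static List<Entry> list(Path dir) throws IOException {
        List<Entry> entries = new ArrayList<Entry>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                BasicFileAttributes attrs =
                        Files.readAttributes(p, BasicFileAttributes.class);
                entries.add(new Entry(p, attrs));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        for (Entry e : list(Paths.get(args[0]))) {
            System.out.println((e.directory ? "d " : "f ") + e.path
                    + " " + e.size + " bytes");
        }
    }
}
```

Because an Entry never refreshes itself, a caller who needs current values has to call list() again; that keeps the File contract untouched while making the trade-off visible.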

Paul King

Jan 6, 2010, 9:37:28 PM
to java...@googlegroups.com
JSR-203 is already incorporated in the beta builds at
jdk7.dev.java.net including JavaDoc and Sun tutorials.
I don't know if it fixes any of your performance issues but would be
interested in your experiences if you try it.

Cheers, Paul.


John Muir

Jan 7, 2010, 1:29:24 AM
to The Java Posse
Hi Paul,

I've seen the Javadoc for the new NIO somewhere but could not
immediately find any references to better performance or network
latency. I think there was a new Method (getAttributes() or something
similar). Do we have to use different methods to get the faster
performance or should the program be unchanged? Anyone know?

I'll try the beta out and report back...

@Christian

"A network file system will have the "standard" problem of round trip
latency time for any operations which can not be aggregated into one
call."

So how come the C program is so fast? The problem must surely lie in
the Java implementation of the NFS.

Thanks all, John

Christian Catchpole

Jan 7, 2010, 4:41:29 AM
to The Java Posse
I have explained why Java is so slow to do this operation. It's not
really "Java's implementation of the NFS" but actually a side effect
of how the File class works. I assume in C, when you get a directory
list, all the information comes back in one hit. And you can get all
the attributes you want right away. When Java gets a directory list
it probably does the same thing - but then it does something
different. It pulls out the names (and only the names) and creates a
list of File objects, puts them in an array and passes them back to
you. Remember, all the File object really has inside it is the path
to the file. Any call to, say, isDirectory() forces it to go back to
the OS with the full path to the file and ask for the attribute, which
means calling back to the remote server.

John Muir

Jan 7, 2010, 6:15:56 AM
to The Java Posse
WOW!

Alan Bateman was kind enough to send me a test program for the NIO.2
in Java 7 (see below). I downloaded the current JDK7 binaries and
tested on my system. The results are very impressive. On my first
test, file walking over the network was actually faster than on the
local machine! In subsequent repeats of the test, the local machine
was faster - due to local caching I presume. In any event it looks
like this problem has been solved for Java7 ;-)

Thank you Alan!

John

12:00 c:\temp
java TstJava7NIO c:\c <--- local
Found 4720 directories and 29622 files in 49688 ms

12:01 c:\temp
java TstJava7NIO j:\c <--- network
Found 5390 directories and 34197 files in 25812 ms

12:02 c:\temp
java TstJava7NIO c:\c
Found 4720 directories and 29622 files in 1515 ms

12:02 c:\temp
java TstJava7NIO j:\c
Found 5390 directories and 34197 files in 12937 ms

12:03 c:\temp
java TstJava7NIO c:\c
Found 4720 directories and 29622 files in 1516 ms

12:03 c:\temp
java TstJava7NIO j:\c
Found 5390 directories and 34197 files in 12843 ms


Alan's test program:

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.*;

public class TstJava7NIO {

    // Counts directories and files as the tree is walked.
    static class Counter extends SimpleFileVisitor<Path> {
        private int dirCount;
        private int fileCount;

        int dirCount() {
            return dirCount;
        }

        int fileCount() {
            return fileCount;
        }

        // Two-argument signature as in the released Java 7 API; the
        // JDK 7 preview build this was written for took only the Path.
        @Override
        public FileVisitResult preVisitDirectory(Path dir,
                                                 BasicFileAttributes attrs) {
            dirCount++;
            return FileVisitResult.CONTINUE;
        }

        // The attributes are handed over together with the file.
        @Override
        public FileVisitResult visitFile(Path file,
                                         BasicFileAttributes attrs) {
            fileCount++;
            return FileVisitResult.CONTINUE;
        }
    }

    public static void main(String[] args) throws IOException {
        Path top = FileSystems.getDefault().getPath(args[0]);
        Counter counter = new Counter();
        long start = System.currentTimeMillis();
        Files.walkFileTree(top, counter);
        long end = System.currentTimeMillis();
        System.out.format("Found %d directories and %d files in %d ms%n",
                counter.dirCount(), counter.fileCount(), (end - start));
    }
}
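
To compare the two approaches on the same tree, a small harness along these lines could run the java.io.File recursion and Files.walkFileTree back to back. This is a sketch written against the released Java 7 API and is not part of Alan's program; the class name CompareWalks and its method names are made up. As the runs above show, timings swing a lot with OS and server caching, so each variant should be run several times.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical harness: counts entries below a root in both styles.
public class CompareWalks {

    // java.io.File style: each isDirectory() is a separate attribute lookup.
    public static int countWithFile(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        int count = 0;
        for (File f : files) {
            count++;
            if (f.isDirectory()) {
                count += countWithFile(f);
            }
        }
        return count;
    }

    // NIO.2 style: the walker passes attributes to the visitor.
    public static int countWithWalk(Path top) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(top, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir,
                                                     BasicFileAttributes attrs) {
                count[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file,
                                             BasicFileAttributes attrs) {
                count[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0] - 1; // leave out the starting point itself
    }

    public static void main(String[] args) throws IOException {
        long t0 = System.nanoTime();
        int a = countWithFile(new File(args[0]));
        long t1 = System.nanoTime();
        int b = countWithWalk(Paths.get(args[0]));
        long t2 = System.nanoTime();
        System.out.printf("java.io.File: %d entries in %d ms%n",
                a, (t1 - t0) / 1000000);
        System.out.printf("walkFileTree: %d entries in %d ms%n",
                b, (t2 - t1) / 1000000);
    }
}
```

The two counts can differ on trees that contain symbolic links to directories, since File.isDirectory() follows links while walkFileTree does not by default. The walk also throws on an unreadable directory, where the File version silently counts nothing below it.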

Alan

Jan 7, 2010, 3:02:09 AM
to The Java Posse
Look at java.nio.file.Files.walkFileTree - it's the general method for
doing recursive operations in the API and should give you much better
performance than you are seeing.

See also the timing information mentioned in the evaluation section of
this bug:
http://bugs.sun.com/view_bug.do?bug_id=6808590
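
With walkFileTree, visitFile receives each file's BasicFileAttributes along with its path, so a visitor that needs sizes or timestamps can take them from there rather than querying each file again. A minimal sketch, assuming the released Java 7 API; the class name TreeStats and its fields are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical visitor: totals size and finds the newest modification
// time using only the attributes handed to visitFile.
public class TreeStats extends SimpleFileVisitor<Path> {
    public int files;
    public long totalBytes;
    public long newestMillis;

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (attrs.isRegularFile()) {
            files++;
            totalBytes += attrs.size();
            newestMillis = Math.max(newestMillis,
                    attrs.lastModifiedTime().toMillis());
        }
        return FileVisitResult.CONTINUE;
    }

    // Walks the tree below top and returns the collected statistics.
    public static TreeStats of(Path top) throws IOException {
        TreeStats stats = new TreeStats();
        Files.walkFileTree(top, stats);
        return stats;
    }

    public static void main(String[] args) throws IOException {
        TreeStats s = of(Paths.get(args[0]));
        System.out.println(s.files + " files, " + s.totalBytes + " bytes");
    }
}
```

A folder sync like the one discussed earlier in the thread could compare attrs.size() and attrs.lastModifiedTime() against the target in the same callback.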
