ant compile-clojure thrashing the disk; why?

B Smith-Mannschott

Dec 30, 2010, 8:42:40 AM
to cloju...@googlegroups.com
When I use ant to build clojure from source, I can hear my
Velociraptor grinding away from halfway across the room and am seeing
over 70% io-wait (on a 10'000 RPM disk). The build takes about 2
minutes on this machine. When I move clojure to a tmpfs build time is
reduced to about 30 seconds on the same machine.

machine = (2.4GHz Quad Core2; 8GB RAM; HDD: 150 GB @ 10'000 RPM + 2 x
300 GB @ 10'000 RPM; Linux; EXT4)

Does anyone know why the Clojure compiler causes so many IO
operations? If no one has looked at this previously, I'd be willing to
poke around and see if I can figure out a way to improve the situation.

I note that the Clojure compiler produces some 2800 class files, but I
have a pure Java build that produces twice as many class files and yet
finishes in 30 seconds on the same disk. ("Clojure produces a lot of
class files" is obviously not a sufficient explanation.)

// Ben

B Smith-Mannschott

Dec 30, 2010, 3:13:11 PM
to cloju...@googlegroups.com

I've continued investigating the heavy i/o during compile-clojure
noted in my previous message. I believe I have found a way to
ameliorate the problem. I also believe the severity of the observed
symptoms is related to three factors:

- Linux - the problem manifests more distinctly on Linux than on Mac OS X.
- slow storage - the problem manifests more distinctly on physical HDD or
  really slow flash than it does on a modern SSD.
- fast processor - the larger the gap between processor speed and storage
  speed, the more noticeable the problem.

The last point is something of a guess, since I don't have access to
the machine I originally observed this on as it is at work and I am at
home.

I've made a change which improves the performance of compile-clojure,
and ran three trials of `ant compile-clojure` on 5 different machines
both with and without the change.

The Change
==========

Compiler.writeClassFile() concludes thus:

    try {
        cfs.write(bytecode);
        cfs.flush();
        cfs.getFD().sync();
    } finally {
        cfs.close();
    }

I don't believe cfs.flush() is necessary since the finally block
guarantees that cfs.close() will be called and cfs.close() implies
cfs.flush(). (Or at least it used to, I'm not seeing that mentioned in
the JDK 1.6 JavaDocs for OutputStream.close())

cfs.getFD().sync() strikes me as paranoid. Is there any reason to
believe that the "system buffers" it promises to write to the
underlying device won't land there (eventually) following a call to
close()?

The change was: deleting the calls cfs.flush() and cfs.getFD().sync().
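
For illustration, the write path after the change might look roughly like the sketch below. This is not the actual Compiler.java code: the class name ClassFileWriter and the method signature are assumptions made up for the example, and only the body of the try/finally mirrors the snippet quoted above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class ClassFileWriter {
    // Hypothetical stand-in for Compiler.writeClassFile(): writes the bytes
    // and closes the stream, without calling flush() or getFD().sync().
    static void writeClassFile(File target, byte[] bytecode) throws IOException {
        FileOutputStream cfs = new FileOutputStream(target);
        try {
            cfs.write(bytecode);
        } finally {
            cfs.close();
        }
    }
}
```

With this shape, close() hands the data to the OS, and the OS decides when it reaches the physical device.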

The Machines Tested
===================

(p) An EeePC netbook
    Atom with 1 core (2 threads) at 1.66 GHz
    A bog-standard 120GB 2.5" 5400 RPM Notebook HDD using EXT4
    2 GB RAM
    Ubuntu Linux 10.04

(w) A Dell "Mini 9" netbook
    Atom with 1 core (2 threads) at 1.6 GHz
    16 GB of (slow and old) flash instead of a hard disk, using EXT2
    2 GB RAM
    Linux Mint 10 (similar to Ubuntu Linux 10.10)

(l) An EeePC 1015PEM (Netbook)
    Atom with 2 cores (2 threads per core) at 1.5 GHz
    40 GB SSD (Sandforce) using BTRFS
    2 GB RAM
    Linux Mint 10 (similar to Ubuntu Linux 10.10)

(m) A MacBook
    Core2Duo (dual core) at 2.1 GHz
    A bog-standard 250GB 2.5" 5400 RPM Notebook HDD using HFS+
    4 GB RAM
    Mac OS X 10.6.x

(a) A MacBook Air
    Core2Duo (ULV, dual core) at 1.6 GHz
    128 GB Flash using HFS+
    4 GB RAM
    Mac OS X 10.6.x

storage: slow: p, w, m; fast: l, a
os: linux: p, w, l; mac: m, a


     (x)   (y)    (r)
---------------------
p    171    90    52%
w    170    97    57%
l     77    71    92%
m     62    57    91%
a     36    36   100%
---------------------
(x) seconds required for 'ant compile-clojure' without my change
(y) seconds required for 'ant compile-clojure' with my change
(r) ratio (y)/(x) as a percentage (lower indicates greater effect of change)

This change looks like a performance win for Linux systems not working
off of fast SSD storage. It doesn't show as much effect on SSDs or
under Mac OS X generally. I don't have any data on Windows.

Are there reasons for not removing the flush() and sync() calls from
Compiler.java?

// Ben

Alex Miller

Dec 30, 2010, 4:09:30 PM
to cloju...@googlegroups.com
The getFD().sync() is (I believe) calling fsync to force a write of OS buffers to disk (the stream must be flushed first). Without it, in the case of a crash, you would not necessarily get the buffers written.

Interesting thread on fsync (note that it's faster on OS X b/c OS X doesn't actually do it when you ask):
http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/

I guess my question would be why does it matter whether the class is written correctly if you crash? Aren't you screwed at that point anyhow? I suppose you might have a class file(s) lying around then that are lost or corrupted, but if your JVM is dead you're probably going to restart whatever you were doing anyways.

A compromise might be to fsync periodically (once/second) instead of on every file to give more safety.





--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To post to this group, send email to cloju...@googlegroups.com.
To unsubscribe from this group, send email to clojure-dev+unsub...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.

B Smith-Mannschott

Dec 31, 2010, 4:01:02 AM
to cloju...@googlegroups.com
On Thu, Dec 30, 2010 at 22:09, Alex Miller <alexd...@yahoo.com> wrote:
> The getFD().sync() is (I believe) calling fsync to force a write of os
> buffers to disk (stream must be flushed first).  In the case of a crash, you
> would not necessarily get the buffers written.
> Interesting thread on fsync (note that it's faster on OS X b/c OS X doesn't
> actually do it when you ask):
> http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/
> I guess my question would be why does it matter whether the class is written
> correctly if you crash?  Aren't you screwed at that point anyhow?  I suppose
> you might have a class file(s) lying around then that are lost or corrupted
> but if your jvm is dead you're probably going to restart whatever you were
> doing anyways.

My feeling precisely. The class files are automatically derived build
products. They can always be regenerated from the sources should the
system go down in the middle of a compile.

In the most typical case, they only exist so that we can then copy
them into a jar file. A clever build could blow them away again as
soon as the JAR is built, and they might never touch the disk. (A
really clever build can even write them directly to the target JAR,
though I consider that a gross violation of 'separation of concerns'
and don't advocate it.)

// Ben


Jürgen Hötzel

Jan 4, 2011, 2:01:35 PM
to cloju...@googlegroups.com, Laurent Petit
2010/12/31 B Smith-Mannschott <bsmit...@gmail.com>
> My feeling precisely. The class files are automatically derived build
> products. They can always be regenerated from the sources should the
> system go down in the middle of a compile.
>
> In the most typical case, they only exist so that we can then copy
> them into a jar file. A clever build could blow them away again as
> soon as the JAR is built, and they might never touch the disk. (A
> really clever build can even write them directly to the target JAR,
> though I consider that a gross violation of 'separation of concerns'
> and don't advocate it.)

I have summarized the discussion and enclosed a fix in this Task:


Jürgen

Laurent PETIT

Jan 4, 2011, 2:18:12 PM
to Jürgen Hötzel, cloju...@googlegroups.com
2011/1/4 Jürgen Hötzel <jue...@hoetzel.info>

Thanks Jürgen for having added me as a direct recipient, or I would have overlooked this thread.

I am, indeed, at least from the point of view of this thread ;-), the "culprit". To summarize, this call was not added for paranoid reasons concerning crash considerations; no, it was about being certain that there could be no more race conditions between code having called "compile" and the OS flushing the class to disk. I had tooling such as IDEs in mind, where the IDE calls compile() on a file and, once compile() returns, immediately calls the refresh mechanism of the IDE.
It's not a hypothetical problem: without sync() the IDE was unstable and regularly did not see the compiled file, because the refresh "won" the race against the OS flush.

Note however that in my initial post, I mentioned that there could be performance issues.
Here is the extract from the original post:

"
I have a problem for implementing the auto-compile feature of clojuredev
eclipse plugin.

My problem is that it seems that the function 'compile returns very fast to
the clojure dev builder, and when the builder refreshes the workspace
(operation needed to "see" the newly compiled classes), it's ... too early !!

I guess something could be done in clojure.lang.Compiler.writeClassFile()
method to avoid this situation.
The idea could be to make it guarantee that whenever it returns, the changes
are not only flushed to the OS, but also written on disk.
I don't know if this could have significant performance impacts, so maybe it
could be necessary to make this behaviour dependent on some new parameter,
or on a global var (something like *ensure-filesystem-sync*).
"
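
A rough sketch of what such an opt-in switch could look like on the Java side is shown below. It is only an illustration of the idea in the extract: the class name and the static boolean ensureFilesystemSync are hypothetical stand-ins for the proposed *ensure-filesystem-sync* var, not anything that exists in Clojure.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class SyncingClassFileWriter {
    // Hypothetical switch standing in for the proposed *ensure-filesystem-sync* var.
    static volatile boolean ensureFilesystemSync = false;

    static void writeClassFile(File target, byte[] bytecode) throws IOException {
        FileOutputStream cfs = new FileOutputStream(target);
        try {
            cfs.write(bytecode);
            if (ensureFilesystemSync) {
                cfs.flush();        // hand any application-side buffers to the OS first
                cfs.getFD().sync(); // then ask the OS to write its buffers to the device
            }
        } finally {
            cfs.close();
        }
    }
}
```

Callers that need the stronger guarantee (an IDE builder, say) would set the flag; a batch build such as `ant compile-clojure` would leave it off and skip the per-file sync.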

Jürgen Hötzel

Jan 4, 2011, 2:36:33 PM
to Laurent PETIT, cloju...@googlegroups.com
Hi Laurent,

thanks for the quick response!

2011/1/4 Laurent PETIT <lauren...@gmail.com>
> I am, indeed, at least from the point of view of this thread ;-), the "culprit". To summarize, this call was not added for paranoid reasons concerning crash considerations ; no, it was about being certain that there could no more be race conditions between code having called "compile" and the OS flushing the class to disk. With toolings such as IDEs in mind, where the IDE calls compile() on a file, and once compile() returns, immediately calls the refresh mechanism of the IDE.
> It's not an hypothetical problem, not having sync() made the IDE unstable and regularly not seeing the compiled file 'cause it "won" over the OS flush.

The purpose of "sync()" is to make the change visible "on-disk".
After calling "flush" the changes are already visible to other user processes (like the IDE).
Furthermore, calling "close" ensures the changes are visible to other user processes (even without calling "flush" previously).

So I assumed another race condition (file is initially created with 0 file size, partly written bytecode).
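
A small check along these lines can be sketched in Java. It writes a file, closes it without sync(), and reads it back through a separate API path. The class and method names are made up for the example, it assumes Java 7+ for java.nio.file.Files, and it reads back from within the same JVM, so it says nothing about how a particular IDE's refresh mechanism behaves.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

final class CloseVisibilityCheck {
    // Writes data, closes the stream without sync(), then reads the file back.
    // Returns true if the reader sees every byte that was written.
    static boolean visibleAfterClose(byte[] data) throws IOException {
        File f = File.createTempFile("visibility", ".bin");
        try {
            FileOutputStream out = new FileOutputStream(f);
            try {
                out.write(data);
            } finally {
                out.close();
            }
            return Arrays.equals(data, Files.readAllBytes(f.toPath()));
        } finally {
            f.delete();
        }
    }
}
```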

Jürgen 



Laurent PETIT

Jan 4, 2011, 2:58:16 PM
to Jürgen Hötzel, cloju...@googlegroups.com
2011/1/4 Jürgen Hötzel <jue...@hoetzel.info>
> The purpose of "sync()" is to make the change visible "on-disk".
> When calling "flush" the changes are already visible to other user processes (like the IDE).

Question: is this behaviour guaranteed across OSes (Linux, Windows, Mac OS), or just on Linux? I'm asking because I was testing on 2 OSes (Linux and Windows), and only when I found the sync() method had I found a write-once-run-anywhere solution, if I remember correctly.

> Furthermore calling "close" ensures the changes are visible to other user processes (even without calling "flush" previously).

I understand this one quite easily, indeed.

> So I assumed another race condition (file is initially created with 0 file size, partly written bytecode).

That was not the initial trigger for adding the call to sync(). And I'm quite certain the call to close() was there before the patch, and only the sync() call made the difference.

Laurent PETIT

Jan 4, 2011, 3:07:34 PM
to Jürgen Hötzel, cloju...@googlegroups.com


2011/1/4 Laurent PETIT <lauren...@gmail.com>
> > Furthermore calling "close" ensures the changes are visible to other user processes (even without calling "flush" previously).
>
> I understand this one quite easily, indeed.

Or maybe not. Extract from the FileDescriptor.sync() javadoc:
"
sync only affects buffers downstream of this FileDescriptor. If any in-memory buffering is being done by the application (for example, by a BufferedOutputStream object), those buffers must be flushed into the FileDescriptor (for example, by invoking OutputStream.flush) before that data will be affected by sync
"

So the call to flush() is mandatory before the call to sync() to ensure that *everything* is written to disk.

Jürgen Hötzel

Jan 4, 2011, 4:33:51 PM
to Laurent PETIT, cloju...@googlegroups.com
> Or maybe not. Extract from the FileDescriptor.sync() javadoc:
> "
> sync only affects buffers downstream of this FileDescriptor. If any in-memory buffering is being done by the application (for example, by a BufferedOutputStream object), those buffers must be flushed into the FileDescriptor (for example, by invoking OutputStream.flush) before that data will be affected by sync
> "

Yes, good catch. It's quite confusing because two different I/O layers are involved (user-space FileOutputStreams and kernel-space file descriptors):

"sync" on the kernel file descriptor writes kernel-space buffers to disk.
"flush" forces the kernel syscall "write" on the file descriptor (it "flushes" user-space/JVM buffers to kernel space).
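
The two layers can be made visible with a small experiment, sketched below under some assumptions: the class and method names are invented for the example, and the data passed in must be smaller than the 8192-byte buffer so that it stays in the JVM-side buffer until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class FlushThenSync {
    // Returns { file length before flush(), file length after flush() }.
    static long[] lengthsAroundFlush(byte[] data) throws IOException {
        File f = File.createTempFile("layers", ".bin");
        FileOutputStream fos = new FileOutputStream(f);
        BufferedOutputStream out = new BufferedOutputStream(fos, 8192);
        try {
            out.write(data);           // data sits in the JVM-side buffer only
            long before = f.length();  // the file system has not seen it yet
            out.flush();               // write() syscall: JVM buffer -> kernel buffers
            long after = f.length();   // now visible through the file system
            fos.getFD().sync();        // fsync: kernel buffers -> storage device
            return new long[] { before, after };
        } finally {
            out.close();
            f.delete();
        }
    }
}
```

For 100 bytes of input, the length before flush() is 0 and the length after is 100: visibility to other readers changes at flush(), while sync() only affects when the bytes reach the device.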

 
Jürgen