Slower performance with ext4

Raphael

Oct 30, 2009, 2:08:20 AM

to Ubuntu user technical support, not for general discussions

Help: after a clean install of Karmic on my ext4 partition, performance is significantly slower than it was with ext3. Startup used to take around 7 secs, but with ext4 it now takes 20 secs, and application speeds are also slower.

Regards,

Raphael

Christopher Chan

Oct 30, 2009, 2:42:37 AM

to ubuntu...@lists.ubuntu.com

Raphael wrote:
>
>
> Help, after I had clean installed Karmic on my ext4 partition, the
> performance was significantly slower compared to ext3. Startup was
> around 7 secs but with ext4 it's now 20 secs application speeds are
> also slower.

jfs! jfs! jfs!

--
ubuntu-users mailing list
ubuntu...@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

Raphael

Oct 30, 2009, 3:38:28 AM

to Ubuntu user technical support, not for general discussions, ubuntu...@lists.ubuntu.com

But isn't the default filesystem for ubuntu ext4, and isn't it the default for a reason?

Sent from my iPod


Chan Chung Hang Christopher

Oct 30, 2009, 6:30:36 AM

to ubuntu...@lists.ubuntu.com

Raphael wrote:
> But isn't the default filesystem for ubuntu ext4, and isn't it the default for a reason?
>

/me rotfl. On RHEL/Centos, ext3 is the only filesystem available. Not
sure if they will offer ext3 + ext4 with RHEL6. Why?

ext4 is an unproven filesystem. Data loss anyone? Not like ext3 and jfs.
The XFS codebase is so blooming big, I am not sure that it will ever reach
the state that ext3 and jfs are currently in.

fyr...@netscape.net

Oct 30, 2009, 6:58:46 AM

to ubuntu...@lists.ubuntu.com

  >Data loss anyone?<

What evidence do you have that there would be data loss? ext2 and ext3 were used almost immediately after their release as well. The distro maintainers usually do some basic reliability tests or at least have access to such tests. So I would be happy to read any tests you've seen that suggest ext4 is unreliable. To start scaring people with talk of data loss based on random speculation would not be good.

Regards,

John

-----Original Message-----
From: Chan Chung Hang Christopher <christop...@bradbury.edu.hk>
To: ubuntu...@lists.ubuntu.com
Sent: Fri, Oct 30, 2009 11:30 am
Subject: Re: Slower performance with ext4

Raphael wrote:
> But isn't the default filesystem for ubuntu ext4, and isn't it the default for a reason?

/me rotfl. On RHEL/Centos, ext3 is the only filesystem available. Not
sure if they will offer ext3 + ext4 with RHEL6. Why?

ext4 is an unproven filesystem. Not like ext3 and jfs.

Steve Flynn

Oct 30, 2009, 7:38:08 AM

to Ubuntu user technical support, not for general discussions

On Fri, Oct 30, 2009 at 10:58 AM, <fyr...@netscape.net> wrote:
> >Data loss anyone?<
>
> What evidence do you have that there would be data loss? ext2 and ext3 were
> used almost immediately after their release as well. The distro maintainers
> usually do some basic reliability tests or at least have access to such
> tests. So I would be happy to read any tests you've seen that suggest ext4
> is unreliable. To start scaring people with talk of data loss based on
> random speculation would not be good.

It's not random speculation:

http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a

--
Steve
When one person suffers from a delusion it is insanity. When many
people suffer from a delusion it is called religion.

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Avi Greenbury

Oct 30, 2009, 7:52:26 AM

to Ubuntu user technical support, not for general discussions

fyr...@netscape.net wrote:
> >Data loss anyone?<
> What evidence do you have that there would be data loss?

It's a brand-spanking-new filesystem. Fear of data loss is natural and
understandable.

--
Avi Greenbury
http://aviswebsite.co.uk ;)
http://aviswebsite.co.uk/asking-questions

Karl F. Larsen

Oct 30, 2009, 8:05:54 AM

to Ubuntu user technical support, not for general discussions

Raphael wrote:
>
> Help, after I had clean installed Karmic on my ext4 partition,
> the performance was significantly slower compared to ext3.
> Startup was around 7 secs but with ext4 it's now 20 secs
> application speeds are also slower.

If you can document this in a bug report it will help the
designers of ext4 to work on their project to speed it up.
I have been working on my wife's Windows XP and it is sure
lazy coming on. They have an early Microsoft ad, and then XP
starts to unfold...

Karl

>
> Regards,
> Raphael


--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D

Karl F. Larsen

Oct 30, 2009, 8:10:56 AM

to Ubuntu user technical support, not for general discussions

Raphael wrote:
> But isn't the default filesystem for ubuntu ext4, and isn't it the default for a reason?
>
> Sent from my iPod
>
> On 30-Oct-2009, at 2:42 PM, Christopher Chan <christop...@bradbury.edu.hk> wrote:
>
> Raphael wrote:
>
>
> Help, after I had clean installed Karmic on my ext4 partition, the
> performance was significantly slower compared to ext3. Startup was
> around 7 secs but with ext4 it's now 20 secs application speeds are
> also slower.
> jfs! jfs! jfs!
>

When I loaded Karmic Beta I had a choice to use ext3 or ext4.
Grub 2 will run on either file system. This tells me I made a
bad choice going with ext4, but I thought it was fixed by now.

Karl

--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D

Chan Chung Hang Christopher

Oct 30, 2009, 8:16:27 AM

to ubuntu...@lists.ubuntu.com

fyr...@netscape.net wrote:
>
> >Data loss anyone?<
> What evidence do you have that there would be data loss? ext2 and ext3 were used almost immediately after their release as well. The distro maintainers usually do some basic reliability tests or at least have access to such tests. So I would be happy to read any tests you've seen that suggest ext4 is unreliable. To start scaring people with talk of data loss based on random speculation would not be good.
>
>

Dude, I used to work with clusters of mta boxes. The last thing I needed
then was a filesystem that loses data or corrupts its metadata easily. I
wait before using any newfangled filesystem, regardless of how uber fast
it is, or I play the pull-the-plug game with it in whatever
journaling modes it has available.

ext4 data loss reports started with Ubuntu Jaunty I think too?

Leonard Chatagnier

Oct 30, 2009, 8:21:03 AM

to Ubuntu user technical support, not for general discussions

--- On Fri, 10/30/09, Steve Flynn <another...@gmail.com> wrote:

> From: Steve Flynn <another...@gmail.com>
> Subject: Re: Slower performance with ext4

> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>
> Date: Friday, October 30, 2009, 6:38 AM
> On Fri, Oct 30, 2009 at 10:58 AM, <fyr...@netscape.net> wrote:
> > >Data loss anyone?<
> >
> > What evidence do you have that there would be data loss? ext2 and ext3 were
> > used almost immediately after their release as well. The distro maintainers
> > usually do some basic reliability tests or at least have access to such
> > tests. So I would be happy to read any tests you've seen that suggest ext4
> > is unreliable. To start scaring people with talk of data loss based on
> > random speculation would not be good.
>
> It's not random speculation:
>
> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
>
>

Reading the first couple of items shown in your above link it does appear to have been fixed in both Jaunty and Karmic. Your comment may not be random speculation but it does appear to be outdated. I'm using Karmic Beta 64 bit and have not experienced any data loss, thanks for that.
Leonard Chatagnier
lenc...@sbcglobal.net

Leonard Chatagnier

Oct 30, 2009, 8:26:34 AM

to Ubuntu user technical support, not for general discussions

--- On Fri, 10/30/09, Karl F. Larsen <klar...@gmail.com> wrote:

> From: Karl F. Larsen <klar...@gmail.com>
> Subject: Re: Slower performance with ext4
> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>

> Date: Friday, October 30, 2009, 7:05 AM
> Raphael wrote:
> >
> > Help, after I had clean installed Karmic on my ext4 partition,
> > the performance was significantly slower compared to ext3.
> > Startup was around 7 secs but with ext4 it's now 20 secs
> > application speeds are also slower.
>
> If you can document this in a bug report it will help the
> designers of ext4 to work on their project to speed it up.
> I have been working on my wife's Windows XP and it is sure
> lazy coming on. They have an early Microsoft ad, and then XP
> starts to unfold...
>
> Karl
>
>

Karl, try reading the link the OP gave. A bug report has already been filed. Actually it was on data loss as far as I read.
Leonard Chatagnier
lenc...@sbcglobal.net

Karl F. Larsen

Oct 30, 2009, 8:38:49 AM

to Ubuntu user technical support, not for general discussions

I did go to his bug report, and the responders seemed to have changed
the subject to lost data. In his message he said ext3 was much faster
than ext4.

Karl

--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D

Steve Flynn

Oct 30, 2009, 9:08:14 AM

to Ubuntu user technical support, not for general discussions

On Fri, Oct 30, 2009 at 12:21 PM, Leonard Chatagnier
<lenc...@sbcglobal.net> wrote:
>
>> It's not random speculation:
>>
>> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
>>
> Reading the first couple of items shown in your above link it does appear to have been fixed in both Jaunty and Karmic. Your comment may not be random speculation but it does appear to be outdated. I'm using Karmic Beta 64 bit and have not experienced any data loss, thanks for that.

Actually I forgot to post the link I was really after. The release
notes for 9.10 highlight possible filesystem corruption with large
files (> 512 MB)...

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579

--
Steve
When one person suffers from a delusion it is insanity. When many
people suffer from a delusion it is called religion.

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Leonard Chatagnier

Oct 30, 2009, 9:12:18 AM

to Ubuntu user technical support, not for general discussions

It is a little confusing; subject vs. link. However, I find Karmic much faster than ext3 on everything except browser surfing but that's not the same issue.
Leonard Chatagnier
lenc...@sbcglobal.net

Leonard Chatagnier

Oct 30, 2009, 9:49:09 AM

to Ubuntu user technical support, not for general discussions

--- On Fri, 10/30/09, Steve Flynn <another...@gmail.com> wrote:

> From: Steve Flynn <another...@gmail.com>
> Subject: Re: Slower performance with ext4
> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>

> Date: Friday, October 30, 2009, 8:08 AM
> On Fri, Oct 30, 2009 at 12:21 PM, Leonard Chatagnier
> <lenc...@sbcglobal.net> wrote:
> >
> >> It's not random speculation:
> >>
> >> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
> >>
> > Reading the first couple of items shown in your above link it does appear
> > to have been fixed in both Jaunty and Karmic. Your comment may not be
> > random speculation but it does appear to be outdated. I'm using Karmic
> > Beta 64 bit and have not experienced any data loss, thanks for that.
>
> Actually I forgot to post the link I was really after. The release
> notes for 9.10 highlight possible filesystem corruption with large
> files (> 512 MB)...
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579
>
>

Interesting read. However, I don't see anything like it on my Karmic 64 bit, which is a plain install without any LVM or RAID setup. I've downloaded and burned the Karmic beta live DVD, a large file, without any issues. I even checked the md5sum, sha1sum and sha256sum (think that's right), and all checked out, including the disc verification on boot up.
As I mentioned to Karl in his reply, I find Karmic very fast compared to older versions except for browsing speed but that is another issue entirely, I believe.
Apparently, everyone's case is unique and YMMV. I do have a gut feeling that some of the issues are hardware related.
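
For anyone who wants to repeat that kind of check, here is a minimal sketch, assuming Python is on hand. The function names are made up for illustration. The second helper hashes the same file several times, on the theory that digests which change between runs would point at hardware (RAM, disk, cabling) rather than at the filesystem.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at path, read in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read 1 MiB at a time so a DVD-sized image never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def digests_are_stable(path, runs=3):
    """Hash the same file several times and report whether every run agrees."""
    results = [file_digests(path) for _ in range(runs)]
    return all(result == results[0] for result in results)
```

The digests from file_digests() can then be compared by eye with the ones published next to the ISO download.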
Leonard Chatagnier
lenc...@sbcglobal.net

Chan Chung Hang Christopher

Oct 30, 2009, 10:35:14 AM

to ubuntu...@lists.ubuntu.com

Saying weird things like Karmic (an OS) being faster than ext3 (a
filesystem) just adds to the confusion.

Just do something like this: http://www.htiweb.inf.br/benchmark/fsbench.htm

For those interested, I have a tarball of fsbench (Perl scripts that emulate
delivery to maildirs) if you wish to see filesystem performance when
used for a mail store.
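
To give an idea of what such a test exercises, here is a rough Python sketch of timing maildir-style deliveries. It is not fsbench itself (that is a set of Perl scripts); the function names, message count and message size are assumptions picked purely for illustration.

```python
import os
import time


def deliver(maildir, name, body):
    """Maildir-style delivery: write into tmp/, fsync, then rename into new/."""
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "wb") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())  # the message must be on disk before it is "delivered"
    os.rename(tmp_path, new_path)
    return new_path


def time_deliveries(root, count=200, size=4096):
    """Deliver `count` messages of `size` bytes under root; return deliveries/sec."""
    maildir = os.path.join(root, "Maildir")
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    body = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        deliver(maildir, "msg%06d" % i, body)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Pointing time_deliveries() at a scratch directory on each filesystem under test gives a crude deliveries-per-second figure; it only means something when the hardware and kernel are the same for every run.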

fyr...@netscape.net

Oct 30, 2009, 2:54:11 PM

to ubuntu...@lists.ubuntu.com

Good references given. Google returns a lot of corruption hits but most are from Jan-March of this year. Those are related to write-caching which is always a risk in case of power failure or system hang (mentioned in most that I read). There was reference to a boot-time mount parameter as a "work around," which may have to do with disabling write-caching. This would definitely affect performance. I am still reading docs to see if that is indeed what has happened.

The document https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579 makes for very good reading (if you can stay awake). It includes discussion of whether or not to use ext4 as default in this release or revert to ext3. Some people wanted to stick to the traditional Debian conservatism and others (the Mavericks) sided with the bleeding edge. This was enlightening. The case was made that it was better to not take any chances and just use ext3. Then it was pointed out that of the corruption bug reports, there was no definitive link between the reports and they could not reproduce the error consistently enough to say that this component or that component/driver was defective. I found this to be the most revealing message:

" The relation to that upstream bug is tenuous at best. The upstream bug:
    - is reported against a newer kernel than the one we're shipping
    - is reported to only happen when ext4 is on top of the DM layer, whereas
      Scott's case was ext4 on a raw device
    - is reported in connection with an unclean shutdown and subsequent fsck,
      whereas Scott reported corruption of files without an unclean shutdown
      (but no mention in this bug of whether the corruption requires an
      intervening reboot/fsck to appear - Scott, please clarify)

So that upstream bug link should be dropped; it really doesn't look like the same bug."

So this was the reason for going ahead with the ext4 deployment depending on how you interpret what is being said here. I have to say that given how Linux is used and considering these docs, sticking with ext4 was a risky move. The decision may be vindicated if no bug is found with the file system itself. Distros always have to be wary of being last to roll out a feature. Why? Two words: Competitive Disadvantage. I can't imagine that this is an easy decision for the guy who has to balance those two forces. I have to think that EULA's which absolve software companies of everything but dodging taxes play a big role in how these decisions are made, nicht wahr?

Regards,

John


Amedee Van Gasse (ub)

Oct 30, 2009, 7:18:47 PM

to ubuntu...@lists.ubuntu.com

On Fri, October 30, 2009 07:08, Raphael wrote:
>
>
> Help, after I had clean installed Karmic on my ext4 partition, the
> performance was significantly slower compared to ext3. Startup was around
> 7 secs but with ext4 it's now 20 secs application speeds are also slower.

What are you comparing?
* Karmic Koala clean install on ext3
with
* Karmic Koala clean install on ext4

Could you please do a clean install with ext3 + install bootchart, to get
an exact timing, and then do the same with a clean ext4 install? Thank
you.

Amedee Van Gasse (ub)

Oct 30, 2009, 7:27:10 PM

to ubuntu...@lists.ubuntu.com

On Fri, October 30, 2009 11:58, fyr...@netscape.net wrote:
>
>
> >Data loss anyone?<
> What evidence do you have that there would be data loss?

Ext4 uses a filesystem performance technique called allocate-on-flush,
also known as delayed allocation. It consists of delaying block allocation
until the data is going to be written to the disk, unlike some other file
systems, which may allocate the necessary blocks before that step. This
improves performance and reduces fragmentation by improving block
allocation decisions based on the actual file size.

Delayed allocation poses some additional risk of data loss in cases where
the system crashes before all of the data has been written to the disk.

The typical scenario in which this might occur is a program replacing the
contents of a file without forcing a write to the disk with fsync.
Problems can arise if the system crashes before the actual write occurs.
In this situation, users of ext3 have come to expect that the disk will
hold either the old version or the new version of the file following the
crash. However, the ext4 code in the Linux kernel version 2.6.28 will
often clear the contents of the file before the crash, but never write the
new version, thus losing the contents of the file entirely.
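
A rough sketch of the safer pattern, in Python. This is not from the Wikipedia text; the function name is hypothetical, and it assumes a POSIX system where a directory can be opened and fsync'ed.

```python
import os
import tempfile


def replace_file_safely(path, data):
    """Replace the file at path with data (bytes), aiming for old-or-new after a crash."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    # Note: mkstemp creates it with mode 0600, which may differ from the original file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())    # force the new contents to disk first
        os.replace(tmp_path, path)  # then atomically rename over the old file
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory as well, so the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The point of the ordering is that the old file is never truncated: the new contents are written and synced under a temporary name, and only then swapped in.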

Altering this behavior by using fsync more often could lead to severe
performance penalties on ext3 filesystems mounted with the data=ordered
flag (the default on most Linux distributions). Given that both
file-systems will be in use for some time, this complicates matters
enormously for end-user application developers. In response, Theodore Ts'o
has written some patches for ext4 that cause it to limit its delayed
allocation in these common cases. For a small cost in performance, this
will significantly increase the chance that either version of the file
will survive the crash.

The new patches are expected to become part of the mainline kernel 2.6.30.
Various distributions may choose to backport them to 2.6.28 or 2.6.29, for
instance Ubuntu made them part of the 2.6.28 kernel in version 9.04—Jaunty
Jackalope.

(from Wikipedia)

Amedee Van Gasse (ub)

Oct 30, 2009, 7:27:51 PM

to ubuntu...@lists.ubuntu.com

On Fri, October 30, 2009 12:38, Steve Flynn wrote:
> On Fri, Oct 30, 2009 at 10:58 AM, <fyr...@netscape.net> wrote:
>> >Data loss anyone?<
>>
>> What evidence do you have that there would be data loss? ext2 and ext3
>> were
>> used almost immediately after their release as well. The distro
>> maintainers
>> usually do some basic reliability tests or at least have access to such
>> tests. So I would be happy to read any tests you've seen that suggest
>> ext4
>> is unreliable. To start scaring people with talk of data loss based on
>> random speculation would not be good.
>
> It's not random speculation:
>
> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a

And it was solved for Ubuntu 9.04 by Theodore Ts'o...

Karl F. Larsen

Oct 30, 2009, 7:35:47 PM

to Ubuntu user technical support, not for general discussions

Amedee Van Gasse (ub) wrote:
> On Fri, October 30, 2009 07:08, Raphael wrote:
>>
>> Help, after I had clean installed Karmic on my ext4 partition, the
>> performance was significantly slower compared to ext3. Startup was around
>> 7 secs but with ext4 it's now 20 secs application speeds are also slower.
>
> What are you comparing?
> * Karmic Koala clean install on ext3
> with
> * Karmic Koala clean install on ext4
>
> Could you please do a clean install with ext3 + install bootchart, to get
> an exact timing, and then do the same with a clean ext4 install? Thank
> you.
>
>

This is important. My old computer takes 45 seconds to go
from clicking Grub start to full on. I expect the speed of the
CPU is critical to a shorter time.

I am 75 years old and 45 seconds is blinding speed!

73 Karl

--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D

Chan Chung Hang Christopher

Oct 31, 2009, 10:55:36 AM

to ubuntu...@lists.ubuntu.com

fyr...@netscape.net wrote:
> Good references given. Google returns a lot of corruption hits but most are from Jan-March of this year. Those are related to write-caching which is always a risk in case of power failure or system hang (mentioned in most that I read). There was reference to a boot-time mount parameter as a "work around," which may have to do with disabling write-caching. This would definitely affect performance. I am still reading docs to see if that is indeed what has happened.
>
>
>
> The document https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579 makes for very good reading (if you can stay awake). It includes discussion of whether or not to use ext4 as default in this release or revert to ext3. Some people wanted to stick to the traditional Debian conservatism and others (the Mavericks) sided with the bleeding edge. This was enlightening. The case was made that it was better to not take any chances and just use ext3. Then it was pointed out that of the corruption bug reports, there was no definitive link between the reports and they could not reproduce the error consistently enough to say that this component or that component/driver was defective. I found this to be the most revealing message:
>
> " The relation to that upstream bug is tenuous at best. The upstream bug:
>   - is reported against a newer kernel than the one we're shipping
>   - is reported to only happen when ext4 is on top of the DM layer, whereas
>     Scott's case was ext4 on a raw device
>   - is reported in connection with an unclean shutdown and subsequent fsck,
>     whereas Scott reported corruption of files without an unclean shutdown
>     (but no mention in this bug of whether the corruption requires an
>     intervening reboot/fsck to appear - Scott, please clarify)
>
> So that upstream bug link should be dropped; it really doesn't look like the same bug."
>
> So this was the reason for going ahead with the ext4 deployment depending on how you interpret what is being said here. I have to say that given how Linux is used and considering these docs, sticking with ext4 was a risky move. The decision may be vindicated if no bug is found with the file system itself. Distros always have to be wary of being last to roll out a feature. Why? Two words: Competitive Disadvantage. I can't imagine that this is an easy decision for the guy who has to balance those two forces. I have to think that EULA's which absolve software companies of everything but dodging taxes play a big role in how these decisions are made, nicht wahr?
>
>

So long as we get to choose what filesystem is used, I really do not
care what they decide is default. jfs had the second best write
performance in the last test I saw, and it has proven itself recently by
surviving an ntop, no filters applied, rrd database creation run involving
thousands of hosts without crashing the system or losing data. I will
just stick with what is proven and still performing well.

Roy Smith

Oct 31, 2009, 11:36:38 AM

to Ubuntu user technical support, not for general discussions

I just did a clean install yesterday and the way the installer is set up now ext4 is the default when using the guided setup. You can have it use ext3 or reiser4 if you manually set up your partitions.

-- 

Roy Smith
Ubuntu 9.10 Karmic Koala
Registered Linux User #488144
Registered Ubuntu User #26841

Mark Kirkwood

Oct 31, 2009, 5:52:42 PM

to Ubuntu user technical support, not for general discussions

Raphael wrote:
>
>
> Help, after I had clean installed Karmic on my ext4 partition, the
> performance was significantly slower compared to ext3. Startup was
> around 7 secs but with ext4 it's now 20 secs application speeds are
> also slower.
>

You will get similar performance to ext3 if you disable write barrier
support (barrier=0 in fstab); see:

http://kernelnewbies.org/Ext4#head-25c0a1275a571f7332fa196d4437c38e79f39f63

Bear in mind that write barrier support is a safety feature. That said, my work
machine uses ext4 with barriers disabled, so far without any issues.
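
If you want to check whether a mount already has barriers turned off, something along these lines could work. It is only a sketch: the function names and the sample line are made up, and how the kernel spells the option in /proc/mounts may differ between versions, so it looks for both barrier=0 and nobarrier.

```python
# Hypothetical /proc/mounts content, for illustration only.
SAMPLE_MOUNTS = (
    "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
    "/dev/sda1 / ext4 rw,relatime,barrier=0,data=ordered 0 0\n"
)


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry shadows an earlier one
    return found


def barriers_disabled(options):
    """True if the option list explicitly turns write barriers off."""
    options = options or []
    return "barrier=0" in options or "nobarrier" in options
```

On a live system the text would come from reading /proc/mounts, e.g. mount_options(open("/proc/mounts").read(), "/"). A result of False only means the option was not listed as off, not that barriers are actually working on your hardware.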

This got me wondering about barrier support in ext3: is it not
implemented, or just off by default? From my brief glance at the ext3 code,
there appears to be some stuff about barriers in there, but I couldn't
see anything about the defaults (you can specify barrier=0/1 for an ext3
filesystem when mounting; not sure if it has any effect though!).

Regards

Mark

Chan Chung Hang Christopher

Nov 1, 2009, 5:52:19 AM

to Ubuntu user technical support, not for general discussions

> I just did a clean install yesterday and the way the installer is set up now
> ext4 is the default when using the guided setup. You can have it use ext3 or
> reiser4 if you manually set up your partitions.
>
>

reiser4? Seriously? Did it make it into the mainline kernel or something? I
have not followed reiser4 much since Hans got jailed.

fyr...@aim.com

Nov 1, 2009, 5:20:52 PM

to ubuntu...@lists.ubuntu.com

Yea, too bad about that. Was always fast and very stable. Of course, open source is the gift that keeps on giving.


Rashkae

Nov 1, 2009, 7:19:08 PM

to Ubuntu user technical support, not for general discussions

Chan Chung Hang Christopher wrote:

> fyr...@netscape.net wrote:
>>
>> >Data loss anyone?<
>> What evidence do you have that there would be data loss? ext2 and ext3 were used almost immediately after their release as well. The distro maintainers usually do some basic reliability tests or at least have access to such tests. So I would be happy to read any tests you've seen that suggest ext4 is unreliable. To start scaring people with talk of data loss based on random speculation would not be good.
>>
>>
>
> Dude, I used to work with clusters of mta boxes. The last thing I needed
> then was a filesystem that loses data or corrupts its metadata easily. I
> wait before using any new fangled filesystem regardless of how uber fast
> it is or I play the pull the plug game with them with whatever
> journaling mode they have available.
>
>
> ext4 data loss reports started with Ubuntu Jaunty I think too?
>

The early data loss in Jaunty was really applications clobbering their
own files combined with EXT4's delayed allocation. Basically, EXT4 was
behaving, for all intents and purposes, like XFS, without the null
bytes. (I still question the sanity of whoever thought this would be a
good idea. After all, wouldn't we all have been using XFS years ago if this
behaviour was so superior?) The patches that were then backported to change
that introduced a kernel soft lockup bug in the Ubuntu kernel (one that was
never confirmed in the mainline kernel). And now we have unconfirmed
sightings of data corruption, but the one person who claims to reproduce
it looks like he has memory corruption issues. (He gets a different
md5sum every time he checks the same file... not really a filesystem
issue there.)

None of this is really applicable to your point, though. For a mission-critical
production system, you want to use what's known and proven. (I do find
the choice of jfs odd, however. I like EXT3 for being reliable and
predictable, and XFS for performance, so long as I know my particular
workload won't be affected by XFS's null bytes on unclean shutdown.)

However, the improvements EXT4 has made for the workloads that performed
badly on EXT3 are amazing. I'm very much looking forward to the
testing/proving phase being done.

Christopher Chan

Nov 1, 2009, 7:47:15 PM

to ubuntu...@lists.ubuntu.com

XFS's blooming aggressive caching and lack of full journaling is a
disaster waiting to happen for mta queues. If you are running Centos,
you only get ext3...

JFS seems to have the second best performance overall according to Bruce
Guenter's maildir simulated local mail delivery benchmark, and it is
stable too.

> However, the improvements EXT4 has made to the workloads that caused
> EXT3 bad performance are amazing.. I'm much looking forward to the
> testing/proving phase to be done.
>
>

:-D. Looking for a guinea pig that is willing to run fsbench from Bruce
Guenter. Too bad it is of interest only to mail admins :-P

Rashkae

Nov 1, 2009, 8:08:51 PM

to Ubuntu user technical support, not for general discussions

XFS has as much journaling as any other candidates: a journal for
metadata. And all MTAs, reportedly, write files in a sane manner and
never assume a file is written to disk until the fsync completes, and
are therefore not at all affected by XFS's aggressive caching. A mail
server is therefore one of the workloads XFS is best suited for.

> JFS seems to have the second best performance overall according to Bruce
> Guenter's maildir simulated local mail delivery benchmark. and it is
> stable too.

JFS performs great in benchmarks, but back when I used to use it, I was
consistently able to bend it out of shape under real world
conditions. No data loss, mind you, but damaged metadata (fixed with
jfs repair, but that should never be needed in a modern file system) and
bizarre corner cases that caused performance to sink through the floor.
(In one instance, I was able to reproduce an issue where reading a file
while writing new files to disk would perform poorly depending on
whether the filename had one . or two, i.e. whether the file was named
something.tar.gz or renamed to something.tgz.) At one time in the
distant past, someone completely broke quota support in JFS, and no one
even noticed for 4 kernel releases. JFS just doesn't seem to have
enough people using it to maintain a well tested status.

Christopher Chan

Nov 1, 2009, 9:36:17 PM

to Ubuntu user technical support, not for general discussions

Journaling only for metadata is not 'as much journaling as any other
candidates.' You cannot treat metadata-only journaling as equivalent to
the data and metadata journaling that is possible with ext3. XFS's
journaling only provides filesystem metadata consistency, which is why
you get files full of NULLs after a crash/power outage. MTAs rely on fsync
calls, and how a filesystem behaves in regard to fsync requests is the
real determiner of whether there is a data guarantee or not. XFS does
not provide a data guarantee. It, at best, provides a metadata guarantee.
XFS should not be used for mta queues unless it is in conjunction with
hardware raid that has a bbu cache. XFS is best suited for streaming
applications where data loss is tolerated.

>
>> JFS seems to have the second best performance overall according to Bruce
>> Guenter's maildir simulated local mail delivery benchmark. and it is
>> stable too.
>>
>
> JFS performs great in benchmarks, but back when I used to use it, I've
> consistently been able to bend it out of shape under real world
> conditions.. No data loss mind you, but damaged meta data (fixed with
> jfs repair, but that should never be needed in a modern file system) and
> bizarre corner cases that caused performance to sink through the floor.
> (in one instance, I was able to reproduce an issue where reading a file
> while writing new files to disk would perform poorly depending on
> whether the filename had one . or two. Ie, if the filename was
> something.tar.gz, or renamed to something.tgz.) At one time in the
> distant past, someone completely broke quota support in JFS, and no one
> even noticed for 4 kernel releases. JFS just doesn't seem to have
> enough people using it to maintain a well tested status.
>
>
>

I would put that down to no one bothering to report bugs, and also to
the lack of users.

James Michael Fultz

Nov 1, 2009, 10:03:42 PM

to ubuntu...@lists.ubuntu.com

* Rashkae <ubu...@tigershaunt.com> [2009-11-01 19:19 -0500]:

> The early data loss in Jaunty was really applications clobbering their
> own files combined with EXT4's delayed allocation. Basically, EXT4 was
> behaving, for all intents and purposes, like XFS, without the null
> bytes. (I still question the sanity of whoever thought this would be a
> good idea.. after all, wouldn't be all be using XFS years ago if this

> behaviour was so superior?) [...]

Qualification: Behaving the way XFS did a few years ago. It was fixed
and has been for a few years now.

> [...] I like EXT3 for reliable and predictable, and XFS for

> performance, so long as I know my particular workload won't be
> affected by XFS's null bytes on unclean shutdown.)

I've been using XFS since its problems were fixed and haven't suffered
any data loss due to it despite many unclean shutdowns due to power
loss.

I am now considering using ext4 when I upgrade to 9.10, however.

Christopher Chan

Nov 1, 2009, 10:18:26 PM

to Ubuntu user technical support, not for general discussions

>> JFS seems to have the second best performance overall according to Bruce
>> Guenter's maildir simulated local mail delivery benchmark. and it is
>> stable too.
>>
>
> JFS performs great in benchmarks, but back when I used to use it, I've
> consistently been able to bend it out of shape under real world
> conditions.. No data loss mind you, but damaged meta data (fixed with
> jfs repair, but that should never be needed in a modern file system) and
> bizarre corner cases that caused performance to sink through the floor.

The benchmark that I looked at emulated real world conditions. Delivery
to a maildir (fsbench). Not iffy benchmarks like hdparm or bonnie or
postmark.

Rashkae

Nov 1, 2009, 11:35:32 PM

to Ubuntu user technical support, not for general discussions

>
> Journaling only for metadata is not 'as much journaling as any other
> canditates.' You cannot say metadata journaling only as equivalent to
> the data and metadata journaling that is possible with ext3. XFS's
> journaling only provides filesystem metadata consistency which is why
> you get files full of NULLs after a crash/power out. MTAs rely on fsync
> calls and how a filesystem behaves in regards to fsync requests is the
> real determiner of whether there is a data guarantee or not. XFS does
> not provide data guarantee. I

This is completely false. XFS gives as much of a data guarantee as the
other filesystems with respect to an fsync. The reason files can have
null bytes appended to them in XFS is that XFS, unlike ext3, will commit
metadata changes out of order from the data actually being written to
disk, but this has nothing to do with fsync, which works as intended.

Mark Kirkwood

Nov 2, 2009, 12:33:44 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
> Journaling only for metadata is not 'as much journaling as any other
> canditates.' You cannot say metadata journaling only as equivalent to
> the data and metadata journaling that is possible with ext3. XFS's
> journaling only provides filesystem metadata consistency which is why
> you get files full of NULLs after a crash/power out. MTAs rely on fsync
> calls and how a filesystem behaves in regards to fsync requests is the
> real determiner of whether there is a data guarantee or not. XFS does
> not provide data guarantee. It, at best, provides a metadata guarantee.
> XFS should not be used for mta queues unless it is in conjunction with
> hardware raid that has a bbu cache. XFS is best suited for streaming
> applications where the data loss is tolerated.
>
>

Sorry, but that is completely incorrect. Applications that use fsync are
safe with any filesystem - fsync forces the modified buffers to *disk*,
so all discussions about OS and filesystem caching are irrelevant[1].

Cheers

Mark

[1] You do have to consider whether the underlying disk firmware honors
the fsync request to flush - this is why SCSI disks are still often
preferred for data-critical situations. It is only recently, with the
advent of more advanced SATA firmware, that they too are now reasonably
usable in those situations (though you want to leave write barrier
support enabled then!)
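
For readers following along, here is a minimal sketch (Python, added for
illustration and not taken from any poster's code) of what "an application
that uses fsync" looks like. The function name durable_write and the file
name are made up for the example. Note that os.fsync only asks the kernel
to flush; whether the drive firmware really commits its cache is exactly
the caveat in the footnote above, and the program cannot check it.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload to path and request a flush to the device before returning."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # drain Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to push the file's buffers to the device
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical target file in a scratch directory.
    target = os.path.join(tempfile.mkdtemp(), "message.eml")
    print(durable_write(target, b"Subject: test\n\nbody\n"))
```

os.fdatasync(f.fileno()) could be used in place of os.fsync where it is
available, if only the file data (and the metadata needed to read it back)
has to be flushed.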

Christopher Chan

Nov 2, 2009, 12:45:41 AM

to ubuntu...@lists.ubuntu.com

Rashkae wrote:
>> Journaling only for metadata is not 'as much journaling as any other
>> canditates.' You cannot say metadata journaling only as equivalent to
>> the data and metadata journaling that is possible with ext3. XFS's
>> journaling only provides filesystem metadata consistency which is why
>> you get files full of NULLs after a crash/power out. MTAs rely on fsync
>> calls and how a filesystem behaves in regards to fsync requests is the
>> real determiner of whether there is a data guarantee or not. XFS does
>> not provide data guarantee. I
>>
>
>
> This is completely false,, XFS gives as much data guarantee as the other
> filesystems in respects to an fsync. The reason files can have Null
> bytes appended to them in XFS is because XFS, unlike ext3, will commit
> meta data changes out of order from the data actually being written to
> disk, but this has nothing to do with fsync, which works as intended.
>
>
>

Wake up call.

http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html

Even the definition of fsync indicates that not every fsync call returns
after data is safely on disk. fsync returns after metadata has hit disk
in the case of XFS, JFS, reiserfs and ext3 ordered mode and by 'hit
disk' I mean the journal of the filesystem and not the actual location
in the filesystem.

http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html

'In the middle ground between these extremes, /fsync/() might or might
not actually cause data to be written where it is safe from a power
failure.'

For this reason, some people might go to such lengths as installing
nvram cards for use as external journals for ext3 in full journaling
mode in conjunction with their software raid or get a hardware raid that
has bbu cache.

Christopher Chan

Nov 2, 2009, 12:49:48 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Christopher Chan wrote:
>
>> Journaling only for metadata is not 'as much journaling as any other
>> canditates.' You cannot say metadata journaling only as equivalent to
>> the data and metadata journaling that is possible with ext3. XFS's
>> journaling only provides filesystem metadata consistency which is why
>> you get files full of NULLs after a crash/power out. MTAs rely on fsync
>> calls and how a filesystem behaves in regards to fsync requests is the
>> real determiner of whether there is a data guarantee or not. XFS does
>> not provide data guarantee. It, at best, provides a metadata guarantee.
>> XFS should not be used for mta queues unless it is in conjunction with
>> hardware raid that has a bbu cache. XFS is best suited for streaming
>> applications where the data loss is tolerated.
>>
>>
>>
> Sorry, but that is completely incorrect. Applications that use fsync are
> safe with any filesystem - fsync forces the modified buffers to *disk*,
> so all discussions about os and filesystem caching are irrelivant[1].
>

Yes...where *disk* = journal. Which for JFS, XFS and ext3 data=ordered
means metadata only. Only ext3 data=journal guarantees data and
metadata. Feel free to get any filesystem developer to confirm this for
me, because you won't get any other answer than what I have just posted.

Mark Kirkwood

Nov 2, 2009, 1:28:26 AM

to Ubuntu user technical support, not for general discussions

Not so - disk != journal. At fsync the buffers are written through the
OS buffer to the physical disk cache, and the cache is instructed to
write them to the rotating media. This is for data, *not* always metadata
(see man for fsync vs fdatasync). In fact it is the metadata that has
historically caused the most problems - hence the need to journal it.

The vast majority of the world's databases and mail servers depend on the
fact that fsync forces modified *data* buffers to their respective file
on disk.

The zero-length files that people dislike so much on XFS are caused by
applications that do *not* request an fsync - and also by cheap SATA
disks that do not honor fsync's request to actually write the buffers...
thankfully these are less common now (especially for serious SATA drives
like WD's Velociraptor).
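
As an illustration of the "application requests an fsync" side of this, a
rough sketch (Python, not from the thread) of replacing a file so that a
crash should leave either the old or the new contents rather than an
empty file. The name atomic_replace is made up for the example, and the
directory fsync assumes a POSIX system such as Linux. It still depends on
the drive honoring the flush, as discussed above.

```python
import os
import tempfile


def atomic_replace(path, payload):
    """Replace path with payload via a temporary file, fsync and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())   # flush the new data before it gets the final name
        os.replace(tmp_path, path)   # rename over the old file (atomic on POSIX)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave a stray temporary file behind
        raise
    # Flush the directory entry as well, so the rename itself is requested durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that skips the fsync before the rename is the kind that
ends up with zero-length files after a power cut on filesystems that
delay data writes.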

regards

Mark

Ong Raphael

Nov 2, 2009, 1:47:31 AM

to Ubuntu user technical support, not for general discussions

So, is the conclusion that ext4 sucks and ext3 rocks performance-wise?

From: Christopher Chan <christop...@bradbury.edu.hk>

To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>

Sent: Monday, 2 November 2009 1:49:48

Subject: Re: Slower performance with ext4

Christopher Chan

Nov 2, 2009, 1:51:41 AM

to ubuntu...@lists.ubuntu.com

Maybe you want to first VERIFY with the various filesystem developers
before you start yapping what appears to be the only sensible
explanation but is in fact a myth. On Linux, XFS, JFS and ext3
data=ordered return from fsync as soon as the metadata hits the journal
on disk, before the data and metadata are committed to their locations
on the filesystem. ext3 data=journal returns after both data and
metadata are committed to disk in the JOURNAL and before they are
written to their locations in the filesystem. I have not yet looked at
ext4 so I will not say anything about what it does.

> The vast majority of the world databases and mail servers depend on the
> fact that fsync forces modified *data* buffers to their respective file
> on disk.
>

Sure. Too bad that is not always true.

> The zero length files that people dislike so much on xfs are caused by
> applications that do *not* request an fsync - and also cheap sata disks
> that do not honor fsync's request to actually write the buffers...
> thankfully these are less common now (especially for serious sata drives
> like WD's Velociraptor).
>
>

Heh, what do you know? I have been burned by XFS after a power loss and
got over 4000 zero-length files in a postfix queue. No filesystem
corruption, just zero-data files. You want to tell me that postfix does
not use fsync? You can guess what I did to the XFS filesystem mounted
for the queue directory. I destroyed it and got ext3 instead in full
data journal mode. Which I repeated on all the other MTAs that had an XFS
filesystem for their mail queue.

Christopher Chan

Nov 2, 2009, 1:54:39 AM

to ubuntu...@lists.ubuntu.com

Ong Raphael wrote:
>
> So, is the conclusion that ext4 sucks and ext3 rocks in performance wise?

I do not know the answer to that because I have not looked at how ext4
behaves. Performance-wise, if you are not using hardware RAID with BBU
cache, ext3 in data=journal mode and with its journal stored externally
on a BBU NVRAM card will blow XFS out of the water and still guarantee
data and metadata consistency.

Christopher Chan

Nov 2, 2009, 2:14:31 AM

to Ubuntu user technical support, not for general discussions

It appears that ext4 shares the same journaling modes as ext3 with some
data-integrity and performance enhancements. Anyway, here is the
relevant portion of another party's comments on filesystem behaviour
that you may find more acceptable than mine.

"Ext4 supports multiple modes of journaling, depending upon the needs of
the user. For example, ext4 supports a mode in which only metadata is
journaled (Writeback mode), a mode in which metadata is journaled but
data is written as the metadata is written from the journal (Ordered
mode), and a mode in which both metadata and data are journaled (Journal
mode—the most reliable mode). Note that Journal mode, although the best
way to ensure a consistent file system, is also the slowest, because all
data flows through the journal."

From: http://www.ibm.com/developerworks/linux/library/l-anatomy-ext4/

In case you have not figured out why ext3 + external journal on an uber
fast and secure BBU NVRAM card blows everything else, except filesystems
on hardware RAID + BBU cache, out of the water, it is because fsync
returns as soon as stuff is committed to...the journal.
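
On ext3 and ext4 the journaling mode is selected with the data= mount
option (data=writeback, data=ordered or data=journal). As a small
illustration, added here and not taken from the thread, this Python sketch
pulls that option out of a line in the format of /proc/mounts. The
function name data_mode and the device and mount point in the example are
made up. When no data= option is listed the sketch returns None rather
than guessing, since the default in effect depends on the kernel and the
filesystem's settings.

```python
def data_mode(mount_line):
    """Return the data= journaling mode named in a /proc/mounts-style line, or None."""
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("expected: device mountpoint fstype options ...")
    # The fourth field is the comma-separated mount option list.
    for option in fields[3].split(","):
        if option.startswith("data="):
            return option.split("=", 1)[1]
    return None  # mode not stated explicitly in the option list


if __name__ == "__main__":
    # Hypothetical mount entry for a mail queue filesystem.
    print(data_mode("/dev/sda1 /var/spool/postfix ext3 rw,data=journal 0 0"))
```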

Mark Kirkwood

Nov 2, 2009, 2:19:15 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
> Heh, what do you know? I have been burned by XFS after a powerloss and
> got over 4000 zero length files in a postfix queue. No filesystem
> corruption, just zero data files. You want to tell me that postfix does
> not use fsync? You can guess what I did to the XFS filesystem mounted
> for the queue directory. I destroyed it and got ext3 instead in full
> data journal mode. Which I repeated on all the other mtas that had a XFS
> filesystem for their mail queue.
>
>

Hmm - not gonna get into trading personal insults, as nothing is to be
gained that way.

You were running this on server-grade hardware? Or - let me guess - a
workstation with cheap SATA drives? I have run many instances of MySQL,
Postgres and Oracle on *server*-grade hardware [1] with XFS for probably
the last 7 years and never had *any* data corruption issue in spite of
many power outages...

regards

Mark

[1] meaning a designated server mobo with ECC RAM and SCSI (or SAS) hard
drives.

Mark Kirkwood

Nov 2, 2009, 2:30:40 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Christopher Chan wrote:
>
>> Heh, what do you know? I have been burned by XFS after a powerloss and
>> got over 4000 zero length files in a postfix queue. No filesystem
>> corruption, just zero data files. You want to tell me that postfix does
>> not use fsync? You can guess what I did to the XFS filesystem mounted
>> for the queue directory. I destroyed it and got ext3 instead in full
>> data journal mode. Which I repeated on all the other mtas that had a XFS
>> filesystem for their mail queue.
>>
>>
>>
>
> Hmm - not gonna get into trading personal insults , as nothing is to be
> gained that way.
>
> You were running this on server grade hardware? or - let me guess - a
> workstation with cheap sata drives? I have run many instances of mysql,
> postgres and oracle on *server* grade hardware [1] with xfs for probably
> the last 7 years and never have *any* data corruption issue in spite of
> many power outages...
>
> regards
>
> Mark
>
> [1] meaning a designated server mobo with eec ram and scsi (or sas) hard
> drives.
>
>
>

Interesting data point for both of us:

http://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/

He claims ext4 is safe with sensible usage of fsync but reckons xfs is
not. Without wading through the code for the various fs it is tricky to
be 100% sure if he is correct or mistaken, as it is clearly *possible*
for the respective fs drivers to intercept the f(data)sync etc calls and
do undeserved violence to 'em....

regards

Mark

Christopher Chan

Nov 2, 2009, 2:38:56 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Christopher Chan wrote:
>
>> Heh, what do you know? I have been burned by XFS after a powerloss and
>> got over 4000 zero length files in a postfix queue. No filesystem
>> corruption, just zero data files. You want to tell me that postfix does
>> not use fsync? You can guess what I did to the XFS filesystem mounted
>> for the queue directory. I destroyed it and got ext3 instead in full
>> data journal mode. Which I repeated on all the other mtas that had a XFS
>> filesystem for their mail queue.
>>
>>
>>
>
> Hmm - not gonna get into trading personal insults , as nothing is to be
> gained that way.
>

No. This is not an insult. You are doing others a disservice by spouting
myths. I am now calling your bluff on how different filesystems behave
in regard to fsync requests and challenge you to get an authoritative
answer from any of the developers of XFS, JFS, ext(x) that contradicts
what I have said.

> You were running this on server grade hardware? or - let me guess - a
> workstation with cheap sata drives? I have run many instances of mysql,
> postgres and oracle on *server* grade hardware [1] with xfs for probably
> the last 7 years and never have *any* data corruption issue in spite of
> many power outages...
>

Did you miss my remarks about when you are not using hardware RAID + BBU
cache? You do know that such hardware covers for any shortcomings in
filesystems with regard to data consistency and that that is the reason
for the existence of such hardware?

> regards
>
> Mark
>
> [1] meaning a designated server mobo with eec ram and scsi (or sas) hard
> drives.
>
>

A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
software RAID will suffer the same results. You have been spouting
inaccurate information about filesystem behaviour that will affect those
who do not have the means to purchase your uber hardware that covers for
any filesystem's shortcomings with respect to data integrity. Others
make do with less by having a full understanding of the behaviour of the
operating systems they run, whether it is FreeBSD and softupdates or
Linux and its various filesystems that support journaling. You can get
the same data integrity on lesser hardware (motherboards supporting
ECC RAM are no longer the realm of 'server' grade motherboards) if
configured properly.

James Michael Fultz

Nov 2, 2009, 2:42:35 AM

to ubuntu...@lists.ubuntu.com

* Mark Kirkwood <mar...@paradise.net.nz> [2009-11-02 20:19 +1300]:
> Christopher Chan wrote:
[claims of mailspool corruption on XFS]

>
> You were running this on server grade hardware? or - let me guess - a
> workstation with cheap sata drives? I have run many instances of mysql,
> postgres and oracle on *server* grade hardware [1] with xfs for probably
> the last 7 years and never have *any* data corruption issue in spite of
> many power outages...

While the difference in application and work loads matters too, I've
been running XFS on my desktop computers built from consumer-grade
hardware for years without data loss despite unclean shutdowns due to
power outages.

But don't take my word for it, XFS has been fixed.

<http://sandeen.net/wordpress/?p=17>

Christopher Chan

Nov 2, 2009, 2:43:44 AM

to ubuntu...@lists.ubuntu.com

No, it is more a problem of the myth of fsync guaranteeing data is
committed to the filesystem every time.

http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html

Not even the specification explicitly spells that out.

ext4 fsync is only safe if data=journal is used and write-caches are
either disabled or have a bbu.

Mark Kirkwood

Nov 2, 2009, 2:45:51 AM

to Ubuntu user technical support, not for general discussions

Further, here is a posting about databases, xfs and fsync:

http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/

It reinforces what I've been saying - issues with device firmware can
get you into trouble; these guys needed to disable the SSD write cache
to guarantee reliability. Most database admins (including me) recommend
battery-backed RAID controllers when used with SATA drives to be sure
(no matter what fs is being used - or what operating system for that
matter).

Mark Kirkwood

Nov 2, 2009, 2:55:22 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
> No, it is more a problem of the myth of fsync guaranteeing data is
> committed to the filesystem every time.
>
> http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html
>
>
> Not even the specification explicitly spells that out.
>
>
> ext4 fsync is only safe if data=journal is used and write-caches are
> either disabled or have a bbu.
>
>

Sigh - write cache disabled *or* write barriers supported by underlying
device....

regards

Mark

Mark Kirkwood

Nov 2, 2009, 3:01:27 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
> No. This is not an insult. You are doing others a disservice by spouting
> myths. I am now calling your bluff on how different filesystems behave
> in regards to fsync requests and challenge you to get an authoritative
> answer from any of the developers of XFS, JFS, ext(x) that contradicts
> what I has said.
>
>

No bluff, just no alarmist hype either.

>
> Did you miss my remarks about when you are not using hardware raid + bbu
> cache? You do know that such hardware covers for any short comings in
> filesystems with regards to data consistency and that that is the reason
> for the existence of such hardware?
>
>

It is (or should be) widely known that cheap (s)ata drives do not honor
fsync requests (*many* google links).

>
>
> A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
> software raid will suffer the same results. You have been spouting
> inaccurate information about filesystem behaviour that will affect those
> who do not have the means to purchase your uber hardware that covers for
> any filesystem's shortcomings with respects to data integrity. Others
> make do with less by having a full understanding of the behaviour of the
> operating systems they run whether it is FreeBSD and softupdates or
> Linux and its various filesystems that support journaling. You can get
> the same data integrity on lesser hardware (motherboards supporting
> ECC-RAM are no longer the realm of 'server' grade motherboards) if
> configured properly.
>
>

No it will not. I've been a FreeBSD server admin for the last 10 years -
no data loss due to power failure on any of my servers - because I've
used reliable hardware that honors fsync.

regards

Mark

Christopher Chan

Nov 2, 2009, 3:24:23 AM

to Ubuntu user technical support, not for general discussions

>
>> Did you miss my remarks about when you are not using hardware raid + bbu
>> cache? You do know that such hardware covers for any short comings in
>> filesystems with regards to data consistency and that that is the reason
>> for the existence of such hardware?
>>
>>
>>
>
> It is (or should be) widely known that cheap (s)ata drives do not honor
> fsync requests (*many* google links).
>
>

Too bad I got the same problem with SCSI drives. There were no SATA
drives given to me during my four years as an MTA admin at Outblaze Ltd.
(2002 - 2004), and that was with server boards from Supermicro.

>> A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
>> software raid will suffer the same results. You have been spouting
>> inaccurate information about filesystem behaviour that will affect those
>> who do not have the means to purchase your uber hardware that covers for
>> any filesystem's shortcomings with respects to data integrity. Others
>> make do with less by having a full understanding of the behaviour of the
>> operating systems they run whether it is FreeBSD and softupdates or
>> Linux and its various filesystems that support journaling. You can get
>> the same data integrity on lesser hardware (motherboards supporting
>> ECC-RAM are no longer the realm of 'server' grade motherboards) if
>> configured properly.
>>
>>
>>
> No it will not. I've been a Freebsd server admin for the last 10 years -
> no data loss due to power failure on any of my servers - because I've
> used reliable hardware that honors fsync.
>
>

Yawn. Been there and done that. Without BBU-cached hardware RAID. Just
plain Linux software RAID. XFS = pray for no power loss and ext3
data=journal = sleep well at night (except for spammers getting
through the developers' webmail system).

You are using hardware RAID + BBU and you have no need to delve deep
into how the filesystems work. If you do not want to take even the
standard explanations for ext3's (which are repeated for ext4) different
journaling modes then that is just too bad. Just stop propagating the
myth that fsync = return after data has been written to the filesystem.
If that was the case, there would not be large differences in filesystem
performance.

Mark Kirkwood

Nov 2, 2009, 3:58:45 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
>
> Yawn. Been there and done that. Without bbu cached hardware raid. Just
> plain Linux software raid. XFS = pray for no power loss and ext3
> data=journal = sleep well at night (except for spammers getting
> through the developers' webmail system).
>
>
> You are using hardware raid + bbu and you have no need to delve deep
> into how the filesystems work. If you do not want to take even the
> standard explanations for ext3's (which are repeated for ext4) different
> journaling modes then that is just too bad. Just stop propagating the
> myth that fsync = return after data has been written to the filesytem.
> If that was the case, there would not be large differences in filesystem
> performance
>
>

Double yawn - of course there is a *performance* difference - different
filesystems do writes in different ways (extents vs not, for instance).
However, reliability is determined by the interaction with the device
layer's write cache (amongst other things - but that is the main effect
here). No matter how clever the filesystem - if the underlying hardware
does not actually do the writes as requested (cue cheap SATA as I've
mentioned), then all bets are off. Hence the market for battery-backed
RAID controllers for SATA drives in particular (and note that these are
worth using, particularly 3Ware and Areca).

So to reiterate - guaranteeing writes to filesystem is good - but not
good enough if the underlying device does not honor the software
request. This is the guts of most workstation corruption problems,
regardless of fs type.

For instance, I have experienced power interruption data loss on my
workstation (ext3 filesystem + cheap sata drive) - and this is expected
from this type of hardware.

regards

Mark

Mark Kirkwood

Nov 2, 2009, 4:12:08 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> So to reiterate - guaranteeing writes to filesystem is good - but not
> good enough if the underlying device does not honor the software
> request. This is the guts of most workstation corruption problems,
> regardless of fs type.
>

Of course - the above does not rule out implementation bugs with some
kernel versions and some driver variants etc etc ... which will always
be with us to some extent (cf. the Intel X25-E SSD link in previous
posting). We expect (and generally see) these fixed in subsequent versions.

Mark Kirkwood

Nov 2, 2009, 4:55:50 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
>
> Too bad I got the same problem with scsi drives. There were no sata
> drives given to me during my four years as a MTA admin in Outblaze Ltd.
> (2002 - 2004) and with server boards from Supermicro.
>
>

Interesting - did you enable the write cache? Supermicro had been my
choice of server boards (e.g P3TDER) until recently.

cheers

Mark

Rashkae

Nov 2, 2009, 6:57:24 AM

to Ubuntu user technical support, not for general discussions

Christopher Chan wrote:
> Rashkae wrote:
>>> Journaling only for metadata is not 'as much journaling as any other
>>> canditates.' You cannot say metadata journaling only as equivalent to
>>> the data and metadata journaling that is possible with ext3. XFS's
>>> journaling only provides filesystem metadata consistency which is why
>>> you get files full of NULLs after a crash/power out. MTAs rely on fsync
>>> calls and how a filesystem behaves in regards to fsync requests is the
>>> real determiner of whether there is a data guarantee or not. XFS does
>>> not provide data guarantee. I
>>>
>>
>> This is completely false,, XFS gives as much data guarantee as the other
>> filesystems in respects to an fsync. The reason files can have Null
>> bytes appended to them in XFS is because XFS, unlike ext3, will commit
>> meta data changes out of order from the data actually being written to
>> disk, but this has nothing to do with fsync, which works as intended.
>>
>>
>>
> Wake up call.
>
> http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html
>
>

I was going to reply to this, but you're not even trying. That had
nothing to do with XFS or filesystem features at all, and I'm well aware
of the potential problems with hard drive write cache... it doesn't mean
anything to this discussion.

Rashkae

Nov 2, 2009, 7:00:00 AM

to Ubuntu user technical support, not for general discussions

Ong Raphael wrote:
>
> So, is the conclusion that ext4 sucks and ext3 rocks in performance wise?
>

Err no, EXT4 is the clear performance winner. But not everyone would be
comfortable using it in a production environment due to how new it is and
the potential for there still being nasty bugs in the code (which
there likely are, whether lots of people are affected by them or not).

Rashkae

Nov 2, 2009, 7:13:23 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Mark Kirkwood wrote:
>> So to reiterate - guaranteeing writes to filesystem is good - but not
>> good enough if the underlying device does not honor the software
>> request. This is the guts of most workstation corruption problems,
>> regardless of fs type.
>>
> Of course - the above does not rule out implementation bugs with some
> kernel versions and some driver variants etc etc ... which will be
> always with us to some extent (c.f the Intel x25-E ssd link in previous
> posting). We expect (and generally see) these fixed in subsequent versions.
>
>

It was interesting how the author in that X25 link did not like the
performance with default settings, so first turned off barriers, which
exist exactly to prevent the kind of data loss he then goes on to document.

It's all well and good that he then found a way to get great write
performance in a safe manner with write-cache turned off, but I saw
nothing new or interesting in that article.

Chan Chung Hang Christopher

Nov 2, 2009, 8:37:59 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Christopher Chan wrote:
>
>> Yawn. Been there and done that. Without bbu cached hardware raid. Just
>> plain Linux software raid. XFS = pray for no power loss and ext3
>> data=journal = sleep well at night (except for spammers getting
>> through the developers' webmail system).
>>
>>
>> You are using hardware raid + bbu and you have no need to delve deep
>> into how the filesystems work. If you do not want to take even the
>> standard explanations for ext3's (which are repeated for ext4) different
>> journaling modes then that is just too bad. Just stop propagating the
>> myth that fsync = return after data has been written to the filesystem.
>> If that was the case, there would not be large differences in filesystem
>> performance.
>>
>>
>>
> Double yawn - of course there is a *performance* difference - different
> filesystems do writes different ways (extent vs not for instance),
> however reliability is determined by the interaction with the device
> layers write cache (amongst other things - but that is the main effect
> here). No matter how clever the filesystem - if the underlying hardware
> does not actually do the writes as requested (cue cheap sata as I've
> mentioned), then all bets are off. Hence the market for battery backed
> raid controllers for sata drives in particular (and note that these are
> worth using particularly 3Ware and Areca).
>

Heh. So do you care then to explain why there is a performance
difference within the SAME filesystem due to different journaling modes
chosen then? fsbench was first written to see the differences between
the various modes of ext3 and also to compare other filesystems with or
without external journals.
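
For anyone who wants a rough feel for the fsync cost being argued about
here, a minimal sketch follows. It is not fsbench (whose internals are not
shown in this thread); the function name time_fsync_writes and its
parameters are made up for illustration. The idea is simply to time small
write+fsync pairs in a directory on the filesystem under test, then repeat
on a filesystem mounted with a different journaling mode and compare.

```python
import os
import tempfile
import time


def time_fsync_writes(directory, count=20, size=4096):
    """Return the seconds each write+fsync pair took for `count` small writes.

    `directory` should live on the filesystem you want to measure.
    (Hypothetical helper for illustration; not part of fsbench.)
    """
    payload = b"x" * size
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # returns once the kernel reports the flush as done
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return timings


if __name__ == "__main__":
    samples = time_fsync_writes(tempfile.gettempdir())
    print("mean write+fsync latency: %.3f ms" % (1000 * sum(samples) / len(samples)))
```

The numbers only say how long the kernel took to report completion. They
say nothing about whether the drive's own write cache was flushed, which is
the other half of this argument.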

> So to reiterate - guaranteeing writes to filesystem is good - but not
> good enough if the underlying device does not honor the software
> request. This is the guts of most workstation corruption problems,
> regardless of fs type.
>
> For instance, I have experienced power interruption data loss on my
> workstation (ext3 filesystem + cheap sata drive) - and this is expected
> from this type of hardware.

Too bad that fsync in Linux does not flush write-caches, whether
ide/sata/scsi, if they are enabled. It does not matter whether the disk
honours the software request because there is no such request. This has been
a known problem since 2001, it was still present as of 2009, and the fsync
man page does not indicate any change. The kind of disk has nothing to do
with it.

Chan Chung Hang Christopher

Nov 2, 2009, 8:49:09 AM

to ubuntu...@lists.ubuntu.com

Maybe things have changed for XFS now but for ext3, disk = journal.

http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L71

When data=journal, data and metadata for file are written to the journal
and then fsync returns. End of story.

When data=ordered, when metadata is written via sync_inode(), fsync
returns and you hope nothing happens within the next half second if you
want data consistency too.

Hence the reason why an ext3 filesystem on software raid but mounted
data=journal and with an external journal on a bbu nvram card will blow
away other filesystems in performance and data consistency.

Comments for your pleasure:

Raphael

Nov 2, 2009, 9:28:17 AM

to Ubuntu user technical support, not for general discussions

Okay! :)

Sent from my iPod

On 02-Nov-2009, at 8:00 PM, Rashkae <ubu...@tigershaunt.com> wrote:

mar...@paradise.net.nz

Nov 3, 2009, 3:37:36 AM

to Ubuntu user technical support, not for general discussions

Good idea to post the source :-).

However it does not seem to actually support your statement.

When fs is mounted data=journal then yes - the logic goes as you suggest.
Clearly, as the data+metadata is in the journal, then this is all we need to
sync (it's a nice optimization).

In other cases (no journal, data=ordered,writeback), then the metadata is
synced to the journal, and the data buffers are synced to their respective
inodes - that is what the comments appear to say as well.

So it seems that disk = journal *only* if you are journalling the *data*! (not
that staggering an observation, but as you mentioned does explain why sometimes
data=journal performs better than the other ext3 journal options).

Also there is still the issue of does your data (or metadata) actually hit the
disk platter (whether via the journal or the file itself), and this concerns the
business of disk write caches and barrier support - since for journal or file
you gotta signal the backing device to flush. If it tells fibs to you, or your
barrier support is buggy - then you can still get data loss, no matter what fs
options are enabled.
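
As a minimal sketch of the application side of this (assuming Linux and
Python 3; the helper name durable_write and the ".tmp" suffix are made up
for illustration), this is about as much as a program can ask for: flush
the file data, rename into place, then flush the directory entry. A clean
return only means the kernel reported the flush as complete. Whether the
bytes reached the platter still depends on the write cache and barrier
behaviour described above.

```python
import os


def durable_write(path, data):
    """Write bytes to `path` via temp file + fsync + rename + directory fsync.

    Hypothetical helper for illustration; assumes a POSIX system where a
    directory can be opened read-only and passed to fsync.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # illustrative temp-name convention
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file's data
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # ask the kernel to flush the directory entry too
    finally:
        os.close(dir_fd)
```

Usage would be something like durable_write("/var/tmp/example.dat", b"payload").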

regards

Mark

Chan Chung Hang Christopher

Nov 3, 2009, 9:16:55 AM

to Ubuntu user technical support, not for general discussions

ROTFL. Nice OPTIMIZATION? One is possibly doing almost DOUBLE the
writes. It is really only an optimization if you are using ext3
data=journal for a mail queue and the journal is on a uber fast nvram
card (memory speed versus disk speed) because most mails should not
queue and if you have a nice big nvram card to act as a buffer and speed
up response to fsync calls for other cases. Hence why most people use
raid cards with nice big bbu caches nowadays. /me jumps up and down on a
bunch of 3ware 75xx/85xx cards.

> In other cases (no journal, data=ordered,writeback), then the metadata is
> synced to the journal, and the data buffers are synced to their respective
> inodes - that is what the comments appear to say as well.
>
> So it seems that disk = journal *only* if you are journalling the *data*! (not
> that staggering an observation, but as you mentioned does explain why sometimes
> data=journal performs better than the other ext3 journal options).
>
>

Not so fast pal. data=writeback issues a flush for data...and nothing
else (goto flush ... out) and data=ordered issues a call that syncs the
inode only. The only part where data buffers are synced is
data=writeback (just like what others have explained about
data=writeback) and there is no data buffer related call for data =
ordered. Just an inode sync.

However, I do have my doubts about the journal being used when
data=ordered/writeback. I have not spent a lot of time but I cannot find
where the inode sync call puts anything in the journal...the call is
generic and not specific to ext3 too. It appears things have changed
since barriers were introduced.

> Also there is still the issue of does your data (or metadata) actually hit the
> disk platter (whether via the journal or the file itself), and this concerns the
> business of disk write caches and barrier support - since for journal or file
> you gotta signal the backing device to flush. If it tells fibs to you, or your
> barrier support is buggy - then you can still get data loss, no matter what fs
> options are enabled.
>
>

Again, in Linux there ain't no signal to the disk write cache to flush.
Either you turn it off or suffer the consequences. Did you miss the
Notes at the end of the fsync (2) man page?

Mark Kirkwood

Nov 4, 2009, 4:52:36 AM

to Ubuntu user technical support, not for general discussions

Chan Chung Hang Christopher wrote:
>
>
> ROTFL. Nice OPTIMIZATION? One is possibly doing almost DOUBLE the
> writes. It is really only an optimization if you are using ext3
> data=journal for a mail queue and the journal is on a uber fast nvram
> card (memory speed versus disk speed) because most mails should not
> queue and if you have a nice big nvram card to act as a buffer and speed
> up response to fsync calls for other cases. Hence why most people use
> raid cards with nice big bbu caches nowadays. /me jumps up and down on a
> bunch of 3ware 75xx/85xx cards.
>
>
>>
>

> Not so fast pal. data=writeback issues a flush for data...and nothing
> else (goto flush ... out) and data=ordered issues a call that syncs the
> inode only. The only part where data buffers are synced is
> data=writeback (just like what others have explained about
> data=writeback) and there is no data buffer related call for data =
> ordered. Just an inode sync.
>
> However, I do have my doubts about the journal being used when
> data=ordered/writeback. I have not spent a lot of time but I cannot find
> where the inode sync call puts anything in the journal...the call is
> generic and not specific to ext3 too. It appears things have changed
> since barriers were introduced.
>
>>

Actually I think we have both misunderstood this point - because the
code we are looking at is not the whole story. How it works is that an
application calls fsync(), which will then call sys_fsync(), which will
(amongst other things) call:

- generic_block_fdatasync() to sync the *data* blocks
- ext3_sync_file() to sort out the metadata and journal stuff

Note the comments in the links you posted actually mention this. We have
been looking at the latter code only in isolation. I think this article:

http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/ssd’s-journaling-and-noatimerelatime

discusses the business quite well: data=journal *does* write the data
twice! Once to the files themselves and once to the journal. However,
under specialized circumstances this is still faster than the other
journal modes.

> Again, in Linux there ain't no signal to the disk write cache to flush.
> Either you turn it off or suffer the consequences. Did you miss the
> Notes at the end of the fsync (2) man page?
>
>

Exactly - that is precisely the point I was making previously. Note that
SCSI/SAS disks generally default to the write cache being *off* which
makes 'em safer choices for serious storage. Write cache *on* means you
are at the mercy of how good the barrier support is (not that great
generally it seems), no matter what journal options are used.

Now I think that our differing emphasis on data vs metadata is probably
due to you minding mail servers (lots of important metadata changes from
new files etc) and me minding databases (typically no important metadata
changes - e.g innodb typically has everything in 3 files...but very
important data changes - e.g. transaction logs).

In your use case, it makes sense to use data=journal. In mine typically
it does not (note that a database transaction log functions like a
journal - a serially appended file of transactions - so
data=ordered,writeback or even xfs journaling etc is not only fine but
optimal [1])!

regards

Mark

[1] Or even ext2 in some cases.

fyr...@aim.com

Oct 31, 2009, 7:17:26 PM

to ubuntu...@lists.ubuntu.com

Wow, this SATA RAID I fought so hard to install takes about 12 seconds.

john

-----Original Message-----
From: Karl F. Larsen <klar...@gmail.com>
Sent: Sat, Oct 31, 2009 12:35 am
Subject: Re: Slower performance with ext4

Amedee Van Gasse (ub) wrote:
> On Fri, October 30, 2009 07:08, Raphael wrote:
>>
>> Help, after I had clean installed Karmic on my ext4 partition, the
>> performance was significantly slower compared to ext3. Startup was around
>> 7 secs but with ext4 it's now 20 secs application speeds are also slower.
>
> What are you comparing?
> * Karmic Koala clean install on ext3
> with
> * Karmic Koala clean install on ext4
>
> Could you please do a clean install with ext3 + install bootchart, to get
> an exact timing, and then do the same with a clean ext4 install? Thank
> you.
>

    This is important. My old computer takes 45 seconds to go
from clicking Grub start to full on. I expect the speed of the
CPU is critical to a shorter time.

    I am 75 years old and 45 seconds is blinding speed!

73 Karl

--

    Karl F. Larsen, AKA K5DI
    Linux User
    #450462   http://counter.li.org.
         Key ID = 3951B48D

Mark Kirkwood

Nov 5, 2009, 3:32:45 AM

to Ubuntu user technical support, not for general discussions

Amedee Van Gasse (ub) wrote:
>
> What are you comparing?
> * Karmic Koala clean install on ext3
> with
> * Karmic Koala clean install on ext4
>
> Could you please do a clean install with ext3 + install bootchart, to get
> an exact timing, and then do the same with a clean ext4 install? Thank
> you.
>
>

Good advice,

You might want to also do a boot with your ext4 filesystems mounted with
'barrier=0' in fstab.
Cheers

Mark

P.s: Apologies for helping to drag this thread a little off topic for a
while there - but hopefully some of you found the discussion interesting
anyway!

Raphael

Nov 5, 2009, 5:43:46 AM

to Ubuntu user technical support, not for general discussions

Sure, I'll clean install after I re-clean install Windows :p Got some hardware problems yesterday.

Thanks for the loads of replies!

Sent from my iPod

Good advice,

Mark

Fred Roller

Nov 5, 2009, 8:18:18 AM

to Ubuntu user technical support, not for general discussions

Mark Kirkwood wrote:
> Amedee Van Gasse (ub) wrote:
>
>> What are you comparing?
>> * Karmic Koala clean install on ext3
>> with
>> * Karmic Koala clean install on ext4
>>
>> Could you please do a clean install with ext3 + install bootchart, to get
>> an exact timing, and then do the same with a clean ext4 install? Thank
>> you.
>>
>>
>>
>
> Good advice,
>
> You might want to also do a boot with your ext4 filesystems mounted with
> 'barrier=0' in fstab.
>

Can you explain what this does?

--
Fred
www.fwrgallery.com

"Life is like linux, simple. If you are fighting it you are doing something wrong."

Steve Flynn

Nov 5, 2009, 8:27:37 AM

to Ubuntu user technical support, not for general discussions

On Thu, Nov 5, 2009 at 1:18 PM, Fred Roller <fro...@tnclimited.com> wrote:

>> You might want to also do a boot with your ext4 filesystems mounted with
>> 'barrier=0' in fstab.
>
> Can you explain what this does?

Good explanation here Fred:

http://kernelnewbies.org/Ext4#head-25c0a1275a571f7332fa196d4437c38e79f39f63

--
Steve
When one person suffers from a delusion it is insanity. When many
people suffer from a delusion it is called religion.

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Chan Chung Hang Christopher

Nov 5, 2009, 8:34:44 AM

to Ubuntu user technical support, not for general discussions

>> You might want to also do a boot with your ext4 filesystems mounted with
>> 'barrier=0' in fstab.
>>
>>
>
> Can you explain what this does?
>
>
>
>

It disables write barriers. Write barriers are enabled by default on
ext4. Blast Mark, making me bone up on what is going on lately. :-)

That means that if write-caches are enabled on disks, you are at risk of
losing data in the event of a sudden power loss but you get better
performance in return. Write barriers allow you to have write-caches
enabled and not have to risk losing data by ensuring that data is safely
on disk before saying "It's done."

However, not everything disk related supports write-barriers, namely
device-mapper, so if you use LVM or any md module other than raid1, you
better turn write-caches off or get yourself a hardware raid card with
bbu cache or a bbu nvram card and data=journal.
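
To see which way a mounted filesystem is currently set, a rough sketch
along these lines could read the option field of each ext4 line in
/proc/mounts. The function names are made up for illustration, and the
sketch assumes the option spellings barrier=0, barrier=1, barrier and
nobarrier. If none of them is listed, it reports "default" rather than
guessing what the running kernel does.

```python
def barrier_state(options):
    """Classify a comma-separated mount option string.

    Returns "disabled", "enabled", or "default" when no barrier option
    is listed. (Hypothetical helper for illustration.)
    """
    state = "default"
    for opt in options.split(","):
        if opt in ("barrier=0", "nobarrier"):
            state = "disabled"
        elif opt in ("barrier=1", "barrier"):
            state = "enabled"
    return state


def ext4_barrier_report(mounts_text):
    """Map mount point -> barrier state for each ext4 line.

    `mounts_text` is expected in /proc/mounts format:
    device, mount point, fs type, options, dump, pass.
    """
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ext4":
            report[fields[1]] = barrier_state(fields[3])
    return report
```

On a Linux box this could be fed with open("/proc/mounts").read(). It only
reports the mount options. It cannot tell you whether the layers underneath
(LVM, md, the disk itself) actually honour the barrier.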

Fred Roller

Nov 5, 2009, 12:51:35 PM

to ubuntu...@lists.ubuntu.com

Chan Chung Hang Christopher wrote:

>>> You might want to also do a boot with your ext4 filesystems mounted with
>>> 'barrier=0' in fstab.
>>>
>>>
>>>
>> Can you explain what this does?
>>
>>
>>
>>
>>
>
>
> It disables write barriers. Write barriers are enabled by default on
> ext4. Blast Mark, making me bone up on what is going on lately. :-)
>
> That means that if write-caches are enabled on disks, you are at risk of
> losing data in the event of a sudden power loss but you get better
> performance in return. Write barriers allow you to have write-caches
> enabled and not have to risk losing data by ensuring that data is safely
> on disk before saying "It's done."
>
> However, not everything disk related supports write-barriers, namely
> device-mapper, so if you use LVM or any md module other than raid1, you
> better turn write-caches off or get yourself a hardware raid card with
> bbu cache or a bbu nvram card and data=journal.
>
>

Steve and Chan, thank you. Bookmarked the link for more in-depth
reading later.

--
Fred
www.fwrgallery.com

"Life is like linux, simple. If you are fighting it you are doing something wrong."
