--
ubuntu-users mailing list
ubuntu...@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
Raphael wrote:
/me rotfl. On RHEL/Centos, ext3 is the only filesystem available. Not
sure if they will offer ext3 + ext4 with RHEL6. Why?
ext4 is an unproven filesystem. Data loss anyone? Not like ext3 and jfs.
XFS codebase is so blooming big, I am not sure that it will ever reach
the state that ext3 and jfs currently are.
> Data loss anyone?

What evidence do you have that there would be data loss? ext2 and ext3 were used almost immediately after their release as well. The distro maintainers usually do some basic reliability tests or at least have access to such tests. So I would be happy to read any tests you've seen that suggest ext4 is unreliable. To start scaring people with talk of data loss based on random speculation would not be good.
Raphael wrote:
> But isn't the default filesystem for ubuntu ext4, and isn't it the default for
a reason?
>
/me rotfl. On RHEL/Centos, ext3 is the only filesystem available. Not
sure if they will offer ext3 + ext4 with RHEL6. Why?
ext4 is an unproven filesystem. Not like ext3 and jfs.
It's not random speculation:
--
Steve
When one person suffers from a delusion it is insanity. When many
people suffer from a delusion it is called religion.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
It's a brand-spanking-new filesystem. Fear of data loss is natural and
understandable.
--
Avi Greenbury
http://aviswebsite.co.uk ;)
http://aviswebsite.co.uk/asking-questions
the performance was significantly slower compared to ext3.
Startup was around 7 secs but with ext4 it's now 20 secs
application
speeds are also slower.
If you can document this in a bug report it will help the
designers of ext4 to work on their project to speed it up.
I have been working on my wife's Windows XP and it is sure
lazy coming on. They have an early Microsoft ad, and then XP
starts to unfold...
Karl
>
> Regards,
> Raphael
>
>
>
--
Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D
Dude, I used to work with clusters of mta boxes. The last thing I needed
then was a filesystem that loses data or corrupts its metadata easily. I
wait before using any new fangled filesystem regardless of how uber fast
it is or I play the pull the plug game with them with whatever
journaling mode they have available.
ext4 data loss reports started with Ubuntu Jaunty I think too?
> From: Steve Flynn <another...@gmail.com>
> Subject: Re: Slower performance with ext4
> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>
> Date: Friday, October 30, 2009, 6:38 AM
> On Fri, Oct 30, 2009 at 10:58 AM, <fyr...@netscape.net> wrote:
> > > Data loss anyone?
> >
> > What evidence do you have that there would be data loss? ext2 and ext3 were
> > used almost immediately after their release as well. The distro maintainers
> > usually do some basic reliability tests or at least have access to such
> > tests. So I would be happy to read any tests you've seen that suggest ext4
> > is unreliable. To start scaring people with talk of data loss based on
> > random speculation would not be good.
>
> It's not random speculation:
>
> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
>
>
Reading the first couple of items shown in your above link it does appear to have been fixed in both Jaunty and Karmic. Your comment may not be random speculation but it does appear to be outdated. I'm using Karmic Beta 64 bit and have not experienced any data loss, thanks for that.
Leonard Chatagnier
lenc...@sbcglobal.net
> From: Karl F. Larsen <klar...@gmail.com>
> Subject: Re: Slower performance with ext4
> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>
> Date: Friday, October 30, 2009, 7:05 AM
> Raphael wrote:
> >
> > Help, after I had clean installed Karmic on my ext4 partition,
> > the performance was significantly slower compared to ext3.
> > Startup was around 7 secs but with ext4 it's now 20 secs
> > application speeds are also slower.
>
> If you can document this in a bug report it will help the
> designers of ext4 to work on their project to speed it up.
>
> I have been working on my wife's Windows XP and it is sure
> lazy coming on. They have an early Microsoft ad, and then
> XP starts to unfold...
>
> Karl
>
>
Karl, try reading the link the OP gave. A bug report has already been filed. Actually it was on data loss as far as I read.
Leonard Chatagnier
lenc...@sbcglobal.net
Actually I forgot to post the link I was really after. The release
notes for 9.10 highlight possible filesystem corruption with large
files (> 512 MB)...
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579
It is a little confusing; subject vs. link. However, I find Karmic much faster than ext3 on everything except browser surfing but that's not the same issue.
Leonard Chatagnier
lenc...@sbcglobal.net
> From: Steve Flynn <another...@gmail.com>
> Subject: Re: Slower performance with ext4
> To: "Ubuntu user technical support, not for general discussions" <ubuntu...@lists.ubuntu.com>
> Date: Friday, October 30, 2009, 8:08 AM
> On Fri, Oct 30, 2009 at 12:21 PM, Leonard Chatagnier <lenc...@sbcglobal.net> wrote:
> >
> >> It's not random speculation:
> >>
> >> http://www.google.co.uk/search?q=ext4+dataloss+reports&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
> >>
> > Reading the first couple of items shown in your above link it does appear
> > to have been fixed in both Jaunty and Karmic. Your comment may not be
> > random speculation but it does appear to be outdated. I'm using Karmic
> > Beta 64 bit and have not experienced any data loss, thanks for that.
>
> Actually I forgot to post the link I was really after. The release
> notes for 9.10 highlight possible filesystem corruption with large
> files (> 512 MB)...
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579
>
>
Interesting read. However, I don't see anything like it on my Karmic 64 bit without any lvm or raid setup, just a plain install. I've downloaded and burned the Karmic beta live DVD, a large file, without any issues. I even checked the md5sum, sha1sum and the sha256sum (think that's right) and all checked out, including the disc verification on boot up.
As I mentioned to Karl in his reply, I find Karmic very fast compared to older versions except for browsing speed but that is another issue entirely, I believe.
Apparently, everyone's case is unique and YMMV. I do have a gut feeling that some of the issues are hardware related.
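
For anyone wanting to repeat the checksum verification described above on a large download, here is a minimal sketch in Python. The function name file_digests and its parameters are made up for illustration; it only assumes the standard hashlib module and reads the file in chunks so a DVD-sized image does not have to fit in memory.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at path.

    The file is read once, in chunks, and every hasher is fed the same data.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Running it twice on the same file and comparing the results is also a cheap sanity check: a stable file on healthy hardware should give identical digests every time.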
Leonard Chatagnier
lenc...@sbcglobal.net
Saying weird things like Karmic (an OS) being faster than ext3 (a
filesystem) just adds to the confusion.
Just do something like this: http://www.htiweb.inf.br/benchmark/fsbench.htm
For those interested, I have a tarball of fsbench (perl scripts emulating
delivery to maildirs) if you wish to see filesystem performance when
used for a mail store.
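
To give a rough idea of what a maildir-style delivery test measures, here is a small Python sketch. It is not fsbench and does not reproduce its workload; the names deliver and time_deliveries, the message size and the message count are all assumptions chosen for illustration. Each message is written into tmp/, optionally fsync'ed, then renamed into new/, which is the usual maildir delivery pattern.

```python
import os
import time


def deliver(maildir, name, body, sync=True):
    """Deliver one message: write to tmp/, optionally fsync, rename into new/."""
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "wb") as handle:
        handle.write(body)
        if sync:
            handle.flush()              # push Python's buffer to the kernel
            os.fsync(handle.fileno())   # ask the kernel to push it to storage
    os.rename(tmp_path, new_path)
    return new_path


def time_deliveries(maildir, count=100, size=4096, sync=True):
    """Return the seconds taken to deliver `count` messages of `size` bytes."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    body = b"x" * size
    start = time.perf_counter()
    for index in range(count):
        deliver(maildir, "msg-%06d" % index, body, sync=sync)
    return time.perf_counter() - start
```

Calling time_deliveries() once with sync=True and once with sync=False on a scratch directory of the filesystem under test shows how much of the delivery time is spent waiting on fsync. The absolute numbers depend heavily on the disk, its write cache and the mount options.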
What are you comparing?
* Karmic Koala clean install on ext3
with
* Karmic Koala clean install on ext4
Could you please do a clean install with ext3 + install bootchart, to get
an exact timing, and then do the same with a clean ext4 install? Thank
you.
Ext4 uses a filesystem performance technique called allocate-on-flush,
also known as delayed allocation. It consists of delaying block allocation
until the data is going to be written to the disk, unlike some other file
systems, which may allocate the necessary blocks before that step. This
improves performance and reduces fragmentation by improving block
allocation decisions based on the actual file size.
Delayed allocation poses some additional risk of data loss in cases where
the system crashes before all of the data has been written to the disk.
The typical scenario in which this might occur is a program replacing the
contents of a file without forcing a write to the disk with fsync.
Problems can arise if the system crashes before the actual write occurs.
In this situation, users of ext3 have come to expect that the disk will
hold either the old version or the new version of the file following the
crash. However, the ext4 code in the Linux kernel version 2.6.28 will
often clear the contents of the file before the crash, but never write the
new version, thus losing the contents of the file entirely.
Altering this behavior by using fsync more often could lead to severe
performance penalties on ext3 filesystems mounted with the data=ordered
flag (the default on most Linux distributions). Given that both
file-systems will be in use for some time, this complicates matters
enormously for end-user application developers. In response, Theodore Ts'o
has written some patches for ext4 that cause it to limit its delayed
allocation in these common cases. For a small cost in performance, this
will significantly increase the chance that either version of the file
will survive the crash.
The new patches are expected to become part of the mainline kernel 2.6.30.
Various distributions may choose to backport them to 2.6.28 or 2.6.29, for
instance Ubuntu made them part of the 2.6.28 kernel in version 9.04—Jaunty
Jackalope.
(from Wikipedia)
And it was solved for Ubuntu 9.04 by Theodore Ts'o...
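
The "replace a file without fsync" scenario from the quoted text is easier to see in code. The sketch below shows the commonly recommended pattern of writing a temporary file, fsync'ing it, then renaming it over the original. The function name atomic_replace is hypothetical, and the final directory fsync assumes a POSIX system such as Linux.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of path with data via write, fsync, rename.

    After a crash the file should hold either the old or the new contents,
    rather than ending up empty.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic. mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())   # new data reaches storage before the rename
        os.replace(tmp_path, path)      # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is recorded (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Leaving out the os.fsync() call before the rename is exactly the case the quoted text describes: with delayed allocation the rename can reach the disk before the data does.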
I am 75 years old and 45 seconds is blinding speed!
73 Karl
-- Roy Smith Ubuntu 9.10 Karmic Koala Registered Linux User #488144 Registered Ubuntu User #26841
You will get similar performance to ext3 if you disable write barrier
support (barrier=0 in fstab) see:
http://kernelnewbies.org/Ext4#head-25c0a1275a571f7332fa196d4437c38e79f39f63
The write barrier support is a safety feature, however. That said, my work
machine uses ext4 with barriers disabled, so far without any issues.
This got me wondering about barrier support in ext3 - is it not
implemented or just off by default? From my brief glance at ext3 code,
there appears to be some stuff about barriers in there, but I couldn't
see anything about the defaults (you can specify barrier=0/1 for an ext3
filesystem when mounting - not sure if it has any effect tho!).
Regards
Mark
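
One way to see which barrier and journaling options a mounted filesystem was given is to read the option list the kernel reports in /proc/mounts. The Python sketch below only parses text in that format; the helper names parse_mounts and barrier_setting are made up for illustration. Whether an unstated option means barriers are on or off depends on the filesystem and kernel version, so the code reports "default" rather than guessing.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text.

    Returns {mount_point: (fs_type, {option: value, or True for bare flags})}.
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, raw_options = fields[:4]
        options = {}
        for item in raw_options.split(","):
            key, _, value = item.partition("=")
            options[key] = value if value else True
        mounts[mount_point] = (fs_type, options)
    return mounts


def barrier_setting(options):
    """Return 'on', 'off', or 'default' when no barrier option is listed."""
    if "nobarrier" in options:
        return "off"
    value = options.get("barrier")
    if value is None:
        return "default"  # not stated; the filesystem's built-in default applies
    return "off" if value == "0" else "on"
```

On a Linux box, feeding it the real file would look like parse_mounts(open("/proc/mounts").read()). The same options dictionary also exposes the data= journaling mode when the kernel lists it.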
reiser4? Seriously? Did it make it into the mainline kernel or something? I
have not followed reiser4 much after Hans got jailed.
The early data loss in Jaunty was really applications clobbering their
own files combined with EXT4's delayed allocation. Basically, EXT4 was
behaving, for all intents and purposes, like XFS, without the null
bytes. (I still question the sanity of whoever thought this would be a
good idea.. after all, wouldn't we all be using XFS years ago if this
behaviour was so superior?) The patches later backported to change
that introduced a kernel soft lockup bug in the Ubuntu kernel (that was
never confirmed in the mainline kernel). And now we have unconfirmed
sightings of data corruption, but the one person who claims to reproduce
that looks like he has memory corruption issues. (He gets a different
md5sum every time he checks the same file... not really a filesystem
issue there.)
None of this is really applicable to your point. For a mission critical
production system, you want to use what's known and proven (I do find
the choice of jfs odd however. I like EXT3 for reliable and
predictable, and XFS for performance, so long as I know my particular
workload won't be affected by XFS's null bytes on unclean shutdown.)
However, the improvements EXT4 has made to the workloads that caused
EXT3 bad performance are amazing.. I'm much looking forward to the
testing/proving phase to be done.
XFS's blooming aggressive caching and lack of full journaling is a
disaster waiting to happen for mta queues. If you are running Centos,
you only get ext3...
JFS seems to have the second best performance overall according to Bruce
Guenter's maildir simulated local mail delivery benchmark. and it is
stable too.
> However, the improvements EXT4 has made to the workloads that caused
> EXT3 bad performance are amazing.. I'm much looking forward to the
> testing/proving phase to be done.
>
>
:-D. Looking for a guinea pig that is willing to run fsbench from Bruce
Guenter. Too bad it is of interest only to mail admins :-P
XFS has as much journaling as any other candidates. Journal for
metadata. And all MTA's, reportedly, write files in a sane manner and
never assume a file is written to disk until the fsync completes, and
therefore, are not at all affected by XFS aggressive caching. Mail
server is therefore one of the workloads XFS is best suited for.
> JFS seems to have the second best performance overall according to Bruce
> Guenter's maildir simulated local mail delivery benchmark. and it is
> stable too.
JFS performs great in benchmarks, but back when I used to use it, I've
consistently been able to bend it out of shape under real world
conditions.. No data loss mind you, but damaged meta data (fixed with
jfs repair, but that should never be needed in a modern file system) and
bizarre corner cases that caused performance to sink through the floor.
(in one instance, I was able to reproduce an issue where reading a file
while writing new files to disk would perform poorly depending on
whether the filename had one . or two. Ie, if the filename was
something.tar.gz, or renamed to something.tgz.) At one time in the
distant past, someone completely broke quota support in JFS, and no one
even noticed for 4 kernel releases. JFS just doesn't seem to have
enough people using it to maintain a well tested status.
Journaling only for metadata is not 'as much journaling as any other
candidates.' You cannot treat metadata-only journaling as equivalent to
the data and metadata journaling that is possible with ext3. XFS's
journaling only provides filesystem metadata consistency which is why
you get files full of NULLs after a crash/power out. MTAs rely on fsync
calls and how a filesystem behaves in regards to fsync requests is the
real determiner of whether there is a data guarantee or not. XFS does
not provide data guarantee. It, at best, provides a metadata guarantee.
XFS should not be used for mta queues unless it is in conjunction with
hardware raid that has a bbu cache. XFS is best suited for streaming
applications where the data loss is tolerated.
>
>> JFS seems to have the second best performance overall according to Bruce
>> Guenter's maildir simulated local mail delivery benchmark. and it is
>> stable too.
>>
>
> JFS performs great in benchmarks, but back when I used to use it, I've
> consistently been able to bend it out of shape under real world
> conditions.. No data loss mind you, but damaged meta data (fixed with
> jfs repair, but that should never be needed in a modern file system) and
> bizarre corner cases that caused performance to sink through the floor.
> (in one instance, I was able to reproduce an issue where reading a file
> while writing new files to disk would perform poorly depending on
> whether the filename had one . or two. Ie, if the filename was
> something.tar.gz, or renamed to something.tgz.) At one time in the
> distant past, someone completely broke quota support in JFS, and no one
> even noticed for 4 kernel releases. JFS just doesn't seem to have
> enough people using it to maintain a well tested status.
>
>
>
I would put that to no one being bothered to report bugs and also the
lack of users.
Qualification: Behaving the way XFS did a few years ago. It was fixed
and has been for a few years now.
> [...] I like EXT3 for reliable and predictable, and XFS for
> performance, so long as I know my particular workload won't be
> affected by XFS's null bytes on unclean shutdown.)
I've been using XFS since its problems were fixed and haven't suffered
any data loss due to it despite many unclean shutdowns due to power
loss.
I am now considering using ext4 when I upgrade to 9.10, however.
The benchmark that I looked at emulated real world conditions. Delivery
to a maildir (fsbench). Not iffy benchmarks like hdparm or bonnie or
postmark.
This is completely false. XFS gives as much data guarantee as the other
filesystems with respect to an fsync. The reason files can have null
bytes appended to them in XFS is because XFS, unlike ext3, will commit
meta data changes out of order from the data actually being written to
disk, but this has nothing to do with fsync, which works as intended.
Cheers
Mark
[1] You do have to consider whether the underlying disk firmware honors
the fsync request to flush - this is why scsi disks are still often
preferred for data critical situations. It is only recently with the
advent of more advanced sata firmware that they too are now reasonably
usable in those situations (tho you want to leave write barrier support
enabled then!)
http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html
Even the definition of fsync indicates that not every fsync call returns
after data is safely on disk. fsync returns after metadata has hit disk
in the case of XFS, JFS, reiserfs and ext3 ordered mode and by 'hit
disk' I mean the journal of the filesystem and not the actual location
in the filesystem.
http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html
'In the middle ground between these extremes, /fsync/() might or might
not actually cause data to be written where it is safe from a power
failure.'
For this reason, some people might go to such lengths as installing
nvram cards for use as external journals for ext3 in full journaling
mode in conjunction with their software raid or get a hardware raid that
has bbu cache.
Yes...where *disk* = journal. Which for JFS, XFS and ext3 data=ordered
means metadata only. Only ext3 data=journal guarantees data and
metadata. Feel free to get (whichever filesystem developer) to confirm for
me because you won't get any other answer than what I have just posted.
The vast majority of the world databases and mail servers depend on the
fact that fsync forces modified *data* buffers to their respective file
on disk.
The zero length files that people dislike so much on xfs are caused by
applications that do *not* request an fsync - and also cheap sata disks
that do not honor fsync's request to actually write the buffers...
thankfully these are less common now (especially for serious sata drives
like WD's Velociraptor).
regards
Mark
Maybe you want to first VERIFY with the various filesystem developers
before you start yapping what appears to be the only sensible
explanation but is in fact a myth. On Linux, XFS, JFS and ext3
data=ordered return fsync as soon as the metadata hits the journal on
disk and before the data is committed to its location on the filesystem
and metadata is committed to its location. ext3 data=journal returns
after both data and metadata is committed to disk on the JOURNAL and
before they are written to their locations in the filesystem. I have not
yet looked at ext4 so I will not say anything about what it does.
> The vast majority of the world databases and mail servers depend on the
> fact that fsync forces modified *data* buffers to their respective file
> on disk.
>
Sure. Too bad that is not always true.
> The zero length files that people dislike so much on xfs are caused by
> applications that do *not* request an fsync - and also cheap sata disks
> that do not honor fsync's request to actually write the buffers...
> thankfully these are less common now (especially for serious sata drives
> like WD's Velociraptor).
>
>
Heh, what do you know? I have been burned by XFS after a powerloss and
got over 4000 zero length files in a postfix queue. No filesystem
corruption, just zero data files. You want to tell me that postfix does
not use fsync? You can guess what I did to the XFS filesystem mounted
for the queue directory. I destroyed it and got ext3 instead in full
data journal mode. Which I repeated on all the other mtas that had a XFS
filesystem for their mail queue.
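
After an unclean shutdown, a quick way to gauge the damage in a queue directory like the one described above is to count the zero-length files. A minimal Python sketch follows; the function name zero_length_files is hypothetical, and whatever path you pass in is your own assumption about where the queue lives.

```python
import os
import stat


def zero_length_files(root):
    """Yield paths of regular files under root whose size is zero bytes."""
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # the file vanished; queues change while being scanned
            if stat.S_ISREG(info.st_mode) and info.st_size == 0:
                yield path
```

Something like sum(1 for _ in zero_length_files(queue_dir)) gives the count. A zero-length file is not always a casualty, since some applications create empty files on purpose, so the result is a starting point for inspection rather than proof of loss.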
I do not know the answer to that because I have not looked at how ext4
behaves. Performance wise, if you are not using hardware raid with bbu
cache, ext3 in data=journal mode and with its journal stored externally
on a bbu nvram card will blow XFS out of the water and still guarantee
data and metadata consistency.
> ------------------------------------------------------------------------
> *From:* Christopher Chan <christop...@bradbury.edu.hk>
> *To:* "Ubuntu user technical support, not for general discussions"
> <ubuntu...@lists.ubuntu.com>
> *Sent:* Monday, 2 November 2009 1:49:48
> *Subject:* Re: Slower performance with ext4
It appears that ext4 shares the same journaling modes as ext3 with some
data-integrity and performance enhancements. Anyway, here is the
relevant portion of another party's comments on filesystem behaviour
that you may find more acceptable than mine.
"Ext4 supports multiple modes of journaling, depending upon the needs of
the user. For example, ext4 supports a mode in which only metadata is
journaled (Writeback mode), a mode in which metadata is journaled but
data is written as the metadata is written from the journal (Ordered
mode), and a mode in which both metadata and data are journaled (Journal
mode—the most reliable mode). Note that Journal mode, although the best
way to ensure a consistent file system, is also the slowest, because all
data flows through the journal."
From: http://www.ibm.com/developerworks/linux/library/l-anatomy-ext4/
In case you have not figured out why ext3 + external journal on uber
fast and secure bbu nvram card blows everything else, except filesystems
on hardware raid + bbu cache, out of the water, it is because fsync
returns as soon as stuff is committed to...the journal.
Hmm - not gonna get into trading personal insults, as nothing is to be
gained that way.
You were running this on server grade hardware? or - let me guess - a
workstation with cheap sata drives? I have run many instances of mysql,
postgres and oracle on *server* grade hardware [1] with xfs for probably
the last 7 years and never have *any* data corruption issue in spite of
many power outages...
regards
Mark
[1] meaning a designated server mobo with ECC RAM and scsi (or sas) hard
drives.
Interesting data point for both of us:
http://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/
He claims ext4 is safe with sensible usage of fsync but reckons xfs is
not. Without wading through the code for the various fs it is tricky to
be 100% sure if he is correct or mistaken, as it is clearly *possible*
for the respective fs drivers to intercept the f(data)sync etc calls and
do undeserved violence to 'em....
regards
Mark
No. This is not an insult. You are doing others a disservice by spouting
myths. I am now calling your bluff on how different filesystems behave
in regards to fsync requests and challenge you to get an authoritative
answer from any of the developers of XFS, JFS, ext(x) that contradicts
what I have said.
> You were running this on server grade hardware? or - let me guess - a
> workstation with cheap sata drives? I have run many instances of mysql,
> postgres and oracle on *server* grade hardware [1] with xfs for probably
> the last 7 years and never have *any* data corruption issue in spite of
> many power outages...
>
Did you miss my remarks about when you are not using hardware raid + bbu
cache? You do know that such hardware covers for any shortcomings in
filesystems with regards to data consistency and that that is the reason
for the existence of such hardware?
> regards
>
> Mark
>
> [1] meaning a designated server mobo with ECC RAM and scsi (or sas) hard
> drives.
>
>
A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
software raid will suffer the same results. You have been spouting
inaccurate information about filesystem behaviour that will affect those
who do not have the means to purchase your uber hardware that covers for
any filesystem's shortcomings with respects to data integrity. Others
make do with less by having a full understanding of the behaviour of the
operating systems they run whether it is FreeBSD and softupdates or
Linux and its various filesystems that support journaling. You can get
the same data integrity on lesser hardware (motherboards supporting
ECC-RAM are no longer the realm of 'server' grade motherboards) if
configured properly.
While the difference in application and work loads matters too, I've
been running XFS on my desktop computers built from consumer-grade
hardware for years without data loss despite unclean shutdowns due to
power outages.
But don't take my word for it, XFS has been fixed.
<http://sandeen.net/wordpress/?p=17>
No, it is more a problem of the myth of fsync guaranteeing data is
committed to the filesystem every time.
http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html
Not even the specification explicitly spells that out.
ext4 fsync is only safe if data=journal is used and write-caches are
either disabled or have a bbu.
Further, here is a posting about databases, xfs and fsync:
It reinforces what I've been saying - issues with device firmware can
get you into trouble, these guys needed to disable the ssd write cache
to guarantee reliability. Most database admins (including me) recommend
battery backed raid controllers when used with sata drives to be sure
(no matter what fs is being used - or what operating system for that
matter).
Sigh - write cache disabled *or* write barriers supported by underlying
device....
regards
Mark
>
> Did you miss my remarks about when you are not using hardware raid + bbu
> cache? You do know that such hardware covers for any short comings in
> filesystems with regards to data consistency and that that is the reason
> for the existence of such hardware?
>
>
It is (or should be) widely known that cheap (s)ata drives do not honor
fsync requests (*many* google links).
>
>
> A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
> software raid will suffer the same results. You have been spouting
> inaccurate information about filesystem behaviour that will affect those
> who do not have the means to purchase your uber hardware that covers for
> any filesystem's shortcomings with respects to data integrity. Others
> make do with less by having a full understanding of the behaviour of the
> operating systems they run whether it is FreeBSD and softupdates or
> Linux and its various filesystems that support journaling. You can get
> the same data integrity on lesser hardware (motherboards supporting
> ECC-RAM are no longer the realm of 'server' grade motherboards) if
> configured properly.
>
>
No it will not. I've been a Freebsd server admin for the last 10 years -
no data loss due to power failure on any of my servers - because I've
used reliable hardware that honors fsync.
regards
Mark
Too bad I got the same problem with scsi drives. There were no sata
drives given to me during my four years as a MTA admin in Outblaze Ltd.
(2002 - 2004) and with server boards from Supermicro.
>> A server motherboard that uses ECC RAM and SAS/SCSI hard drives and
>> software raid will suffer the same results. You have been spouting
>> inaccurate information about filesystem behaviour that will affect those
>> who do not have the means to purchase your uber hardware that covers for
>> any filesystem's shortcomings with respects to data integrity. Others
>> make do with less by having a full understanding of the behaviour of the
>> operating systems they run whether it is FreeBSD and softupdates or
>> Linux and its various filesystems that support journaling. You can get
>> the same data integrity on lesser hardware (motherboards supporting
>> ECC-RAM are no longer the realm of 'server' grade motherboards) if
>> configured properly.
>>
>>
>>
> No it will not. I've been a Freebsd server admin for the last 10 years -
> no data loss due to power failure on any of my servers - because I've
> used reliable hardware that honors fsync.
>
>
Yawn. Been there and done that. Without bbu cached hardware raid. Just
plain Linux software raid. XFS = pray for no power loss and ext3
data=journal = sleep well at night (except for spammers getting
through the developers' webmail system).
You are using hardware raid + bbu and you have no need to delve deep
into how the filesystems work. If you do not want to take even the
standard explanations for ext3's (which are repeated for ext4) different
journaling modes then that is just too bad. Just stop propagating the
myth that fsync = return after data has been written to the filesystem.
If that was the case, there would not be large differences in filesystem
performance.
So to reiterate - guaranteeing writes to filesystem is good - but not
good enough if the underlying device does not honor the software
request. This is the guts of most workstation corruption problems,
regardless of fs type.
For instance, I have experienced power interruption data loss on my
workstation (ext3 filesystem + cheap sata drive) - and this is expected
from this type of hardware.
regards
Mark
I was going to reply to this, but you're not even trying. That had
nothing to do with XFS or filesystem features at all, and I'm well aware
of the potential problems with hard drive write cache... doesn't mean
anything to this discussion.
Err no, EXT4 is the clear performance winner. But not everyone would be
comfortable using it in production environment due to how new it is, and
the potential for there still being nasty bugs in the code. (which
likely there are, whether lots of people are affected by them or not.)
It was interesting how the author in that X25 link did not like the
performance with default settings, so first turned off barriers, which
exist exactly to prevent the kind of data loss he then goes on to document.
It's all well and good that he then found a way to get great write
performance in a safe manner with write-cache turned off, but I saw
nothing new or interesting in that article
Heh. So do you care then to explain why there is a performance
difference within the SAME filesystem due to different journaling modes
chosen then? fsbench was first written to see the differences between
the various modes of ext3 and also to compare other filesystems with or
without external journals.
> So to reiterate - guaranteeing writes to filesystem is good - but not
> good enough if the underlying device does not honor the software
> request. This is the guts of most workstation corruption problems,
> regardless of fs type.
>
> For instance, I have experienced power interruption data loss on my
> workstation (ext3 filesystem + cheap sata drive) - and this is expected
> from this type of hardware.
Too bad that fsync in Linux does not flush drive write caches, whether
IDE, SATA or SCSI, when they are enabled. It does not matter whether the
disk honours the software request because there is no such request. This
has been a known problem since 2001, it was still present as of at least
2009, and the fsync man page does not indicate any change. The kind of
disk has nothing to do with it.
Maybe things have changed for XFS now but for ext3, disk = journal.
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L71
When data=journal, data and metadata for the file are written to the
journal and then fsync returns. End of story.
When data=ordered, fsync returns once the metadata is written via
sync_inode(), and you hope nothing happens within the next half second if
you want data consistency too.
Hence the reason why an ext3 filesystem on software raid, mounted
data=journal and with an external journal on a bbu nvram card, will blow
away other filesystems in performance and data consistency.
Comments for your pleasure:
/*
 * data=writeback:
 * The caller's filemap_fdatawrite()/wait will sync the data.
 * sync_inode() will sync the metadata
 *
 * data=ordered:
 * The caller's filemap_fdatawrite() will write the data and
 * sync_inode() will write the inode if it is dirty. Then the caller's
 * filemap_fdatawait() will wait on the pages.
 *
 * data=journal:
 * filemap_fdatawrite won't do anything (the buffers are clean).
 * ext3_force_commit will write the file data into the journal and
 * will wait on that.
 * filemap_fdatawait() will encounter a ton of newly-dirtied pages
 * (they were dirtied by commit). But that's OK - the blocks are
 * safe in-journal, which is all fsync() needs to ensure.
 */
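For anyone following along from the application side, here is a minimal Python sketch of the "write, fsync, and hope" pattern when creating a new file, the way a mail queue would. durable_write is a hypothetical helper name, not anything from ext3 or a real MTA, and it assumes a Linux-like system where a directory can be opened and fsynced. A successful fsync here only means the kernel has done its part; as argued in this thread, whether the bytes survive a power cut still depends on the journaling mode, barriers and the drive's write cache.

```python
import os


def durable_write(path: str, payload: bytes) -> None:
    """Write payload to path, then ask the kernel to flush file and directory.

    Hypothetical helper for illustration only. It cannot force a drive with
    a volatile write cache to commit anything to the platter.
    """
    directory = os.path.dirname(os.path.abspath(path))

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush this file's data and metadata
    finally:
        os.close(fd)

    # Creating a file also changes its parent directory, so sync that too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory fsync is the part that matters for workloads that create lots of new files, since the file's name lives in the directory rather than in the file itself.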
On 02-Nov-2009, at 8:00 PM, Rashkae <ubu...@tigershaunt.com> wrote:
Good idea to post the source :-).
However it does not seem to actually support your statement.
When fs is mounted data=journal then yes - the logic goes as you suggest.
Clearly, as the data+metadata is in the journal, then this is all we need to
sync (it's a nice optimization).
In other cases (no journal, data=ordered,writeback), then the metadata is
synced to the journal, and the data buffers are synced to their respective
inodes - that is what the comments appear to say as well.
So it seems that disk = journal *only* if you are journalling the *data*! (not
that staggering an observation, but as you mentioned does explain why sometimes
data=journal performs better than the other ext3 journal options).
Also there is still the issue of does your data (or metadata) actually hit the
disk platter (whether via the journal or the file itself), and this concerns the
business of disk write caches and barrier support - since for journal or file
you gotta signal the backing device to flush. If it tells fibs to you, or your
barrier support is buggy - then you can still get data loss, no matter what fs
options are enabled.
regards
Mark
ROTFL. Nice OPTIMIZATION? One is possibly doing almost DOUBLE the
writes. It is really only an optimization in two cases: if you are using
ext3 data=journal for a mail queue with the journal on an uber-fast nvram
card (memory speed versus disk speed), because most mails should not
queue; or if you have a nice big nvram card to act as a buffer and speed
up the response to fsync calls. Hence why most people use
raid cards with nice big bbu caches nowadays. /me jumps up and down on a
bunch of 3ware 75xx/85xx cards.
> In other cases (no journal, data=ordered,writeback), then the metadata is
> synced to the journal, and the data buffers are synced to their respective
> inodes - that is what the comments appear to say as well.
>
> So it seems that disk = journal *only* if you are journalling the *data*! (not
> that staggering an observation, but as you mentioned does explain why sometimes
> data=journal performs better than the other ext3 journal options).
>
>
Not so fast pal. data=writeback issues a flush for data...and nothing
else (goto flush ... out) and data=ordered issues a call that syncs the
inode only. The only part where data buffers are synced is
data=writeback (just like what others have explained about
data=writeback) and there is no data buffer related call for data =
ordered. Just an inode sync.
However, I do have my doubts about the journal being used when
data=ordered/writeback. I have not spent a lot of time but I cannot find
where the inode sync call puts anything in the journal...the call is
generic and not specific to ext3 too. It appears things have changed
since barriers were introduced.
> Also there is still the issue of does your data (or metadata) actually hit the
> disk platter (whether via the journal or the file itself), and this concerns the
> business of disk write caches and barrier support - since for journal or file
> you gotta signal the backing device to flush. If it tells fibs to you, or your
> barrier support is buggy - then you can still get data loss, no matter what fs
> options are enabled.
>
>
Again, in Linux there ain't no signal to the disk write cache to flush.
Either you turn it off or suffer the consequences. Did you miss the
Notes at the end of the fsync (2) man page?
Actually I think we have both misunderstood this point - because the
code we are looking at is not the whole story. How it works is that an
application calls fsync() , which will then call sys_fsync(), which will
(amongst other things) call:
- generic_block_fdatasync() to sync the *data* blocks
- ext3_sync_file() to sort out the metadata and journal stuff
Note the comments in the links you posted actually mention this. We have
been looking at the latter code only in isolation. I think this article:
http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/ssd’s-journaling-and-noatimerelatime
discusses the business quite well: data=journal *does* write the data
twice! Once to the files themselves and once to the journal. However,
under specialized circumstances this is still faster than the other
journal modes.
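To put rough numbers on the "writes the data twice" point, here is a toy Python model. It is a deliberate simplification and an assumption on my part: it counts metadata as written twice in every mode (once to the journal, once in place), counts data twice only under data=journal, and ignores journal descriptor and commit blocks entirely. bytes_written is a made-up name for illustration.

```python
def bytes_written(mode: str, data_bytes: int, metadata_bytes: int) -> int:
    """Rough total bytes hitting the disk for one update, per ext3 data mode.

    Toy model only: ignores journal bookkeeping blocks, batching and caching.
    """
    if mode not in ("journal", "ordered", "writeback"):
        raise ValueError(f"unknown data mode: {mode}")

    # Metadata is journaled in all three modes, then written in place.
    metadata_total = 2 * metadata_bytes

    # Only data=journal sends file data through the journal as well.
    data_total = 2 * data_bytes if mode == "journal" else data_bytes

    return metadata_total + data_total


# Example: a 1 MiB write with 4 KiB of metadata changes.
for mode in ("journal", "ordered", "writeback"):
    print(mode, bytes_written(mode, 1024 * 1024, 4096))
```

The model says nothing about latency: the journal writes are sequential, which is why data=journal can still win when the journal sits on something very fast.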
> Again, in Linux there ain't no signal to the disk write cache to flush.
> Either you turn it off or suffer the consequences. Did you miss the
> Notes at the end of the fsync (2) man page?
>
>
Exactly - that is precisely the point I was making previously. Note that
SCSI/SAS disks generally default to the write cache being *off* which
makes 'em safer choices for serious storage. Write cache *on* means you
are at the mercy of how good the barrier support is (not that great
generally it seems), no matter what journal options are used.
Now I think that our differing emphasis on data vs metadata is probably
due to you minding mail servers (lots of important metadata changes from
new files etc.) and me minding databases (typically no important metadata
changes - e.g. InnoDB typically has everything in 3 files... but very
important data changes - e.g. transaction logs).
In your use case, it makes sense to use data=journal. In mine typically
it does not (note that a database transaction log functions like a
journal - a serially appended file of transactions - so
data=ordered,writeback or even xfs journaling etc is not only fine but
optimal [1])!
regards
Mark
[1] Or even ext2 in some cases.
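To illustrate why a database transaction log already behaves like a journal, here is a minimal Python sketch of a serially appended log with an fsync per commit. The record format (a 4-byte little-endian length prefix) and the names append_record and replay are assumptions for illustration, not how InnoDB or any real database lays out its log.

```python
import os
import struct


def append_record(log_path: str, record: bytes) -> None:
    """Append one length-prefixed record and fsync before returning."""
    with open(log_path, "ab") as log:
        log.write(struct.pack("<I", len(record)) + record)
        log.flush()             # push Python's buffer to the kernel
        os.fsync(log.fileno())  # ask the kernel to push it to the device


def replay(log_path: str) -> list:
    """Read back every complete record, stopping at a torn tail."""
    records = []
    with open(log_path, "rb") as log:
        while True:
            header = log.read(4)
            if len(header) < 4:
                break  # clean end of log, or a header cut short by a crash
            (length,) = struct.unpack("<I", header)
            body = log.read(length)
            if len(body) < length:
                break  # record was only partly written; discard it
            records.append(body)
    return records
```

Because the log handles its own recovery on replay, the filesystem only has to keep the file's existing blocks intact, which is the argument for data=ordered or writeback being enough in this case.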
Amedee Van Gasse (ub) wrote:
> On Fri, October 30, 2009 07:08, Raphael wrote:
>>
>> Help, after I had clean installed Karmic on my ext4 partition, the
>> performance was significantly slower compared to ext3. Startup was around
>> 7 secs but with ext4 it's now 20 secs application speeds are also slower.
>
> What are you comparing?
> * Karmic Koala clean install on ext3
> with
> * Karmic Koala clean install on ext4
>
> Could you please do a clean install with ext3 + install bootchart, to get
> an exact timing, and then do the same with a clean ext4 install? Thank
> you.
>
>
This is important. My old computer takes 45 seconds to go
from clicking Grub start to full on. I expect the speed of the
CPU is critical to a shorter time.
I am 75 years old and 45 seconds is blinding speed!
73 Karl
--
Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D
Good advice,
You might want to also do a boot with your ext4 filesystems mounted with
'barrier=0' in fstab.
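In case the fstab edit is not obvious, here is a small Python sketch that adds an option to the fourth (options) field of an fstab entry. add_mount_option is a hypothetical helper and the sample line uses a placeholder UUID; it only builds the new line as a string and does not touch /etc/fstab.

```python
def add_mount_option(fstab_line: str, option: str) -> str:
    """Return the fstab entry with `option` added to its options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected: device mountpoint fstype options ...")

    options = fields[3].split(",")
    if option not in options:
        options.append(option)
    fields[3] = ",".join(options)

    return " ".join(fields)


# Placeholder entry, not taken from any real system.
line = "UUID=hypothetical-uuid / ext4 errors=remount-ro 0 1"
print(add_mount_option(line, "barrier=0"))
# prints: UUID=hypothetical-uuid / ext4 errors=remount-ro,barrier=0 0 1
```

Remember this is for a timing comparison only; remove the option again afterwards if you care about the data on that filesystem.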
Cheers
Mark
P.s: Apologies for helping to drag this thread a little off topic for a
while there - but hopefully some of you found the discussion interesting
anyway!
Thanks for the loads of replies!
--
Can you explain what this does?
--
Fred
www.fwrgallery.com
"Life is like linux, simple. If you are fighting it you are doing something wrong."
>> You might want to also do a boot with your ext4 filesystems mounted with
>> 'barrier=0' in fstab.
>
> Can you explain what this does?
Good explanation here, Fred:
http://kernelnewbies.org/Ext4#head-25c0a1275a571f7332fa196d4437c38e79f39f63
--
Steve
It disables write barriers. Write barriers are enabled by default on
ext4. Blast Mark, making me bone up on what is going on lately. :-)
That means that if write-caches are enabled on disks, you are at risk of
losing data in the event of a sudden power loss but you get better
performance in return. Write barriers allow you to have write-caches
enabled and not have to risk losing data by ensuring that data is safely
on disk before saying "It's done."
However, not everything disk related supports write-barriers, namely
device-mapper, so if you use LVM or any md module other than raid1, you
better turn write-caches off or get yourself a hardware raid card with
bbu cache or a bbu nvram card and data=journal.
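The reasoning above boils down to a small truth table, sketched here in Python. It only encodes the claims made in this message and is not a statement about what any particular kernel, driver or drive actually does; the function and parameter names are made up for illustration.

```python
def at_risk_on_power_loss(
    write_cache_on: bool,
    barriers_on: bool,
    stack_passes_barriers: bool,
    battery_backed_cache: bool = False,
) -> bool:
    """True if, per the reasoning in this thread, sudden power loss may lose data.

    stack_passes_barriers is False when something in the block layer
    (LVM/device-mapper in the example above) does not support barriers.
    """
    if not write_cache_on or battery_backed_cache:
        # Nothing volatile sits between "It's done." and stable storage.
        return False
    # With a volatile write cache, barriers must be enabled and must
    # actually reach the disk to be of any use.
    return not (barriers_on and stack_passes_barriers)


# ext4 defaults on a plain partition: cache on, barriers on and working.
print(at_risk_on_power_loss(True, True, True))    # prints: False
# Same disk mounted with barrier=0.
print(at_risk_on_power_loss(True, False, True))   # prints: True
# Barriers enabled, but on top of a layer that drops them.
print(at_risk_on_power_loss(True, True, False))   # prints: True
```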