> On Mon, 11 Mar 2013 15:43:20 +0000
> Rainer Weikusat <rwei...@mssgmbh.com> wrote:
>> > The filesystem need not go corrupt. As McKusick says, the trick is
>> > to avoid dangling pointers, to make sure there's never a time when
>> > metadata are pointing to something not there.
>>
>> But these dangling pointers can't be avoided reliably, at least not
>> for filesystems which do in-place updates, because updates happen in
>> units of bits in the best case.
>
> Granted, there are limitations to what any software can do in the face
> of hardware failure.
In other words, "It doesn't work". Even if neither kernel nor disk
employed any caching, 'all synchronous writes' still wouldn't
guarantee that no data is ever lost because of 'unfortunate
events'. What remains is a policy question: Is minimizing the time
window where 'stuff can go wrong' more important than maximizing
'common-case' performance? My usual answer to that is that I use
relatively small, synchronously-mounted root filesystems because this
'maximizes safety' in the sense that recovering/ reinstalling a system
based on whatever data is still available remains possible in
'unfortunate circumstances' (yes, I had to do that in the past, even
in the not too distant past) but gladly accept the ext* default of
'doing everything asynchronously' everywhere else (and expect
applications to override this default where necessary, eg, database
management systems).
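For illustration, 'synchronously-mounted' here means nothing more
exotic than the sync mount option in /etc/fstab, eg (the device name
and filesystem type below are made-up examples):

/dev/sda1   /   ext4   sync,errors=remount-ro   0   1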
[...]
>> The latter is what caused the relatively well-known issue with 'some
>> Linux filesystems', notably, older versions of ext4: Writing the new
>> contents of some file to a temporary file is composed of at least
>> three independent write requests (data, metadata and the temporary
>> filename) and renaming the temporary file to its 'final name' is
>> another write operation and the only implicit ordering guarantee here
>> is that the temporary file name will either not be created on disk at
>> all or the final name will be created after the temporary one.
>> This does not imply anything for the other two writes which may
>> happen 'at any time' and 'in any order', completely independently of
>> the directory changes: If power fails after the rename was committed
>> and before the data actually hits the disk, the result will be a file
>> with size zero instead of 'either the old content or the new content'.
>
> You're making my point for me. When you say "ordering guarantee",
> you're talking about the semantics of the operation and implementation
> choices by the filesystem. The kernel is best positioned to ensure
> that a rename operation happens in the right order and is completed
> before the call returns. The filesystem designer has a choice:
> to promise that the rename operation (if and when it returns) has also
> committed the directory information, or not.
In order for the write-to-tempfile/ rename atomic file replacement
attempt to work as intended in case of a sudden 'short-term memory
erasure' aka system crash/ power outage, the rename operation need not
happen synchronously (since 'correctly named file with the old
contents' is one of the expected outcomes) but it must happen after
the new data was written to persistent storage and after the i-node
metadata was also updated. This means that these three independent,
asynchronous write operations must happen in a particular (partial)
order, not that 'the rename must happen in the right order' (whatever
this is supposed to mean exactly).
> You're saying whoa, the atomic rename(2) makes no such promise, cf.
> Posix. You have to call fsync.
Not really. I'm saying that file systems exist where this ordering is
not guaranteed and the only way to deal with this phenomenon is to use
fsync to enforce it. Some filesystem people believe this behaviour is
desirable/ correct, other filesystem people believe the opposite
behaviour is desirable/ correct, and as an application developer who
doesn't do kernel work except if it really can't be avoided, I have to
accommodate all these different viewpoints. I don't even really have a
final opinion on this myself and if I had one, it wouldn't matter.
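To make this concrete, here is a rough sketch of the
write-to-tempfile/ rename pattern with the ordering enforced via
fsync before the rename (error handling abbreviated, file names made
up):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int replace_file(const char *path, const char *tmp,
                        const void *data, size_t len)
{
    int fd;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* 1. write the new contents to the temporary file */
    if (write(fd, data, len) != (ssize_t)len)
        goto fail;

    /* 2. force the data and the i-node metadata to persistent
          storage before the rename can possibly be committed */
    if (fsync(fd) == -1)
        goto fail;

    if (close(fd) == -1)
        return -1;

    /* 3. the rename itself may still be committed asynchronously:
          whichever name survives a crash refers to a complete file */
    return rename(tmp, path);

fail:
    close(fd);
    unlink(tmp);
    return -1;
}

int main(void)
{
    const char msg[] = "new contents\n";

    if (replace_file("config", "config.tmp", msg, sizeof msg - 1) == -1) {
        perror("replace_file");
        return EXIT_FAILURE;
    }

    return 0;
}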
[...]
> I have to ask, though: There might be a billion SSDs walking around in
> mobile phones. If 0.01% of them had problems, we'd be talking 100,000
> phone failures per phone lifetime, surely enough for headline news.
> Are SSDs more reliable than that, or does iOS just do a great job
> coping?
Guessing at the unknown always leaves many options :-). My take would
be: If one person in a group of 10,000 claims to be experiencing
'mysterious catastrophic problems' none of the other 9,999 ever saw,
and if all of these 9,999 happy people, assuming the anecdotes they
heard were true (if they heard them at all), should really be afraid
of encountering such problems themselves and would have to change
their lifestyle in a fundamentally unfashionable way if they wanted
to act rationally in their own interest, the reaction will be general
disbelief and - in the most extreme cases - forced hospitalisation of
the few 'people who claim it is all built on sand and thus greatly
discomfort everyone else'.
The best practical definition of psychosis I've managed to come up
with so far is someone who insists on drawing attention to weird
things nobody wants to be bothered with :->.