Well, I think in this case it isn't -corrupt- metadata; it's more the
case that there is an inconsistency between different structures that
are each internally consistent.
e.g. remove a free space extent from the freespace tree without
removing the space from the global free space counters. Now
delalloc reservation is allowed by the global counters, but when we
go to allocate the extent - or the bmap btree block to index it -
we fail the allocation because the free space btrees are empty.
The allocation structures are not internally inconsistent or
corrupt, so it's done the right thing by returning ENOSPC. The
global counters are not obviously inconsistent or corrupt, either.
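
To make that scenario concrete, here is a toy userspace model of it.
This is only a sketch and not XFS code: struct toy_fs and the toy_*()
functions are hypothetical names invented for illustration, and a
single block count stands in for the free space btrees.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy model, not XFS code: two independent views of free space. */
struct toy_fs {
	uint64_t global_free;	/* global free space counter, in blocks */
	uint64_t tree_free;	/* blocks actually indexed by the free space tree */
	uint64_t reserved;	/* blocks promised to delalloc reservations */
};

/* Delalloc reservation consults only the global counter. */
static int toy_reserve(struct toy_fs *fs, uint64_t len)
{
	if (fs->global_free < len)
		return -ENOSPC;
	fs->global_free -= len;
	fs->reserved += len;
	return 0;
}

/* The real allocation consults only the free space tree. */
static int toy_allocate(struct toy_fs *fs, uint64_t len)
{
	if (fs->tree_free < len)
		return -ENOSPC;
	fs->tree_free -= len;
	fs->reserved -= len;
	return 0;
}

/* The tree has lost its last extent, but the counter was never updated. */
static int toy_scenario(void)
{
	struct toy_fs fs = { .global_free = 8, .tree_free = 0, .reserved = 0 };

	if (toy_reserve(&fs, 8) != 0)
		return 1;		/* not reached: the counter allows it */
	return toy_allocate(&fs, 8);	/* fails: the tree is empty */
}
```

Each structure is self-consistent when looked at on its own; only the
comparison between the two shows the problem, and neither code path
makes that comparison.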
So it can be triggered by just the right sort of corruption at
exactly the right time (i.e. at 100% ENOSPC), but the chance of this
convoluted set of circumstances happening in production systems is
pretty much infinitesimal.
> We still get a kernel log about something going wrong, only now the
> report doesn't trigger everyone's WARN triggers, and we tell the user to
> go run xfs_repair.
I think that is exactly the wrong thing to do.
We have a history of this WARN firing as a result of software bugs
in XFS - typically a transaction space reservation or allocation
parameter setup issue - in which case a WARN_ON_ONCE is more
appropriate here than declaring the filesystem corrupt.
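
As a rough userspace sketch of the behaviour I'm arguing for (the
names struct toy_mount, toy_warn_on_once() and toy_writeback_alloc()
are hypothetical, and the real WARN_ON_ONCE() is a macro that fires
once per call site and also dumps a stack trace): warn once, pass the
error back up, and leave the filesystem's corruption state alone.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-mount state, not an XFS structure. */
struct toy_mount {
	bool marked_corrupt;	/* would send the user off to run a repair tool */
	int  warnings;		/* number of warnings actually emitted */
};

/*
 * Userspace imitation of WARN_ON_ONCE(): report the first time the
 * condition is true, and always return the condition to the caller.
 */
static bool toy_warn_on_once(struct toy_mount *mp, bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		mp->warnings++;
		fprintf(stderr, "WARNING: %s (please report this)\n", what);
	}
	return cond;
}

/* An allocation that fails despite a reservation is treated as a code bug. */
static int toy_writeback_alloc(struct toy_mount *mp, int alloc_error)
{
	if (toy_warn_on_once(mp, alloc_error != 0,
			     "allocation failed despite delalloc reservation"))
		return alloc_error;	/* fail the operation, not the filesystem */
	return 0;
}
```

Note that nothing in this path sets marked_corrupt: the user sees one
loud warning worth reporting rather than an instruction to go run
repair.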
That's the bottom line - this specific WARN has been placed because
it is an indicator of a bug in the code, not because it is something
that occurs because of filesystem corruption. The WARN is an
indicator that the bug needs to be reported, not simply put back on
the user to clean up the mess and continue on blissfully unaware
that they tripped over a kernel bug rather than some nebulous,
unexplainable corruption.
syzbot being able to trip over it by corrupting the fs in just the
right way doesn't mean we should change it - syzbot is a malicious
attacker, not a production workload, and I really don't think we
should be changing warnings that we actually want users to report
just to shut up syzbot.