
Issues with 3.6.x and XFS


Aaron W. Hsu

Nov 13, 2012, 2:42:25 PM

Hey all:

I just wanted to share a scary experience that I had the other day
regarding my RAID-0, XFS based Slackware installation. I shut the
system down like I normally do, by pressing the power button quickly,
which on normal kernels triggers the halt(1) behavior, or so I
thought. At any rate, the shutdown happened very quickly. This does
occasionally happen, so I did not think anything of it, but on reboot
I discovered that the system would not mount the partition.

This being a software RAID-0 setup, my initial thoughts were that
the entire thing had gone, since software raid, and RAID-0 especially
is hardly a recipe for reliability. However, before doing anything
drastic I played around with trying to see what was causing the
problem. I noticed a few error messages about missing partition
tables on /dev/md0, but before panicking on those, some research
revealed that this is a benign error, since there is not expected
to be any partition on that device.

What I was seeing was a serious problem with XFS not being able to
replay the file system log when mounting the unclean filesystem.
The best that I can tell is that the system shut down before XFS was
able to flush all of the data of the logs properly, leading to an
incomplete system. I was worried about repairing the file system
using xfs_repair -L as that will completely dump the journals.
However, more research about this seems to indicate that this is
particularly a problem with circular reference of some kind (I
don't hack on XFS, so I don't know the details) in the logs leading
to some sort of problems. There is a long email thread somewhere
tracking this down on the 3.6.x kernel series and especially,
there is a one character patch that fixes the issue.

Fortunately for me, though, I was able to mount up the 14.0 slackware
DVD and mount the drive using an older 3.2.29 kernel in RW mode,
rather than RO mode. This replayed the journal logs, after which I was
able to do a safer xfs_repair without the -L option. All is well now,
but I wanted to share the experience in case someone else somehow
gets themselves into this situation with XFS and recent kernels. I
don't know how much XFS or kernels or what are to blame, but it does
at least seem that you can fix it without losing data.
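
The recovery sequence described above can be sketched roughly as follows. This is a minimal sketch, not a tested procedure: it assumes the array is /dev/md0, that /mnt is a free mount point, and that it is run from rescue media with a kernel that can replay the log. The names DEVICE, MOUNTPOINT, DRY_RUN, run and recover_xfs are illustrative only. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the log-replay-then-repair sequence (assumed device and mount point).
DEVICE=${DEVICE:-/dev/md0}
MOUNTPOINT=${MOUNTPOINT:-/mnt}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_xfs() {
    # The rescue environment may already have assembled the array,
    # so a failure here is not treated as fatal.
    run mdadm --assemble --scan

    # A normal read-write mount lets the kernel replay the XFS log.
    run mount -t xfs "$DEVICE" "$MOUNTPOINT" || return 1

    # xfs_repair must be run on an unmounted filesystem.
    run umount "$MOUNTPOINT" || return 1

    # -n only reports problems; it does not modify anything.
    run xfs_repair -n "$DEVICE"

    # Repair without -L, since the log should now be clean.
    run xfs_repair "$DEVICE"
}
```

Calling recover_xfs with DRY_RUN=1 shows the commands in order; setting DRY_RUN=0 as root would actually run them.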

--
Aaron W. Hsu | arc...@sacrideo.us | http://www.sacrideo.us
Programming is just another word for the lost art of thinking.

Keith Keller

Nov 13, 2012, 3:04:33 PM
On 2012-11-13, Aaron W Hsu <arc...@sacrideo.us> wrote:
>
> I was worried about repairing the file system
> using xfs_repair -L as that will completely dump the journals.

I have had the unfortunate occasion to require -L a few times in the
past. The only time major damage was unrecoverable was a catastrophic
failure on the RAID controller. Each other time, -L found some minor
problems (which got shunted to lost+found) but otherwise the filesystem
was fine. (This was on 2.6 kernels, however, not 3.x, so YMMV!)

I'm not actively advocating for cavalier use of -L, but it's not
necessarily a bad thing if you do require it.

--keith


--
kkeller...@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

Sylvain Robitaille

Nov 13, 2012, 5:38:43 PM
Couple reactions ...

On Tue, 13 Nov 2012 13:42:25 -0600, Aaron W Hsu wrote:

> ... I shut the system down like I normally do, by pressing the
> power button quickly, which on normal kernels triggers the halt(1)
> behavior, or so I thought.

Please tell me you'll use "shutdown -h" from now on! Is it really worth
all that aggravation to attempt to save a dozen keystrokes?

> ... software raid, and RAID-0 especially is hardly a recipe for
> reliability.

It begs the question, of course, "then why use it?" Now that said, I
admire your perseverance and determination to find a working solution
that would not risk losing your data.

> ... I was able to mount up the 14.0 slackware DVD and mount the drive
> using an older 3.2.29 kernel in RW mode, rather than RO mode. This
> replayed the journal logs, after which I was able to do a safer
> xfs_repair without the -L option.

Excellent solution. I've found the installation disk to be absolutely
invaluable at times when a system isn't booting (for any of various
reasons).

Think you'll reconsider using the front panel ("power") button for
shutting the system down, using a disk layout that you believe has
a tendency towards unreliability, or even using a perhaps finicky
filesystem? What would you have done if the installation disk's
huge.s kernel didn't have support for XFS? (I suppose the obvious
answer would be to find a different "rescue" disk that does support
it; never mind that question ... ;-)

--
----------------------------------------------------------------------
Sylvain Robitaille s...@encs.concordia.ca

Systems analyst / AITS Concordia University
Faculty of Engineering and Computer Science Montreal, Quebec, Canada
----------------------------------------------------------------------

Grant

Nov 13, 2012, 8:25:34 PM
On Tue, 13 Nov 2012 22:38:43 +0000 (UTC), Sylvain Robitaille <s...@alcor.concordia.ca> wrote:

>Couple reactions ...
>
>On Tue, 13 Nov 2012 13:42:25 -0600, Aaron W Hsu wrote:
>
>> ... I shut the system down like I normally do, by pressing the
>> power button quickly, which on normal kernels triggers the halt(1)
>> behavior, or so I thought.
>
>Please tell me you'll use "shutdown -h" from now on! Is it really worth
>all that aggravation to attempt to save a dozen keystrokes?

Or 'halt' ;)

Grant.

Aaron W. Hsu

Nov 13, 2012, 11:22:05 PM
Keith Keller <kkeller...@wombat.san-francisco.ca.us> writes:

>I'm not actively advocating for cavalier use of -L, but it's not
>necessarily a bad thing if you do require it.

Indeed, I am very glad that it is there; I am even more glad that
I did not need to use it this time.

Aaron W. Hsu

Nov 13, 2012, 11:39:29 PM
Sylvain Robitaille <s...@alcor.concordia.ca> writes:

>Please tell me you'll use "shutdown -h" from now on! Is it really worth
>all that aggravation to attempt to save a dozen keystrokes?

See below. :-)

>> ... software raid, and RAID-0 especially is hardly a recipe for
>> reliability.

>It begs the question, of course, "then why use it?" Now that said, I
>admire your perseverance and determination to find a working solution
>that would not risk losing your data.

There is a level of convenience involved in using RAID to map my
various hard drives together, together with the minor, but extant
speed benefits that I gain from using it. Together with an active
backup plan that will cover me in the case of a catastrophic failure
and the general personal experience of never having Software RAID be
at the actual point of failure (though I hold no illusions as to the
safety of software raid), I find it to be a convenient and workable
solution to the dual hard drive laptop that I am running.

This would be in contrast to the bcache system which I would really
like to use (this laptop has a third, smaller SSD drive meant for
caching), but which is far from reliable enough to let me consider
it. Software RAID, while hardly server-class reliable, is reliable
enough for use on my personal development workstation, provided
the appropriate backup plan is in place.

>> ... I was able to mount up the 14.0 slackware DVD and mount the drive
>> using an older 3.2.29 kernel in RW mode, rather than RO mode. This
>> replayed the journal logs, after which I was able to do a safer
>> xfs_repair without the -L option.

>Excellent solution. I've found the installation disk to be absolutely
>invaluable at times when a system isn't booting (for any of various
>reasons).

It is indeed. However, I was somewhat surprised to discover that
the xfs_repair program is not included in the initrd image. This
probably would have made it possible to accomplish this all from
the 3.2.29 kernel that I have on my hard drive, rather than using
the DVD disk/USB stick.

>Think you'll reconsider using the front panel ("power") button for
>shutting the system down, using a disk layout that you believe has
>a tendency towards unreliability, or even using a perhaps finicky
>filesystem?

Well, I like the convenience of using the power button, but
after this experience, I went to figure out what command the
power button was actually triggering. As it turns out, it was
triggering an exact '/sbin/init 0' rather than what I would have
expected, namely, the standard '/sbin/shutdown -h now' command.
Given this, one of the things I will likely do in response to this
is to change it so that the power button will trigger a shutdown(1)
call rather than a direct 'init' call.
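
A minimal sketch of what such a change might look like, assuming an acpid handler script in the style of /etc/acpi/acpi_handler.sh that receives the event group in "$1" and the device or action name in "$2". The function name handle_acpi_event and the variable SHUTDOWN_CMD are illustrative, not part of acpid, and the event names matched here may differ between machines.

```shell
#!/bin/sh
# Command to run on a power-button press (assumption: overridable for testing).
SHUTDOWN_CMD=${SHUTDOWN_CMD:-"/sbin/shutdown -h now"}

# Handle one ACPI event, given as already-split arguments.
handle_acpi_event() {
    case "$1" in
        button|button/power)
            case "$2" in
                power|PWRF)
                    # Left unquoted on purpose so the command and its
                    # arguments are split into separate words.
                    $SHUTDOWN_CMD
                    ;;
            esac
            ;;
    esac
}
```

In a real handler the last line of the script would be something like handle_acpi_event "$@", and any event that is not a power-button press is silently ignored.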

As for the disk layout, I think I mostly addressed that above, but
the short answer is that I will probably continue to use RAID-0 as it
has worked well and is reliable enough given the use case.

As for the file system, I am a big fan of the XFS file system. It has
been and I think still is quite reliable, and this fluke I think has
more to do with the power button behavior of my system than the
XFS file system. At any rate, the behavior that was triggered appears
to have a simple patch and I believe that patch will probably find
its way into the kernels, so I don't think this will be an issue in
the future. Moreover, I am quite happy with the performance and
behavior of the XFS file system, and Slackware provides explicit
and good support for it with its various tools. It's certainly one
of the workhorse file systems of Linux, with a lot of nice tools
around it. The same cannot be said for BTRFS, and EXT4 is sort of
weird in some ways. I think for the most part EXT4 is a nice file
system now, but I guess I still just like the XFS. In my experience,
the EXT4 file system has been much more finicky than the XFS
filesystem, but on most of my systems I have not had a lot of file
system problems. I've been relatively blessed or lucky in the
absence of very serious filesystem or hard drive issues, and
careful enough that I have never lost critical data because of the
issues that I did have.

> What would you have done if the installation disk's
>huge.s kernel didn't have support for XFS? (I suppose the obvious
>answer would be to find a different "rescue" disk that does support
>it; never mind that question ... ;-)

So, if Slackware's system didn't have default support for XFS,
I probably would not be using it. However, it does, and I do. :-)
The installer supports XFS, as do the initrd generation scripts,
the init program itself, and all the userland tools for XFS are
shipped by default and included in the installation disk. If this
were not the case, then I would be much less likely to use it.
Slackware is actually really great about this, providing tools for
file systems like JFS, which I like, but which many other Linux
distributions do not support. The support for file systems throughout
the whole Slackware "life cycle" is really quite great, and one of
the nicer things about Slackware.

Sylvain Robitaille

Nov 14, 2012, 2:52:58 PM
On Tue, 13 Nov 2012 22:39:29 -0600, Aaron W Hsu wrote:

> ... one of the things I will likely do in response to this is
> to change it so that the power button will trigger a shutdown(1)
> call rather than a direct 'init' call.

Yes, ok, close enough. I just hate it, though when I accidentally
hit the power button on my netbook, and that's exactly what happens.

It's not as difficult as it might seem to accidentally hit a power
button: on this netbook (the Asus EeePC701), the power button is up
near the display hinge, such that every time I pick the computer up
in my right hand, with the display hinged open, my thumb is right
near the power button. The law of averages ensures that I sometimes
accidentally press the button. :-(

I haven't reprogrammed it, though, to ignore the button press, because I
want to be able to very quickly initiate a clean shutdown, for example
if I'm about to run out of battery power.

> As for the file system, I am a big fan of the XFS file system.

Well, you're certainly right that it's well supported in Slackware.
I tend to be *very* conservative with filesystems, accepting that
perhaps there may be better performance available, but that I'm
familiar with how to manage those I use. With ext3 and ext4, I never
made the switch until it became the default at Slackware installation
time, and in the cases of a few systems, not even then. If I'm tight
for disk space I stick with ext2, and don't have to worry about space
used by journaling, and all the tools to manage the file systems are
the same. I don't think I have any more ext2 filesystems, though
(and it's not really worth checking).

Grant

Nov 14, 2012, 3:40:12 PM
On Wed, 14 Nov 2012 19:52:58 +0000 (UTC), Sylvain Robitaille <s...@alcor.concordia.ca> wrote:

>On Tue, 13 Nov 2012 22:39:29 -0600, Aaron W Hsu wrote:
>
>> ... one of the things I will likely do in response to this is
>> to change it so that the power button will trigger a shutdown(1)
>> call rather than a direct 'init' call.
>
>Yes, ok, close enough. I just hate it, though when I accidentally
>hit the power button on my netbook, and that's exactly what happens.

Why not enable the four-second reset delay in the BIOS? It solves
this issue by ignoring the odd accidental button press; holding the
button in for four seconds really means you want a hardware reset.

OTOH Slack should be able to trap the button press because windoze can ;)
>
>It's not as difficult as it might seem to accidentally hit a power
>button: on this netbook (the Asus EeePC701), the power button is up
>near the display hinge, such that every time I pick the computer up
>in my right hand, with the display hinged open, my thumb is right
>near the power button. The law of averages ensures that I sometimes
>accidentally press the button. :-(

There you go, have a chat with the BIOS then :)
>
>I haven't reprogrammed it, though, to ignore the button press, because I
>want to be able to very quickly initiate a clean shutdown, for example
>if I'm about to run out of battery power.

Four seconds is long enough to avoid the accidental press, short enough
for your purpose.
>
>> As for the file system, I am a big fan of the XFS file system.
>
>Well, you're certainly right that it's well supported in Slackware.
>I tend to be *very* conservative with filesystems, accepting that
>perhaps there may be better performance available, but that I'm
>familiar with how to manage those I use. With ext3 and ext4, I never
>made the switch until it became the default at Slackware installation
>time, and in the cases of a few systems, not even then. If I'm tight
>for disk space I stick with ext2, and don't have to worry about space
>used by journaling, and all the tools to manage the file systems are
>the same. I don't think I have any more ext2 filesystems, though
>(and it's not really worth checking).

The downside of XFS is the state info held in memory; it is not known
for being happy with surprise power removals or resets.

Grant.

Jerry Peters

Nov 14, 2012, 3:53:37 PM
Grant <o...@grrr.id.au> wrote:
> On Wed, 14 Nov 2012 19:52:58 +0000 (UTC), Sylvain Robitaille <s...@alcor.concordia.ca> wrote:
>
>>On Tue, 13 Nov 2012 22:39:29 -0600, Aaron W Hsu wrote:
>>
>>> ... one of the things I will likely do in response to this is
>>> to change it so that the power button will trigger a shutdown(1)
>>> call rather than a direct 'init' call.
>>
>>Yes, ok, close enough. I just hate it, though when I accidentally
>>hit the power button on my netbook, and that's exactly what happens.
>
> I ask why you not enable the four second reset delay in BIOS?? Solves
> this issue by ignoring the odd accidental button press, to hold the
> button in for four seconds really means you want to hardware reset.
>
> OTOH Slack should be able to trap the button press because windoze can ;)

It does; it's not the BIOS that's shutting down the computer. acpid
is notified of the button press, checks to see if it's handling the
event, and if so handles it. See 'man acpid'.

Jerry

Sylvain Robitaille

Nov 14, 2012, 5:32:00 PM
On Thu, 15 Nov 2012 07:40:12 +1100, Grant wrote:

> I ask why you not enable the four second reset delay in BIOS??
> Solves this issue by ignoring the odd accidental button press,
> to hold the button in for four seconds really means you want to
> hardware reset.

That's already there. I'm not talking about invoking a hard reset by
*holding* the power button, but rather causing a clean shutdown by a
*short* press of the power button. See the acpid reference mentioned
in an earlier followup to your message.

> OTOH Slack should be able to trap the button press because windoze can
> ;)

Slackware (well, Linux specifically) does trap the button press. It's
*both* the desired/expected behaviour, *and* the problem (though perhaps
only in my case is it really a problem).

Jim Diamond

Nov 15, 2012, 8:32:41 AM
On 2012-11-14 at 15:52 AST, Sylvain Robitaille <s...@alcor.concordia.ca> wrote:
> On Tue, 13 Nov 2012 22:39:29 -0600, Aaron W Hsu wrote:
>
>> ... one of the things I will likely do in response to this is
>> to change it so that the power button will trigger a shutdown(1)
>> call rather than a direct 'init' call.
>
> Yes, ok, close enough. I just hate it, though when I accidentally
> hit the power button on my netbook, and that's exactly what happens.
>
> It's not as difficult as it might seem to accidentally hit a power
> button: on this netbook (the Asus EeePC701), the power button is up
> near the display hinge, such that every time I pick the computer up
> in my right hand, with the display hinged open, my thumb is right
> near the power button. The law of averages ensures that I sometimes
> accidentally press the button. :-(
>
> I haven't reprogrammed it, though, to ignore the button press, because I
> want to be able to very quickly initiate a clean shutdown, for example
> if I'm about to run out of battery power.

In my /etc/acpi/acpi_handler.sh I have (as well as a lot of other stuff)

# User logged in on the local X display, if any
console_user=`who | grep ' :0 ' | awk '{print $1}'`
xmessage=/usr/bin/xmessage
shutdown_timeout=10
shutdown_message="Press 'Cancel' within $shutdown_timeout seconds
to cancel system shutdown!"

case "$1" in
    button|button/lid|button/power)
        case "$2" in
            power|PWRF)
                if [ "$console_user" = "" ]
                then
                    # Nobody on the X display: shut down immediately
                    /sbin/init 0
                else
                    # Ask the X user; xmessage exits 0 on "Shutdown Now"
                    # or on timeout, and 1 on "Cancel"
                    echo "$shutdown_message" | su $console_user -c \
                        "env DISPLAY=:0.0 $xmessage -file - -default Cancel \
                        -buttons 'Shutdown Now:0,Cancel:1' \
                        -timeout $shutdown_timeout" >/dev/null 2>&1
                    if [ $? = 0 ]
                    then
                        su $console_user -c \
                            "env DISPLAY=:0.0 $xmessage 'System shutting down'" &
                        sleep 1
                        /sbin/init 0
                    else
                        su $console_user -c \
                            "env DISPLAY=:0.0 $xmessage 'System shutdown cancelled'"
                    fi

...


(Minor changes from my actual code to simplify the example.)


If I accidentally hit the power button, I have 10 seconds to cancel
the shutdown, otherwise I can wait or click the "Shutdown Now" button
and shut down. I highly recommend it.

Cheers.
Jim

Sylvain Robitaille

Nov 15, 2012, 12:07:15 PM
On Thu, 15 Nov 2012 09:32:41 -0400, Jim Diamond wrote:

> In my /etc/acpi/acpi_handler.sh I have ... (a useful code snippet to
> address accidental power-button presses, while still allowing for an
> immediate clean shutdown) ...

Thanks for that. I'll certainly give some thought to the idea.
I honestly hadn't thought of using something like xmessage for this,
and it seems as though you set your system up for precisely the same
situation I was describing.

> If I accidentally hit the power button, I have 10 seconds to cancel
> the shutdown, otherwise I can wait or click the "Shutdown Now"
> button and shut down. I highly recommend it.

I like it. I think you did a good job with that.

Jim Diamond

Nov 15, 2012, 9:24:25 PM
On 2012-11-15 at 13:07 AST, Sylvain Robitaille <s...@alcor.concordia.ca> wrote:
> On Thu, 15 Nov 2012 09:32:41 -0400, Jim Diamond wrote:
>
>> In my /etc/acpi/acpi_handler.sh I have ... (a useful code snippet to
>> address accidental power-button presses, while still allowing for an
>> immediate clean shutdown) ...
>
> Thanks for that. I'll certainly give some thought to the idea.
> I honestly hadn't thought of using something like xmessage for this,
> and it seems as though you set your system up for precisely the same
> situation I was describing.

>> If I accidentally hit the power button, I have 10 seconds to cancel
>> the shutdown, otherwise I can wait or click the "Shutdown Now"
>> button and shut down. I highly recommend it.
>
> I like it. I think you did a good job with that.

Thanks.

With my current laptop, the position of the power button makes it hard
to accidentally cause a problem. But with my previous laptop, it was
easy. After accidentally shutting it down once or twice at very
inopportune moments, I decided that suspenders and a belt sometimes
make sense.

It occurs to me that I could probably just set XAUTHORITY (or some
such thing) to the logged-in user's .Xauthority file, but since I
thought of the other way first, there it is.
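
The XAUTHORITY idea might look something like the following. This is an untested sketch under the assumption that the X session's authority file is the user's ~/.Xauthority and that root can read it; the function name notify_console_user and the XMESSAGE variable are illustrative only.

```shell
#!/bin/sh
# Program used to show the dialog (assumption: overridable for testing).
XMESSAGE=${XMESSAGE:-/usr/bin/xmessage}

# Run xmessage directly (no su) against display :0.0, using the
# .Xauthority file found in the given home directory.
# Usage: notify_console_user HOME_DIR [xmessage arguments...]
notify_console_user() {
    user_home=$1
    shift
    DISPLAY=:0.0 XAUTHORITY="$user_home/.Xauthority" "$XMESSAGE" "$@"
}
```

For example, notify_console_user /home/jim 'System shutting down' would stand in for the su-based call, where /home/jim is a made-up home directory.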

Cheers.
Jim