Some time ago, I discovered some problems with xfs. Unfortunately, I had
no time to dig into them. However, a few weeks ago other people
running Debian on ARM machines confirmed the problem on their machines,
starting at [1], so I think it is appropriate to at least report it.
It has also been seen on 2.6.27-rc4 [2].
Summary: the xfs partition gets corrupted almost immediately after creation. I
had the impression that the first unlink (rm) causes the corruption,
but this might be just an impression.
During my tests, I saved an image of the corrupted filesystem,
which I can make available on request (it's 26 MB, gzipped).
Please let me know how I can assist you in finding the problem.
[1] http://lists.debian.org/debian-arm/2008/08/msg00155.html
[2] http://lists.debian.org/debian-arm/2008/08/msg00184.html
Best regards,
Tobias Frost
http://blog.coldtobi.de
PS: Thank you for your great work!
Some logs (copied from the Debian mailing list, so you don't have to
follow the whole thread there):
- I tested xfs on my Thecus 2100 and could reproduce the fs corruption.
The xfs was freshly created on the partition that used to be swap.
The corruption occurred after downloading the LTP from SourceForge,
untarring it and attempting a make.
(The make never completed, so I did not run the LTP stress tests.)
Some info:
thecus:~/#uname -a
Linux thecus.coldtobi.ip 2.6.26-1-iop32x #1 Fri Aug 8 23:42:37 UTC 2008
armv5tel GNU/Linux
thecus:~# dpkg -l xfsprogs
+++-==============================================================
ii xfsprogs 2.9.8-1 Utilities for managing the XFS filesystem
thecus:~# xfs_check /dev/md1 2>&1 | tee fsck.log -
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
thecus:~# mount -o ro /dev/md1 /tmp/tst/
thecus:~# dmesg
[43132282.570000] Filesystem "md1": Disabling barriers, not supported by
the underlying device
[43132282.590000] XFS mounting filesystem md1
[43132283.600000] Starting XFS recovery on filesystem: md1 (logdev:
internal)
[43132283.620000] Filesystem "md1": XFS internal error
xlog_valid_rec_header(1) at line 3471 of file fs/xfs/xfs_log_recover.c.
Caller 0xbf24b298
[43132283.640000] [<c00291e0>] (dump_stack+0x0/0x14) from [<bf232704>]
(xfs_error_report+0x4c/0x5c [xfs])
[43132283.650000] [<bf2326b8>] (xfs_error_report+0x0/0x5c [xfs]) from
[<bf249fc4>] (xlog_valid_rec_header+0x150/0x184 [xfs])
[43132283.660000] r4:defc0000
[43132283.660000] [<bf249e74>] (xlog_valid_rec_header+0x0/0x184 [xfs])
from [<bf24b298>] (xlog_do_recovery_pass+0x21c/0x824 [xfs])
[43132283.670000] r5:defbc4a0 r4:00000000
[43132283.680000] [<bf24b07c>] (xlog_do_recovery_pass+0x0/0x824 [xfs])
from [<bf24b8ec>] (xlog_do_log_recovery+0x4c/0x98 [xfs])
[43132283.690000] [<bf24b8a0>] (xlog_do_log_recovery+0x0/0x98 [xfs])
from [<bf24b958>] (xlog_do_recover+0x20/0x124 [xfs])
[43132283.700000] r9:00000000 r8:df738400 r6:000008f8 r5:ce0512e0
r4:000008f8
[43132283.710000] [<bf24b938>] (xlog_do_recover+0x0/0x124 [xfs]) from
[<bf24baf0>] (xlog_recover+0x94/0xbc [xfs])
[43132283.720000] r9:00000000 r8:df738400 r6:000008f8 r5:000001f0
r4:ce0512e0
[43132283.730000] [<bf24ba5c>] (xlog_recover+0x0/0xbc [xfs]) from
[<bf2442b8>] (xfs_log_mount+0xe0/0x164 [xfs])
[43132283.730000] r7:00000000 r6:00000000 r4:001dc860
[43132283.730000] [<bf2441d8>] (xfs_log_mount+0x0/0x164 [xfs]) from
[<bf24db8c>] (xfs_mountfs+0x270/0x664 [xfs])
[43132283.750000] r8:df738420 r7:df738400 r6:00005000 r5:00000000
r4:0003b90c
[43132283.760000] [<bf24d91c>] (xfs_mountfs+0x0/0x664 [xfs]) from
[<bf2554c4>] (xfs_mount+0x290/0x348 [xfs])
[43132283.760000] [<bf255234>] (xfs_mount+0x0/0x348 [xfs]) from
[<bf266854>] (xfs_fs_fill_super+0xbc/0x208 [xfs])
[43132283.780000] [<bf266798>] (xfs_fs_fill_super+0x0/0x208 [xfs]) from
[<c00946c4>] (get_sb_bdev+0xf4/0x14c)
[43132283.790000] [<c00945d0>] (get_sb_bdev+0x0/0x14c) from [<bf264dd4>]
(xfs_fs_get_sb+0x24/0x30 [xfs])
[43132283.800000] [<bf264db0>] (xfs_fs_get_sb+0x0/0x30 [xfs]) from
[<c00941d0>] (vfs_kern_mount+0xa0/0x140)
[43132283.810000] [<c0094130>] (vfs_kern_mount+0x0/0x140) from
[<c00942d0>] (do_kern_mount+0x40/0xdc)
[43132283.820000] [<c0094290>] (do_kern_mount+0x0/0xdc) from
[<c00ab0d0>] (do_new_mount+0x5c/0x8c)
[43132283.830000] r8:00000001 r7:00000040 r6:df0d1ef0 r5:dfe7b000
r4:00000001
[43132283.830000] [<c00ab074>] (do_new_mount+0x0/0x8c) from [<c00ab298>]
(do_mount+0x198/0x1c0)
[43132283.850000] r7:df0d1ef0 r6:00000040 r5:00000001 r4:00000000
[43132283.850000] [<c00ab100>] (do_mount+0x0/0x1c0) from [<c00ab34c>]
(sys_mount+0x8c/0xd4)
[43132283.860000] [<c00ab2c0>] (sys_mount+0x0/0xd4) from [<c0024a60>]
(ret_fast_syscall+0x0/0x3c)
[43132283.860000] r7:00000015 r6:beb295c0 r5:beb29598 r4:00000000
[43132283.870000] XFS: log mount/recovery failed: error 117
[43132283.910000] XFS: log mount failed
thecus:~# xfs_repair /dev/md1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
thecus:~# xfs_repair -L /dev/md1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
thecus:~# xfs_check /dev/md1 2>&1 | tee fsck.log -
thecus:~# mount /dev/md1 /tmp/tst/
thecus:~# dmesg
[43132552.030000] Filesystem "md1": Disabling barriers, not supported by
the underlying device
[43132552.050000] XFS mounting filesystem md1
[43132552.190000] Ending clean XFS mount for filesystem: md1
thecus:~# cd /tmp/tst
thecus:/tmp/tst# rm -r ltp-full-20080731
rm: cannot remove directory
`ltp-full-20080731/testcases/kernel/syscalls': Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/ballista/ballista/outfiles': Directory not
empty
rm: cannot remove directory
`ltp-full-20080731/testcases/open_posix_testsuite/conformance/interfaces': Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/network/rpc/rpc-tirpc-full-test-suite':
Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/open_hpi_testsuite/utils/t/epath':
Directory not empty
thecus:/tmp/tst# rm -rf ltp-full-20080731
rm: cannot remove directory
`ltp-full-20080731/testcases/kernel/syscalls': Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/ballista/ballista/outfiles': Directory not
empty
rm: cannot remove directory
`ltp-full-20080731/testcases/open_posix_testsuite/conformance/interfaces': Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/network/rpc/rpc-tirpc-full-test-suite':
Directory not empty
rm: cannot remove directory
`ltp-full-20080731/testcases/open_hpi_testsuite/utils/t/epath':
Directory not empty
thecus:~# dmesg
[43132552.190000] Ending clean XFS mount for filesystem: md1
[43132681.530000] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 07 72
10 XFSB..........r.
[43132681.550000] Filesystem "md1": XFS internal error xfs_da_do_buf(2)
at line 2085 of file fs/xfs/xfs_da_btree.c. Caller 0xbf226cac
[43132681.560000] [<c00291e0>] (dump_stack+0x0/0x14) from [<bf232704>]
(xfs_error_report+0x4c/0x5c [xfs])
[43132681.570000] [<bf2326b8>] (xfs_error_report+0x0/0x5c [xfs]) from
[<bf232770>] (xfs_corruption_error+0x5c/0x68 [xfs])
[43132681.580000] r4:def2e400
[43132681.580000] [<bf232714>] (xfs_corruption_error+0x0/0x68 [xfs])
from [<bf226b00>] (xfs_da_do_buf+0x568/0x688 [xfs])
[43132681.580000] r6:bf226cac r5:00000000 r4:ce179438
[43132681.600000] [<bf226598>] (xfs_da_do_buf+0x0/0x688 [xfs]) from
[<bf226cac>] (xfs_da_read_buf+0x34/0x3c [xfs])
[43132681.600000] [<bf226c78>] (xfs_da_read_buf+0x0/0x3c [xfs]) from
[<bf22ccdc>] (xfs_dir2_leaf_getdents+0x484/0x8bc [xfs])
[43132681.620000] [<bf22c858>] (xfs_dir2_leaf_getdents+0x0/0x8bc [xfs])
from [<bf229200>] (xfs_readdir+0xcc/0xe0 [xfs])
[43132681.620000] [<bf229134>] (xfs_readdir+0x0/0xe0 [xfs]) from
[<bf25ff7c>] (xfs_file_readdir+0x144/0x194 [xfs])
[43132681.640000] [<bf25fe38>] (xfs_file_readdir+0x0/0x194 [xfs]) from
[<c009ee48>] (vfs_readdir+0x84/0xb8)
[43132681.650000] [<c009edc4>] (vfs_readdir+0x0/0xb8) from [<c009eee8>]
(sys_getdents64+0x6c/0xc0)
[43132681.650000] [<c009ee7c>] (sys_getdents64+0x0/0xc0) from
[<c0024a60>] (ret_fast_syscall+0x0/0x3c)
[43132681.670000] r7:000000d9 r6:0001ea84 r5:0001ea98 r4:00000000
[43132682.010000] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 07 72
10 XFSB..........r.
[43132682.030000] Filesystem "md1": XFS internal error xfs_da_do_buf(2)
at line 2085 of file fs/xfs/xfs_da_btree.c. Caller 0xbf226cac
[43132682.040000] [<c00291e0>] (dump_stack+0x0/0x14) from [<bf232704>]
(xfs_error_report+0x4c/0x5c [xfs])
[43132682.050000] [<bf2326b8>] (xfs_error_report+0x0/0x5c [xfs]) from
[<bf232770>] (xfs_corruption_error+0x5c/0x68 [xfs])
[43132682.050000] r4:def2e400
[43132682.050000] [<bf232714>] (xfs_corruption_error+0x0/0x68 [xfs])
from [<bf226b00>] (xfs_da_do_buf+0x568/0x688 [xfs])
[43132682.080000] r6:bf226cac r5:00000000 r4:ce179438
[43132682.080000] [<bf226598>] (xfs_da_do_buf+0x0/0x688 [xfs]) from
[<bf226cac>] (xfs_da_read_buf+0x34/0x3c [xfs])
[43132682.090000] [<bf226c78>] (xfs_da_read_buf+0x0/0x3c [xfs]) from
[<bf22ccdc>] (xfs_dir2_leaf_getdents+0x484/0x8bc [xfs])
[43132682.100000] [<bf22c858>] (xfs_dir2_leaf_getdents+0x0/0x8bc [xfs])
from [<bf229200>] (xfs_readdir+0xcc/0xe0 [xfs])
[43132682.110000] [<bf229134>] (xfs_readdir+0x0/0xe0 [xfs]) from
[<bf25ff7c>] (xfs_file_readdir+0x144/0x194 [xfs])
[43132682.130000] [<bf25fe38>] (xfs_file_readdir+0x0/0x194 [xfs]) from
[<c009ee48>] (vfs_readdir+0x84/0xb8)
[43132682.140000] [<c009edc4>] (vfs_readdir+0x0/0xb8) from [<c009eee8>]
(sys_getdents64+0x6c/0xc0)
[43132682.150000] [<c009ee7c>] (sys_getdents64+0x0/0xc0) from
[<c0024a60>] (ret_fast_syscall+0x0/0x3c)
[43132682.150000] r7:000000d9 r6:0001fdc4 r5:0001fdd8 r4:00000000
[43132683.550000] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 07 72
10 XFSB..........r.
[43132683.570000] Filesystem "md1": XFS internal error xfs_da_do_buf(2)
at line 2085 of file fs/xfs/xfs_da_btree.c. Caller 0xbf226cac
[43132683.580000] [<c00291e0>] (dump_stack+0x0/0x14) from [<bf232704>]
(xfs_error_report+0x4c/0x5c [xfs])
[43132683.590000] [<bf2326b8>] (xfs_error_report+0x0/0x5c [xfs]) from
[<bf232770>] (xfs_corruption_error+0x5c/0x68 [xfs])
[43132683.610000] r4:def2e400
[43132683.610000] [<bf232714>] (xfs_corruption_error+0x0/0x68 [xfs])
from [<bf226b00>] (xfs_da_do_buf+0x568/0x688 [xfs])
[43132683.620000] r6:bf226cac r5:00000000 r4:ce179438
[43132683.620000] [<bf226598>] (xfs_da_do_buf+0x0/0x688 [xfs]) from
[<bf226cac>] (xfs_da_read_buf+0x34/0x3c [xfs])
[43132683.640000] [<bf226c78>] (xfs_da_read_buf+0x0/0x3c [xfs]) from
[<bf22ccdc>] (xfs_dir2_leaf_getdents+0x484/0x8bc [xfs])
[43132683.650000] [<bf22c858>] (xfs_dir2_leaf_getdents+0x0/0x8bc [xfs])
from [<bf229200>] (xfs_readdir+0xcc/0xe0 [xfs])
[43132683.650000] [<bf229134>] (xfs_readdir+0x0/0xe0 [xfs]) from
[<bf25ff7c>] (xfs_file_readdir+0x144/0x194 [xfs])
[43132683.670000] [<bf25fe38>] (xfs_file_readdir+0x0/0x194 [xfs]) from
[<c009ee48>] (vfs_readdir+0x84/0xb8)
[43132683.680000] [<c009edc4>] (vfs_readdir+0x0/0xb8) from [<c009eee8>]
(sys_getdents64+0x6c/0xc0)
[43132683.690000] [<c009ee7c>] (sys_getdents64+0x0/0xc0) from
[<c0024a60>] (ret_fast_syscall+0x0/0x3c)
[43132683.690000] r7:000000d9 r6:0001fe04 r5:0001fe18 r4:00000000
(..)
--
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
At one point there were other patches floating around to "fix" arm which
were not correct. Are these problems seen on a pristine 2.6.27-rc4
kernel, or with other special arm patches applied?
-Eric
You could try undoing this:
/* ARM old ABI has some weird alignment/padding */
#if defined(__arm__) && !defined(__ARM_EABI__)
#define __arch_pack __attribute__((packed))
#else
#define __arch_pack
#endif
and just define __arch_pack to nothing unconditionally, to see if that's
what broke...
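To make the effect of the packed attribute concrete, here is a minimal sketch in Python that models struct layout rules instead of calling a real compiler. It assumes, as a simplification, that the old ARM ABI aligns and pads every struct or union to 4 bytes while EABI does not. The field names and sizes (count, i8count, an 8-byte parent) are illustrative stand-ins for a shortform directory header and are not copied from the kernel source.

```python
def layout(fields, min_struct_align=1, packed=False):
    """Return (offsets, total_size) for a C-like struct.

    fields: list of (name, size, natural_align) tuples.
    min_struct_align: smallest alignment the ABI gives any struct
                      (assumed 4 for the old ARM ABI, 1 for EABI).
    packed: model __attribute__((packed)), i.e. no padding at all.
    """
    offset = 0
    struct_align = 1 if packed else min_struct_align
    offsets = {}
    for name, size, align in fields:
        align = 1 if packed else align
        offset = -(-offset // align) * align  # round up to the field's alignment
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    total = -(-offset // struct_align) * struct_align  # tail padding
    return offsets, total


def sf_header(nested_align):
    # Hypothetical header: two byte counters followed by an 8-byte
    # nested aggregate holding the parent inode number.
    return [("count", 1, 1), ("i8count", 1, 1), ("parent", 8, nested_align)]


eabi = layout(sf_header(nested_align=1))
oabi = layout(sf_header(nested_align=4), min_struct_align=4)
oabi_packed = layout(sf_header(nested_align=4), min_struct_align=4, packed=True)

for label, (offsets, size) in [("EABI", eabi), ("old ABI", oabi),
                               ("old ABI, packed", oabi_packed)]:
    print(f"{label:16} parent at offset {offsets['parent']}, sizeof = {size}")
```

In this model the unpacked old-ABI layout moves parent from offset 2 to offset 4 and grows the header from 10 to 12 bytes, while the packed variant gives the same 10-byte layout as EABI. Running pahole on a real xfs.ko remains the authoritative check.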
Or if someone can provide an xfs.ko, or point "pahole" at it yourself,
and see if xfs_dir2_sf_hdr, xfs_dir2_sf_entry, and xfs_dir2_sf look more
like
http://sandeen.net/xfs-diskformat/xfs-oldabi-arm-pahole-structs
or
http://sandeen.net/xfs-diskformat/xfs-oldabi-fixed-arm-pahole-structs
?
otherwise I will give this a whirl in the emulator again...
Thanks,
-Eric
Ok, actually: first - sorry for the scattershot replies. I thought
userspace was updated earlier, but:
xfsprogs-2.10.1 (5 September 2008)...
- Add packed on-disk shortform directory for ARM's old ABI, thanks to
Eric Sandeen.
and the original kernel change:
[XFS] Pack some shortform dir2 structures for the ARM old ABI
architecture.
...
Note that userspace needs a similar treatment, and any filesystems which
were running with the previous rogue "fix" will now see corruption
...
So perhaps as a first easy test, can you please re-test with
xfsprogs-2.10.1 or newer.
Thanks,
-Eric
I tried to reproduce this problem on my ARM machine and it's really
easy to trigger. See the transcript below.
I tried with 2.6.26.6 (without the ARM old ABI fix) and 2.6.27 (with
the fix), and with xfsprogs 2.9.8-1.
Note that I'm actually using the ARM EABI, and not the old ABI.
I'm not sure what Tobias used.
xfs.ko compiled with -g can be found at http://www.cyrius.com/tmp/xfs.ko.bz2
(3.1 MB)
Here's the transcript. It's really easy to trigger. Just copy some
files to the XFS partition (works) and then run 'ls' (oops):
debian:~# modprobe xfs
debian:~# mkfs.xfs -f /dev/sda6
meta-data=/dev/sda6 isize=256 agcount=4, agsize=17778431 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=71113722, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
debian:~# dmesg | tail -n 2
[42949548.970000] SGI XFS with ACLs, security attributes, realtime, large block numbers, no debug enabled
[42949548.980000] SGI XFS Quota Management subsystem
debian:~# mount /dev/sda6 /mnt
debian:~# dmesg | tail -n 2
[42949596.470000] XFS mounting filesystem sda6
[42949596.610000] Ending clean XFS mount for filesystem: sda6
debian:~# cp /usr/bin/* /mnt/
debian:~# dmesg | tail -n 2
[42949596.470000] XFS mounting filesystem sda6
[42949596.610000] Ending clean XFS mount for filesystem: sda6
debian:~# ls /mnt
ls: reading directory /mnt: Structure needs cleaning
debian:~# dmesg | tail -n 16
[42949596.610000] Ending clean XFS mount for filesystem: sda6
[42949619.790000] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 04 3d 1b fa XFSB.........=..
[42949619.800000] Filesystem "sda6": XFS internal error xfs_da_do_buf(2) at line 2107 of file fs/xfs/xfs_da_btree.c. Caller 0xbf148b44
[42949619.820000] [<c002a370>] (dump_stack+0x0/0x14) from [<bf154968>] (xfs_error_report+0x4c/0x5c [xfs])
[42949619.820000] [<bf15491c>] (xfs_error_report+0x0/0x5c [xfs]) from [<bf1549d4>] (xfs_corruption_error+0x5c/0x68 [xfs])
[42949619.830000] r4:c7914400
[42949619.840000] [<bf154978>] (xfs_corruption_error+0x0/0x68 [xfs]) from [<bf1489b8>] (xfs_da_do_buf+0x554/0x654 [xfs])
[42949619.850000] r6:bf148b44 r5:00000000 r4:c7073418
[42949619.850000] [<bf148464>] (xfs_da_do_buf+0x0/0x654 [xfs]) from [<bf148b44>] (xfs_da_read_buf+0x34/0x3c [xfs])
[42949619.860000] [<bf148b10>] (xfs_da_read_buf+0x0/0x3c [xfs]) from [<bf14edec>] (xfs_dir2_leaf_getdents+0x480/0x8b4 [xfs])
[42949619.880000] [<bf14e96c>] (xfs_dir2_leaf_getdents+0x0/0x8b4 [xfs]) from [<bf14b07c>] (xfs_readdir+0xcc/0xe0 [xfs])
[42949619.890000] [<bf14afb0>] (xfs_readdir+0x0/0xe0 [xfs]) from [<bf18140c>] (xfs_file_readdir+0x144/0x194 [xfs])
[42949619.900000] [<bf1812c8>] (xfs_file_readdir+0x0/0x194 [xfs]) from [<c009ffb0>] (vfs_readdir+0x84/0xb8)
[42949619.910000] [<c009ff2c>] (vfs_readdir+0x0/0xb8) from [<c00a0050>] (sys_getdents64+0x6c/0xc0)
[42949619.920000] [<c009ffe4>] (sys_getdents64+0x0/0xc0) from [<c0025bc0>] (ret_fast_syscall+0x0/0x3c)
[42949619.930000] r7:000000d9 r6:0002a01c r5:0002a030 r4:00000000
debian:~#
--
Martin Michlmayr
http://www.cyrius.com/
Thanks; a quick look at the disk structure sizes & offsets shows no
differences (as I'd hope/expect for ARM EABI).
> Here's the transcript. It's really easy to trigger. Just copy some
> files to the XFS partition (works) and then run 'ls' (oops):
So is this a regression? did it used to work? If so, when? :)
(just for the record; it didn't oops, it shut down the filesystem and
gave you a backtrace to the error...)
It's trying to get a buffer for a directory leaf block from disk, and
it's finding that the magic number is bad.
What's a little odd is that the buffer it dumped out looks like the
beginning of a perfectly valid superblock for your filesystem (magic,
block size, and block count all match). If you printk the "bno"
variable right around line 2106 in xfs_da_btree.c, can you see what you get?
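The match can be checked by hand from the hex dump in the transcript. A minimal sketch, assuming the superblock begins with a 4-byte magic, a 32-bit block size and a 64-bit data block count, all stored big-endian (the helper name is made up for illustration):

```python
import struct


def parse_sb_prefix(hex_dump):
    """Decode the first 16 bytes of an XFS superblock from a hex string."""
    raw = bytes.fromhex(hex_dump)
    # ">" = big-endian, no padding: 4-byte magic, u32 block size, u64 block count
    magic, blocksize, dblocks = struct.unpack(">4sIQ", raw[:16])
    return magic, blocksize, dblocks


# Bytes printed by the kernel just before the xfs_da_do_buf(2) error on sda6.
dump = "58 46 53 42 00 00 10 00 00 00 00 00 04 3d 1b fa"
magic, blocksize, dblocks = parse_sb_prefix(dump)
print(magic, blocksize, dblocks)
```

This decodes to the magic XFSB, a block size of 4096 and 71113722 blocks, the same bsize and blocks values that mkfs.xfs reported for /dev/sda6.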
creating an xfs_metadump of the filesystem for examination on a non-arm
box might also be interesting.
Thanks,
-Eric
--
The original report was with 2.6.18 but that was with the old ABI:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=423562
I just installed a 2.6.22 kernel with EABI and I can also trigger
the bug. So it's not a (recent) regression.
> What's a little odd is that the buffer it dumped out looks like the
> beginning of a perfectly valid superblock for your filesystem
> (magic, block size, and block count all match). If you printk the
> "bno" variable right around line 2106 in xfs_da_btree.c, can you see
> what you get?
bno is 0.
> creating an xfs_metadump of the filesystem for examination on a
> non-arm box might also be interesting.
http://www.cyrius.com/tmp/dump5
(11 MB)
--
Martin Michlmayr
http://www.cyrius.com/
As far as I can remember (I only used old ABI arm) this is not a
regression. XFS never worked on arm for me.
If you need tests on old ABI just tell me.
--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~
Ok; I think there are probably a few problems, so trying to keep them
straight. I at least had xfs working properly in a qemu arm emulator a
few weeks ago... :)
> If you need tests on old ABI just tell me.
>
ok, thanks!
-Eric
Ok, that's a little odd. (correlates with the "bad" magic that was
seen, because block 0 is the superblock, but doesn't make sense because
we were trying to read a directory leaf block, in theory)
If you unmount & remount, does the ls work then?
>> creating an xfs_metadump of the filesystem for examination on a
>> non-arm box might also be interesting.
>
> http://www.cyrius.com/tmp/dump5
> (11 MB)
Thanks.
xfs_repair on x86 shows no errors; however it won't mount normally (bad
log clientid) - but mount -o norecovery,ro and subsequent ls works fine
(at first I thought filenames were badly scrambled but then remembered
that xfs_metadump does this by default ;))
The remaining problem that I know of on some arm architectures is a vmap
cache aliasing problem that usually shows up as log corruption; that may
explain the bad clientid thing but not sure why we're reading block 0 above.
Do you know what cachepolicy you're booted with? If it's writeallocate,
you might try cachepolicy=writeback, otherwise try cachepolicy=uncached
(which will be horribly slow) and see if the problem goes away or not;
it'd be a clue.
-Eric
I cannot even mount it:
debian:~# mkfs.xfs -f /dev/sda5
meta-data=/dev/sda5 isize=256 agcount=4, agsize=94380 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=377519, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
debian:~# mount /dev/sda5 /mnt
[42949596.920000] XFS mounting filesystem sda5
debian:~# cp /usr/bin/* /mnt/
debian:~# umount /mnt
debian:~# mount -t xfs /dev/sda5 /mnt
[42949612.290000] XFS mounting filesystem sda5
[42949612.460000] Starting XFS recovery on filesystem: sda5 (logdev: internal)
[42949612.480000] XFS: xlog_recover_process_data: bad flag
[42949612.500000] XFS: log mount/recovery failed: error 5
[42949612.500000] XFS: log mount failed
mount: /dev/sda5: can't read superblock
debian:~#
> Do you know what cachepolicy you're booted with? If it's writeallocate,
> you might try cachepolicy=writeback, otherwise try cachepolicy=uncached
> (which will be horribly slow) and see if the problem goes away or not;
> it'd be a clue.
I just tried with cachepolicy=writeback and cachepolicy=uncached but I
get the same problem.
--
Martin Michlmayr
http://www.cyrius.com/
On Fri, 2008-10-17 at 11:46 +0200, Gaudenz Steinlin wrote:
> On Fri, Oct 17, 2008 at 09:01:09AM +0200, Martin Michlmayr wrote:
> > * Eric Sandeen <san...@sandeen.net> [2008-10-16 17:13]:
> > > So is this a regression? did it used to work? If so, when? :)
> >
> > The original report was with 2.6.18 but that was with the old ABI:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=423562
> > I just installed a 2.6.22 kernel with EABI and I can also trigger
> > the bug. So it's not a (recent) regression.
>
> As far as I can remember (I only used old ABI arm) this is not a
> regression. XFS never worked on arm for me.
>
> If you need tests on old ABI just tell me.
>
--
>> Do you know what cachepolicy you're booted with? If it's writeallocate,
>> you might try cachepolicy=writeback, otherwise try cachepolicy=uncached
>> (which will be horribly slow) and see if the problem goes away or not;
>> it'd be a clue.
>
> I just tried with cachepolicy=writeback and cachepolicy=uncached but I
> get the same problem.
Oh, wow. This sounds like a new problem then; not a cache problem, and
not an alignment problem... hrm. I'll try to think of something else
to try.
Thanks,
-Eric
You are wrong.
Our team have experience with XFS on ARM OABI on NAS. It worked fine on
2.6.12, but got broken on 2.6.17.
http://oss.sgi.com/bugzilla/show_bug.cgi?id=712
--
Regards, Kirill A. Shutemov
+ Belarus, Minsk
+ ALT Linux Team, http://www.altlinux.com/
So any chance to bisect it down to at least a kernel release where it
stopped working?
I'll try to do it.
That's the log replay indicating that there's a bad transaction
header in the log. Very strange - it should be a clean log. What
does xfs_logprint -t /dev/sda5 tell you about the transactions
in the log?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
This is probably the vmap cache aliasing problem that we paid a bit of
attention to a few months ago, no?
-Eric
debian:~# mkfs.xfs -f /dev/sda5
meta-data=/dev/sda5 isize=256 agcount=4, agsize=94380 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=377519, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
debian:~# mount -t xfs /dev/sda5 /mnt
debian:~# cp /usr/bin/* /mnt/
debian:~# umount /mnt
debian:~# mount -t xfs /dev/sda5 /mnt
mount: /dev/sda5: can't read superblock
debian:~# xfs_logprint -t /dev/sda5
xfs_logprint:
data device: 0x805
log device: 0x805 daddr: 1510112 length: 20480
XFS: Log inconsistent (didn't find previous header)
XFS: empty log check failed
xfs_logprint: failed to find head and tail, error: 5
debian:~#
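The log geometry in this output can be cross-checked against the mkfs.xfs figures. A small sketch, assuming xfs_logprint reports daddr and length in 512-byte sectors and that filesystem blocks are numbered linearly across allocation groups (the function names are illustrative):

```python
SECTOR = 512  # assumed unit of xfs_logprint's daddr and length values


def log_length_sectors(log_blocks, block_size):
    """Size of the internal log in 512-byte sectors."""
    return log_blocks * block_size // SECTOR


def daddr_to_ag(daddr, block_size, agsize):
    """Map a sector address to (allocation group, block within that group)."""
    fsblock = daddr * SECTOR // block_size
    return divmod(fsblock, agsize)


# Values taken from the mkfs.xfs and xfs_logprint output in this transcript.
print(log_length_sectors(2560, 4096))
print(daddr_to_ag(1510112, 4096, 94380))
```

The computed length is 20480 sectors, matching the reported length, and the log start falls at block 4 of allocation group 2. That suggests xfs_logprint is reading the log location and size correctly, and that the inconsistency lies in the log contents.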
--
Martin Michlmayr
http://www.cyrius.com/
Shouldn't the cachepolicy switch take care of that? But yes, there
are various problems with I/O on vmap regions with virtually indexed
caches, see:
http://www.spinics.net/lists/linux-arch/msg04301.html
Actually, looking again at the second document it shows exactly the
symptoms you're seeing.
> > This is probably the vmap cache aliasing problem that we paid a bit of
> > attention to a few months ago, no?
>
> Shouldn't the cachepolicy switch take care of that?
Setting cachepolicy=uncached should make aliasing issues disappear.
If you're still seeing issues with cachepolicy=uncached, it's likely
some other issue.