
Large Linux XFS volumes and xfs_check?


Steve Cousins

Sep 8, 2006, 5:07:32 PM
I'm testing a 9 TB XFS file system on an x86_64 Linux system and it is
all working fine, except that I just tried (on a whim) to run xfs_check
on it and it gave me the message "out of memory". I looked into it a
bit and saw that I should use xfs_check64. I did that and it ran for a
while and then crashed. I was watching it with "top" and saw that it
wanted 20 GB of RAM/swap. I have 4 GB of RAM and 2 GB of swap, so of
course it crashed.
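
(For reference, the invocations in question would look roughly like the
following, with /dev/sdX standing in for the actual device:

  # xfs_check /dev/sdX       <- 32-bit binary, dies with "out of memory"
  # xfs_check64 /dev/sdX     <- 64-bit binary, tries to allocate ~20 GB

Both are meant to be run against the unmounted file system.)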

What do others do with Linux file systems this large? Do you have 20 GB
of RAM? Do you use a different file system that doesn't use as much RAM
to check the fs? Do you just rely on not needing to use xfs_check64?

Thanks,

Steve

Jan-Frode Myklebust

Sep 8, 2006, 8:23:31 PM
On 2006-09-08, Steve Cousins <steve....@maine.edu> wrote:
> I'm testing a 9 TB XFS file system on an x86_64 Linux system and it is
> all working fine, except that I just tried (on a whim) to run xfs_check
> on it and it gave me the message "out of memory". I looked into it a
> bit and saw that I should use xfs_check64. I did that and it ran for a
> while and then crashed. I was watching it with "top" and saw that it
> wanted 20 GB of RAM/swap. I have 4 GB of RAM and 2 GB of swap, so of
> course it crashed.

Ouch, looks like you're right:

http://oss.sgi.com/archives/linux-xfs/2005-08/msg00045.html


> What do others do with Linux file systems this large? Do you have 20 GB
> of RAM? Do you use a different file system that doesn't use as much RAM
> to check the fs? Do you just rely on not needing to use xfs_check64?

I've never really gone beyond multiple 0.5 TB volumes on my XFS-based
file servers. For the really large file systems (10+ TB) I've started
using GPFS, but I have never had to repair (mmfsck) anything larger than
2 TB. I should probably look into how GPFS's mmfsck will handle the
10+ TB file systems with *many* files...

A quick (?) test on a 10 TB fs, 1.3 TB used, 6 million inodes, seems
to indicate that it's a bit smarter than xfs_check. It does
multiple passes over the inodes because it doesn't have enough
memory to do it in one go. It also splits (some of) the work
over all nodes in the cluster:

# date ; mmfsck mailusers -n -v; date
Sat Sep 9 02:00:59 CEST 2006
Multiple passes over all inodes will be performed due to a
shortage of available memory. File system check would need
a minimum available pagepool memory of 1416M bytes to perform
only one pass over storage pool "system".
The currently available memory for use by mmfsck is 1023M bytes.
Checking "mailusers"
fsckFlags 0x9
needNewLogs 0
nThreads 8
clientTerm 0
fsckReady 1
fsckCreated 0
Disks 9
Bytes per subblock 2048
Sectors per subblock 4
Sectors per indirect block 16
Subblocks per block 32
Subblocks per indirect block 4
Inodes 6412032
Inode size 512
singleINum -1
Inode regions 53
maxInodesPerSegment 9472
Segments per inode region 13
Bytes per inode segment 4096
nInode0Files 1
Memory available per pass 1071298512
Regions per pass 16346
fsckStatus 2
Inodes per inode block 128
Data ptrs per inode 32
Indirect ptrs per inode 32
Data ptrs per indirect 679
User files exposed some
Meta files exposed some
User files ill replicated some
Meta files ill replicated some
User files unbalanced some
Meta files unbalanced some
Current snapshots 0
Max snapshots 31
Checking inodes
Regions 0 to 16345 of total 22613 in storage pool "system".
Node 172.20.42.7 (mail1) starting inode scan 0 to 1282431
Node 172.20.42.9 (mail2) starting inode scan 1282432 to 2564863
Node 172.20.42.10 (smtp1) starting inode scan 2564864 to 3847295
Node 172.20.42.11 (maildb) starting inode scan 3847296 to 5129727
Node 172.20.42.8 (smtp2) starting inode scan 5129728 to 6412031
Node 172.20.42.7 (mail1) ending inode scan 0 to 1282431
Node 172.20.42.9 (mail2) ending inode scan 1282432 to 2564863
Node 172.20.42.11 (maildb) ending inode scan 3847296 to 5129727
Node 172.20.42.8 (smtp2) ending inode scan 5129728 to 6412031
Node 172.20.42.10 (smtp1) ending inode scan 2564864 to 3847295

Lost blocks were found.
Correct the allocation map? no

Regions 16346 to 22612 of total 22613 in storage pool "system".
Node 172.20.42.7 (mail1) starting inode scan 0 to 1282431
Node 172.20.42.11 (maildb) starting inode scan 1282432 to 2564863
Node 172.20.42.8 (smtp2) starting inode scan 2564864 to 3847295
Node 172.20.42.10 (smtp1) starting inode scan 3847296 to 5129727
Node 172.20.42.9 (mail2) starting inode scan 5129728 to 6412031
Node 172.20.42.11 (maildb) ending inode scan 1282432 to 2564863
Node 172.20.42.7 (mail1) ending inode scan 0 to 1282431
Node 172.20.42.8 (smtp2) ending inode scan 2564864 to 3847295
Node 172.20.42.10 (smtp1) ending inode scan 3847296 to 5129727
Node 172.20.42.9 (mail2) ending inode scan 5129728 to 6412031
Checking inode map file
Checking directories and files
<taking a looong time here... I'll let it run overnight>


-jf

Jan-Frode Myklebust

Sep 8, 2006, 8:59:30 PM

Finished after 50 minutes. Final output at the bottom.

Checking log files
Checking extended attributes file
Checking allocation summary file
Checking policy file
Checking filesets metadata
Checking file reference counts
Checking file system replication status

6412032 inodes
4589943 allocated
0 repairable
0 repaired
0 damaged
0 deallocated
0 orphaned
0 attached

5263978239 subblocks
661936663 allocated
11616 unreferenced
0 deletable
0 deallocated

24706229 addresses
0 suspended

File system contains unrepaired damage.
Exit status 0:0:8.
Sat Sep 9 02:51:02 CEST 2006

Oops, I guess I'll need to re-run it to correct the allocation map.
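
(Presumably the corrective run is just the same command with -y instead
of -n, so that mmfsck answers yes to the repair prompts:

  # mmfsck mailusers -y -v

though I'd double-check the mmfsck man page before letting it loose on a
production file system.)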


-jf

Steve Cousins

Sep 11, 2006, 12:28:06 PM
to Jan-Frode Myklebust
Jan-Frode Myklebust wrote:

>Finished after 50 minutes. Final output at the bottom.

Hi Jan-Frode,

That's a long time. Is GPFS an IBM-only file system? From the little
I've looked into it, it seems so.

Thanks,

Steve

Steve Cousins

Sep 12, 2006, 12:55:13 PM

Just to let people know what I ended up doing: I added a 20 GB swap file
and xfs_check64 now works, if slowly. It turned up a few minor problems,
so I ran xfs_repair, which needed only 2.5 GB of RAM and took just
4.5 minutes to run. Go figure.
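
(In case it's useful to anyone, the swap file was set up with more or
less the usual recipe; the path and device below are just placeholders:

  # dd if=/dev/zero of=/swapfile bs=1M count=20480
  # mkswap /swapfile
  # swapon /swapfile

and then xfs_check64 and xfs_repair were pointed at the unmounted
device, e.g. "xfs_repair /dev/sdX".)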

Steve
