Bug#756922: e2fsprogs: resize2fs fails to shrink the filesystem; e2fsck reports no problems

Marcin Wolcendorf

Aug 3, 2014, 10:40:01 AM

Package: e2fsprogs
Version: 1.42.11-2
Severity: normal

Dear Maintainer,

I tried to shrink a 64-bit 17TB filesystem by roughly half. It was a fairly fresh FS that I
needed in order to move data from another FS. I extended it once from ~8.5TB to ~17TB,
filled it with data, and then removed half of that data. The FS is quick and dirty:
it sits on LVM with 3 PVs, one on an HDD and 2 on LVs on a RAID.

First I ran this:

# resize2fs -M /dev/home_move/home_move_tmp
resize2fs 1.42.11 (09-Jul-2014)
Resizing the filesystem on /dev/home_move/home_move_tmp to 2008640341 (4k) blocks.
resize2fs: Attempt to write block to filesystem resulted in short write while trying to resize /dev/home_move/home_move_tmp
Please run 'e2fsck -fy /dev/home_move/home_move_tmp' to fix the filesystem
after the aborted resize operation.
# e2fsck -fy /dev/home_move/home_move_tmp
e2fsck 1.42.11 (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(2008547331--2008547343) +(2008547347--2008547359) +(2008548128--2008551455) -(2008640341--2008783756)
Fix? yes

Free blocks count wrong for group #61296 (31994, counted=28640).
Fix? yes

Free blocks count wrong for group #61298 (0, counted=5291).
Fix? yes

Free blocks count wrong for group #61299 (0, counted=32768).
Fix? yes

Free blocks count wrong for group #61300 (0, counted=32768).
Fix? yes

Free blocks count wrong for group #61301 (0, counted=32768).
Fix? yes

Free blocks count wrong for group #61302 (0, counted=32768).
Fix? yes

Free blocks count wrong for group #61303 (25715, counted=32768).
Fix? yes

home_move_tmp: ***** FILE SYSTEM WAS MODIFIED *****
home_move_tmp: 570/546639872 files (0.0% non-contiguous), 2021567334/4373115904 blocks

Then I tried to shrink it by giving the size by hand. I ended up with:

# resize2fs /dev/home_move/home_move_tmp 4269801500
resize2fs 1.42.11 (09-Jul-2014)
Resizing the filesystem on /dev/home_move/home_move_tmp to 4269801500 (4k) blocks.
resize2fs: Attempt to write block to filesystem resulted in short write while trying to resize /dev/home_move/home_move_tmp
Please run 'e2fsck -fy /dev/home_move/home_move_tmp' to fix the filesystem
after the aborted resize operation.
# e2fsck -fy /dev/home_move/home_move_tmp
e2fsck 1.42.11 (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
home_move_tmp: 570/533729280 files (0.0% non-contiguous), 2020753972/4269802000 blocks

Now I have run out of options.
Anyway, this might be helpful:

# dumpe2fs -h /dev/home_move/home_move_tmp
dumpe2fs 1.42.11 (09-Jul-2014)
Filesystem volume name: home_move_tmp
Last mounted on: /mnt/home_move_tmp
Filesystem UUID: 80bf7fcd-eaed-4366-a6f8-8fa4c5f01b12
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 533729280
Block count: 4269802000
Reserved block count: 0
Free blocks: 2249048028
Free inodes: 533728710
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 4096
Inode blocks per group: 256
RAID stride: 64
RAID stripe width: 384
First meta block group: 2037
Flex block group size: 16
Filesystem created: Wed Jul 23 09:33:04 2014
Last mount time: Sun Aug 3 15:30:51 2014
Last write time: Sun Aug 3 16:20:53 2014
Mount count: 0
Maximum mount count: -1
Last checked: Sun Aug 3 16:20:53 2014
Check interval: 0 (<none>)
Lifetime writes: 17 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: e2945912-8b0f-4cc1-bdb6-d67ed74c31f3
Journal backup: inode blocks
Journal features: journal_incompat_revoke journal_64bit
Journal size: 128M
Journal length: 32768
Journal sequence: 0x0001098f
Journal start: 0

-- System Information:
Debian Release: jessie/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.14-2-amd64 (SMP w/6 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages e2fsprogs depends on:
ii e2fslibs 1.42.11-2
ii libblkid1 2.20.1-5.8
ii libc6 2.19-7
ii libcomerr2 1.42.11-2
ii libss2 1.42.11-2
ii libuuid1 2.20.1-5.8
ii util-linux 2.20.1-5.8

e2fsprogs recommends no packages.

Versions of packages e2fsprogs suggests:
pn e2fsck-static <none>
ii gpart 0.1h-11+b1
ii parted 3.1-4

-- no debconf information

Theodore Ts'o

Aug 3, 2014, 3:00:01 PM

Could you try running the command:

resize2fs -d 255 /dev/home_move/home_move_tmp 4269801500 | bzip2 > /tmp/resize-debug.bz2

and then send me output of the debug file?

Thanks!!

- Ted

Theodore Ts'o

Aug 3, 2014, 6:00:02 PM

On Sun, Aug 03, 2014 at 11:10:57PM +0200, Marcin Wolcendorf wrote:
>
> Wow, that was fast! Thanks!

>
> On Sun, Aug 03, 2014 at 02:23:19PM -0400, Theodore Ts'o wrote:
> > Could you try running the command:
> >
> > resize2fs -d 255 /dev/home_move/home_move_tmp 4269801500 | bzip2 > /tmp/resize-debug.bz2
> >
> > and then send me output of the debug file?

Hmm. OK, that had nothing useful in it. It looks like the resize
completed successfully, and since I didn't ask you to also redirect
stderr, it's not clear where the write error is coming from.

Let's try this. Run using "script"

% script /tmp/test.log
% export TEST_IO_FLAGS=8191
% resize2fs -d 255 /dev/home_move/home_move_tmp 4269801500
% exit

Then send me the /tmp/test.log file compressed.

Thanks,

Theodore Ts'o

Aug 3, 2014, 7:20:01 PM

Hmm. I still don't see anything obviously wrong. The write requests
that resize2fs is apparently sending out look completely reasonable.
So I don't know what it's complaining about.

OK, how about this:

strace -o /tmp/strace.out resize2fs /dev/home_move/home_move_tmp 4269801500

And then send me a compressed copy of /tmp/strace.out.

Stupid question --- you've checked your kernel logs and the kernel
isn't reporting any I/O errors from the hard drive or complaints from
devicemapper, right?

Theodore Ts'o

Aug 4, 2014, 9:10:05 AM

On Mon, Aug 04, 2014 at 08:41:25AM +0200, Marcin Wolcendorf wrote:
> It is funny, because when I try to shrink it to 4269800000, the e2fsck
> complains about something

If resize2fs aborts with an error, e2fsck will naturally have some
things to clean up, so that's not at all surprising.

Looking at your strace output, the error is coming from this write:

lseek(3, 4096, SEEK_SET) = 4096
write(3, "\366\7\0\0\6\10\0\0\26\10\0\0\0\0\365\17\2\0\4\0\0\0\0\0\0\0\0\0\365\17o\361"..., 8343552) = 8339456

8343552 == 8148k (2037 4k blocks)
8339456 == 8144k (2036 4k blocks)

This corresponds to this request:

Test_io: write_blk64(1, 2037)

which is where we are updating the block group descriptors. For some
reason, the kernel is reporting that it only successfully wrote 2036
blocks, instead of the 2037 blocks in the block group descriptors.
When writing to hard drives, the kernel should only report a short
write when there is an I/O error. Or at least, that's what the code
in lib/ext2fs/unix_io.c is assuming, and in general, it's always held
true.
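
The 2037-block figure can be cross-checked against the dumpe2fs output earlier in the thread. What follows is only a minimal sketch: it assumes the descriptor table holds one descriptor per block group, packed into whole filesystem blocks, and that the first block is 0. The helper names are hypothetical and are not part of e2fsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division. */
uint64_t div_round_up(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* Number of block groups needed to cover block_count blocks. */
uint64_t group_count(uint64_t block_count, uint64_t blocks_per_group)
{
	return div_round_up(block_count, blocks_per_group);
}

/* Number of filesystem blocks occupied by the group descriptors. */
uint64_t gdt_blocks(uint64_t block_count, uint64_t blocks_per_group,
		    uint64_t desc_size, uint64_t block_size)
{
	uint64_t descs_per_block = block_size / desc_size;

	return div_round_up(group_count(block_count, blocks_per_group),
			    descs_per_block);
}
```

With the values reported by dumpe2fs (4269802000 blocks, 32768 blocks per group, 64-byte descriptors, 4096-byte blocks), gdt_blocks() gives 2037 blocks for 130305 groups, and 2037 * 4096 = 8343552 bytes is the size of the write seen in the strace.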

What's interesting is that the corresponding read worked just fine at
the beginning of the strace:

read(3, "\366\7\0\0\6\10\0\0\26\10\0\0\0\0\365\17\2\0\4\0\0\0\0\0\0\0\0\0\365\17o\361"..., 8343552) = 8343552

So it's unlikely to be a hardware error, which makes sense since you
reported that there were no problems that you found in
/var/log/messages or dmesg.

So the only thing I can think of at this point is that it's a bug in
devicemapper, where if a write spans a stripe, it's returning the
results in two writes, instead of one.

Hmm. Ok, can you give me the outputs to the following two commands,
run as root:

vgdisplay home_move
lvdisplay --maps /dev/home_move/home_move_tmp

Also, can you try compiling and running the following program?

- Ted

/*
 * test-read.c
 *
 * Reproduce the I/O pattern seen in the strace: read SIZE bytes at
 * offset 4096, write them back unchanged, and retry once if the
 * write comes back short.
 */

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>

#define SIZE 8343552	/* 2037 4k blocks */
unsigned char buf[SIZE];

int main(int argc, char **argv)
{
	int fd;
	ssize_t actual, left;
	unsigned char *cp;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s device\n", argv[0]);
		exit(1);
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	if (lseek(fd, 4096, SEEK_SET) == (off_t) -1) {
		perror("lseek");
		exit(1);
	}
	actual = read(fd, buf, SIZE);
	if (actual != SIZE) {
		if (actual < 0)
			perror("read");
		else
			fprintf(stderr, "%zd bytes read, expected %d\n",
				actual, SIZE);
		exit(1);
	}
	/* Seek back and write the same data to the same location. */
	if (lseek(fd, 4096, SEEK_SET) == (off_t) -1) {
		perror("lseek 2");
		exit(1);
	}
	actual = write(fd, buf, SIZE);
	if (actual == SIZE) {
		printf("successful write, finishing\n");
		exit(0);
	}
	if (actual < 0) {
		perror("write");
		exit(1);
	}
	if (actual > SIZE) {
		fprintf(stderr, "%zd bytes written, expected %d ?!?\n",
			actual, SIZE);
		exit(1);
	}
	/* Short write: try once more to write the remainder. */
	left = SIZE - actual;
	cp = buf + actual;
	actual = write(fd, cp, left);
	if (actual != left) {
		if (actual < 0)
			perror("2nd write");
		else
			fprintf(stderr, "%zd bytes written, expected %zd (2)\n",
				actual, left);
		exit(1);
	}
	printf("Secondary write succeeded\n");
	exit(0);
}

Theodore Ts'o

Aug 4, 2014, 6:10:03 PM

On Mon, Aug 04, 2014 at 08:37:10PM +0200, Marcin Wolcendorf wrote:
> > Hmm. Ok, can you give me the outputs to the following two commands,
> > run as root:
> >
> > vgdisplay home_move
> > lvdisplay --maps /dev/home_move/home_move_tmp
>

> Attached test_lvmVGLV.log.bz2

Hmm, it looks like you are using nested LVMs. Which is weird, but I
don't think that should cause a problem. Can you send me the results of

vgdisplay main
lvdisplay --maps /dev/main/home_tmp

> As far as I understand the code - you have tried to do just that. And, as far
> as I can understand the result - it succeeded, and with resize2fs it does not.
>Could it be gcc optimisation related?
>Should I try different compile options?

Yes, that's what my test program was trying to do. I was trying to
create a short reproduction of what was apparently happening according
to strace. And I'm completely puzzled why it might be succeeding in
the small test program, and not in resize2fs.

About the only differences that I can see are that (a) we're opening
the file with the O_EXCL flag, which shouldn't be making a huge
difference:

open("/dev/home_move/home_move_tmp", O_RDWR|O_EXCL) = 3

(b) that we're doing two ioctl's which are querying information about
the block device (so it shouldn't be modifying any state in the file
descriptor)

ioctl(3, BLKDISCARDZEROES, 0) = 0
ioctl(3, BLKROGET, 0) = 0

and (c) there are many other intervening read, write, and fsync system
calls between the read and write calls in the test program.

So I'm not sure what to do at this point. I could have e2fsprogs
retry short writes much like the "secondary write" in the test
program. The downside is that while it might fix things for you, for
others, in the case where we have a real I/O error, it doubles the
delay in reporting the error (especially since the device driver often
retries multiple times before it declares an error, and if we retry
the write, the driver will then retry the failed I/O operation
multiple times --- and for each kernel dispatched request, the HDD
will often retry the write multiple times).
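
For illustration, the kind of retry described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in e2fsprogs or in the retry-write branch: the function name is hypothetical, and it retries the remainder exactly once after a short write, mirroring the "secondary write" in the test program.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write count bytes from buf to fd.  If the first write() is short,
 * retry the remainder once.  Returns the total number of bytes
 * written (which is less than count if the retry was short too),
 * or -1 with errno set by write() on a hard error.
 */
ssize_t write_with_retry(int fd, const void *buf, size_t count)
{
	const unsigned char *cp = buf;
	size_t done = 0;
	int retries_left = 1;

	assert(buf != NULL || count == 0);
	while (done < count) {
		ssize_t n = write(fd, cp + done, count - done);

		if (n < 0)
			return -1;	/* hard error: errno says why */
		done += (size_t) n;
		if (done < count && retries_left-- == 0)
			break;		/* still short after the retry */
	}
	return (ssize_t) done;
}
```

One side effect of retrying is that the second write() gets a chance to return the underlying error in errno, where the first attempt only returned a short count.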

It's clear from the strace that the kernel is reporting a short write,
which is a bug. But if we can't reproduce the bug in a short program,
then (a) we can't easily report this bug to the device mapper kernel
developers, and (b) I can't even be sure that the workaround is
guaranteed to work.

I could send you instructions on how to build a patched e2fsprogs to
see if the workaround works, but even if it does, I'm very hesitant
about whether this is something I would be willing to check in.

- Ted

Theodore Ts'o

Aug 4, 2014, 11:00:01 PM

On Tue, Aug 05, 2014 at 01:06:42AM +0200, Marcin Wolcendorf wrote:
>
> Is it possible to somehow extract the FS structures so it would be easier to
> experiment with this? I mean - if the FS structures take less space, I could
> store them and reclaim those puny 16TB; maybe some sparse image file would do?
> I'd be quite happy to help here, but I need guidance.
> Still, I would like to clean up all this mess I'm in - I (hopefully) have some
> data on this FS, that I would like to move away, but without resize2fs I do not
> have enough space.

Yes, you can --- see the "RAW IMAGE FILES" section of the e2image
man page. Unfortunately, ext4 doesn't support logical file lengths >
4TB, so in order to use this I would need to expand the sparse file
onto an xfs filesystem, but that's OK, I've done that sort of thing
before.

That being said, I'm pretty sure the problem will evaporate once we
move it to a raw image file --- the short write is something which is
fundamental to the block device, not the file system image.

> > I could send you instructions on how to build a patched e2fsprogs to
> > see if the workaround works, but even if it does, I'm very hesitant
> > about whether this is something I would be willing to check in.
>

> As I said, I'd be happy to investigate some more, but then again - I'd like to
> have my disk space back, and preferably not by removing the home_move_tmp.
> So if it is possible, I would like the patched resize2fs (or instructions to
> patch it), and maybe some instructions how to dump/preserve the current FS
> structures for further investigation.

Ok, do the following:

git clone -b retry-write git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git e2fsprogs-retry
cd e2fsprogs-retry
./configure
make -j8

Then try using the resize/resize2fs and see if that works any better for you.

If you can try to save a compressed raw image file, that would be
nice, but I expect the problem is device specific, as I mentioned
earlier.

Cheers,

Marcin Wolcendorf

Aug 8, 2014, 12:10:02 PM

On Mon, Aug 04, 2014 at 10:48:28PM -0400, Theodore Ts'o wrote:
> That being said, I'm pretty sure the problem will evaporate once we
> move it to a raw image file --- the short write is something which is
> fundamental to the block device, not the file system image.

OK, so you suggest I report this bug to the kernel maintainers?

> > > I could send you instructions on how to build a patched e2fsprogs to
>

> Ok, do the following:
>
> git clone -b retry-write git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git e2fsprogs-retry
> cd e2fsprogs-retry
> ./configure
> make -j8
>
> Then try using the resize/resize2fs and see if that works any better for you.

Actually, it does not... The result is pretty much the same:
# e2fsprogs-retry/resize/resize2fs /dev/home_move/home_move_tmp 4269800000
resize2fs 1.42.11 (09-Jul-2014)
Resizing the filesystem on /dev/home_move/home_move_tmp to 4269800000 (4k) blocks.
e2fsprogs-retry/resize/resize2fs: Attempt to write block to filesystem resulted in short write while trying to resize /dev/home_move/home_move_tmp

Please run 'e2fsck -fy /dev/home_move/home_move_tmp' to fix the filesystem
after the aborted resize operation.

> If you can try to save a compressed raw image file, that would be
> nice, but I expect the problem is device specific, as I mentioned
> earlier.

Should I attach it to this bug report? It took a few days to make it...

So, summing up: there is probably a bug in the block device code that reports a
short write where there should be a normal one, and as a result shrinking the FS
is no longer possible. That is sad...
But you have also said that you use the O_EXCL option. Maybe this causes some
problems? Is it safe (assuming nothing else uses the block device) to remove
this option and try without it?

BR,

M.W.

Marcin Wolcendorf

Aug 8, 2014, 4:10:01 PM

On Mon, Aug 04, 2014 at 10:48:28PM -0400, Theodore Ts'o wrote:
> Ok, do the following:
>
> git clone -b retry-write git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git e2fsprogs-retry
>
>

I had a look at the code just a few lines after your change, and something struck me. The code starts at line 258:
258	while (size > 0) {
259		if (size < channel->block_size) {
260			actual = read(data->dev, data->bounce,
261				      channel->block_size);
262			if (actual != channel->block_size) {
263				retval = EXT2_ET_SHORT_READ;
264				goto error_out;
265			}
266		}
267		actual = size;
268		if (size > channel->block_size)
269			actual = channel->block_size;
270		memcpy(data->bounce, buf, actual);
271		actual = write(data->dev, data->bounce, channel->block_size);
272		if (actual != channel->block_size)
273			goto short_write;
274		size -= actual;
275		buf += actual;
276	}

Now - I have no idea whether the buffer/size is aligned, but if it is not, then the issue might repeat.
Should the same retry patch be put here, somewhere around line 272?

BR,

M.W.

Theodore Ts'o

Aug 8, 2014, 5:00:06 PM

On Fri, Aug 08, 2014 at 06:05:27PM +0200, Marcin Wolcendorf wrote:
> On Mon, Aug 04, 2014 at 10:48:28PM -0400, Theodore Ts'o wrote:
> > That being said, I'm pretty sure the problem will evaporate once we
> > move it to a raw image file --- the short write is something which is
> > fundamental to the block device, not the file system image.
>
> OK, so you suggest I report this bug to the kernel maintainers?

Well, the problem is we can't seem to come up with a simple
reproduction case. Also, if this isn't a bleeding edge / mainline
kernel, LKML probably isn't going to be able to help you. And I'm not
sure what the Debian maintainers would do without a reproduction case.

> But you have also said that you use the O_EXCL option. Maybe this causes some
> problems? Is it safe (assuming nothing else uses the block device) to remove
> this option and try without it?

Well, it's a safety mechanism, but so long as no one else uses the
block device, yes it's safe. I'd be really surprised if it made a
difference though.

Something else you could try that might be even faster is to add O_EXCL
to the test program and see if it helps us get to a reproducer.
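
In the test program that would amount to changing the flags passed to open(). A minimal sketch follows; the helper name is hypothetical. On Linux, O_EXCL without O_CREAT on a block device makes open() fail with EBUSY when the device is already in use by the system (for example, mounted), which is the safety mechanism mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Open a block device read-write with O_EXCL, the same flags that
 * appear in the resize2fs strace.  Returns the file descriptor, or
 * -1 on failure.
 */
int open_device_excl(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_EXCL);
	if (fd < 0 && errno == EBUSY)
		fprintf(stderr, "%s: device is in use (mounted?)\n", path);
	return fd;
}
```

In test-read.c, the call `fd = open(argv[1], O_RDWR);` would become `fd = open_device_excl(argv[1]);`, or simply gain the extra flag.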

Theodore Ts'o

Aug 8, 2014, 5:10:01 PM

On Fri, Aug 08, 2014 at 09:58:33PM +0200, Marcin Wolcendorf wrote:
> On Mon, Aug 04, 2014 at 10:48:28PM -0400, Theodore Ts'o wrote:
> > Ok, do the following:
> >
> > git clone -b retry-write git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git e2fsprogs-retry
> >
> >
>

> Now - I have no idea whether the buffer/size is aligned, but if it is not, then the issue might repeat.
> Should the same retry patch be put here, somewhere around line 272?

It shouldn't matter since the alignment issue only matters if direct
I/O was enabled, which wouldn't be the case for resize2fs.
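
For context, with direct I/O (O_DIRECT) Linux generally expects the buffer address, the transfer size and the file offset to all be multiples of the device's logical block size; the exact requirement varies by kernel version and filesystem. A sketch of such a check is below. The helper name is hypothetical, and the alignment value to pass in is an assumption that depends on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Return non-zero if the buffer address, the transfer size and the
 * file offset are all multiples of align (e.g. the logical block
 * size of the device).
 */
int is_dio_aligned(const void *buf, size_t size, off_t offset, size_t align)
{
	assert(align != 0);
	return ((uintptr_t) buf % align) == 0 &&
	       (size % align) == 0 &&
	       ((uint64_t) offset % align) == 0;
}
```

Without O_DIRECT the kernel copies through the page cache, so none of these constraints apply to the write loop quoted above.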

Could you try grabbing the strace with this retry-write version of resize2fs?

Theodore Ts'o

Aug 9, 2014, 1:20:02 PM

On Sat, Aug 09, 2014 at 03:42:53PM +0200, Marcin Wolcendorf wrote:
> >
> > Could you try grabbing the strace with this retry-write version of resize2fs?
>

Thanks, that found the problem! The hint was in the strace, a report
that the second write failed with an EFAULT.

I've pushed out a fix to the e2fsprogs git repository, in the branch
fix-for-756922.

Can you give that a try? Use the e2fsck found in that version to make
sure the file system is sane, and then use resize2fs to shrink the
file system. Hopefully that will fix things.

Marcin Wolcendorf

Aug 10, 2014, 5:30:02 AM

On Sat, Aug 09, 2014 at 01:13:49PM -0400, Theodore Ts'o wrote:
> On Sat, Aug 09, 2014 at 03:42:53PM +0200, Marcin Wolcendorf wrote:
> > >
> > > Could you try grabbing the strace with this retry-write version of resize2fs?
> >
>
> Thanks, that found the problem! The hint was in the strace, a report
> that the second write failed with an EFAULT.

:) Happy to help.

> I've pushed out a fix to the e2fsprogs git repository, in the branch
> fix-for-756922.
>
> Can you give that a try? Use the e2fsck found in that version to make
> sure the file system is sane, and then use resize2fs to shrink the
> file system. Hopefully that will fix things.

I tried it, and it worked! :) Impressive job!
Already minimised the FS and on my way to remove the Frankenstein's monster.

Thanks a lot. :)

BR,

M.W.
