'file' is gone after unmounting, kernel upgrade and reboot

cpg

Feb 27, 2019, 9:35:23 AM
to s3backer-devel
Hi,

I'm on "s3backer version 1.5.0 ()" and I just experienced something new. I cannot remount after a reboot.

No, it's not the mount flag.

I have 7 drives in a couple of sizes (80G to 150G). I saw a kernel update come in via AWS on their 2018.03 AMI. The setup has worked well for a while.

In any case, after the reboot, two of the drives mount properly now and the rest exhibit an error. I tried backing out to the prior kernel and it did not help. The following packages were upgraded:

krb5-libs-1.15.1-34.44.amzn1.x86_64
libcurl-7.61.1-7.91.amzn1.x86_64
libkadm5-1.15.1-34.44.amzn1.x86_64
krb5-devel-1.15.1-34.44.amzn1.x86_64
curl-7.61.1-7.91.amzn1.x86_64
libcurl-devel-7.61.1-7.91.amzn1.x86_64
kernel-4.14.97-74.72.amzn1.x86_64
kernel-headers-4.14.97-74.72.amzn1.x86_64
kernel-tools-4.14.97-74.72.amzn1.x86_64
aws-cfn-bootstrap-1.4-31.22.amzn1.noarch

The error looks like this:

# mount -o rw,noatime,loop /home/storage/mounts_s3b/pool-2/file /home/storage/mounts_ext4/pool-2/
mounting user 'pool-2' with a drive of 150GB
s3backer: auto-detecting block size and total file size...
s3backer: auto-detected block size=256k and total size=150g
mount: /home/storage/mounts_s3b/pool-2/file: failed to setup loop device: No such file or directory

Things have been working so well for such a long time that I confess I am rusty as to how all this works. Any pointers will be appreciated.

-c

Archie Cobbs

Feb 27, 2019, 10:18:02 AM
to s3backe...@googlegroups.com
This sounds like it might be a /dev/loop problem unrelated to s3backer. Have you run out of /dev/loop* files?
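
A quick way to check (this is generic loop-device housekeeping, nothing s3backer-specific; assumes the usual util-linux losetup is installed):

$ losetup -a        # list the loop devices currently in use
$ losetup -f        # print the first unused loop device
$ ls /dev/loop*     # see which loop device nodes exist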

Try this test:

$ dd if=/dev/zero of=test.bin bs=1024 count=10240
$ mke2fs -F test.bin
$ mkdir test.mnt
$ mount -o loop test.bin test.mnt
$ ls -l test.mnt

If it works you should see a lost+found directory.

If instead you see the same error then it's some general issue with loopback mounts.

-AC



--
Archie L. Cobbs

cpg

Feb 27, 2019, 12:04:07 PM
to s3backer-devel
Thanks for the detailed response. What you suggested worked well; I see a lost+found directory.

To be clear, there are two drives that mount properly, even if they are mounted after the others have errored out.

The others fail to mount consistently, even after a clean reboot.

I have a feeling it may have to do with creation date. I am not 100% sure, but I think the working ones are the two most recently created (one about 2 months ago; the other, I am not sure, but much longer ago). The others are quite old.

Archie Cobbs

Feb 27, 2019, 12:47:27 PM
to s3backe...@googlegroups.com
OK just to double check a couple of things...

Did you try the test.bin mount after the two drives that mount properly were already up & running?

If you shut down all of the mounts and try to mount one of the failing ones, does it still fail even though none of the working ones are mounted? In other words, is it always the same ones that fail, or does it just fail starting with whichever happens to be mounted third?

Anything interesting in the output of dmesg(1) when this happens?

"failed to setup loop device" is an error message from the mount(8) command, not s3backer.

Really dumb question: you've confirmed that the files /home/storage/mounts_s3b/pool-2/file and /home/storage/mounts_ext4/pool-2/ actually exist?

Also see if any of the fixes mentioned in these links apply to your situation...

-AC

cpg

Feb 27, 2019, 1:29:31 PM
to s3backer-devel

> Did you try the test.bin mount after the two drives that mount properly were already up & running?

Yes.

> If you shut down all of the mounts and try to mount one of the failing ones, does it still fail even though none of the working ones are mounted? In other words, is it always the same ones that fail, or does it just fail starting with whichever happens to be mounted third?

Yes, I think my prior statement implied this, but to clarify, it's very consistent: the same ones fail every time, and the ones that work mount and run well regardless of the order.

Also, I tried adding "options loop max_loop=256" just in case the system was running out of loop devices (though it has had 7 s3backer devices mounted for at least two years solid). It does not seem to help.
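
(For reference, roughly like this in a modprobe config; the file name below is just an example, and as far as I know the option only matters if the loop driver is loaded as a module rather than built into the kernel:)

# /etc/modprobe.d/loop.conf -- example location
options loop max_loop=256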

 
> Anything interesting in the output of dmesg(1) when this happens?

Only one thing: the second mount that works (the older one) says it should be checked -- "maximal mount count reached, running e2fsck is recommended" -- but it does not fail; it works.
 
"failed to setup loop device" is an error message from the mount(8) command, not s3backer.

> Really dumb question: you've confirmed that the files /home/storage/mounts_s3b/pool-2/file and /home/storage/mounts_ext4/pool-2/ actually exist?

The directories exist. The "file" files (and "stats") ONLY exist after mounting succeeds, i.e. only for the two that succeed.
 
> Also see if any of the fixes mentioned in these links apply to your situation...

Thanks, I will research them. I have a feeling it's really related to S3 ... something may be happening there, but I have to hunt that down.

Archie Cobbs

Feb 27, 2019, 1:38:41 PM
to s3backe...@googlegroups.com
On Wed, Feb 27, 2019 at 12:29 PM cpg <carlos...@gmail.com> wrote:
 
"failed to setup loop device" is an error message from the mount(8) command, not s3backer.

Really dumb question: you've confirmed that the files /home/storage/mounts_s3b/pool-2/file and /home/storage/mounts_ext4/pool-2/ actually exist?

the directories exist. the "file" files (and "stats") ONLY exist after mounting succeeds for the two that it succeeds.

OK we need to separately test the two steps of (a) starting s3backer and (b) mounting the upper filesystem.

Instead of using mount(8), try running s3backer directly in the foreground with debug enabled... what happens?

E.g. something like:

$ s3backer -f --debug --debug-http ...otherflags... mybucket /my/dir
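
Then, while that's still running in the foreground, try the loopback mount by hand from a second shell, e.g. (using the pool-2 paths from your first message as an example):

$ mount -o rw,noatime,loop /home/storage/mounts_s3b/pool-2/file /home/storage/mounts_ext4/pool-2/

That should make it clear whether it's step (a) or step (b) that fails, and the debug output should show what s3backer sees when the mount is attempted.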

-AC

--
Archie L. Cobbs

cpg

Feb 27, 2019, 5:10:50 PM
to s3backer-devel
OK, I peeled back all the layers (I wrote a Ruby gem to manage/automate things and had to dig around) and I got this on a failing drive:

2019-02-27 14:06:44 INFO: reading meta-data from cache file `/home/storage/block-cache-files/pool-4.file'
2019-02-27 14:06:44 ERROR: corrupted cache file: block 0x00000000 listed twice (in dslots 0 and 56)
2019-02-27 14:06:44 ERROR: block_cache creation failed: Invalid argument
2019-02-27 14:06:44 ERROR: fuse_op_init(): can't create s3backer_store: Invalid argument

In my Ruby gem I have:

    # Note: system() blocks until the command exits and returns true when its exit status is 0
    s3b = system("#{sudo} s3backer #{args}")
    if s3b
      # do the mount, etc.
    else
      # report error
    end

It appears that it stayed running despite the error?

Archie Cobbs

Feb 27, 2019, 8:17:15 PM
to s3backe...@googlegroups.com
Looks like the cache file got corrupted somehow... hmm, "that shouldn't happen".

In any case you should be able to work around this situation by simply deleting the cache files. You'll lose any blocks that were written but not yet flushed back out to S3 (analogous to a sudden power failure with a normal machine).
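
For example, for the pool-4 drive from your log, something like this (the mount point paths are just guesses patterned on your pool-2 example; s3backer should create a fresh cache file the next time it starts):

$ umount /home/storage/mounts_ext4/pool-4     # upper filesystem, if it's mounted
$ umount /home/storage/mounts_s3b/pool-4      # stop s3backer, if it's running
$ rm /home/storage/block-cache-files/pool-4.file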

-Archie




--
Archie L. Cobbs

cpg

Feb 28, 2019, 2:06:37 AM
to s3backer-devel
Yes, that got things back up again, with all the drives mounting.

Thanks for your help in chasing this down. I am not sure how it happened. Maybe the VM has developed some issues over time.

One improvement would be to exit with an error when this happens (even if it shouldn't happen, haha; I know the feeling). That would have let the scripting handle/report it better.

Thanks again and great job on s3backer!

Archie Cobbs

Feb 28, 2019, 10:04:10 AM
to s3backe...@googlegroups.com
On Thu, Feb 28, 2019 at 1:06 AM cpg <carlos...@gmail.com> wrote:
> One improvement would be to exit with an error when this happens (even if it shouldn't happen, haha; I know the feeling). That would have let the scripting handle/report it better.

Yes, that's a good point. Because of the way it works now, the setup happens after FUSE has taken control, so it's too late to bail out.

This should be easy to fix. I'll include a fix in the next release.
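
In the meantime, a rough script-side workaround might be to check for the backed "file" under the mount point after starting s3backer (as you noted, it only shows up when the mount comes up). A sketch, reusing the placeholder flags from my earlier example:

$ s3backer ...otherflags... mybucket /my/dir
$ sleep 2                                     # crude: give the mount a moment to appear
$ test -f /my/dir/file || echo "s3backer mount failed"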

-AC

--
Archie L. Cobbs