cp operation seems to be hanging


veerals

Dec 9, 2010, 6:09:49 AM
to s3ql
I think I have hit a hang after starting large file copies onto an
s3ql share. After installing s3ql, I started copying a 2.8GB file from
my local lessfs share to an s3 mount point. I later started two
additional file copy operations from separate shells. Even after two
hours I don't see much progress. All the cp processes appear to be
hanging (the file sizes on the destination s3ql share are not growing),
and I can neither kill the cp processes with 'kill -9' nor unmount the
s3 share. Below are the ls, lsof, umount and strace outputs. Is this a
potential issue? How do I get out of it? Is a reboot the only option?

Platform: fedora-12
s3ql version: 0.26

[root@polaris-fedora ~]# s3qlstat /s3.xxxxxx/
Directory entries: 4
Inodes: 6
Data blocks: 8
Total data size: 772.12 MB
After de-duplication: 8.00 MB (1.04% of total)
After compression: 2.70 MB (0.35% of total, 33.71% of de-duplicated)
Database size: 0.02 MB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in
cache)

[root@polaris-fedora ~]# ls -al /s3.xxxxx/
total 790656
-rw------- 1 root root 0 2010-12-09 15:07 9R1tt9.lessfs
-rw------- 1 root root 809631744 2010-12-09 14:41 CbpWbE.lessfs
drwx------ 1 root root 0 2010-12-09 14:33 lost+found
-rwxr--r-- 1 root root 0 2010-12-09 16:18 setup.py

[root@polaris-fedora ~]# lsof /s3.xxxxx/
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
cp 18876 root cwd DIR 0,21 0 1 /s3.xxxxx
cp 18876 root 4w REG 0,21 809631744 2850337571 /s3.xxxxx/CbpWbE.lessfs
cp 19179 root 4w REG 0,21 0 2406521073 /s3.xxxxx/9R1tt9.lessfs
bash 19311 root cwd DIR 0,21 0 1 /s3.xxxxx
cp 19994 root cwd DIR 0,21 0 1 /s3.xxxxx
cp 19994 root 4w REG 0,21 0 151686278 /s3.xxxxx/setup.py
bash 30099 root cwd DIR 0,21 0 1 /s3.xxxxx
bash 30099 root cwd DIR 0,21 0 1 /s3.xxxxx

[root@polaris-fedora ~]# umount /s3.cumulus-sline-1
umount: /s3.xxxxx: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))

[root@polaris-fedora ~]# ps -ef | grep s3ql
root 18858 1 0 14:38 ? 00:00:09 /usr/bin/python /usr/bin/mount.s3ql --cachesize=1024000 --allow-root --compress=lzma s3://xxxxx /s3.xxxxx/
root 19994 19311 0 16:18 pts/2 00:00:00 cp -i /root/ExternalSources/s3ql-0.26/setup.py .
root 20251 20077 0 16:35 pts/5 00:00:00 grep s3ql

[root@polaris-fedora ~]# strace -p 18858
Process 18858 attached - interrupt to quit
futex(0xbfdb7b78, FUTEX_WAIT_PRIVATE, 0, NULL


Any help will be greatly appreciated

Thanks
Veeral

Nikolaus Rath

Dec 9, 2010, 8:27:13 AM
to s3...@googlegroups.com
Hi,

s3qlctrl has a command that's called something like "dump-stacktrace". Please run this and then file a bug report containing mount.log. I can send more detailed instructions later if needed.

Best,
Nikolaus


Nikolaus Rath

Dec 9, 2010, 8:42:40 AM
to s3...@googlegroups.com
To abort the cp processes, you can do "echo 1 > /sys/fs/fuse/connections/*/abort" (again, roughly like that). But please don't umount the fs yet; that would make further investigation much harder.

Nikolaus


Nikolaus Rath

Dec 9, 2010, 9:24:26 AM
to s3...@googlegroups.com
On 12/09/2010 06:09 AM, veerals wrote:
> I think I have hit a hang after starting large file copies onto an
> s3ql share. After installing s3ql, I started copying a 2.8GB file from
> my local lessfs share to an s3 mount point. I later started two
> additional file copy operations from separate shells. Even after two
> hours I don't see much progress. All the cp processes appear to be
> hanging (the file sizes on the destination s3ql share are not growing),
> and I can neither kill the cp processes with 'kill -9' nor unmount the
> s3 share. Below are the ls, lsof, umount and strace outputs. Is this a
> potential issue? How do I get out of it? Is a reboot the only option?
>
>
> [root@polaris-fedora ~]# strace -p 18858
> Process 18858 attached - interrupt to quit
> futex(0xbfdb7b78, FUTEX_WAIT_PRIVATE, 0, NULL


Ok, I'm back at my computer. It seems that S3QL got stuck in a deadlock.
Thanks for reporting this. Here is what you should do:

First, get a stack trace. To do that, execute 'setfattr -n
fuse_stacktrace -v 1 [mountpoint]'. There is actually no s3qlctrl
command for that anymore.

This should result in quite a lot of messages in ~/.s3ql/mount.log.
Please create a new issue at http://code.google.com/p/s3ql/issues/entry
and attach mount.log to it.

To get rid of the cp processes, you have to execute

ls /sys/fs/fuse/connections/

This will list all currently active FUSE connections. Once you have
identified the correct connections (if there is only one FUSE filesystem
mounted, you can just abort all of them), you have to execute

echo 1 > /sys/fs/fuse/connections/NN/abort

(where NN is the connection number) for each connection that you want to
abort.
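
The two steps above (list the connections, then write 1 to each abort file) can be sketched as a pair of small shell functions. This is only an illustration, not part of S3QL: the function names and the FUSE_CTL_DIR variable are made up here, and FUSE_CTL_DIR exists solely so the loop can be tried against a scratch directory before touching the real control directory. Writing to the real abort files needs root.

```shell
# Sketch only. FUSE_CTL_DIR is a hypothetical override; by default it
# points at the control directory named in the message above.
FUSE_CTL_DIR="${FUSE_CTL_DIR:-/sys/fs/fuse/connections}"

# Print the connection numbers (one per line) found under FUSE_CTL_DIR.
list_fuse_connections() {
    for conn in "$FUSE_CTL_DIR"/*/; do
        # When the directory is empty the glob stays unexpanded; skip it.
        [ -e "${conn}abort" ] || continue
        basename "$conn"
    done
}

# Abort a single connection by writing 1 to its abort file.
abort_fuse_connection() {
    echo 1 > "$FUSE_CTL_DIR/$1/abort"
}
```

Running list_fuse_connections first and then abort_fuse_connection NN for each number you picked keeps the abort step explicit, so connections belonging to other FUSE filesystems are not aborted by accident.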

Let me know if that works,

-Nikolaus

--
“Time flies like an arrow, fruit flies like a Banana.”

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

Veeral Shah

Dec 10, 2010, 12:14:28 AM
to s3...@googlegroups.com
Hi Nikolaus,
Thanks for your response. The situation appears grim. My observations:
1) 'setfattr -n fuse_stacktrace -v 1 /s3.xxxxx/' hangs and doesn't
produce any stack trace.
2) The directory /sys/fs/fuse/connections/ is empty.

I had sent Ctrl+C (SIGINT) to the cp commands on the command line
(in addition to sending SIGKILL). Could this be the reason I don't see
any active connections in the /sys/fs/fuse/connections directory?

I guess I will need to reboot the machine.

Nikolaus Rath

Dec 10, 2010, 8:30:07 AM
to s3...@googlegroups.com
On 12/10/2010 12:14 AM, Veeral Shah wrote:
> Hi Nikolaus,
> Thanks for your response. The situation appears grim. My observations:
> 1) 'setfattr -n fuse_stacktrace -v 1 /s3.xxxxx/' hangs and doesn't
> produce any stack trace.
> 2) The directory /sys/fs/fuse/connections/ is empty.

That is indeed rather grim, because it indicates that the problem is not
with S3QL but with your FUSE kernel module. What's your kernel version?
Do you have entries in /sys/fs/fuse/connections/ when a FUSE fs is
mounted and everything else is still working fine?

> I had sent Ctrl+C (SIGINT) to the cp commands on the command line
> (in addition to sending SIGKILL). Could this be the reason I don't see
> any active connections in the /sys/fs/fuse/connections directory?

No.

> I guess I will need to reboot the machine ..

In case you haven't done so yet, can you please 'gdb -p [mount.s3ql
PID]' and then enter "backtrace"? This should give a C level backtrace.
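
If an interactive gdb session is awkward (for example over a flaky remote shell), the same backtrace can probably be captured non-interactively. The sketch below wraps that in a function; the name dump_backtrace is made up for illustration, and it asks gdb for the backtrace of every thread rather than only the current one, which is an assumption about what is most useful for a suspected deadlock.

```shell
# Sketch only: print a C-level backtrace of every thread in a process.
# dump_backtrace is a hypothetical helper name, not an S3QL command.
dump_backtrace() {
    # Refuse anything that is not a plain numeric PID.
    case ${1:-} in
        ''|*[!0-9]*)
            echo "usage: dump_backtrace PID" >&2
            return 2
            ;;
    esac
    # -batch makes gdb exit after running the -ex command, which also
    # detaches it from the process again.
    gdb -p "$1" -batch -ex "thread apply all backtrace"
}
```

Redirecting the output to a file (dump_backtrace 18858 > backtrace.txt, using the mount.s3ql PID from the ps output earlier in the thread) gives something that can be attached to the issue.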


Best,

Veeral Shah

Dec 10, 2010, 10:37:51 AM
to s3...@googlegroups.com
My comments inline with prefix veeral>

On Fri, Dec 10, 2010 at 7:00 PM, Nikolaus Rath <Niko...@rath.org> wrote:
<SNIP>


> with S3QL but with your FUSE kernel module. What's your kernel version?
> Do you have entries in /sys/fs/fuse/connections/ when a FUSE fs is
> mounted and everything else is still working fine?

veeral> Actually, no. After I rebooted and was able to do basic file
copies (of smaller files), I didn't see any entries in
/sys/fs/fuse/connections/ (it is always empty). I am not at work right
now; I will report the FUSE version shortly.


>
> In case you haven't done so yet, can you please 'gdb -p [mount.s3ql
> PID]' and then enter "backtrace"? This should give a C level backtrace.

veeral> I'm sorry, I had already rebooted.

veerals

Dec 13, 2010, 3:57:44 AM
to s3ql
Nikolaus,
I have fuse-2.8.5-2 and fuse-libs-2.8.5-2.fc12.i686 installed on my
box. I guess that meets the requirement, but the
/sys/fs/fuse/connections/ directory still shows up empty.


On Dec 10, 8:37 pm, Veeral Shah <veer...@gmail.com> wrote:
> My comments inline with prefix veeral>
>

Nikolaus Rath

Dec 13, 2010, 10:02:57 AM
to s3...@googlegroups.com
Hi,

What is your *kernel* version?

-Nikolaus

Veeral Shah

Dec 13, 2010, 11:52:51 PM
to s3...@googlegroups.com
Hi Nikolaus
The kernel version is 2.6.31.5-127.fc12.i686.

thanks
veeral

Nikolaus Rath

Dec 18, 2010, 9:06:33 PM
to s3...@googlegroups.com
Hi,

I did some research. It turns out that /sys/fs/fuse/connections/ has to be explicitly mounted. Maybe Fedora does not do that by default.

Could you check if the 'fusectl' file system is mounted? If not, you can mount it with

mount -t fusectl none /sys/fs/fuse/connections

This should finally give you the entries. You can find more details at
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.31.y.git;a=blob_plain;f=Documentation/filesystems/fuse.txt;hb=HEAD
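
The check-then-mount step can be sketched as follows. This is an illustration under assumptions: the function names are made up, and MOUNTS_FILE is a hypothetical override (defaulting to /proc/mounts) that only exists so the check can be tried against a sample file. The mount command itself is the one given above and needs root.

```shell
# Sketch only. MOUNTS_FILE is a hypothetical override for experimenting;
# by default the kernel's mount table in /proc/mounts is read.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Succeed if a filesystem of type fusectl appears in the mount table.
fusectl_is_mounted() {
    # The third field of each /proc/mounts line is the filesystem type.
    awk '$3 == "fusectl" { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

# Mount the control filesystem only when it is not mounted already.
ensure_fusectl() {
    if fusectl_is_mounted; then
        echo "fusectl already mounted"
    else
        mount -t fusectl none /sys/fs/fuse/connections
    fi
}
```

After ensure_fusectl succeeds, an ls of /sys/fs/fuse/connections/ should show one numbered directory per mounted FUSE filesystem.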

Best,
Nikolaus
