Issue 148 in s3fs: Transport endpoint is not connected


s3...@googlecode.com

Jan 28, 2011, 5:08:24 AM
to s3fs-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 148 by pieter.m...@insitehosting.be: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

What steps will reproduce the problem?
No consistent reproduction yet. But I have seen this multiple times when at
least two processes are uploading many files.

What is the expected output? What do you see instead?
At some point all requests to the mounted bucket start failing. When trying
to change directory into the bucket, you get "Transport endpoint is not
connected".

What version of the product are you using? On what operating system?
v1.35 on Ubuntu 10.10

Please provide any additional information below.
I was running the following processes in parallel (same mounted bucket,
different subfolders):
wget -r -l1 -nd -Nc -A.png http://media.xiph.org/sintel/sintel-2k-png/
wget -r -l1 -nd -Nc -A.png http://media.xiph.org/BBB/BBB-1080-png/

The machine is running on EC2, so I was getting speeds of about 1-2 MB/s
for each wget.

In the meantime, 4 other processes were occasionally writing to another
bucket. That bucket has no problems. The machine has 3 mounts in total (each
to a different bucket); the third was not in use.

s3...@googlecode.com

Jan 28, 2011, 5:12:26 AM
to s3fs-...@googlegroups.com

Comment #1 on issue 148 by pieter.m...@insitehosting.be: Transport endpoint

If you need any additional information, or any debugging steps I could
undertake: please let me know.

s3...@googlecode.com

Jan 28, 2011, 5:16:31 AM
to s3fs-...@googlegroups.com

Comment #2 on issue 148 by pieter.m...@insitehosting.be: Transport endpoint

Hmm, I just noticed that the wgets actually seem to have succeeded. All the
files seem to be there. The processes were running overnight, and this
morning I noticed the "Transport endpoint is not connected" error.

s3...@googlecode.com

Jan 28, 2011, 5:20:32 AM
to s3fs-...@googlegroups.com

Comment #3 on issue 148 by pieter.m...@insitehosting.be: Transport endpoint

Ok, able to reproduce now!
1. Run one of the wgets above (alternatively, this probably works with any
s3fs-mounted directory that contains enough files)
2. cd to the directory
3. ls (waits ~2 seconds, then says "ls: reading directory .: Software
caused connection abort")
4. ls (says "ls: cannot open directory .: Transport endpoint is not
connected")

So the problem seems to be listing the contents of a directory with a large
number of files (> 10,000 in this case). Could it be that s3fs does not
deal well with directories containing more than 1,000 files? (That is the
default for max-keys in a GET Bucket request, according to
http://docs.amazonwebservices.com/AmazonS3/latest/API/.)

s3...@googlecode.com

Jan 31, 2011, 6:42:27 AM
to s3fs-...@googlegroups.com

Comment #4 on issue 148 by chrisjoh...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

I have seen something similar too.

Initially, changing the time-outs fixed this, but when more files were
synced to S3, the same thing started to happen again, consistently after
25-30s.

As a way of testing, try creating 3 or 4 thousand directories in an S3
bucket and then mounting the filesystem; do an ls on the mounted dir, or try
to rsync it back, and it will error.

It seems that there's a timeout somewhere when listing large numbers of
files or folders from S3 which isn't overridable by any option, so ls and
rsync type operations fail.

s3...@googlecode.com

Feb 2, 2011, 2:38:19 PM
to s3fs-...@googlegroups.com

Comment #5 on issue 148 by Sean.B.O...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Also seeing this behavior under Ubuntu 10.04 with fuse 2.8.4 and s3fs 1.35
built from source.

This problem seemed to start around s3fs 1.25 for us.

If there's anything I can do to help further diagnose the problem please
let me know.

s3...@googlecode.com

Feb 2, 2011, 5:19:42 PM
to s3fs-...@googlegroups.com

Comment #6 on issue 148 by yeoha...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

For your information, I had the same issue and rolling back to s3fs 1.19
(following Sean's comment) fixes the issue.

s3...@googlecode.com

Feb 3, 2011, 11:07:47 AM
to s3fs-...@googlegroups.com

Comment #7 on issue 148 by ben.lema...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Same behavior on Debian Lenny with s3fs 1.35/fuse 2.8.5 built from source.

s3...@googlecode.com

Feb 3, 2011, 1:25:49 PM
to s3fs-...@googlegroups.com

Comment #8 on issue 148 by ben.lema...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

syslog debug output:


Attachments:
debug_output.txt 3.5 KB

s3...@googlecode.com

Feb 3, 2011, 7:52:36 PM
to s3fs-...@googlegroups.com

Comment #9 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Thanks for the debug_output -- I have a good guess at a change that should
mitigate the issue. Since I haven't tried to reproduce the issue yet (I
don't have a bucket with 1000's of files), I'm looking for a volunteer to
test the patch.

...any takers?

I don't think that the patch addresses the underlying issue, though, which
is how directory listings are done. s3fs_readdir is probably the most
complex piece of this code and probably needs some tuning.

s3...@googlecode.com

Feb 3, 2011, 8:16:48 PM
to s3fs-...@googlegroups.com

Comment #10 on issue 148 by ben.lema...@gmail.com: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

I'll definitely test, I've got a few buckets with ~30K files in them.

s3...@googlecode.com

Feb 3, 2011, 10:40:18 PM
to s3fs-...@googlegroups.com

Comment #11 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Give this patch a try.

Attachments:
curle_couldnt_resolve_host.patch 370 bytes

s3...@googlecode.com

Feb 3, 2011, 11:56:07 PM
to s3fs-...@googlegroups.com

Comment #12 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

I think that the patch resolves the "transport endpoint is not connected"
issue, but you'll still get input/output errors on listing a directory with
A LOT of files

...can someone confirm?

s3...@googlecode.com

Feb 4, 2011, 4:34:37 AM
to s3fs-...@googlegroups.com

Comment #13 on issue 148 by pieter.m...@insitehosting.be: Transport

I now get:
ls: reading directory .: Input/output error

Subsequently, ls'ing or cd'ing to another directory times out after about 20
seconds:
cd: <directory name>: Input/output error

s3...@googlecode.com

Feb 4, 2011, 12:10:24 PM
to s3fs-...@googlegroups.com

Comment #14 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

It appears that the main contributing factor to this issue is the number of
files in a directory. Having a large number of files in a single directory
(I can't quantify "large" just yet, but it seems to be >= 1000) isn't
illegal, but the HTTP traffic it generates when doing a directory listing
appears to cripple the file system.

I personally have never seen this issue with my buckets, but apparently the
practices that I use are not everyone's practices.

I created a bogus (to me) test case to try to duplicate the issue. The
patch above resolves one of the initial failure points, but just pushes
the issue back further.

An understanding of how directory listings are made with respect to s3fs is
necessary to implement a fix. Briefly, a query is made to S3 asking for a
listing of objects that match a pattern (reminder: there is no native
concept of directories in S3; that is, that's not how things are stored).
For each of the matching objects, another query is then made to retrieve its
attributes. So a simple "ls" of an s3fs directory can generate A LOT of HTTP
traffic.

It appears that Randy (the original author) attempted to address
performance issues with this by using advanced methods in the CURL API.
Things are pointing to that area.

Fixing this may be an easy fix or a major rewrite; I do not know. As I
mentioned, this section of code is one of the more complex sections in
s3fs.cpp. One thought that I have is to scrap the multicurl stuff and
replace it with a simpler brute-force algorithm. It may fix the issue, but
the trade-off will probably be performance.


s3...@googlecode.com

Feb 4, 2011, 1:32:20 PM
to s3fs-...@googlegroups.com

Comment #15 on issue 148 by ben.lema...@gmail.com: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

Is it possible we're running into HTTP KeepAlive issues?

After enabling CURLOPT_VERBOSE (s3fs.cpp+586), the output at failure
includes a line for each HTTP request: "Connection #0 to host
example.s3.amazonaws.com left intact".

It seems to make sense: modifying the 'max-keys' query parameter from 50 to
1000 does allow more objects to be returned; however, the amount of time
before a failure remains the same: ~25s

$# cd /mnt/cloudfront/images
$# time ls


ls: reading directory .: Input/output error

real 0m25.357s
user 0m0.000s
sys 0m0.000s

$# cd /mnt/cloudfront/images
$# time ls


ls: reading directory .: Input/output error

real 0m26.869s
user 0m0.000s
sys 0m0.000s

$# cd /mnt/cloudfront/images
$# time ls


ls: reading directory .: Input/output error

real 0m26.274s
user 0m0.000s
sys 0m0.000s


Attachments:
debug_output.txt 1.2 KB

s3...@googlecode.com

Feb 4, 2011, 2:03:09 PM
to s3fs-...@googlegroups.com

Comment #16 on issue 148 by ben.lema...@gmail.com: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

It looks like KeepAlive may actually be the issue. Forcing the connection to
close after each request does fix the problem; however, I'm not convinced
it's a solid solution, as it's quite slow :\

Attached is a patch for testing.

Attachments:
forbid_reuse.patch 505 bytes

s3...@googlecode.com

Feb 4, 2011, 8:34:26 PM
to s3fs-...@googlegroups.com

Comment #17 on issue 148 by moore...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Ben,

Great find. I tested this on an EC2 instance (well connected to S3) by
doing a directory listing of a bucket
that contains 10500 files. No I/O error -- it was dog slow, but it worked:

% time ls -l

...

real 1m42.467s
user 0m0.160s
sys 0m0.670s

I was able to switch the issue back on by removing the CURLOPT_FORBID_REUSE
option:

$ time ll


ls: reading directory .: Input/output error

total 0


Looks like a good fix to me.

...more testing:

On my home machine (not so well connected to the internet) I tried the same
fix and did a directory listing of
the same 10500 file bucket. Again, no I/O error and the listing completed,
but it took nearly half an hour:

% date ; /bin/ls -l /mnt/s3/misc.suncup.org/ | wc -l ; date
Fri Feb 4 18:06:04 MST 2011
10503
Fri Feb 4 18:31:14 MST 2011

I'll do a subversion commit shortly. Thanks so much for your contribution.

s3...@googlecode.com

Feb 4, 2011, 9:27:08 PM
to s3fs-...@googlegroups.com

Comment #18 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Pieter, please test r308 and report back. Thanks.

s3...@googlecode.com

Feb 9, 2011, 7:46:01 AM
to s3fs-...@googlegroups.com

Comment #19 on issue 148 by pieter.m...@insitehosting.be: Transport

I'm currently out of the country and have limited internet access. I will
be able to test next week!

s3...@googlecode.com

Feb 14, 2011, 7:24:23 AM
to s3fs-...@googlegroups.com

Comment #20 on issue 148 by pieter.m...@insitehosting.be: Transport

Ok, r308 seems to solve the problem.

s3...@googlecode.com

Feb 14, 2011, 12:21:10 PM
to s3fs-...@googlegroups.com
Updates:
Status: Fixed

Comment #21 on issue 148 by moore...@suncup.net: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

(No comment was entered for this change.)

s3...@googlecode.com

Jul 9, 2012, 3:00:00 AM
to s3fs-...@googlegroups.com

Comment #22 on issue 148 by K.Quiatk...@mytaxi.net: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

I still have this problem on r368.
Running on Amazon EC2 (Amazon Linux AMI).

s3...@googlecode.com

Jul 12, 2012, 6:24:27 AM
to s3fs-...@googlegroups.com

Comment #23 on issue 148 by johnog...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

I'm having the same issue with version 1.6.1 (r368) on CentOS 5.

s3...@googlecode.com

Oct 17, 2012, 12:36:43 AM
to s3fs-...@googlegroups.com

Comment #24 on issue 148 by chris_r...@someones.com: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

We are having the same issue with version 1.6.1 (r368) on Debian Squeeze.
Reverting back to 1.6.0 and trying again; will post back with results.

s3...@googlecode.com

Oct 30, 2012, 8:12:27 AM
to s3fs-...@googlegroups.com

Comment #25 on issue 148 by gmason.x...@gmail.com: Transport endpoint is
not connected
http://code.google.com/p/s3fs/issues/detail?id=148

Am having the same issue with CentOS 5.8 32-bit. Have tried s3fs 1.61 and
1.35 with the same outcome.

s3...@googlecode.com

Jan 16, 2013, 10:08:00 AM
to s3fs-...@googlegroups.com

Comment #26 on issue 148 by ferran.m...@mmip.es: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

We are also having this issue with 1.6.1 on an Amazon EC2 AMI.
Our directory doesn't have too many files, though (200 or so).

s3...@googlecode.com

Jun 18, 2013, 12:47:05 PM
to s3fs-...@googlegroups.com

Comment #27 on issue 148 by nickz...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

I'm having the same issue on the latest s3fs when trying to upload to an S3
bucket. It appears to happen only with folders that hold a lot of files and
folders.


s3...@googlecode.com

Jun 19, 2013, 9:27:39 PM
to s3fs-...@googlegroups.com

Comment #28 on issue 148 by ggta...@gmail.com: Transport endpoint is not
connected
http://code.google.com/p/s3fs/issues/detail?id=148

Hi, nickzoid

I could not reproduce this problem on s3fs (r449).
(I tested by copying many files (over 5000) to S3.)

So, if you can, please try r449 and test it with the "multireq_max" and
"nodnscache" options.
** If you still have the same problem, please post a NEW ISSUE with more
information.

Thanks in advance for your assistance.