error: [Errno 32] Broken pipe

Sid

Oct 8, 2010, 12:23:47 AM
to boto-users, amit....@gmail.com
We have an Ajax upload that sends files to our Django app, which in
turn uploads them to S3 storage via boto. We keep seeing "error:
[Errno 32] Broken pipe" (or "[Errno 110] Connection timed out") raised
in connection.py; the traceback is pasted below.


* The error is intermittent and happens at seemingly random times.
We've been testing various factors over the past week.
* It is not a file-size issue: sometimes we're able to upload larger
files (2-20+ MB), yet smaller files (less than 2 MB) fail.
* We tried setting the boto connection's is_secure to both True and
False, so that it uses HTTPS or HTTP. It fails either way.
* We don't seem to have any other network or connectivity issues with
S3. We sync our static media to S3 and that has never failed in
over a year. In fact, just before every upload the S3 storage checks
for filename collisions with a GET, and that has never failed. So
I'm pretty sure it's not a connectivity issue.

* When it fails, control always hangs in connection.py at:

response = sender(connection, method, path, data, headers)

It then times out (after approximately 15 minutes) and retries 4 more
times in quick succession. I've only seen the "Connection timed out"
error on the first attempt (and even that is usually a broken pipe);
the others are always "Broken pipe".

* We thought it might be an Apache issue, and some research turned up
suggestions to turn KeepAlive off. We've tried that and it still
happens.
* We have spun up 4 different EC2 instances and tried our app on all
of them, on different Ubuntu versions (Jaunty, Lucid), figuring it
might be a version issue.
* We've also tried boto v1.9b, boto v2.02b and a git checkout of
2.0b3 that was only a few hours old, with the same errors.
* We also tried using a completely different S3 account and buckets,
to no avail.
* At this point the majority of our uploads fail, with Django
generating an error email on the offending line pasted above.


We also quite often see a RequestTimeTooSkewed error from S3. Our
server time is now synced via NTP every 5 minutes, so it's certainly
not a clock issue as we thought earlier. It might instead be related
to the 15-minute timeouts above, which exceed Amazon's request time
limit.
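
To make the skew hypothesis concrete, here is a minimal sketch that checks whether a signed Date header is still within the window S3 accepts. It assumes the limit is about 15 minutes; `MAX_SKEW` and `is_too_skewed` are names made up for this illustration, the Date value is the one from the log in this post, and the two "now" values are invented.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: S3 rejects requests whose Date header is more than
# roughly 15 minutes away from its own clock.
MAX_SKEW = timedelta(minutes=15)


def is_too_skewed(date_header, now):
    """Return True if the signed Date header is more than MAX_SKEW from `now`."""
    signed_at = parsedate_to_datetime(date_header)
    return abs(now - signed_at) > MAX_SKEW


# Date header from the log; the two "now" values are illustrative only.
signed = "Fri, 08 Oct 2010 03:53:07 GMT"
after_10_min = datetime(2010, 10, 8, 4, 3, 7, tzinfo=timezone.utc)
after_16_min = datetime(2010, 10, 8, 4, 9, 7, tzinfo=timezone.utc)

print(is_too_skewed(signed, after_10_min))
print(is_too_skewed(signed, after_16_min))
```

If a retry reuses headers that were signed before a long hang, a check like this would flag the request before it is sent again.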


Our Stack:
Ubuntu Lucid
django 1.1.1
Apache/2.2.16
mod_ssl/2.2.14
OpenSSL/0.9.8k
mod_wsgi/2.8
Python/2.6.5

----------------------------------------
I added a callback via the ``cb`` kwarg of set_contents_from_file to
show me how much of the file was getting transferred. It was usually
stuck at 0 bytes when we were testing with is_secure=True. We switched
to HTTP and our latest traceback shows that some data did make it
through.
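
A progress callback of the kind described above could look roughly like this. It is a sketch, assuming boto calls `cb` with two integers (bytes sent so far and the total size); `make_progress_callback` and the message format are made up for the example.

```python
def make_progress_callback(log):
    """Build a callback with the (bytes_sent, total_bytes) signature.

    `log` is any callable taking one string, e.g. a logger method.
    """
    def progress(bytes_sent, total_bytes):
        # Guard against a zero-length upload to avoid dividing by zero.
        percent = 100.0 * bytes_sent / total_bytes if total_bytes else 100.0
        log("S3 upload: %d of %d bytes sent (%.0f%%)"
            % (bytes_sent, total_bytes, percent))
    return progress


# Stand-alone demonstration with a list as the log sink.
lines = []
cb = make_progress_callback(lines.append)
cb(0, 143550)
cb(16384, 143550)
print(lines)

# Hypothetical use with boto (not executed here):
#   key.set_contents_from_file(fp, cb=cb, num_cb=10)
```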

Traceback:
[Thu Oct 07 23:53:07 2010] [error] Canonical: PUT
[Thu Oct 07 23:53:07 2010] [error] I9b0xynC0Er2hBKKp1xfpQ==
[Thu Oct 07 23:53:07 2010] [error] application/pdf
[Thu Oct 07 23:53:07 2010] [error] Fri, 08 Oct 2010 03:53:07 GMT
[Thu Oct 07 23:53:07 2010] [error] /media.qa.some_app.com/media/dmt/
2010/wcdoedgqnh/Amendment%20No.%202%20to%20Registration%20Statement
%20on%20Form%20S-1.pdf___
[Thu Oct 07 23:53:07 2010] [error] Method: PUT
[Thu Oct 07 23:53:07 2010] [error] Path: /media/dmt/2010/wcdoedgqnh/
Amendment%20No.%202%20to%20Registration%20Statement%20on%20Form
%20S-1.pdf___
[Thu Oct 07 23:53:07 2010] [error] Data:
[Thu Oct 07 23:53:07 2010] [error] Headers: {'Content-MD5':
'I9b0ab2hBKKp1xfpQ==', 'Content-Length': '2127937', 'Expect': '100-
Continue', 'Date': 'Fri, 08 Oct 2010 03:53:07 GMT', 'Expires': 'Sun,
04 Oct 2020 23:53:07 GMT', 'Content-Type': 'application/pdf',
'Authorization': 'AWS AKIAIZI47OHXASDADZUZNUA:RkID
+Ry8/4bclktjUnljbau5s9s=', 'User-Agent': 'Boto/2.0b3 (linux2)'}
[Thu Oct 07 23:53:07 2010] [error] Host:
media.qa.some_app.com.s3.amazonaws.com
[Thu Oct 07 23:53:07 2010] [error] Callback from S3 call: 0 bytes
sent, 2127937 bytes left
[Thu Oct 07 23:53:07 2010] [error] =============
[Fri Oct 08 00:02:27 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:27 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:27 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:27 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:27 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:27 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:27 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:27 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:27 2010] [error] error: [Errno 110] Connection timed
out
[Fri Oct 08 00:02:27 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:27 2010] [error] establishing HTTP connection
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 0 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 16384 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 32768 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 49152 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 65536 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] Callback from S3 call: 81920 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:28 2010] [error] =============
[Fri Oct 08 00:02:28 2010] [error] -------------------------
[Fri Oct 08 00:02:28 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:28 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:28 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:28 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:28 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:28 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:28 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:28 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:28 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:02:28 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:28 2010] [error] establishing HTTP connection
[Fri Oct 08 00:02:30 2010] [error] Callback from S3 call: 0 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:30 2010] [error] =============
[Fri Oct 08 00:02:30 2010] [error] Callback from S3 call: 16384 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:30 2010] [error] =============
[Fri Oct 08 00:02:30 2010] [error] Callback from S3 call: 32768 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:30 2010] [error] =============
[Fri Oct 08 00:02:30 2010] [error] Callback from S3 call: 49152 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:30 2010] [error] =============
[Fri Oct 08 00:02:30 2010] [error] Callback from S3 call: 65536 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:30 2010] [error] =============
[Fri Oct 08 00:02:30 2010] [error] -------------------------
[Fri Oct 08 00:02:30 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:30 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:30 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:30 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:30 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:30 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:30 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:02:30 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:30 2010] [error] establishing HTTP connection
[Fri Oct 08 00:02:34 2010] [error] Callback from S3 call: 0 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:34 2010] [error] =============
[Fri Oct 08 00:02:34 2010] [error] Callback from S3 call: 16384 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:34 2010] [error] =============
[Fri Oct 08 00:02:34 2010] [error] Callback from S3 call: 32768 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:34 2010] [error] =============
[Fri Oct 08 00:02:34 2010] [error] Callback from S3 call: 49152 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:34 2010] [error] =============
[Fri Oct 08 00:02:34 2010] [error] Callback from S3 call: 65536 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:34 2010] [error] =============
[Fri Oct 08 00:02:34 2010] [error] -------------------------
[Fri Oct 08 00:02:34 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:34 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:34 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:34 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:34 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:34 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:34 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:34 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:34 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:02:34 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:34 2010] [error] establishing HTTP connection
[Fri Oct 08 00:02:42 2010] [error] Callback from S3 call: 0 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:42 2010] [error] =============
[Fri Oct 08 00:02:42 2010] [error] Callback from S3 call: 16384 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:42 2010] [error] =============
[Fri Oct 08 00:02:42 2010] [error] Callback from S3 call: 32768 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:42 2010] [error] =============
[Fri Oct 08 00:02:42 2010] [error] Callback from S3 call: 49152 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:42 2010] [error] =============
[Fri Oct 08 00:02:42 2010] [error] Callback from S3 call: 65536 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:42 2010] [error] =============
[Fri Oct 08 00:02:42 2010] [error] -------------------------
[Fri Oct 08 00:02:42 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:42 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:42 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:42 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:42 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:42 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:42 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:42 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:42 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:02:42 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:42 2010] [error] establishing HTTP connection
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 0 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 16384 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 32768 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 49152 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 65536 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] Callback from S3 call: 81920 bytes
sent, 143550 bytes left
[Fri Oct 08 00:02:58 2010] [error] =============
[Fri Oct 08 00:02:58 2010] [error] -------------------------
[Fri Oct 08 00:02:58 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:02:58 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 414, in _mexe
[Fri Oct 08 00:02:58 2010] [error] response = sender(connection,
method, path, data, headers)
[Fri Oct 08 00:02:58 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 457, in sender
[Fri Oct 08 00:02:58 2010] [error] http_conn.send(l)
[Fri Oct 08 00:02:58 2010] [error] File "/usr/lib/python2.6/
httplib.py", line 755, in send
[Fri Oct 08 00:02:58 2010] [error] self.sock.sendall(str)
[Fri Oct 08 00:02:58 2010] [error] File "<string>", line 1, in
sendall
[Fri Oct 08 00:02:58 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:02:58 2010] [error] encountered error exception,
reconnecting
[Fri Oct 08 00:02:58 2010] [error] establishing HTTP connection
[Fri Oct 08 00:03:30 2010] [error] Traceback (most recent call last):
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/core/handlers/base.py",
line 92, in get_response
[Fri Oct 08 00:03:30 2010] [error] response = callback(request,
*callback_args, **callback_kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/common/decorators.py", line 13, in new_func
[Fri Oct 08 00:03:30 2010] [error] return view_func(request,
*args, **kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/common/decorators.py", line 24, in new_func
[Fri Oct 08 00:03:30 2010] [error] return view_func(request,
*args, **kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/common/decorators.py", line 36, in new_func
[Fri Oct 08 00:03:30 2010] [error] return view_func(request,
*args, **kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/contrib/auth/
decorators.py", line 78, in __call__
[Fri Oct 08 00:03:30 2010] [error] return self.view_func(request,
*args, **kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/app/profiles/views/expert.py", line 512, in upload_avatar
[Fri Oct 08 00:03:30 2010] [error] user_profile = form.save()
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/forms/models.py", line
407, in save
[Fri Oct 08 00:03:30 2010] [error] fail_message, commit,
exclude=self._meta.exclude)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/forms/models.py", line 78,
in save_instance
[Fri Oct 08 00:03:30 2010] [error] instance.save()
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/app/profiles/models.py", line 123, in save
[Fri Oct 08 00:03:30 2010] [error] super(UserProfile,
self).save(*args, **kwargs)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/db/models/base.py", line
410, in save
[Fri Oct 08 00:03:30 2010] [error]
self.save_base(force_insert=force_insert, force_update=force_update)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/db/models/base.py", line
473, in save_base
[Fri Oct 08 00:03:30 2010] [error] values = [(f, None, (raw and
getattr(self, f.attname) or f.pre_save(self, False))) for f in
non_pks]
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/db/models/fields/
files.py", line 252, in pre_save
[Fri Oct 08 00:03:30 2010] [error] file.save(file.name, file,
save=False)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/lib/thumbs.py", line 103, in save
[Fri Oct 08 00:03:30 2010] [error] super(ImageWithThumbsFieldFile,
self).save(name, content, save)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/lib/python2.6/site-packages/django/db/models/fields/
files.py", line 91, in save
[Fri Oct 08 00:03:30 2010] [error] self.name =
self.storage.save(name, content)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/lib/storage/s3.py", line 135, in save
[Fri Oct 08 00:03:30 2010] [error] return
super(S3HashFilenameStorage, self).save(filename, content)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/webapps/
some_app/lib/storage/s3.py", line 97, in save
[Fri Oct 08 00:03:30 2010] [error]
new_key.set_contents_from_file(content, headers=headers,
cb=self._set_contents_from_file_callback)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 618, in set_contents_from_file
[Fri Oct 08 00:03:30 2010] [error] self.send_file(fp, headers, cb,
num_cb)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/key.py", line 510, in send_file
[Fri Oct 08 00:03:30 2010] [error] sender=sender)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/s3/connection.py", line 399, in make_request
[Fri Oct 08 00:03:30 2010] [error]
override_num_retries=override_num_retries)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 500, in make_request
[Fri Oct 08 00:03:30 2010] [error] override_num_retries)
[Fri Oct 08 00:03:30 2010] [error] File "/home/ubuntu/.virtualenvs/
some_app/src/boto/boto/connection.py", line 465, in _mexe
[Fri Oct 08 00:03:30 2010] [error] raise e
[Fri Oct 08 00:03:30 2010] [error] error: [Errno 32] Broken pipe
[Fri Oct 08 00:03:30 2010] [error]
[Fri Oct 08 00:03:32 2010] [info] [client 98.14.140.241]
(104)Connection reset by peer: core_output_filter: writing data to the
network
[Fri Oct 08 00:03:32 2010] [error] [client 98.14.140.241] mod_wsgi
(pid=27186): Exception occurred processing WSGI script '/home/ubuntu/
webapps/some_app/settings/apache/qa.wsgi.py'., referer:
https://qa.some_app.com/profiles/edit/
[Fri Oct 08 00:03:32 2010] [error] [client 98.14.140.241] IOError:
failed to write data, referer: https://qa.some_app.com/profiles/edit/

Mitchell Garnaat

Oct 8, 2010, 7:49:28 AM
to boto-...@googlegroups.com
Hi -

Thanks for the detailed write up.  One quick follow up question:  Are you uploading multiple files in separate threads?  Or is this all one file at a time in the parent process?

Mitch



Sid

Oct 8, 2010, 9:59:13 AM
to boto-users
The above traceback is for just *one* file uploaded via the browser.
But we've also tested multiple file uploads in different tabs, and I
think that has a higher probability of failing. I thought it might be
a threading issue, so I tried running Apache with the -X flag.

$> sudo apache2ctl -X

" -X : debug mode (only one worker, do not detach)"

It was still happening. But in htop I still see multiple entries for
Apache, about a dozen with different PIDs, which I thought was weird
if there was only one worker. Since it's random, it has been difficult
to run pdb on it. Whenever I tried adding a set_trace in debug mode,
the error never seemed to happen. So it's a definite "Heisenbug" :-)

-Sid

Mitchell Garnaat

Oct 8, 2010, 12:09:12 PM
to boto-...@googlegroups.com
You can use boto in a threaded environment, but you must make sure that each thread has its own connection object, in this case an S3Connection, because the underlying httplib.py is not thread-safe.
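
One way to give each thread its own connection is thread-local storage. The sketch below is only an illustration: `get_connection` and `factory` are hypothetical names, and a plain `object` stands in for the real connection so the example runs without boto.

```python
import threading

_local = threading.local()


def get_connection(factory):
    """Return this thread's own connection, creating it on first use.

    `factory` is any zero-argument callable that builds a connection;
    with boto it might be `boto.connect_s3` (an assumption, not run here).
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = factory()
        _local.conn = conn
    return conn


# Demonstration with a dummy factory: each thread gets a distinct object,
# and repeated calls within one thread reuse it.
seen = []


def worker():
    first = get_connection(object)
    second = get_connection(object)
    seen.append((first, second))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(a is b for a, b in seen))
print(len(set(id(a) for a, _ in seen)))
```

Under mod_wsgi the same idea would apply per worker thread, though whether that cures this particular error is not established in this thread.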

I'm not sure if this is something you can easily test but I'm wondering what happens if you just fire up a python shell and try to send a bunch of files to S3, outside of the context of Django/apache/etc.  I know that I do this all of the time and I never really encounter this error so I suspect it is related to the web application context in which it is being run.  It would be good to confirm that, though.

Mitch


Chris Moyer

Oct 8, 2010, 12:22:09 PM
to boto-...@googlegroups.com
I have actually seen similar cases when uploading a lot of
files. I can't confirm whether it only happens in multi-threaded
environments, but I think it happens so rarely that you'll only
see it if you're uploading a crap-load.

I also noticed this usually only happens with LARGE files, i.e. files
over 300MB. Is this also your issue, or is it happening with small
files as well?

We should probably think about adding in some retry logic in _mexe for
this kind of thing. Technically speaking Broken Pipes are bound to
happen, network failure is inevitable. Even if this situation is
explained elsewhere, I think a simple retry in there may be a good
idea anyway. (By "simple" I don't mean "we should have been doing that
all along!" but rather that we shouldn't try TOO hard.) Perhaps retry
2-3 times and then give up?

Maybe that doesn't belong in that code at all, though; maybe it should
be up a layer in your application. But my thinking was that this kind
of thing is bound to happen, so we could handle it at the library
level too. Obviously httplib doesn't, so that's up to our
interpretation, I guess 8^)

--
Chris Moyer

Michael Miller

Oct 8, 2010, 12:36:15 PM
to boto-...@googlegroups.com
I've been continuously uploading large numbers of files to S3 (2-4 GB each) and can confirm a few key lessons:

1) Every worker thread needs to have its own connection.

2) Sometimes, likely for network reasons, file uploads break with this exception; a simple application-level retry does the trick. I've found that putting in a sleep(0.1) between retries helps.

3) Amazon seems to do some "learning" about your connection pattern. I don't quite understand it, but trying to keep a large number of persistent HTTP connections didn't seem to be the most efficient, so I resorted to just opening a fresh connection for each large file upload. The overhead of an MD5 and stream on a 2 GB file far outweighs the overhead of opening and closing a connection.
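
The application-level retry from point 2 can be sketched as below. `upload_with_retry` and `flaky_upload` are hypothetical names; the real `upload` callable would open a fresh connection and do one upload, as in point 3. The attempt count of 3 is an assumption; the 0.1 second delay is the value mentioned above.

```python
import socket
import time


def upload_with_retry(upload, attempts=3, delay=0.1):
    """Call `upload()` until it succeeds or `attempts` is exhausted.

    Socket-level failures are retried after a short sleep; the last
    failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except socket.error:  # alias of OSError on Python 3 (EPIPE, ETIMEDOUT, ...)
            if attempt == attempts:
                raise
            time.sleep(delay)


# Demonstration: a fake upload that breaks twice, then succeeds.
calls = []


def flaky_upload():
    calls.append(1)
    if len(calls) < 3:
        raise BrokenPipeError(32, "Broken pipe")
    return "ok"


result = upload_with_retry(flaky_upload)
print(result, len(calls))
```

Keeping the retry outside the library means each attempt can start from a clean connection and a freshly signed request.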

-Mike

Sid

Oct 8, 2010, 1:06:04 PM
to boto-users
Thanks Michael for the quick responses, and also Chris for his input.

I also noticed that it never happens when I run the Django
devserver instead of Apache. So I have been giving the threading issue
some thought, but I must say I don't understand how that interaction
works just yet. Let me research that angle and things like
http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines#Defining_Process_Groups
. But I don't know yet how to get a separate S3 connection for each
thread, or if there is a way to control that somehow. I will look
into this more and get back to the thread later today.

Also, just a point to note: we weren't able to replicate the problem
when we switched our site to plain HTTP instead of SSL. But
since the problem is random, I can't say for sure that it won't happen
in that case. I will look into that angle, so that if it does happen I
can at least reject the hypothesis that it is a problem with uploads
over SSL, because I've seen some random forum posts on that.

>We should probably think about adding in some retry logic in _mexe for
>this kind of thing. Technically speaking Broken Pipes are bound to
>happen, network failure is inevitable
I've already looked at _mexe, the retry logic already seems to be
present, right? Unless broken pipe is a special case that needs to be
handled differently. Otherwise i have been seeing it retry after the
first failure(timed out in 15 mins ), but the next retries are really
quick. The default retry is 5 times. Also i really do believe this
might not be a network connectivity issue. We run our db backup
scripts, other media backup, as a cron job and those have never failed
in over a year and those have used boto too. So something is unique
about our file upload via the app.

We also thought it might be a load issue due to multiple uploads and
cron jobs running at the same time, so we stopped all our cron jobs and
uploaded just 1 file at a time on QA, and it still seemed to happen.
Now that Mitch has mentioned httplib not being thread-safe, that might
be something to delve deeper into. During testing, restarting Apache
usually stops the problem from happening for a while, but after
half a dozen files it starts happening intermittently again.


-Sid

Chris Moyer

Oct 8, 2010, 1:24:28 PM
to boto-...@googlegroups.com
>>We should probably think about adding in some retry logic in _mexe for
>>this kind of thing. Technically speaking Broken Pipes are bound to
>>happen, network failure is inevitable
> I've already looked at _mexe, the retry logic already seems to be
> present, right? Unless broken pipe is a special case that needs to be
> handled differently. Otherwise i have been seeing it retry after the
> first failure(timed out in 15 mins ), but the next retries are really
> quick. The default retry is 5 times. Also i really do believe this
> might not be a network connectivity issue. We run our db backup
> scripts, other media backup, as a cron job and those have never failed
> in over a year and those have used boto too. So something is unique
> about our file upload via the app.

Yes, broken pipes are different. It's a "socket type error" instead of
an "HTTP type error". IMO httplib handles it poorly.


--
Chris Moyer

Sid

Oct 10, 2010, 5:16:14 PM
to boto-users
To help debug this issue I added an explicit timeout=10 to
HTTPConnection/HTTPSConnection in connection.py. I'm not sure whether
that's backward compatible with older versions.
http://github.com/sidmitra/boto/commit/b979e71e66efe96353fa7cda9497cb578e9cdd39

The reason I added the timeout was simply to make the debugging
process quicker and not have to wait 15 minutes to see the exception.
But strangely, the issue hasn't occurred since then, and it's been 3
days now. I did see the timeout a couple of times (not errno 110,
but my explicit timeout), and it recovers on the next
retry automatically, which wasn't happening before. I will keep an eye
on the logs, but for now I'm considering this closed.
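
For reference, this is roughly what passing an explicit timeout looks like with the standard library alone. The sketch uses Python 3's `http.client` (the renamed `httplib` from the traceback); the host name is only an example, and constructing the object does not open a socket.

```python
import http.client  # Python 3 name for the httplib module seen in the traceback

# Assumption: 10 seconds, the value mentioned in this post. Without a
# timeout, a stalled send can block until the OS gives up on the socket.
conn = http.client.HTTPSConnection("s3.amazonaws.com", timeout=10)

# The timeout is applied to the socket when the connection is opened later.
print(conn.timeout)
```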

mitch

Dec 6, 2010, 9:52:54 AM
to boto-users
Hi -

While implementing the S3 Multipart Upload code over the weekend, I
came across an issue that may be related to this.

I was consistently getting a "Broken Pipe" error when attempting to
upload a part. I could see the retry mechanism doing its thing, but
at the end of the process the only error that popped out was "Broken
Pipe". I was stumped for a while and decided to look at the
actual HTTP traffic between AWS and boto. It was then that I realized
that the actual error being returned by AWS was a 403 caused
by a signature mismatch. However, that error was completely masked.

I think what's happening is that when the request is a PUT or POST,
AWS issues the error response but for some reason httplib is ignoring
it and attempting to send the data anyway. This then causes the
"Broken Pipe" error because AWS has already closed the connection due
to the error.

I'm not sure how to fix this. At this point, I'm not even sure who's
to blame. But I think that's what's happening.
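
The masking effect can be reproduced locally without AWS. The sketch below is an analogy, not boto's or httplib's actual code path: a local socket pair stands in for the server, which answers with an error and closes before the client sends its body. `send_body` is a hypothetical helper that, when the send fails, still reads whatever reply is waiting.

```python
import socket


def send_body(sock, body):
    """Send `body`; if the send fails, still try to read the peer's reply.

    Returns (error, reply): `error` is the OSError raised by the send
    (or None), `reply` is any bytes that were already waiting.
    """
    try:
        sock.sendall(body)
        return None, b""
    except OSError as exc:  # e.g. "Broken pipe" or "Connection reset"
        try:
            reply = sock.recv(4096)
        except OSError:
            reply = b""  # nothing readable; only the send error is known
        return exc, reply


# Local stand-in for the server: it replies with an error and closes
# before the client has sent anything.
client, server = socket.socketpair()
server.sendall(b"HTTP/1.1 403 Forbidden\r\n\r\n")
server.close()

error, reply = send_body(client, b"x" * 65536)
client.close()
print(type(error).__name__, reply.split(b"\r\n")[0])
```

The point of the sketch is that the send error and the server's real answer are two separate things, and code that reports only the first hides the second.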

Mitch


Nick Barendt

Jan 10, 2011, 3:46:39 PM
to boto-users
Are there any updates on the "Broken Pipe" error?

I'm using 2.0b3 and in general things seem to be working okay, but
occasionally (and apparently randomly) we're getting the "Broken Pipe"
error on uploads to S3.
Typically medium-sized files, in the range of 50MB to 200MB.

I have verified that the "Broken Pipe" error does occur in a
single-threaded environment (e.g., from an interactive Python shell).

Any thoughts or debugging directions to proceed would be appreciated.

Thanks,

-Nick

Mitchell Garnaat

Jan 10, 2011, 3:59:46 PM
to boto-...@googlegroups.com
I would love to see what's actually going over the wire when you get the Broken Pipe error.  Any chance you could run through a proxy for a while to try to capture the traffic?

Mitch

Nick Barendt

Jan 10, 2011, 4:10:18 PM
to boto-...@googlegroups.com

  That was my next step, but hoping to avoid it :-(
  I'll do some testing with a proxy and report what I find.

Thanks,

-Nick

Mitch Garnaat

Jan 10, 2011, 4:15:37 PM
to boto-...@googlegroups.com
Thanks.  I know it's a hassle, but I think it's the only way we are going to get to the bottom of it.

Mitch

Nick Barendt

Jan 11, 2011, 11:24:36 AM
to boto-...@googlegroups.com

  Any suggestions on proxies to use?
  I'm running on an EC2 instance.
  I've tried Charles on another instance and configuring boto to use that instance as its proxy.  Charles is capturing most operations, but not the PUTs and GETs for some reason (the most interesting transactions for debugging this of course).
  I've tried tcpdump on the instance, but the kernel is dropping lots of packets, unsurprisingly.
  Any feedback appreciated.

Thanks,

-Nick

Nick Barendt

Jan 11, 2011, 9:12:38 PM
to boto-...@googlegroups.com

  A short update on the "broken pipe" error.  I was unable to capture the transaction with a proxy, but I was tantalizingly close.
  Up against a deadline, I tried the hack of creating a new S3 connection for every operation, and that seems to have made the problem go away.  I had one instance running the older code (using a persistent connection), and another instance running the newer (single-use connections) and the "old code" instance would periodically generate the Broken Pipe error while the "new code" instance would not (both pulling jobs from the same SQS queue, so the operations that the "old code" instance threw exceptions on were eventually redelivered to the "new code" instance which was able to get the job done.).  I removed the patch on the "new code" instance and it started failing in the same way as before.
  From looking at the captures I was able to get, I'm starting to wonder (with no proof, just a hunch) if there is some issue related to the persistent HTTP connection.
  I don't have time right now unfortunately, to dig any deeper into the problem, or the boto code :-(  We'll limp along with the single-use connections for at least the near future.
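
The single-use connection workaround described above can be sketched as a small context manager. `single_use_connection` and `DummyConnection` are made-up names for illustration, and the sketch assumes the real connection object has a `close()` method.

```python
from contextlib import contextmanager


@contextmanager
def single_use_connection(factory):
    """Yield a brand-new connection and always close it afterwards.

    `factory` is a zero-argument callable; with boto it might be
    `boto.connect_s3` (an assumption, not exercised here).
    """
    conn = factory()
    try:
        yield conn
    finally:
        conn.close()


# Demonstration with a dummy connection class.
class DummyConnection(object):
    opened = 0

    def __init__(self):
        DummyConnection.opened += 1
        self.closed = False

    def close(self):
        self.closed = True


used = []
for _ in range(2):
    with single_use_connection(DummyConnection) as conn:
        used.append(conn)  # a real job would do its S3 operation here

print(DummyConnection.opened, [c.closed for c in used])
```

This trades connection setup cost for never reusing a socket that the other side may have already dropped.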

-Nick

Sid

Jul 25, 2011, 9:50:35 AM
to boto-...@googlegroups.com, ni...@barendt.com
This problem popped up again after I upgraded boto to the latest version from git (I needed SES support). I had previously added an HTTP connection timeout argument myself, which I guess is now an official setting in the boto config file. So I added a timeout to the boto.cfg file, and I can see the value being picked up correctly (by printing it in the module). But now all the errors I used to get when I originally created this topic have come back, with all the same symptoms.

Will explore more today and update this thread.

Sid

Jul 29, 2011, 8:34:43 AM
to boto-...@googlegroups.com
Never mind. PEBKAC.

The boto config wasn't readable. Anyway, in case people are still getting Broken pipe a lot, a stopgap solution that has worked for me over the past year is to add a timeout like so:

#/etc/boto.cfg
[Boto]
http_socket_timeout=10

I haven't gotten any upload errors since then.
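
Since the config file silently not being readable was the cause of the regression, a small startup check can catch it. This is a sketch using only the standard library; `read_http_socket_timeout` is a hypothetical helper and the temporary file stands in for /etc/boto.cfg.

```python
import configparser
import os
import tempfile


def read_http_socket_timeout(path):
    """Return http_socket_timeout from `path` as a float, or None.

    None means the file is missing, unreadable by this process, or
    lacks the setting, which is worth logging at startup.
    """
    if not os.access(path, os.R_OK):
        return None
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_option("Boto", "http_socket_timeout"):
        return None
    return parser.getfloat("Boto", "http_socket_timeout")


# Demonstration against a temporary file instead of /etc/boto.cfg.
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as fh:
    fh.write("[Boto]\nhttp_socket_timeout=10\n")
    cfg_path = fh.name

found = read_http_socket_timeout(cfg_path)
missing = read_http_socket_timeout(cfg_path + ".missing")
os.remove(cfg_path)
print(found, missing)
```

Running the check as the same user as the web server matters, because file permissions are exactly what went wrong here.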

Paul Wiseman

May 6, 2012, 2:12:28 PM
to boto-...@googlegroups.com
I was having broken pipe errors using multipart uploads. I found out they came from setting reduced_redundancy=True on the upload_part_from_file method of MultiPartUpload. To get RRS over multipart, I needed to pass reduced_redundancy=True to bucket.initiate_multipart_upload instead, and ensure reduced_redundancy was False on upload_part_from_file. Just in case anyone else is searching for the same things I was when I had this problem :)
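
In code, the arrangement described above would look something like the following sketch. `upload_rrs_multipart` is a hypothetical helper; `bucket` is expected to behave like a boto Bucket, and the method names are the boto 2 ones mentioned in this post.

```python
def upload_rrs_multipart(bucket, key_name, part_files):
    """Upload `part_files` (open file objects) as one reduced-redundancy object.

    The storage class is requested once, when the upload is initiated,
    and is not passed again for the individual parts.
    """
    mp = bucket.initiate_multipart_upload(key_name, reduced_redundancy=True)
    try:
        # S3 part numbers start at 1.
        for part_num, fp in enumerate(part_files, start=1):
            mp.upload_part_from_file(fp, part_num)
        return mp.complete_upload()
    except Exception:
        mp.cancel_upload()  # avoid leaving an incomplete upload behind
        raise
```

Cancelling on failure is a design choice of this sketch, so that a broken part does not leave an unfinished upload sitting in the bucket.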