error in connecting to s3 when using proxy


Matt Billenstein

Nov 22, 2010, 3:19:51 PM
to boto-...@googlegroups.com
I'm seeing an error when connecting to s3 using a proxy (squid):

push@server:~$ http_proxy="http://user@password@10.0.0.2:3128" python test.py

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    bucket = conn.create_bucket('foo.vazor.com')
  File "/usr/lib/pymodules/python2.6/boto/s3/connection.py", line 314, in create_bucket
    data=data)
  File "/usr/lib/pymodules/python2.6/boto/s3/connection.py", line 342, in make_request
    data, host, auth_path, sender)
  File "/usr/lib/pymodules/python2.6/boto/connection.py", line 459, in make_request
    return self._mexe(method, path, data, headers, host, sender)
  File "/usr/lib/pymodules/python2.6/boto/connection.py", line 386, in _mexe
    connection = self.get_http_connection(host, self.is_secure)
  File "/usr/lib/pymodules/python2.6/boto/connection.py", line 288, in get_http_connection
    return self.new_http_connection(host, is_secure)
  File "/usr/lib/pymodules/python2.6/boto/connection.py", line 298, in new_http_connection
    connection = self.proxy_ssl()
  File "/usr/lib/pymodules/python2.6/boto/connection.py", line 351, in proxy_ssl
    sslSock = httplib.ssl.SSLSocket(sock)
  File "/usr/lib/python2.6/ssl.py", line 118, in __init__
    self.do_handshake()
  File "/usr/lib/python2.6/ssl.py", line 293, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [Errno 8] _ssl.c:480: EOF occurred in violation of protocol

Here is my test script:


import boto

access_key = 'access_key_here'
secret_access_key = 'secret_access_key_here'
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_access_key,
)

bucket = conn.create_bucket('foo.vazor.com')


A curl to an https website using the same proxy seems to work so I think
that piece is working properly... Anyone have any ideas what might be
going on here?

thx

Matt

--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

Mitchell Garnaat

Nov 22, 2010, 3:33:02 PM
to boto-...@googlegroups.com
Hi -

You will have to pass in args to the S3Connection constructor for proxy, proxy_port, proxy_user, and proxy_pass.

Mitch
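For reference, the explicit form would look something like this (a sketch only; the keyword names are as listed above, and the host, port, and credential values are placeholders):

```python
# Sketch of passing proxy settings explicitly to boto (placeholder values).
# The actual call is left commented out, since it needs real AWS credentials
# and a reachable proxy.
proxy_kwargs = dict(
    proxy='10.0.0.2',       # proxy host
    proxy_port=3128,        # proxy port
    proxy_user='user',      # proxy auth username
    proxy_pass='password',  # proxy auth password
)
# conn = boto.connect_s3(aws_access_key_id=access_key,
#                        aws_secret_access_key=secret_access_key,
#                        **proxy_kwargs)
```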

On Mon, Nov 22, 2010 at 3:19 PM, Matt Billenstein <ma...@vazor.com> wrote:
I'm seeing an error when connecting to s3 using a proxy (squid):

push@server:~$ http_proxy="http://user@pass...@10.0.0.2:3128" python test.py

--
You received this message because you are subscribed to the Google Groups "boto-users" group.
To post to this group, send email to boto-...@googlegroups.com.
To unsubscribe from this group, send email to boto-users+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/boto-users?hl=en.


Matt Billenstein

Nov 22, 2010, 3:55:01 PM
to boto-...@googlegroups.com
It parses this out of the environment, no? I can see a connection hit the
proxy when running 'http_proxy="..." python test.py' from the command line.

m

On Mon, Nov 22, 2010 at 03:33:02PM -0500, Mitchell Garnaat wrote:
> Hi -
> You will have to pass in args to the S3Connection constructor for proxy,
> proxy_port, proxy_user, and proxy_pass.
> Mitch

Jeff Garbers

Nov 22, 2010, 4:11:37 PM
to boto-...@googlegroups.com
Matt - please post back if you have success or failures with Mitch's suggestion. I've been noticing
some problems with Squid and Microsoft Forefront TMG access to S3 and SQS, including the
"EOF" one you hit. I have two or three patches nearly ready to submit, but I can pass them along
to you if you continue to have trouble -- maybe you can confirm that they address your problem.

Mitch, when they're ready, should I just send you those fixes, or post them here, or what?

Matt Billenstein

Nov 22, 2010, 4:20:13 PM
to boto-...@googlegroups.com
On Mon, Nov 22, 2010 at 04:11:37PM -0500, Jeff Garbers wrote:
> Matt - please post back if you have success or failures with Mitch's
> suggestion. I've been noticing some problems with Squid and Microsoft
> Forefront TMG access to S3 and SQS, including the "EOF" one you hit. I
> have two or three patches nearly ready to submit, but I can pass them
> along to you if you continue to have trouble -- maybe you can confirm
> that they address your problem.

Please send them along -- I failed to mention that I'd tried explicitly
passing the proxy params in the S3 connect before I sent that first
message, with the same results...

m

> Mitch, when they're ready, should I just send you those fixes, or post them here, or what?
>
> On Nov 22, 2010, at 3:33 PM, Mitchell Garnaat wrote:
>
> > You will have to pass in args to the S3Connection constructor for proxy, proxy_port, proxy_user, and proxy_pass.
> >
> > Mitch
>

Mitchell Garnaat

Nov 22, 2010, 4:55:00 PM
to boto-...@googlegroups.com
Yes, it will look for an environment variable called "http_proxy" and parse the various settings from that.  Or, you can also specify it in the boto config file.

Mitch
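For reference, the config-file equivalent would look something like this (section and key names as boto's proxy handling expects them; the values are placeholders):

```ini
[Boto]
proxy = 10.0.0.2
proxy_port = 3128
proxy_user = user
proxy_pass = password
```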

Mitchell Garnaat

Nov 22, 2010, 4:56:44 PM
to boto-...@googlegroups.com
I'll take patches any way I can get them 8^)

Under the circumstances, posting the patches here would probably make sense so Matt can give them a try.  If they seem to be working, I'll be happy to incorporate them into the github master.

Mitch

Jeff Garbers

Nov 22, 2010, 5:25:22 PM
to boto-...@googlegroups.com
Okay, here are the fixes I've been working on -- this is right-out-of-the-oven stuff
that I haven't even tested with the Boto tests yet, so be careful. 

I'll explain them here rather than just post diffs.  I'm working from boto 2.0b3, by the way,
but if you're using something earlier you should be able to find the relevant bits there.

1) One problem (maybe Matt's?) is caused by the use of the "Expect" header, which
was added to Boto to support large (>2.1G, if I remember the comment correctly) S3 items.
Squid doesn't support this header -- it doesn't care much for HTTP/1.1, apparently --
and rejects your request with a status code of 417 (Expectation Failed). Boto doesn't
deal with the 417. It just tries to parse Squid's nice HTML error page as AWS XML... to
no good end.

The fix I'm working on detects the 417 condition and retries the request with the 'Expect'
header removed, but that's a bit long to post yet. It also keeps track of the fact
that it saw the 417 and doesn't use the 'Expect' header again for that connection.

For now, Matt, you could just try commenting out this line in s3/key.py:

        headers['Expect'] = '100-Continue'

and see if that helps. Assuming you're not posting S3 objects over 2.1G, this might 
get you back up and running.
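A minimal sketch of that retry-on-417 logic (this is a hypothetical stand-alone helper illustrating the behavior described above, not the actual patch or boto's request path):

```python
def send_request(make_request, headers):
    """Send a request; if the proxy answers 417, drop 'Expect' and retry.

    make_request(headers) -> status code is a stand-in for boto's request
    machinery; this sketches the behavior only.
    """
    status = make_request(headers)
    if status == 417 and 'Expect' in headers:
        # Squid rejected the Expect: 100-Continue header; remove it and
        # retry, leaving it off for the rest of this connection.
        del headers['Expect']
        status = make_request(headers)
    return status

# Example: a fake proxy that 417s whenever Expect is present.
def fake_proxy(headers):
    return 417 if 'Expect' in headers else 200

hdrs = {'Expect': '100-Continue'}
status = send_request(fake_proxy, hdrs)
```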

2) The second issue was caused by a spurious linefeed in the 'Proxy-Authorization'
header value that broke proxy requests.  Either Squid or AWS interpreted
the extra LF as a blank line, causing the 'User-Agent' and, more importantly,
the AWS 'Authorization' headers to be ignored. With no auth header, 
AWS just redirects to a supporting Web page, again providing HTML
that Boto doesn't really want to see.

This one took some serious packet-level snooping to find.

Around line 380 in boto/connection.py, you want to add .strip() to remove the
extra LF generated by base64.encodestring -- so it should be like this:

    def get_proxy_auth_header(self):
        auth = base64.encodestring(self.proxy_user+':'+self.proxy_pass).strip()

(If your proxy userid and password together run over about 57 characters, you'd
need to use .replace('\n','') instead of .strip() to get rid of the LFs that encodestring
leaves in the *middle* of the string as well as at the end, since encodestring wraps its
output at 76 characters per line. But we seem to use strip() everywhere else so I left it
this way for consistency.)

3) The last problem -- and I'm still working on this one -- comes in around line 474 of
boto/connection.py, where we do this:

        if self.use_proxy:
            path = self.prefix_proxy_to_path(path, host)

This one is related to Microsoft's Forefront TMG, not Squid, so it's probably not your
issue. But it appears that when you're accessing an HTTPS resource through the
proxy -- and the proxy is tunneling for you -- the scheme and host parts of the URI do
NOT, as far as I can tell, need to be included in the request line.

The reason is that when you're accessing an http resource through the proxy, you send
it a request line like

GET http://example.com/index.html HTTP/1.1

because the proxy needs the hostname of the resource you're trying to reach. However,
if you're accessing an https-based resource, you'll want to tunnel.  You send a CONNECT
request to the proxy, establish a secure socket directly to "example.com" (in this case), 
and then just send

GET /index.html HTTP/1.1

since at that point you're doing HTTP directly with example.com through the tunnel.

(I've only been dealing with this for a few days, so if somebody detects mistakes in
my reasoning here, please let me know.)

In any case, my fix is simple:

        if self.use_proxy:
            if not self.is_secure:
                path = self.prefix_proxy_to_path(path, host)

but unfortunately there's more to do; SQS access over HTTPS still isn't working through
Squid *or* TMG if that fix is in place.
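For what it's worth, the tunneling flow described above is what a modern stdlib client does for you. A minimal sketch with Python 3's http.client (the proxy address is a placeholder, and no connection is actually opened here, since the socket is only created on first use):

```python
import http.client

# Connect to the proxy, then ask it to tunnel to the origin via CONNECT.
conn = http.client.HTTPSConnection('10.0.0.2', 3128)  # placeholder proxy
conn.set_tunnel('s3.amazonaws.com', 443)  # sends CONNECT on first request

# Once the tunnel is up, requests use the origin-form path only --
# "GET / HTTP/1.1" -- never the absolute "https://..." form:
# conn.request('GET', '/')   # left commented: needs a live proxy
```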

Phew. Hope all this helps -- these have been hard-fought fixes over the past several
days, and it'd be nice to know that *somebody* else got some benefit from it!

Mihai Ibanescu

Nov 22, 2010, 5:25:58 PM
to boto-...@googlegroups.com
Matt,

I've seen proxies drop connections when they are overloaded, and for
no apparent reason too.

We've used with success this patch:

http://people.rpath.com/~misa/boto-ssl-reconnect.patch

If your problem is intermittent, please try the patch and let me know
if it helps.

Mitch, I don't remember if I ever submitted this patch - I am sure I
didn't since you took all my patches so far :-/ I am not very proud of
it, ideally it should wait a random amount of time (exponential
backoff would be nice).
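A retry wrapper with randomized exponential backoff along those lines might look like this (a generic sketch of the idea, not the boto-ssl-reconnect patch itself):

```python
import random
import time

def retry_with_backoff(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry fn() on error, waiting a randomized, exponentially growing delay.

    `sleep` is injectable so the behavior can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            # Full-jitter backoff: wait 0..base_delay * 2^attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))

# Example: a call that fails twice, then succeeds.
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError('connection dropped')
    return 'ok'

result = retry_with_backoff(flaky, sleep=lambda s: None)  # stub out sleeping
```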

Mihai

Mihai Ibanescu

Nov 22, 2010, 5:35:12 PM
to boto-...@googlegroups.com
Let's stop using encodestring() and strip(). base64.b64encode() will
do just fine.

httplib/urllib/urllib2 and friends had this problem until very
recently, and it got propagated in all sorts of other places
(including my own code).

Mihai

Jeff Garbers

Nov 22, 2010, 5:39:10 PM
to boto-...@googlegroups.com
On Nov 22, 2010, at 5:35 PM, Mihai Ibanescu wrote:

> Let's stop using encodestring() and strip(). base64.b64encode() will
> do just fine.

Yes, that makes a lot more sense. I've made that change to my fixes and it'll be
that way when I post them (that is, after they *work*).

Matt Billenstein

Nov 22, 2010, 6:00:25 PM
to boto-...@googlegroups.com
On Mon, Nov 22, 2010 at 05:25:22PM -0500, Jeff Garbers wrote:
> 2) The second issue was caused by a spurious linefeed in the
> 'Proxy-Authorization' header value that broke proxy requests.
> Either Squid or AWS interpreted the extra LF as a blank line,
> causing the 'User-Agent' and, more importantly, the AWS
> 'Authorization' headers to be ignored. With no auth header, AWS
> just redirects to a supporting Web page, again providing HTML that
> Boto doesn't really want to see. This one took some serious
> packet-level snooping to find. Around line 380 in
> boto/connection.py, you want to add .strip() to remove the extra LF
> generated by base64.encodestring -- so it should be like this:
>
> def get_proxy_auth_header(self):
> auth =
> base64.encodestring(self.proxy_user+':'+self.proxy_pass).strip()
>
> (If your proxy userid and password are over about 130 characters
> together, you'd need to use .replace('\n','') instead of .strip()
> to get rid of LFs that encodestring leaves in the *middle* of the
> string as well as the end. But we seem to use strip() everywhere
> else so I left it this way for consistency.)

Yes! This is the one -- oddly, I don't seem to have any issue with the
Expect header though...

Thanks!

m

Matt Billenstein

Nov 22, 2010, 6:02:29 PM
to boto-...@googlegroups.com
BTW, this behavior in base64 of appending a newline has bit me in the
ass a couple times now -- why does it do that?

m


Jeff Garbers

Nov 22, 2010, 6:52:13 PM
to boto-...@googlegroups.com
On Nov 22, 2010, at 6:02 PM, Matt Billenstein wrote:

> BTW, this behavior in base64 of appending a newline has bit me in the
> ass a couple times now -- why does it do that?

Apparently you're not the only one - Mihai said

> Let's stop using encodestring() and strip(). base64.b64encode() will
> do just fine.
>

> httplib/urllib/urllib2 and friends had this problem until very
> recently, and it got propagated in all sorts of other places
> (including my own code).

I think encodestring() adds the newlines for MIME compliance. From the Wikipedia
page about Base64,

> MIME does not specify a fixed length for Base64-encoded lines, but it does
> specify a maximum line length of 76 characters. Additionally it specifies that
> any extra-alphabetic characters must be ignored by a compliant decoder,
> although most implementations use a CR/LF newline pair to delimit encoded
> lines.

So encodestring() is trying to help you out by keeping the b64-representation of
your string from being too long -- making it MIME-ready.

If you're using strings that are short, or if you're not going to stick the result in
a MIME message, use base64.b64encode() per Mihai's suggestion.
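A quick illustration of that wrapping (again using base64.encodebytes, Python 3's name for encodestring):

```python
import base64

data = b'x' * 200  # long enough to need several output lines
encoded = base64.encodebytes(data)
lines = encoded.splitlines()

# Every line fits MIME's 76-character limit...
assert all(len(line) <= 76 for line in lines)
# ...while b64encode gives one unbroken string with no newlines at all.
assert b'\n' not in base64.b64encode(data)
```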

Jeff Garbers

Nov 22, 2010, 7:09:22 PM
to boto-...@googlegroups.com
On Nov 22, 2010, at 6:00 PM, Matt Billenstein wrote:

> Yes! This is the one -- oddly I don't have any issue with the Expect
> header it seems though...
>
> Thanks!

Glad to help. I'm still struggling to get SQS and S3 working properly behind
both Squid and MS Forefront TMG.

The problem is that when using a proxy, Boto wants to prepend the scheme and
host parts of the URI to the path when building the request. This seems
to work fine for all of my tests *except* when accessing AWS over SSL
using TMG.

When connecting to AWS using regular HTTP, Boto adds the prefix and sends

> GET http://s3.amazonaws.com/ HTTP/1.1

TMG is fine with this, parsing the absolute URI and doing its proxy thing.

When connecting to AWS using HTTPS, though, Boto sends

> GET https://s3.amazonaws.com/ HTTP/1.1

and TMG responds with an empty HTML page, <HTML></HTML>, along
with a header

> Refresh: 0; URL=https://s3.amazonaws.com/

as if specifying the full URI, not just the pathname, was making TMG think
it should just redirect you.

I'm not really clear on what's going on here yet... trying to figure it all out.
If anybody has any insights I'd be delighted to hear them.

Thanks!

Mihai Ibanescu

Nov 22, 2010, 7:21:12 PM
to boto-...@googlegroups.com
If the GET https:// is sent _after_ you established the tunnel to
amazonaws, then I doubt it's the proxy's fault. The proxy has
absolutely no way to know what's going through the tunnel.

Now, it may be that amazonaws is barfing if you are sending it a full
URL in the GET request. This is not a problem when you're going plain
HTTP, because the proxy strips it for you, but it may be a problem
when going through the tunnel.

I was unable to reproduce my theory though - I did:

telnet s3.amazonaws.com 80
GET http://s3.amazonaws.com/ HTTP/1.1
Host: s3.amazonaws.com

HTTP/1.1 307 Temporary Redirect
x-amz-id-2: brJJF38JiYRXvH2mlqm94MdOyBmmA3wIEzimhIkXTuBa/fwadvsPr5pkZrJmeMub
x-amz-request-id: D1AB97B498F5E69C
Date: Tue, 23 Nov 2010 00:20:28 GMT
Location: http://aws.amazon.com/s3
Content-Length: 0
Server: AmazonS3

so that seems reasonable.

Mihai

Jeff Garbers

Nov 22, 2010, 7:48:58 PM
to boto-...@googlegroups.com
On Nov 22, 2010, at 7:21 PM, Mihai Ibanescu wrote:

> If the GET https:// is sent _after_ you established the tunnel to
> amazonaws, then I doubt it's the proxy's fault. The proxy has
> absolutely no way to know what's going through the tunnel.

That's what I thought, but I'd swear *somehow* TMG is getting in
the way. This may be incorrect sleuthing, but TMG uses upper-case
HTML tags in its error pages -- and the response I get back
along with that "Refresh" header is

> <HTML></HTML>

whereas I don't recall ever seeing upper-case HTML tags from
Amazon. Not a very strong clue, but a clue.

Will investigate further and report back here.
