Boto interface

melc...@gmail.com

Mar 17, 2007, 7:40:47 AM
to boto-users
Hi. I've started using boto's s3. The module is very nice and everything
works just fine, but I definitely don't like the provided interface. The
provided objects and modules are not very handy to use.

I'd like to participate in the module's development and make it more
friendly to the end user.

1. It would be very nice if boto.s3 contained all the necessary objects,
without forcing me to import separate submodules. It's clean and
simple. boto/s3/__init__.py should be:

from connection import S3Connection as Connection
from key import Key
from bucket import Bucket

__all__ = ['Connection', 'Key', 'Bucket']

2. Key interface

Very often the lines of code are:

k = Key(bucket)
k.key = 'name-of-key'

The user should be able to specify the key name at initialization:

k = Key(bucket, 'name_of_key')

Another bucket method would be useful:

bucket.create_key('name_of_key')  # no need to use Key class explicitly

3. More 'pythonic' interfaces for accessing buckets and keys:

for bucket in s3_conn:
    for key in bucket:
        pass  # work with each key

And more and more and more...
If Mitch is interested in such changes, I can continue describing a
better interface for boto.s3 and provide all the patches implementing it.

Thank you :)

Mitch....@gmail.com

Mar 17, 2007, 10:29:45 AM
to boto-users
Hi -

Thanks for the feedback and ideas. I appreciate it. See my
comments below:


On Mar 17, 7:40 am, "melcha...@gmail.com" <melcha...@gmail.com> wrote:
> Hi. I've started using boto's s3. The module is very nice and everything
> works just fine, but I definitely don't like the provided interface. The
> provided objects and modules are not very handy to use.
>
> I'd like to participate in the module's development and make it more
> friendly to the end user.
>
> 1. It would be very nice if boto.s3 contained all the necessary objects,
> without forcing me to import separate submodules. It's clean and
> simple. boto/s3/__init__.py should be:
>
> from connection import S3Connection as Connection
> from key import Key
> from bucket import Bucket
>
> __all__ = ['Connection', 'Key', 'Bucket']
>

This seems reasonable. The full paths would still work so existing
code would not be impacted.

> 2. Key interface
>
> Very often the lines of code are:
>
> k = Key(bucket)
> k.key = 'name-of-key'
>
> The user should be able to specify the key name at initialization:
>
> k = Key(bucket, 'name_of_key')

This would be easy to add and, as long as we provided a default value,
existing code would not be impacted.

>
> Another bucket method would be useful:
>
> bucket.create_key('name_of_key')  # no need to use Key class explicitly

Something very similar to this was added in the latest release. The
method is actually called new_key rather than create_key and it
doesn't accept the name of the key as a param but that can be added.
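
A minimal sketch of how both changes could stay backward compatible, assuming simplified stand-in Key and Bucket classes (these are illustrations, not boto's actual implementation). The key name becomes an optional argument that defaults to None, so existing one-argument calls keep working:

```python
class Key(object):
    """Simplified stand-in for boto's Key class (illustration only)."""

    def __init__(self, bucket=None, key_name=None):
        self.bucket = bucket
        # Old code that omits key_name still works and can set .key later.
        self.key = key_name


class Bucket(object):
    """Simplified stand-in for boto's Bucket class (illustration only)."""

    def __init__(self, name):
        self.name = name

    def new_key(self, key_name=None):
        # Only builds a local object; nothing is sent to S3 here.
        return Key(self, key_name)


bucket = Bucket('my-bucket')
old_style = Key(bucket)            # existing calling convention
old_style.key = 'name-of-key'
new_style = bucket.new_key('name-of-key')
```

Both objects end up with the same key name, and callers that never pass a name are unaffected.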


>
> 3. More 'pythonic' interfaces for accessing buckets and keys:
>
> for bucket in s3_conn:
>     for key in bucket:
>         pass  # work with each key
>

Perhaps we could use generators for this. Bitbucket included a
generator for bucket listing and it worked well. It's more efficient
because you don't actually have to grab all of the results into memory
at one time. You could also seamlessly handle the results paging from
S3 (or SQS).

> And more and more and more...
> If Mitch is interested in such changes, I can continue describing a
> better interface for boto.s3 and provide all the patches implementing it.
>
> Thank you :)

I'm always happy to accept patches and new code. Thanks!

Mitch

Ansel Halliburton

Mar 19, 2007, 12:54:44 PM
to boto-users
I second all these suggestions. These are all fairly minor things
that bit me as I started using Boto, so I think these changes would
lower the learning curve.
:)

Mitch....@gmail.com

Mar 19, 2007, 6:34:13 PM
to boto-users
Hi -

I just checked in some changes that address some of these. I'd be
interested in feedback. I'm not guaranteeing that what's in there
right now will be the way things end up but it's a start. Changes
include:

Added a generator function in bucket.py. This allows you to do things
like this:

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.find_bucket('foo')
>>> for key in bucket:
...     pass  # do something interesting
>>>

The really neat thing is that it doesn't matter how many keys are in
the bucket. The generator handles all of the paging behind the scenes and
just keeps returning keys until you've visited all of them.
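
The paging idea can be sketched roughly as follows. This is not the code that was checked in: fetch_page is a hypothetical stand-in for a single S3 list request, and the marker/max_keys handling is an assumption based on the list parameters mentioned elsewhere in this thread.

```python
def fetch_page(all_keys, marker, max_keys):
    """Hypothetical stand-in for one S3 list request.

    Returns (keys, is_truncated) for the keys that sort after marker.
    """
    remaining = [k for k in sorted(all_keys) if k > marker]
    page = remaining[:max_keys]
    return page, len(remaining) > max_keys


def iter_keys(all_keys, max_keys=2):
    """Yield every key, requesting the next page only when needed."""
    marker = ''
    more = True
    while more:
        page, more = fetch_page(all_keys, marker, max_keys)
        for key in page:
            yield key
            marker = key  # the next request resumes after this key


names = ['a', 'b', 'c', 'd', 'e']
listed = list(iter_keys(names, max_keys=2))
```

Only one page is held in memory at a time, and the caller never sees the page boundaries.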

I also added an __iter__ method to the S3Connection class so it, too,
can behave like this. So, you can do:

>>> for bucket in c:
...     pass  # do some bucket thing
>>>

I also added the key_name param (optional) to the Key constructor and
to the new_key method of the bucket object as suggested by melchakov
as well as the suggestions for the __all__ attribute for the s3
package.

These changes don't cause any incompatibilities with existing code so
you can ignore them if you want. I'd be interested in feedback,
though, because I'd like to make similar changes to the other modules.

Mitch

Let me know what you think

Ansel Halliburton

Mar 19, 2007, 6:43:17 PM
to boto-users
Cool; I just updated and will give you some feedback tomorrow or Wed.!

Ansel Halliburton

Mar 20, 2007, 8:19:26 PM
to boto-users
Looks good! I just switched my whole app over to use Boto's S3,
including some of these new methods, and things are working perfectly.

corvin

Mar 22, 2007, 8:25:34 AM
to boto-users

On Mar 20, 1:34 am, "m...@garnaat.com" <Mitch.Garn...@gmail.com>
wrote:

> I also added the key_name param (optional) to the Key constructor and
> to the new_key method of the bucket object as suggested by melchakov
> as well as the suggestions for the __all__ attribute for the s3
> package.

Thanks for the changes.

I think it would be better to rename the method, or add an alias for
compatibility.

1. If there is a create_bucket, let's follow the same naming rule and
have create_key.
2. 'Create' is a verb but 'new' is an adjective. A verb is more intuitive
for an action that returns something, at least for me :)


Mitch....@gmail.com

Mar 22, 2007, 9:07:09 AM
to boto-users
I think consistency is important but, to me, the two methods are
fundamentally different. In create_bucket, there is a side effect on
the server. In other words, you are actually creating a new bucket in
S3. However, with new_key, you are just creating a local Key object
instance which could eventually become a key in S3.

Perhaps there is another, better, name for the new_key method? Or
should we call it create_key and actually have an empty object created
in S3?

Mitch

Alexey Melchakov

May 15, 2007, 10:56:47 AM
to boto-users

Hi, Mitch.
I'm back to using boto, and have a few new suggestions:

1. Let's have key.delete(); it's simple to implement.
2. Let's rename Key.key to Key.name? ;) Or have an alias; people
around me always mistype this property.
3.

    # params can be one of: prefix, marker, max-keys, delimiter
    # as defined in S3 Developer's Guide, however since max-keys is not
    # a legal variable in Python you have to pass maxkeys and this
    # method will munge it (Ugh!)
    def get_all_keys(self, headers=None, **params):
        for k, v in params.items():
            if k == 'maxkeys':
                k = 'max-keys'

You could replace the code with k.replace('_', '-'). As long as no S3
param actually contains an underscore (like 'param_name'), automatic
conversion from 'underscore_scheme' to 'dashed-scheme' will solve that
"Ugh" ;)
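
A minimal sketch of that conversion, using a hypothetical helper named munge_params (not part of boto) and assuming no real S3 list parameter contains an underscore:

```python
def munge_params(**params):
    """Convert Python-style keyword names to S3-style query names."""
    return dict((k.replace('_', '-'), v) for k, v in params.items())


query = munge_params(prefix='photos/', max_keys=10)
```

Callers would then write max_keys=10. The existing maxkeys spelling has no underscore, so it would still need its own special case to stay compatible.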

Thanks for boto, again :)

Alexey Melchakov

May 15, 2007, 12:27:08 PM
to boto-users

Haha, Mitch, I hit the same problems with httplib, chunked and read()
as you did (I saw a few comments from you in the archives of
mail.python.org), but I implemented an exists() method for Key using
HEAD. Could you include it in boto?

Most of the code was copied from get_file, but I worked around the
httplib bug with chunked HEAD requests by manually calling resp.close()
(if neither resp.read() nor resp.close() is called, the next httplib
request raises a ResponseNotReady exception):

    def exists(self, headers=None):
        if not headers:
            headers = {}
        http_conn = self.bucket.connection.connection
        final_headers = boto.utils.merge_meta(headers, {})
        path = '/%s/%s' % (self.bucket.name, self.key)
        path = urllib.quote(path)
        self.bucket.connection.add_aws_auth_header(final_headers,
                                                   'HEAD', path)
        if (self.bucket.connection.use_proxy == True):
            path = self.bucket.connection.prefix_proxy_to_path(path)
        http_conn.putrequest('HEAD', path)
        for key in final_headers:
            http_conn.putheader(key, final_headers[key])
        http_conn.endheaders()
        resp = http_conn.getresponse()
        resp.close()
        if resp.status == 200:
            return True
        elif resp.status == 404:
            return False
        else:
            raise S3ResponseError(resp.status, resp.reason)

PS: httplib is the buggiest std lib module I have ever seen ;)

Mitchell Garnaat

May 15, 2007, 9:29:36 PM
to boto-...@googlegroups.com
Hi Alexey -

Those are all good suggestions.  I'll incorporate them sometime this week.  Thanks!

Mitch

Mitchell Garnaat

May 15, 2007, 9:37:57 PM
to boto-...@googlegroups.com
Yeah, I have had quite a few problems with httplib.  It's definitely not up to the same standard as the other Python libs.

Did you check out the lookup method of the Bucket object?  It does much of what your exists method does.  We should probably merge the two and decide where it should go.  If it stays in bucket.py we could add an exists method to key.py that just calls it.

Mitch


Alexey Melchakov

May 16, 2007, 8:49:32 AM
to boto-users

I missed lookup...

I suppose Bucket.lookup should be renamed Bucket.get_key(name) ->
(Key, None), and Key.exists() -> (bool) should be implemented as
"return bool(self.bucket.lookup(self.name))".

And I like response.close() more than response.chunked=0 ;)
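
A rough sketch of that delegation, using in-memory stand-ins rather than boto's real classes (the stored-names set replaces the HEAD request, and the attribute is called name here as proposed earlier in the thread):

```python
class Key(object):
    """Simplified stand-in for boto's Key (illustration only)."""

    def __init__(self, bucket, name):
        self.bucket = bucket
        self.name = name

    def exists(self):
        # Delegate to the bucket so the lookup logic lives in one place.
        return self.bucket.lookup(self.name) is not None


class Bucket(object):
    """In-memory stand-in; a real bucket would issue a HEAD request."""

    def __init__(self, stored_names):
        self._stored = set(stored_names)

    def lookup(self, name):
        # Returns a Key if the name is stored, otherwise None.
        if name in self._stored:
            return Key(self, name)
        return None


bucket = Bucket(['report.txt'])
present = Key(bucket, 'report.txt').exists()
absent = Key(bucket, 'missing.txt').exists()
```

The sketch tests "is not None" instead of calling bool(), so the result does not depend on how a Key object evaluates in a boolean context.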

Also, I found 2 bugs:

1. Key.send_file() fails at this:

        except socket.error, e:
            print 'Caught a socket error, trying to recover'
            self.bucket.connection.make_http_connection()
            fp.seek(0)
            self.send_file(fp, headers)  # <-- HERE
        except Exception, e:
            print 'Caught an unexpected exception'
            self.bucket.connection.make_http_connection()
            raise e
        if response.status != 200:

After the self.send_file() call finishes, the code after the except
blocks is executed, but no response exists there when we arrive via
except socket.error. It really should look like this:

        except socket.error, e:
            print 'Caught a socket error, trying to recover'
            self.bucket.connection.make_http_connection()
            fp.seek(0)
            self.send_file(fp, headers)
            return
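
The fall-through can be reproduced with a toy example. FlakySender is a hypothetical stand-in for Key.send_file that fails once and then succeeds, it is written for a current Python rather than the Python 2 syntax above, and the fixed variant returns the retry's result, a small variation on the bare return shown above:

```python
class FlakySender(object):
    """Toy stand-in for Key.send_file; fails once, then succeeds."""

    def __init__(self):
        self.attempts = 0

    def _transmit(self):
        self.attempts += 1
        if self.attempts == 1:
            raise OSError('simulated socket error')
        return 200

    def send_buggy(self):
        try:
            status = self._transmit()
        except OSError:
            self.send_buggy()  # retry succeeds, but we fall through
        # On the first (failed) call, status was never assigned.
        return status == 200

    def send_fixed(self):
        try:
            status = self._transmit()
        except OSError:
            # Let the retry's result be the result of this call.
            return self.send_fixed()
        return status == 200


try:
    FlakySender().send_buggy()
    buggy_outcome = 'ok'
except UnboundLocalError:
    buggy_outcome = 'UnboundLocalError'

fixed_outcome = FlakySender().send_fixed()
```

In the buggy variant the retry itself works, but the outer call then continues past the except block and touches a variable that was never set.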

2. The "for key in bucket" code fails; Bucket.__iter__ should be:

    def __iter__(self):
        return iter(BucketListResultSet(self))
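
The thread does not show what the failing __iter__ looked like. One way such code can fail, sketched here with hypothetical stand-in classes, is returning an object that is iterable but is not itself an iterator:

```python
class ResultSet(object):
    """Hypothetical stand-in for BucketListResultSet."""

    def __init__(self, names):
        self.names = names

    def __iter__(self):
        # A generator function returns a fresh iterator on each call.
        for name in self.names:
            yield name


class BrokenBucket(object):
    def __iter__(self):
        # Returns an iterable that is not itself an iterator.
        return ResultSet(['a', 'b'])


class FixedBucket(object):
    def __iter__(self):
        return iter(ResultSet(['a', 'b']))


try:
    list(BrokenBucket())
    broken_outcome = 'ok'
except TypeError:
    broken_outcome = 'TypeError'

fixed_keys = list(FixedBucket())
```

Wrapping the result set in iter() hands the for loop a real iterator, which is what the iteration protocol requires from __iter__.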

Mitchell Garnaat

May 16, 2007, 3:55:24 PM
to boto-...@googlegroups.com
Regarding Key.key => Key.name.  I agree that it should probably have been Key.name from the start but I don't want to change it now in a way that will break everyone's programs.  One approach would be to add attribute lookup handlers to the Key object, like this:

    def __getattr__(self, name):
        if name == 'name':
            return self.key
        else:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == 'name':
            self.__dict__['key'] = value
        else:
            self.__dict__[name] = value

That way the actual attribute would still be stored in "key" but any attempts to set or get the attribute "name" would access the same attribute.
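
An alternative sketch of the same alias using a property instead of attribute hooks. This is not what the thread proposes; it assumes a simplified stand-in Key class and a Python version that supports the @name.setter decorator form:

```python
class Key(object):
    """Simplified stand-in for boto's Key (illustration only)."""

    def __init__(self, key=None):
        self.key = key

    @property
    def name(self):
        # Reads of .name are served from the real attribute, .key.
        return self.key

    @name.setter
    def name(self, value):
        # Writes to .name land in .key, so both spellings stay in sync.
        self.key = value


k = Key('photo.jpg')
k.name = 'renamed.jpg'
```

As with the handlers above, the value is stored only in key, and other attributes are not routed through any extra lookup code.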

Mitch

Sylvain Hellegouarch

May 16, 2007, 5:48:52 PM
to boto-...@googlegroups.com


2007/5/16, Mitchell Garnaat <mitch....@gmail.com>:
> Yeah, I have had quite a few problems with httplib.  It's definitely not up to the same standard as the other Python libs.


Have you ever considered httplib2 by any chance instead?

- Sylvain


Mitchell Garnaat

May 16, 2007, 7:19:06 PM
to boto-...@googlegroups.com
I did look at that once and I was impressed but there was something that it was missing that I needed.  Unfortunately, I can't for the life of me remember what that is.  From reading the web site it sounds like it should work.  I'm hesitant to introduce dependencies on non-standard packages but I might consider it in this case.  We might even be able to push all of the authentication back into httplib2.

I'll have another look.

Mitch

Alexey Melchakov

May 17, 2007, 2:45:12 AM
to boto-users

If you agree 'name' is a better name, make it primary and 'key'
secondary. New users will understand that and use the correct name from
the start.

On May 16, 11:55 pm, "Mitchell Garnaat" <mitch.garn...@gmail.com>
wrote:


> Regarding Key.key => Key.name. I agree that it should probably have been
> Key.name from the start but I don't want to change it now in a way that will
> break everyone's programs. One approach would be to add attribute lookup
> handlers to the Key object, like this:
>
> def __getattr__(self, name):
>     if name == 'name':
>         return self.key
>     else:
>         raise AttributeError
>
> def __setattr__(self, name, value):
>     if name == 'name':
>         self.__dict__['key'] = value
>     else:
>         self.__dict__[name] = value
>
> That way the actual attribute would still be stored in "key" but any
> attempts to set or get the attribute "name" would access the same attribute.
>
> Mitch
>
