I'd like to participate in the module's development and make it more
user-friendly for end users.
1. It would be very nice if boto.s3 contained all the necessary objects,
without forcing me to import separate submodules. That would be clean and
simple. boto/s3/__init__.py should be:
from connection import S3Connection as Connection
from key import Key
from bucket import Bucket
__all__ = ['Connection', 'Key', 'Bucket']
2. Key interface
Very often the lines of code are:
k = Key(bucket)
k.key = 'name-of-key'
The user should be able to specify the key name at initialization:
k = Key(bucket, 'name_of_key')
Another bucket method would be useful:
bucket.create_key('name_of_key')  # no need to use the Key class explicitly
3. More 'pythonic' interfaces for accessing buckets and keys:
for bucket in s3_conn:
for key in bucket:
And more and more and more..
If Mitch is interested in such changes, I can continue describing a
better interface for boto.s3 and provide all the patches implementing it.
Thank you :)
Thanks for the feedback and ideas. I appreciate it. See my
comments below:
On Mar 17, 7:40 am, "melcha...@gmail.com" <melcha...@gmail.com> wrote:
> Hi. I've started using boto's s3. The module is very nice, everything
> works just fine, but I definitely don't like the provided interface. It's
> not very handy to use the provided objects and modules.
>
> I'd like to participate in the module's development and make it more
> user-friendly for end users.
>
> 1. It would be very nice if boto.s3 contained all the necessary objects,
> without forcing me to import separate submodules. That would be clean and
> simple. boto/s3/__init__.py should be:
>
> from connection import S3Connection as Connection
> from key import Key
> from bucket import Bucket
>
> __all__ = ['Connection', 'Key', 'Bucket']
>
This seems reasonable. The full paths would still work so existing
code would not be impacted.
> 2. Key interface
>
> Very often the lines of code are:
>
> k = Key(bucket)
> k.key = 'name-of-key'
>
> The user should be able to specify the key name at initialization:
>
> k = Key(bucket, 'name_of_key')
This would be easy to add and, as long as we provided a default value,
existing code would not be impacted.
>
> Another bucket method would be useful:
>
> bucket.create_key('name_of_key')  # no need to use the Key class explicitly
Something very similar to this was added in the latest release. The
method is actually called new_key rather than create_key and it
doesn't accept the name of the key as a param but that can be added.
>
> 3. More 'pythonic' interfaces for accessing buckets and keys:
>
> for bucket in s3_conn:
> for key in bucket:
>
Perhaps we could use generators for this. Bitbucket included a
generator for bucket listing and it worked well. It's more efficient
because you don't actually have to grab all of the results into memory
at one time. You could also seamlessly handle the results paging from
S3 (or SQS).
> And more and more and more..
> If Mitch is interested in such changes, I can continue describing a
> better interface for boto.s3 and provide all the patches implementing it.
>
> Thank you :)
I'm always happy to accept patches and new code. Thanks!
Mitch
On Mar 17, 7:29 am, "m...@garnaat.com" <Mitch.Garn...@gmail.com>
wrote:
I just checked in some changes that address some of these. I'd be
interested in feedback. I'm not guaranteeing that what's in there
right now will be the way things end up but it's a start. Changes
include:
Added a generator function in bucket.py. This allows you to do things
like this:
>>> c = boto.connect_s3()
>>> bucket = c.find_bucket('foo')
>>> for key in bucket:
... do something interesting
>>>
The really neat thing is that it doesn't matter how many keys are in
the bucket. The generator handles all of the paging behind the scenes and
just keeps returning keys until you've visited all of them.
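The paging idea can be sketched roughly as follows. This is a toy illustration, not the code that was checked in: ALL_KEYS, fetch_page, iter_keys and ToyBucket are hypothetical names, and fetch_page only imitates a single S3 list request that returns one page of keys plus a "truncated" flag.

```python
# Toy stand-in for the contents of a bucket (hypothetical data).
ALL_KEYS = ['key-%03d' % i for i in range(25)]


def fetch_page(marker='', page_size=10):
    """Hypothetical single list request: one page of keys after `marker`."""
    remaining = [k for k in ALL_KEYS if k > marker]
    page = remaining[:page_size]
    is_truncated = len(remaining) > page_size
    return page, is_truncated


def iter_keys(page_size=10):
    """Yield every key, requesting pages lazily as the caller iterates."""
    marker = ''
    more = True
    while more:
        page, more = fetch_page(marker, page_size)
        for key in page:
            yield key
            marker = key  # the next request resumes after the last key seen


class ToyBucket(object):
    """Hypothetical bucket whose __iter__ hides the paging."""

    def __iter__(self):
        return iter_keys()


keys = list(ToyBucket())
```

Only one page is held in memory at a time, and the caller never sees the marker bookkeeping.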
I also added an __iter__ method to the S3Connection class so it, too,
can behave like this. So, you can do:
>>> for bucket in c:
... do some bucket thing
>>>
I also added the key_name param (optional) to the Key constructor and
to the new_key method of the bucket object as suggested by melchakov
as well as the suggestions for the __all__ attribute for the s3
package.
These changes don't cause any incompatibilities with existing code so
you can ignore them if you want. I'd be interested in feedback,
though, because I'd like to make similar changes to the other modules.
Mitch
Let me know what you think
On Mar 19, 3:34 pm, "m...@garnaat.com" <Mitch.Garn...@gmail.com>
> I also added the key_name param (optional) to the Key constructor and
> to the new_key method of the bucket object as suggested by melchakov
> as well as the suggestions for the __all__ attribute for the s3
> package.
Thanks for the changes.
I think it would be better to rename the method, or add an alias for
compatibility.
1. If there is a create_bucket, let's follow the same naming rule and
have create_key.
2. Create is a verb but new is an adjective. A verb is more intuitive
for an action that returns something, at least for me :)
Perhaps there is another, better, name for the new_key method? Or
should we call it create_key and actually have an empty object created
in S3?
Mitch
1. Let's have key.delete(); it's simple to implement.
2. Let's rename Key.key to Key.name? ;) Or have an alias; people
around me always mistype this property.
3.
    # params can be one of: prefix, marker, max-keys, delimiter
    # as defined in S3 Developer's Guide, however since max-keys is not
    # a legal variable in Python you have to pass maxkeys and this
    # method will munge it (Ugh!)
    def get_all_keys(self, headers=None, **params):
        for k,v in params.items():
            if k == 'maxkeys':
                k = 'max-keys'
You could replace that code with k.replace('_', '-'). As long as no S3
parameter actually contains an underscore, automatic conversion from
'underscore_scheme' to 'dashed-scheme' would solve that "Ugh" ;)
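A minimal sketch of that conversion, assuming no real S3 parameter name contains an underscore. munge_params is a hypothetical helper name, not part of boto, and the old 'maxkeys' spelling is kept working as an assumption about backward compatibility.

```python
def munge_params(**params):
    """Convert Python-style keyword names to S3 query parameter names."""
    munged = {}
    for name, value in params.items():
        if name == 'maxkeys':
            # keep the existing spelling working for old callers
            name = 'max-keys'
        else:
            name = name.replace('_', '-')
        munged[name] = value
    return munged


query = munge_params(max_keys=100, prefix='photos/', delimiter='/')
legacy = munge_params(maxkeys=5)
```

With this, callers can write max_keys=100 and never see the dashed name.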
Thanks for boto, again :)
PS: httplib is the buggiest std lib module I have ever seen ;)
Haha, Mitch, I hit the same problems with httplib, chunked and read()
as you did (I saw a few comments from you in the mail.python.org
archives), but I implemented an exists() method for Key using HEAD.
Could you include it in boto?
Most of the code was copied from get_file, but I worked around the
httplib bug with chunked HEAD requests by manually calling resp.close()
(if neither resp.read() nor resp.close() is called, the next httplib
request raises a ResponseNotReady exception):
    def exists(self, headers=None):
        if not headers:
            headers = {}
        http_conn = self.bucket.connection.connection
        final_headers = boto.utils.merge_meta(headers, {})
        path = '/%s/%s' % (self.bucket.name, self.key)
        path = urllib.quote(path)
        self.bucket.connection.add_aws_auth_header(final_headers,
                                                   'HEAD', path)
        if (self.bucket.connection.use_proxy == True):
            path = self.bucket.connection.prefix_proxy_to_path(path)
        http_conn.putrequest('HEAD', path)
        for key in final_headers:
            http_conn.putheader(key, final_headers[key])
        http_conn.endheaders()
        resp = http_conn.getresponse()
        # close explicitly so the next request on this connection works
        resp.close()
        if resp.status == 200:
            return True
        elif resp.status == 404:
            return False
        else:
            raise S3ResponseError(resp.status, resp.reason)
I suppose Bucket.lookup should be renamed to Bucket.get_key(name) ->
(Key or None), and Key.exists() -> (bool) should be implemented as
"return bool(self.bucket.lookup(self.name))".
And I like response.close() more than response.chunked=0 ;)
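A toy sketch of that suggestion, using hypothetical ToyBucket and ToyKey classes rather than boto's real ones. It compares against None instead of calling bool(), since bool() on an object can be false if its class defines __len__ or a truth-value method.

```python
class ToyKey(object):
    """Hypothetical key object; only what the sketch needs."""

    def __init__(self, bucket, name):
        self.bucket = bucket
        self.name = name

    def exists(self):
        # lookup returns a key object or None, so test for None explicitly
        return self.bucket.lookup(self.name) is not None


class ToyBucket(object):
    """Hypothetical bucket backed by a set of key names."""

    def __init__(self, names):
        self._names = set(names)

    def lookup(self, name):
        if name in self._names:
            return ToyKey(self, name)
        return None


bucket = ToyBucket(['a.txt'])
found = ToyKey(bucket, 'a.txt').exists()
missing = ToyKey(bucket, 'no-such-key').exists()
```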
Also I found 2 bugs:
1. Key.send_file() fails here:
    except socket.error, e:
        print 'Caught a socket error, trying to recover'
        self.bucket.connection.make_http_connection()
        fp.seek(0)
        self.send_file(fp, headers) # <-- HERE
    except Exception, e:
        print 'Caught an unexpected exception'
        self.bucket.connection.make_http_connection()
        raise e
    if response.status != 200:
After the recursive self.send_file() call finishes, the code following
the except blocks is executed, but no response exists there after the
except socket.error branch. It really should look like this:
    except socket.error, e:
        print 'Caught a socket error, trying to recover'
        self.bucket.connection.make_http_connection()
        fp.seek(0)
        self.send_file(fp, headers)
        return
2. The "for key in bucket" code fails; Bucket.__iter__ should be:
    def __iter__(self):
        return iter(BucketListResultSet(self))
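To illustrate the first bug, here is a small self-contained sketch of the retry-then-return pattern. ToyConnectionError and this send_file are hypothetical stand-ins, not boto code; the point is only that the frame which caught the error has no response of its own, so it has to return right after the recursive retry.

```python
class ToyConnectionError(Exception):
    """Hypothetical stand-in for socket.error."""


def send_file(failures_left, statuses):
    try:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ToyConnectionError('connection dropped')
        status = 200  # only set when the attempt succeeded
    except ToyConnectionError:
        # retry; the recursive call does its own status handling
        send_file(failures_left, statuses)
        return  # without this, `status` below is unbound in this frame
    statuses.append(status)


statuses = []
send_file([2], statuses)
```

Removing the return makes the outer frames fall through to the code after the except block and fail on the unbound name.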
Yeah, I have had quite a few problems with httplib. It's definitely not up to the same standard as the other Python libs.
On May 16, 11:55 pm, "Mitchell Garnaat" <mitch.garn...@gmail.com>
wrote:
> Regarding Key.key => Key.name. I agree that it should probably have been
> Key.name from the start but I don't want to change it now in a way that will
> break everyone's programs. One approach would be to add attribute lookup
> handlers to the Key object, like this:
>
> def __getattr__(self, name):
>     if name == 'name':
>         return self.key
>     else:
>         raise AttributeError
>
> def __setattr__(self, name, value):
>     if name == 'name':
>         self.__dict__['key'] = value
>     else:
>         self.__dict__[name] = value
>
> That way the actual attribute would still be stored in "key" but any
> attempts to set or get the attribute "name" would access the same attribute.
>
> Mitch
>
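The same aliasing could probably also be done with a property instead of attribute lookup handlers. A minimal sketch, assuming a new-style class; ToyKey is a hypothetical stand-in and not the real Key class:

```python
class ToyKey(object):
    """Hypothetical key class; the value is still stored in `key`."""

    def __init__(self, bucket=None, key_name=None):
        self.bucket = bucket
        self.key = key_name

    @property
    def name(self):
        # reading `name` returns whatever is stored in `key`
        return self.key

    @name.setter
    def name(self, value):
        # writing `name` updates `key`, so both spellings stay in sync
        self.key = value


k = ToyKey(key_name='first')
k.name = 'second'
```

Existing code that reads or writes k.key keeps working, and other attributes are not routed through a custom __setattr__.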