Hi,
I have a boto use case where I hook up a wrapper class, which exposes a Key object as a seekable file, to the Google Cloud Storage resumable upload handler (which needs to be able to seek in various places). The wrapper class implements seek by performing a range GET at the specified starting offset.
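To make the setup concrete, here is a minimal sketch of that kind of wrapper. It is not the actual wrapper class or any boto API: RangeSeekableReader and the open_at callable are hypothetical names, and open_at stands in for whatever issues the range GET (for example, a request with a "Range: bytes=<offset>-" header) and returns a readable stream.

```python
import io


class RangeSeekableReader:
    """Hypothetical seekable wrapper: seek() drops the open stream, and the
    next read() reopens it at the new offset."""

    def __init__(self, open_at, size):
        # open_at is an assumed callable: offset -> readable, closeable stream.
        self._open_at = open_at
        self._size = size
        self._pos = 0
        self._stream = None

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise ValueError("negative seek position")
        if target != self._pos:
            # This close is the expensive step discussed in this post when
            # it has to drain the rest of the response first.
            self.close()
        self._pos = target
        return self._pos

    def read(self, size=-1):
        if self._stream is None:
            # Lazily reopen at the current offset (the "range GET").
            self._stream = self._open_at(self._pos)
        data = self._stream.read(size)
        self._pos += len(data)
        return data

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None


# Usage with an in-memory stand-in for the remote object.
DATA = bytes(range(256)) * 4
reader = RangeSeekableReader(lambda off: io.BytesIO(DATA[off:]), len(DATA))
head = reader.read(10)
reader.seek(500)
chunk = reader.read(4)
```

In this sketch the cost of seek() is entirely in close(), which is why the behavior of Key.close() matters so much for large objects.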
This works fine, but before doing the range GET I need to close the currently open Key, and when I do that, the current Key.close() method reads all the remaining data from the socket (as required by httplib before you can send a new request). This can be quite wasteful. For example, if the open Key is positioned near the start of a 100GB object and we then (through the file wrapper) need to close and seek to somewhere else in the file, we would read nearly the entire 100GB and throw the data away, which is slow.
I have a proposed solution at pull request 1252, which adds an optional 'fast' bool param to the Key.close() call (which simply causes the close function not to call resp.read()). This works because if you neglect to read the data off the socket before sending a new request, boto will get an httplib exception and simply close the connection and reopen.
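The underlying httplib behavior can be reproduced outside boto. The following is only a sketch under some assumptions: it uses Python 3's http.client (the successor to httplib), a throwaway local server, and a hypothetical demo_unread_response() helper. It does not show boto's own retry code, only the exception and the close-and-reopen recovery that the 'fast' param relies on.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection alive between requests

    def do_GET(self):
        body = b"x" * 1024
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo_unread_response():
    """Hypothetical helper: skip reading a response, then recover by reconnecting."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    nbytes = 0
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/")
        conn.getresponse()  # headers received, body deliberately left unread
        conn.request("GET", "/")  # sending the next request still succeeds
        try:
            conn.getresponse()
            outcome = "no error"
        except http.client.ResponseNotReady:
            # The recovery described above: drop the connection and reopen.
            conn.close()
            conn.request("GET", "/")  # reconnects automatically
            nbytes = len(conn.getresponse().read())
            outcome = "recovered"
        conn.close()
        return outcome, nbytes
    finally:
        server.shutdown()
        server.server_close()


outcome, nbytes = demo_unread_response()
```

Note that the error surfaces when fetching the next response, not when sending the next request, so the recovery costs one wasted request plus a new connection rather than a full read of the remaining body.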
So, adding this fast param is kind of a hack, in that it depends on this other implementation detail. But I don't currently have a better idea how to do it. I suppose I could try digging down through the boto connection handling layer to the underlying httplib and close the socket, but that seems like a hack too.
Does anyone have thoughts/suggestions about this? Or, alternatively, how would people feel about this param I added, if I update the doc to adequately describe what's going on? A potential downside I see is if boto users start using/depending on this param and later someone decides to change the connection exception handling, it could break these users' code.
Thanks,
Mike Schwartz