As you've probably seen by now, I've been hacking on an experimental branch that migrates CouchDB-Python from using httplib2 to just using httplib directly.
It would be nice if some folks tried it out and reported any problems, or just shared their thoughts in general.
The primary reason I started this branch was that I was unhappy that httplib2 doesn't currently support streaming uploads or downloads of attachments. Request and response bodies are always represented by strings, and thus must be loaded into memory in their entirety. Obviously that's a bad idea for large attachments.
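To make the difference concrete, here is a minimal sketch of chunked copying between two file-like objects, so that only one chunk is held in memory at a time. The name `copy_stream` and the chunk size are made up for this illustration and are not part of the branch.

```python
import io

CHUNK_SIZE = 8192  # arbitrary value, an assumption for this sketch


def copy_stream(source, target, chunk_size=CHUNK_SIZE):
    """Copy source to target chunk by chunk; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # an empty read means the source is exhausted
        target.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    attachment = io.BytesIO(b"x" * 20000)  # stands in for a large attachment
    sink = io.BytesIO()
    print(copy_stream(attachment, sink))
```

With string-only bodies, the equivalent would be a single `source.read()` that holds the whole attachment in memory.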
But there are other advantages, of course:
* A dependency removed; on Python 2.6 we run on the stdlib, on earlier versions you just need a supported JSON module.
* The client interface should be thread-safe. The HTTP connection pool is protected by a lock, but I still need to figure out whether stuff like the cache needs locking, and where exactly.
* There's no maximum number of connections per host.
* All the HTTP stuff is under our control. httplib doesn't really attempt to hide the socket layer as much as httplib2 does, and the branch uses that to work around some known quirks in the httplib code.
* Performance seems to be improved, but I really need to measure that to make a real statement. In general though, the couchdb.http code only supports the protocol features actually used by CouchDB, so it can afford to be a lot simpler.
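As a rough illustration of a lock-protected connection pool with no per-host maximum, something along these lines would do. The class `ConnectionPool` and the `connect` factory are hypothetical names for this sketch and may not match the code on the branch.

```python
import threading


class ConnectionPool:
    """Hypothetical pool: hands out idle connections per host, guarded by a lock."""

    def __init__(self, connect):
        self._connect = connect        # callable(host) -> new connection
        self._idle = {}                # host -> list of idle connections
        self._lock = threading.Lock()

    def get(self, host):
        with self._lock:
            idle = self._idle.get(host)
            if idle:
                return idle.pop()
        # No idle connection: open a new one outside the lock.
        # Note there is no per-host limit on how many get created.
        return self._connect(host)

    def release(self, host, conn):
        with self._lock:
            self._idle.setdefault(host, []).append(conn)
```

Only the bookkeeping dictionary is touched under the lock; a connection that has been handed out is used by one thread at a time until it is released.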
On the downside, there's only simple support for Basic auth at the moment, and there's no persistent caching by default (only a local memory cache). But I think it should be simple for people to add other authentication methods and persistent caching implementations without needing to monkeypatch the couchdb.http code (the auth API may need refinement though).
Depending on your feedback, I'd like to merge this work back into trunk sometime soon. So please let me know what you think, and any problems you discover.
> As you've probably seen by now, I've been hacking on an experimental
> branch that migrates CouchDB-Python from using httplib2 to just using
> httplib directly.
> It would be nice to see some folks tried it out and reported any
> problems, or just thoughts in general.
I'll switch over to using it locally to test it out a bit. However, I just
played with it in a Python shell and got a BadStatusLine exception after a
couple of requests. I'll take a look later (got some work to do first) and
report back.
> The primary reason I started this branch was that I was unhappy how
> httplib2 doesn't currently support streaming uploads or downloads of
> attachments. Request and response bodies are always represented by
> strings, and thus must be loaded into memory in their entirety.
> Obviously that's a bad idea for large attachments.
:nod: It's horrifying how many HTTP client libraries forget that it's quite
common to send and receive very large files.
On Wed, Sep 9, 2009 at 12:19, Matt Goodall <matt.good...@gmail.com> wrote:
> I'll switch over to using it locally to test it out a bit. However, I just
> played with it in a Python shell and got a BadStatusLine exception after a
> couple of requests. I'll take a look later (got some work to do first) and
> report back.
Ugh, BadStatusLine is a seriously useless exception, it's hard to debug.
>> Depending on your feedback, I'd like to merge this work back into
>> trunk sometime soon. So please let me know what you think, and any
>> problems you discover.
> I'll switch over to using it locally to test it out a bit. However,
> I just played with it in a Python shell and got a BadStatusLine
> exception after a couple of requests. I'll take a look later (got
> some work to do first) and report back.
I think that *may* be due to unread content on the socket from a previous response.
One note on downloading attachments, as that's not documented on the branch yet: any non-JSON and non-empty response entity body is returned as a file-like object, which is either a simple StringIO wrapper (for small responses) or a couchdb.http.ResponseBody instance (a thin layer over the underlying socket).
So whenever you get non-JSON data, which is the case for attachments [1], you need to either fully consume the response via .read([amount]), or call its .close() method. Both result in the connection returning to a reusable state.
Perhaps we can detect connections left in unusable state, and just close/reconnect in that case. Although, now that I think about it, if you don't consume/close a ResponseBody, the associated connection is not returned to the connection pool, either. Hm.
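A minimal sketch of that consume-or-close contract follows. `StreamedBody` and its `on_done` callback are hypothetical stand-ins, not the actual couchdb.http.ResponseBody; the callback is where a connection would be handed back to the pool.

```python
import io


class StreamedBody:
    """Hypothetical stand-in for a streamed body; not the branch's ResponseBody."""

    def __init__(self, raw, on_done):
        self._raw = raw          # file-like object over the socket
        self._on_done = on_done  # called once, when the connection is reusable
        self._done = False

    def read(self, amount=-1):
        data = self._raw.read(amount)
        read_all = amount is None or amount < 0
        # The body is exhausted after an unbounded read or an empty bounded one.
        if read_all or (amount > 0 and not data):
            self._finish()
        return data

    def close(self):
        # Drain whatever is left so the next response starts on a clean socket.
        while not self._done and self._raw.read(8192):
            pass
        self._finish()

    def _finish(self):
        if not self._done:
            self._done = True
            self._on_done()
```

If neither `read()` to the end nor `close()` is ever called, `on_done` never fires, which mirrors the problem described above: the connection is never returned to the pool.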
> On 09.09.2009, at 12:19, Matt Goodall wrote:
> > I'll switch over to using it locally to test it out a bit. However,
> > I just played with it in a Python shell and got a BadStatusLine
> > exception after a couple of requests. I'll take a look later (got
> > some work to do first) and report back.
> I think that *may* be due to unread content on the socket from a
> previous response.
No, looks like an inactivity problem.
Mochiweb (the CouchDB web server component, for those that don't know) has a
default idle timeout of 30s after which it closes the connection. The
httplib branch does not appear to be handling closed connections well yet.
The standard httplib module raises a BadStatusLine to indicate a (probably)
closed connection.
I guess it's fair to assume that if a valid status line is not sent back
then the server either didn't receive the request or died before it had the
chance to do anything with it. So, catching a BadStatusLine, recycling the
connection and trying again is probably reasonable and, unlike httplib2, the
only time it should retry IMHO.
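A sketch of that retry-once idea, written for Python 3 (where httplib is called http.client and the closed-connection error, RemoteDisconnected, is a subclass of BadStatusLine). The function `request_with_retry` and the `connect` factory are hypothetical and only illustrate the approach, not what the branch does.

```python
import http.client  # the Python 3 name of httplib


def request_with_retry(connect, method, path):
    """Send a request; on BadStatusLine, reconnect and retry exactly once.

    `connect` is a hypothetical zero-argument factory returning a fresh
    connection object with the http.client.HTTPConnection interface.
    """
    conn = connect()
    for attempt in range(2):
        try:
            conn.request(method, path)
            return conn, conn.getresponse()
        except http.client.BadStatusLine:
            conn.close()
            if attempt == 1:
                raise  # second failure: give up and let the caller see it
            conn = connect()  # recycle the connection and try again
```

Any other exception propagates immediately, so this stays the only situation in which a request is sent twice.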
I don't know of any other way to detect a closed socket. IIRC, a closed
socket just looks readable after a select.select() call and you have to read
at least 1 byte to find out if there's really any data or if it's been
closed. Not very helpful and, besides, messing around with select is
probably going to lead to cross-platform problems.
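For reference, the select-and-peek check described above might look roughly like this. `peer_has_closed` is a hypothetical helper, and the sketch does not address the cross-platform concerns just mentioned.

```python
import select
import socket


def peer_has_closed(sock):
    """Best-effort check: True if the peer closed an otherwise idle socket."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so the connection looks open
    # Readable means either data or EOF; peek one byte without consuming it.
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True  # e.g. the connection was reset
```

The check is inherently racy: the server can still close the connection between the check and the next request, so it cannot fully replace handling the exception.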
>> On 09.09.2009, at 12:19, Matt Goodall wrote:
>> > I'll switch over to using it locally to test it out a bit. However,
>> > I just played with it in a Python shell and got a BadStatusLine
>> > exception after a couple of requests. I'll take a look later (got
>> > some work to do first) and report back.
>> I think that *may* be due to unread content on the socket from a
>> previous response.
> No, looks like an inactivity problem.
Forgot to post this, although it's fairly obvious:
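import time
import couchdb

db = couchdb.Server()['test']
list(db.view('_all_docs', limit=1))
time.sleep(32)  # stay idle for longer than the 30s server timeout
list(db.view('_all_docs', limit=1))  # this request hits the closed connection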
> On 09.09.2009, at 12:19, Matt Goodall wrote:
> > I'll switch over to using it locally to test it out a bit. However,
> > I just played with it in a Python shell and got a BadStatusLine
> > exception after a couple of requests. I'll take a look later (got
> > some work to do first) and report back.
> I think that *may* be due to unread content on the socket from a
> previous response.
> No, looks like an inactivity problem.
> Forgot to post this, although it's fairly obvious:
Sorry for dropping out here, I tried this but can't reproduce the problem on either OS X or Debian. If you have a fix for this, can you please post it?
> On 09.09.2009, at 15:03, Matt Goodall wrote:
>> On 09.09.2009, at 12:19, Matt Goodall wrote:
>> > I'll switch over to using it locally to test it out a bit. However,
>> > I just played with it in a Python shell and got a BadStatusLine
>> > exception after a couple of requests. I'll take a look later (got
>> > some work to do first) and report back.
>> I think that *may* be due to unread content on the socket from a
>> previous response.
>> No, looks like an inactivity problem.
>> Forgot to post this, although it's fairly obvious:
> Sorry for dropping out here, I tried this but can't reproduce the
> problem on either OS X or Debian. If you have a fix for this, can you
> please post it?
Tried the above again (on Ubuntu 9.10) and I still get the error. I'll
take a look and try to fix it.
I don't have any insightful feedback, but I did cut over to the httplib
branch to work around bug 95 (a.k.a. bug 85). I'm not hitting every feature
of the codebase, but after pulling down the new branch the bug was resolved
and everything in my app seems to work great. Given the number of problems
that have been reported with httplib2, this certainly seems like the way to
go.
On Tue, Sep 8, 2009 at 1:38 PM, Christopher Lenz <cml...@gmx.de> wrote:
> Hey all,
> [...]