httplib branch

Christopher Lenz

Sep 8, 2009, 5:38:08 PM

to couchdb...@googlegroups.com

Hey all,

As you've probably seen by now, I've been hacking on an experimental
branch that migrates CouchDB-Python from using httplib2 to just using
httplib directly.

The branch can be found here:

<http://couchdb-python.googlecode.com/svn/branches/experimental/httplib/>

It would be nice if some folks tried it out and reported any
problems, or just shared thoughts in general.

The primary reason I started this branch was that I was unhappy that
httplib2 doesn't currently support streaming uploads or downloads of
attachments. Request and response bodies are always represented by
strings, and thus must be loaded into memory in their entirety.
Obviously that's a bad idea for large attachments.

But there are other advantages, of course:
* A dependency removed; on Python 2.6 we run on the stdlib, on earlier
versions you just need a supported JSON module.
* The client interface should be thread-safe. The HTTP connection
pool is protected by a lock, but I still need to figure out whether
stuff like the cache needs locking, and where exactly.
* There's no maximum number of connections per host.
* All the HTTP stuff is under our control. httplib doesn't really
attempt to hide the socket layer as much as httplib2 does, and the
branch uses that to work around some known quirks in the httplib code.
* Performance seems to be improved, but I really need to measure
that to make a real statement. In general though, the couchdb.http
code only supports the protocol features actually used by CouchDB, so
it can afford to be a lot simpler.
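
To illustrate the "connection pool protected by a lock" point above, here is a rough sketch of the general idea. It is not the code from the branch: the class name, the method names, and the `factory` callable are all made up for the example.

```python
import threading


class ConnectionPool(object):
    """Hypothetical pool of idle connections, keyed by host (illustration only)."""

    def __init__(self, factory):
        # `factory` is an assumed callable: key -> new connection object.
        self._factory = factory
        self._lock = threading.Lock()
        self._idle = {}  # key -> list of idle connections

    def get(self, key):
        # Only the shared dict is touched while holding the lock.
        with self._lock:
            conns = self._idle.get(key)
            if conns:
                return conns.pop()
        # No idle connection available: open a new one outside the lock.
        # There is deliberately no upper bound on connections per host.
        return self._factory(key)

    def release(self, key, conn):
        # Hand a connection back so another thread can reuse it.
        with self._lock:
            self._idle.setdefault(key, []).append(conn)
```

The lock only guards the bookkeeping, so a slow connect in one thread does not block other threads from borrowing idle connections.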

On the downside, there's only simple support for Basic auth at the
moment, and there's no persistent caching by default (only a local
memory cache). But I think it should be simple for people to add other
authentication methods and persistent caching implementations without
needing to monkeypatch the couchdb.http code (the auth API may need
refinement though).

Depending on your feedback, I'd like to merge this work back into
trunk sometime soon. So please let me know what you think, and any
problems you discover.

Thanks!
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/

Matt Goodall

Sep 9, 2009, 6:19:27 AM

to couchdb...@googlegroups.com

2009/9/8 Christopher Lenz <cml...@gmx.de>

> Hey all,
>
> As you've probably seen by now, I've been hacking on an experimental
> branch that migrates CouchDB-Python from using httplib2 to just using
> httplib directly.
>
> The branch can be found here:
>
> <http://couchdb-python.googlecode.com/svn/branches/experimental/httplib/>
>
> It would be nice to see some folks tried it out and reported any
> problems, or just thoughts in general.

I'll switch over to using it locally to test it out a bit. However, I just played with it in a Python shell and got a BadStatusLine exception after a couple of requests. I'll take a look later (got some work to do first) and report back.

> The primary reason I started this branch was that I was unhappy how
> httplib2 doesn't currently support streaming uploads or downloads of
> attachments. Request and response bodies are always represented by
> strings, and thus must be loaded into memory in their entirety.
> Obviously that's a bad idea for large attachments.

:nod: It's horrifying how many HTTP client libraries forget that it's quite common to send and receive very large files.

Dirkjan Ochtman

Sep 9, 2009, 6:26:18 AM

to couchdb...@googlegroups.com

On Wed, Sep 9, 2009 at 12:19, Matt Goodall<matt.g...@gmail.com> wrote:
> I'll switch over to using it locally to test it out a bit. However, I just
> played with it in a Python shell and got a BadStatusLine exception after a
> couple of requests. I'll take a look later (got some work to do first) and
> report back.

Ugh, BadStatusLine is a seriously useless exception, it's hard to debug.

>> Depending on your feedback, I'd like to merge this work back into
>> trunk sometime soon. So please let me know what you think, and any
>> problems you discover.

Sounds great.

Cheers,

Dirkjan

Christopher Lenz

Sep 9, 2009, 7:50:19 AM

to couchdb...@googlegroups.com

On 09.09.2009, at 12:19, Matt Goodall wrote:
> I'll switch over to using it locally to test it out a bit. However,
> I just played with it in a Python shell and got a BadStatusLine
> exception after a couple of requests. I'll take a look later (got
> some work to do first) and report back.

I think that *may* be due to unread content on the socket from a
previous response.

One note on downloading attachments, as that's not documented on the
branch yet: any non-JSON and non-empty response entity body is
returned as a file-like object, which is either a simple StringIO
wrapper (for small responses) or a couchdb.http.ResponseBody instance
(a thin layer over the underlying socket).

So whenever you get non-JSON data, which is the case for attachments
[1], you need to either fully consume the response via
.read([amount]), or call its .close() method. Both result in the
connection returning to a reusable state.
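
As a minimal sketch of that consume-or-close pattern: the helper below works on any file-like object with .read() and .close(), which is all the description above promises. The function name and the chunk size are invented for the example and are not part of the branch.

```python
def save_attachment(body, out, chunk_size=8192):
    """Copy a file-like response body to `out` in chunks, then close it.

    `body` is assumed to be whatever file-like object the client returned
    for a non-JSON response; `out` is any object with a .write() method.
    """
    try:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break  # fully consumed
            out.write(chunk)
    finally:
        # Close even if writing fails, so the body is never left half-read.
        body.close()
```

Reading in chunks keeps memory use flat for large attachments, and the `finally` clause covers the case where the caller gives up partway through.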

Perhaps we can detect connections left in unusable state, and just
close/reconnect in that case. Although, now that I think about it, if
you don't consume/close a ResponseBody, the associated connection is
not returned to the connection pool, either. Hm.

Cheers,

Matt Goodall

Sep 9, 2009, 8:58:59 AM

to couchdb...@googlegroups.com

2009/9/9 Christopher Lenz <cml...@gmx.de>

> On 09.09.2009, at 12:19, Matt Goodall wrote:
> > I'll switch over to using it locally to test it out a bit. However,
> > I just played with it in a Python shell and got a BadStatusLine
> > exception after a couple of requests. I'll take a look later (got
> > some work to do first) and report back.
>
> I think that *may* be due to unread content on the socket from a
> previous response.

No, looks like an inactivity problem.

Mochiweb (the CouchDB web server component, for those that don't know) has a default idle timeout of 30s after which it closes the connection. The httplib branch does not appear to be handling closed connections well yet. The standard httplib module raises a BadStatusLine exception to indicate a (probably) closed connection.

I guess it's fair to assume that if a valid status line is not sent back then the server either didn't receive the request or died before it had the chance to do anything with it. So, catching BadStatusLine, recycling the connection and trying again is probably reasonable and, unlike httplib2, the only time it should retry IMHO.
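
A rough sketch of that catch, recycle and retry idea, for illustration only. The function name and the `connect` factory are hypothetical, and this is not a patch against the branch. The import fallback is there because httplib was renamed http.client in Python 3.

```python
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # same module under its Python 3 name


def request_with_retry(connect, method, path, retries=1):
    """Send a request, reconnecting if the server dropped an idle socket.

    `connect` is an assumed zero-argument factory that returns an
    httplib.HTTPConnection-like object. Returns (connection, response).
    """
    conn = connect()
    attempt = 0
    while True:
        try:
            conn.request(method, path)
            return conn, conn.getresponse()
        except httplib.BadStatusLine:
            # No valid status line came back: treat the socket as dead.
            conn.close()
            if attempt >= retries:
                raise
            attempt += 1
            conn = connect()  # recycle: start over on a fresh connection
```

Only BadStatusLine triggers a retry here; any other error propagates unchanged.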

I don't know of any other way to detect a closed socket. IIRC, a closed socket just looks readable after a select.select() call and you have to read at least 1 byte to find out if there's really any data or if it's been closed. Not very helpful and, besides, messing around with select is probably going to lead into cross-platform problems.
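
For reference, the select-then-read check described above could look roughly like this. It is a sketch with a made-up function name, it peeks at the byte with MSG_PEEK instead of consuming it, and the cross-platform caveat still applies.

```python
import select
import socket


def looks_closed(sock):
    """Best-effort check for a peer-closed socket (hypothetical helper)."""
    # Zero timeout: just poll, never block.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no evidence of a close
    try:
        # A closed socket is "readable" but yields an empty read.
        # MSG_PEEK leaves any real data in the buffer for the caller.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except socket.error:
        return True  # reset or otherwise unusable
```

Even with a check like this there is a race: the server can still close the connection between the check and the next request, so it would not replace retrying on BadStatusLine.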

- Matt

Matt Goodall

Sep 9, 2009, 9:03:31 AM

to couchdb...@googlegroups.com

2009/9/9 Matt Goodall <matt.g...@gmail.com>

> 2009/9/9 Christopher Lenz <cml...@gmx.de>
>
> > On 09.09.2009, at 12:19, Matt Goodall wrote:
> > > I'll switch over to using it locally to test it out a bit. However,
> > > I just played with it in a Python shell and got a BadStatusLine
> > > exception after a couple of requests. I'll take a look later (got
> > > some work to do first) and report back.
> >
> > I think that *may* be due to unread content on the socket from a
> > previous response.
>
> No, looks like an inactivity problem.

Forgot to post this, although it's fairly obvious:

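import time
import couchdb

db = couchdb.Server()['test']
list(db.view('_all_docs', limit=1))
# sleep past the 30s idle timeout so the server closes the connection
time.sleep(32)
list(db.view('_all_docs', limit=1))  # BadStatusLine shows up here for me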

Christopher Lenz

Nov 6, 2009, 5:28:22 PM

to couchdb...@googlegroups.com

On 09.09.2009, at 15:03, Matt Goodall wrote:
> On 09.09.2009, at 12:19, Matt Goodall wrote:
> > I'll switch over to using it locally to test it out a bit. However,
> > I just played with it in a Python shell and got a BadStatusLine
> > exception after a couple of requests. I'll take a look later (got
> > some work to do first) and report back.
>
> I think that *may* be due to unread content on the socket from a
> previous response.
>
> No, looks like an inactivity problem.
>
> Forgot to post this, although it's fairly obvious:
>
> db = couchdb.Server()['test']
> list(db.view('_all_docs', limit=1))
> time.sleep(32)
> list(db.view('_all_docs', limit=1))

Sorry for dropping out here, I tried this but can't reproduce the
problem on either OS X or Debian. If you have a fix for this, can you
please post it?

Cheers,
Chris

Matt Goodall

Nov 9, 2009, 9:46:52 AM

to couchdb...@googlegroups.com

2009/11/6 Christopher Lenz <cml...@gmx.de>:

Tried the above again (on Ubuntu 9.10) and I still get the error. I'll
take a look and try to fix it.

- Matt

Christopher Groskopf

Nov 9, 2009, 11:39:23 AM

to couchdb...@googlegroups.com

Christopher,

I don't have any insightful feedback, but I did cut over to the httplib branch to work around bug 95 (a.k.a. bug 85). I'm not hitting every feature of the codebase, but after pulling down the new branch the bug was resolved and everything in my app seems to work great. Given the number of problems that have been reported with httplib2, this certainly seems like the way to go.

Cheers,
Chris
