Next steps

51 views
Skip to first unread message

Dirkjan Ochtman

unread,
Jan 11, 2010, 5:44:41 AM1/11/10
to couchdb...@googlegroups.com
Hi all,

Before the holiday I was speaking a bit with Christopher in IRC about
what he was envisioning for couchdb-python's next release. I'd like to
start moving on some things again, but it would be nice to move the
stuff we agreed on out of the way first.

But of course I didn't log that conversation at the time and I don't
think the channel has public logfiles, so I forgot a bunch of the
things we wanted to do. I think the plan was as such:

- Break compatibility by renaming couchdb.schema to something more
sensible, and renaming some other classes
- Something in the Database needs to be changed from "session" to
"http" or the other way around, I forget what it was
- Merge the httplib branch into the default branch
- Fix some API consistencies (method calls vs. properties -> should
all become method calls)

After this, the idea was to declare the API somewhat-more stable by
tagging 0.7 as a beta-quality release (after making the move to this
release painful for some people by changing the API; at least we're
making all the changes at once!).

So, what are other thoughts for where we should be going? Some things
I still want to look at is continuous changes via some generator API,
and the managed API that was talked about on this list a long time
ago.

Cheers,

Dirkjan

Christopher Lenz

unread,
Jan 20, 2010, 5:02:35 AM1/20/10
to couchdb...@googlegroups.com
On 11.01.2010, at 11:44, Dirkjan Ochtman wrote:
> Before the holiday I was speaking a bit with Christopher in IRC about
> what he was envisioning for couchdb-python's next release. I'd like to
> start moving on some things again, but it would be nice to move the
> stuff we agreed on out of the way first.
>
> But of course I didn't log that conversation at the time and I don't
> think the channel has public logfiles, so I forgot a bunch of the
> things we wanted to do. I think the plan was as such:
>
> - Break compatibility by renaming couchdb.schema to something more
> sensible, and renaming some other classes

Yeah, in retrospect I think the "schema" name was a really bad choice. I'd suggest we move to "mapping", as in mapping JSON to Python objects and back. So the couchdb.schema module would become couchdb.mapping, and the "Schema" class would be renamed to "Mapping". Also, it'd be nice to rename the couchdb.schema.Document class so that it isn't as easily confused with couchdb.client.Document. Maybe DocumentMapping? Not sure what to do with couchdb.schema.View.

> - Something in the Database needs to be changed from "session" to
> "http" or the other way around, I forget what it was

That's just a change that already happened httplib branch. Needs to be documented together with other backwards incompatible changes.

> - Merge the httplib branch into the default branch
> - Fix some API consistencies (method calls vs. properties -> should
> all become method calls)
>
> After this, the idea was to declare the API somewhat-more stable by
> tagging 0.7 as a beta-quality release (after making the move to this
> release painful for some people by changing the API; at least we're
> making all the changes at once!).

Thanks,
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/

Dirkjan Ochtman

unread,
Jan 24, 2010, 6:57:24 AM1/24/10
to couchdb...@googlegroups.com
On Wed, Jan 20, 2010 at 11:02, Christopher Lenz <cml...@gmx.de> wrote:
> Yeah, in retrospect I think the "schema" name was a really bad choice. I'd suggest we move to "mapping", as in mapping JSON to Python objects and back. So the couchdb.schema module would become couchdb.mapping, and the "Schema" class would be renamed to "Mapping". Also, it'd be nice to rename the couchdb.schema.Document class so that it isn't as easily confused with couchdb.client.Document. Maybe DocumentMapping? Not sure what to do with couchdb.schema.View.

I just pushed the change from schema to mapping. As for the Document
(and View?), maybe MappedDocument (and MappedView)? I also fixed the
property vs. method inconsistency.

> That's just a change that already happened httplib branch. Needs to be documented together with other backwards incompatible changes.

What kind of incompatible changes are there, other than the three-way
instead of two-way return value? I just tried merging httplib to
default, but I ended up with a test failure that wasn't very obvious:

======================================================================
ERROR: test_view_compaction (couchdb.tests.client.DatabaseTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/djc/src/couchdb-python/couchdb/tests/client.py", line
95, in tearDown
self.server.delete('python-tests')
File "/home/djc/src/couchdb-python/couchdb/client.py", line 196, in delete
del self[name]
File "/home/djc/src/couchdb-python/couchdb/client.py", line 131, in
__delitem__
self.resource.delete(validate_dbname(name))
File "/home/djc/src/couchdb-python/couchdb/http.py", line 340, in delete
return self._request('DELETE', path, headers=headers, **params)
File "/home/djc/src/couchdb-python/couchdb/http.py", line 360, in _request
credentials=self.credentials)
File "/home/djc/src/couchdb-python/couchdb/http.py", line 189, in request
resp = _try_request()
File "/home/djc/src/couchdb-python/couchdb/http.py", line 155, in _try_request
conn.endheaders()
File "/usr/lib/python2.6/httplib.py", line 892, in endheaders
self._send_output()
File "/usr/lib/python2.6/httplib.py", line 764, in _send_output
self.send(msg)
File "/usr/lib/python2.6/httplib.py", line 743, in send
self.sock.sendall(str)
File "<string>", line 1, in sendall
error: [Errno 104] Connection reset by peer

----------------------------------------------------------------------
Ran 88 tests in 15.307s

FAILED (errors=1)

(I didn't see this on any of the branches before merging.)

Cheers,

Dirkjan

Dirkjan Ochtman

unread,
Jan 29, 2010, 10:24:40 AM1/29/10
to couchdb...@googlegroups.com
Hi all,

I just merged the httplib branch into default after finding out what
the problem with test was (namely me being stupid about implementing
view compaction). I've also implemented an API for the continuous
changes feed, which was quite easy to build on top couchdb.http (but
required one ugly hack resulting from what I consider a bug in CouchDB
that will hopefully be solved). Some review of the current state of
the default branch would be quite useful, since I'd like to push a
release out soonish.

On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> I just pushed the change from schema to mapping. As for the Document
> (and View?), maybe MappedDocument (and MappedView)?

I still think it would be helpful to rename mapping.Document and
mapping.View (to prevent confusion with the classes in
couchdb.client), but I'm not sure about good names. I think
MappedDocument is quite long for the intended purpose (parent class
for all mappified classes). MappedView doesn't seem like a good name
for what it does, maybe ViewMapper or ViewProperty is better? Who's up
for some bikeshedding?

Cheers,

Dirkjan

Dirkjan Ochtman

unread,
Jan 30, 2010, 10:22:21 AM1/30/10
to couchdb...@googlegroups.com
On Fri, Jan 29, 2010 at 16:24, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
>> I just pushed the change from schema to mapping. As for the Document
>> (and View?), maybe MappedDocument (and MappedView)?
>
> I still think it would be helpful to rename mapping.Document and
> mapping.View (to prevent confusion with the classes in
> couchdb.client), but I'm not sure about good names. I think
> MappedDocument is quite long for the intended purpose (parent class
> for all mappified classes). MappedView doesn't seem like a good name
> for what it does, maybe ViewMapper or ViewProperty is better? Who's up
> for some bikeshedding?

More suggestions: ViewField, MappedDoc.

Cheers,

Dirkjan

Matt Goodall

unread,
Jan 30, 2010, 1:21:29 PM1/30/10
to couchdb...@googlegroups.com
On 29 January 2010 15:24, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
Hi all,

I just merged the httplib branch into default after finding out what
the problem with test was (namely me being stupid about implementing
view compaction). I've also implemented an API for the continuous
changes feed, which was quite easy to build on top couchdb.http (but
required one ugly hack resulting from what I consider a bug in CouchDB
that will hopefully be solved).

I count two ugly hacks ;-). The continuous changes response is chunked (although that's entirely up to CouchDB) but the response is not being read as chunked. It simply skips over the chunk headers. Now, that's working ok right now but could be messed up if CouchDB changes its implementation. Good enough for now, I say.

I've also pushed a couple of fixes (see commits) to the continuous changes feed handling, hope they're ok for others.
 
Some review of the current state of
the default branch would be quite useful, since I'd like to push a
release out soonish.

I have a couple of things I'm a bit suspicious of but haven't had time to look into them more yet:

1. ResponseBody.__iter__ returns quite different data from ResponseBody.read, e.g. __iter__ returns chunked encoding headers. I'm guessing __iter__ is supposed to yield a line of actual response data at a time.

2. Caching is now in-memory. However, I didn't notice anything that manages the size of the cache. That could lead to some nasty memory increases.

3. I think the response is being read() after a HEAD request. I believe that can cause problems in the HTTPConnection's internal state machine, although I don't know if it matters here.

- Matt

Dirkjan Ochtman

unread,
Jan 31, 2010, 9:06:28 AM1/31/10
to couchdb...@googlegroups.com
On Sat, Jan 30, 2010 at 19:21, Matt Goodall <matt.g...@gmail.com> wrote:
> I count two ugly hacks ;-). The continuous changes response is chunked
> (although that's entirely up to CouchDB) but the response is not being read
> as chunked. It simply skips over the chunk headers. Now, that's working ok
> right now but could be messed up if CouchDB changes its implementation. Good
> enough for now, I say.

I wondered what those were yesterday, but figured I didn't need them
anyway. You provided me with enough clue to look up the details of
chunked transfers, and I think the current implementation (in
4a8ece20c421) should do much better.

> I've also pushed a couple of fixes (see commits) to the continuous changes
> feed handling, hope they're ok for others.

Yeah, I figured I'd just get my minimal use case working, but
obviously glossed over some details (my test database got a new doc
each 60s, so I never saw a last_seq come in with the default
settings). Thanks for fixing it up!

> I have a couple of things I'm a bit suspicious of but haven't had time to
> look into them more yet:
> 1. ResponseBody.__iter__ returns quite different data from
> ResponseBody.read, e.g. __iter__ returns chunked encoding headers. I'm
> guessing __iter__ is supposed to yield a line of actual response data at a
> time.

I think this has been fixed now, right?

> 2. Caching is now in-memory. However, I didn't notice anything that manages
> the size of the cache. That could lead to some nasty memory increases.

Do you mean for chunked transfers, or just in general?

> 3. I think the response is being read() after a HEAD request. I believe that
> can cause problems in the HTTPConnection's internal state machine, although
> I don't know if it matters here.

Again, in general in couchdb.http, or in the specific case of
_changes? (I'm guessing not the latter, since continuous changes at
least seems to GET only.)

Cheers,

Dirkjan

Dirkjan Ochtman

unread,
Feb 1, 2010, 7:17:13 AM2/1/10
to couchdb...@googlegroups.com
On Sat, Jan 30, 2010 at 19:21, Matt Goodall <matt.g...@gmail.com> wrote:
> 2. Caching is now in-memory. However, I didn't notice anything that manages
> the size of the cache. That could lead to some nasty memory increases.

This should be fixed in 3aa33bd89426. I picked some more-or-less
random values for the cache size, if anyone has any comments as to
what are better values that might be useful.

Cheers,

Dirkjan

Dirkjan Ochtman

unread,
Feb 1, 2010, 3:27:19 PM2/1/10
to couchdb...@googlegroups.com, cml...@apache.org
On Fri, Jan 29, 2010 at 16:24, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
>> I just pushed the change from schema to mapping. As for the Document
>> (and View?), maybe MappedDocument (and MappedView)?
>
> I still think it would be helpful to rename mapping.Document and
> mapping.View (to prevent confusion with the classes in
> couchdb.client), but I'm not sure about good names. I think
> MappedDocument is quite long for the intended purpose (parent class
> for all mappified classes). MappedView doesn't seem like a good name
> for what it does, maybe ViewMapper or ViewProperty is better? Who's up
> for some bikeshedding?

No one? I think I'll go with ViewField and MappedDoc if no one speaks up.

I think it'd be nice to release soon (and thereby up the ante on the
API promise), so it would be nice if more people chimed in here.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 2, 2010, 5:56:19 AM2/2/10
to couchdb...@googlegroups.com, cml...@apache.org
On 1 February 2010 20:27, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
On Fri, Jan 29, 2010 at 16:24, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
>> I just pushed the change from schema to mapping. As for the Document
>> (and View?), maybe MappedDocument (and MappedView)?
>
> I still think it would be helpful to rename mapping.Document and
> mapping.View (to prevent confusion with the classes in
> couchdb.client), but I'm not sure about good names. I think
> MappedDocument is quite long for the intended purpose (parent class
> for all mappified classes). MappedView doesn't seem like a good name
> for what it does, maybe ViewMapper or ViewProperty is better? Who's up
> for some bikeshedding?

No one? I think I'll go with ViewField and MappedDoc if no one speaks up.

I don't care about the schema names, I don't use them.
 

I think it'd be nice to release soon (and thereby up the ante on the
API promise), so it would be nice if more people chimed in here.


Sorry, been busy.

I've got some follow up comments about the other stuff, i.e. continuous changes and the more general stuff. I'll try to post more very soon but I need to get some other stuff out of the way.

- Matt

Dirkjan Ochtman

unread,
Feb 2, 2010, 6:00:48 AM2/2/10
to couchdb...@googlegroups.com
On Tue, Feb 2, 2010 at 11:56, Matt Goodall <matt.g...@gmail.com> wrote:
> Sorry, been busy.
> I've got some follow up comments about the other stuff, i.e. continuous
> changes and the more general stuff. I'll try to post more very soon but I
> need to get some other stuff out of the way.

Actually you were the one who did provide comments, so I wasn't blaming you. :)

I'll wait for your feedback (and ping again in a few days or so?).

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 9, 2010, 11:34:39 AM2/9/10
to couchdb...@googlegroups.com
Hi,

Sorry for the delay in getting back.

I've just been looking through the most recent set of commits you made to the continuous changes API. Sorry, but I think it's now even worse than my "Dirty evil hack" ;-). It's partly my fault - I should have committed some tests!

* ResponseBody.__iter__ only works for chunked responses. If someone tries to iterate a non-chunked response then it will end up eating real data as chunk headers. Need to check for a "Transfer-Encoding: chunked" header first, but see below.
* There's a CRLF, i.e. an empty chunk, after the final 0 chunk header but it's not being read at the moment. That's possibly leaving the connection in an invalid state for the next request on the socket.
* ResponseBody.__iter__ still isn't yielding lines. It now yields chunks. A chunk happens to be a line for a changes feed but that's not going to be true for other requests and CouchDB could start sending multiple lines in a chunk in the future.
* There's nothing in Database._changes that reads the remainder of the response so it's not returning the connection to the pool any more.

I can see two solutions from here:

1. Make ResponseBody.__iter__ yield lines and drive Database._changes from that. That might mean reading 1 byte at a time to detect EOL but the connection's socket should be SOCK_STREAM so it should be buffered by the OS making lots of 1-byte reads almost palatable.
2. Implement chunked response reading specifically for the changes API with the understanding that we may have to fix this at some point, i.e. if CouchDB changes its implementation.

Personally, I think option (1) is best, assuming there are no horrible performance costs.

I'm going to start writing that now, see if I can get it finished before I leave work. Should have a patch ready before long.

- Matt

Dirkjan Ochtman

unread,
Feb 9, 2010, 11:39:41 AM2/9/10
to couchdb...@googlegroups.com
On Tue, Feb 9, 2010 at 17:34, Matt Goodall <matt.g...@gmail.com> wrote:
> * ResponseBody.__iter__ only works for chunked responses. If someone tries
> to iterate a non-chunked response then it will end up eating real data as
> chunk headers. Need to check for a "Transfer-Encoding: chunked" header
> first, but see below.

Yes, this is by design. I think it should be an iterator over chunks.

> * There's a CRLF, i.e. an empty chunk, after the final 0 chunk header but
> it's not being read at the moment. That's possibly leaving the connection in
> an invalid state for the next request on the socket.

Okay, we should fix that.

> * ResponseBody.__iter__ still isn't yielding lines. It now yields chunks. A
> chunk happens to be a line for a changes feed but that's not going to be
> true for other requests and CouchDB could start sending multiple lines in a
> chunk in the future.

Right.

> * There's nothing in Database._changes that reads the remainder of the
> response so it's not returning the connection to the pool any more.

I think it should be right now, though this maybe requires more testing.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 9, 2010, 12:20:16 PM2/9/10
to couchdb...@googlegroups.com
On 9 February 2010 16:39, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
On Tue, Feb 9, 2010 at 17:34, Matt Goodall <matt.g...@gmail.com> wrote:
> * ResponseBody.__iter__ only works for chunked responses. If someone tries
> to iterate a non-chunked response then it will end up eating real data as
> chunk headers. Need to check for a "Transfer-Encoding: chunked" header
> first, but see below.

Yes, this is by design. I think it should be an iterator over chunks.

OK, but I think that would be quite surprising to developers.

The body of a response is returned as either a StringIO or as a ResponseBody. StringIO has a read() method and is iterable too. StringIO's __iter__ yields lines, ResponseBody yields chunks. It's entirely up to CouchDB how it decides to send responses so you never know which version you'll get (or rather should not depend on it).
 

> * There's a CRLF, i.e. an empty chunk, after the final 0 chunk header but
> it's not being read at the moment. That's possibly leaving the connection in
> an invalid state for the next request on the socket.

Okay, we should fix that.

> * ResponseBody.__iter__ still isn't yielding lines. It now yields chunks. A
> chunk happens to be a line for a changes feed but that's not going to be
> true for other requests and CouchDB could start sending multiple lines in a
> chunk in the future.

Right.

Right, it's wrong IMHO ;-).
 

> * There's nothing in Database._changes that reads the remainder of the
> response so it's not returning the connection to the pool any more.

I think it should be right now, though this maybe requires more testing.

Not in the default branch. I have some tests for this now, I'll push them in a moment until we've decided how best to fix the chunked stuff.


 
- Matt

Matt Goodall

unread,
Feb 9, 2010, 12:25:33 PM2/9/10
to couchdb...@googlegroups.com
Attached is my version with a line-yielding ResponseBody. See what you think.

- Matt
changes.patch

Dirkjan Ochtman

unread,
Feb 9, 2010, 12:29:39 PM2/9/10
to couchdb...@googlegroups.com
On Tue, Feb 9, 2010 at 18:25, Matt Goodall <matt.g...@gmail.com> wrote:
> Attached is my version with a line-yielding ResponseBody. See what you
> think.

I don't much like reading byte-by-byte. I think the reading from the
socket should still be done chunk-by-chunk (this is chunked
transfer-encoding!), even if we yield lines.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 9, 2010, 2:10:42 PM2/9/10
to couchdb...@googlegroups.com

Ok, your call although it's not uncommon (see the socket module). I
think chunked encoding is more for the benefit of the sender, but
there's no reason the receiver can't take advantage of it too.

So, that presumably means we need to strip back the response data
object types to only have a read method for consistency?

- Matt

Dirkjan Ochtman

unread,
Feb 9, 2010, 2:36:17 PM2/9/10
to couchdb...@googlegroups.com
On Tue, Feb 9, 2010 at 20:10, Matt Goodall <matt.g...@gmail.com> wrote:
> Ok, your call although it's not uncommon (see the socket module). I think
> chunked encoding is more for the benefit of the sender, but there's no
> reason the receiver can't take advantage of it too.

Right. I don't see much reason *not* to take advantage of it, and it
seems like Python-level read(1)s would be slowing it down
significantly.

> So, that presumably means we need to strip back the response data object
> types to only have a read method for consistency?

Consistency with what, exactly? We could wrap each chunk iterated over
in a StringIO if you think that is better.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 9, 2010, 3:11:54 PM2/9/10
to couchdb...@googlegroups.com
Consistency between StringIO and ResponseBody - the two types used to represent the response content.

ResponseBody was clearly originally written to provide a similar API to StringIO: __iter__, read and close. __iter__ is now quite different in hg.

- Matt

Dirkjan Ochtman

unread,
Feb 9, 2010, 3:32:27 PM2/9/10
to couchdb...@googlegroups.com
On Tue, Feb 9, 2010 at 21:11, Matt Goodall <matt.g...@gmail.com> wrote:
> Consistency between StringIO and ResponseBody - the two types used to
> represent the response content.
> ResponseBody was clearly originally written to provide a similar API to
> StringIO: __iter__, read and close. __iter__ is now quite different in hg.

Hm, I'm not entirely sure I see this consistency. For one thing,
ResponseBody didn't have an __iter__ method prior to my initial
implementation of the feed=continuous support (i.e. in revision
4fd836).

Both before and after my changes, Session.request() may return a
ResponseBody(), a StringIO() or a dictionary representing a JSON
object. Before my changes, the ResponseBody had only read(size=None)
and close() methods. I only added an __iter__ method to it. We can
still have that iterate over lines if you think that improves API
consistency, though I'd like it to read a chunk at a time from the
socket.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 9, 2010, 7:16:22 PM2/9/10
to couchdb...@googlegroups.com
Agh, sorry ... I completely misread the commits. I checked out an old revision and saw __iter__ implemented but I obviously screwed up with hg somehow.

Apologies for the noise and confusion, I got hung up on what I thought was the original purpose of __iter__ and so was trying to keep the API in tact.

So, ok, how about renaming __iter__ to iter_chunks to avoid more confusion (iterating file-like things tends to yield lines) and fix up the test cases I committed? It might be a good idea to raise an error if it's not a chunked response, just in case.

- Matt

Dirkjan Ochtman

unread,
Feb 10, 2010, 3:03:15 AM2/10/10
to couchdb...@googlegroups.com
On Wed, Feb 10, 2010 at 01:16, Matt Goodall <matt.g...@gmail.com> wrote:
> Apologies for the noise and confusion, I got hung up on what I thought was
> the original purpose of __iter__ and so was trying to keep the API in tact.
> So, ok, how about renaming __iter__ to iter_chunks to avoid more confusion
> (iterating file-like things tends to yield lines) and fix up the test cases
> I committed? It might be a good idea to raise an error if it's not a chunked
> response, just in case.

Sounds good!

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 10, 2010, 4:05:42 AM2/10/10
to couchdb...@googlegroups.com
On 9 February 2010 20:32, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
Both before and after my changes, Session.request() may return a
ResponseBody(), a StringIO() or a dictionary representing a JSON
object.

In bed last night and thought ... wait, does that mean that an 'application/json' attachment is deserialized to a Python object instead of being left as bytes? Will check when I get to work but it looks like it from the code.

- Matt

Matt Goodall

unread,
Feb 10, 2010, 4:07:01 AM2/10/10
to couchdb...@googlegroups.com
Do you want me to take a look this morning?

- Matt

Dirkjan Ochtman

unread,
Feb 10, 2010, 4:28:45 AM2/10/10
to couchdb...@googlegroups.com
On Wed, Feb 10, 2010 at 10:05, Matt Goodall <matt.g...@gmail.com> wrote:
> In bed last night and thought ... wait, does that mean that an
> 'application/json' attachment is deserialized to a Python object instead of
> being left as bytes? Will check when I get to work but it looks like it from
> the code.

I don't think it does that for exceptions, but I'm not sure.

On Wed, Feb 10, 2010 at 10:07, Matt Goodall <matt.g...@gmail.com> wrote:
> Do you want me to take a look this morning?

Sure, except I also had some other ideas I wanted to experiment
with... I came to the conclusion that we probably don't want to
iterate over chunks anyway (they're an impl detail, we should iterate
over lines), so in that case maybe it would be nice to have two
methods, _chunkiter and _lineiter on the ResponseBody (both
generators), and have __iter__ return either depending on whether the
response is chunked (we pass in an extra chunked variable).

Maybe you could start by fixing the tests you committed? :)

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 10, 2010, 7:35:21 AM2/10/10
to couchdb...@googlegroups.com
On 10 February 2010 09:28, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
On Wed, Feb 10, 2010 at 10:05, Matt Goodall <matt.g...@gmail.com> wrote:
> In bed last night and thought ... wait, does that mean that an
> 'application/json' attachment is deserialized to a Python object instead of
> being left as bytes? Will check when I get to work but it looks like it from
> the code.

I don't think it does that for exceptions, but I'm not sure.

'application/json' attachments are getting parsed ... oops. I have a test so I'll create a ticket and add the test to it later.
 

On Wed, Feb 10, 2010 at 10:07, Matt Goodall <matt.g...@gmail.com> wrote:
> Do you want me to take a look this morning?

Sure, except I also had some other ideas I wanted to experiment
with... I came to the conclusion that we probably don't want to
iterate over chunks anyway (they're an impl detail, we should iterate
over lines),

Yep, I may have mentioned that a couple of times already ;-).
 
so in that case maybe it would be nice to have two
methods, _chunkiter and _lineiter on the ResponseBody (both
generators), and have __iter__ return either depending on whether the
response is chunked (we pass in an extra chunked variable).

Maybe you could start by fixing the tests you committed? :)

Done, will push in a moment.

- Matt

Dirkjan Ochtman

unread,
Feb 11, 2010, 9:34:02 AM2/11/10
to couchdb...@googlegroups.com
On Wed, Feb 10, 2010 at 13:35, Matt Goodall <matt.g...@gmail.com> wrote:
> Yep, I may have mentioned that a couple of times already ;-).

Right, you managed to convert me. See my latest commit. I thought the
comments in Database._changes() were rather too chatty, so I condensed
them a bit. The idea for ResponseBody is that, once we have a use for
non-chunked iteration over lines, we can add that in, but since we
don't currently have any users, I figured I'd leave that out for now.

If you want to fix issue 114, I'd suggest the fix is for that too all
move to the client module.

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 11, 2010, 9:45:37 AM2/11/10
to couchdb...@googlegroups.com
On 11 February 2010 14:34, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
On Wed, Feb 10, 2010 at 13:35, Matt Goodall <matt.g...@gmail.com> wrote:
> Yep, I may have mentioned that a couple of times already ;-).

Right, you managed to convert me. See my latest commit. I thought the
comments in Database._changes() were rather too chatty, so I condensed
them a bit.

Fair enough.
 
The idea for ResponseBody is that, once we have a use for
non-chunked iteration over lines, we can add that in, but since we
don't currently have any users, I figured I'd leave that out for now.

Agreed.


If you want to fix issue 114, I'd suggest the fix is for that too all
move to the client module.

Yep, I'll take a look. It'll be tedious work but shouldn't be difficult.

- Matt

Matt Goodall

unread,
Feb 11, 2010, 10:06:25 AM2/11/10
to couchdb...@googlegroups.com
On 11 February 2010 14:45, Matt Goodall <matt.g...@gmail.com> wrote:
On 11 February 2010 14:34, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
 
The idea for ResponseBody is that, once we have a use for
non-chunked iteration over lines, we can add that in, but since we
don't currently have any users, I figured I'd leave that out for now.

Agreed.

Actually, why not just say a response body only has the following API:

    def read(size=None)
    def close()

StringIO and ResponseBody already support that but I think we need to change the attachment API a little (I'll post a ticket and test about this) and a simplified API would make that easier to implement.

Also, as you pointed out, reading a byte at a time is not nice and the only other way I can think of is to muck around with async sockets.

- Matt

Dirkjan Ochtman

unread,
Feb 11, 2010, 10:13:54 AM2/11/10
to couchdb...@googlegroups.com
On Thu, Feb 11, 2010 at 16:06, Matt Goodall <matt.g...@gmail.com> wrote:
> Actually, why not just say a response body only has the following API:
>     def read(size=None)
>     def close()

You mean, keep being iterable out of the stated API?

> StringIO and ResponseBody already support that but I think we need to change
> the attachment API a little (I'll post a ticket and test about this) and a
> simplified API would make that easier to implement.
> Also, as you pointed out, reading a byte at a time is not nice and the only
> other way I can think of is to muck around with async sockets.

I'm not sure exactly what you are talking about here? (As in, what is
this designed to solve?)

Cheers,

Dirkjan

Matt Goodall

unread,
Feb 11, 2010, 10:39:39 AM2/11/10
to couchdb...@googlegroups.com
On 11 February 2010 15:13, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
On Thu, Feb 11, 2010 at 16:06, Matt Goodall <matt.g...@gmail.com> wrote:
> Actually, why not just say a response body only has the following API:
>     def read(size=None)
>     def close()

You mean, keep being iterable out of the stated API?

Yes, and keep it as an underscore method on ResponseBody as you previously suggested, e.g _iterlines. It's only the changes API that needs is so far anyway.
 

> StringIO and ResponseBody already support that but I think we need to change
> the attachment API a little (I'll post a ticket and test about this) and a
> simplified API would make that easier to implement.
> Also, as you pointed out, reading a byte at a time is not nice and the only
> other way I can think of is to muck around with async sockets.

I'm not sure exactly what you are talking about here? (As in, what is
this designed to solve?)

I was hoping to post a ticket to explain but ok ... Database.get_attachment() currently only returns the response body, there's no way to get the content type, size, etc which is kind of important information.

I was thinking that get_attachment could return some sort of Attachment object with a content_type attribute as well as whatever methods a response body provides. If Attachment only has to include read() and close() methods then it should be able to wrap itself around a StringIO or a ResponseBody easily.

The "byte at a time" bit was just about supporting __iter__ for a non-chunked ResponseBody. Perhaps I should stop thinking aloud in emails and concentrate on the work I have to get done here ;-).

Hope that makes a bit more sense now. I will post a ticket about get_attachment so we can discuss properly.

- Matt

Dirkjan Ochtman

unread,
Feb 11, 2010, 11:12:31 AM2/11/10
to couchdb...@googlegroups.com
On Thu, Feb 11, 2010 at 16:39, Matt Goodall <matt.g...@gmail.com> wrote:
> Yes, and keep it as an underscore method on ResponseBody as you previously
> suggested, e.g _iterlines. It's only the changes API that needs is so far
> anyway.

[snip]

> I was hoping to post a ticket to explain but ok ...
> Database.get_attachment() currently only returns the response body, there's
> no way to get the content type, size, etc which is kind of important
> information.

ResponseBody has .resp.msg, which has all of this information. Both of
the above seem like you're overthinking the API. The current version
seems quite simple & transparent. By making it more consistent, you
also significantly increase API surface, which doesn't help much, IMO.

> I was thinking that get_attachment could return some sort of Attachment
> object with a content_type attribute as well as whatever methods a response
> body provides. If Attachment only has to include read() and close() methods
> then it should be able to wrap itself around a StringIO or a ResponseBody
> easily.
> The "byte at a time" bit was just about supporting __iter__ for a
> non-chunked ResponseBody. Perhaps I should stop thinking aloud in emails and
> concentrate on the work I have to get done here ;-).

We could do an iter-over-lines for non-chunked ResponseBodies, but I'm
not sure where that would be useful.

> Hope that makes a bit more sense now. I will post a ticket about
> get_attachment so we can discuss properly.

I think mailing lists are more suitable for discussions, actually.

Cheers,

Dirkjan

Reply all
Reply to author
Forward
0 new messages