Before the holiday I was speaking a bit with Christopher in IRC about
what he was envisioning for couchdb-python's next release. I'd like to
start moving on some things again, but it would be nice to move the
stuff we agreed on out of the way first.
But of course I didn't log that conversation at the time and I don't
think the channel has public logfiles, so I forgot a bunch of the
things we wanted to do. I think the plan was as follows:
- Break compatibility by renaming couchdb.schema to something more
sensible, and renaming some other classes
- Something in the Database needs to be changed from "session" to
"http" or the other way around, I forget what it was
- Merge the httplib branch into the default branch
- Fix some API inconsistencies (method calls vs. properties -> should
all become method calls)
After this, the idea was to declare the API somewhat-more stable by
tagging 0.7 as a beta-quality release (after making the move to this
release painful for some people by changing the API; at least we're
making all the changes at once!).
So, what are other thoughts for where we should be going? Some things
I still want to look at are continuous changes via some generator API,
and the managed API that was talked about on this list a long time
ago.
Cheers,
Dirkjan
Yeah, in retrospect I think the "schema" name was a really bad choice. I'd suggest we move to "mapping", as in mapping JSON to Python objects and back. So the couchdb.schema module would become couchdb.mapping, and the "Schema" class would be renamed to "Mapping". Also, it'd be nice to rename the couchdb.schema.Document class so that it isn't as easily confused with couchdb.client.Document. Maybe DocumentMapping? Not sure what to do with couchdb.schema.View.
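To make the "mapping JSON to Python objects and back" idea concrete, here is a minimal sketch. It is not couchdb-python's implementation: the `Field` descriptor, the `Mapping` base class, and the `wrap`/`unwrap` method names are all hypothetical and chosen only for illustration.

```python
import json


class Field:
    """Hypothetical descriptor that reads/writes one key of the wrapped dict."""

    def __init__(self, default=None):
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # Remember which JSON key this attribute maps to.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._data.get(self.name, self.default)

    def __set__(self, instance, value):
        instance._data[self.name] = value


class Mapping:
    """Hypothetical base class: a thin object view over a JSON-style dict."""

    def __init__(self, **values):
        self._data = {}
        for key, value in values.items():
            setattr(self, key, value)

    @classmethod
    def wrap(cls, data):
        # JSON (already decoded to a dict) -> Python object
        instance = cls()
        instance._data = dict(data)
        return instance

    def unwrap(self):
        # Python object -> plain dict, ready for json.dumps()
        return dict(self._data)


class Person(Mapping):
    name = Field()
    age = Field(default=0)


person = Person.wrap(json.loads('{"name": "Ada", "age": 36}'))
person.age += 1
print(json.dumps(person.unwrap(), sort_keys=True))
```

The point of the sketch is only that the class describes how dict keys map to attributes, which is why "mapping" reads more naturally than "schema" for this role.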
> - Something in the Database needs to be changed from "session" to
> "http" or the other way around, I forget what it was
That's just a change that already happened on the httplib branch. It needs to be documented together with the other backwards-incompatible changes.
> - Merge the httplib branch into the default branch
> - Fix some API inconsistencies (method calls vs. properties -> should
> all become method calls)
>
> After this, the idea was to declare the API somewhat-more stable by
> tagging 0.7 as a beta-quality release (after making the move to this
> release painful for some people by changing the API; at least we're
> making all the changes at once!).
Thanks,
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/
I just pushed the change from schema to mapping. As for the Document
(and View?), maybe MappedDocument (and MappedView)? I also fixed the
property vs. method inconsistency.
> That's just a change that already happened on the httplib branch. Needs to be documented together with other backwards incompatible changes.
What kind of incompatible changes are there, other than the three-way
instead of two-way return value? I just tried merging httplib to
default, but I ended up with a test failure that wasn't very obvious:
======================================================================
ERROR: test_view_compaction (couchdb.tests.client.DatabaseTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/djc/src/couchdb-python/couchdb/tests/client.py", line 95, in tearDown
    self.server.delete('python-tests')
  File "/home/djc/src/couchdb-python/couchdb/client.py", line 196, in delete
    del self[name]
  File "/home/djc/src/couchdb-python/couchdb/client.py", line 131, in __delitem__
    self.resource.delete(validate_dbname(name))
  File "/home/djc/src/couchdb-python/couchdb/http.py", line 340, in delete
    return self._request('DELETE', path, headers=headers, **params)
  File "/home/djc/src/couchdb-python/couchdb/http.py", line 360, in _request
    credentials=self.credentials)
  File "/home/djc/src/couchdb-python/couchdb/http.py", line 189, in request
    resp = _try_request()
  File "/home/djc/src/couchdb-python/couchdb/http.py", line 155, in _try_request
    conn.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 892, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 764, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 743, in send
    self.sock.sendall(str)
  File "<string>", line 1, in sendall
error: [Errno 104] Connection reset by peer
----------------------------------------------------------------------
Ran 88 tests in 15.307s
FAILED (errors=1)
(I didn't see this on any of the branches before merging.)
Cheers,
Dirkjan
I just merged the httplib branch into default after finding out what
the problem with the test was (namely me being stupid about implementing
view compaction). I've also implemented an API for the continuous
changes feed, which was quite easy to build on top of couchdb.http (but
required one ugly hack resulting from what I consider a bug in CouchDB
that will hopefully be solved). Some review of the current state of
the default branch would be quite useful, since I'd like to push a
release out soonish.
On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> I just pushed the change from schema to mapping. As for the Document
> (and View?), maybe MappedDocument (and MappedView)?
I still think it would be helpful to rename mapping.Document and
mapping.View (to prevent confusion with the classes in
couchdb.client), but I'm not sure about good names. I think
MappedDocument is quite long for the intended purpose (parent class
for all mappified classes). MappedView doesn't seem like a good name
for what it does, maybe ViewMapper or ViewProperty is better? Who's up
for some bikeshedding?
Cheers,
Dirkjan
More suggestions: ViewField, MappedDoc.
Cheers,
Dirkjan
I wondered what those were yesterday, but figured I didn't need them
anyway. You provided me with enough clue to look up the details of
chunked transfers, and I think the current implementation (in
4a8ece20c421) should do much better.
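For readers who have not looked at chunked transfers either, the following is a rough sketch of how a chunked body can be decoded from a file-like object. It is not the code from the commit mentioned above; `iter_chunks` is a hypothetical name, and the sketch assumes Python 3 and a binary stream positioned at the first chunk header.

```python
import io


def iter_chunks(fp):
    """Yield the payload of each chunk from a chunked HTTP body.

    `fp` is any binary file-like object positioned at the first chunk header.
    """
    while True:
        header = fp.readline()
        if not header:
            raise EOFError("stream ended before the terminating chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(header.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Consume optional trailer lines up to and including the final
            # CRLF, so the stream is left at the start of the next response.
            while fp.readline() not in (b"\r\n", b"\n", b""):
                pass
            return
        data = fp.read(size)
        if len(data) != size:
            raise EOFError("chunk shorter than announced")
        fp.readline()  # discard the CRLF that follows the chunk data
        yield data


body = io.BytesIO(b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\nNEXT")
print(list(iter_chunks(body)))
print(body.read())
```

Each chunk is a hexadecimal length line, that many bytes of data, and a CRLF; a zero-length chunk followed by a blank line ends the body. Reading that last blank line matters when the connection is reused for another request.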
> I've also pushed a couple of fixes (see commits) to the continuous changes
> feed handling, hope they're ok for others.
Yeah, I figured I'd just get my minimal use case working, but
obviously glossed over some details (my test database got a new doc
each 60s, so I never saw a last_seq come in with the default
settings). Thanks for fixing it up!
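As an illustration of the detail discussed here, a continuous changes feed sends one JSON object per line, may send blank heartbeat lines in between, and ends with a `last_seq` line when the server times out. The generator below is a sketch under those assumptions; `iter_changes` is a hypothetical name, not the couchdb-python API, and ending the generator on `last_seq` is just one possible choice.

```python
import json


def iter_changes(lines):
    """Yield one decoded change per line; stop when `last_seq` arrives.

    `lines` is any iterable of byte strings, one line of the feed each.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank heartbeat line, nothing to report
        change = json.loads(line)
        if "last_seq" in change:
            return  # the server is closing the feed
        yield change


feed = [
    b'{"seq": 1, "id": "doc-a", "changes": [{"rev": "1-abc"}]}\n',
    b"\n",
    b'{"seq": 2, "id": "doc-b", "changes": [{"rev": "1-def"}]}\n',
    b'{"last_seq": 2}\n',
]
for change in iter_changes(feed):
    print(change["seq"], change["id"])
```

A test database that gets a new document before every timeout never produces the `last_seq` line, which is how that branch can go unexercised.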
> I have a couple of things I'm a bit suspicious of but haven't had time to
> look into them more yet:
> 1. ResponseBody.__iter__ returns quite different data from
> ResponseBody.read, e.g. __iter__ returns chunked encoding headers. I'm
> guessing __iter__ is supposed to yield a line of actual response data at a
> time.
I think this has been fixed now, right?
> 2. Caching is now in-memory. However, I didn't notice anything that manages
> the size of the cache. That could lead to some nasty memory increases.
Do you mean for chunked transfers, or just in general?
> 3. I think the response is being read() after a HEAD request. I believe that
> can cause problems in the HTTPConnection's internal state machine, although
> I don't know if it matters here.
Again, in general in couchdb.http, or in the specific case of
_changes? (I'm guessing not the latter, since continuous changes at
least seems to GET only.)
Cheers,
Dirkjan
This should be fixed in 3aa33bd89426. I picked some more-or-less
random values for the cache size; if anyone has comments on what
better values would be, that would be useful.
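For context on what "managing the size of the cache" can look like, here is one common approach, sketched with the standard library. The least-recently-used policy, the `BoundedCache` name, and the limit of 2 entries are assumptions for illustration and are not taken from the commit above.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical least-recently-used cache with a fixed entry limit."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used

    def __len__(self):
        return len(self._entries)


cache = BoundedCache(max_entries=2)
cache.put("/db/a", "response a")
cache.put("/db/b", "response b")
cache.get("/db/a")                # touch "a" so "b" becomes the oldest entry
cache.put("/db/c", "response c")  # evicts "/db/b"
print(sorted(cache._entries))
```

Whatever the policy, the limit trades memory for cache hits, so sensible values depend on how many distinct resources a typical client touches.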
Cheers,
Dirkjan
No one? I think I'll go with ViewField and MappedDoc if no one speaks up.
I think it'd be nice to release soon (and thereby up the ante on the
API promise), so it would be nice if more people chimed in here.
Cheers,
Dirkjan
On Fri, Jan 29, 2010 at 16:24, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
>> On Sun, Jan 24, 2010 at 12:57, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
>>> I just pushed the change from schema to mapping. As for the Document
>>> (and View?), maybe MappedDocument (and MappedView)?
>>
>> I still think it would be helpful to rename mapping.Document and
>> mapping.View (to prevent confusion with the classes in
>> couchdb.client), but I'm not sure about good names. I think
>> MappedDocument is quite long for the intended purpose (parent class
>> for all mappified classes). MappedView doesn't seem like a good name
>> for what it does, maybe ViewMapper or ViewProperty is better? Who's up
>> for some bikeshedding?
> No one? I think I'll go with ViewField and MappedDoc if no one speaks up.
> I think it'd be nice to release soon (and thereby up the ante on the
> API promise), so it would be nice if more people chimed in here.
Actually you were the one who did provide comments, so I wasn't blaming you. :)
I'll wait for your feedback (and ping again in a few days or so?).
Cheers,
Dirkjan
On Tue, Feb 9, 2010 at 17:34, Matt Goodall <matt.g...@gmail.com> wrote:
> * ResponseBody.__iter__ only works for chunked responses. If someone tries
> to iterate a non-chunked response then it will end up eating real data as
> chunk headers. Need to check for a "Transfer-Encoding: chunked" header
> first, but see below.
Yes, this is by design. I think it should be an iterator over chunks.
> * There's a CRLF, i.e. an empty chunk, after the final 0 chunk header but
> it's not being read at the moment. That's possibly leaving the connection in
> an invalid state for the next request on the socket.
Okay, we should fix that.
> * ResponseBody.__iter__ still isn't yielding lines. It now yields chunks. A
> chunk happens to be a line for a changes feed but that's not going to be
> true for other requests and CouchDB could start sending multiple lines in a
> chunk in the future.
Right.
> * There's nothing in Database._changes that reads the remainder of the
> response so it's not returning the connection to the pool any more.
I think it should be right now, though this may require more testing.
I don't much like reading byte-by-byte. I think the reading from the
socket should still be done chunk-by-chunk (this is chunked
transfer-encoding!), even if we yield lines.
Cheers,
Dirkjan
Ok, your call although it's not uncommon (see the socket module). I
think chunked encoding is more for the benefit of the sender, but
there's no reason the receiver can't take advantage of it too.
So, that presumably means we need to strip back the response data
object types to only have a read method for consistency?
- Matt
Right. I don't see much reason *not* to take advantage of it, and it
seems like Python-level read(1)s would be slowing it down
significantly.
> So, that presumably means we need to strip back the response data object
> types to only have a read method for consistency?
Consistency with what, exactly? We could wrap each chunk iterated over
in a StringIO if you think that is better.
Cheers,
Dirkjan
Hm, I'm not entirely sure I see this consistency. For one thing,
ResponseBody didn't have an __iter__ method prior to my initial
implementation of the feed=continuous support (i.e. in revision
4fd836).
Both before and after my changes, Session.request() may return a
ResponseBody(), a StringIO() or a dictionary representing a JSON
object. Before my changes, the ResponseBody had only read(size=None)
and close() methods. I only added an __iter__ method to it. We can
still have that iterate over lines if you think that improves API
consistency, though I'd like it to read a chunk at a time from the
socket.
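The combination described above, reading a chunk at a time but yielding lines, can be sketched as a small buffering generator. `iter_lines` is a hypothetical helper, not part of couchdb.http, and it assumes the chunks arrive as byte strings from some other iterator.

```python
def iter_lines(chunks):
    """Yield complete lines (without the newline) from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may hold several lines, or only part of one.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield line.rstrip(b"\r")
    if buffer:
        yield buffer  # trailing data that was not newline-terminated


chunks = [b'{"seq": 1}\n{"se', b'q": 2}\n', b'{"seq": 3}']
print(list(iter_lines(chunks)))
```

Because the buffer carries partial lines across chunk boundaries, the result does not depend on whether the server happens to send exactly one line per chunk.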
Cheers,
Dirkjan
Sounds good!
Cheers,
Dirkjan
On Wed, Feb 10, 2010 at 10:05, Matt Goodall <matt.g...@gmail.com> wrote:
>> Both before and after my changes, Session.request() may return a
>> ResponseBody(), a StringIO() or a dictionary representing a JSON
>> object.
>
> In bed last night and thought ... wait, does that mean that an
> 'application/json' attachment is deserialized to a Python object instead of
> being left as bytes? Will check when I get to work but it looks like it from
> the code.
I don't think it does that for exceptions, but I'm not sure.
On Wed, Feb 10, 2010 at 10:07, Matt Goodall <matt.g...@gmail.com> wrote:
> Do you want me to take a look this morning?
Sure, except I also had some other ideas I wanted to experiment
with... I came to the conclusion that we probably don't want to
iterate over chunks anyway (they're an impl detail, we should iterate
over lines), so in that case maybe it would be nice to have two
methods, _chunkiter and _lineiter on the ResponseBody (both
generators), and have __iter__ return either depending on whether the
response is chunked (we pass in an extra chunked variable).
Maybe you could start by fixing the tests you committed? :)
Cheers,
Dirkjan
On Wed, Feb 10, 2010 at 13:35, Matt Goodall <matt.g...@gmail.com> wrote:
> Yep, I may have mentioned that a couple of times already ;-).
Right, you managed to convert me. See my latest commit. I thought the
comments in Database._changes() were rather too chatty, so I condensed
them a bit. The idea for ResponseBody is that, once we have a use for
non-chunked iteration over lines, we can add that in, but since we
don't currently have any users, I figured I'd leave that out for now.
If you want to fix issue 114, I'd suggest the fix is for that to all
move to the client module.
Cheers,
Dirkjan
On 11 February 2010 14:34, Dirkjan Ochtman <dir...@ochtman.nl> wrote:
> The idea for ResponseBody is that, once we have a use for
> non-chunked iteration over lines, we can add that in, but since we
> don't currently have any users, I figured I'd leave that out for now.
Agreed.
On Thu, Feb 11, 2010 at 16:06, Matt Goodall <matt.g...@gmail.com> wrote:
> Actually, why not just say a response body only has the following API:
> def read(size=None)
> def close()
You mean, keep being iterable out of the stated API?
> StringIO and ResponseBody already support that but I think we need to change
> the attachment API a little (I'll post a ticket and test about this) and a
> simplified API would make that easier to implement.
> Also, as you pointed out, reading a byte at a time is not nice and the only
> other way I can think of is to muck around with async sockets.
I'm not sure exactly what you are talking about here? (As in, what is
this designed to solve?)
Cheers,
Dirkjan
[snip]
> I was hoping to post a ticket to explain but ok ...
> Database.get_attachment() currently only returns the response body, there's
> no way to get the content type, size, etc which is kind of important
> information.
ResponseBody has .resp.msg, which has all of this information. Both of
the above seem like you're overthinking the API. The current version
seems quite simple & transparent. By making it more consistent, you
also significantly increase API surface, which doesn't help much, IMO.
> I was thinking that get_attachment could return some sort of Attachment
> object with a content_type attribute as well as whatever methods a response
> body provides. If Attachment only has to include read() and close() methods
> then it should be able to wrap itself around a StringIO or a ResponseBody
> easily.
> The "byte at a time" bit was just about supporting __iter__ for a
> non-chunked ResponseBody. Perhaps I should stop thinking aloud in emails and
> concentrate on the work I have to get done here ;-).
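To picture the Attachment idea quoted above, here is a minimal sketch of a wrapper that only relies on `read()` and `close()`. The `Attachment` class, its constructor arguments, and the `length` attribute are hypothetical; nothing like this exists in couchdb-python as far as this thread shows.

```python
import io


class Attachment:
    """Hypothetical wrapper exposing only read(), close() and metadata."""

    def __init__(self, body, content_type=None, length=None):
        self._body = body  # anything with read() and close()
        self.content_type = content_type
        self.length = length

    def read(self, size=None):
        # Delegate to the wrapped body; omit the argument to read everything.
        if size is None:
            return self._body.read()
        return self._body.read(size)

    def close(self):
        self._body.close()


attachment = Attachment(io.BytesIO(b'{"a": 1}'),
                        content_type="application/json", length=8)
print(attachment.content_type, attachment.read())
attachment.close()
```

Since the wrapper never inspects the body's type, the same class would work over an in-memory buffer or a streaming response object, which is the argument for keeping the body API down to two methods.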
We could do an iter-over-lines for non-chunked ResponseBodies, but I'm
not sure where that would be useful.
> Hope that makes a bit more sense now. I will post a ticket about
> get_attachment so we can discuss properly.
I think mailing lists are more suitable for discussions, actually.
Cheers,
Dirkjan