Issue 146 in couchdb-python: Up-to-dated view server


couchdb...@googlecode.com

Aug 16, 2010, 10:20:14 AM
to couchdb...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

This is a port of the JavaScript view server for CouchDB 0.10+.
Support for shows, lists, filters and validate_doc_update functions written
in Python is included.

Since couchdb-python currently has no API for working with these
functions, I didn't write any tests, but it passes all the tutorial examples.

Attachments:
view.py 20.8 KB

couchdb...@googlecode.com

Aug 16, 2010, 10:37:25 AM
to couchdb...@googlegroups.com
Updates:
Labels: -Type-Defect Type-Enhancement

Comment #1 on issue 146 by djc.ocht...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Did you write this yourself? Is it a port of the JavaScript view server, or
did you find some documentation on the view protocol?

I definitely want an updated view server, but I prefer that we only add
something for which we have good tests, so that we can prevent it from
going stale again. Would you be in a position to write tests for this? Some
design notes on the protocol would likely also be helpful.

If we decide to take this, I also have some code/style nits.

couchdb...@googlecode.com

Aug 16, 2010, 10:59:56 AM
to couchdb...@googlegroups.com

Comment #2 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

It's the JavaScript view server, ported by me. I had only one goal: to
make the view server as compatible as possible with the JS one, because
that one is always up to date and is released first. The closer the Python
view server is to it, the easier it will be to maintain.

Shame on me, for some reason I thought that the view server had to be
tested via the couchdb-python API; I've just had a look at tests/view.py.
I'll have the tests ready by tomorrow.

couchdb...@googlecode.com

Aug 19, 2010, 9:38:33 AM
to couchdb...@googlegroups.com

Comment #3 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

I ran into a few small problems while getting the official view server
tests to pass, but everything is fine now.

I'll improve the exception handling and add crash tests later. I really
don't like this forest of try..except, but I couldn't come up with anything
better yet and need some time to think about it. So this is just a
checkpoint (:

In addition to the JavaScript view server features, I've implemented two
behaviors:
- sealed documents: changes made to a document in one map function are not
visible to the other map functions within a single view.
- any Python exception will crash the view server, while the JavaScript
view server crashes only on fatal errors. I don't know which Python
exceptions would count as fatal.
Everything else seems to be the same.
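
The sealed-document behavior can be illustrated with a minimal Python 3 sketch. This is not the attached view.py: run_map_functions is a hypothetical helper, and handing each map function its own deep copy is only one assumed way to get the isolation described above.

```python
import copy


def run_map_functions(map_funcs, doc):
    """Run every map function against its own private copy of doc.

    Hypothetical helper: a mutation made by one map function can never
    leak into the next one, because each call gets a fresh deep copy.
    """
    results = []
    for fun in map_funcs:
        sealed = copy.deepcopy(doc)
        results.append([[key, value] for key, value in fun(sealed)])
    return results


def mutating_map(doc):
    # Changes the document it was given before emitting.
    doc['type'] = 'changed'
    yield doc['_id'], doc['type']


def plain_map(doc):
    # Should still see the original value of 'type'.
    yield doc['_id'], doc['type']


doc = {'_id': 'a', 'type': 'original'}
results = run_map_functions([mutating_map, plain_map], doc)
```

Copying per function costs time on large documents; a read-only wrapper that raises on assignment would be an alternative with a different trade-off.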

Attachments:
view.py 24.3 KB
view.py 19.1 KB

couchdb...@googlecode.com

Aug 19, 2010, 9:43:38 AM
to couchdb...@googlegroups.com

Comment #4 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

oops, forgot to remove debug handler from view server.

Attachments:
view.py 24.2 KB

couchdb...@googlecode.com

Sep 3, 2010, 7:32:26 AM
to couchdb...@googlegroups.com

Comment #5 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

OK, here is the updated view server. Major changes:
- Support for any CouchDB server since version 0.9. By default the view
server runs in compatibility mode with the latest CouchDB version; run it
with the --couchdb-version option, e.g. view.py --couchdb-version=0.10, to
make it work correctly with a 0.10 CouchDB server.
- Tests are included for each supported version; all of them were ported
from the JavaScript view server.
- An AssertionError raised within a validate_doc_update function is not
treated as fatal like any other Python exception; it is wrapped as a
Forbidden error instead.
- The "Reduce output must shrink more rapidly" error may now be raised.
- More verbose debug logging.
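
The AssertionError handling in the list above could look roughly like the following Python 3 sketch. run_validate is a hypothetical name, not taken from view.py, and the reply shapes (1 for success, an object with a "forbidden" key for rejection) are an assumption based on how the query server protocol is usually described.

```python
import json


def run_validate(fun, newdoc, olddoc, userctx):
    """Call a validate_doc_update function and build the reply for CouchDB."""
    try:
        fun(newdoc, olddoc, userctx)
    except AssertionError as err:
        # Assumption: a failed assert means "reject this update",
        # not "crash the view server".
        return {'forbidden': str(err)}
    return 1


def validate(newdoc, olddoc, userctx):
    # Example validation function written in the style discussed above.
    assert 'author' in newdoc, 'author field is required'


rejected = json.dumps(run_validate(validate, {'title': 'x'}, None, {}))
accepted = json.dumps(run_validate(validate, {'author': 'me'}, None, {}))
```

Any other exception type is deliberately left uncaught here, which matches the "every other Python exception is fatal" rule stated earlier in the thread.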

I also had to add a function-versioning decorator to split function
behavior by CouchDB version. I think it will be useful in the future for
keeping legacy support without serious code rewriting.
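
A versioning decorator of this kind might be sketched as follows in Python 3. The class name Versioned, its methods and the two example responders are all hypothetical; the actual decorator in view.py may work differently, and the return values below are placeholders rather than real protocol replies.

```python
def parse_version(text):
    """Turn '0.10' or '1.0.2' into a tuple that compares numerically."""
    return tuple(int(part) for part in text.split('.'))


class Versioned:
    """Registry of implementations keyed by minimum server version."""

    def __init__(self):
        self._impls = []  # list of (min_version, function)

    def since(self, min_version):
        def decorator(fun):
            self._impls.append((parse_version(min_version), fun))
            self._impls.sort(key=lambda item: item[0])
            return fun
        return decorator

    def dispatch(self, server_version, *args, **kwargs):
        """Call the newest implementation not newer than server_version."""
        wanted = parse_version(server_version)
        chosen = None
        for min_version, fun in self._impls:
            if min_version <= wanted:
                chosen = fun
        if chosen is None:
            raise LookupError('no implementation for %s' % server_version)
        return chosen(*args, **kwargs)


respond = Versioned()


@respond.since('0.9')
def respond_legacy(value):
    return {'style': 'legacy', 'value': value}


@respond.since('0.11')
def respond_modern(value):
    return {'style': 'modern', 'value': value}
```

With this layout, respond.dispatch('0.10', 1) picks respond_legacy and respond.dispatch('1.0.2', 1) picks respond_modern, so support for an old server stays in its own function.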

Attachments:
view.py 32.3 KB
test.py 31.2 KB
testutil.py 3.4 KB
util.py 1.9 KB

couchdb...@googlecode.com

Sep 13, 2010, 8:10:35 AM
to couchdb...@googlegroups.com

Comment #6 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Changes:
- fixed compatibility for python-2.4+
- added more verbose debug output

I've removed the old attachments because they are out of date now.

Attachments:
view.py 33.1 KB
util.py 1.9 KB
test.py 31.1 KB
testutil.py 3.4 KB

couchdb...@googlecode.com

Sep 19, 2010, 9:19:58 AM
to couchdb...@googlegroups.com

Comment #7 on issue 146 by djc.ocht...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

I'm going to need to find a solid chunk of time to review this, which might
take a while, but I'd really like to take this for the next release...

couchdb...@googlecode.com

Dec 22, 2010, 4:21:22 AM
to couchdb...@googlegroups.com

Comment #8 on issue 146 by djc.ocht...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Issue 140 has been merged into this issue.

couchdb...@googlecode.com

Feb 12, 2011, 2:08:51 PM
to couchdb...@googlegroups.com

Comment #9 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

I've fixed issue #163 in this view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=e1448890d2223e0321f96459a53c1757dc5b9662
(I didn't see any reason to attach all the files again for a few small
changes.)

Otherwise it's ready for 1.0.2, since the main change in the JS view server
was sealing documents for map functions, and that feature is already done.

I will port sofa and tapirwiki to this view server within the next two
weeks. That will be a good challenge for it, and more fixes may follow if
I've missed something.

couchdb...@googlecode.com

Mar 22, 2011, 7:07:02 AM
to couchdb...@googlegroups.com

Comment #10 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

I've found one thing missing: a require function, to allow modular
applications within a design document. But there are some questions about it:
- should it work as in the JavaScript view server, i.e. wrap some arbitrary
code and return the exported values? Or would it be better to implement a
Python-like import?
- should it support eggs? I think it should, but I have no idea how to
import eggs inline without saving them to disk. This could be a problem for
application hosters.
Any other behavior suggestions?
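
To make the "Python-like import" option concrete, here is a minimal Python 3 sketch. It is an illustration only, not the require implementation linked later in this thread: the function signature, the 'lib/calc' path layout and the sample design document are assumptions.

```python
import types


def require(ddoc, path):
    """Resolve a 'lib/calc'-style path in a design doc and load it as a module."""
    node = ddoc
    for part in path.split('/'):
        if not isinstance(node, dict) or part not in node:
            raise ImportError('no module at %r' % path)
        node = node[part]
    if not isinstance(node, str):
        raise ImportError('%r does not point at source code' % path)
    # Build an in-memory module; nothing is written to disk.
    module = types.ModuleType(path.replace('/', '.'))
    exec(compile(node, '<ddoc:%s>' % path, 'exec'), module.__dict__)
    return module


# Hypothetical design document holding library source as a string.
ddoc = {
    '_id': '_design/app',
    'lib': {'calc': 'def double(x):\n    return 2 * x\n'},
}
calc = require(ddoc, 'lib/calc')
```

This covers plain source strings only. Eggs are a different problem, because they are archives rather than text, which is exactly the open question raised above.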

couchdb...@googlecode.com

Mar 22, 2011, 11:31:38 AM
to couchdb...@googlegroups.com

Comment #11 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

What I'm doing to handle some efficiency issues with import:

def fun():
    from datetime import datetime
    ... other imports ...
    def fun(doc):
        ...
        delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
        ...
    return fun
fun = fun()

What really ought to happen is that the view server should go through each
variable in the exec's locals and check the __module__. If __module__
exists, but is None, and the variable points to a callable, then use that
as the map/reduce func, and error out if more than one is found. This would
be backwards compatible with existing view functions, but would make it so
closures are not necessary.

I think the eggs deployment issue has been tackled many times before.
Importing eggs inline would obliterate responsiveness. Does couch time you
out if you take too long? I'd be concerned that it would/should. If an
application hoster supports python, then sooner or later they'd need to
come up with a solution to handle 3rd party software since, frankly, the
power of using python as a view server doesn't just lie in "it looks nice"
and "it has yield".

To meet couch's same-code-same-result requirements (no side effects), we
would maybe have to mark imports with python and module version strings,
and push that back into couch. For example, 'import mythirdpartyegg' would
then append '#mythirdpartyegg py cpython-2.6.5 mod 3.11r7112' to the end of
the eval string, one per unique detected module. Any time you do any module
upgrades, you can just delete the version-marking comments out of the view
func manually, and couch will regenerate it.

However, most modules *tend* to be stable enough API-wise that this isn't a
problem. If there were any behavior altering bugs/changes to the code, an
administrator could achieve this manually.

couchdb...@googlecode.com

Mar 23, 2011, 4:45:30 PM
to couchdb...@googlegroups.com

Comment #12 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

A few questions, if I may:
1) What is the reason for such a wrapper, as opposed to:

def fun(doc):
    ...
    delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
    ...

from datetime import datetime

It's not very Pythonic to place imports below, but datetime would be
imported only once. However, this style could cause another problem: views
must not have any state or any dependency on sources that could change
later.

2) A hoster could provide third-party modules, but it couldn't provide all
versions of each template engine, for example. Maybe you need trunk jinja2
with your own patches, who knows? So the idea of a fully portable Python
couchapp would fail.

3) Are preprocessor statements really a good idea? I've seen them in
couchapp, but there they are used only at declaration time, not within the
design document or the view server.

As an intermediate result, here is an implementation of the require
function that works the way it does in the JavaScript view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=0b6625db473e74a83df7e9a339899a3c318f7b80

I still need to finish some details, so attachments with the new version of
the view server will come later. Sorry, Dirkjan, it seems you'll have to
review it once again, but I'll include documentation for each vital function
and more tests to make the process easier (:

couchdb...@googlecode.com

Mar 24, 2011, 2:28:05 PM
to couchdb...@googlegroups.com

Comment #13 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

You *could* put imports after the inner func that needs them, and python's
scoping rules would resolve variable lookups, but I agree, it's not
Pythonic. Closures themselves are not very Pythonic, either. Anyways, why
not put imports first, as per my suggestion?

The whole reason for using a closure is to avoid the performance penalty of
repeating the imports for each document. If a function needed k imports and
there are n documents in a couch database, then, compared to the closure
technique shown above, there'd be n*k redundant imports taking place, which
is very slow (python doesn't re-import the module, but there is overhead
involved, which can be significant).

See:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
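
The two styles being compared can be measured with a small Python 3 sketch like the one below. The function names are made up for the illustration, and no timings are quoted because they depend on the machine and interpreter.

```python
import timeit


def map_with_inner_import(doc):
    # The import statement runs on every call; the module is already
    # loaded, but the lookup and name binding still happen each time.
    import datetime
    return datetime.date(2011, 3, 24).year


def make_map():
    # The import runs once, when the closure is built.
    import datetime

    def fun(doc):
        return datetime.date(2011, 3, 24).year
    return fun


map_with_closure = make_map()

if __name__ == '__main__':
    for name, fun in [('inner import', map_with_inner_import),
                      ('closure', map_with_closure)]:
        seconds = timeit.timeit(lambda: fun({}), number=100000)
        print('%-12s %.4f s' % (name, seconds))
```

Both functions return the same value, so any difference in the printed times comes from the repeated import statement alone.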

I disagree with you on the severity of Couch's "no side-effects"
requirement for view/show/list functions. It's a matter of practicality,
not a matter of theory. Yes, Couch says that the same document passed to
the same code must produce the same output, no matter how many times it's
executed. If the document doesn't change, then the result doesn't change.

However, this mandate is only for data correctness. If the module you're
importing in a view function gets upgraded, and its behavior changes, but
your view function stays the same (so couch doesn't regenerate the view),
then all that happens is that your view will be incorrect. Couch won't
break (couch won't know, mind, or even care).

Also, using module imports doesn't count as "side effects" at all. Aside from
the random module, almost all modules (including 3rd party ones) are
stateless in their behavior. For example, it'd be fine to describe a shape
as a list of points in a couch document, and then use PIL to draw that to a
png image in a show function, or to use couch to store server access logs,
and then use pychart from a list function to generate an svg rendered line
graph of server traffic.

Furthermore, the above "same code, same doc, same result" requirement does
not apply across all time. For example, I could define a view function that
imported some module, and then change the behavior of the module. All I'd
have to do to fix the consistency of the view would be to add a single space
to the end of the view function, save the design document to couch, and
then remove that space, and save the design document again. The code is
*exactly* the same as before, yet we side-stepped the
upgrade-changes-behavior issue, and caused couch to regenerate its views to
reflect the new behavior. And this process can be automated (alternately,
you could delete the views from the design doc, do a view cleanup, and then
reupload the original design document, which would require only one view
regeneration instead of two). As long as you regenerate your views when
needed, the "could be changed later" issue isn't a problem at all.

In general, hosting vendors are not ever going to support all versions of
all potential packages. They're either going to support only a handful of
popular modules (probably Django templating) and force long release cycles,
or they'll provide you with a few megabytes of space to upload your own
modules in your own private module path, or they'll not allow the use of
any kind of 3rd party modules (in which case you might as well use
Javascript).

couchdb...@googlecode.com

Mar 24, 2011, 3:41:06 PM
to couchdb...@googlegroups.com

Comment #14 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

First of all, before I forget: thank you for the detailed reply (:

About imports:
Allowing them at the top of the design function, as the PEP tells us, looks
OK to me too, and it is much more intuitive behavior.
However, only map functions are cached: reduces/shows/lists/updates and
the others are recompiled on each call, so these import optimization
tricks are not as useful as they could be.
I suppose it would also be better to extend the set of preimported packages
with the most popular and useful ones, such as time, datetime, re, hashlib,
math, random, itertools and others. But that would be a very implicit
feature unless you read the docs, and I shouldn't be the only one who
decides what goes on this list.

No side effects is a requirement for views only, AFAIK, because the index is
based on the view result only, while shows/lists are just a way to present
data in a nicer form.
There is one more reason to keep views as stable and as free from side
effects as possible: if you have tens of millions of documents, the last
thing you want to do is rebuild the view index, because that would take
hours. Yes, the trick with a secondary server and swapping in the view index
is a nice idea, but you still lose those hours and you'll have the service
down for some time.
However, the 1.1.x branch added a feature that lets map functions require
modules stored under views/lib.

Unfortunately for CouchDB view servers, a hoster wouldn't provide space for
your own modules, because that would require an additional interface and a
monitor to reload the module set in real time without a forced restart of
the view server, and... this solution kills portable Python couchapps.
JavaScript couchapps are awesome because you just type "couchapp push"
and that's all: it works!

So, what will the resolution be?

couchdb...@googlecode.com

Mar 26, 2011, 2:05:09 AM
to couchdb...@googlegroups.com

Comment #15 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Can you link some documentation on that 1.1.x feature? That's something I'd
be *very* interested in learning about.

Show and list functions are supposed to be side effect free too. That way,
they can be cached by couch (though I'm unsure if couch itself actually
does that). I'm pretty sure couch does proper Etag handling of the
show/list results, so if you're expecting that you can generate a new
result each time someone accesses a doc via show, or a set of documents via
list, know that couch will *tell* the client/browser to use what it already
has if none of the pertinent documents in the database have changed.

Check out:
* http://guide.couchdb.org/draft/show.html#constraints
* http://guide.couchdb.org/draft/transforming.html#example (see the
first "lightbulb" blockquote in that section).

Hah, you're right about the map caching. I forgot that some of the others
don't cache! Hmmmmm. We could do our *own* caching. I'm not sure if that's
considered bad behavior or not, but I don't see how it makes a difference,
and really, I think the fact that they send the reduce function *every
time* a reduce computation is needed is a bad choice in protocol design --
it's simpler, yes, but they could have just added 'load' and 'unload'
commands for functions, so that you can do one-time compilation.

What we can do is cache the reduce/list/show functions they give us and run
the computation. Next time they pass us a function, we do a string compare
on the new code for that function to the string of code we originally
received for that named list/show/reduce function. If it's the same as
before, then our compilation step becomes a no-op, and we just use what we
already had. If the function is different, then we assume that the design
doc has been updated, and we recompile. This way, we can do things like use
closures for those performance gains. Couch's own rules and reasons for
side-effect free functions are what gives us the right to do this kind of
caching.
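
A minimal Python 3 sketch of that caching idea follows. It keys the cache on the source text itself rather than on a function name, which is a simplification of the string-compare scheme described above; compile_cached is a hypothetical name and the sketch assumes each source string defines exactly one function.

```python
import types

_cache = {}  # source text -> compiled function


def compile_cached(source):
    """Return the function defined by source, compiling it at most once."""
    fun = _cache.get(source)
    if fun is None:
        namespace = {}
        exec(source, namespace)
        funcs = [value for value in namespace.values()
                 if isinstance(value, types.FunctionType)]
        if len(funcs) != 1:
            raise ValueError('expected exactly one function, found %d'
                             % len(funcs))
        fun = _cache[source] = funcs[0]
    return fun


reduce_src = 'def fun(keys, values, rereduce):\n    return sum(values)\n'
first = compile_cached(reduce_src)
second = compile_cached(reduce_src)       # same text: cache hit
other = compile_cached(reduce_src.replace('sum', 'len'))  # new text: recompiled
```

Because the key is the full source, an edited design document naturally misses the cache. The sketch never evicts old entries, so a real server would need some size limit.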

Moving on... well random probably shouldn't be imported (or at least not
used by any couch stuff), since by its very nature, it'll produce different
results every time.

As for downtime, in many cases, couch can service requests while an index
is being rebuilt. Also, you can easily replicate to another
database/server/whatever (secondary server), rebuild it there, and
temporarily make that the primary database you're serving from while you
rebuild the index on your real primary. That sounds complicated, but as we
all know, in couch that takes less thought than it took me to write about
it just now. Also, that's all assuming that couch's index hot-rebuild
doesn't cover your use case. Hot-swapping couch databases and even couch
server instances, or adding redundancies and failovers is a fact of life
with couch -- sure, there are plenty of us running just one couch instance
for a given application, but it's so painless to temporarily add another
copy ad-hoc, and tear it back down when you don't need it anymore. Unlike
with other systems, you don't even really need to plan ahead when you do it.

You've got a good point. I'm not sure what the resolution would be. Clearly
python wins over javascript for couchdb not because of its pretty and
concise syntax, since view/list/show functions are about the same size in
either language if you aren't allowed to import anything. Python would win
out because of its standard library (which is API-stable enough for couch),
and because of its 3rd party modules.

In any case, behavior-changing module upgrades could only be handled by
rebuilding the index. Even though the code that couch can see (the code
stored in the design doc) hasn't changed, the code it links to has. So you
simply have to treat it in same way as if you changed a line of code in the
map func itself, and there's no way around that.

Just like with retooling your own view/list/show function code, you have to
strike a balance between the time you need to spend rebuilding an index,
and the benefits you get from changing the code. After all, you can always
choose to *not* upgrade your python or module to a new version, and just
because there is a new version, doesn't mean you need it.

couchdb...@googlecode.com

Mar 26, 2011, 4:09:25 AM
to couchdb...@googlegroups.com

Comment #16 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

The only documentation I've seen is the source code (:
https://github.com/apache/couchdb/commit/7665e449cdfff1e660ed2bbac3de4507cb063a18#share/server/state.js
AFAIK, this command is passed automatically if the ddoc has a views/lib/...
path set, but I'm not sure. I may change my mind after looking at the
test case.

Caching shows/lists/other ddoc subcommands may be possible, but this cache
would be reset on each design document update. Reduce functions couldn't
be cached without comparing source code. However, this trick wouldn't work
with 0.10.0.
There is a ["reset"] command that clears the map function cache and drops
all your configuration: MIME types, reduce_limit etc. However, again, it's
system-wide and not available from the outside.
I need some time for experiments to understand all the benefits and flaws
of such caching. If it hasn't been implemented in the JavaScript view
server, there must be some reason, right? The first one I see: if you update
a third-party package on the system, your cached bytecode wouldn't be
updated, since the design document has not changed, and you'll have a lot
of fun in that case (: It could be recompiled on such a failure, but tests
are still needed.

I have also found a case that breaks the idea of imports above the function:
>>> import datetime
>>> from itertools import groupby
>>> def test(doc):
...     yield doc['_id'], 'passed'
the resulting namespace would always be:
{'datetime': <module 'datetime'
from '/usr/lib/python2.4/lib-dynload/datetime.so'>,
'groupby': <type 'itertools.groupby'>,
'test': <function test at 0x7f3395044938>}
So the function that the iterator finds would be groupby. That's the wrong
one, but since it yields two-value tuples it will produce a very strange
error:
>>> TypeError: <generator object at 0x7fcebfa58908> is not JSON serializable
This totally crashes the view server. You'd have to spend a lot of time with
the --debug option enabled to understand why, and currently even that would
not help without additional logging. And if generators were JSON
serializable, you'd get a wrong result without any warning. Still not very
explicit or reassuring behavior ):
Binding by name? Not an option.

The random module shouldn't be used in views, for sure, but it could be
useful in lists to randomize output.

The idea of swapping temporary/production databases is nice too, if the
temporary couch instance can serve as the production one for a while... but
I suppose this interesting discussion is not for this issue (;

On the next points I agree with you: we have to find the best point of
balance. Hard optimizations and tricks are part of a high-load environment.
PyPy could also be used instead, or a faster json module, etc. Our task is
to create tools that work, and work well, but also leave some room for heavy
optimization with some trade-offs.

couchdb...@googlecode.com

Mar 26, 2011, 7:26:25 AM
to couchdb...@googlegroups.com

Comment #17 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Yeah, I'll have to look into that 'require' thing. On first glance, it
looks like couchjs is doing a request to the design doc for the
dependencies.

Right, as said in a previous post, in order for the above-the-function
option to work, without the use of a closure, you (the view server function
compiler) would have to check every key in the locals dictionary that exec
generates to see if it has a __module__ attribute, and if that attribute
has the value of None. The only backwards compatible requirement we need is
that there is only one callable object (usually a function) that has
__module__ set to None (since non-imported local functions/classes will
have __module__ of None).

>>> code = """
... from datetime import datetime
...
... class OldStyleClass:
...     pass
...
... class NewStyleClass(object):
...     pass
...
... y = 17
...
... def test():
...     return 5
... """
>>> locals = {}
>>> exec code in {}, locals
>>> locals
{'y': 17, 'test': <function test at 0x7f7e25f10320>, 'NewStyleClass':
<class 'NewStyleClass'>, 'OldStyleClass': <class __builtin__.OldStyleClass
at 0x7f7e25f196b0>, 'datetime': <type 'datetime.datetime'>}
>>> for key in locals:
...     if callable(locals[key]):
...         if locals[key].__module__:
...             print key, "is *not* a candidate, since it's imported from", locals[key].__module__
...         else:
...             print key, "is a candidate (hopefully the only one, or we'll have to error out)"
...     else:
...         print key, "isn't even callable, so we don't care about it"
...
y isn't even callable, so we don't care about it
test is a candidate (hopefully the only one, or we'll have to error out)
NewStyleClass is *not* a candidate, since it's imported from __builtin__
OldStyleClass is *not* a candidate, since it's imported from __builtin__
datetime is *not* a candidate, since it's imported from datetime

Huh, so apparently class definitions inside of an exec will be associated
with the __builtin__ module, so we'd have to check for that, as well. But
in general, it's easy to do a backwards-compatible check for non-imported
callables.
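
A Python 3 variant of this check is sketched below. Instead of testing for a __module__ of None, it runs the source in a namespace whose __name__ is a sentinel, so everything defined locally carries that sentinel as its __module__. The sentinel value and the function name find_entry_point are assumptions made for the illustration, and a single namespace dict is used so that the compiled function can still see its imports when it is called.

```python
USER_MODULE = '__viewfunc__'  # hypothetical sentinel module name


def find_entry_point(source):
    """Return the only callable defined by source itself (not imported)."""
    namespace = {'__name__': USER_MODULE}
    exec(source, namespace)
    candidates = [
        value for value in namespace.values()
        if callable(value)
        and getattr(value, '__module__', None) == USER_MODULE
    ]
    if len(candidates) != 1:
        raise ValueError('expected exactly one local callable, found %d'
                         % len(candidates))
    return candidates[0]


good_source = (
    "import datetime\n"
    "from itertools import groupby\n"
    "def test(doc):\n"
    "    yield doc['_id'], 'passed'\n"
)
entry = find_entry_point(good_source)

ambiguous_source = good_source + "def helper(item):\n    return item\n"
```

Here groupby is skipped because its __module__ is itertools, which addresses the failure case from comment #16. Classes defined in the source would count as candidates too, so a source with a helper class still needs an explicit marker.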

Oh, perhaps the answer to the module distribution problem is to put a
custom import mechanism that checks for those modules as attachments to the
view/list/show functions design doc before checking the normal on-disk
module path. Couch's _changes API would need to be monitored for design doc
changes by couchpy too, so that couchpy can know when it needs to reload
modules. If this were achievable, you could bundle your modules in the
design doc itself (regular zip files and eggs could be supported).

The best way to get a good system in place for this is not to work around
Couch's API, but instead to work directly with the Apache Couch community
to support everything we're talking about, since none of it violates the
side-effect-free requirements of couch if dependency checking can be moved
into couch. This wouldn't mean that couch would have to understand any
programming language, but would be able to handle changes to certain
special design doc keys. For example, couch could *hypothetically* do:

{
  "_id": "_design/app",
  "lib": {
    "calc": "def something_statistical(a,b,c,d): return (a,b,c,d)",
    "image": "#some-package v1.3.2",
    "chart": "#other-package v4.1.7"
  },
  "depends": {
    "views.test": ["lib/calc"],
    "shows.graph": ["lib/chart"],
    "lib.chart": ["lib/image", "_attachments/something_local.egg"]
  },
  "views": {
    "test": {
      "map": "def fun(doc): yield doc['_id'], calc.something_statistical(*[doc.get(k) for k in 'abcd'])"
    }
  }
}

Once again, this doesn't exist in couch, but if it were implemented, couch
would only need to know how to interpret the "depends" key. If a string
in "lib" changes (couch doesn't need to know or care what the contents of
that string mean), then everything that depends on it needs to get updated,
just like it reindexes views when the view function strings are changed. In
the case of list or shows, this would mean setting a new Etag that
invalidates client-cached versions of the previous show/list results. Couch
would also need to send the dependency to the view server when it's needed,
in the form of some kind of addlib command. couchpy itself could ignore the
# version stubs, since those would just be there to provide an easy upgrade
path for libraries. Or it could compare the version shown there to the
version of the module it imports, and update the design doc if a new
version is found on the module path. Dependencies starting
with "_attachments" could be handled specially by couch.

couchdb...@googlecode.com

Mar 26, 2011, 7:34:26 AM
to couchdb...@googlegroups.com

Comment #18 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

By the way, you're right that use of random is side-effect-free. Just keep
in mind that couch's Etag/caching semantics will make it so that, if an
HTTP client does proper caching, it'll do a conditional GET request for the
show/list the next time you ask for it, and unless one of the documents the
list/show depends on has changed, couch will tell that client that the
resource has not been updated.

Therefore, your random lists will only look random once per depended-upon
document update. This is on a client-by-client basis, though. If you have
your own caching proxy in the middle, or something like couchbase starts
having its own response cache, then everybody will see the same random
results on each request, until the next time one of the pertinent documents
is updated. This is another "good thing" that couch provides, because even
though it might hurt you 5% of the time, it really helps with scalability
and responsiveness 95% of the time.

couchdb...@googlecode.com

Mar 26, 2011, 9:08:49 AM
to couchdb...@googlegroups.com

Comment #19 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

> Yeah, I'll have to look into that 'require' thing. On first glance, it
> looks like couchjs is doing a request to the design doc for the
> dependencies.

It doesn't, but it has access to it via a closure: the design doc is simply
passed to the compile function as the second argument.

Your example could pass and work as "expected", but it's just one case.
There are others that wouldn't work as "expected". What's needed is simply
a stable entry point.
Maybe some kind of decorator would be a solution, such as:
>>> import datetime
>>> def helper(item):
...     ...
>>> @main
... def mapfun(doc):
...     ...
But would it be good, explicit and clean? It looks the same as a predefined
function with a special name. I suppose there isn't much need for complex
code blocks. One node, one function. Libs will take care of the rest via
exported statements, as they were designed to do, plus eggs as libs to
store more complex packages.

> Couch's _changes API would need to be monitored for design doc changes by
> couchpy too, so that couchpy can know when it needs to reload modules.
> If this were achievable, you could bundle your modules in the design doc
> itself (regular zip files and eggs could be supported).

That isn't needed: if the design document has changed, a command is
passed to refresh it in the local cache.

Also note that attachments are a separate entity that is merely bound to the
document and is not passed along with it. So to access an attachment from a
show/list you'd have to make a plain HTTP request. Madness! (:

> For example, couch could *hypothetically* do: ...
Too complex a solution: instead of just creating a function you have to
create it and set up all the required dependencies to make it work
correctly. The require function currently does the same thing: just invoke
it and extract the exported statement you need.

> By the way, you're right that use of random is side-effect-free. Just keep
> in mind that couch's Etag/caching semantics will make it so that, if an
> HTTP client does proper caching, it'll do a conditional GET request for
> the show/list the next time you ask for it, and unless one of the
> documents the list/show depends on has changed, couch will tell that
> client that the resource has not been updated.

In a show/list function you could set your own headers and disable caching
via the Expires header. It has higher priority than the Etag one. Actually,
Etag only __may__ be used for caching purposes.
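
As a rough sketch of what that could look like, assuming the Python query
server accepts the same response object (a dict with `headers` and `body`)
that JavaScript show functions return. The function name and the header
values below are illustrative only, and how a given client weighs these
headers against Etag should be checked against a real CouchDB:

```python
def show(doc, req):
    """Hypothetical show function that asks clients not to reuse the response."""
    return {
        "headers": {
            "Content-Type": "text/plain",
            # A date in the past marks the response as already stale.
            "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
            # Ask caches to revalidate before reusing a stored copy.
            "Cache-Control": "no-cache",
        },
        "body": "generated for %s" % doc["_id"],
    }


response = show({"_id": "example"}, {})
print(response["headers"]["Expires"])
```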

couchdb...@googlecode.com

Mar 26, 2011, 10:08:51 AM3/26/11
to couchdb...@googlegroups.com

Comment #20 on issue 146 by extempor...@gmail.com: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Good idea! As you indicated, if you have a single callable, then it'll work
as expected. If you have more than one callable, you must designate the
entry point with @main.

Well, the main point is that your idea provides a mechanism for the
programmer to be as expressive and succinct as they need.

...


def helper(item):
    ...
@main
def mapfun(doc):
    ...

is the equivalent of:

def mapfun():
    ...
    def helper(item):
        ...
    def mapfun(doc):
        ...
    return mapfun
mapfun = mapfun()

Only difference is that the decorator is a *lot* easier.
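
A minimal sketch of how a view server could resolve such an entry point,
assuming the function source is executed in a fresh namespace that provides
a `main` decorator. `compile_source` is a hypothetical helper written for
this illustration, not part of couchdb-python:

```python
import types


def compile_source(source):
    """Execute design-function source and return its entry point (hypothetical)."""
    entry_points = []

    def main(func):
        # Remember the decorated function and hand it back unchanged.
        entry_points.append(func)
        return func

    # One shared namespace, so helpers and imports stay visible to the function.
    namespace = {"main": main}
    exec(source, namespace)

    if len(entry_points) > 1:
        raise ValueError("more than one function is marked with @main")
    if entry_points:
        return entry_points[0]

    # No @main: accept the source only if it defines exactly one function.
    functions = [
        value for value in namespace.values()
        if isinstance(value, types.FunctionType) and value is not main
    ]
    if len(functions) != 1:
        raise ValueError("expected exactly one function or an explicit @main")
    return functions[0]


SOURCE = '''
def helper(item):
    return item * 2

@main
def mapfun(doc):
    yield doc["_id"], helper(doc["value"])
'''

mapfun = compile_source(SOURCE)
print(list(mapfun({"_id": "a", "value": 21})))  # [('a', 42)]
```

With a single callable and no decorator the same helper falls back to that
callable, which matches the "it'll work as expected" case above.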

The point about my dependency solution is that it lets couch handle
recursive dependencies with respect to index rebuilding and Etag handling,
so that couch can make sure that all data is consistent. I agree, it's
complex, and there's bound to be a better way (I don't care for my solution
either -- it's an initial suggestion). I just know that if we have to
manage recursive (or even non-recursive) dependencies ourselves, then it
won't work. Sooner or later, we'd end up with a badly inconsistent
database, with bugs that are really hard to notice.

Dependencies are necessary because in the typical couch application (at
least all of the ones I've done), there is a lot of duplicate code, and
that duplicate code makes the application much much harder to maintain.

Do you really want to override Etag handling as done by couch? Put it this
way, Etag is the absolute best caching mechanism available to you, but it's
also *very* complex to get it right. Enterprise-grade server software often
fails to handle it usefully (Apache uses inode numbers, which does not
allow you to cluster while still keeping out-of-the-box caching), and many
high-end websites with big budgets never manage to implement it, instead
using expires headers, or spending money on a secondary server to deal with
application inefficiencies.

Couch manages Etags perfectly, so even though it's easy to add nodes to
scale couch, Etags work in the opposite direction, making it so you have
much less of a need to scale. If you have something that's completely
dynamic, and there's nothing in couch's architecture that tells you that
you must use idempotent show/list functions, then by all means, send a
no-cache header. But if you have something that is barely dynamic (like you
include a random hash with the output, just for the heck of it, or you want
to add a string saying 'response generated in 0.0013 seconds'), then you
probably want to rethink what you're trying to do, since you're sacrificing
a lot to gain so little.

couchdb...@googlecode.com

Mar 26, 2011, 10:27:57 AM3/26/11
to couchdb...@googlegroups.com

Comment #21 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

> Only difference is that the decorator is a *lot* easier.
Easier? Maybe. Implicit? For sure. You always have to keep this @main
decorator in mind. However, I see we've come back to the current, original
state - a single function which creates an inner context (;

> Do you really want to override Etag handling as done by couch.

I don't mean to override it; I was only answering how to work around the
Etag-based caching case.
About Etag:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19
A little below it you can find the Expires header.


couchdb...@googlecode.com

Apr 2, 2011, 2:01:16 PM4/2/11
to couchdb...@googlegroups.com

Comment #22 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

First, sorry for the "wall of text" that subscribers have received from us -
we probably should have created a separate topic on the group, but now it's
too late. And a special apology to whoever unstarred this issue - they won't
receive a notification about the new version of the view server that I would
like to attach for testing.



Short changelog:

! remove dependency from versioning decorator

! fix Mime class and show functions with provides/response_with methods -
they were just totally broken.

! fix Python exception encoding for CouchDB versions < 0.11.0

! correct filters for versions >= 0.11.1. There is no more userctx
argument, beware!

+ add missing ddoc cache (thanks for this discussion)

+ add support for add_lib command. CouchDB version >= 1.1.x required

+ add support for views command. Currently available for trunk version

+ add support for secobj for validate_doc_update commands. Requires CouchDB
>= 0.11.1. However, this argument is left optional because it isn't
mentioned in most examples.

+ add require function with the same behavior as the JavaScript one

+ add docstrings for most valuable methods with descriptions and examples

~ add support for 0.8.0 version - that was too easy (:
~ allow imports in design functions (see notes below)

~ _log function has been replaced by logging handler

~ correct error message for design function wrong definition

~ code cleanup, reorganisation, formatting fixes

~ more tests added and passed (47 total, 5 failed for 0.9.0 because I
couldn't reproduce valid behavior - does anyone have Windows binaries of 0.9.x?)



Something about imports:

http://mail.python.org/pipermail/python-list/2007-September/507450.html

I really didn't know about this behavior(: So any imports at the top level
are useless unless they are explicitly passed to the target function as
arguments or through a decorator. However, I've allowed them for
performance reasons.
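
One common way this behavior shows up, sketched under the assumption that
the source is run through `exec()` with separate globals and locals
dictionaries (the thread does not say exactly how the view server compiled
functions at this point). `compile_split` and `compile_shared` are
hypothetical helpers for the illustration:

```python
SOURCE = '''
import datetime

def fun(doc):
    return datetime.date(2011, 4, 2).isoformat()
'''


def compile_split(source):
    """exec() with separate dicts: the import is bound in locals only."""
    globals_, locals_ = {}, {}
    exec(source, globals_, locals_)
    # fun looks up `datetime` in globals_, where it was never stored.
    return locals_["fun"]


def compile_shared(source):
    """exec() with one namespace: top-level imports stay visible to fun."""
    namespace = {}
    exec(source, namespace)
    return namespace["fun"]


try:
    compile_split(SOURCE)({})
except NameError as exc:
    print("split namespaces:", exc)

print("shared namespace:", compile_shared(SOURCE)({}))
```

The first call fails with a NameError for `datetime`, while the second
returns the formatted date.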

A more detailed history of changes is available in the viewserver branch:
https://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver

The next questions that I have:
1. Should I split view.py into a view package (probably better to name it
the viewserver package) because the code is growing and Sphinx
autodocumentation support is missing?
2. Should I add preimported modules? I've settled on these: base64,
calendar, datetime, math, random, re, time - they are quite common, useful
and available in all supported versions.
3. Should I add eggs support via an --egg-cache parameter that specifies the
storage folder? Eggs could be stored as base64 encoded strings, not as
attachments, because attachments are not available from the view server.

Attachments:
view.py 54.7 KB
testutil.py 3.9 KB
view.py 75.2 KB

couchdb...@googlecode.com

May 9, 2011, 12:26:00 PM5/9/11
to couchdb...@googlegroups.com

Comment #23 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Ok, I'll answer those questions myself(:

> Should I split view.py into view package(probably better name it
> viewserver package) due to a code growing and missing support of sphinx
> autodocumentation?

Yes, I should. Working with 2K of deeply nested code with heavy
cross-function dependencies is not easy, and the missing Sphinx autodoc
feature makes me sad.

> Should I add preimported modules? I've stopped at next ones: base64,
> calendar, datetime, math, random, re, time - they are quite common,
> useful and available in all supported versions.

No, I shouldn't. I can't decide what a developer needs for their project,
even if those modules fit most tasks. Instead, I've created something like a
QueryServer constructor, which can be used to create your own QueryServer
with your own behavior without patching couchdb-python code. Pretty nice
solution, right?(; See the `construct_server` function in
`couchdb.server.__init__.py` for how the default query server is defined.

> Should I add eggs support via --egg-cache parameter where storage folder
> would be specified? Eggs could be stored as base64 encoded strings, not
> as attachments due to they are not available from view server.

Yes, I should. This feature provides too much to be ignored. However, it's
optional and must be enabled explicitly for security and compatibility
reasons. To store eggs within design documents, encode the egg as a base64
string. See the documentation for examples.
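
The encoding step itself could look roughly like this. It is only a sketch:
the design document layout (the `lib` key and the `greeting.egg` name) is
made up for the illustration and is not the layout the query server
documentation prescribes, and the "egg" here is a bare zip archive without
the metadata a real egg carries:

```python
import base64
import io
import zipfile


def build_egg(modules):
    """Build a tiny in-memory zip archive from {filename: source}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, source in modules.items():
            archive.writestr(name, source)
    return buffer.getvalue()


def encode_egg(egg_bytes):
    """Return the archive as an ASCII base64 string that JSON can carry."""
    return base64.b64encode(egg_bytes).decode("ascii")


def decode_egg(encoded):
    """Reverse of encode_egg: base64 string back to the raw archive bytes."""
    return base64.b64decode(encoded)


egg = build_egg({"greeting.py": "def hello():\n    return 'hi'\n"})

# Hypothetical design document layout, for illustration only.
ddoc = {
    "_id": "_design/example",
    "language": "python",
    "lib": {"greeting.egg": encode_egg(egg)},
}

restored = zipfile.ZipFile(io.BytesIO(decode_egg(ddoc["lib"]["greeting.egg"])))
print(restored.namelist())  # ['greeting.py']
```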

So, the query server was completely refactored from a single module into a
full package, and here are the changes in the new version:
+ add support for eggs as modules.
+ add option to control GET requests to update functions.
+ add query server constructor: define your own context, error handlers,
commands (if you have your own CouchDB fork or live on nightly builds) and
more.
+ add query server documentation article.
+ add own logging channel for each part of query server.
~ update "Writing views in Python" documentation article.
~ fix doc strings to make them more sphinx friendly.
~ fix for require circular references (COUCHDB-1075)
- remove debug decorator, because now you can implement it on your own if
you'd like.
Tested on Python 2.4-2.7 and PyPy 1.5. All changes are still available at
http://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver

And that's all, I think(: Could someone review the documentation articles,
given my poor English, and the code, to decide whether anything needs to
change? Any ideas? Criticism? Thanks(:

Attachments:
query-server.tar.gz 34.2 KB

couchdb...@googlecode.com

May 22, 2011, 12:39:03 PM5/22/11
to couchdb...@googlegroups.com

Comment #24 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Tested on Android 2.3.4 (Google Nexus One) using the Py4A application. To
share my happiness, do the following:
1. copy the couchdb package folder to
/sdcard/com.googlecode.pythonforandroid/extras/python (query server imports
are absolute and use the couchdb package as root)
2. create a file on the sdcard, for example /sdcard/couchpy, and put the
following code into it:
PYTHONPATH=/data/data/com.googlecode.pythonforandroid/files/python/lib/python2.6/lib-dynload
PYTHONPATH=${PYTHONPATH}:/mnt/sdcard/com.googlecode.pythonforandroid/extras/python
export PYTHONPATH
export PYTHONHOME=/data/data/com.googlecode.pythonforandroid/files/python
export LD_LIBRARY_PATH=/data/data/com.googlecode.pythonforandroid/files/python/lib
/data/data/com.googlecode.pythonforandroid/files/python/bin/python \
    /mnt/sdcard/com.googlecode.pythonforandroid/extras/python/couchdb/view.py \
    --couchdb-version=1.0.0
3. add the following line to the query_servers section of the CouchDB
configuration:
python = sh -e /sdcard/couchpy
4. ...
5. now you can use pythonic design documents on Android(:

couchdb...@googlecode.com

Sep 13, 2011, 6:45:48 AM9/13/11
to couchdb...@googlegroups.com

Comment #25 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

It's a good thing to review your own code after some time has passed. This
update includes a lot of fixes and even some new features:

global:
- removed global state and cross module references (WOO-HOO!)
- rewritten QueryServer api
- added SimpleQueryServer as high level abstraction on top of QS internals
- added MockQueryServer to help write unittests
- fix docstring and typos.
- query server logs are more useful now in debug mode
- update documentation with android paragraph and how to customize query
server
- place TODO references to actual CouchDB issues: COUCHDB-729, COUCHDB-282,
COUCHDB-1261, COUCHDB-898. I could fix them locally, but this will make
more differences between original JS server and Python one.
- add more than 170 test cases

compiler:
- fix crash on compilation of source code with Windows newlines
- fix double crash if function compilation failed
- fix crash on malformed base64 encoded egg
- fix crash on egg cache usage
- code refactoring

stream:
- abstraction from JSON module exception type on decode/encode operations

render:
- fix COUCHDB-1272
- code refactoring

validate:
- prevent query server crash in validate_doc_update on Python exceptions

views:
- reduce_output_overflow error now will be raised properly
- small refactoring
- document seal now works better with copy.deepcopy()

design functions:
- send(), start(), provides(), register_type() available only for show and
list functions
- get_row() available only for list functions
- log() function is not proxy of logging.info anymore

All tests passed for:
- Python 2.4 to 2.7
- PyPy 1.5 and 1.6
- Android 2.3.4 with Python for Android version 5 against CouchDB-1.0,
android-0.1 and MobileFuton 1.7

Please, could someone review the docstrings and Sphinx docs? I'm sure the
documentation text is far from a good state /:

Attachments:
queryserver.zip 47.7 KB

couchdb...@googlecode.com

Feb 12, 2012, 6:49:51 AM2/12/12
to couchdb...@googlegroups.com

Comment #26 on issue 146 by djc.ochtman: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

I took a look at this, but I'm having some trouble getting the tests
running. In particular, this bit doesn't seem to work, independent of the
view server used:

djc@enrai couchdb-python $ python
Python 2.7.2 (default, Oct 24 2011, 10:16:20)
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> pipe = subprocess.Popen(['/usr/bin/python2.7', 'couchdb/view.py'],
...     shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
...     stderr=subprocess.STDOUT)
>>> pipe.stdin.write('["reset"]\n')
>>> pipe.stdout.readline()
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt


Meanwhile, this works just fine from the command-line:

djc@enrai couchdb-python $ python couchdb/view.py
["reset"]
true

I looked at the queryserver.zip from comment 25 and the files from comment
6. It seems to me that the former is too large and complex to take into
couchdb-python at this time. The stuff from comment 6 is much more simple,
but I couldn't run the test suite due to the above issue.

Finally, the code in comment 6 does a bunch of stuff to stay compatible
with all of 0.9, 0.10 and 0.11+. I would propose that any new view server
code we take for our next release be limited to supporting 0.11+; that's
already quite old at this point.

couchdb...@googlecode.com

Feb 12, 2012, 7:42:01 AM2/12/12
to couchdb...@googlegroups.com

Comment #27 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

Hi, Dirkjan!

Thanks for the first review(: Actually, I never run it as a subprocess, but
couchdb/tests/testutil.py::QueryServer shows that it can be run as one;
just remove shell=True from Popen.
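
For context: on POSIX, when Popen gets a list together with shell=True, only
the first item is used as the shell command and the remaining items become
arguments to the shell itself, so the interpreter is started without the
script path, which would explain the hang. A sketch of the same exchange
without shell=True follows. The child process here is a stand-in that
answers every line with `true` (an assumption made so the example is
self-contained); with the real server the argument list would point at
couchdb/view.py instead:

```python
import json
import subprocess
import sys

# Stand-in for the view server: replies "true" to every input line.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('true\\n')\n"
    "    sys.stdout.flush()\n"
)


def ask(command):
    """Send one JSON command line to the child and return its decoded reply."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],  # a list of arguments, no shell=True
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    ) as pipe:
        pipe.stdin.write(json.dumps(command) + "\n")
        pipe.stdin.flush()  # make sure the child sees the line before we read
        return json.loads(pipe.stdout.readline())


print(ask(["reset"]))  # True
```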

There is a huge difference between comment 6 and comment 25. It's not in
code size; it's in bugs, code complexity, documentation, tests, features,
logging and how easily you can extend it without breaking things.
Support for old CouchDB releases is only a small part of it, just a few
functions that could easily be removed. For example, I easily added
multiprocessing support to qs#25 for map/reduce functions just by
decorating the server/views.py functions, without touching the source code.

Also, you should look at comment 22 rather than comment 6. That was the
last version of the all-in-one-file query server, but it is still buggy by
design.

The main goal of qs#25 and all the code splitting was to simplify future
support, make it easy to extend and help with unit testing couchapps,
because now you can run it without a subprocess. And that goal has been
reached.

You may read the changelogs in this thread and in my clone's viewserver
branch; they are quite complete.

I admit it's a rather big patch at about 250KB of code (removing docstrings
could cut it in half, I'm sure), but I'd like to take on its support,
because I use it for everyday tasks, I know every line of it, I'd like to
help the couchdb-python project and I do not want to create
yet-another-python-queryserver-project. People know about couchdb-python,
know about its view server and expect it to work well. Why not satisfy
them?

couchdb...@googlecode.com

Aug 3, 2012, 12:54:14 PM8/3/12
to couchdb...@googlegroups.com

Comment #28 on issue 146 by kxepal: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

An updated Python query server is attached. After almost a year of
production use, some small problems were fixed:
- Occasional crash on chunk encoding for _list functions.
- View lib cleanup on reset command
- Handle single named MIME type params e.g. application/pdf;base64
- Reduced useless logging output to improve its readability.
- Fix COUCHDB-1330.
- Fix crash on malformed MIME type.
- Couchapp modules no longer need to be wrapped in some scope: just write
regular Python code for them. For example, instead of:
{{{
import datetime

def foo(datetime=datetime):
    return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')

exports['foo'] = foo
}}}
Now you may remove any proxy hacks to get simple and expected code
behavior:
{{{
import datetime

def foo():
    return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')

exports['foo'] = foo
}}}

This change doesn't affect other ddoc functions: shows, lists, views etc.
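
A minimal sketch of why the plain form can work, assuming the module source
is executed in a single namespace that is pre-seeded with an `exports`
dict. `load_module` is a hypothetical helper for this illustration and not
the query server's actual require implementation:

```python
def load_module(source):
    """Run module source and return whatever it stored in `exports`."""
    exports = {}
    # One shared namespace: top-level imports remain visible inside foo().
    namespace = {"exports": exports}
    exec(source, namespace)
    return exports


MODULE = '''
import datetime

def foo():
    return datetime.datetime(2012, 8, 3, 12, 54, 14).isoformat('T')

exports['foo'] = foo
'''

lib = load_module(MODULE)
print(lib["foo"]())  # 2012-08-03T12:54:14
```

A fixed timestamp is used instead of utcnow() only to make the output
reproducible.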

Attachments:
queryserver.zip 51.1 KB

couchdb...@googlecode.com

Sep 21, 2012, 4:35:25 AM9/21/12
to couchdb...@googlegroups.com
Updates:
Owner: kxepal

Comment #29 on issue 146 by djc.ochtman: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

(No comment was entered for this change.)

couchdb...@googlecode.com

Jul 15, 2014, 3:29:03 AM7/15/14
to couchdb...@googlegroups.com

Comment #30 on issue 146 by djc.ochtman: Up-to-dated view server
http://code.google.com/p/couchdb-python/issues/detail?id=146

This issue has been migrated to GitHub. Please continue discussion here:

https://github.com/djc/couchdb-python/issues/146
