Did you write this yourself? Is it a port of the JavaScript view server, or did you find some documentation on the view protocol?
I definitely want an updated view server, but I prefer that we only add something for which we have good tests, so that we can prevent it from going stale again. Would you be in a position to write tests for this? Some design notes on the protocol would likely also be helpful.
If we decide to take this, I also have some code/style nits.
It's the JavaScript view server, ported by me. I followed a single rule: make the view server as compatible as possible with the JS one, because that one is always up to date and was released first. The closer the Python view server stays to it, the easier it will be to support.
Shame on me: for some reason I thought the view server had to be tested via the couchdb-python API (just have a look at tests/view.py). I'll have the tests ready by tomorrow.
I found a few small problems while running the official view server tests, but everything passes now.
I'll add better exception handling and crash tests later. I really don't like this forest of try..except, but I couldn't come up with anything better yet and need some time to think about it. So this is just a checkpoint(:
In addition to the JavaScript view server features, I've implemented two behaviors:
- sealed documents: changes made to a document in one map function have no effect on the other map functions within a single view.
- any Python exception will crash the view server, whereas the JavaScript view server crashes only on fatal errors. I don't know which Python exceptions would count as fatal.
Everything else should be the same.
OK, here is the updated view server. Major changes:
- Support for any CouchDB server since version 0.9. By default the view server runs in compatibility mode for the latest CouchDB version; run it with the --couchdb-version option, e.g. view.py --couchdb-version=0.10, to make it work with a 0.10 CouchDB server.
- Tests are included for each supported version; all of them were ported from the JavaScript view server.
- An assertion error within a validate_doc_update function is not treated as fatal like other Python exceptions; it is wrapped as a Forbidden error.
- The "Reduce output must shrink more rapidly" error may now occur.
- More verbose debug logging.
I also had to add a function-versioning decorator to split behavior per CouchDB version. I think it will be useful in the future for keeping legacy support without serious code rewriting.
Otherwise, it's ready for 1.0.2, since the main change in the JS view server was sealing documents for map functions, and that feature is already done.
I will port sofa and tapirwiki to this view server within the next two weeks. That will be a good challenge for it, and more fixes may follow if I've missed something.
I've found one thing missing: a require function, to allow modular applications within a design document. But I have some questions about it:
- Should it work as in the JavaScript view server, wrapping some arbitrary code and returning the exported values? Or would it be better to implement a Python-like import?
- Should it support eggs? I think it should, but I have no idea how to import eggs inline without saving them to disk. This could be a problem for application hosters.
Any other behavior suggested?
What I'm doing to handle some efficiency issues with import:
    def fun():
        import datetime
        ... other imports ...
        def fun(doc):
            ...
            delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
            ...
        return fun
    fun = fun()
What really ought to happen is that the view server should go through each variable in the exec's locals and check the __module__. If __module__ exists, but is None, and the variable points to a callable, then use that as the map/reduce func, and error out if more than one is found. This would be backwards compatible with existing view functions, but would make it so closures are not necessary.
I think the eggs deployment issue has been tackled many times before. Importing eggs inline would obliterate responsiveness. Does couch time you out if you take too long? I'd be concerned that it would/should. If an application hoster supports python, then sooner or later they'd need to come up with a solution to handle 3rd party software since, frankly, the power of using python as a view server doesn't just lie in "it looks nice" and "it has yield".
To meet couch's same-code-same-result requirements (no side effects), we would maybe have to mark imports with python and module version strings, and push that back into couch. For example, 'import mythirdpartyegg' would then append '#mythirdpartyegg py cpython-2.6.5 mod 3.11r7112' to the end of the eval string, one per unique detected module. Any time you do any module upgrades, you can just delete the version-marking comments out of the view func manually, and couch will regenerate it.
However, most modules *tend* to be stable enough API-wise that this isn't a problem. If there were any behavior altering bugs/changes to the code, an administrator could achieve this manually.
A few questions, if I may.
1) What is the reason for such a wrapper, as opposed to:
    def fun(doc):
        ...
        delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
        ...
    import datetime
It's not very Pythonic to place imports below, but datetime would be imported only once. However, this style could cause another problem: views must not have any state or any dependence on a source that could change later.
2) A hoster could provide 3rd party modules, but it couldn't provide all versions of each template engine, for example. Maybe you need trunk jinja2 with your own patches, who knows? So the idea of creating a fully portable Python couchapp would fail.
3) Are preprocessor statements really a good idea? I've seen them in couchapp, but they were used only for declarations, not within the design document or the view server.
I still need to finish some details, so the attachments with the new version of the view server will come later. Sorry, Dirkjan, it seems you'll have to review it once again, but I'll include documentation for each vital function and more tests to make the process easier(:
You *could* put imports after the inner func that needs them, and python's scoping rules would resolve variable lookups, but I agree, it's not Pythonic. Closures themselves are not very Pythonic, either. Anyways, why not put imports first, as per my suggestion?
The whole reason for using a closure is to avoid the performance penalty of repeating the imports for each document. If a function needed k imports and there are n documents in a couch database, then, compared to the closure technique shown above, there'd be n*k redundant imports taking place, which is very slow (python doesn't re-import the module, but there is overhead involved, which can be significant).
I disagree with you on the severity of Couch's "no side-effects" requirement for view/show/list functions. It's a matter of practicality, not a matter of theory. Yes, Couch says that the same document passed to the same code must produce the same output, no matter how many times it's executed. If the document doesn't change, then the result doesn't change.
However, this mandate is only for data correctness. If the module you're importing in a view function gets upgraded, and its behavior changes, but your view function stays the same (so couch doesn't regenerate the view), then all that happens is that your view will be incorrect. Couch won't break (couch won't know, mind, or even care).
Also, using module imports doesn't count as a "side effect" at all. Aside from the random module, almost all modules (including 3rd party ones) are stateless in their behavior. For example, it'd be fine to describe a shape as a list of points in a couch document, and then use PIL to draw that to a png image in a show function, or to use couch to store server access logs, and then use pychart from a list function to generate an svg rendered line graph of server traffic.
Furthermore, the above "same code, same doc, same result" requirement does not apply across all time. For example, I could define a view function that imported some module, and then change the behavior of the module. All I'd have to do to fix the consistency of the view would be to add a single space to the end of the view function, save the design document to couch, and then remove that space, and save the design document again. The code is *exactly* the same as before, yet we side-stepped the upgrade-changes-behavior issue, and caused couch to regenerate its views to reflect the new behavior. And this process can be automated (alternately, you could delete the views from the design doc, do a view cleanup, and then reupload the original design document, which would require only one view regeneration instead of two). As long as you regenerate your views when needed, the "could be changed later" issue isn't a problem at all.
In general, hosting vendors are not ever going to support all versions of all potential packages. They're either going to support only a handful of popular modules (probably Django templating) and force long release cycles, or they'll provide you with a few megabytes of space to upload your own modules in your own private module path, or they'll not allow the use of any kind of 3rd party modules (in which case you might as well use Javascript).
First, before I forget: thank you for the detailed reply(:
So, about imports: allowing them at the top of the design function, as the PEP recommends, looks fine to me too, and it's much more intuitive. However, only map functions are cached: reduces/shows/lists/updates and the others are recompiled on each call, so these import optimization tricks are not as useful as they could be. I suppose it would also be better to extend the set of preimported packages with the most popular and useful ones, which would probably always be imported anyway, such as time, datetime, re, hashlib, math, random, itertools and others. But that would be a very implicit feature for anyone who hasn't read the docs, and I shouldn't be the only one deciding what goes on that list.
As far as I know, "no side effects" is a requirement for views only, because the index is built from view results only, while shows/lists are just a way to present data in a nicer form. There is one more reason to keep views as stable and free of side effects as possible: if you have tens of millions of documents, the last thing you want to do is rebuild the view index, because that would take hours. Yes, the trick with a secondary server and swapping the view index is a nice idea, but you still lose those hours and your service will be down for some time. However, the 1.1.x branch added a feature that lets map functions require modules stored under view/lib.
Unfortunately, for CouchDB view servers a hoster wouldn't provide space for your own modules, because that would require some additional interface and a monitor to reload the module set at runtime without a forced restart of the view server, and... this solution kills portable Python couchapps. JavaScript couchapps are awesome because you just type "couchapp push" and that's all: it works!
Can you link some documentation on that 1.1.x feature? That's something I'd be *very* interested in learning about.
Show and list functions are supposed to be side effect free too. That way, they can be cached by couch (though I'm unsure if couch itself actually does that). I'm pretty sure couch does proper Etag handling of the show/list results, so if you're expecting that you can generate a new result each time someone accesses a doc via show, or a set of documents via list, know that couch will *tell* the client/browser to use what it already has if none of the pertinent documents in the database have changed.
Hah, you're right about the map caching. I forgot that some of the others don't cache! Hmmmmm. We could do our *own* caching. I'm not sure if that's considered bad behavior or not, but I don't see how it makes a difference, and really, I think the fact that they send the reduce function *every time* a reduce computation is needed is a bad choice in protocol design -- it's simpler, yes, but they could have just added 'load' and 'unload' commands for functions, so that you can do one-time compilation.
What we can do is cache the reduce/list/show functions they give us and run the computation. Next time they pass us a function, we do a string compare on the new code for that function to the string of code we originally received for that named list/show/reduce function. If it's the same as before, then our compilation step becomes a no-op, and we just use what we already had. If the function is different, then we assume that the design doc has been updated, and we recompile. This way, we can do things like use closures for those performance gains. Couch's own rules and reasons for side-effect free functions are what gives us the right to do this kind of caching.
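The string-compare caching described above can be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the view server's actual code: `FunctionCache`, `compile_function` and the convention that the source defines a function called `fun` are all assumptions made for the example.

```python
class FunctionCache:
    """Keep compiled design functions and recompile only when the source changes."""

    def __init__(self, compile_func):
        self._compile = compile_func
        self._entries = {}      # name -> (source, compiled function)
        self.compilations = 0   # counter, only here to make the effect visible

    def get(self, name, source):
        cached = self._entries.get(name)
        if cached is not None and cached[0] == source:
            # Same code as last time: compilation becomes a no-op.
            return cached[1]
        # New or changed code: assume the design doc was updated and recompile.
        func = self._compile(source)
        self.compilations += 1
        self._entries[name] = (source, func)
        return func

    def reset(self):
        """Drop everything, e.g. when the server asks for a reset."""
        self._entries.clear()


def compile_function(source):
    # Hypothetical compile step: the source is assumed to define `fun`.
    namespace = {}
    exec(source, namespace)
    return namespace["fun"]
```

Calling `cache.get("reduce:by_day", source)` twice with identical source returns the same function object, while any change to the string (even a trailing space) triggers one recompilation.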
Moving on... well random probably shouldn't be imported (or at least not used by any couch stuff), since by its very nature, it'll produce different results every time.
As for downtime, in many cases, couch can service requests while an index is being rebuilt. Also, you can easily replicate to another database/server/whatever (secondary server), rebuild it there, and temporarily make that the primary database you're serving from while you rebuild the index on your real primary. That sounds complicated, but as we all know, in couch that takes less thought than it took me to write about it just now. Also, that's all assuming that couch's index hot-rebuild doesn't cover your use case. Hot-swapping couch databases and even couch server instances, or adding redundancies and failovers is a fact of life with couch -- sure, there are plenty of us running just one couch instance for a given application, but it's so painless to temporarily add another copy ad-hoc, and tear it back down when you don't need it anymore. Unlike with other systems, you don't even really need to plan ahead when you do it.
You've got a good point. I'm not sure what the resolution would be. Clearly python wins over javascript for couchdb not because of its pretty and concise syntax, since view/list/show functions are about the same size in either language if you aren't allowed to import anything. Python would win out because of its standard library (which is API-stable enough for couch), and because of its 3rd party modules.
In any case, behavior-changing module upgrades could only be handled by rebuilding the index. Even though the code that couch can see (the code stored in the design doc) hasn't changed, the code it links to has. So you simply have to treat it in same way as if you changed a line of code in the map func itself, and there's no way around that.
Just like with retooling your own view/list/show function code, you have to strike a balance between the time you need to spend rebuilding an index, and the benefits you get from changing the code. After all, you can always choose to *not* upgrade your python or module to a new version, and just because there is a new version, doesn't mean you need it.
Caching shows/lists/other ddoc subcommands may be possible, but this cache would be reset on each design document update. Reduce functions couldn't be cached without comparing source code. However, this trick wouldn't work with 0.10.0. There is a ["reset"] command that clears the map function cache and drops all your configuration: mime types, reduce_limit, etc. However, again, it's system-wide and not available from the outside. I need some time for experiments to understand all the benefits and flaws of such caching. If it hasn't been implemented for the JavaScript view server, there must be some reason, right? The first one I see: if you update a 3rd party package on the system, your cached bytecode wouldn't be updated (the design doc hasn't changed!) and you'll have a lot of fun in that case(: It could be recompiled after such a failure, but tests are still needed.
I have also found a case that breaks the idea of imports at the top of the function:
The resulting namespace would always be: {'datetime': <module 'datetime' from '/usr/lib/python2.4/lib-dynload/datetime.so'>, 'groupby': <type 'itertools.groupby'>, 'test': <function test at 0x7f3395044938>} So the function that the iterator finds will be groupby. That's the wrong one; it does return two-value tuples, but it will produce a very strange error:
>>> TypeError: <generator object at 0x7fcebfa58908> is not JSON serializable
This totally crashes the view server. You'll have to spend a lot of time with the --debug option enabled to understand why, and currently even that would not help in such a case without additional logging. And if generators were JSON serializable, you'd even get a wrong result without any warning. Still not very explicit or relaxing behavior ): Binding by name? Not an option.
The random module certainly shouldn't be used for views, but it could be useful in lists to randomize output.
The idea of swapping temp/production databases is nice too, if a temporary couch instance can serve as the production one for a while... but I suppose this interesting discussion is not for this issue(;
On the rest I agree with you: we have to find the best point of balance. Heavy optimizations and tricks are part of a high-load environment. One could also use pypy instead, a faster json module, etc. Our task is to create tools that work, and work well, but also leave some room for heavy optimization with some trade-offs.
Yeah, I'll have to look into that 'require' thing. On first glance, it looks like couchjs is doing a request to the design doc for the dependencies.
Right, as said in a previous post, in order for the above-the-function option to work, without the use of a closure, you (the view server function compiler) would have to check every key in the locals dictionary that exec generates to see if it has a __module__ attribute, and if that attribute has the value of None. The only backwards compatible requirement we need is that there is only one callable object (usually a function) that has __module__ set to None (since non-imported local functions/classes will have __module__ of None).
>>> code = """
... from datetime import datetime
...
... class OldStyleClass:
...     pass
...
... class NewStyleClass(object):
...     pass
...
... y = 17
...
... def test():
...     return 5
... """
{'y': 17, 'test': <function test at 0x7f7e25f10320>, 'NewStyleClass': <class 'NewStyleClass'>, 'OldStyleClass': <class __builtin__.OldStyleClass at 0x7f7e25f196b0>, 'datetime': <type 'datetime.datetime'>}
>>> for key in locals:
...     if callable(locals[key]):
...         if locals[key].__module__:
...             print key, "is *not* a candidate, since it's imported from", locals[key].__module__
...         else:
...             print key, "is a candidate (hopefully the only one, or we'll have to error out)"
...     else:
...         print key, "isn't even callable, so we don't care about it"
...
y isn't even callable, so we don't care about it
test is a candidate (hopefully the only one, or we'll have to error out)
NewStyleClass is *not* a candidate, since it's imported from __builtin__
OldStyleClass is *not* a candidate, since it's imported from __builtin__
datetime is *not* a candidate, since it's imported from datetime
Huh, so apparently class definitions inside of an exec will be associated with the __builtin__ module, so we'd have to check for that, as well. But in general, it's easy to do a backwards-compatible check for non-imported callables.
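One way to sidestep the None vs. __builtin__ ambiguity is to give the exec namespace its own `__name__`, so that everything defined in the design function source reports that name as its `__module__`. The sketch below is a variation on the check discussed above, not something the view server is known to do; it uses Python 3 syntax, and `SENTINEL_MODULE` and `find_entry_point` are hypothetical names.

```python
SENTINEL_MODULE = "__design_function__"  # hypothetical marker, any unique string works


def find_entry_point(source):
    """Exec the source and return the single callable defined in it.

    Imported callables keep the __module__ of the module they came from,
    while functions and classes defined in the source pick up the
    sentinel __name__ of the namespace they were executed in.
    """
    namespace = {"__name__": SENTINEL_MODULE}
    exec(source, namespace)
    candidates = [
        name for name, obj in namespace.items()
        if callable(obj) and getattr(obj, "__module__", None) == SENTINEL_MODULE
    ]
    if len(candidates) != 1:
        # Zero or several local callables: error out instead of guessing.
        raise ValueError(
            "expected exactly one locally defined callable, found %d: %s"
            % (len(candidates), sorted(candidates))
        )
    return namespace[candidates[0]]
```

With this check, `from itertools import groupby` at the top of a view no longer competes with the map function, because `groupby.__module__` is `itertools` rather than the sentinel.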
Oh, perhaps the answer to the module distribution problem is to put a custom import mechanism that checks for those modules as attachments to the view/list/show functions design doc before checking the normal on-disk module path. Couch's _changes API would need to be monitored for design doc changes by couchpy too, so that couchpy can know when it needs to reload modules. If this were achievable, you could bundle your modules in the design doc itself (regular zip files and eggs could be supported).
The best way to get a good system in place for this is not to work around Couch's API, but instead to work directly with the Apache Couch community to support everything we're talking about, since none of it violates the side-effect-free requirements of couch if dependency checking can be moved into couch. This wouldn't mean that couch would have to understand any programming language, but would be able to handle changes to certain special design doc keys. For example, couch could *hypothetically* do:
Once again, this doesn't exist in couch, but if it were implemented, couch would only need to know how to interpret the "depends" key. If a string in "lib" changes (couch doesn't need to know or care what the contents of that string mean), then everything that depends on it needs to get updated, just like it reindexes views when the view function strings are changed. In the case of list or shows, this would mean setting a new Etag that invalidates client-cached versions of the previous show/list results. Couch would also need to send the dependency to the view server when it's needed, in the form of some kind of addlib command. couchpy itself could ignore the # version stubs, since those would just be there to provide an easy upgrade path for libraries. Or it could compare the version shown there to the version of the module it imports, and update the design doc if a new version is found on the module path. Dependencies starting with "_attachments" could be handled specially by couch.
By the way, you're right that use of random is side-effect-free. Just keep in mind that couch's Etag/caching semantics will make it so that, if an HTTP client does proper caching, it'll do a conditional GET request for the show/list the next time you ask for it, and unless one of the documents the list/show depends on has changed, couch will tell that client that the resource has not been updated.
Therefore, your random lists will only look random once per depended-upon document update. This is on a client-by-client basis, though. If you have your own caching proxy in the middle, or something like couchbase starts having its own response cache, then everybody will see the same random results on each request, until the next time one of the pertinent documents is updated. This is another "good thing" that couch provides, because even though it might hurt you 5% of the time, it really helps with scalability and responsiveness 95% of the time.
> Yeah, I'll have to look into that 'require' thing. On first glance, it looks like couchjs is doing a request to the design doc for the dependencies.
It doesn't, but it has access to it via a closure. The design doc is simply passed to the compile function as the second argument.
Your example could pass and work as "expected", but it's just one case. There are others that wouldn't work as "expected". What's needed is a stable entry point. Maybe some kind of decorator would be a solution, such as:
But would it be good, explicit and clean? It looks the same as a predefined function with a special name. I suppose there isn't much need for a complex code block. One node, one function. Libs will cover the rest with exported statements, as they were designed to, plus eggs as libs to store more complex packages.
> Couch's _changes API would need to be monitored for design doc changes by couchpy too, so that couchpy can know when it needs to reload modules. If this were achievable, you could bundle your modules in the design doc itself (regular zip files and eggs could be supported).
That's not needed: if the design document has changed, a command is passed to refresh it in the local cache.
Also note that attachments are a separate entity that is just bound to the document but isn't passed along with it. So to read an attachment from a show/list you would have to make a plain HTTP request. Madness!(:
> For example, couch could *hypothetically* do: ...
Too complex a solution: instead of just creating a function, you have to create it and also set up all the required dependencies to make it work correctly. The require function currently does the same thing: just invoke it and extract the exported statement you need.
> By the way, you're right that use of random is side-effect-free. Just keep in mind that couch's Etag/caching semantics will make it so that, if an HTTP client does proper caching, it'll do a conditional GET request for the show/list the next time you ask for it, and unless one of the documents the list/show depends on has changed, couch will tell that client that the resource has not been updated.
In a show/list function you can set your own headers and disable caching via the Expires header. It has higher priority than the Etag one. Actually, an Etag only __may__ be used for caching purposes.
Good idea! As you indicated, if you have a single callable, then it'll work as expected. If you have more than one callable, you must designate it with @main.
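The rule stated here (a single callable works as is, several callables need one marked with @main) might look roughly like this. It is a sketch under assumptions, in Python 3 syntax: only the decorator name `main` comes from the discussion, while the `_is_main` marker attribute, the `select_entry_point` helper and the namespace name are made up for the example.

```python
LOCAL_MODULE = "__design_function__"  # hypothetical __name__ for exec'd code


def main(func):
    """Mark a callable as the entry point of a design function."""
    func._is_main = True  # hypothetical marker attribute
    return func


def select_entry_point(source):
    # `main` is injected so the source can use @main without importing it.
    namespace = {"__name__": LOCAL_MODULE, "main": main}
    exec(source, namespace)
    # Keep only callables defined in the source itself, not imported ones.
    local = [
        obj for obj in namespace.values()
        if callable(obj) and getattr(obj, "__module__", None) == LOCAL_MODULE
    ]
    marked = [obj for obj in local if getattr(obj, "_is_main", False)]
    if len(marked) == 1:
        return marked[0]
    if not marked and len(local) == 1:
        # Backwards compatible case: one plain function, no decorator needed.
        return local[0]
    raise ValueError("cannot pick an entry point: mark exactly one callable with @main")
```

A design function with a helper would then read `def helper(x): ...` followed by `@main` on the real map function, and existing single-function views keep working unchanged.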
Well, the main point is that your idea provides a mechanism for the programmer to be as expressive and succinct as they need.
Only difference is that the decorator is a *lot* easier.
The point about my dependency solution is that it lets couch handle recursive dependencies with respect to index rebuilding and Etag handling, so that couch can make sure that all data is consistent. I agree, it's complex, and there's bound to be a better way (I don't care for my solution either -- it's an initial suggestion). I just know that if we have to manage recursive (or even non-recursive) dependencies ourselves, then it won't work. Sooner or later, we'd end up with a badly inconsistent database, with bugs that are really hard to notice.
Dependencies are necessary because in the typical couch application (at least all of the ones I've done), there is a lot of duplicate code, and that duplicate code makes the application much much harder to maintain.
Do you really want to override Etag handling as done by couch? Put it this way, Etag is the absolute best caching mechanism available to you, but it's also *very* complex to get it right. Enterprise-grade server software often fails to handle it usefully (Apache uses inode numbers, which does not allow you to cluster while still keeping out-of-the-box caching), and many high-end websites with big budgets never manage to implement it, instead using expires headers, or spending money on a secondary server to deal with application inefficiencies.
Couch manages Etags perfectly, so even though it's easy to add nodes to scale couch, Etags work in the opposite direction, making it so you have much less of a need to scale. If you have something that's completely dynamic, and there's nothing in couch's architecture that tells you that you must use idempotent show/list functions, then by all means, send a no-cache header. But if you have something that is barely dynamic (like you include a random hash with the output, just for the heck of it, or you want to add a string saying 'response generated in 0.0013 seconds'), then you probably want to rethink what you're trying to do, since you're sacrificing a lot to gain so little.
> Only difference is that the decorator is a *lot* easier.
Easier? Maybe. Implicit? For sure. You always have to keep this @main decorator in mind. However, I see we've come back to the current, original state: a single function which creates an inner context (;
> Do you really want to override Etag handling as done by couch.
First, sorry for the "wall of text" that subscribers have received from
us. We probably should have created a separate topic on the groups, but
now it's too late. And special apologies to anyone who unstarred this
issue: they won't receive a notification about the new version of the view
server that I would like to attach for testing.
Short changelog:
! remove dependency on versioning decorator
! fix Mime class and show functions with provides/response_with methods -
they were just totally broken.
! fix Python exception encoding for CouchDB versions < 0.11.0
! correct filters for versions >= 0.11.1. There is no more userctx
argument, beware!
+ add missing ddoc cache (thanks to this discussion)
+ add support for add_lib command. CouchDB version >= 1.1.x required
+ add support for views command. Currently available for trunk version
+ add support for secobj for validate_doc_update commands. Requires CouchDB
>= 0.11.1. However, this argument is left optional because it isn't
mentioned in most examples.
+ add require function with the same behavior as the JavaScript one
+ add docstrings for the most important methods with descriptions and examples
~ add support for 0.8.0 version - that was too easy (:
~ allow imports in design functions (see notes below)
~ _log function has been replaced by logging handler
~ correct error message for wrongly defined design functions
~ code cleanup, reorganisation, formatting fixes
~ more tests added and passed (47 total, 5 failed for 0.9.0 because I
couldn't reproduce valid behavior - does anyone have Windows binaries of 0.9.x?)
I really hadn't known about this behavior(: So any imports at the top level
are useless unless they are explicitly passed to the target function as
arguments or through a decorator. However, I've allowed their use for
performance reasons.
Next questions that I have:
1. Should I split view.py into a view package (probably better to name it
viewserver) because the code is growing and sphinx autodocumentation
support is missing?
2. Should I add preimported modules? I've settled on the following: base64,
calendar, datetime, math, random, re, time - they are quite common, useful
and available in all supported versions.
3. Should I add egg support via an --egg-cache parameter that specifies the
storage folder? Eggs could be stored as base64 encoded strings, not as
attachments, since attachments are not available from the view server.
> Should I split view.py into view package (probably better name it viewserver package) due to a code growing and missing support of sphinx autodocumentation?
Yes, I should. Working with 2K of deeply nested code with massive cross-function dependencies is not easy, and the missing sphinx autodoc support makes me sad.
> Should I add preimported modules? I've stopped at next ones: base64, calendar, datetime, math, random, re, time - they are quite common, useful and available in all supported versions.
No, I shouldn't. I can't decide what a developer needs for a given project, even if those modules fit most tasks. Instead, I've created something like a QueryServer constructor, which can be used to build your own QueryServer with your own behavior without patching couchdb-python code. Pretty nice solution, right?(; See the `construct_server` function in `couchdb.server.__init__.py` for how the default query server is defined.
> Should I add eggs support via --egg-cache parameter where storage folder would be specified? Eggs could be stored as base64 encoded strings, not as attachments due to they are not available from view server.
Yes, I should. This feature offers too much to leave out. However, it's optional and must be enabled explicitly for security and compatibility reasons. To store eggs within design documents, you should encode the egg as a base64 string. See the documentation for examples.
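To make the base64 egg idea a bit more concrete, here is a minimal sketch of decoding such a string into a cache folder and making it importable. It is an assumption-laden illustration in Python 3 syntax, not the query server's implementation: `install_egg` and its parameters are hypothetical, and only the base64 encoding and the existence of an egg cache folder come from the discussion.

```python
import base64
import importlib
import os
import sys


def install_egg(name, encoded, egg_cache):
    """Decode a base64 encoded egg into egg_cache and put it on sys.path.

    An egg is a zip archive, so once the file is on sys.path Python's
    zip import support can load modules from it directly.
    """
    os.makedirs(egg_cache, exist_ok=True)
    path = os.path.join(egg_cache, name + ".egg")
    with open(path, "wb") as handle:
        handle.write(base64.b64decode(encoded))
    if path not in sys.path:
        sys.path.insert(0, path)
    # Make sure the import system notices the newly written archive.
    importlib.invalidate_caches()
    return path
```

Since this writes and imports code taken from a design document, it only makes sense when the feature has been enabled explicitly, as noted above.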
So, the query server was completely refactored from a single module into a full package, and here are the changes in the new version:
+ add support for eggs as modules.
+ add option to control GET requests to update functions.
+ add query server constructor: define your own context, error handlers, commands (if you have your own CouchDB fork or live on very nightly builds) and more.
+ add query server documentation article.
+ add a separate logging channel for each part of the query server.
~ update "Writing views in Python" documentation article.
~ fix docstrings to make them more sphinx friendly.
~ fix for require circular references (COUCHDB-1075)
- remove debug decorator, because now you can implement it on your own if you'd like.
And that's all, I think(: Could someone review the documentation articles (my English is poor) and the code, to decide whether anything needs to change? Any ideas? Criticism? Thanks(:
Tested on Android 2.3.4, Google Nexus One, using the Py4A application. To share my happiness, do the following:
1. Copy the couchdb package folder to /sdcard/com.googlecode.pythonforandroid/extras/python (query server imports are absolute and use the couchdb package as root).
2. Create a file on the sdcard, for example /sdcard/couchpy, and place the following code into it:
    PYTHONPATH=/data/data/com.googlecode.pythonforandroid/files/python/lib/python2.6/lib-dynload
    PYTHONPATH=${PYTHONPATH}:/mnt/sdcard/com.googlecode.pythonforandroid/extras/python
    export PYTHONPATH
    export PYTHONHOME=/data/data/com.googlecode.pythonforandroid/files/python
    export LD_LIBRARY_PATH=/data/data/com.googlecode.pythonforandroid/files/python/lib
    /data/data/com.googlecode.pythonforandroid/files/python/bin/python /mnt/sdcard/com.googlecode.pythonforandroid/extras/python/couchdb/view.py --couchdb-version=1.0.0
3. Add the following line to the query_servers section of the CouchDB configuration:
    python = sh -e /sdcard/couchpy
4. ...
5. Now you can use Python design documents on Android(: