From: Armin Ronacher <armin.r...@active-4.com>Date: August 2, 2009 1:10:29 PM CDTTo: "DiPierro, Massimo" <MDiP...@cs.depaul.edu>Subject: My thoughts on web2pyHi Massimo,
First a word of warning. I send you this mail in private because I want
to share my thoughts on the web2py code and because I hope I can help
with that somehow. You're free to publish this mail if you want.
As you might remember I came in contract with web2py a long ago when you
were evaluating pygments as a syntax highlighter for the web2py
documentation or something and you decided to implement your own because
you were unable to "ship" pygments as part of web2py. As far as I
remember py2app or py2exe were not able to pick up the on-demand imports
in the lexer module.
I have been watching the development ever since because (And please
don't think bad of me because of that) it appeared to me that web2py was
a joke. I remember reading some very early revisions of your PDF
documentation where I remember a code that looks something like that:
db.define_table('something', db.Field('meh', 'date',
default=date.today()))
That line freaked me out. This line alone showed me that web2py is
doing something very, very weird. From this line I could see that it
means one of the following two things:
- either the code has a bug for long running applications (in this case
when the application code runs longer than a day in which case the
default would have to change dynamically)
- or the global scope has different semantics and is reevaluated each
request.
Turns out the latter is what happens in web2py. There still is a race
condition but only a small one. The bigger problem with this however is
that this means the (not so) static definitions are reevaluated each
request.
Especially for the database table definitions (which cause many function
calls) this means an enormous per-request penality if you have many of
them. I even remember reading a case like this in the web2py newsgroup
about someone who had twenty table definitions or something and the
majority of the request time was spent setting up the tables.
However that also means another thing. Standard Python optimizations no
longer apply. In Python usually modules are cached in an interpreter
bound dictionary for reusing. This of course does not apply for web2py.
However the second problem here is that only the local scope gets
certain optimizations in cpython such as fast-local lookup. Instead of
storing the variables in a dictionary they are instead stored in an an
object where items are looked up by index (fast locals) rather than by a
string.
Now of course one can argue that this is the way it's intended. That's
nice but for that, the documentation is lacking. There is no
explanation of what implications this has on the code. And in
comparison with "enterprise ready" that gives me a strange feeling about
the project.
I'm not even exactly sure what kills web2py for me in the end, but I'm
pretty sure that part of the reason is the way the project is advertised.
web2py does not have worse code than many popular libraries out there.
BeautifulSoup comes to mind. That library does some really ugly things
there including monkeypatching of the standard library. It does
different things depending on the version of Python it's running on and
in the code are some really ugly idioms like a unicode subclass that has
a reference to the parsed tree that results in memory leaks very easily
because it's nearly impossible to get rid of the reference (actually
just by exploiting some weird Python internals that are mostly
undocumented). But the author of BeautifulSoup does not advertise it as
a backwards-compatible, enterprise ready HTML parser.
The web2py code on the other hand has some really problematic idioms as
well. The execution of models, views and whatnot in a made-up namespace
comes to mind. Due to the decision of being backwards compatible it
will be nearly impossible to "fix" those things. However there are so
many implications nobody yet knows about which is terribel for large
projects.
For example the database table redfinition problem cited above. Of
course that particular problem can be worked around with making the
definition code internally lazy or something but there can be more
nobody yet knows about.
Especially side-effects of code execution can do some terrible damage.
Decorators that would register code would instantly leak for example.
And some libraries provide auto-registration that you would not
instantly know about.
Also libraries like pickle do not support weird situations like this.
So I guess in the end I'm not (only) complaining about the web2py code
or decisions there, but mainly how web2py is advertised as enterprise
ready when nobody knows how these idioms scale or behave.
Another part is that web2py (and this is something web2py has in common
with so many other web frameworks) is the craze to reimplement
everything. I know you don't want to pull depedencies in but from what
I can see you're providing one-click installers for the framework which
could ship external dependencies. Both py2exe and py2app provide
support for external dependencies and I can't see what "not depending on
webob/werkzeug" gives you there.
The reason why I'm addressing this is that there are so many
limitations, problems and weird aspects in WSGI and HTTP that are hard
to tackle and many people get wrong. I'm not saying you're getting it
wrong, I haven't even studied the code in question, but I'm pretty sure
it would make sense to collaborate on that.
About the code quality itself, not the (IMO) broken concepts I picked
some of the snippets I found quickly searching with the help of ack:
- Missing `close()` calls with files (ignored matches from compileapp):
gluon/fileutils.py:160: f.write(open(tarname, 'rb').read())
gluon/fileutils.py:246: data = open(filename, 'rb').read()
gluon/highlight.py:313: data = open(sys.argv[1]).read()
gluon/languages.py:37: lang_text = open(filename, 'r').read()...
gluon/languages.py:153: data = open(file, 'r').read()
gluon/main.py:107:web2py_version = open(os.path.join(web2py_path, ...
gluon/rewrite.py:26: exec open('routes.py', 'r').read() ...
gluon/template.py:104: text = open(os.path.join(path, ...
gluon/template.py:118: parent = open(t, 'rb').read()
gluon/template.py:134: child = open(t, 'rb').read()
gluon/tools.py:1934: return urllib.urlopen(url).read()
gluon/widget.py:55:ProgramVersion = open('VERSION', ...
- Same in streamer.py for checking for file existence
(open(static_file))
- Template engine seems to be missing proper escapes. Having ''' in the
template should be enough to break it.
- Whitespace between "else" and colon in the template engine is unlikely
to work with the current parsing rules as far as I can see.
- xmlrpc system sends text/xml content type which is with a missing
charset property set to us-ascii. No charset is set so it has to be
specified explicitly (proper header would be application/xml;
charset=utf-8)
- Fileutils has a listdir method that does very inefficient deleting
from the list. Current code.
50 for dir in dirs[:]:
51 if dir.startswith('.'):
52 dirs.remove(dir)
That creates a shallow copy, for each item it performs an O(n) lookup
for the item and the remove is a O(n) operation as well. Even
unecessary because you have the position if you enumerate over the
list.
However you can do that with a list comprehension:
50 dirs[:] = [x for x in dirs if not x.startswith('.')]
- Having classes in local scope (fileutils.py for the tarfile) makes it
incredible hard to reuse code. Not a good idea, move that into the
global scope instead. Will also help performance.
- There is no bug in shutil.copyfileobj. That function works perfetly
and does what the documentation says it should do. Maybe you were
copying from the wsgi input stream. That of course won't work that
easy because there is no end-of-file marker in the input stream. See
werkzeug's LimitedStream for ways how to fix that problem. A similar
class also exists in WebOb but I don't remember the name.
- You should not inherit from BaseException unless you are dealing with
special exceptions that should not be catched by an "except Exception"
- there are realtive imports all over the place. This is a deprecated
unfeature, switch to "from gluon.html import blah" instead of the
current "from html import blah".
- the email code in tools is vulnerable to header injections if the
input is unfiltered. If this is intentional it should be documented
in the code.
- There is no proper undecoding of the path info in main.py but spaces
are transformed into underscores. However a plus is converted into a
space as well which is against the RFC. Plus is only an alias for %20
in the querystring of a URL.
- request.vars is either a dict of lists or a dict of plain strings.
This means the user has to perform instance checks each access, not
very clever. Other systems fix that by providing multi-value dicts.
- time.strftime should not be used for header-date formatting because
it's affected by locales. That means if a user is using locales it
will break all headers sent that deal with dates.
This list was created in ~15 minutes from quickly checking some files.
If one does a check on the code more things will come up. Some things
are also underspecified or unclear in the code. For example how the
code is supposed to deal with unicode (it does that in very different
ways) and how headers are quoted/unquoted, how URLs are decoded etc.
Also the code architecture is suboptimal in places. Hardcoded support
for Google appengine in places where you wouldn't expect it (like
validators etc.)
For an "enterprise" framework this is a bit too much smelly code that is
found in no time.
What would I fix? Stop advertising the framework as enterprise ready
until it matured more. And do less advertising in general. And if you
can, fix some of the things in the code. Those that are caused by
design choices won't be fixed anyway.
Regards,
Armin
+1 Really bad idea. I have never seen even a shadow of reason for this. We trust Massimo with his declaration. Please, try to understand. This is about serious production environments, important applications and making money using web2py. We could consider it if there was a real need. But there is none.
On Aug 4, 2009 9:00 PM, "Fran" <franc...@googlemail.com> wrote:On Aug 4, 7:30 pm, Pynthon <forumx...@gmail.com> wrote: > Yeah maybe a complete web2py 2 rewrite. W...
I believe this is a bad idea as we really don't want to break existing
apps - this is a *key* strength of web2py & one many of it's adherents
really value. This is what justifies the term 'Enterprise'.
Many of the 'issues' are easy to solve without doing so & the rest are
pretty much design decisions.
They are seen as key strengths for some & key weaknesses for others.
Let those who see them as strengths enjoy this & if others cannot live
with what they see as weaknesses, let them choose from the many other
frameworks out there.
F --~--~---------~--~----~------------~-------~--~----~ You received this message because you are s...