I've decided to knock TG on the head for a while and pick up the pace on
PHP a bit, diving deeper into PHP5 with some of my Python lessons in tow. I
don't have a problem with Python by the way, just not for web
development. Anyway I thought you might be interested in why I'm doing
this, so here are some points:
CherryPy
========
* It's slow and doesn't allow me the fine-grained control I need for my
web projects.
* No obvious easy way to do URL rewriting. And no, controller.default()
doesn't count.
* I think the sluggishness is mostly because it's written in Python.
Also I can't find a good way to let it handle multiple requests at a
time. I wrote an AJAX in-house tool recently. It aggregates data from a
website using BeautifulSoup. There's an option to aggregate lots of data
at the same time, which it achieves by doing lots of XMLHTTP requests.
CherryPy doesn't seem happy with doing more than 2 processes at a time,
even with thread_pool increased. There could also be a locking issue.
I'm pretty sure this isn't related to FF's "max connections/server"
features - I'm aware of those. I know a similar PHP tool works fine.
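For reference, the thread pool is a server config setting in CP 2; a sketch of the relevant fragment (key name as of CP 2.x, worth checking against your version):

```ini
[global]
# Worker threads serving requests concurrently (the CP 2.x default is 10).
# Raising this only helps if handlers are I/O-bound rather than
# serialized by a shared lock elsewhere in the app.
server.thread_pool = 30
```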
SQLObject
=========
* I really like SO. Its interface is great; I wish I could do joins in
such an easy manner with PHP. However it is just so slow and flaky I
can't handle it any more. Selecting a list of IDs then selecting each
row by ID in turn is just unacceptable. REALLY unacceptable. Even my
project manager has noticed a website is slow because of this.
* I can't stand how SO will bomb out on UnicodeErrors causing a DOS on
that page.
* I also can't stand how if you remove a row from the database that has
a reference somewhere, SO will raise SQLObjectNotFound whenever it goes
near that data. My PHP apps don't suffer from this because I write the
complete three-table inner join in SQL, which will ignore missing
references (and is a boatload faster). It makes the DB a little messy
sometimes, but that's nothing compared to DOSing the page with a 500
error. Updating my CRUD methods every time I associate a new object is
*not* fun web development.
I know the last point can be solved with using a DB that supports
foreign key constraints. We use MySQL, there are no PG servers, that's
just the way of it. I could convert my tables to InnoDB - I probably
will in future.
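The join-instead-of-lookup approach above can be sketched in a few lines; this is a two-table illustration with hypothetical tables, using sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'rob');
    INSERT INTO post VALUES (10, 1, 'hello');
    INSERT INTO post VALUES (11, 2, 'orphan');  -- author 2 no longer exists
""")

# The INNER JOIN silently skips the orphaned post instead of raising
# an exception the way SQLObjectNotFound does.
rows = conn.execute("""
    SELECT post.title, author.name
    FROM post INNER JOIN author ON post.author_id = author.id
""").fetchall()
print(rows)  # -> [('hello', 'rob')]
```

The missing reference never surfaces on the page; whether that is a feature or a data-integrity problem is exactly the trade-off discussed above.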
Kid
===
* The "NoneType is not callable" bug gets old real fast
Documentation
=============
Mostly my fault for using 0.9a*, but 0.9a* contains the only features
which attract me. Nevertheless, no docs is no good, and I need to get
stuff done *now*.
=========================================================
However I will miss some things about TG. Identity is great, top drawer on
that one. I'll have to port it to PHP very soon. I also like widgets,
very much, although without accurate documentation right now it's
difficult for me to save time by using them on anything but the most
basic of forms. I'm sure with time this problem will go away as I
remember more of the API.
I'm pretty sure I'll come back, probably when First Class is ready. As I
understand it, FC will be WSGI (which I think means I can run it under
Apache without too much flakiness). Also SA support should be properly
finished, tested and documented by then, which means I can ditch SO. I
will never use SO until it fixes the way in which it selects data from
the database. Sometimes slow is just too slow.
Actually I'll probably use TG before then. There are certain classes of
sites I think TG would be perfect for, but I'll have to think very hard
about the specific requirements of the site before firing up TG again.
Thanks guys, you're doing a great job.
-Rob
Thanks for the comments. I'm just going to try to give you hope that
the future will be better :) and ask a couple of questions.
> CherryPy
> ========
>
> It's slow and doesn't allow me the fine-grained control
> I need for my web projects.
FWIW, CP 3 (fast approaching beta) is about twice as fast as CP 2. I'd
be very interested to know more about what you mean by "fine-grained
control". Now is the time to get feature requests in. ;)
> No obvious easy way to do URL rewriting. And no
> controller.default() doesn't count.
CP 3 will have full support for custom dispatchers, like Routes or
Django-style regexes.
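For the curious, a Django-style regex dispatcher is just a pattern table tried in order; a minimal stdlib sketch (hypothetical routes, not the actual CP 3 dispatcher API):

```python
import re

# Hypothetical route table: regex -> handler; named groups become kwargs.
def top(n):
    return "top %s" % n

def user(name):
    return "user %s" % name

ROUTES = [
    (re.compile(r"^/top/(?P<n>\d+)$"), top),
    (re.compile(r"^/users/(?P<name>\w+)$"), user),
]

def dispatch(path):
    """Try each pattern in order; call the first handler that matches."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"

print(dispatch("/top/100"))    # -> top 100
print(dispatch("/users/rob"))  # -> user rob
```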
> I think the sluggishness is mostly because it's written
> in Python. Also I can't find a good way to let it handle
> multiple requests at a time. I wrote an AJAX in-house
> tool recently. It aggregates data from a website using
> BeautifulSoup. There's an option to aggregate lots of data
> at the same time, which it achieves by doing lots of
> XMLHTTP requests. CherryPy doesn't seem happy with
> doing more than 2 processes at a time, even with
> thread_pool increased. There could also be a locking
> issue. I'm pretty sure this isn't related to FF's
> "max connections/server" features - I'm aware of
> those. I know a similar PHP tool works fine.
These are always hard to address because the locking issue might be
completely outside of CherryPy; I've heard scattered reports of locking
issues but haven't been able to reproduce them. If there's any way you
could demo the problem, I'd be *very* glad to review it.
C'mon back someday!
Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org
From my experience with a Java webapp framework written by a friend of
mine, retrieving IDs first, then the objects in a second pass, has been
one of the best design decisions he made. I liked the idea so much that
we started implementing it in our custom ORM built on Zope, and
witnessed a very significant speed improvement. Why this works:
1) the 1st phase (retrieving IDs), even with complex joins and filters,
can be really fast, since the database won't have to deal with any real
data, only primary keys and indexes.
2) the 2nd phase (retrieving objects) is made really fast too by using
aggressive caching. Except for the 1st access to an object, there won't
be any more actual database "select", since the object will be
retrieved from the cache.
A note about the 2nd phase: ideally, it should be done in a single
query. For example, if phase one returned a list of (1, 2, 3, 4, 5),
and we already have (1, 2, 3) in the cache, the second phase should do
a single select with "WHERE id IN (4, 5)". Your comment suggests that
SQLObject may do it as 2 distinct selects, which indeed would be
suboptimal.
Now bear with me: I did not talk about SQLObject in particular. I am
still new to it, and I don't know enough about its inner workings to
vouch for or against it. All I am saying is: don't blame the idea of
splitting IDs retrieval and objects retrieval. IMHO, it's one of the
best things since sliced bread!
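A minimal sketch of the two-phase scheme, with a plain dict as the cache and hypothetical table names (a real ORM would add invalidation on writes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)",
                 [(1, "A", 9.1), (2, "B", 8.7), (3, "C", 8.5),
                  (4, "D", 8.2), (5, "E", 8.0)])

cache = {}  # id -> row; a real system needs invalidation on writes

def fetch(ids):
    """Phase 2: one SELECT for only the rows missing from the cache."""
    missing = [i for i in ids if i not in cache]
    if missing:
        marks = ",".join("?" * len(missing))
        query = "SELECT id, title, rating FROM movie WHERE id IN (%s)" % marks
        for row in conn.execute(query, missing):
            cache[row[0]] = row
    return [cache[i] for i in ids]

# Phase 1: a cheap query that touches only the index.
ids = [r[0] for r in
       conn.execute("SELECT id FROM movie ORDER BY rating DESC LIMIT 3")]
rows = fetch(ids)  # first view: one SELECT fills the cache
rows = fetch(ids)  # later views: pure cache hits, zero SELECTs
print(rows)
```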
Cheers,
--
Yves-Eric
Sorry to hear you're returning to PHP for a bit, but I understand the
need to do what your work requires.
On Aug 8, 2006, at 10:15 AM, Robin Haswell wrote:
> CherryPy
> ========
>
> * It's slow and doesn't allow me the fine-grained control I need for my
> web projects.
> * No obvious easy way to do URL rewriting. And no controller.default()
> doesn't count.
Bob addressed these. First Class will definitely have these
specifically addressed in some fashion.
> * I think the sluggishness is mostly because it's written in Python.
> Also I can't find a good way to let it handle multiple requests at a
> time. I wrote an AJAX in-house tool recently. It aggregates data from a
> website using BeautifulSoup. There's an option to aggregate lots of data
> at the same time, which it achieves by doing lots of XMLHTTP requests.
> CherryPy doesn't seem happy with doing more than 2 processes at a time,
> even with thread_pool increased. There could also be a locking issue.
> I'm pretty sure this isn't related to FF's "max connections/server"
> features - I'm aware of those. I know a similar PHP tool works fine.
I don't think Python is the issue.
> SQLObject
> =========
>
> * I really like SO. Its interface is great, I wish I could do joins in
> such an easy manner with PHP. However it is just so slow and flaky I
> can't handle it any more. Selecting a list of IDs then selecting each
> row by ID in turn is just unacceptable. REALLY unacceptable. Even my
> project manager has noticed a website is slow because of this.
> * I can't stand how SO will bomb out on UnicodeErrors causing a DOS on
> that page.
> * I also can't stand how if you remove a row from the database that has
> a reference somewhere, SO will raise SQLObjectNotFound whenever it goes
> near that data. My PHP apps don't suffer from this because I write the
> complete three-table inner join in SQL, which will ignore missing
> references (and is a boatload faster). It makes the DB a little messy
> sometimes, but that's nothing compared to DOSing the page with a 500
> error. Updating my CRUD methods every time I associate a new object is
> *not* fun web development.
>
> I know the last point can be solved with using a DB that supports
> foreign key constraints. We use MySQL, there are no PG servers, that's
> just the way of it. I could convert my tables to InnoDB - I probably
> will in future.
SQLAlchemy is the answer here.
> Kid
> ===
>
> * The "NoneType is not callable" bug gets old real fast
I'm actually very impressed with what I've seen of Markup so far. I'm
hoping to see some kind of combination of Kid and Markup's
technologies that would put this to rest once and for all.
> Documentation
> =============
>
> Mostly my fault for using 0.9a*, but 0.9a* contains the only features
> which attract me. Nevertheless, no docs is no good, and I need to get
> stuff done *now*
This is definitely being addressed. Improving our state of online
docs now and ongoing is my current top priority for the project.
Beyond that, half of "Rapid Web Applications with TurboGears" should
be available online soon, and all of it is slated to be available at
the end of October.
> =========================================================
>
> However I will miss some things about TG. Identity is great, top drawer
> on that one. I'll have to port it to PHP very soon. I also like widgets,
> very much, although without accurate documentation right now it's
> difficult for me to save time by using them on anything but the most
> basic of forms. I'm sure with time this problem will go away as I
> remember more of the API.
>
> I'm pretty sure I'll come back, probably when First Class is ready. As I
> understand it, FC will be WSGI (which I think means I can run it under
> Apache without too much flakiness). Also SA support should be properly
> finished, tested and documented by then, which means I can ditch SO. I
> will never use SO until it fixes the way in which it selects data from
> the database. Sometimes slow is just too slow.
>
> Actually I'll probably use TG before then. There are certain classes of
> sites I think TG would be perfect for, but I'll have to think very hard
> about the specific requirements of the site before firing up TG again.
>
> Thanks guys, you're doing a great job.
Thanks for the feedback, Rob. Good luck with your projects, and stay
tuned here!
Kevin
...and were you using CherryPy sessions, by any chance?
> From my experience with a Java webapp framework written by a friend of
> mine, retrieving IDs first, then the objects in a second pass, has been
> one of the best design decisions he made. I liked the idea so much that
> we started implementing it in our custom ORM built on Zope, and
> witnessed a very significant speed improvement. Why this works:
>
> 1) the 1st phase (retrieving IDs), even with complex joins and filters,
> can be really fast, since the database won't have to deal with any real
> data, only primary keys and indexes.
>
> 2) the 2nd phase (retrieving objects) is made really fast too by using
> aggressive caching. Except for the 1st access to an object, there won't
> be any more actual database "select", since the object will be
> retrieved from the cache.
>
> A note about the 2nd phase: ideally, it should be done in a single
> query. For example, if phase one returned a list of (1, 2, 3, 4, 5),
> and we already have (1, 2, 3) in the cache, the second phase should do
> a single select with "WHERE id IN (4, 5)". Your comment suggests that
> SQLObject may do it as 2 distinct selects, which indeed would be
> suboptimal.
I assume you have a global cache, right? Otherwise I do wonder how this
works when several clients update the database.
Now I don't quite understand the benefit of your technique. You say that
by only requesting IDs in the first query you reduce the load of data
retrieved by the database, but why don't you simply select the columns you
actually need? In the second select it is not certain that you will need
all the columns, so you might waste some CPU anyway.
Besides, it is also possible that between the time you request an ID and
the time you actually fetch the row for that ID, the row may have been
deleted, and you will hit an error.
I really fail to understand the benefit of that technique but I'm not a
database/ORM expert anyway.
- Sylvain
> Besides, it is also possible that between the time you request an ID and
> the time you actually fetch the row for that ID, this one may have been
> deleted and you will hit an error.
This depends on the isolation level and how he started the process... I
believe that he can work with a snapshot where all retrieved IDs still have
their data available if he's inside a transaction and had the correct
isolation level on his database / connection to the database.
> I really fail to understand the benefit of that technique but I'm not a
> database/ORM expert anyway.
Probably they're working more on the client side -- doing the FK consistency,
JOINs, filtering, etc. -- than on server side. For the server side I'd go
with a function, a view or even something that would retrieve what I need
directly.
To fetch everything in one operation with SQLObject, wrap the result in list():
data = model.MyTable.select(orderBy=model.MyTable.q.description)
data = list(data)  # <-- this makes one select only
(Of course, you can write it in one line; I just wanted to point out what
makes the "single" access to the database. I believe there are actually two
queries: one to retrieve the columns and one to retrieve the data.)
There were techniques and products shown here (such as memcached) to optimize
things and implement a global cache... Those should also help with the
database-hitting problem, but I have never tried them to see how they handle
SQLObject SelectResults...
--
Jorge Godoy <jgo...@gmail.com>
Maybe it's not too late for TG to get a nod like this:
http://37signals.com/svn/archives2/apple_includes_rails_with_leopard.php
That would be sweet, eh?
Sorry to see you go. One point, though...
Robin Haswell wrote:
> SQLObject
> =========
> * I also can't stand how if you remove a row from the database that has
> a reference somewhere, SO will raise SQLObjectNotFound whenever it goes
> near that data. My PHP apps don't suffer from this because I write the
> complete three-table inner join in SQL, which will ignore missing
> references (and is a boatload faster). It makes the DB a little messy
> sometimes, but that's nothing compared to DOSing the page with a 500
> error. Updating my CRUD methods every time I associate a new object is
> *not* fun web development.
>
> I know the last point can be solved with using a DB that supports
> foreign key constraints. We use MySQL, there are no PG servers, that's
> just the way of it. I could convert my tables to InnoDB - I probably
> will in future.
I definitely agree with this, however, SQLObject has a fairly
undocumented feature where it will "fake" referential integrity
whenever you use a ForeignKey column on a DB without native referential
integrity. You get at it via the "cascade" keyword argument:
(from SQLObject col.py:)
# cascade can be one of:
# None: no constraint is generated
# True: a CASCADE constraint is generated
# False: a RESTRICT constraint is generated
# 'null': a SET NULL trigger is generated
All the magic happens in destroySelf() (which is itself called by
delete()). And it works on MySQL with MyISAM tables. Plus, when you
migrate to an engine that *does* support referential integrity, the
constraints get generated automagically for nearly seamless transition.
It took me quite a bit of searching on mailing lists and newsgroups to
find this tidbit. Hopefully this will help some other hapless TG
early-adopter.
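To see what those cascade options translate to at the SQL level, here is a sketch using sqlite3 with enforced foreign keys as a stand-in for InnoDB (not SQLObject's own emulation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only on request
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id) ON DELETE CASCADE
    )""")
conn.execute("INSERT INTO author VALUES (1)")
conn.execute("INSERT INTO post VALUES (10, 1)")

# Deleting the parent removes dependent posts too; no dangling
# references left behind, and no 500s when a page touches them.
conn.execute("DELETE FROM author WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
print(remaining)  # -> 0
```

A RESTRICT constraint would instead refuse the DELETE while child rows exist, which corresponds to the `cascade=False` case in the col.py comment above.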
Yes, in our case we have a global cache. It is possible to make it work
with local caches too, with some cache invalidation mechanism so that a
client can signal all others when an object is updated.
> Now I don't quite understand the benefit of your technique.
I was not convinced at first either, but I saw the results and it does
work. I guess one way to understand why it works is to take an example.
I have a reasonably large table in PostgreSQL here, and let's say I
want to build a "Top 100" page. Omitting the "ORDER BY rating LIMIT
100" for readability, here are some timing results from tests I just
ran:
Scenario 1: a simple "select *":
SELECT * --> 312 ms
==> 312 ms spent in DB access for each page view.
Scenario 2: select just the needed columns for the page:
SELECT id, category, title, year, rating, votes --> 156 ms
==> 156 ms spent in DB access for each page view. Also note that this
requires building a custom query (which I want to avoid; that's the
reason why I am using an ORM layer).
Scenario 3: the two-phase retrieval:
Phase 1: SELECT id --> 47 ms
Phase 2: SELECT * --> 312 ms
On the 1st page view only: phase 1 + phase 2 = 359 ms spent in DB
access.
On all subsequent page views: only phase 1 + 0 (cache hit on phase
2) = 47 ms in DB access.
==> 47 ms spent in DB access for each page view.
As you can see, in my case, scenario 3 is almost an order of magnitude
faster than scenario 1. Of course, YMMV.
Cheers,
--
Yves-Eric
The reason PHP performs well is that it sits on top of Apache. I would
recommend that you check out mod_python if you haven't already. It
allows you to plug Python code directly into each phase of Apache
request handling - and I used to do all my webapps with it before
going to TG (I had been using PHP for several years before going to
mod_python). It's very fast, and you can configure Apache any way you
like in terms of processes/threads etc.
You will need some boilerplate code - which I believe you will with
PHP anyway - but making use of Routes will go a long way and give you
maximum flexibility - plus, you can use all the great Python libraries
- sqlalchemy, markup, etc..
I have to admit that if I were to write a mission-critical
application that needed to handle high loads, I would probably go
back to mod_python + a homemade framework instead of TG.
Arnar
Considering that CP3 now has a built-in mod_python adapter, I'm sure
you will change your opinion in the future ;)
- Sylvain
That sounds good - also, I'm looking forward to "native" Routes support.
I'm having a hard time finding anything useful on cherrypy.org - where
can I read about upcoming features in CP3?
Btw, where can I find decent CP documentation?
http://docs.cherrypy.org/ is not exactly well organized.
Arnar
Well, you could find that :)
http://docs.cherrypy.org/writing-your-own-dispatcher
That explains how one could write a custom dispatcher using Routes and use
it in CP3.
I am a bit sensitive on the documentation subject. I agree with you, CP
documentation sucks. Big time. I'm as saddened by that state as you are.
However http://docs.cherrypy.org/ is there for people to contribute, and
only a few have done so far (which I really appreciate). I mean, it's also
up to the community to be active sometimes. I do hope that once I am finished
writing the CherryPy book (which should be published in a few months) I'll
be able to improve the situation, but I do not have the time for now.
You know, I think SQLAlchemy and SQLObject documentation suck a lot as
well, but because I can't contribute I don't judge them ;)
Sorry, it's not personal towards you, Arnar. It's just that documentation
is open to everybody to improve, but very few people actually take the
time to do it.
- Sylvain
None taken :o) I would be happy to contribute if I had the knowledge.
In the meantime, I found this to be an excellent resource:
http://www.aminus.org/blogs/index.php/fumanchu?cat=64
in case someone else is looking
Arnar
Heh, no wonder it is so good, since Robert is the one behind
CherryPy 3 and most of CherryPy 2 :)
- Sylvain