Why I'm knocking TG on the head for a bit

Robin Haswell

Aug 8, 2006, 10:15:33 AM
to turbo...@googlegroups.com
Hey guys

I've decided to knock TG on the head for a while and pick up the pace on
PHP a bit, diving deeper into PHP5 and bringing some of my Python lessons
along. I
don't have a problem with Python by the way, just not for web
development. Anyway I thought you might be interested in why I'm doing
this, so here are some points:

CherryPy
========

* It's slow and doesn't allow me the fine-grained control I need for my
web projects.
* No obvious easy way to do URL rewriting. And no, controller.default()
doesn't count.
* I think the sluggishness is mostly because it's written in Python.
Also I can't find a good way to let it handle multiple requests at a
time. I wrote an AJAX in-house tool recently. It aggregates data from a
website using BeautifulSoup. There's an option to aggregate lots of data
at the same time, which it achieves by doing lots of XMLHTTP requests.
CherryPy doesn't seem happy doing more than two requests at a
time, even with thread_pool increased. There could also be a locking issue.
I'm pretty sure this isn't related to Firefox's "max connections per
server" limit - I'm aware of that. I know a similar PHP tool works fine.
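For what it's worth, the thread_pool knob lives in the server config; assuming a CherryPy 2.x / TurboGears-style config file, raising it would look something like this (the exact key name follows the 2.x config scheme):

```ini
# dev.cfg (or a CherryPy config file) - raise the worker thread count so
# more requests can be served concurrently
[global]
server.thread_pool = 30
```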

SQLObject
=========

* I really like SO. Its interface is great, I wish I could do joins in
such an easy manner with PHP. However it is just so slow and flaky I
can't handle it any more. Selecting a list of IDs then selecting each
row by ID in turn is just unacceptable. REALLY unacceptable. Even my
project manager has noticed a website is slow because of this.
* I can't stand how SO will bomb out on UnicodeErrors, causing a DoS on
that page.
* I also can't stand how if you remove a row from the database that has
a reference somewhere, SO will raise SQLObjectNotFound whenever it goes
near that data. My PHP apps don't suffer from this because I write the
complete three-table inner join in SQL, which will ignore missing
references (and is a boatload faster). It makes the DB a little messy
sometimes, but that's nothing compared to DOSing the page with a 500
error. Updating my CRUD methods every time I associate a new object is
*not* fun web development.

I know the last point can be solved by using a DB that supports
foreign key constraints. We use MySQL, there are no PG servers; that's
just the way of it. I could convert my tables to InnoDB - I probably
will in future.
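To make the first complaint concrete, here's a small sketch (plain sqlite3 rather than SQLObject; the tables and names are invented for the example) of the "select the IDs, then select each row" pattern next to the hand-written inner join - note how the join also silently drops a row whose reference is missing, instead of blowing up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'rob');
    INSERT INTO author VALUES (2, 'kev');
    INSERT INTO post VALUES (1, 1, 'first');
    INSERT INTO post VALUES (2, 1, 'second');
    INSERT INTO post VALUES (3, 2, 'third');
    INSERT INTO post VALUES (4, 99, 'orphan');  -- dangling author_id
""")

# ORM-style N+1: one query for the IDs, then one more query per row.
ids = [r[0] for r in conn.execute("SELECT id FROM post")]
titles = [conn.execute("SELECT title FROM post WHERE id = ?", (i,)).fetchone()[0]
          for i in ids]  # 1 + len(ids) round-trips to the database

# Hand-written INNER JOIN: a single query, and the row with the missing
# author reference simply falls out of the result set.
joined = conn.execute("""
    SELECT post.title, author.name
    FROM post INNER JOIN author ON author.id = post.author_id
""").fetchall()
```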

Kid
===

* The "NoneType is not callable" bug gets old real fast

Documentation
=============

Mostly my fault for using 0.9a*, but 0.9a* contains the only features
which attract me. Nevertheless, no docs is no good, and I need to get
stuff done *now*.

=========================================================

However I will miss some things about TG. Identity is great, top drawer on
that one. I'll have to port it to PHP very soon. I also like widgets,
very much, although without accurate documentation right now it's
difficult for me to save time by using them on anything but the most
basic of forms. I'm sure with time this problem will go away as I
remember more of the API.

I'm pretty sure I'll come back, probably when First Class is ready. As I
understand it, FC will be WSGI (which I think means I can run it under
Apache without too much flakiness). Also SA support should be properly
finished, tested and documented by then, which means I can ditch SO. I
will never use SO until it fixes the way in which it selects data from
the database. Sometimes slow is just too slow.

Actually I'll probably use TG before then. There are certain classes of
sites I think TG would be perfect for, but I'll have to think very hard
about the specific requirements of the site before firing up TG again.

Thanks guys, you're doing a great job.

-Rob

fumanchu

Aug 8, 2006, 1:18:53 PM
to TurboGears
Hi, Robin,

Thanks for the comments. I'm just going to try to give you hope that
the future will be better :) and ask a couple of questions.

> CherryPy
> ========
>
> It's slow and doesn't allow me the fine-grained control
> I need for my web projects.

FWIW, CP 3 (fast approaching beta) is about twice as fast as CP 2. I'd
be very interested to know more about what you mean by "fine-grained
control". Now is the time to get feature requests in. ;)

> No obvious easy way to do URL rewriting. And no
> controller.default() doesn't count.

CP 3 will have full support for custom dispatchers, like Routes or
Django-style regexes.

> I think the sluggishness is mostly because it's written
> in Python. Also I can't find a good way to let it handle
> multiple requests at a time. I wrote an AJAX in-house
> tool recently. It aggregates data from a website using
> BeautifulSoup. There's an option to aggregate lots of data
> at the same time, which it achieves by doing lots of
> XMLHTTP requests. CherryPy doesn't seem happy with
> doing more than 2 processes at a time, even with
> thread_pool increased. There could also be a locking
> issue. I'm pretty sure this isn't related to FF's
> "max connections/server" features - I'm aware of
> those. I know a similar PHP tool works fine.

These are always hard to address because the locking issue might be
completely outside of CherryPy; I've heard scattered reports of locking
issues but haven't been able to reproduce them. If there's any way you
could demo the problem, I'd be *very* glad to review it.

C'mon back someday!


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org

Yves-Eric Martin

Aug 8, 2006, 9:01:13 PM
to TurboGears
Robin Haswell wrote:
> SQLObject
> =========
>
> * I really like SO. Its interface is great, I wish I could do joins in
> such an easy manner with PHP. However it is just so slow and flaky I
> can't handle it any more. Selecting a list of IDs then selecting each
> row by ID in turn is just unacceptable. REALLY unacceptable. Even my
> project manager has noticed a website is slow because of this.

From my experience with a Java webapp framework written by a friend of
mine, retrieving IDs first, then the objects in a second pass, has been
one of the best design decisions he made. I liked the idea so much that
we started implementing it in our custom ORM built on Zope, and
witnessed a very significant speed improvement. Why this works:

1) the 1st phase (retrieving IDs), even with complex joins and filters,
can be really fast, since the database won't have to deal with any real
data, only primary keys and indexes.

2) the 2nd phase (retrieving objects) is made really fast too by using
aggressive caching. Except for the 1st access to an object, there won't
be any more actual database "select", since the object will be
retrieved from the cache.

A note about the 2nd phase: ideally, it should be done in a single
query. For example, if phase one returned a list of (1, 2, 3, 4, 5),
and we already have (1, 2, 3) in the cache, the second phase should do
a single select with "WHERE id IN (4, 5)". Your comment suggests that
SQLObject may do it as 2 distinct selects, which indeed would be
suboptimal.

Now bear with me: I did not talk about SQLObject in particular. I am
still new to it, and I don't know enough about its inner workings to
vouch for or against it. All I am saying is: don't blame the idea of
splitting IDs retrieval and objects retrieval. IMHO, it's one of the
best things since sliced bread!
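The two-phase shape described above can be sketched in plain Python with sqlite3 (the table and names are invented, and a real ORM would layer cache invalidation on top of the dict used here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(i, "title %d" % i, i / 10.0) for i in range(1, 6)])

cache = {}  # id -> full row

def fetch(ids):
    """Phase 2: fetch full rows, hitting the DB only for cache misses,
    in a single WHERE id IN (...) select."""
    missing = [i for i in ids if i not in cache]
    if missing:
        sql = ("SELECT id, title, rating FROM item WHERE id IN (%s)"
               % ", ".join("?" * len(missing)))
        for row in conn.execute(sql, missing):
            cache[row[0]] = row
    return [cache[i] for i in ids]

# Phase 1: a cheap, keys-only query (ORDER BY ... LIMIT would go here).
top_ids = [r[0] for r in conn.execute("SELECT id FROM item")]

fetch([1, 2, 3])        # warms the cache for ids 1-3
rows = fetch(top_ids)   # issues one select for just the missing (4, 5)
```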


Cheers,

--
Yves-Eric

Kevin Dangoor

Aug 9, 2006, 12:04:43 AM
to turbo...@googlegroups.com
Hi Rob,

Sorry to hear you're returning to PHP for a bit, but I understand the
need to do what your work requires.

On Aug 8, 2006, at 10:15 AM, Robin Haswell wrote:

> CherryPy
> ========
>
> * It's slow and doesn't allow me the fine-grained control I need
> for my
> web projects.
> * No obvious easy way to do URL rewriting. And no controller.default()
> doesn't count.

Bob addressed these. First Class will definitely have these
specifically addressed in some fashion.

> * I think the sluggishness is mostly because it's written in Python.
> Also I can't find a good way to let it handle multiple requests at a
> time. I wrote an AJAX in-house tool recently. It aggregates data
> from a
> website using BeautifulSoup. There's an option to aggregate lots of
> data
> at the same time, which it achieves by doing lots of XMLHTTP requests.
> CherryPy doesn't seem happy with doing more than 2 processes at a
> time,
> even with thread_pool increased. There could also be a locking issue.
> I'm pretty sure this isn't related to FF's "max connections/server"
> features - I'm aware of those. I know a similar PHP tool works fine.

I don't think Python is the issue.

> SQLObject
> =========
>
> * I really like SO. Its interface is great, I wish I could do joins in
> such an easy manner with PHP. However it is just so slow and flaky I
> can't handle it any more. Selecting a list of IDs then selecting each
> row by ID in turn is just unacceptable. REALLY unacceptable. Even my
> project manager has noticed a website is slow because of this.
> * I can't stand how SO will bomb out on UnicodeErrors causing a DOS on
> that page.
> * I also can't stand how if you remove a row from the database that
> has
> a reference somewhere, SO will raise SQLObjectNotFound whenever it
> goes
> near that data. My PHP apps don't suffer from this because I write the
> complete three-table inner join in SQL, which will ignore missing
> references (and is a boatload faster). It makes the DB a little messy
> sometimes, but that's nothing compared to DOSing the page with a 500
> error. Updating my CRUD methods every time I associate a new object is
> *not* fun web development.
>
> I know the last point can be solved with using a DB that supports
> foreign key constraints. We use MySQL, there are no PG servers, that's
> just the way of it. I could convert my tables to InnoDB - I probably
> will in future.

SQLAlchemy is the answer here.

> Kid
> ===
>
> * The "NoneType is not callable" bug gets old real fast

I'm actually very impressed with what I've seen of Markup so far. I'm
hoping to see some kind of combination of Kid and Markup's
technologies that would put this to rest once and for all.

> Documentation
> =============
>
> Mostly my fault for using 0.9a*, but 0.9a* contains the only features
> which attract me. Nevertheless, no docs is no good, and I need to get
> stuff done *now*

This is definitely being addressed. Improving our state of online
docs now and ongoing is my current top priority for the project.
Beyond that, half of "Rapid Web Applications with TurboGears" should
be available online soon, and all of it is slated to be available at
the end of October.

> =========================================================
>
> However I will miss some things about TG. Identity is great, top
> drawer on
> that one. I'll have to port it to PHP very soon. I also like widgets,
> very much, although without accurate documentation right now it's
> difficult for me to save time by using them on anything but the most
> basic of forms. I'm sure with time this problem will go away as I
> remember more of the API.
>
> I'm pretty sure I'll come back, probably when First Class is ready.
> As I
> understand it, FC will be WSGI (which I think means I can run it under
> Apache without too much flakiness). Also SA support should be properly
> finished, tested and documented by then, which means I can ditch SO. I
> will never use SO until it fixes the way in which it selects data from
> the database. Sometimes slow is just too slow.
>
> Actually I'll probably use TG before then. There are certain
> classes of
> sites I think TG would be perfect for, but I'll have to think very
> hard
> about the specific requirements of the site before firing up TG again.
>
> Thanks guys, you're doing a great job.

Thanks for the feedback, Rob. Good luck with your projects, and stay
tuned here!

Kevin

fumanchu

Aug 9, 2006, 2:42:32 AM
to TurboGears
> Also I can't find a good way to let it handle
> multiple requests at a time.

...and were you using CherryPy sessions, by any chance?

Sylvain Hellegouarch

Aug 9, 2006, 2:45:06 AM
to turbo...@googlegroups.com
Yves,

> From my experience with a Java webapp framework written by a friend of
> mine, retrieving IDs first, then the objects in a second pass, has been
> one of the best design decisions he made. I liked the idea so much that
> we started implementing it in our custom ORM built on Zope, and
> witnessed a very significant speed improvement. Why this works:
>
> 1) the 1st phase (retrieving IDs), even with complex joins and filters,
> can be really
> fast, since the database won't have to deal with any real data, only
> primary keys and indexes.
>
> 2) the 2nd phase (retrieving objects) is made really fast too by using
> aggressive caching. Except for the 1st access to an object, there won't
> be any more actual database "select", since the object will be
> retrieved from the cache.
>
> A note about the 2nd phase: ideally, it should be done in a single
> query. For example, if phase one returned a list of (1, 2, 3, 4, 5),
> and we already have (1, 2, 3) in the cache, the second phase should do
> a single select with "WHERE id IN (4, 5)". Your comment suggests that
> SQLObject may do it as 2 distinct selects, which indeed would be
> suboptimal.

I assume you have a global cache, right? Otherwise I do wonder how this
works when several clients update the database.

Now I don't quite understand the benefit of your technique. You say that
by only requesting IDs in the first query you reduce the load of data
retrieved by the database, but why don't you simply select the columns you
do need to process? I mean, in the second select it is not certain that
you will need all the columns, so you might waste some CPU anyway.

Besides, it is also possible that between the time you request an ID and
the time you actually fetch the row for that ID, it may have been
deleted and you will hit an error.

I really fail to understand the benefit of that technique but I'm not a
database/ORM expert anyway.

- Sylvain

Jorge Godoy

Aug 9, 2006, 7:22:04 AM
to turbo...@googlegroups.com
"Sylvain Hellegouarch" <s...@defuze.org> writes:

> Besides, it is also possible that between the time you request an ID and
> the time you actually fetch the row for that ID, this one may have been
> deleted and you will hit an error.

This depends on the isolation level and how he started the process... I
believe that he can work with a snapshot where all retrieved IDs still have
their data available if he's inside a transaction and had the correct
isolation level on his database / connection to the database.

> I really fail to understand the benefit of that technique but I'm not a
> database/ORM expert anyway.

Probably they're working more on the client side -- doing the FK
consistency, JOINs, filtering, etc. -- than on the server side. For the
server side I'd go with a function, a view, or even something that would
retrieve what I need directly.

To make it a single operation with SQLObject, turn the result into a list():

data = model.MyTable.select(orderBy=model.MyTable.q.description)
data = list(data)  # <-- this makes one select only

(Of course, you can write it in one line; I just wanted to point out what
makes the "single" access to the database. --- I believe there are actually
two: one to retrieve the columns and one to retrieve the data.)


There were techniques and products shown here (such as memcached) to optimize
things and implement a global cache... Those should also help with the
database-hitting problem, but I have never tried them to see how they handle
SQLObject SelectResults...

--
Jorge Godoy <jgo...@gmail.com>

isaac

Aug 9, 2006, 2:54:38 PM
to turbo...@googlegroups.com
TG is still a bit messy (hence its Alpha-ness), but it's going to rock
any minute. If you need to ship your app next week, today's probably
not the time to adopt it.

Maybe it's not too late for TG to get a nod like this:
http://37signals.com/svn/archives2/apple_includes_rails_with_leopard.php

That would be sweet, eh?

Rick

Aug 9, 2006, 5:31:38 PM
to TurboGears
Robin,

Sorry to see you go. One point, though...

Robin Haswell wrote:
> SQLObject
> =========

> * I also can't stand how if you remove a row from the database that has
> a reference somewhere, SO will raise SQLObjectNotFound whenever it goes
> near that data. My PHP apps don't suffer from this because I write the
> complete three-table inner join in SQL, which will ignore missing
> references (and is a boatload faster). It makes the DB a little messy
> sometimes, but that's nothing compared to DOSing the page with a 500
> error. Updating my CRUD methods every time I associate a new object is
> *not* fun web development.
>
> I know the last point can be solved with using a DB that supports
> foreign key constraints. We use MySQL, there are no PG servers, that's
> just the way of it. I could convert my tables to InnoDB - I probably
> will in future.

I definitely agree with this; however, SQLObject has a fairly
undocumented feature where it will "fake" referential integrity
whenever you use a ForeignKey column on a DB without native referential
integrity. You get at it via the "cascade" keyword argument
(from SQLObject's col.py):
# cascade can be one of:
# None: no constraint is generated
# True: a CASCADE constraint is generated
# False: a RESTRICT constraint is generated
# 'null': a SET NULL trigger is generated

All the magic happens in destroySelf() (which is itself called by
delete()). And it works on MySQL with MyISAM tables. Plus, when you
migrate to an engine that *does* support referential integrity, the
constraints get generated automagically for a nearly seamless transition.
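To see what the generated constraint buys you on an engine with native referential integrity, here's a small sketch with SQLite standing in (purely for illustration; the tables are invented, and SQLite needs foreign key enforcement switched on explicitly). This is the behaviour destroySelf() emulates on MyISAM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ships with enforcement off
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES author(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO author VALUES (1)")
conn.execute("INSERT INTO post VALUES (1, 1)")

# Deleting the referenced author cascades: no dangling post row left
# behind, and no SQLObjectNotFound-style surprise later.
conn.execute("DELETE FROM author WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
```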

It took me quite a bit of searching on mailing lists and newsgroups to
find this tidbit. Hopefully this will help some other hapless TG
early-adopter.

Yves-Eric Martin

Aug 9, 2006, 11:15:12 PM
to TurboGears
Sylvain Hellegouarch wrote:
> I assume you have a global cache right? Otherwise I do wonder how this
> works when several clients update the database.

Yes, in our case we have a global cache. It is possible to make it work
with local caches too, given some cache-invalidation mechanism so that a
client can signal all the others when an object is updated.


> Now I don't quite understand the benefit of your technique.

I was not convinced at first either, but I saw the results and it does
work. I guess one way to understand why it works is to take an example.
I have a reasonably large table in PostgreSQL here, and let's say I
want to build a "Top 100" page. Omitting the "ORDER BY rating LIMIT
100" for readability, here are some timing results from tests I just
ran:

Scenario 1: a simple "select *":
SELECT * --> 312 ms
==> 312 ms spent in DB access for each page view.

Scenario 2: select just the needed columns for the page:
SELECT id, category, title, year, rating, votes --> 156 ms
==> 156 ms spent in DB access for each page view. Also note that this
requires building a custom query (which I want to avoid - that's the
reason I am using an ORM layer).

Scenario 3: the two-phase retrieval:
Phase 1: SELECT id --> 47 ms
Phase 2: SELECT * --> 312 ms
On the 1st page view only: phase 1 + phase 2 = 359 ms spent in DB access.
On all subsequent page views: only phase 1 + 0 (cache hit on phase 2) =
47 ms in DB access.
==> 47 ms spent in DB access for each page view.


As you can see, in my case, scenario 3 is almost an order of magnitude
faster than scenario 1. Of course, YMMV.


Cheers,

--
Yves-Eric

Arnar Birgisson

Aug 10, 2006, 5:40:04 AM
to turbo...@googlegroups.com
On 8/8/06, Robin Haswell <r...@digital-crocus.com> wrote:
> CherryPy
> ========
>
> * It's slow and doesn't allow me the fine-grained control I need for my
> web projects.
> * No obvious easy way to do URL rewriting. And no controller.default()
> doesn't count.
> * I think the sluggishness is mostly because it's written in Python.
> Also I can't find a good way to let it handle multiple requests at a
> time. I wrote an AJAX in-house tool recently. It aggregates data from a
> website using BeautifulSoup. There's an option to aggregate lots of data
> at the same time, which it achieves by doing lots of XMLHTTP requests.
> CherryPy doesn't seem happy with doing more than 2 processes at a time,
> even with thread_pool increased. There could also be a locking issue.
> I'm pretty sure this isn't related to FF's "max connections/server"
> features - I'm aware of those. I know a similar PHP tool works fine.

The reason PHP works is that it sits on top of Apache. I would
recommend that you check out mod_python if you haven't already. It
allows you to plug Python code directly into each phase of Apache's
request handling - I used to do all my webapps with it before
moving to TG (and I had been using PHP for several years before going to
mod_python). It's very fast, and you can configure Apache any way you
like in terms of processes/threads etc.
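The wiring is just a few Apache directives; a hypothetical snippet (where "myapp" stands for your Python module exposing a handler(req) function) would look something like:

```apache
# httpd.conf: route /app to a mod_python handler module
<Location "/app">
    SetHandler mod_python
    PythonHandler myapp
    PythonDebug On
</Location>
```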

You will need some boilerplate code - which I believe you will with
PHP anyway - but making use of Routes will go a long way and give you
maximum flexibility. Plus, you can use all the great Python libraries:
SQLAlchemy, Markup, etc.

I have to admit that if I were to write a mission-critical
application that needed to handle high loads, I would probably go
back to mod_python + a homemade framework instead of TG.

Arnar

Sylvain Hellegouarch

Aug 10, 2006, 6:30:12 AM
to turbo...@googlegroups.com

> I have to admit myself, that if I were to write a mission-critical
> application that needed to handle high loads - I would probably go
> back to mod_python + homemade framework instead of TG.
>

Considering the fact that CP3 now has a built-in mod_python adapter, I'm
sure you will change your opinion in the future ;)

- Sylvain

Arnar Birgisson

Aug 10, 2006, 7:16:24 AM
to turbo...@googlegroups.com

That sounds good - also, I'm looking forward to "native" Routes support.

I'm having a hard time finding anything useful on cherrypy.org - where
can I read about upcoming features in CP3?

btw, where can I find decent CP documentation?
http://docs.cherrypy.org/ is not exactly well organized.

Arnar

Sylvain Hellegouarch

Aug 10, 2006, 7:44:41 AM
to turbo...@googlegroups.com

> That sounds good - also, I'm looking forward to "native" Routes support.
>
> I'm having a hard time finding anything useful on cherrpy.org - where
> can I read about upcoming features in CP3?
>
> btw, where can I find decent CP documentation,
> http://docs.cherrypy.org/ is not exactly well organized.

Well, you could find this :)
http://docs.cherrypy.org/writing-your-own-dispatcher

That explains how one could write their own dispatcher using Routes and
use it in CP3.

I am a bit sensitive on the documentation subject. I agree with you: CP
documentation sucks. Big time. I'm saddened by that state as much as you are.

However, http://docs.cherrypy.org/ is there for people to contribute to,
and only a few have done so so far (which I really appreciate). I mean,
it's also up to the community to be active sometimes. I do hope that once
I am finished writing the CherryPy book (which should be published in a
few months) I'll be able to improve the situation, but I do not have the
time for now.

You know, I think the SQLAlchemy and SQLObject documentation sucks a lot
as well, but because I can't contribute I don't judge them ;)

Sorry, it's not personal towards you, Arnar. It's just that the
documentation is open to everybody to improve, but very few people
actually take the time to do it.

- Sylvain

Arnar Birgisson

Aug 10, 2006, 8:03:34 AM
to turbo...@googlegroups.com
On 8/10/06, Sylvain Hellegouarch <s...@defuze.org> wrote:
> Sorry it's not personnal towards you Arnar. It's just that documentation
> is opened to everybody to improve but very few people actually take of
> their time to do it.

None taken :o) I would be happy to contribute if I had the knowledge.

In the meantime, in case someone else is looking, I found this to be an
excellent resource:
http://www.aminus.org/blogs/index.php/fumanchu?cat=64

Arnar

Sylvain Hellegouarch

Aug 10, 2006, 8:12:24 AM
to turbo...@googlegroups.com

Heh - no wonder it is so good, since Robert is the one behind
CherryPy 3 and most of CherryPy 2 :)

- Sylvain
