[pyDAL][FYI] performance notice


Giovanni Barillari

Nov 10, 2015, 6:06:54 AM
to web2py-developers
Hi all,
yesterday Kirill Klenov published an update of his python web framework benchmarks and added weppy to the suite: http://klen.github.io/py-frameworks-bench/

Now, the interesting fact is that it outperforms popular minimalistic frameworks like flask and pyramid on simple JSON requests:

Name         50% (ms)   75% (ms)   Avg (ms)   Req/s   Timeouts
Wheezy.Web   6.9        7.8        6.96       28714
Falcon       8.24       8.39       8.26       24201
Bottle       10.73      11.2       13.39      18193
Weppy        18.37      18.87      18.46      10853
Pyramid      18.57      19.04      46.89      9178
Flask        29.11      29.6       29.38      6831
Django       45.47      46.18      46.61      4330
Aiohttp      46.88      47.47      51.86      4223
Muffin       49.45      49.99      54.22      4013
Tornado      63.33      64.32      63.67      3142

but it performs worse when dealing with database objects, compared to the same frameworks (which use SQLAlchemy):

Name         50% (ms)   75% (ms)   Avg (ms)   Req/s    Timeouts
Aiohttp      64.97      397.81     384        501.5
Muffin       425.5      526.15     761.01     418.6
Wheezy.Web   422.67     436.88     609.16     369.4
Tornado      605.23     619.83     591.05     327.35
Bottle       462.88     481.32     725.3      316.8
Falcon       436.01     457.18     906.96     295.1
Pyramid      491.75     519.03     880.64     265.3
Flask        511.83     535.05     865.54     259.45
Weppy        667        679.66     764.5      258.6
Django       1589.02    2012.29    2294.97    67.9


Well, it's not so bad compared to the others, since the difference isn't huge, but the obvious conclusion is that pyDAL is causing the performance drop, and specifically it is making weppy lose the request-processing advantage we saw without the DAL.
In fact, if pyDAL's performance were comparable to SQLAlchemy's, in the third test weppy would also perform similarly to bottle, and not like flask. That means the req/s with pyDAL are comparable to flask with SQLAlchemy only because weppy handles the other parts of the request flow faster.

All of this to say that I will spend a bit of my time next week analyzing pyDAL's flows in order to understand which parts of the code run slower and, eventually, try to speed those parts up.
Any contribution in this direction would be welcome :)

/Giovanni

Niphlod

Nov 10, 2015, 8:05:32 AM
to web2py-developers
The most obvious cause to me is that pydal doesn't trust the adapter and re-parses the resultset according to the model.

Massimo DiPierro

Nov 10, 2015, 9:15:03 AM
to web2py-d...@googlegroups.com
There should be an option to switch the parsing off. It can be done now, but it is convoluted. We need a simple flag in select: parse=False.
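
For instance, usage of such a flag could look like this (note that parse=False is only the proposed flag, not an existing pyDAL select argument):

# hypothetical: skip Row parsing and return the raw rows from the driver
raw_rows = db(db.Message).select(parse=False)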


Giovanni Barillari

Nov 10, 2015, 9:32:26 AM
to web2py-developers
But why is it the default behavior?

/Giovanni

Massimo DiPierro

Nov 10, 2015, 9:39:22 AM
to web2py-d...@googlegroups.com

Every adapter returns something different for bool, date, datetime, etc. Moreover, they return a list of tuples, not objects. We, like Django, want db-independent objects most of the time.


Giovanni Barillari

Nov 10, 2015, 9:47:26 AM
to web2py-developers
@niphlod @massimo wait, in the test the flask app uses the SQLAlchemy ORM, which loads objects too.
The return values are `Message` instances, not just raw values from the db. So turning off the pyDAL parsing would just mean faking the test. No?

Massimo DiPierro

Nov 10, 2015, 9:49:14 AM
to web2py-d...@googlegroups.com

I have not looked at the details and I am not suggesting we do that. If that is the case, we should investigate more.

Giovanni Barillari

Nov 10, 2015, 9:55:02 AM
to web2py-developers
This is the flask app code from the test:

# (imports and app creation are omitted in the original post; added here for
#  context, DB configuration still omitted)
import flask
from flask_sqlalchemy import SQLAlchemy

app = flask.Flask(__name__)
db = SQLAlchemy(app)

class Message(db.Model):
    __tablename__ = 'message'
    id = db.Column(db.Integer, primary_key=True)
    content = db.Column(db.String(length=512))

@app.route('/complete')
def complete():
    messages = list(Message.query.all())
    messages.append(Message(content='Hello, World!'))
    messages.sort(key=lambda m: m.content)
    return flask.render_template('template.html', messages=messages)

And this is the same code from the weppy app:

# (imports and app setup are omitted in the original post)
db = DAL(app, migrate=False, migrate_enabled=False, pool_size=10)

class Message(Model):
    tablename = 'message'
    content = Field()

db.define_models(Message)

@app.expose(template='template.html')
def complete():
    messages = db(db.Message).select().as_list()
    messages.append(dict(id=None, content='Hello, World!'))
    messages.sort(key=lambda m: m['content'])
    return dict(messages=messages)

Massimo DiPierro

Nov 10, 2015, 10:04:56 AM
to web2py-d...@googlegroups.com

Yep. Looks similar.

Michele Comitini

Nov 10, 2015, 10:37:31 AM
to web2py-developers
What's the reason for pool_size=10?

Massimo DiPierro

Nov 10, 2015, 10:52:57 AM
to web2py-d...@googlegroups.com
This: db(db.Message).select().as_list()
may be the culprit, because we loop over all the records twice:
- select loops over the list of tuples from the db and parses each into a Row object
- as_list loops again and converts the Rows of Row objects into a list of dict objects.

It will be faster if you do db(db.Message).select(cacheable=True)
because it will not create additional unused methods for the Row objects.

It may be faster without the .as_list().

It would be even faster if select had an option to return a list of dicts in the first place, by using an alternative _parse(…) method.
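
A minimal sketch of the variants mentioned above, reusing the db and Message model from the weppy app posted earlier (cacheable=True is an existing select argument; the comments just restate the reasoning, no timings are implied):

rows = db(db.Message).select()            # every record parsed into a Row object
as_dicts = rows.as_list()                 # second loop: each Row converted to a dict

light_rows = db(db.Message).select(cacheable=True)   # Rows without the extra
                                                     # update/delete helpers on each Row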

Massimo

Giovanni Barillari

Nov 10, 2015, 12:05:52 PM
to web2py-developers
> what's the reason of poolsize=10 ?

You should ask the author of the benchmark suite; I just used the same pool size as the other benchmarked frameworks.

On Tuesday, November 10, 2015 at 4:52:57 PM UTC+1, Massimo Di Pierro wrote:
> This: db(db.Message).select().as_list()
> may be the culprit because we loop over all the records twice:
> - select loops over the list of tuples from the db and parses each into a Row object
> - as_list loops again and converts the Rows of Row objects into a list of dict object.
>
> It will be faster if you do b(db.Message).select(cacheable=True)
> because it will not create additional unused methods for the Row objects.
>
> It may be faster without the .as_list()
>
> It would be even faster if select had an option to return a list of dict in the first place by using an alternative _parse(…) method.
>
> Massimo


I did .as_list() because the test sorts the list after adding a new object, but since pyDAL is not an ORM, the two possibilities were to make everything a dict or to create a fake Row object. I picked the first, but maybe the second would be faster.
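
A sketch of the second option (a "fake" Row instead of as_list()), assuming pydal.objects.Row, which is a dict subclass with attribute access; untested against the benchmark:

from pydal.objects import Row

messages = list(db(db.Message).select())                 # keep the Row objects as-is
messages.append(Row(id=None, content='Hello, World!'))   # fake record, no table behind it
messages.sort(key=lambda m: m.content)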
 

Massimo DiPierro

Nov 10, 2015, 12:10:34 PM
to web2py-d...@googlegroups.com
Which DB was used? With SQLite there is no pool_size.

Giovanni Barillari

Nov 10, 2015, 12:11:30 PM
to web2py-developers
Postgres

Niphlod

Nov 10, 2015, 12:36:28 PM
to web2py-developers
Just to check in: I'm not saying that THAT is the root cause, just the most probable one.
BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to give yourself kudos.

Back on the "probable cause"... Working with pydal is a breeze because, e.g., if you have a datetime Field but the backend holds a datetime-like string, pydal doesn't complain.
IMHO (as pointed out many times) the base adapter has TOO many intricacies for dealing with each and every backend that SHOULD be moved/overridden into the specific adapter code. E.g.: even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect a postgres resultset holding native datetimes, checking each and every datetime column for every row.
parse() is quite "intelligent" for being so generic, but it comes at the cost of basically isinstance()ing every value.
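
An illustrative sketch of the cost being described (the function names are made up, this is not pyDAL's actual code): a generic parser has to type-check every single value, while a backend-specific one can trust the driver's native types and skip the work entirely:

import datetime

def generic_parse_value(value, field_type):
    # generic path: inspect every value, even when the driver (e.g. psycopg2)
    # already returned the proper Python type
    if field_type == 'datetime' and isinstance(value, str):
        return datetime.datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    return value

def postgres_parse_value(value, field_type):
    # adapter-specific path: postgres drivers already return datetime objects,
    # so there is nothing to convert
    return value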

Massimo DiPierro

Nov 10, 2015, 12:39:10 PM
to web2py-d...@googlegroups.com
I agree with everything you say.

Massimo

Paolo Valleri

Nov 10, 2015, 1:17:26 PM
to web2py-d...@googlegroups.com
Hi, parse() and parse_value() are the bottlenecks, but as_list() is not performing very well either.
On my pc, with 100,000 records, using psycopg2 I got:
~4s with rows = db(db.tt).select()
~7s with rows = db(db.tt).select().as_list()

By removing `value = value.decode(self.db._db_codec)` from parse_value you can save almost 1s
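
A rough way to reproduce that comparison (assuming a db.tt table already populated with 100,000 rows on PostgreSQL via psycopg2; absolute numbers will obviously vary):

import time

start = time.time()
rows = db(db.tt).select()
print('select():           %.2fs' % (time.time() - start))

start = time.time()
rows = db(db.tt).select().as_list()
print('select().as_list(): %.2fs' % (time.time() - start))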


 Paolo


Giovanni Barillari

Nov 10, 2015, 4:55:18 PM
to web2py-developers


On Tuesday, November 10, 2015 at 6:36:28 PM UTC+1, Niphlod wrote:
> just to check-in, I'm not saying that THAT is the root cause but the most probable one.
> BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to "kudos yourself".

LoL :)
 

> back on the "probable cause" .... Working with pydal is a breeze because e.g. if you have a datetime Field but on the backend is a datetime-like string, pydal doesn't complain.
> IMHO (as pointed out many times) the base adapter has TOO many intricacies to deal with each and different backend that SHOULD be moved/overridden in the specific adapter code. e.g. : even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect postgres resultset holding native datetimes inspecting each and every datetime column for every row.
> parse() is quite "intelligent" for being so generic, but it comes at a cost of basically basically isinstance()ing every value.
>
> Hi, parse() and parse_value() are the bottleneck but also as_list() is not really performing very well.
> On my pc, with 100000 records, using psycopg2 I got:
> ~4s with rows = db(db.tt).select()
> ~7s with rows = db(db.tt).select().as_list()
> By removing '''value = value.decode(self.db._db_codec)''' from parse_value you can save almost 1s
>
> Paolo

Ok, so, a bit of a recap:
- the code of the test can be better
- the parse() method of the base adapter can be optimized with the actual adapter in mind
- the as_list() method can be optimized
- the decoding can be optimized

Considering all of this, I'm going to put forward a proposal I've been putting together lately:
1) Refactor the adapters completely, making BaseAdapter something more like a scaffold, and avoid all the if-blocks that depend on the engine (since that should be handled by subclassing the adapter class)
2) Move all the SQL "strings" out into separate classes called 'dialects', i.e. a BaseDialect class plus all the specific dialect classes -> this would make the adapters cleaner and split the abstraction levels between operations (adapter) and SQL syntax (dialect); see the sketch below
3) Optimize the adapter functions in the specific adapters, taking advantage of the flexibility allowed by the refactored code

If you all agree, I will work on 1 and 2 and leave 3 to "more advanced" developers of the single drivers.
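
To make point 2 concrete, a minimal hypothetical sketch of the adapter/dialect split (BaseDialect and PostgreDialect are not existing pyDAL classes, just an illustration of the idea):

class BaseDialect(object):
    # only SQL syntax lives here
    def insert(self, table, fields, values):
        return "INSERT INTO %s (%s) VALUES (%s);" % (table, fields, values)

class PostgreDialect(BaseDialect):
    # backend-specific syntax goes in a subclass, no engine if-blocks needed
    def insert(self, table, fields, values):
        return "INSERT INTO %s (%s) VALUES (%s) RETURNING id;" % (table, fields, values)

class BaseAdapter(object):
    # only operations live here; SQL generation is delegated to the dialect
    dialect_class = BaseDialect

    def __init__(self):
        self.dialect = self.dialect_class()

    def insert(self, cursor, table, fields, values):
        cursor.execute(self.dialect.insert(table, fields, values))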

/Giovanni

Actually, I would like to propose this for the next month.

Massimo DiPierro

Nov 10, 2015, 5:06:10 PM
to web2py-d...@googlegroups.com
On Nov 10, 2015, at 3:55 PM, Giovanni Barillari <giovanni....@gmail.com> wrote:



> On Tuesday, November 10, 2015 at 6:36:28 PM UTC+1, Niphlod wrote:
> just to check-in, I'm not saying that THAT is the root cause but the most probable one.
> BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to "kudos yourself".

I concur!

> LoL :)
>
> back on the "probable cause" .... Working with pydal is a breeze because e.g. if you have a datetime Field but on the backend is a datetime-like string, pydal doesn't complain.
> IMHO (as pointed out many times) the base adapter has TOO many intricacies to deal with each and different backend that SHOULD be moved/overridden in the specific adapter code. e.g. : even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect postgres resultset holding native datetimes inspecting each and every datetime column for every row.
> parse() is quite "intelligent" for being so generic, but it comes at a cost of basically basically isinstance()ing every value.
>
> Hi, parse() and parse_value() are the bottleneck but also as_list() is not really performing very well.
> On my pc, with 100000 records, using psycopg2 I got:
> ~4s with rows = db(db.tt).select()
> ~7s with rows = db(db.tt).select().as_list()
> By removing '''value = value.decode(self.db._db_codec)''' from parse_value you can save almost 1s
>
> Paolo
>
> Ok so, a bit of recap:
> - the code of the test can be better
> - the parse() method of the base adapter can be optimized considering the real adapter
> - the as_list() method can be optimized
> - the decoding can be optimized

yes


> Considering all of this, I'm gonna propose now a proposal I was packing lately:
> 1) Refactor adapters completely, making the BaseAdapter something more like a scaffold, and trying to avoid all the if-blocks depending on the engine (since this should be handled by the fact of subclassing the adapter class)
> 2) Take out all the SQL "strings" into separated classes called 'dialects', so a BaseDialect class and the all the specific dialect classes -> this would make adapters cleaner and split the abstraction levels between operations (adapter) and sql syntax (dialect)
> 3) Optimize the adapter functions on the specific adapters taking advantage of the refactored code allowing more flexibility

Yes. But let's start from the parse function, since the time to actually generate the SQL is negligible compared to the time to parse the output.
The code that generates the SQL is already very modular and there are very few IF statements.

> If all of you agree, I will work on 1 and 2 and will leave 3 on "more advanced" developers on the single drivers.
>
> /Giovanni
>
> Actually I would like to propose this for the next month
