[pyDAL][FYI] performance notice


Giovanni Barillari

Nov 10, 2015, 6:06:54 AM
to web2py-developers
Hi all,
yesterday Kirill Klenov published an update of his python web framework benchmarks and added weppy to the suite: http://klen.github.io/py-frameworks-bench/

Now, the interesting fact is that it outperforms popular minimalistic frameworks like flask and pyramid on simple JSON requests:

Name         50% (ms)   75% (ms)   Avg (ms)   Req/s   Timeouts
Wheezy.Web   6.9        7.8        6.96       28714
Falcon       8.24       8.39       8.26       24201
Bottle       10.73      11.2       13.39      18193
Weppy        18.37      18.87      18.46      10853
Pyramid      18.57      19.04      46.89      9178
Flask        29.11      29.6       29.38      6831
Django       45.47      46.18      46.61      4330
Aiohttp      46.88      47.47      51.86      4223
Muffin       49.45      49.99      54.22      4013
Tornado      63.33      64.32      63.67      3142

but it performs worse when dealing with database objects, compared to the same frameworks (which use SQLAlchemy):

Name         50% (ms)   75% (ms)   Avg (ms)   Req/s    Timeouts
Aiohttp      64.97      397.81     384        501.5
Muffin       425.5      526.15     761.01     418.6
Wheezy.Web   422.67     436.88     609.16     369.4
Tornado      605.23     619.83     591.05     327.35
Bottle       462.88     481.32     725.3      316.8
Falcon       436.01     457.18     906.96     295.1
Pyramid      491.75     519.03     880.64     265.3
Flask        511.83     535.05     865.54     259.45
Weppy        667        679.66     764.5      258.6
Django       1589.02    2012.29    2294.97    67.9


Well, it's not so bad compared to the others, since the difference isn't huge, but the obvious conclusion is that pyDAL is causing the performance drop, and specifically it is making weppy lose the request-processing advantage we saw without the DAL.
In fact, if pyDAL's performance were comparable to SQLAlchemy's, in the third test weppy would also perform similarly to bottle, and not like flask. That means the req/s with pyDAL are comparable to flask with SQLAlchemy only because weppy handles the other parts of the request flow faster.

All of this to say that I will spend a bit of my time next week analyzing pyDAL's flows in order to understand which parts of the code run slower and, eventually, try to speed those parts up.
Any contribution in this direction would be welcome :)

/Giovanni

Niphlod

Nov 10, 2015, 8:05:32 AM
to web2py-developers
The most obvious cause to me is that pydal doesn't trust the adapter and re-parses the resultset according to the model.

Massimo DiPierro

Nov 10, 2015, 9:15:03 AM
to web2py-d...@googlegroups.com
There should be an option to switch the parsing off. It can be done now, but it is convoluted. We need a simple flag in select: parse=False.
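
For instance, usage of such a flag could look like this (note that parse=False is only the proposed flag, not an existing pyDAL select argument):

# hypothetical: skip Row parsing and return the raw rows from the driver
raw_rows = db(db.Message).select(parse=False)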


Giovanni Barillari

Nov 10, 2015, 9:32:26 AM
to web2py-developers
But why is it the default behavior?

/Giovanni

Massimo DiPierro

Nov 10, 2015, 9:39:22 AM
to web2py-d...@googlegroups.com

Every adapter returns something different for bool, date, datetime, etc. Moreover, they return a list of tuples, not objects. We, like Django, want db-independent objects most of the time.


Giovanni Barillari

Nov 10, 2015, 9:47:26 AM
to web2py-developers
@niphlod @massimo wait, in the test the flask app uses the SQLAlchemy ORM, which loads objects too.
The return values are `Message` instances, not just raw values from the db. So turning off the pyDAL parsing would just mean faking the test. No?

Massimo DiPierro

Nov 10, 2015, 9:49:14 AM
to web2py-d...@googlegroups.com

I have not looked at the details and I am not suggesting we do that. If that is the case, we should investigate more.

Giovanni Barillari

Nov 10, 2015, 9:55:02 AM
to web2py-developers
This is the flask app code from the test:

# (imports and app creation are omitted in the original post; added here for
#  context, DB configuration still omitted)
import flask
from flask_sqlalchemy import SQLAlchemy

app = flask.Flask(__name__)
db = SQLAlchemy(app)

class Message(db.Model):
    __tablename__ = 'message'
    id = db.Column(db.Integer, primary_key=True)
    content = db.Column(db.String(length=512))

@app.route('/complete')
def complete():
    messages = list(Message.query.all())
    messages.append(Message(content='Hello, World!'))
    messages.sort(key=lambda m: m.content)
    return flask.render_template('template.html', messages=messages)

And this is the same code from the weppy app:

# (imports and app setup are omitted in the original post)
db = DAL(app, migrate=False, migrate_enabled=False, pool_size=10)

class Message(Model):
    tablename = 'message'
    content = Field()

db.define_models(Message)

@app.expose(template='template.html')
def complete():
    messages = db(db.Message).select().as_list()
    messages.append(dict(id=None, content='Hello, World!'))
    messages.sort(key=lambda m: m['content'])
    return dict(messages=messages)

Massimo DiPierro

Nov 10, 2015, 10:04:56 AM
to web2py-d...@googlegroups.com

Yep. Looks similar.

Michele Comitini

Nov 10, 2015, 10:37:31 AM
to web2py-developers
What's the reason for pool_size=10?

Massimo DiPierro

Nov 10, 2015, 10:52:57 AM
to web2py-d...@googlegroups.com
This: db(db.Message).select().as_list()
may be the culprit, because we loop over all the records twice:
- select loops over the list of tuples from the db and parses each into a Row object
- as_list loops again and converts the Rows of Row objects into a list of dict objects.

It will be faster if you do db(db.Message).select(cacheable=True)
because it will not create additional unused methods for the Row objects.

It may be faster without the .as_list().

It would be even faster if select had an option to return a list of dicts in the first place, by using an alternative _parse(…) method.
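
A minimal sketch of the variants mentioned above, reusing the db and Message model from the weppy app posted earlier (cacheable=True is an existing select argument; the comments just restate the reasoning, no timings are implied):

rows = db(db.Message).select()            # every record parsed into a Row object
as_dicts = rows.as_list()                 # second loop: each Row converted to a dict

light_rows = db(db.Message).select(cacheable=True)   # Rows without the extra
                                                     # update/delete helpers on each Row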

Massimo

Giovanni Barillari

Nov 10, 2015, 12:05:52 PM
to web2py-developers
> what's the reason of poolsize=10 ?

You should ask the author of the benchmark suite; I just used the same pool size as the other benchmarked frameworks.

On Tuesday, November 10, 2015 at 4:52:57 PM UTC+1, Massimo Di Pierro wrote:
> This: db(db.Message).select().as_list()
> may be the culprit because we loop over all the records twice:
> - select loops over the list of tuples from the db and parses each into a Row object
> - as_list loops again and converts the Rows of Row objects into a list of dict object.
>
> It will be faster if you do b(db.Message).select(cacheable=True)
> because it will not create additional unused methods for the Row objects.
>
> It may be faster without the .as_list()
>
> It would be even faster if select had an option to return a list of dict in the first place by using an alternative _parse(…) method.
>
> Massimo


I did .as_list() because the test sorts the list after adding a new object, but since pyDAL is not an ORM, the two possibilities were to make everything a dict or to create a fake Row object. I picked the first, but maybe the second would be faster.
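
A sketch of the second option (a "fake" Row instead of as_list()), assuming pydal.objects.Row, which is a dict subclass with attribute access; untested against the benchmark:

from pydal.objects import Row

messages = list(db(db.Message).select())                 # keep the Row objects as-is
messages.append(Row(id=None, content='Hello, World!'))   # fake record, no table behind it
messages.sort(key=lambda m: m.content)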
 

Massimo DiPierro

Nov 10, 2015, 12:10:34 PM
to web2py-d...@googlegroups.com
Which DB was used? With SQLite there is no pool_size.

Giovanni Barillari

Nov 10, 2015, 12:11:30 PM
to web2py-developers
Postgres

Niphlod

Nov 10, 2015, 12:36:28 PM
to web2py-developers
Just to check in: I'm not saying that THAT is the root cause, just the most probable one.
BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to give yourself kudos.

Back on the "probable cause"... Working with pydal is a breeze because, e.g., if you have a datetime Field but the backend holds a datetime-like string, pydal doesn't complain.
IMHO (as pointed out many times) the base adapter has TOO many intricacies for dealing with each and every backend that SHOULD be moved/overridden into the specific adapter code. E.g.: even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect a postgres resultset holding native datetimes, checking each and every datetime column for every row.
parse() is quite "intelligent" for being so generic, but it comes at the cost of basically isinstance()ing every value.
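
An illustrative sketch of the cost being described (the function names are made up, this is not pyDAL's actual code): a generic parser has to type-check every single value, while a backend-specific one can trust the driver's native types and skip the work entirely:

import datetime

def generic_parse_value(value, field_type):
    # generic path: inspect every value, even when the driver (e.g. psycopg2)
    # already returned the proper Python type
    if field_type == 'datetime' and isinstance(value, str):
        return datetime.datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    return value

def postgres_parse_value(value, field_type):
    # adapter-specific path: postgres drivers already return datetime objects,
    # so there is nothing to convert
    return value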

Massimo DiPierro

Nov 10, 2015, 12:39:10 PM
to web2py-d...@googlegroups.com
I agree with everything you say.

Massimo

Paolo Valleri

Nov 10, 2015, 1:17:26 PM
to web2py-d...@googlegroups.com
Hi, parse() and parse_value() are the bottlenecks, but as_list() is not performing very well either.
On my pc, with 100,000 records, using psycopg2 I got:
~4s with rows = db(db.tt).select()
~7s with rows = db(db.tt).select().as_list()

By removing `value = value.decode(self.db._db_codec)` from parse_value you can save almost 1s
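
A rough way to reproduce that comparison (assuming a db.tt table already populated with 100,000 rows on PostgreSQL via psycopg2; absolute numbers will obviously vary):

import time

start = time.time()
rows = db(db.tt).select()
print('select():           %.2fs' % (time.time() - start))

start = time.time()
rows = db(db.tt).select().as_list()
print('select().as_list(): %.2fs' % (time.time() - start))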


 Paolo


Giovanni Barillari

Nov 10, 2015, 4:55:18 PM
to web2py-developers


On Tuesday, November 10, 2015 at 6:36:28 PM UTC+1, Niphlod wrote:
> just to check-in, I'm not saying that THAT is the root cause but the most probable one.
> BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to "kudos yourself".

LoL :)
 

> back on the "probable cause" .... Working with pydal is a breeze because e.g. if you have a datetime Field but on the backend is a datetime-like string, pydal doesn't complain.
> IMHO (as pointed out many times) the base adapter has TOO many intricacies to deal with each and different backend that SHOULD be moved/overridden in the specific adapter code. e.g. : even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect postgres resultset holding native datetimes inspecting each and every datetime column for every row.
> parse() is quite "intelligent" for being so generic, but it comes at a cost of basically basically isinstance()ing every value.
>
> Hi, parse() and parse_value() are the bottleneck but also as_list() is not really performing very well.
> On my pc, with 100000 records, using psycopg2 I got:
> ~4s with rows = db(db.tt).select()
> ~7s with rows = db(db.tt).select().as_list()
> By removing '''value = value.decode(self.db._db_codec)''' from parse_value you can save almost 1s
>
> Paolo

Ok, so, a bit of a recap:
- the code of the test can be better
- the parse() method of the base adapter can be optimized with the actual adapter in mind
- the as_list() method can be optimized
- the decoding can be optimized

Considering all of this, I'm going to put forward a proposal I've been putting together lately:
1) Refactor the adapters completely, making BaseAdapter something more like a scaffold, and avoid all the if-blocks that depend on the engine (since that should be handled by subclassing the adapter class)
2) Move all the SQL "strings" out into separate classes called 'dialects', i.e. a BaseDialect class plus all the specific dialect classes -> this would make the adapters cleaner and split the abstraction levels between operations (adapter) and SQL syntax (dialect); see the sketch below
3) Optimize the adapter functions in the specific adapters, taking advantage of the flexibility allowed by the refactored code

If you all agree, I will work on 1 and 2 and leave 3 to "more advanced" developers of the single drivers.
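
To make point 2 concrete, a minimal hypothetical sketch of the adapter/dialect split (BaseDialect and PostgreDialect are not existing pyDAL classes, just an illustration of the idea):

class BaseDialect(object):
    # only SQL syntax lives here
    def insert(self, table, fields, values):
        return "INSERT INTO %s (%s) VALUES (%s);" % (table, fields, values)

class PostgreDialect(BaseDialect):
    # backend-specific syntax goes in a subclass, no engine if-blocks needed
    def insert(self, table, fields, values):
        return "INSERT INTO %s (%s) VALUES (%s) RETURNING id;" % (table, fields, values)

class BaseAdapter(object):
    # only operations live here; SQL generation is delegated to the dialect
    dialect_class = BaseDialect

    def __init__(self):
        self.dialect = self.dialect_class()

    def insert(self, cursor, table, fields, values):
        cursor.execute(self.dialect.insert(table, fields, values))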

/Giovanni

Actually, I would like to propose this for the next month.

Massimo DiPierro

Nov 10, 2015, 5:06:10 PM
to web2py-d...@googlegroups.com
On Nov 10, 2015, at 3:55 PM, Giovanni Barillari <giovanni....@gmail.com> wrote:



> On Tuesday, November 10, 2015 at 6:36:28 PM UTC+1, Niphlod wrote:
> just to check-in, I'm not saying that THAT is the root cause but the most probable one.
> BTW: being 1 req/sec "behind" flask+sqlalchemy is a hell of a result, so take a minute to "kudos yourself".

I concur!

> LoL :)
>
> back on the "probable cause" .... Working with pydal is a breeze because e.g. if you have a datetime Field but on the backend is a datetime-like string, pydal doesn't complain.
> IMHO (as pointed out many times) the base adapter has TOO many intricacies to deal with each and different backend that SHOULD be moved/overridden in the specific adapter code. e.g. : even if for historical reasons sqlite stores dates as strings, that's not a valid reason for parse() to inspect postgres resultset holding native datetimes inspecting each and every datetime column for every row.
> parse() is quite "intelligent" for being so generic, but it comes at a cost of basically basically isinstance()ing every value.
>
> Hi, parse() and parse_value() are the bottleneck but also as_list() is not really performing very well.
> On my pc, with 100000 records, using psycopg2 I got:
> ~4s with rows = db(db.tt).select()
> ~7s with rows = db(db.tt).select().as_list()
> By removing '''value = value.decode(self.db._db_codec)''' from parse_value you can save almost 1s
>
> Paolo
>
> Ok so, a bit of recap:
> - the code of the test can be better
> - the parse() method of the base adapter can be optimized considering the real adapter
> - the as_list() method can be optimized
> - the decoding can be optimized

yes


> Considering all of this, I'm gonna propose now a proposal I was packing lately:
> 1) Refactor adapters completely, making the BaseAdapter something more like a scaffold, and trying to avoid all the if-blocks depending on the engine (since this should be handled by the fact of subclassing the adapter class)
> 2) Take out all the SQL "strings" into separated classes called 'dialects', so a BaseDialect class and the all the specific dialect classes -> this would make adapters cleaner and split the abstraction levels between operations (adapter) and sql syntax (dialect)
> 3) Optimize the adapter functions on the specific adapters taking advantage of the refactored code allowing more flexibility

Yes. But let's start from the parse function, since the time to actually generate the SQL is negligible compared to the time to parse the output.
The code that generates the SQL is already very modular and there are very few IF statements.

> If all of you agree, I will work on 1 and 2 and will leave 3 on "more advanced" developers on the single drivers.
>
> /Giovanni
>
> Actually I would like to propose this for the next month
