Making the DAL orders-of-magnitude faster...


Arnon Marcus

Nov 19, 2013, 3:00:24 PM
to web...@googlegroups.com
We recently did some profiling on some slow queries, and found out something pretty surprising:
The parsing of the result-set into a Rows object incurs an overhead that is many times the duration of the actual query (including the entire round-trip to the database).

We are using web2py 2.7.2 and a PostgreSQL database (9.3x64 on CentOS-6.4x64) using python 2.6x64 (on Win7x64) and profiling with cProfile and RunSnakeRun.

It's a moderately complex query returning a result-set of about 1000 rows in a single select.
Here is the result:



The time it takes the database to fetch them, is 0.02s.
The entire time of the select, including the parse, is about 0.79s (In other tests it was 1.07s).
It is just an example, but we had queries that took around 5.x-9.x seconds (!) in which the database round-trip was a mere 0.04s...

This is an order-of-magnitude difference between the parsing and the actual query.
The gap gets worse the more results there are in the result-set.

Now, in any discussion about the performance of server-side web-frameworks, you almost unanimously hear this argument:
"...the database always ends up being the bottleneck anyways..."
So, this is not supposed to happen, right?
Either we have a really slow framework, or a really fast database...

I then did a little digging, to see if we could use this to our advantage (since we rarely have a need for complex usage-patterns of the Rows object anyways...)

First thing I did was to see if the executesql method could return something just a bit more useful than it usually does when using as_dict=True on it - I changed this:
    def executesql(self, query, placeholders=None, as_dict=False,
                   fields=None, colnames=None):
        ....
        if as_dict:
            if not hasattr(adapter.cursor, 'description'):
                raise RuntimeError("database does not support executesql(...,as_dict=True)")
            columns = adapter.cursor.description
            fields = [f[0] for f in columns]
            data = adapter._fetchall()
            return [dict(zip(fields, row)) for row in data]
into this:
    def executesql(self, query, placeholders=None, as_dict=False,
                   fields=None, colnames=None):
        ....
        if as_dict:
            if not hasattr(adapter.cursor, 'description'):
                raise RuntimeError("database does not support executesql(...,as_dict=True)")
            columns = adapter.cursor.description
            fields = colnames or [f[0] for f in columns]
            data = adapter._fetchall()
            return [dict(zip(fields, row)) for row in data]
1 line of code...

This single change makes the function able to return a list of dictionaries, each with practically-usable keys.
We went with a key-naming of '<TableName>.<FieldName>', as in the actual SQL string.
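To see what the patched line buys you, here is the effect sketched with plain data (the variable names below are illustrative stand-ins, not web2py API):

```python
# With colnames supplied, each row is keyed by 'Table.field' instead of
# whatever bare name the cursor reports.
row_data = [(1, 'alice'), (2, 'bob')]
cursor_fields = ['id', 'name']           # what cursor.description would yield
colnames = ['person.id', 'person.name']  # caller-supplied names

fields = colnames or cursor_fields       # the patched line
records = [dict(zip(fields, row)) for row in row_data]
# records[0] == {'person.id': 1, 'person.name': 'alice'}
```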
Then I thought, why not use the SQL string itself?
And thus the quickSelect method was born into the Set class:

    def select(self, *fields, **attributes):
        adapter = self.db._adapter
        tablenames = adapter.tables(self.query,
                                    attributes.get('join', None),
                                    attributes.get('left', None),
                                    attributes.get('orderby', None),
                                    attributes.get('groupby', None))
        fields = adapter.expand_all(fields, tablenames)
        return adapter.select(self.query, fields, attributes)

    ###############################################################

    def quickSelect(self, *fields, **attributes):
        query = self._select(*fields, **attributes)
        colnames = query.split(' FROM')[0][7:-1].split(', ')
        return self._db.executesql(query, colnames=colnames, as_dict=True)

    ###############################################################

    def nested_select(self, *fields, **attributes):
        return Expression(self.db, self._select(*fields, **attributes))
3 lines of code... Could even be shortened to 2...
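The colname extraction in quickSelect is just string surgery on the generated SQL. A small sketch, assuming the SQL string has a trailing space before FROM (as the web2py version above apparently produces; with no trailing space the slice would be [7:] instead of [7:-1]):

```python
# Derive 'table.field' column names from a generated SELECT statement.
query = "SELECT person.id, person.name  FROM person WHERE (person.id > 0);"
colnames = query.split(' FROM')[0][7:-1].split(', ')
# colnames == ['person.id', 'person.name']
```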

What was the outcome?
Well, let's see:

Voila!
0.04s for a practically-usable list of dictionaries(!)
An order-of-magnitude improvement in less than 5 lines of code...

I thought I could do better: wrap the result-set myself with a simple class and inject the dictionaries into the object.__dict__ attribute, so I could emulate, and perhaps even replace in-place, the "real" select() method itself (including support for JOINs).
This way, I could leave all our app-code unchanged:

class SimpleObject:
    def __init__(self, attributes=None):
        self.__dict__ = attributes


class Set(object):
    ...
    def select(self, *fields, **attributes):
        query = self._select(*fields, **attributes)
        colnames = query.split(' FROM')[0][7:-1].split(', ')
        tableNames = [colname.split('.')[0] for colname in colnames]
        uniqueTableNames = set(tableNames)
        records = self._db.executesql(query, colnames=colnames, as_dict=True)
        if len(uniqueTableNames) == 1:  # Single table
            tableNameLength = len(tableNames[0]) + 1
            return [
                SimpleObject(dict(
                    (key[tableNameLength:-1], value)
                    for key, value in record.iteritems()
                ))
                for record in records
            ]
        else:  # Multiple tables (JOIN)
            # A dictionary mapping table-names to their length+1,
            # for later "easy" splicing of the key-names
            tableNamesLength = dict(
                (tableName, len(tableName) + 1)
                for tableName in uniqueTableNames
            )
            return [
                SimpleObject(dict(
                    (tableName, SimpleObject(dict(
                        (key[tableNamesLength[tableName]:-1], value)
                        for key, value in record.iteritems()
                        if key[:tableNamesLength[tableName] - 1] == tableName
                    )))
                    for tableName in tableNames
                ))
                for record in records
            ]

But that didn't work as I expected... I got an even worse result than the original web2py Rows parsing...
Tried to remove the SimpleObject class from the equation, and just return nested-dictionaries - No luck... Same result...
So I guess the Rows parsing isn't "slow" after all, as it's doing a better job than I can...

However, I still think there is room for having my quickSelect() method, in conjunction with the small addition to the executesql() method - for people who want to dramatically speed up their application, and are willing to pay the price of getting back a flat list of dictionaries with "table.field" key-names.

We're finding it potentially useful enough that we are prepared to experiment with this across our entire app...
The speed-bump is just too impressive (and tempting) to overlook...

What say you?

Arnon Marcus

Nov 19, 2013, 3:03:34 PM
to web...@googlegroups.com
Can the conversion (Rows parsing) be somehow sped-up by C-compiling it?
This could be "super"...
Anybody knows Cython...?
:) 

Anthony

Nov 19, 2013, 3:24:52 PM
to web...@googlegroups.com
Note, you can do:

db(query).select(..., processor=myprocessor)

where myprocessor is a function that takes rows, fields, and colnames arguments. Perhaps, as you suggest, we could include a built-in quick processor that simply generates a list of dictionaries or Storage objects. We might also include options to turn various features on and off (e.g., turn on parsing of dates, but turn off generation of Reference objects for reference fields).
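A minimal sketch of such a processor (illustrative; the signature follows the rows/fields/colnames arguments described above, and the simulated call stands in for how the DAL would invoke it):

```python
# Skip Rows/Row construction entirely: just zip column names onto values.
def simple_processor(rows, fields, colnames):
    return [dict(zip(colnames, row)) for row in rows]

# Simulated invocation with fake raw rows, the way the DAL would call it:
raw = [(1, 'alice'), (2, 'bob')]
records = simple_processor(raw, None, ['person.id', 'person.name'])
# records[0]['person.name'] == 'alice'
```

In an app this would be passed as `db(query).select(..., processor=simple_processor)`.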

Note, you might also see if .select(..., cacheable=True) speeds things up a bit.

Anthony

Anthony

Nov 19, 2013, 3:31:01 PM
to web...@googlegroups.com
You can also always do:

db.executesql(db(query)._select(), as_dict=True)

to get just a list of dictionaries for any DAL select. Allowing custom column names, as you suggest, sounds like a good idea.

Anthony

Niphlod

Nov 19, 2013, 3:45:52 PM
to web...@googlegroups.com
if only those announcements came with actual reproducible tests..... life would be easier for us :-P
I'd guess your usecase is either 1k rows with 50 columns or a really slow python server, because I can't reproduce those kinds of "advertised magnitude gains" - just a 4x, which is quite expected since Rows carries around lots of helpers.

CPython 2.7.3
speed_executesql() total=0.846748 avg=0.008467
speed_normal_rows() total=4.214647 avg=0.042146
speed_rows_with_cacheable() total=3.705196 avg=0.037052
speed_executesql_with_colnames() total=0.844287 avg=0.008443
speed_rows_with_simple_processor() total=0.891577 avg=0.008916

pypy 1.9.0
speed_executesql() total=3.344496 avg=0.033445
speed_normal_rows() total=3.832903 avg=0.038329
speed_rows_with_cacheable() total=3.472066 avg=0.034721
speed_executesql_with_colnames() total=2.808536 avg=0.028085
speed_rows_with_simple_processor() total=3.268108 avg=0.032681

Attached the test code.

test_executesql.py

Arnon Marcus

Nov 19, 2013, 5:16:37 PM
to
10x Niphlod, I'll check that tomorrow...

I can provide any code you like - this is solid and consistent.

I am running this on my desktop-workstation which runs windows7x64.
It's an Intel Core-i7 870 w/ 8GB RAM, running Python 2.6x64 via a simple command-line gevent server...
Don't think I have a problem there...

The database itself sits as a stand-alone service on an old "boxx" pizza-box-like server on our server-room, running nothing but PostgreSQL 9.3 (x64) and PG-Bouncer (for connection pooling) - on a minimalist CentOS 6.4 (x64) installation.
I optimized it as far as I can: with the data weighing less than 100MB total, I have the data-folder sitting on a mounted "tmpfs" of 3GB (a-la RAMDISK), and the schema is heavily indexed, so the queries are as fast as they can be.

As I said, it may be a case of simply having a very fast database-story...

"I'd guess your usecase is either 1k rows with 50"
That's exactly what I said - yes, it's a ~1K rows query, of a JOIN of 2 tables, selecting for about 20 columns.
That is our use-case - it can't be any different (at least not easily...)

We are not really using this version of web2py - I just did that for testing and posting here.
But we initially encountered this case using a very old version of web2py (1.8.95) - in which the exact numbers were:

904 rows in the result-set:
- Regular select() : 2.241 seconds
- In it, the portion that the executesql took: 0.023 seconds.
That is roughly a x44 times gap...

And as I said - Trying to convert the result-set into a simple list of simple-class-instances, yielded similar results.
Which made it conclusive: the overhead was in the object-instantiation and other plumbing-related stuff like "__getitem__" and ".iteritems()" and the like...

I think the test-code you gave here is interesting - but to make it closer to our use-case, I would try a more complex query.
First, have 2 tables with a foreign-key linking them.
Second, they should each have about ~20+ fields of varying types.
Then the query should be a join of them, getting most/all columns.

What I noticed was that a "flat" query (one with no JOINs) may end up being MUCH "lighter" to parse into Rows (or Rows-like) objects than a JOINed one, as there are nested loops involved in building the intermediary table-representing objects, which then have to have the result-set filtered into them according to each colname's "table-name" portion.
i.e:
Let's say we have 3 tables:
-  Budgets:
      name (string)
      project (foreign-key to Projects)
-  Projects:
      name (string)
      client (foreign-key to Clients)
-  Clients:
      name (string)

Now, if we JOIN them in a query, each result in the result-set would have to be segregated into either of the 3 table-representational-objects in each Row object.

A colname-attached result-record may look like this:
{
'Budgets.name':'My budget',
'Budgets.project':742,
'Projects.name':'My project',
'Projects.client':445,
'Clients.name':'My client'
}

A Row object's general structure would have to end up looking like this:
{
 'Budgets': {
     'name': 'My budget',
     'project': 742
 },
 'Projects': {
     'name': 'My project',
     'client': 445
 },
 'Clients': {
     'name': 'My client'
 }
}

Now, multiply the tables and/or columns, and you already have a very different story of record-filtering/segregation and object-instantiation...
Very different than the test-code you posted here.

If you then multiply that by a 1000, you may end up with 4000 object-instantiations just for this simple example (for each Row object, of which there would be a 1000, there would be another 3 objects - meaning, another 3000 objects...). And that is just a join of 3 tables - we have many queries much, much more complex than that...
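The segregation step described above can be sketched in a few lines of plain Python (an illustrative stand-in for what the Rows parser has to do per record, not web2py's actual code):

```python
# Turn one flat colname-keyed record into a nested per-table dict.
def segregate(record):
    nested = {}
    for key, value in record.items():
        table, field = key.split('.', 1)
        nested.setdefault(table, {})[field] = value
    return nested

flat = {
    'Budgets.name': 'My budget',
    'Budgets.project': 742,
    'Projects.name': 'My project',
    'Projects.client': 445,
    'Clients.name': 'My client',
}
# segregate(flat)['Budgets']['name'] == 'My budget'
```

Even this dict-only version does one split plus one lookup per cell; the real parser additionally instantiates an object per table per record, which is where the count above comes from.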

The interesting (and somewhat disappointing) factoid I got from your results, was that PyPy was just as "slow" in un-parsed mode... ;)

Michele Comitini

Nov 19, 2013, 5:27:23 PM
to web...@googlegroups.com
Use the profiler to make your tests even more effective



--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Arnon Marcus

Nov 19, 2013, 5:54:30 PM
to
You can also always do:
db.executesql(db(query)._select(), as_dict=True)
to get just a list of dictionaries for any DAL select. Allowing custom column names, as you suggest, sounds like a good idea.

Well, that's basically what my suggested quickSelect() method is doing...
But as you said - out of the box, there is a 'colnames' argument that is disregarded inside the 'as_dict' if-block, so you can only get a practically-unusably-organized pile of values in each record of the record-set... Having the table+column names, ordered as they were in the query-string itself, as the mapping-source for the dictionary-keys of each record, is what I am really proposing here - and it is SO simple and short to implement...

As for whether it should be an argument of the .select() method or a separate method - that's generally a matter of personal taste, I think. I initially had a 'quick' default-argument added to the original .select() method, and had that popped out of the **attributes dictionary with a falsy default...

That's a minor API-structure detail; I personally don't care that much either way...

I DO, however, recoil from the idea of having all of the selects in my application look like:
db.executesql(db(query)._select(<my-selection-attrs>), as_dict=True, colnames=<some-list-of-column-and-field-names-from-somewhere>)
...
That's just way too ugly for my taste...

Arnon Marcus

Nov 19, 2013, 7:26:10 PM
to web...@googlegroups.com


On Wednesday, November 20, 2013 12:27:23 AM UTC+2, Michele Comitini wrote:
Use the profiler to make your tests even more effective


Well, I AM using cProfile, which comes with Python...
Not sure this comment was directed to me, or Niphlod...

I wasn't using 'timeit', though, so that might have something to do with the discrepancy...

Generally, there is no perfect-profiler... They all 'lie' differently, and to different degrees... It's all 'Heisenberg' stuff to some degree or another, as any timing code is a change to the execution-logic...

cProfile is a C-level instrumentation of the actual CPython interpreter - as far as I understand...

'timeit' is just a Python module in the standard library...
You have to code the setup/execution/tear-down structure yourself, as Niphlod did - cProfile just lets you run your code unchanged...

There are overheads to each, and they might be different, so that may help explain the different results we are getting, but I'm still not convinced that this is the largest influence on the discrepancy. I think it has more to do with the actual structure of the use-case.

If you have a simple app with a simple schema/query-patterns, small-payload queries, or both, then yeah, the parsing overhead would probably seem negligible - but once your data-structures, schema, query-patterns and payloads start inflating, these issues start to emerge.

If I am misinterpreting the test-results, misunderstanding the implications, or in some way not reading correctly what's going on, I'd love to be corrected - but this is how it appears to me now.

Arnon Marcus

Nov 20, 2013, 6:04:35 AM
to web...@googlegroups.com
I get these results:

C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql.py
Times are for 100 iterations
Preparing....
started bench
speed_executesql() total=1.132703 avg=0.011327
speed_normal_rows() total=5.320367 avg=0.053204
speed_rows_with_cacheable() total=5.349665 avg=0.053497
speed_executesql_with_colnames() total=1.083485 avg=0.010835
speed_rows_with_simple_processor() total=1.202628 avg=0.012026
Process finished with exit code 0

Pretty similar results...
Although a little slower overall, same gaps appear.


Then I tested it with timeit using your approach:

Times are for 10 iterations
Preparing....
started bench
speed_normal_rows() total=47.711369 avg=4.771137
speed_executesql_with_colnames() total=0.513249 avg=0.051325

That is a ~x93 (!) speed-up.
Almost a full 2 orders of magnitude.

Paolo Valleri

Nov 20, 2013, 6:19:16 AM
to web...@googlegroups.com
How should I test 'your approach'? Would you share a patch?
Please, compare the two tests with the same number of iterations.

Paolo

Niphlod

Nov 20, 2013, 6:47:47 AM
to web...@googlegroups.com


On Tuesday, November 19, 2013 11:12:52 PM UTC+1, Arnon Marcus wrote:
10x Niphlod, I'll check that tomorrow...

I can provide any code you like - this is solid and consistent.

Exactly what I'm asking: a self-contained testbed to reproduce (including data, models and such).
 

The interesting (and somewhat disappointing) factoid I got from your results, was that PyPy was just as "slow" in un-parsed mode... ;)

pypy isn't a silver bullet for everything. 

BTW, lots of the time is spent actually interacting with SQLite, so the fact that a pure python library on pypy catches up with the C library shipped with python is quite itself the proof that pypy is a reeeeaally good project.

BTW2: look at the "with_simple_processor" code. It does what your patch does, but in one line, and it works for any select().

Arnon Marcus

Nov 20, 2013, 8:32:12 AM
to web...@googlegroups.com
Here we go:

I edited the original test-code that Niphlod posted, just enough to put my hypothesis to the test.
Here are the results:

C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql.py
Times are for 100 iterations
Preparing....

started bench
speed_executesql() total=0.920402 avg=0.009204
speed_normal_rows() total=20.190159 avg=0.201902
speed_rows_with_cacheable() total=18.954632 avg=0.189546
speed_executesql_with_colnames() total=1.070893 avg=0.010709
speed_rows_with_simple_processor() total=1.118015 avg=0.011180



Process finished with exit code 0


I left in the addition of 'colnames or [...]' to the 'if as_dict:' block of executesql() in dal.py, so it actually uses the colnames that are given; otherwise, in Niphlod's original test-code, it was disregarded (as 'as_dict=True' is passed in).
So, the 'speed_executesql_with_colnames()' is actually doing what it should in this test-run.

Let's see:
speed_executesql_with_colnames is about x20 times faster than speed_normal_rows in this run...

You can test it out yourself - here is the code:

test_executesql.py

Paolo Valleri

Nov 20, 2013, 8:52:42 AM
to web...@googlegroups.com
Hi, below my results:
Times are for 100 iterations
Preparing....
started bench
speed_executesql() total=0.876057 avg=0.008761
speed_normal_rows() total=15.076992 avg=0.150770
speed_rows_with_cacheable() total=13.477355 avg=0.134774
speed_executesql_with_colnames() total=0.989993 avg=0.009900
speed_rows_with_simple_processor() total=1.084824 avg=0.010848


Mind that your code doesn't work properly, I got the error: "sqlite3.IntegrityError: foreign key constraint failed". Find attached an updated version.

Paolo
test_executesql_0.1.py

Arnon Marcus

Nov 20, 2013, 9:45:37 AM
to web...@googlegroups.com
Didn't have that bug, but you are correct - the fix makes it safer.

So, you are getting a x15 times performance-boost - interesting...

BTW: For anyone not checking the code, it is now creating 2 tables of 10 fields each (simple string fields), and another field for the second-table, linking it to the first (foreign key).
It then inserts a 1000 records to each table - then queries all records with a JOIN query.
Pretty straight-forward.
I suppose if the fields were more varied in type, the gap would be even bigger.

I am doing some more tests with a more customizable version of the code:

Let's see what happens when we increase the record-count 10 fold (210k values):

Times are for 100 iterations, of querying for 10000 results, each with 21 values
Preparing....
started bench
speed_executesql() total=11.081626 avg=0.110816
speed_normal_rows() total=204.889276 avg=2.048893
speed_rows_with_cacheable() total=190.929696 avg=1.909297
speed_executesql_with_colnames() total=13.822615 avg=0.138226
speed_rows_with_simple_processor() total=13.638187 avg=0.136382

Hmm... Ok, not much different - about x15.9 times faster.

Now, let's increase the column-count 10 times (same total of 210k values):

Times are for 100 iterations, of querying for 1000 results, each with 201 values
Preparing....
started bench
speed_executesql() total=10.222960 avg=0.102230
speed_normal_rows() total=159.059873 avg=1.590599
speed_rows_with_cacheable() total=157.991245 avg=1.579912
speed_executesql_with_colnames() total=10.322220 avg=0.103222
speed_rows_with_simple_processor() total=10.510867 avg=0.105109

Same thing, x15.9 times faster...

Hmm... I'll try to customize the table-count...
test_executesql_0.2.py

Arnon Marcus

Nov 20, 2013, 12:12:00 PM
to web...@googlegroups.com
I have elaborated the test-script - this time, with a customizable number of chained tables.
I went all the way, to test the hypothesis that it is the JOINs that are the culprit.
Let's see how we do with a single record, joining 50 tables of 1 field each + a foreign-key field (except the first table):

C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 50 tables, of querying for 1 results, each with 99 values
started bench
speed_executesql() total=0.019787 avg=0.000198
speed_normal_rows() total=0.731611 avg=0.007316
speed_rows_with_cacheable() total=0.669495 avg=0.006695
speed_executesql_with_colnames() total=0.016620 avg=0.000166
speed_rows_with_simple_processor() total=0.364101 avg=0.003641



Process finished with exit code 0



Again - a x44 times increase in performance - quite consistent.

Now let's see a more modest and realistic example - 5 tables of 10 fields each - 1k records query:

C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 5 tables, of querying for 1000 results, each with 54 values
started bench
speed_executesql() total=2.619404 avg=0.026194
speed_normal_rows() total=52.479113 avg=0.524791
speed_rows_with_cacheable() total=48.437764 avg=0.484378
speed_executesql_with_colnames() total=2.789842 avg=0.027898
speed_rows_with_simple_processor() total=2.865223 avg=0.028652


This is x19 times speed-improvement between speed_executesql_with_colnames and speed_normal_rows
This is x17 times speed-improvement between ..simple_processor and ..rows_with_cacheable

Here is the file:
test_executesql_0.3.py

Arnon Marcus

Nov 20, 2013, 12:40:48 PM
to web...@googlegroups.com
Everyone:

You may play with the numbers at the top in this version of the code - everything is now customizable for generating any number of chained-joined tables with any number of fields and inserting any number of records.

I find that the more joins you have, the more overhead you'll pay when parsing to a Rows object with regular .select().
It can be anywhere from a x15 speed-bump with just 2 tables, up-to x44 with an extreme-case of 50 tables.
Usually, it would hover around ~x20 or so - in real-world use-cases.
But this is using just simple string-based fields... It goes way-up from there when you introduce other data-types.
As evident with our real-world example of almost x100 penalty for regular .select() in a simple 2-table join of about 15 columns.

Caching gains maybe 5% improvement...

I also found the .select(processor=<...>) approach to be anywhere from 10% slower to x6 slower than my .quickSelect() approach for the same queries. The gap gets worse the simpler the use-case is:
i.e:
C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 2 tables, of querying for 1 results, each with 3 values
started bench
speed_executesql() total=0.002431 avg=0.000024
speed_normal_rows() total=0.028979 avg=0.000290
speed_rows_with_cacheable() total=0.029703 avg=0.000297
speed_executesql_with_colnames() total=0.002551 avg=0.000026
speed_rows_with_simple_processor() total=0.015669 avg=0.000157

Process finished with exit code 0

speed_rows_with_simple_processor() is x6 times slower than speed_executesql_with_colnames()

This is when configured as such:
import timeit
from gluon.dal import DAL, Field


tableCount = 2
fieldCount = 1
recordCount = 1
times = 100


##prepare a large dict
...

Granted, this is hardly an issue, though...
But there is a gradient between the more simple and the more complex.

Try it out...
test_executesql_0.3.py

Anthony

Nov 20, 2013, 2:35:07 PM
to web...@googlegroups.com
Can you figure out why .select(processor=...) is slower than .executesql()? At least for iterating over the rows, they appear to do the same thing. Is there just a small fixed difference between the two that remains nearly the same no matter how long the overall processing takes?

Also, note that your .executesql() testing times should include the following code in order to be comparable with the .select() examples:

selection = [db[tableName].ALL for tableName in tableNames]
sql = db(query)._select(*selection)
sql_colnames = sql.split(' FROM')[0][7:].split(', ')

Those lines are used only by the .executesql() examples, and should therefore not be excluded from the timings.

But this is using just simple string-based fields... It goes way-up from there when you introduce other data-types.

Of course, when you're selecting other data types from the database, you typically need them to be parsed anyway. If we implement some kind of faster parsing functionality, we should probably enable options to specify which data types to parse. A result set with zero parsing of any data types may be very fast, but then very inconvenient to work with (and may require manual parsing, eliminating the initial speed benefits).

Anthony

Niphlod

Nov 20, 2013, 3:51:17 PM
to web...@googlegroups.com
splitting the hair in half:
db(query)._select()
takes roughly 0.0005s on my computer, compared to the executesql() "total" of 0.026s that has it pregenerated, so we're talking about 2% of it.
Times to generate the query are actually a lot faster than the time to parse the resultset (as expected).
Let's see if I can break further down the parse() to see where it "drops" from 0.022 to 0.396

Niphlod

Nov 20, 2013, 4:49:06 PM
to web...@googlegroups.com
re-splitting the hair in half.
I'm comparing
db(query).select(cacheable=True)
with
db(query).select(processor=myprocessor)
to find the culprit.
parse_value() is evidently the one blowing times up
Skipping back a little bit, I can shave a 1% (from 0.401 to 0.397) by changing dal.py L2221 to fields[j]['type'].

Let's move "forward" to parse_value() (gets called 1k * num_of_columns times).
Lots of ifs (L2016, 2021, 2023).
Commenting the first drops to 0.389
Commenting also the second drops to 0.318
Commenting also the third drops to 0.299

So, what actually I didn't expect sucking up times are those 3 ifs, that shaved off the 25% of the performances.
I guess those are "safety-precautions".... but are they really needed ?
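The multiplier here is easy to demonstrate outside web2py: a per-value function is called rows * columns times, so even a couple of cheap if-checks add up. A hedged micro-benchmark (the checks below are hypothetical stand-ins, not the actual parse_value() logic):

```python
import timeit

# 1000 rows x 10 columns, roughly the shape of the thread's result set
ROWS = [['x'] * 10 for _ in range(1000)]

def parse_plain(v):
    # no safety checks at all
    return v

def parse_checked(v):
    # hypothetical stand-ins for parse_value()'s three safety ifs
    if v is None:
        return None
    if not isinstance(v, str):
        v = str(v)
    if v == 'null':
        return None
    return v

def run(parse):
    # one parse() call per value, rows * columns times in total
    return [[parse(v) for v in row] for row in ROWS]

t_plain = timeit.timeit(lambda: run(parse_plain), number=20)
t_checked = timeit.timeit(lambda: run(parse_checked), number=20)
print('plain: %.4fs  checked: %.4fs' % (t_plain, t_checked))
```

For well-formed string input both variants return the same result, which is exactly why the question "are those checks really needed?" is worth asking: they only pay off on malformed input.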

Arnon Marcus

unread,
Nov 20, 2013, 7:28:34 PM11/20/13
to web...@googlegroups.com
My guess about .select(processor=...) compared to a separate "flat" function is that the additional layer of indirection (an extra function invocation) incurs a small penalty that shows up.

I did a non-web2py-related test earlier that showed a consistent x2.2 penalty for adding a single layer of indirection, by refactoring a line out into a one-liner function.

It was something like this:

amount = 100


def doWorkDirectly():
    List = []
    for i in range(amount):
        List.append('a')


def doWork(List):
    List.append('a')


def doWorkIndirectly():
    List = []
    for i in range(amount):
        doWork(List)

If you bench doWorkDirectly against doWorkIndirectly, you get a very solid and consistent x2.2 penalty in favor of doWorkDirectly - regardless of the order of magnitude of 'amount' (same gap for 10, 1000 and 100000)...
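For what it's worth, the claim is easy to check with timeit; the exact ratio varies by interpreter and hardware, so treat the x2.2 as one data point rather than a constant:

```python
import timeit

amount = 100

def do_work_directly():
    items = []
    for _ in range(amount):
        items.append('a')
    return items

def do_work(items):
    items.append('a')

def do_work_indirectly():
    # same loop, but each append goes through one extra function call
    items = []
    for _ in range(amount):
        do_work(items)
    return items

direct = timeit.timeit(do_work_directly, number=10000)
indirect = timeit.timeit(do_work_indirectly, number=10000)
print('direct: %.4fs  indirect: %.4fs  ratio: %.2f'
      % (direct, indirect, indirect / direct))
```

Both functions produce identical results; the entire difference is call overhead, which is why the ratio stays stable across orders of magnitude of 'amount'.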

So this is a CPython implementation detail.

Granted, it shows up more with a very simple processing payload, since there the invocation overhead overshadows the actual computation expense - more expensive procedures would still pay the same penalty, but it would be less noticeable.

So passing a one-liner callback (or a lambda, though I haven't tried that...) into .select(processor=...) for a computationally-simple task such as zipping a list into a dictionary may fall into that category - you may lose more than you gain by it...
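For reference, a custom processor of the kind discussed here can be sketched as below. The exact signature web2py invokes it with varies by DAL version (rows, fields, colnames, and a cacheable flag in the versions from this era), so this is an assumption to check against your release's dal.py:

```python
# Minimal processor sketch: zip column names with each raw cursor row
# into a plain dict, skipping the full Rows-object parsing entirely.
# Signature is an assumption - verify against your dal.py.
def dict_processor(rows, fields=None, colnames=None, cacheable=False):
    # colnames arrive as 'table.field'; keep just the field part
    names = [c.split('.')[-1] for c in (colnames or [])]
    return [dict(zip(names, row)) for row in rows]

# simulated raw cursor output, as a driver like psycopg2 would return it
raw = [(1, 'alice'), (2, 'bob')]
result = dict_processor(raw, colnames=['person.id', 'person.name'])
print(result)  # [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]

# with the DAL it would be used as:
#   db(query).select(processor=dict_processor)
```

Per the point above, keep the processor itself flat (no helper-per-value) or the indirection penalty eats into the parsing time you just saved.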

As for the parsing logic - that's beyond me, personally...
Shouldn't that move into the web2py-devs group?

Paolo Valleri

unread,
Nov 22, 2013, 5:10:43 AM11/22/13
to web...@googlegroups.com
+1 for changing dal.py:8359 to
fields = colnames or [f[0] for f in columns]

This will allow us to change the columns name even when as_dict=True. Example:
sql  = db(db.coordinate.is_active == True)._select(db.coordinate.point2d.count(), db.coordinate.point3d.count())
rows = db.executesql(sql, colnames=['count_point2d', 'count_point3d'], as_dict=True)
print sql, rows[0:1]
will print:
SELECT  COUNT(coordinate.point2d), COUNT(coordinate.point3d) FROM coordinate WHERE (coordinate.is_active = 'T');
[{'count_point2d': 35148L, 'count_point3d': 35148L}]

while without the proposed change:

sql  = db(db.coordinate.is_active == True)._select(db.coordinate.point2d.count(), db.coordinate.point3d.count())
rows = db.executesql(sql, as_dict=True)
print sql, rows[0:1]

will print:

SELECT  COUNT(coordinate.point2d), COUNT(coordinate.point3d) FROM coordinate WHERE (coordinate.is_active = 'T');
[{'count': 35148L}]

The dict has only one element instead of two, because a key named 'count' can appear only once.
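The collapse is plain Python dict behavior, nothing DAL-specific: building a dict from duplicate keys keeps only the last value per key.

```python
# two columns, both named 'count' by the SQL COUNT() expression
colnames = ['count', 'count']
row = (35148, 35148)
print(dict(zip(colnames, row)))  # {'count': 35148} - only one key survives
```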
A workaround I found is to use an alias

count2d = db.coordinate.point2d.count().with_alias('count2d')
count3d = db.coordinate.point2d.count().with_alias('count3d')
sql  = db(db.coordinate)._select(count2d, count3d)
print sql
print db.executesql(sql,as_dict=True)
the output
SELECT  COUNT(coordinate.point2d) AS count2d, COUNT(coordinate.point2d) AS count3d FROM coordinate WHERE (coordinate.id IS NOT NULL);
[{'count2d': 60480L, 'count3d': 60480L}]
Now the dict has the two elements I was looking for. 
Besides the fact that this workaround is really less intuitive, it doesn't work with point2d and point3d, since they are geometry() fields and with_alias at the moment doesn't work with them: see https://code.google.com/p/web2py/issues/detail?id=1789&sort=-id

If there are no objections I will make a PR later today.
Paolo

David Simmons

unread,
Mar 12, 2014, 5:43:52 AM3/12/14
to web...@googlegroups.com
Hi All

I've just come across this thread and wondered if the proposed change made it into a version of Web2py?

cheers

Dave

Massimo Di Pierro

unread,
Mar 12, 2014, 9:32:28 AM3/12/14
to web...@googlegroups.com
I believe it was.

Paolo Valleri

unread,
Mar 12, 2014, 11:43:58 AM3/12/14
to web...@googlegroups.com
This is the commit https://github.com/web2py/web2py/commit/5f1caf668b2a7d856c242f5866e10dfa77e133a8
I don't know if there are other changes


 Paolo



David Simmons

unread,
Mar 12, 2014, 2:55:30 PM3/12/14
to web...@googlegroups.com
Thanks both. It is great to see the improvements in Web2py.

cheers

Dave 