Re: Struggling with slow startup / SQL in new app (using raw SQL)

瀏覽次數:131 次
跳到第一則未讀訊息

Nikolas Stevenson-Molnar

未讀,
2013年1月21日 晚上11:24:092013/1/21
收件者:django...@googlegroups.com
Hi Matt,

Firstly, I encourage you to have another crack a the ORM. It can certainly seem a bit aggravating at times if you're coming from a SQL mindset, but really pays off down the road in terms of maintainability and readability. Typically you should only need raw queries in Django in cases where you have super-specialized (that uses views or non-standard functions) queries or need some specific optimization. If there's really no way to perform many of your "day-to-day" queries with the ORM then that's an indication that a different database design may fit your data model better. I understand that you may have a unique situation, but I just wanted to throw that out there as I personally find the ORM to be a huge time saver.

Now, with that out of the way... a couple of considerations: 1) you say it's a slow "startup"; if you refresh the page do the queries run just as slow the second time around? and 2) are your Django app and phpMyAdmin running on the same machine? If not, could transit time be an issue?

Finally, can you give an idea about the size of the tables in question? How many rows in each?

_Nik

On 1/21/2013 3:25 PM, Matt Andrews wrote:
Hi all,

Fairly new to Django. I ended up pulling out all of the ORM-generated queries and writing my own SQL directly (I got fed up trying to work out how to achieve the kind of things I needed without Django adding in extra joins or unintended WHERE clauses etc). All my app's SQL uses cursor.execute() and the dictfetchall() method as referenced here

I've found that my app incurs a couple of seconds load time in production, with CPU time at 2532ms and overall time 4684ms (according to the debug toolbar). I'm seeing 8 SQL queries take 380ms, and each one seems to be several times slower when made by Django versus hitting the database through phpMyAdmin or something: eg, this query:

SELECT * FROM news 
JOIN news_categories ON news.news_category_id = news_categories.id 
LEFT JOIN writers ON news.writer_id = writers.id 
LEFT JOIN images ON news.image_id = images.id 
ORDER BY news.is_sticky DESC, news.date_posted DESC 
LIMIT 10

This takes 14.8ms when run in phpMyAdmin (against the production database) but Django reports it as 85.2ms. The same ratios are true for all my other queries.

All I can think of is the note on the dictfetchall() method in the Django docs which describes a "small performance hit". Is this it?!

I've profiled the app too, although I'm a bit hazy about what it all means. Here's a dump of the result: http://pastebin.com/raw.php?i=UHE9edVC (this is from running on my local server rather than production but performance is broadly similar).

Can anyone help me? I realise I've perhaps gone off-piste by writing raw SQL but I feel it was justified.

thanks,
Matt


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/Qiley9RqfngJ.
To post to this group, send email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

Matt Andrews

未讀,
2013年1月22日 清晨5:26:042013/1/22
收件者:django...@googlegroups.com
Hi Nik,

Thanks - I do feel like by circumventing the ORM I've just "given up" and perhaps I'll reconsider -- none of my queries are particularly "specialist" (as the sample above indicates) - I just found Django generating odd things.

To answer your questions:

1. Yes, reloading the page sees the same time for the queries (it just feels as though the entire process takes a second or two to start, which is perhaps not related to SQL itself).

2. I believe so, yes (it's shared hosting...).

If you're curious, you can see a sample of the app at http://beta.scenepointblank.com (obviously you won't see the SQL, but the "delay" between pages, even though these pages are all cached for 2hrs+, is partly my concern here).

Jani Tiainen

未讀,
2013年1月22日 清晨5:53:402013/1/22
收件者:django...@googlegroups.com
Hi,

From your raw SQL I saw you're doing few joins. So I suppose you do
quite a few foreign key fetches.

You didn't mention anything how you originally tried to solve case with
ORM. Could you please publish what you had when things were slow?
>> <https://docs.djangoproject.com/en/dev/topics/db/sql/#django.db.models.Manager.raw>.
>>
>>
>> I've found that my app incurs a couple of seconds load time in
>> production, with CPU time at 2532ms and overall time 4684ms
>> (according to the debug toolbar). I'm seeing 8 SQL queries take
>> 380ms, and each one seems to be several times slower when made by
>> Django versus hitting the database through phpMyAdmin or
>> something: eg, this query:
>>
>> SELECT * FROM news
>> JOIN news_categories ON news.news_category_id =
>> news_categories.id <http://news_categories.id>
>> LEFT JOIN writers ON news.writer_id = writers.id
>> <http://writers.id>
>> LEFT JOIN images ON news.image_id = images.id <http://images.id>
>> ORDER BY news.is_sticky DESC, news.date_posted DESC
>> LIMIT 10
>>
>>
>> This takes 14.8ms when run in phpMyAdmin (against the production
>> database) but Django reports it as 85.2ms. The same ratios are
>> true for all my other queries.
>>
>> All I can think of is the note on the dictfetchall() method in the
>> Django docs which describes a "small performance hit". Is this it?!
>>
>> I've profiled the app too, although I'm a bit hazy about what it
>> all means. Here's a dump of the result:
>> http://pastebin.com/raw.php?i=UHE9edVC
>> <http://pastebin.com/raw.php?i=UHE9edVC> (this is from running on
>> my local server rather than production but performance is broadly
>> similar).
>>
>> Can anyone help me? I realise I've perhaps gone off-piste by
>> writing raw SQL but I feel it was justified.
>>
>> thanks,
>> Matt
>>

--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...

Matt Andrews

未讀,
2013年1月22日 清晨6:57:212013/1/22
收件者:django...@googlegroups.com
Hi Jani,

I made a StackOverflow post last year with an example of the ORM stuff I tried and the poor queries it produced: http://stackoverflow.com/questions/5843457/django-objects-all-making-360-queries-how-can-i-optimise-this-manytomany

There's also this discussion about how using the same queryset in two places in the template caused Django to request its data twice: http://stackoverflow.com/questions/9447053/best-way-to-slice-a-django-queryset-without-hitting-the-database-more-than-once (this was easily fixed, but again, not very intuitive and frustrated me)

I don't have the data to hand currently but I also remember seeing weird things happen where queries would end with stuff like "... LIMIT 234423445" (or some crazy number which I'd never entered and was orders of magnitude bigger than the number of rows in the table).

I'm aware these are probably edge cases that are down to my own novice status, but even using things like select_related(), it wasn't doing what I wanted. I just felt it easier to use my existing SQL (I'm converting a PHP app over to Django) and I'm not concerned about database portability (switching to postgres or whatever). 

Nik: just realised I missed your final question. For the SQL posted above, the numbers are approximately: 12,000 rows in the `news` table, maybe 10 `news_category` rows, about 100 `writers` and around 3000 `images`. All properly indexed and with sensible column types. 

Jani Tiainen

未讀,
2013年1月22日 清晨7:09:502013/1/22
收件者:django...@googlegroups.com
Hi,

I see that you had quite a bunch of m2m keys.

Have you tried recent version of Django (1.4+) with prefetch_related() [1] ?


[1]
<https://docs.djangoproject.com/en/1.4/ref/models/querysets/#prefetch-related>
> > http://beta.scenepointblank.com <http://beta.scenepointblank.com>
> <http://images.id> <http://images.id>
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/django-users/-/H1vytr_AuFIJ.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.


Matt Andrews

未讀,
2013年1月22日 清晨7:47:182013/1/22
收件者:django...@googlegroups.com
That looks really useful - for future reference, great! Sadly my webhost is still bundling Django 1.2.3 :(

In general, though, these issues still come down to the fact that a SQL query as executed by Django takes several times longer than going directly to the DB. Is this down to my use of dictfetchall() or something more low level?

Nikolas Stevenson-Molnar

未讀,
2013年1月22日 上午11:42:192013/1/22
收件者:django...@googlegroups.com
Hi Matt,

It's unlikely the problem lies with dictfetchall. The "small performance hit" mentioned by the docs probably refers to the fact that the function has a time complexity of (number of colulmns) * (number of rows). Since you're only getting 10 rows back, I can't see that having an appreciable impact.

I haven't seen this kind of variation between performing a direct query, and executing the query as part of a Django request… especially with such a basic query. Do you see the same difference in timing locally or is it only when deployed?

_Nik

To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/CxYZKPFVBQgJ.

Matt Andrews

未讀,
2013年1月22日 中午12:26:412013/1/22
收件者:django...@googlegroups.com
Hi Nik,

I see the discrepancy both locally and when deployed, too -- that's what's so puzzling. If it helps, I'm using MySQL and running it on Windows locally and under Passenger on my remote Linux server (also on MySQL).

Only other thing I can think of is that this "overhead" might literally be the extra work Django Debug Toolbar does to show queries etc. Obviously when I turn the toolbar off then I can't see (directly) the SQL times Django is getting, but could be a factor? Either way, with the toolbar off on production, there's definitely a slight "lag" before page requests start to kick in -- maybe this is unrelated to the SQL, though.

Nikolas Stevenson-Molnar

未讀,
2013年1月22日 下午2:54:532013/1/22
收件者:django...@googlegroups.com
When running locally, are you also using a local database instance? IIRC the time that phpMyAdmin reports is the time taken for the database to execute the query, whereas the time reported by the debug toolbar (I think) is complete round-trip (send query, execute, return results). So if there is latency between your application and your database, that could account for the discrepancy.

The debug toolbar does add overhead to your request (about doubles the page load time), but the numbers it reports should be more or less accurate (iow a page might take 1s to load with the toolbar, but the toolbar will report 500ms, which is accurate for load-time without the toolbar).

_Nik
To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/NvJBbauReMUJ.

Matt Andrews

未讀,
2013年1月22日 下午4:16:512013/1/22
收件者:django...@googlegroups.com
Yes, local app, local DB (which is an exact copy of the production DB). Interestingly, I just tried pointing the local app at the remote database and total query time (for 10 queries) went from 115ms to 3477ms! Really odd.

Nikolas Stevenson-Molnar

未讀,
2013年1月22日 下午4:26:042013/1/22
收件者:django...@googlegroups.com
That makes sense as connecting to the remote DB means increased transit time. I'm at a bit of a loss to explain why a query to the local DB via Django would be any slower than making that query directly. Maybe try a small test to time the query with the MySQLdb library (which Django is using) directly? http://mysql-python.sourceforge.net/MySQLdb.html#some-examples

Something like:
start = time.time()
#execute query
print "Execution took %.3f seconds" % time.time()-start
start = time.time()
#iterate results
print "Retrieving results took %.3f seconds" % time.time()-start

That would also tell you if actually getting results from the query is taking any appreciable time (again, phpMyAdmin is probably just reporting the execution time)

_Nik


On 1/22/2013 1:16 PM, Matt Andrews wrote:
Yes, local app, local DB (which is an exact copy of the production DB). Interestingly, I just tried pointing the local app at the remote database and total query time (for 10 queries) went from 115ms to 3477ms! Really odd.

On Tuesday, January 22, 2013 7:54:53 PM UTC, Nikolas Stevenson-Molnar wrote:
When running locally, are you also using a local database instance? IIRC the time that phpMyAdmin reports is the time taken for the database to execute the query, whereas the time reported by the debug toolbar (I think) is complete round-trip (send query, execute, return results). So if there is latency between your application and your database, that could account for the discrepancy.

The debug toolbar does add overhead to your request (about doubles the page load time), but the numbers it reports should be more or less accurate (iow a page might take 1s to load with the toolbar, but the toolbar will report 500ms, which is accurate for load-time without the toolbar).

_Nik

On 1/22/2013 9:26 AM, Matt Andrews wrote:
Hi Nik,

I see the discrepancy both locally and when deployed, too -- that's what's so puzzling. If it helps, I'm using MySQL and running it on Windows locally and under Passenger on my remote Linux server (also on MySQL).

Only other thing I can think of is that this "overhead" might literally be the extra work Django Debug Toolbar does to show queries etc. Obviously when I turn the toolbar off then I can't see (directly) the SQL times Django is getting, but could be a factor? Either way, with the toolbar off on production, there's definitely a slight "lag" before page requests start to kick in -- maybe this is unrelated to the SQL, though.


On Tuesday, 22 January 2013 16:42:19 UTC, Nikolas Stevenson-Molnar wrote:
Hi Matt,

It's unlikely the problem lies with dictfetchall. The "small performance hit" mentioned by the docs probably refers to the fact that the function has a time complexity of (number of colulmns) * (number of rows). Since you're only getting 10 rows back, I can't see that having an appreciable impact.

I haven't seen this kind of variation between performing a direct query, and executing the query as part of a Django request� especially with such a basic query. Do you see the same difference in timing locally or is it only when deployed?

_Nik

> � � Hi,
>
> � � � From your raw SQL I saw you're doing few joins. So I suppose you do
> � � quite a few foreign key fetches.
>
> � � You didn't mention anything how you originally tried to solve case with
> � � ORM. Could you please publish what you had when things were slow?
>
> � � 22.1.2013 12:26, Matt Andrews kirjoitti:
> � � �> Hi Nik,
> � � �>
> � � �> Thanks - I do feel like by circumventing the ORM I've just "given
> � � up"
> � � �> and perhaps I'll reconsider -- none of my queries are particularly
> � � �> "specialist" (as the sample above indicates) - I just found Django
> � � �> generating odd things.
> � � �>
> � � �> To answer your questions:
> � � �>
> � � �> 1. Yes, reloading the page sees the same time for the queries (it
> � � just
> � � �> feels as though the entire process takes a second or two to
> � � start, which
> � � �> is perhaps not related to SQL itself).
> � � �>
> � � �> 2. I believe so, yes (it's shared hosting...).
> � � �>
> � � �> If you're curious, you can see a sample of the app at
> � � �> http://beta.scenepointblank.com <http://beta.scenepointblank.com>
> � � (obviously you won't see the SQL, but
> � � �> the "delay" between pages, even though these pages are all cached
> � � for
> � � �> 2hrs+, is partly my concern here).
> � � �>
> � � �> On Tuesday, January 22, 2013 4:24:09 AM UTC, Nikolas
> � � Stevenson-Molnar wrote:
> � � �>
> � � �> � � Hi Matt,
> � � �>
> � � �> � � Firstly, I encourage you to have another crack a the ORM. It can
> � � �> � � certainly seem a bit aggravating at times if you're coming
> � � from a
> � � �> � � SQL mindset, but really pays off down the road in terms of
> � � �> � � maintainability and readability. Typically you should only
> � � need raw
> � � �> � � queries in Django in cases where you have super-specialized
> � � (that
> � � �> � � uses views or non-standard functions) queries or need some
> � � specific
> � � �> � � optimization. If there's really no way to perform many of your
> � � �> � � "day-to-day" queries with the ORM then that's an indication
> � � that a
> � � �> � � different database design may fit your data model better. I
> � � �> � � understand that you may have a unique situation, but I just
> � � wanted
> � � �> � � to throw that out there as I personally find the ORM to be a
> � � huge
> � � �> � � time saver.
> � � �>
> � � �> � � Now, with that out of the way... a couple of considerations:
> � � 1) you
> � � �> � � say it's a slow "startup"; if you refresh the page do the
> � � queries
> � � �> � � run just as slow the second time around? and 2) are your
> � � Django app
> � � �> � � and phpMyAdmin running on the same machine? If not, could
> � � transit
> � � �> � � time be an issue?
> � � �>
> � � �> � � Finally, can you give an idea about the size of the tables in
> � � �> � � question? How many rows in each?
> � � �>
> � � �> � � _Nik
> � � �>
> � � �> � � On 1/21/2013 3:25 PM, Matt Andrews wrote:
> � � �>> � � Hi all,
> � � �>>
> � � �>> � � Fairly new to Django. I ended up pulling out all of the
> � � �>> � � ORM-generated queries and writing my own SQL directly (I got
> � � fed
> � � �>> � � up trying to work out how to achieve the kind of things I
> � � needed
> � � �>> � � without Django adding in extra joins or unintended WHERE
> � � clauses
> � � �>> � � etc). All my app's SQL uses cursor.execute() and the
> � � �>> � � dictfetchall() method as referenced here
> � � �>>
> � � <https://docs.djangoproject.com/en/dev/topics/db/sql/#django.db.models.Manager.raw
> � � <https://docs.djangoproject.com/en/dev/topics/db/sql/#django.db.models.Manager.raw>>.
>
> � � �>>
> � � �>>
> � � �>> � � I've found that my app incurs a couple of seconds load time in
> � � �>> � � production, with CPU time at 2532ms and overall time 4684ms
> � � �>> � � (according to the debug toolbar). I'm seeing 8 SQL queries take
> � � �>> � � 380ms, and each one seems to be several times slower when
> � � made by
> � � �>> � � Django versus hitting the database through phpMyAdmin or
> � � �>> � � something: eg, this query:
> � � �>>
> � � �>> � � � � SELECT * FROM news
> � � �>> � � � � JOIN news_categories ON news.news_category_id =
> � � �>> news_categories.id <http://news_categories.id>
> � � <http://news_categories.id>
> � � �>> � � � � LEFT JOIN writers ON news.writer_id = writers.id
> � � <http://writers.id>
> � � �>> � � � � <http://writers.id>
> � � �>> � � � � LEFT JOIN images ON news.image_id = images.id
> � � <http://images.id> <http://images.id>
> � � �>> � � � � ORDER BY news.is_sticky DESC, news.date_posted DESC
> � � �>> � � � � LIMIT 10
> � � �>>
> � � �>>
> � � �>> � � This takes 14.8ms when run in phpMyAdmin (against the
> � � production
> � � �>> � � database) but Django reports it as 85.2ms. The same ratios are
> � � �>> � � true for all my other queries.
> � � �>>
> � � �>> � � All I can think of is the note on the dictfetchall() method
> � � in the
> � � �>> � � Django docs which describes a "small performance hit". Is
> � � this it?!
> � � �>>
> � � �>> � � I've profiled the app too, although I'm a bit hazy about
> � � what it
> � � �>> � � all means. Here's a dump of the result:
> � � �>> http://pastebin.com/raw.php?i=UHE9edVC
> � � <http://pastebin.com/raw.php?i=UHE9edVC>
> � � �>> � � <http://pastebin.com/raw.php?i=UHE9edVC
> � � <http://pastebin.com/raw.php?i=UHE9edVC>> (this is from running on
> � � �>> � � my local server rather than production but performance is
> � � broadly
> � � �>> � � similar).
> � � �>>
> � � �>> � � Can anyone help me? I realise I've perhaps gone off-piste by
> � � �>> � � writing raw SQL but I feel it was justified.
> � � �>>
> � � �>> � � thanks,
> � � �>> � � Matt
> � � �>>
>
> � � --
> � � Jani Tiainen
>
> � � - Well planned is half done and a half done has been sufficient
> � � before...
To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/UXh6v5jzI0gJ.
回覆所有人
回覆作者
轉寄
0 則新訊息