Postgresql Index & expensive queries [n00bie alert]

194 views
Skip to first unread message

Barry Morrison

unread,
Oct 19, 2012, 11:16:25 PM10/19/12
to django...@googlegroups.com

I've got a pretty expensive query...Wondering if I can't speed things up in regards to Postgresql.

Here is the model: [1] http://dpaste.org/JmEeQ/

Here is the sql statement: [2] http://dpaste.org/GbfAJ/

Here is the template: [3] http://dpaste.org/vxRs4/

Here is the 'guts' of the view: [4] http://dpaste.org/w0b2z/

Total Postgresql/SQL n00b, so this may be a stupid statement. I'm wondering if an index on '"press_page"."article_id"' wouldn't speed things up a bit?

In dev, I don't have the amount of data that exists in production, so I can't easily/realistically recreate this scenario.

FWIW, I'm running:

  • Django 12.04
  • Postgresql 9.1.5
  • Django 1.4.1

I'm using memcache on the view to cache it, it helped take page load down significantly, but I was wondering if there wasn't something I could do from the DB side of things since I know it's this query that is taking the page so long to load.

Thanks!

Sam Lai

unread,
Oct 20, 2012, 1:25:31 AM10/20/12
to django...@googlegroups.com
On 20 October 2012 14:16, Barry Morrison <bdmor...@gmail.com> wrote:
> I've got a pretty expensive query...Wondering if I can't speed things up in
> regards to Postgresql.
>
> Here is the model: [1] http://dpaste.org/JmEeQ/
>
> Here is the sql statement: [2] http://dpaste.org/GbfAJ/
>
> Here is the template: [3] http://dpaste.org/vxRs4/
>
> Here is the 'guts' of the view: [4] http://dpaste.org/w0b2z/
>
> Total Postgresql/SQL n00b, so this may be a stupid statement. I'm wondering
> if an index on '"press_page"."article_id"' wouldn't speed things up a bit?

Are you using the django-debug-toolbar? If not, consider using it -
once installed, visit that page in your browser, open up the toolbar,
select SQL queries, find the offending query (processing times are
shown), and click the Explain link. That will tell you what PostgreSQL
is doing, and how performance can be improved.

Alternately, fire up a terminal and launch the psql database shell.
Type in the query ([2]) but prefix it with the word EXPLAIN. This will
tell you what PostgreSQL is doing to process that query. If you see
Seq Scan in there, consider adding the index. Remember to run the SQL
statement - ANALYZE press_page - after you create the index so it can
update the internal statistics that it uses to decide how to process a
query.

You can also do this within the PGadmin application if that makes things easier.

> In dev, I don't have the amount of data that exists in production, so I
> can't easily/realistically recreate this scenario.

You really need to have a good subset (considering both the number of
records, and the distribution of values within each record) if it is
not possible to have a copy of the production database. PostgreSQL
performs analysis on the data itself and uses those statistics to
decide how to process the query (whether to use an index, scan
sequentially etc.). If you dev database is not a good representation,
then the results you'll see with your dev database, and hence your
improvements, may have no effect or a negative effect in production.

I do believe having an index on press_page.article_id is a good idea
though. Also, are you really showing every article in that view?
Consider using a limit so you're not retrieving all the records, only
to use the first 10. The pagination features in Django may be useful
here. Finally, an index on press_page.pgnumber may also be worth
considering so PostgreSQL may be able to avoid the sorting step (an
index of the default type is sorted by the field in ascending order).

> FWIW, I'm running:
>
> Django 12.04
> Postgresql 9.1.5
> Django 1.4.1
>
> I'm using memcache on the view to cache it, it helped take page load down
> significantly, but I was wondering if there wasn't something I could do from
> the DB side of things since I know it's this query that is taking the page
> so long to load.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/django-users/-/U92OjnMgEggJ.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.

Xavier Ordoquy

unread,
Oct 20, 2012, 2:54:36 AM10/20/12
to django...@googlegroups.com
Hi,

As Samuel said, the first step is to install Django Debug Toolbar to see what's going on with your queries.

Regards,
Xavier Ordoquy,
Linovia.

Barry Morrison

unread,
Oct 20, 2012, 8:26:45 AM10/20/12
to django...@googlegroups.com
I am using Django Debug Toolbar locally for development.  That is where I copied & pasted the SQL from.

Even if I had a complete set of data for postgresql, my local machine is an 8-core, 16GB of RAM, with solid state hard drives.  A LOT different than my 2CPU/1GB RAM/spindle disk VPS. 

I guess what I need to do is clone my VPS off, firewall the crap out of it and run Django Debug Toolbar on that...giving me a more accurate idea of what's going on. 

Thanks for the response.

Barry Morrison

unread,
Oct 20, 2012, 8:27:52 AM10/20/12
to django...@googlegroups.com
prefetch-related is awesome!  Thank you for that. 

Thanks!

Barry

Barry Morrison

unread,
Oct 20, 2012, 6:05:43 PM10/20/12
to django...@googlegroups.com
Here is what exists in Dev re: Query Plan from debug toolbar:

QUERY PLAN
Sort (cost=8.28..8.28 rows=1 width=740)
  Sort Key: pgnumber
  -> Index Scan using press_page_article_id_like on press_page (cost=0.00..8.27 rows=1 width=740)
      Index Cond: ((article_id)::text = 'Test'::text)

Barry Morrison

unread,
Oct 20, 2012, 6:34:01 PM10/20/12
to django...@googlegroups.com
Looking further at Postgresql, what's interesting.  Is it creates an 'id' column and appends '_id' to what is assumed to be the id field (if none is specified). 

Also, it appears the magic may have already created the indexes:

django_db=# \d press_page
                                   Table "public.press_page"
   Column   |          Type          |                        Modifiers                       
------------+------------------------+---------------------------------------------------------
 id         | integer                | not null default nextval('press_page_id_seq'::regclass)
 article_id | character varying(255) | not null
 page       | character varying(100) |
 pgnumber   | smallint               | not null
Indexes:
    "press_page_pkey" PRIMARY KEY, btree (id)
    "press_page_article_id" btree (article_id)
    "press_page_article_id_like" btree (article_id varchar_pattern_ops)
Check constraints:
    "press_page_pgnumber_check" CHECK (pgnumber >= 0)
Foreign-key constraints:
    "press_page_article_id_fkey" FOREIGN KEY (article_id) REFERENCES press_article(publication) DEFERRABLE INITIALLY DEFERRED

Christophe Pettus

unread,
Oct 20, 2012, 9:27:27 PM10/20/12
to django...@googlegroups.com

On Oct 20, 2012, at 11:34 PM, Barry Morrison wrote:

> Looking further at Postgresql, what's interesting. Is it creates an 'id' column and appends '_id' to what is assumed to be the id field (if none is specified).
>
> Also, it appears the magic may have already created the indexes:

It's not actually PostgreSQL that's creating those. That's Django's standard behavior.

1. It automatically creates an id column for any model that lacks an explicit primary key.
2. It automatically adds an index to foreign key models (for text fields, you get both the standard one and the varchar_pattern_ops one, which is technically no longer required).

--
-- Christophe Pettus
x...@thebuild.com

Reply all
Reply to author
Forward
0 new messages