Re: Proposal: implement Postgres CLUSTER USING by default for MtM tables

57 views
Skip to first unread message

Russell Keith-Magee

unread,
May 22, 2013, 7:43:14 PM5/22/13
to django-d...@googlegroups.com
Hi Bryce,

An interesting idea. How are you proposing that this feature be exposed as an API? 

Is there any reason that this couldn't be handled as a DB-admin operation? i.e., Django just does what it normally does, and if a DB admin notices that table clustering is a problem, they apply the cluster command at the database level. Given that this is a very low level optimisation, is there any reason that this needs to be exposed as a Django-level API optimisation option?

Yours,
Russ Magee %-)

On Wed, May 22, 2013 at 9:28 AM, <bry...@obviously.com> wrote:
The tables created by django for MtM fields are unordered.  Loading all the relationships may result in a number of I/O operations approaching the number of relationships:

# select * from basetable;
 id |basetable_id| mtm_id 
----+------------+----------
  6 |          3 |        1
  7 |          2 |        2
 15 |          1 |        1
 16 |          1 |        2
 18 |          3 |        5

Postgres has a feature to 'defragment' such tables packing data that gets accessed at the same time into the same disk blocks.  The number of I/O operations can be as small as one:

# \pset pager off
# cluster basetable using table_mem_id;
# select * from basetable;
 id |basetable_id| mtm_id 
----+------------+----------
 15 |          1 |        1
 16 |          1 |        2
  7 |          2 |        2
  6 |          3 |        1
 18 |          3 |        5

On a large table this can put hundreds of MtM relationships sequentially on disk.  This makes for less I/O operations and less cache space used.

----
This feature is postgres specific.
And it would involve django making a guess about the use pattern of the relationship.  Either the forward or reverse is optimized.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Donald Stufft

unread,
May 22, 2013, 7:46:35 PM5/22/13
to django-d...@googlegroups.com
What do you mean by "clustering by default". CLUSTER is a one time operation. You use to to arrange the orders of the row in a table but after they have been arranged it has no long lasting affects past that.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

signature.asc
Reply all
Reply to author
Forward
0 new messages