Specifying the field index types

242 views
Skip to first unread message

Zev Benjamin

unread,
Sep 23, 2013, 10:36:56 AM9/23/13
to django-d...@googlegroups.com
Hi,

I'd like to be able to specify what kind of index the database should use for a particular field.  I have a proof of concept branch that works with PostgreSQL at https://github.com/zbenjamin/django/compare/index-types, but I'd like to solicit opinions on whether there's a better way of doing it.  The way it works in my branch is that you specify the index type by making the Field db_index argument a string containing the name of the index type you'd like to use.  Specifying db_index=True uses the default index type.

Example usage:

class IndexTest(models.Model):
    unindexed = models.IntegerField(db_index=False)
    default_indexed = models.IntegerField(db_index=True)
    btree_indexed = models.IntegerField(db_index="btree")
    hash_indexed = models.IntegerField(db_index="hash")



Zev

Michael Manfre

unread,
Sep 23, 2013, 1:06:12 PM9/23/13
to django-d...@googlegroups.com
My personal opinion is that If you need to get in to this much detail about your indices, you're probably better off tweaking things directly against the database and not specifying them in the model. With that said, here are a few suggestions for your proof of concept.

Don't piggyback db_index. It would be cleaner to add another argument "db_index_type" that can hold whatever values the database backend is willing to support.

There should not be any backend specific logic outside of the database backends and test suite. I'd recommend altering your validation.py changes to check a new DatabaseFeature supports_db_index_types and adding a method to BaseDatabaseCreation that can validate the db_index_type. db_index_type doesn't necessarily need to be a string. Better to let the database backends decide whether it is best to have db_index_type represented as a string, int, or object.

Regards,
Michael Manfre



--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.

Zev Benjamin

unread,
Oct 1, 2013, 4:14:37 PM10/1/13
to django-d...@googlegroups.com
Thanks for the feedback, Michael.


On Monday, September 23, 2013 1:06:12 PM UTC-4, Michael Manfre wrote:
My personal opinion is that If you need to get in to this much detail about your indices, you're probably better off tweaking things directly against the database and not specifying them in the model.

The problem is that extensions like django-hstore want to be able to provide indexing like normal fields and the index type in that situation can make a huge performance difference.
 
With that said, here are a few suggestions for your proof of concept.

Don't piggyback db_index. It would be cleaner to add another argument "db_index_type" that can hold whatever values the database backend is willing to support.

My original reason for not doing it this way was that there might be conflicts between db_index and db_index_type.  For example, what should happen if db_index_type is specified, but db_index is False?  But I suppose we can make db_index_type override db_index like Django already does for foreign keys.
 

There should not be any backend specific logic outside of the database backends and test suite. I'd recommend altering your validation.py changes to check a new DatabaseFeature supports_db_index_types and adding a method to BaseDatabaseCreation that can validate the db_index_type. db_index_type doesn't necessarily need to be a string. Better to let the database backends decide whether it is best to have db_index_type represented as a string, int, or object.

I've created a new version that takes this approach at https://github.com/zbenjamin/django/tree/index-types2.  What are the next steps if I'd like to submit this upstream?  I know the branch needs documentation and tests, but my understanding is that there also must be an accepted Django ticket and I'm a little fuzzy on that process.


Thanks,
Zev
 

Alex Burgel

unread,
Oct 2, 2013, 1:19:31 PM10/2/13
to django-d...@googlegroups.com
This is something that I'd also be interested in. For the Google App Engine backend, you have the ability to create indexes that require more configuration than just on/off. But I don't think a single additional field would do the trick for my case.

Another option would be to open up the Meta class for custom attributes. There's a thread which discusses this here:


The config might look something like this:

class Article(db.models):
  class Meta:
    custom_indexes = {'field1': 'hash', 'field2': 'btree'}
 field1 = db.IntegerField()
 field2 = db.CharField() 

The tradeoff would be that the index would not be specified in the field description, but it would be close enough. (Currently for the app engine backend, you specify these in a separate file that the backend looks for.)

Zev Benjamin

unread,
Oct 2, 2013, 1:29:10 PM10/2/13
to django-d...@googlegroups.com
What kind of additional configuration?  One advantage of not specifying the type of the db_index_type field is that you could instantiate an object that encapsulates the configuration.


Zev

Alex Burgel

unread,
Oct 2, 2013, 2:44:20 PM10/2/13
to django-d...@googlegroups.com
On Wednesday, October 2, 2013 1:29:10 PM UTC-4, Zev Benjamin wrote:
What kind of additional configuration?  One advantage of not specifying the type of the db_index_type field is that you could instantiate an object that encapsulates the configuration.

That's true. I am partial to opening up Meta for other reasons, but in this case, if you were to have an arbitrary config object that is passed to the db backend, then you could specify the App Engine indexes.

App Engine itself creates indexes on each field, by default all fields are indexed, so this is something that you have to explicitly turn off. There are also composite indexes, which index across many fields, these are controlled via a separate App Engine-specific file (so its unrelated to this discussion). But on top of all that there's an extension to the app engine backend called django-dbindexer that allows you to add more indexes, like for case-insensitive queries or contains queries. App Engine can't do these things out of the box. Currently, you use another special file for these indexes, but it would be nice to have this all configured on the field.

--Alex

Anssi Kääriäinen

unread,
Oct 15, 2013, 5:32:07 AM10/15/13
to django-d...@googlegroups.com

I have been trying to make contrib.gis to use less private APIs. While doing that it became evident that a better way for creating custom indexes is needed. Django's contrib.gis needs to override backend's private methods so that indexes can be created. This approach doesn't work if you need multiple different custom index types (and overriding private methods itself is a bad habit). For example, a 3rd party PostgreSQL field (say trigram field) would need custom indexes to work efficiently, but there is no easy way to do that currently (unless migrations have added something to allow this?).

I don't want to add a couple more index types to core and leave it to that. The solution must allow for 3rd party apps to create any index they need.

One possible solution idea is to add a get_index_definitions(self, connection) method to Field. By default the Field's get_index_definitions(self, connection) method will return  [StandardIndex(self)] if any index is needed by the field, else []. For contrib.gis it would be something like super(GeometryField, self).get_index_definitions(connection).append(GeometryIndex(self)) (Of course, only if self.spatial_index == True).

Index class instances have an as_sql(self, qn, connection) method. A core index definition's as_sql() would need to call back to connection (so that 3rd party backends are supported). For example, StandardIndex.as_sql() would be this:
    return connection.creation.standard_index_sql(self)

A 3rd party trigram field's index could use something like this:
   if connection.vendor == 'postgresql':
        return 'CREATE INDEX trgm_idx ON %s USING gist (%s gist_trgm_ops)' % (qn(self.field.model._meta.db_table), qn(self.field.column))
   elif hasattr(connection.creation, 'create_trgm_index'):
        return connection.creation.create_trgm_index(self)  # any custom backend could implement trigram index.
   raise NotSupportedException(("The connection %s doesn't support trgm indexes")

With virtual fields you could have fields that do not do anything else than create an index. So:

class MyModel(models.Model):
    a = models.IntegerField()
    b = models.IntegerField()
    my_composite_hash_index = HashIndex(a, b) # HashIndex is a virtual field that implements get_index_definitions() method, but doesn't do anything else for the model.

This same concept might be extended to other custom SQL (comments, check constraints, storage parameters, ...).

I don't know how this idea combines with migrations framework. Anybody have any idea of how to make this idea migrations friendly?

 - Anssi

Marc Tamlyn

unread,
Oct 15, 2013, 5:40:32 AM10/15/13
to django-d...@googlegroups.com
I believe the basics of how to make it migration friendly would simply be to make sure it is given in the field's `deconstruct` method. The Index class may also need to provide a `as_remove_sql(self, qn, connection)` method which would return the relevant `DROP INDEX` command for the db.

I think it might also be nice to allow the indexes to be overriden at field create time, so that I can do something like `CharField(max_length=255, db_indexes=[StandardIndex, BTreeIndex])` so I don't have to create a custom field subclass in order to add a db-specific index.

M


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
Reply all
Reply to author
Forward
0 new messages