Adding an option on db_index to not create text/varchar_pattern_ops

500 views
Skip to first unread message

rodolphe.q...@novapost.fr

unread,
Mar 19, 2015, 7:44:58 PM3/19/15
to django-d...@googlegroups.com
As proposed by timgraham in https://code.djangoproject.com/ticket/24507 I open the discussion here.

When adding db_index=True on a CharField Django automatically create 2 indexes on some backends as PostgreSQL. But in usage the second index is not always necessary if you never use contains() or similar queries. As you can see here I extracted indexes usages statistics from one of our production server.

The index foo_bar_email_from_create_like is never use even if foo_bar_email_from_create is, and if we look on our queries this is totally logic and regular. And it's the same for foo_bar_tech_id and foo_bar_user_type, and it's the same on the other table.

        indexrelname                  |  idx_scan  | idx_tup_read | idx_tup_fetch
--------------------------------------+------------+--------------+--------------
foo_bar_address_like                  
|          0 |            0 |             0
foo_bar_current_profile_id            
|       1846 |          617 |           236
foo_bar_date_delete                  
|          0 |            0 |             0
foo_bar_email_from_create            
|      31209 |        90886 |         21903
foo_bar_email_from_create_like        
|          0 |            0 |             0
foo_bar_entity_id                    
|       8026 |        28957 |            14
foo_bar_pkey                          
| 1258565593 |   1418841848 |    1194873240
foo_bar_site_id                      
|    4495829 |  51000840065 |          3564
foo_bar_tech_id                      
|   25045160 |     28233693 |      25087324
foo_bar_tech_id_like                  
|          0 |            0 |             0
foo_bar_user_type                    
|      21428 |    263769329 |     216686751
foo_bar_user_type_like                
|          0 |            0 |             0
foo_bar_uuid_like                    
|          0 |            0 |             0
foo_bar_uuid_uniq                    
|   13134415 |     13157636 |      12928178

A last point is each index on this table is consumming more the 2Gb on disk.

Even if we can suppress the indexes, and this is what we do with (https://github.com/novafloss/django-json-dbindex) on bigger one realy problematic, we'd prefer to not create them.

Thanks for your time and this wonderful project Django is.

Alex Hill

unread,
Mar 20, 2015, 2:27:10 AM3/20/15
to django-d...@googlegroups.com
I agree that this is a problem, and I'd like to see more control over indexes generally in Django.

However, at first glance I suspect that fixing only your problem would mean adding special-case gunk to the Field class in addition to the Postgres schema editor where it currently lives[1]. It feels wrong to expand this special-casing when other types could benefit from having customisable indexes. For instance, the db_index parameter is useless for column types that don't work with a btree index, like jsonb and tsvector.

To fix your immediate problem, can't you avoid creating the extra indexes by just setting db_index=False, and then use django-json-dbindex to define the indexes you DO want?

I would like to see an index API that allowed Field subclasses to specify, per-backend, the indexes that should be created if db_index is True. Those defaults should be overridable by the user, perhaps by passing a sequence of some Index type as the db_index parameter instead of simply True or False.

Not all indexes relate to a single field, so it would make sense to also be able to register indexes in a model's Meta as well. The Index type should be able to use Expressions to create functional and multi-column indexes.

So, that's my wishlist. All that said if you can come up with a patch that fixes your problem in a sane way, great! :)

Cheers,
Alex

Marc Tamlyn

unread,
Mar 20, 2015, 6:51:13 AM3/20/15
to django-d...@googlegroups.com
I have some plans (see https://github.com/django/deps/pull/6) to rework more widely how indexes are managed and allow finer control over them. It pretty much includes all the things mentioned by Alex. I'm intending on doing some work on it over the next few weeks.

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/3333ad3c-dc43-498b-a9f3-382330bbedc2%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Alexander Hill

unread,
Mar 20, 2015, 10:45:22 AM3/20/15
to django-d...@googlegroups.com
Hi Marc,

Great, that's very interesting reading! A lot of similarities to what I had in mind and a couple of differences.

As I imagine the API, Indexes would be backend-specific rather than generic. Field would have a get_db_indexes() method which took a connection as its first argument like get_db_prep_value(), db_type(), etc. Every backend would implement a common interface to meet the most common needs, and from there its Indexes would be free to diverge.

Are the various DBMS's indexes really similar enough that the differences can be abstracted over in an elegant and useful way? What does such abstraction offer?

Take the operator class example that prompted this thread: how would TextField specify that it needed a text_pattern_ops index in PostgreSQL?

Cheers,
Alex

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/H2QFcQYsbo8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

rodolphe.q...@novapost.fr

unread,
Mar 20, 2015, 11:41:14 AM3/20/15
to django-d...@googlegroups.com
Hi Alex and Marc,

Alex thanks for your answer, to be clear I did not open the discussion to find a solution to resolve my problems, they are done. My goal is to open discussion on indexes support in Dango. And as you right said, the solution is effectively to not create indexes in the models and use and external tool.

Marc, I add a look at your PR and it contains a lot of things I thought about these last weeks, thanks for that, I'd love to see Gist and Gin index support in Django, and indeed all other specific indexes in other RDBMS. I did not see if your PR includes partial indexes, have you plan to support them too or not ?

Regards
Reply all
Reply to author
Forward
0 new messages