[http://www.postgresql.org/docs/9.3/static/indexes-unique.html PostgreSQL
docs] say:
{{{
there's no need to manually create indexes on unique columns; doing so
would just duplicate the automatically-created index.
}}}
Further the docs say:
`The index covers the columns that make up the [...] unique constraint
[...] and is the mechanism that enforces the constraint.`
However this model in Django with `unique=True` on a `TextField` creates
an index on top of the unique constraint.
{{{
class Book(models.Model):
name = models.TextField(unique=True)
}}}
creates following table & constraints in PostgreSQL:
{{{
CREATE TABLE book (
id integer NOT NULL,
name text NOT NULL,
);
ALTER TABLE ONLY book ADD CONSTRAINT book_name_key UNIQUE (name);
CREATE INDEX book_name_like ON book USING btree (name text_pattern_ops);
}}}
Please correct me if I'm wrong. My conclusion is that database enforce
unique constraint by way of an index. Adding another index is a waste.
There's some mention of this fact in an old bug report
([https://code.djangoproject.com/ticket/3030#comment:3 comment 3] &
[https://code.djangoproject.com/ticket/3030#comment:6 comment 6] ) but it
looks like the issue got dropped.
However, if the justification to add a second index is
[https://code.djangoproject.com/ticket/12234 Bug Report 12234] then it
might be more efficient to interpret a unique constraint as
{{{
CREATE UNIQUE INDEX "book_name_like" ON "book" ("name" text_pattern_ops);
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24082>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Old description:
New description:
Further the docs say:
{{{
CREATE UNIQUE INDEX "book_name_like_idx" ON "book" ("name"
text_pattern_ops);
}}}
instead of a `UNIQUE` constraint, an `INDEX` and an implicit index by the
database.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:1>
Old description:
> {{{
> CREATE UNIQUE INDEX "book_name_like_idx" ON "book" ("name"
> text_pattern_ops);
>
> }}}
>
> instead of a `UNIQUE` constraint, an `INDEX` and an implicit index by the
> database.
New description:
Further the docs say:
might be more efficient to interpret a `unique=True` as
{{{
CREATE UNIQUE INDEX "book_name_like_idx" ON "book" ("name"
text_pattern_ops);
}}}
instead of a `UNIQUE` constraint, an `INDEX` and an implicit index by the
database.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:2>
Old description:
> I've experienced this with PostgreSQL but I suspect it could be true for
> might be more efficient to interpret a `unique=True` as
>
> {{{
> CREATE UNIQUE INDEX "book_name_like_idx" ON "book" ("name"
> text_pattern_ops);
>
> }}}
>
> instead of a `UNIQUE` constraint, an `INDEX` and an implicit index by the
> database.
New description:
I've experienced this with PostgreSQL but I suspect it could be true for
other databases too.
[http://www.postgresql.org/docs/9.3/static/indexes-unique.html PostgreSQL
docs] say:
{{{
there's no need to manually create indexes on unique columns; doing so
would just duplicate the automatically-created index.
}}}
Further the docs say:
`The index covers the columns that make up the [...] unique constraint
[...] and is the mechanism that enforces the constraint.`
However this model in Django with `unique=True` on a `TextField` creates
an index on top of the unique constraint.
{{{
class Book(models.Model):
name = models.TextField(unique=True)
}}}
creates following table & constraints in PostgreSQL:
{{{
CREATE TABLE book (
id integer NOT NULL,
name text NOT NULL,
);
ALTER TABLE ONLY book ADD CONSTRAINT book_name_key UNIQUE (name);
CREATE INDEX book_name_like ON book USING btree (name text_pattern_ops);
}}}
Please correct me if I'm wrong. My conclusion is that databases enforce
unique constraint by way of an index. Adding another index is a waste.
There's some mention of this fact in an old bug report
([https://code.djangoproject.com/ticket/3030#comment:3 comment 3] &
[https://code.djangoproject.com/ticket/3030#comment:6 comment 6] ) but it
looks like the issue got dropped.
I've also verified this with the following create table statement in
PostgreSQL (no explicit index). A `SELECT` on `name` uses an index scan
instead of a sequential scan (which means there's an implicit index). So
in this case, Django doesn't need to add a `CREATE INDEX` statement.
{{{
CREATE TABLE book (
id serial primary key,
name text UNIQUE
);
}}}
However, if the justification to add a second index is to use
`text_pattern_ops` ( [https://code.djangoproject.com/ticket/12234 Bug
Report 12234] ) then it might be more efficient to interpret a
`unique=True` in the above table as
{{{
CREATE TABLE book (
id serial primary key,
name text
);
CREATE UNIQUE INDEX book_name_like ON book USING btree (name
text_pattern_ops);
}}}
i.e. no `UNIQUE` constraint in the table, only a `UNIQUE INDEX`.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:3>
Comment (by shaib):
Imposing uniqueness on text columns is almost always nonsense (and doesn't
work at all on Oracle). If it weren't for backwards compatibility, I'd
suggest we flag it as an error.
That said, the issue here really is whether `unique=True` implies
`db_index=True`. This holds for most field types, so I think most people
would expect it to hold for text fields as well.
You should get the behavior you expect, probably, with `unique=True,
db_index=False`. If this indeed gives you what you want, I'd resolve this
ticket by adding an admonition about this in the `TextField`
documentation.
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:4>
Comment (by djbug):
> the issue here really is whether `unique=True` implies `db_index=True`.
This holds for most field types, so I think most people would expect it to
hold for text fields as well.
Could you explain why should `unique=True` imply creation of an index
explicitly ? AFAIK, this would duplicate the index in PostgreSQL (& a few
other databases according to the old bug reports mentioned above). If we
remove this explicit index, there's no drop in performance because
internally the db maintains an index for a unique constraint.
> You should get the behavior you expect, probably, with `unique=True,
db_index=False`. If this indeed gives you what you want, I'd resolve this
ticket by adding an admonition about this in the `TextField`
documentation.
I've tried your suggestion. For a `TextFIeld`, the following two generate
the same constraint & index:
{{{
TextField( unique=True, db_index=False)
}}}
and
{{{
TextField( unique=True)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:5>
Comment (by marfire):
In my opinion it would be better to not have `unique=True` imply
`db_index=True`. How the database enforces the constraint seems more like
an implementation detail than something Django should guarantee.
However, this is the
[https://docs.djangoproject.com/en/1.7/ref/models/fields/#unique currently
documented behavior] ("Note that when `unique` is `True` you don’t need to
specify `db_index`, because `unique` implies the creation of an index"),
so there would be a backwards-compatibility problem in changing it.
The fact that you can't prevent the creation of the extra index with an
explicit `db_index=False` does feel like a bug, though, and something that
could be fixed without a deprecation cycle.
As for the suggestion to use a single `text_pattern_ops` unique index to
serve double duty, the [http://www.postgresql.org/docs/9.3/static/indexes-
opclass.html PostgreSQL documentation] makes it clear that that will not
suffice to support all of Django's lookup types: "Note that you should
also create an index with the default operator class if you want queries
involving ordinary <, <=, >, or >= comparisons to use an index. Such
queries cannot use the xxx_pattern_ops operator classes."
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:6>
Comment (by djbug):
Replying to [comment:6 marfire]:
> However, this is the
[https://docs.djangoproject.com/en/1.7/ref/models/fields/#unique currently
documented behavior] ("Note that when `unique` is `True` you don’t need to
specify `db_index`, because `unique` implies the creation of an index"),
so there would be a backwards-compatibility problem in changing it.
>
> The fact that you can't prevent the creation of the extra index with an
explicit `db_index=False` does feel like a bug, though, and something that
could be fixed without a deprecation cycle.
Agreed about backward compatibility & having `db_index=False` as a fix.
> As for the suggestion to use a single `text_pattern_ops` unique index to
serve double duty, the [http://www.postgresql.org/docs/9.3/static/indexes-
opclass.html PostgreSQL documentation] makes it clear that that will not
suffice to support all of Django's lookup types: "Note that you should
also create an index with the default operator class if you want queries
involving ordinary <, <=, >, or >= comparisons to use an index. Such
queries cannot use the xxx_pattern_ops operator classes."
A single `text_pattern_ops` index works for '''both''' `LIKE` & `=`
queries. But it won't work for other operators like `< <= etc.`. But, I
feel that, it would be an overkill to create 2 indexes by default. It
should be left to the user to decide if they want double index or a single
one with whichever operator class they want.
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:7>
* cc: shaib (added)
* stage: Unreviewed => Accepted
Comment:
First of all, +1 everything [comment:6 marfire] said.
Replying to [comment:5 djbug]:
> > Imposing uniqueness on text columns is almost always nonsense.
>
> For PostgreSQL, `CharField` maps to `varchar(n)` and `TextField` maps to
`text`. According to the official docs, there's no performance difference
between the two. Most DBA recommend `text` over `varchar(n)`
([http://stackoverflow.com/a/20334221/781695 Ref 1])
([http://blog.endpoint.com/2012/11/postgres-alter-column-problems-and.html
Ref 2]).
Well, I rephrase: s/text column/`TextField`/. `TextField`s are for big
blobs of text, not identifiers.
> That said, the above discussion about unique & index is true for
`CharField` too as far as Django is concerned.
>
> > the issue here really is whether `unique=True` implies
`db_index=True`. This holds for most field types, so I think most people
would expect it to hold for text fields as well.
>
> Could you explain why should `unique=True` imply creation of an index
explicitly ?
It generally does not. An index is created explicitly only to support
`LIKE` operations (as you noted, #12234).
>
> > You should get the behavior you expect, probably, with `unique=True,
db_index=False`. If this indeed gives you what you want, I'd resolve this
ticket by adding an admonition about this in the `TextField`
documentation.
>
> I've tried your suggestion. For a `TextFIeld`, the following two
generate the same constraint & index:
> {{{
> TextField( unique=True, db_index=False)
> }}}
> and
> {{{
> TextField( unique=True)
> }}}
>
I'm accepting based on this (and documentation should be added as well).
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:8>
Comment (by shaib):
Replying to [comment:7 djbug]:
>
> A single `text_pattern_ops` index works for '''both''' `LIKE` & `=`
queries. But it won't work for other operators like `< <= etc.`. But, I
feel that, it would be an overkill to create 2 indexes by default. It
should be left to the user to decide if they want double index or a single
one with whichever operator class they want.
I disagree; it is much easier to notice a superfluous index, or an
explicitly-removed one (once `db_index=False` works), than one that is
partly-missing (will work for anything except LIKE). If you're at the
point where a superfluous index is hurting your performance, you're likely
to be paying much closer attention to your database anyway; a missing
index will hurt you much earlier.
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:9>
Comment (by djbug):
I'll add one more thing to the above discussion:
Based on my tests, the `text_pattern_ops` does not use index scan with `<
<= > >=` operators. So the point about requiring 2 index is questionable.
It's possible that I'm doing something wrong and would love to have
someone verify this before coming to a conclusion.
According to PG docs, the following operators are supported by
`text_pattern_ops`:
{{{
"btree";"text_pattern_ops";"=(text,text)"
"btree";"text_pattern_ops";"~<~(text,text)"
"btree";"text_pattern_ops";"~<=~(text,text)"
"btree";"text_pattern_ops";"~>=~(text,text)"
"btree";"text_pattern_ops";"~>~(text,text)"
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:10>
* cc: charettes (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:11>
* keywords: => db-indexes
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:12>
* cc: emorley@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:13>
* owner: nobody => felixxm
* cc: felixxm (added)
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:14>
* owner: felixxm => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:15>
* cc: Phil Krylov (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:16>
* cc: Semyon Pupkov (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:17>
* cc: Can Sarıgöl (added)
Comment:
Hi, I'm working on a [https://github.com/django/django/pull/11264 PR]
which is related to this issue. I pushed a
[https://github.com/django/django/pull/11264/commits/fdb80f295b7e2b6d011c59bb03954693df0592c2
commit] that contains a solution effort.
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:18>
* cc: Peter Thomassen (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24082#comment:19>