I fail to see how "it has consequences for existing code", as Russell
put it.
I did discuss this with Bergström, and we came to the conclusion that
it won't actually break much code, if any.
The only case which it could break, AFAICT, is if custom SQL manages
to depend on the absence of indexes. I guess that could break code
that violates indexing constraints, which are applied later, maybe? I
don't know.
- Ludvig
It has consequences because you are proposing to change the order in
which indexes and custom SQL are applied. Any code that depends on the
existing order will be affected.
> I did discuss this with Bergström, and we came to the conclusion that
> it won't actually break much code, if any.
If it has the potential to break _any_ code, it is an unacceptable
change. The Django core developers have stated that we will maintain
backwards compatibility of the v1.0 interface, so any change with even
the _potential_ to affect backwards compatibility will need to be
checked very carefully.
However, the larger point is that you don't get to make that decision
about whether a change is acceptable or not. You can discuss a change
and make a recommendation, but ultimately the decision needs to be
made by a core developer. The Design Decision Required triage state
exists for exactly this reason.
> The only case which it could break, AFAICT, is if custom SQL manages
> to depend on the absence of indexes. I guess that could break code
> that violates indexing constraints, which are applied later, maybe? I
> don't know.
This is what needs to be confirmed, and given that it is a non-trivial
change, your decision needs to be confirmed by a core developer. I am
willing to be convinced, but you will need to prove to me that there
is no backwards compatibility problem here. The discussion in this
thread so far hasn't done that.
To answer the original question (why is it done in this order) -
mostly historical reasons. Originally, Django didn't have fixtures, so
initial data were all loaded from initial SQL. It's generally faster
to insert data and then add indicies, hence the order. When Django
added fixtures, the order wasn't changed.
Yours,
Russ Magee %-)
Just want to note that *I'm* not proposing any changes.
>> I did discuss this with Bergström, and we came to the conclusion that
>> it won't actually break much code, if any.
>
> If it has the potential to break _any_ code, it is an unacceptable
> change. The Django core developers have stated that we will maintain
> backwards compatibility of the v1.0 interface, so any change with even
> the _potential_ to affect backwards compatibility will need to be
> checked very carefully.
>
> However, the larger point is that you don't get to make that decision
> about whether a change is acceptable or not. You can discuss a change
> and make a recommendation, but ultimately the decision needs to be
> made by a core developer. The Design Decision Required triage state
> exists for exactly this reason.
I don't think it's "the larger point." I misunderstood the triaging
system, and incorrectly changed it. Bennett reverted. I apologized.
End of that.
What I meant by "ready for checking" was that the patch applies
cleanly, runs, and does what it is intended to do. And obviously that
was mistake, and again: I realize this now.
>> The only case which it could break, AFAICT, is if custom SQL manages
>> to depend on the absence of indexes. I guess that could break code
>> that violates indexing constraints, which are applied later, maybe? I
>> don't know.
>
> This is what needs to be confirmed, and given that it is a non-trivial
> change, your decision needs to be confirmed by a core developer. I am
> willing to be convinced, but you will need to prove to me that there
> is no backwards compatibility problem here. The discussion in this
> thread so far hasn't done that.
I realize that my opinions and decisions play a very small role in the
development of Django, but I try to express them so that core
developers may read them, and either think "what a retard", or "fair
point."
I guess the question really is: is this custom SQL execution order
change really the right fix for the issue? I can't say I'm convinced
that it is, given your next paragraph. I would suggest the need for
signaling that a custom SQL file should execute post-indices.
One solution, which is entirely backwards-compatible, would be to say
that "if you want your custom SQL to run after indices creation, name
files '*.post.sql,'" or something like that. That has its own issues,
but you get the general idea.
> To answer the original question (why is it done in this order) -
> mostly historical reasons. Originally, Django didn't have fixtures, so
> initial data were all loaded from initial SQL. It's generally faster
> to insert data and then add indicies, hence the order. When Django
> added fixtures, the order wasn't changed.
I see, so the custom SQL machinery grew out of being a crude version
of fixtures then.
I think the cause of confusion here is that Bergström and I both were
expecting it to be for more than loading data -- I want to create a
view in it (hence my interest), and he wants to drop some excessive
indices.
After pondering some more, I realized the docs actually say "initial
SQL data", and so I'm not sure if this change is actually a good idea
or not.
- Ludvig
So noted :-)
> I don't think it's "the larger point." I misunderstood the triaging
> system, and incorrectly changed it. Bennett reverted. I apologized.
> End of that.
No problems.
> I realize that my opinions and decisions play a very small role in the
> development of Django, but I try to express them so that core
> developers may read them, and either think "what a retard", or "fair
> point."
This is exactly what we want (and need) to happen. We (the core devs)
rely upon community members such as yourself to have these
discussions, taking some of the load off of us so we can concentrate
on having smaller discussions where many of the details have already
been sorted out, making decisions and committing code. In this case,
there was a misunderstanding about the best way to communicate the
result of your discussions, but we've resolved that problem.
> One solution, which is entirely backwards-compatible, would be to say
> that "if you want your custom SQL to run after indices creation, name
> files '*.post.sql,'" or something like that. That has its own issues,
> but you get the general idea.
This was my immediate thought, and yes, it would be backwards
compatible. My immediate counter-reaction is that you would require
several configuration points (post table creation, post index
creation, post data insertion), which means a lot of
difficult-to-explain configuration points.
>> To answer the original question (why is it done in this order) -
>> mostly historical reasons. Originally, Django didn't have fixtures, so
>> initial data were all loaded from initial SQL. It's generally faster
>> to insert data and then add indicies, hence the order. When Django
>> added fixtures, the order wasn't changed.
>
> I see, so the custom SQL machinery grew out of being a crude version
> of fixtures then.
Correct.
> I think the cause of confusion here is that Bergström and I both were
> expecting it to be for more than loading data -- I want to create a
> view in it (hence my interest), and he wants to drop some excessive
> indices.
To be clear - I fully acknowledge your use case. There are many
occasions where you need to invoke custom SQL, and the decision to
execute that SQL pre or post index installation is significant. I
would very much like to be able to support this use case - we just
need to find the right way to do it.
> After pondering some more, I realized the docs actually say "initial
> SQL data", and so I'm not sure if this change is actually a good idea
> or not.
There shouldn't be anywhere talking about 'initial SQL data' - I
thought I purged all of those references. If you find any such
references, they should be referring to "custom SQL", or similar
phrasing.
Yours,
Russ Magee %-)
As I noted in my message to Ludvig, any reference to initial SQL
should be removed (and I thought they already had been). SQL hooks
_can_ be used for inserting data, but that's not the preferred option.
> As I stated in the ticket (and Russell above), databases generally
> want to throw indexes at the bottom of the equation, which now both
> custom SQL and fixtures is at. Changing this most likely will break
> some poor guy's code (which _I'm_ perfectly OK with, since they in
> that case most likely should update it anyway) but understand the Way
> of Django and fully respect that. This patch is the result.
> Just for reference, this is what I think should be done for an optimal
> workflow:
> * create table
> * create m2m
> * run custom data
> * run fixtures
> * run indexes
> * emit "database done" signal
I agree that if we were starting with a clean slate, this would be
close to the right way to do things (and, if isn't that far from the
way things are done right now). My only issue would be with moving the
signal to the end of the process. A "no, really, I'm finished" signal
was requested some time ago:
http://code.djangoproject.com/ticket/7561
Interestingly, removing superfluous indicies was one of the use cases
provided in support of this change. At the time it was rejected
because the specific proposal was backwards compatible (and was made
during the v1.0 push). The post_syncdb hook is commonly used for data
insertion, and the current location is the right place for that, but
having a second signal is a definite possibility.
Moving the fixture loading to pre-indexing is also a possibility - I
can't think of a reason this would cause backwards-incompatibility
problems (and for large fixtures, it should be marginally faster).
Yours,
Russ Magee %-)
Hm, yes... It's true it would get pretty convoluted quickly. The best
approach I can think of is making it flexible, something like (just
throwing this out there):
class M(models.Model):
...
class Meta:
custom_sql = {"drop_indices": ("post-indices", "pre-
inserts")}
Or something like that, and you could build a dependency tree out of
that, running the SQL at certain points in the code. It has other
issues (like being overkill), and it's not really a proposal I like.
Instead, I think what you say in your next e-mail is a better idea:
provide the developers with the tools (like signals) to make these
complicated usecases doable, with a dash of Python code. Then Django
could provide a simple function for saying "execute SQL in file X."
def drop_indices(sender=None, **kwargs):
load_sql(sender, "drop_indices.sql")
post_something.connect(drop_indices)
(Though this usecase would probably do better with inlining the SQL as
I doubt it would span that many lines.)
However, as noted, dropping indices isn't *that* uncommon. I think
it'd be a good idea to support dropping indices with the ease of
adding a file - or even specifying that in the model. (Like
`ForeignKey(no_index=True)`.)
>> I think the cause of confusion here is that Bergström and I both were
>> expecting it to be for more than loading data -- I want to create a
>> view in it (hence my interest), and he wants to drop some excessive
>> indices.
>
> To be clear - I fully acknowledge your use case. There are many
> occasions where you need to invoke custom SQL, and the decision to
> execute that SQL pre or post index installation is significant. I
> would very much like to be able to support this use case - we just
> need to find the right way to do it.
>
>> After pondering some more, I realized the docs actually say "initial
>> SQL data", and so I'm not sure if this change is actually a good idea
>> or not.
>
> There shouldn't be anywhere talking about 'initial SQL data' - I
> thought I purged all of those references. If you find any such
> references, they should be referring to "custom SQL", or similar
> phrasing.
As linked earlier:
http://docs.djangoproject.com/en/dev/howto/initial-data/#initial-sql
This whole subsection (and the section intro) speaks of it in terms of
initial SQL, and is within that context (initial data).
A quick googling reveals two files which still have this reference:
* http://docs.djangoproject.com/en/dev/howto/initial-data/
* http://docs.djangoproject.com/en/dev/ref/settings/
- Ludvig