initial SQL thinks it's a format string

18 views
Skip to first unread message

d...@simon.net.nz

unread,
Aug 31, 2006, 3:54:24 AM8/31/06
to Django developers
Hello Djangonauts,

I've found an issue with initial data and SQLite - if any of the fields
in the initial data has a "%" in it, the import fails. Pysqlite is
apparently processing the query for string formatting args.

Now, I'm *not* getting this error on MySQLdb, but haven't tested
postgres/ado_mssql etc. From [1] I think postgres is also affected, and
would be interested if someone could check this out.

It seems to be quite easy to fix - just do a search and replace for %
to %% around line 352 in django/core/management.py, in
get_sql_initial_data_for_model(). However, I'm not sure what the best
practice for doing RDBMS specific "tweaking/fixing" is - should this be
implemented in something like db/backends/<db>/base.py?

Nor am I sure how far you want to go in fixing what appears to be a
pysqlite implementation, uh, wart.

Here's a test which should show this up:


/format_string/test/models.py
---------------------------------------

from django.db import models

class Simple(models.Model):
name = models.CharField(maxlength = 50)

__test__ = {'API_TESTS':"""
# Regression test for #
# Check that the format string fix stores the correct number of %'s


>>> Simple.objects.get(pk=1).name
'a%a'
>>> Simple.objects.get(pk=2).name
'a%%a'
>>> Simple.objects.get(pk=3).name
'a%%%a'
>>> Simple.objects.get(pk=4).name
'a%%a%%a'
>>> Simple.objects.get(pk=5).name
'a%%%a%%%a'
"""
}


/format_string/test/sql/simple.sql
--------------------------------------------
INSERT INTO initial_sql_formatstring_regress_simple (name) VALUES
('a%a');
INSERT INTO initial_sql_formatstring_regress_simple (name) VALUES
('a%%a');
INSERT INTO initial_sql_formatstring_regress_simple (name) VALUES
('a%%%a');
INSERT INTO initial_sql_formatstring_regress_simple (name) VALUES
('a%%a%%a');
INSERT INTO initial_sql_formatstring_regress_simple (name) VALUES
('a%%%a%%%a');

---------------------------------------------

So - as it is, this will fail on SQLite at the .get()'s as the initial
sql import will choke when it can't work out what %a is.

Cheers,
Simon G.

[1]
http://groups.google.com/group/django-developers/browse_thread/thread/14d1bdf73085f02b/ae5dfd2b30194d4a%23ae5dfd2b30194d4a

Malcolm Tredinnick

unread,
Sep 15, 2006, 9:35:04 PM9/15/06
to django-d...@googlegroups.com
Hi Simon,

On Thu, 2006-08-31 at 00:54 -0700, si...@simon.net.nz wrote:
> Hello Djangonauts,
>
> I've found an issue with initial data and SQLite - if any of the fields
> in the initial data has a "%" in it, the import fails. Pysqlite is
> apparently processing the query for string formatting args.
>
> Now, I'm *not* getting this error on MySQLdb, but haven't tested
> postgres/ado_mssql etc. From [1] I think postgres is also affected, and
> would be interested if someone could check this out.
>
> It seems to be quite easy to fix - just do a search and replace for %
> to %% around line 352 in django/core/management.py, in
> get_sql_initial_data_for_model(). However, I'm not sure what the best
> practice for doing RDBMS specific "tweaking/fixing" is - should this be
> implemented in something like db/backends/<db>/base.py?

Probably, yes. We aren't doing that for initial SQL data at the moment,
but it sounds like we are going to need hooks for backend-specific
stuff.

> Nor am I sure how far you want to go in fixing what appears to be a
> pysqlite implementation, uh, wart.

It should be fixed. SQLite is a supported database and very useful in
many cases.

A small meta-argument about the real problem here...

In general, initial SQL data is a real pain to manage and if anybody has
a good idea on a better approach. I would like to avoid building a full
SQL parser in Python for obvious reasons. And yet, the range of things
people can put in initial SQL is pretty broad. There are currently open
tickets for inserting stored procedures and, I think, inserting a small
C program into a field (can't remember if that was sent to me off-list
or is in Trac right now).

I have a memory that there was a problem with just shelling out to the
command-line backend (probably because it was going to be fiddly on
Windows or something), but that may have been an hallucination. Probably
worth looking at again.

Certainly, the initial SQL is a never-ending chase to fix small bugs
with string quoting, SQL portability, Python SQL wrapper portability,
newline handling and problems that we haven't even thought of yet.
Basically, our user-base is very evil... er... creative and each fix has
a half-life of about 15 minutes before somebody comes up with another
broken example. Still, we should fix things as we find them.


>
> Here's a test which should show this up:

[... snip...]

Did this make it into a ticket in the end? I'm unlikely to remember it's
on the mailing list, but if it's in Trac it won't get lost so easily.

Regards,
Malcolm

James Bennett

unread,
Sep 15, 2006, 10:22:12 PM9/15/06
to django-d...@googlegroups.com
On 9/15/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> In general, initial SQL data is a real pain to manage and if anybody has
> a good idea on a better approach. I would like to avoid building a full
> SQL parser in Python for obvious reasons.

Other people already *have* built SQL parsers in Python, with all the
quirks of specific databases, so... why not either

A) Do away with initial SQL entirely, and move to officially
recommending people set up a 'management.py' file in their apps which
hooks the post_syncdb signal to do whatever's needed -- up to and
including grabbing the database cursor and executing statements
directly, or
B) Keep initial SQL around, but encourage more use of the
management/post_syncdb routine, and possibly limit the range of
statement types which can be executed in an initial SQL file (say, by
limiting to just INSERT statements; since the management function for
viewing these statements is called "sqlinitialdata", it makes sense
that its purpose would be to populate data into a table immediately
after creation).

--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Malcolm Tredinnick

unread,
Sep 15, 2006, 10:38:47 PM9/15/06
to django-d...@googlegroups.com
On Fri, 2006-09-15 at 21:22 -0500, James Bennett wrote:
> On 9/15/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > In general, initial SQL data is a real pain to manage and if anybody has
> > a good idea on a better approach. I would like to avoid building a full
> > SQL parser in Python for obvious reasons.
>
> Other people already *have* built SQL parsers in Python, with all the
> quirks of specific databases, so... why not either
>
> A) Do away with initial SQL entirely, and move to officially
> recommending people set up a 'management.py' file in their apps which
> hooks the post_syncdb signal to do whatever's needed -- up to and
> including grabbing the database cursor and executing statements
> directly, or

My gut feeling is that this is restrictive. I know you're a big fan of
the "Django not for everything" school of thought (me, too, in some
sense and I respect your position, but my exclusion range is much
smaller). I want to see how far we can go to being as accommodating as
possible. I like the idea of being able to work smoothly with legacy
databases and other tools and often the easiest way to do this is to be
able to initialise from a partial SQL dump. Not having to do an SQL dump
and then rewrite into Python.

I'll acknowledge that's a bit of a spit-and-polish mentality, though,
and it's mostly why I haven't worried too much about this up until know
beacuse (a) there are always work-arounds and (b) there are much higher
priority things.

> B) Keep initial SQL around, but encourage more use of the
> management/post_syncdb routine, and possibly limit the range of
> statement types which can be executed in an initial SQL file (say, by
> limiting to just INSERT statements; since the management function for
> viewing these statements is called "sqlinitialdata", it makes sense
> that its purpose would be to populate data into a table immediately
> after creation).

That's what we do currently -- only support certain types of inserts,
etc (it's just that the particular set we support today is a little
amorphous, depending upon which tickets have been fixed). This somewhat
strikes me as a "we'll fix this problem in the documentation" solution,
which feels a little awkward (and consequently we end up doing it even
worse, but neither documenting nor fixing the root cause ... nobody said
the universe was fair).

For now, I was floating this as a "if somebody wants something to think
about and work on" problem; it's not going to interrupt my flow very
much, personally, at the moment.

Regards,
Malcolm

d...@simon.net.nz

unread,
Sep 15, 2006, 11:32:27 PM9/15/06
to Django developers
Hi Malcolm & James,

I do think the SQL initial data is going to be a problem in a number of
ways:

1) database API's (as per this thread)

2) line breaks ( c.f. http://code.djangoproject.com/ticket/2729 )

3) database specific issues. For example - the sql file attached to the
ticket above (2729) was exported with one of the mysql tools (prob.
mysqldump), which magically adds backticks around the table names.
SQLite does not like this, and from memory postgres won't either. I'm
not sure what other glitches will be like this, but hopefully this is
the only one.

So - does the application developer try to manage one export for each
database Django supports? Does django have to try to catch and handle
that? Neither of these are terribly appealing.

Suggestion: why not allow .py files inside the sql/ dir? These could be
standard db-api commands, and run preferentially / prior to the .sql
files. IMHO this would be nicer than hacking around with manage.py, and
avoids all these problems. However, it does have (minor) security
concerns - the .py file could contain arbitrary code, but this could be
filtered. If this was added, then it'd be nice to allow exporting from
tables in Django's db format too.

This brings up another option, having .py scripts activate on
installation would be very nice for things like oh, installing cron
jobs, creating directories, etc. Maybe an /install/ directory which
contains post-install scripts that get activated...

--Simon

Malcolm Tredinnick

unread,
Sep 16, 2006, 12:12:00 AM9/16/06
to django-d...@googlegroups.com
On Fri, 2006-09-15 at 20:32 -0700, si...@simon.net.nz wrote:
> Hi Malcolm & James,
>
> I do think the SQL initial data is going to be a problem in a number of
> ways:
>
> 1) database API's (as per this thread)
>
> 2) line breaks ( c.f. http://code.djangoproject.com/ticket/2729 )
>
> 3) database specific issues. For example - the sql file attached to the
> ticket above (2729) was exported with one of the mysql tools (prob.
> mysqldump), which magically adds backticks around the table names.

Which makes it no longer SQL.

> SQLite does not like this, and from memory postgres won't either. I'm
> not sure what other glitches will be like this, but hopefully this is
> the only one.

Let's not conflate multiple problems here. Importing not-SQL and trying
to treat it as SQL is not a problem we are going to solve with initial
SQL import. If you want to have portable input data, you have to write
proper SQL. If you don't mind being tied to one particular vendor then
you can use extensions like backticks, but it would be crazy for Django
to try and "normalise" data at the initial SQL import phase to support
cross-vendor portability like this.

Getting the import of SQL correct is hard enough.

> So - does the application developer try to manage one export for each
> database Django supports? Does django have to try to catch and handle
> that? Neither of these are terribly appealing.

If the application developer wants their data to work on multiple
databases, they should ensure it is dumped in portable SQL format. That
is usually possible. If not and more intelligence is required, they can
drop back to the Python language and examine settings.DATABASE_* to work
out what to do.

> Suggestion: why not allow .py files inside the sql/ dir? These could be
> standard db-api commands, and run preferentially / prior to the .sql
> files. IMHO this would be nicer than hacking around with manage.py, and
> avoids all these problems. However, it does have (minor) security
> concerns - the .py file could contain arbitrary code, but this could be
> filtered. If this was added, then it'd be nice to allow exporting from
> tables in Django's db format too.

We already allow arbitrary Python hooks:
http://www.bright-green.com/blog/2006_07_12/initialising_application_data_.html . So we have raw SQL and Python hooks. Adding more and more options to do those things in slightly different ways is not something I'd be in favour of.

Cheers,
Malcolm

Russell Keith-Magee

unread,
Sep 16, 2006, 12:20:57 AM9/16/06
to django-d...@googlegroups.com
On 9/16/06, James Bennett <ubern...@gmail.com> wrote:
>
> On 9/15/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > In general, initial SQL data is a real pain to manage and if anybody has
> > a good idea on a better approach. I would like to avoid building a full
> > SQL parser in Python for obvious reasons.
>
> Other people already *have* built SQL parsers in Python, with all the
> quirks of specific databases, so... why not either
> ...

Here's a third option: Introduce the use of fixtures, which allow
other ways to put initial data into the database.

The test system requires the ability to install (and remove) data into
the database to act as test fixtures. Obviously, SQL isn't the ideal
format for this, beacuse of differences between databases. However, we
now have a deserialization framework which provides for JSON, XML and
Python deserializers. My intention is to provide for an 'install
fixture' api, with an entry point in django-admin, where the initial
data can come from any of these deserializable formats.

The 'initial SQL' feature is needed to support DB specific
initialization that Django doesn't handle (minor table alterations,
for example), so I would expect that this will continue to be
available, just cease to be an attractive mechanism for initial data.

I'm still working on some of the details, but I should have some
specific proposals soon (or at least, as soon as I get a few moments
to scratch myself).

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages