Using an INSERT...SELECT... type of query

877 views
Skip to first unread message

Brian P Armstrong

unread,
Mar 26, 2008, 5:36:21 PM3/26/08
to Django users
Hi,

I was working on moving an old site from php to django and found that
I was using INSERT...SELECT... queries in a couple places. The point
of these queries is to take a group of rows using the SELECT subquery
and insert them all at once without having to iterate over them in
your code. Ideally, you save some time since the database never has
to send you anything, and all you have to send it is a single
INSERT...SELECT... statement.

I couldn't find anything like this in django's database API, but I was
told to check qs-rf. I still couldn't find anything like this, so I
decided to hack my own out into qs-rf. Since this is a major change,
I didn't want to submit it as a patch. I'm also not completely sure
it doesn't already exist somehow.

The syntax on it works like this:

http://dpaste.com/41473/

You create 1 or more QuerySets and then pass them and some keywords
into batch_create on the class you want to add to. You use tuples to
specify class and attribute names if you want the new rows to be taken
from the QuerySets you passed in. Otherwise, you can specify literals
if you would like one of the fields of all of the new rows to be the
same.

Here's the diff. I'd be glad to continue working on it (error
checking, documenting, and more input types) if anyone thinks it might
be useful.

http://dpaste.com/41475/

Thanks,
Brian Armstrong

Malcolm Tredinnick

unread,
Mar 26, 2008, 5:40:20 PM3/26/08
to django...@googlegroups.com

On Wed, 2008-03-26 at 14:36 -0700, Brian P Armstrong wrote:
> Hi,
>
> I was working on moving an old site from php to django and found that
> I was using INSERT...SELECT... queries in a couple places. The point
> of these queries is to take a group of rows using the SELECT subquery
> and insert them all at once without having to iterate over them in
> your code. Ideally, you save some time since the database never has
> to send you anything, and all you have to send it is a single
> INSERT...SELECT... statement.

What is the use-case for something like this in Django? The only times
we do SQL inserts are when we're saving a model instance or updating a
many-to-many relation. I can't see either of those really needing this
type of functionality, but I'd be interested to hear the use-case you
have in mind.

Malcolm

--
Everything is _not_ based on faith... take my word for it.
http://www.pointy-stick.com/blog/

Brian Armstrong

unread,
Mar 26, 2008, 6:29:36 PM3/26/08
to Django users
On Mar 26, 4:40 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> What is the use-case for something like this in Django? The only times
> we do SQL inserts are when we're saving a model instance or updating a
> many-to-many relation. I can't see either of those really needing this
> type of functionality, but I'd be interested to hear the use-case you
> have in mind.
>
> Malcolm
>
> --
> Everything is _not_ based on faith... take my word for it.http://www.pointy-stick.com/blog/

Sure. Let's say you've got a forum with a search function. One way
to implement this is to catalog all the words in each post when the
post is made. You've got two tables for this function, one that has a
unique listing of words, 1 per row, and another table that maps each
of those words to each post. For example, you might have a post with
the text "Hi my name is Bob"

Then you would have these tables if that post has a post_id of 5

table_words
word_id word
1 'hi'
2 'python'
3 'name'
4 'my'
5 'monty'
6 'is'
7 'Bob'

table_post_words
post_id word_id
5 1
5 7
5 6
5 3
5 4

So if a user searches for the word "name", then they will get back all
the posts that have an entry in table_post_words with a word_id of 3.

After updating table_words to have any new unique entries, you then
need to create an entry in table_post_words for each searchable word
that appears. One way to do this is to use a SELECT to get all these
entries and then iterate over that list to create them one at a time.
Another way is to take the same SELECT statement and use it as a
subquery to an INSERT, so that your db does all this for you. It
might be preferred to do it this way if you don't want the overhead of
passing all the entries back and forth between db and server.

If I'm looking at this the wrong way, please let me know. :)

Brian

Brian Armstrong

unread,
Mar 26, 2008, 6:33:45 PM3/26/08
to Django users
Oh, wait, duh, that's what the Many-to-Many relationship is.

Sorry about that. I feel rather dumb now.

Brian Armstrong

unread,
Mar 26, 2008, 9:42:22 PM3/26/08
to Django users
Sorry for the spam here, but I came across something going through the
code switching process that I think may actually call for more than
the standard MtM Field.

What happens if you need to add additional fields to your join table?

For example, let's say you had an employees table and a projects
table, and Employees to Projects is MtM. A new project gets added and
the decision is that everyone from department 'x' is going to have
this project added to their join table. We also want to keep track of
some extra data on the join table like "Hours worked" that tell us how
many hours each employee has worked on a particular project. Is there
a way to do this without creating a custom join table and iterating
over the employees one by one?

I realize you could just as easily select all these employees and then
iterate over them, but it feels like an ugly solution to me, I guess.
Maybe it's just a stylistic concern.

Thanks again and sorry for the noise,
Brian
Reply all
Reply to author
Forward
0 new messages