Malcolm Tredinnick's SQL puzzle

0 views
Skip to first unread message

Maciej Bliziński

unread,
Jul 21, 2006, 8:21:46 PM7/21/06
to django...@googlegroups.com
Malcolm asked, how to find the classes that were attended by all of the
students from a given list.
http://www.pointy-stick.com/blog/2006/06/12/sql-puzzle/

Then, he proposed a solution with HAVING clause:
http://www.pointy-stick.com/blog/2006/06/13/sql-puzzle-solution/

I'd like to submit another SQL solution.

The database structure was:

CREATE TABLE Class (
id integer NOT NULL PRIMARY KEY
);

CREATE TABLE Student (
id integer NOT NULL PRIMARY KEY
);

CREATE TABLE Reln_Class_Student (
class_id integer NOT NULL REFERENCES Class(id),
student_id integer NOT NULL REFERENCES Student(id),
PRIMARY KEY (class_id, student_id)
);

Like in Malcolm's example, following example finds the classes that are
attended by both 253 and 289 students.

SELECT id
FROM
Class AS C
INNER JOIN Reln_Class_Student AS s253
ON (C.id = s253.class_id)
INNER JOIN Reln_Class_Student AS s289
ON (C.id = s289.class_id)
WHERE
s253.student_id = 253
AND
s289.student_id = 289
;

In my example, the number of students on the list (here: two) is equal
to number of joins that are to be performed.

Each join must have different alias assigned, hence "s253" and "s289"
aliases for joined tables.

I don't know about how well does it scale with longer students list.
With just two students on the search list, with total of 2000 students,
200 classes and about 70 thousands records in the Reln_Class_Student
table, the query performed as follows:

Nested Loop (cost=0.00..1318.12 rows=16 width=4) (actual time=0.994..11.487 rows=15 loops=1)
-> Nested Loop (cost=0.00..1103.88 rows=39 width=8) (actual time=0.410..10.610 rows=36 loops=1)
-> Index Scan using class_pkey on "class" c (cost=0.00..5.20 rows=200 width=4) (actual time=0.099..3.914 rows=200 loops=1)
-> Index Scan using reln_class_student_pkey on reln_class_student s289 (cost=0.00..5.48 rows=1 width=4) (actual time=0.025..0.026 rows=0 loops=200)
Index Cond: (("outer".id = s289.class_id) AND (s289.student_id = 289))
-> Index Scan using reln_class_student_pkey on reln_class_student s253 (cost=0.00..5.48 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=36) Index Cond: (("outer".class_id = s253.class_id) AND (s253.student_id = 253))
Total runtime: 11.634 ms

The same query with list of 10 students and 10 joins executed in 5.8ms,
so I guess there's no problem with scalability of my solution. At least
with PostgreSQL.

I can imagine this solution applied to Django with something like:

Group.objects.filter(students__id = 253).filter(students__id = 289)

but currently, it would return zero groups as it would try to find
students with id equal to both 253 and 289 at once. In other words,
Django wouldn't perform two joins. It would add two WHERE statements to
one JOIN.

My knowledge about Django ends here. Maybe Django gurus will think of
a way to make Django perform two (and more) aliased joins with the same
table.

--
Maciej Bliziński <m.bli...@wit.edu.pl>
http://automatthias.wordpress.com

Reply all
Reply to author
Forward
0 new messages