Am I doing something wrong? Joins of virtual tables

26 views
Skip to first unread message

b...@wbpsystems.com

unread,
Dec 15, 2011, 10:16:23 AM12/15/11
to python...@googlegroups.com
I'm using python and sqlite to do some data analysis.  Everything is working fine except this sqlite command:

SELECT stud.wsu_id AS id FROM (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM registration GROUP BY wsu_id) AS reg CROSS JOIN (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM student GROUP BY wsu_id) AS stud WHERE reg.wsu_id=stud.wsu_id AND reg.c!=stud.c GROUP BY stud.wsu_id

I've executed the sub-queries individually and they work fine; however when I run the above query, it produces a single result (when their should be a great deal more).  Any ideas and what may be happening?

Roger Binns

unread,
Dec 15, 2011, 12:38:24 PM12/15/11
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 15/12/11 07:16, b...@wbpsystems.com wrote:
> I've executed the sub-queries individually and they work fine; however
> when I run the above query, it produces a single result (when their
> should be a great deal more). Any ideas and what may be happening?

Your subject mentions virtual tables. Have you tried doing this with
regular tables to eliminate the virtual tables as a factor?

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEUEARECAAYFAk7qMJAACgkQmOOfHg372QTdHACY/Xo8TRIFRtLYMXPVDTri58xZ
EgCeIG3JUM/7mL0rFfUCHVEk7LL2MRc=
=qwyY
-----END PGP SIGNATURE-----

Roger Binns

unread,
Dec 15, 2011, 1:54:32 PM12/15/11
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 15/12/11 07:16, b...@wbpsystems.com wrote:
> Any ideas and what may be happening?

Another common cause is values not being what you expected. Use typeof()
to verify types and hex() to verify that strings don't have trailing
spaces or other hidden characters:

SELECT typeof(foo), hex(foo) FROM bar WHERE ....

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk7qQmgACgkQmOOfHg372QRjUwCdHwuk6XuDHT59Cea8GmdIMPkA
rzUAoNq/DOLSh/AfXkpJJquUyWbRSjeq
=A3SW
-----END PGP SIGNATURE-----

Ben Smith

unread,
Dec 15, 2011, 5:39:26 PM12/15/11
to python...@googlegroups.com
Yeah; so you can't tell this from what I posted, but all the fields/data are ints.  By virtual tables I just mean that I'm using subqueries to create two selections then treating them as tables.  This is for data analysis, not a permanent anything, in fact the database itself only exists in memory while python is being executed.

Adrian Klaver

unread,
Dec 15, 2011, 2:47:42 PM12/15/11
to python...@googlegroups.com, Roger Binns
On 12/15/2011 10:54 AM, Roger Binns wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 15/12/11 07:16, b...@wbpsystems.com wrote:
>> Any ideas and what may be happening?
>
> Another common cause is values not being what you expected. Use typeof()
> to verify types and hex() to verify that strings don't have trailing
> spaces or other hidden characters:
>
> SELECT typeof(foo), hex(foo) FROM bar WHERE ....
>
> Roger

What happens if you change:
..WHERE
reg.wsu_id=stud.wsu_id AND reg.c!=stud.c ..

to:

ON
reg.wsu_id=stud.wsu_id WHERE reg.c!=stud.c ..

?
--
Adrian Klaver
adrian...@gmail.com

Adrian Klaver

unread,
Dec 15, 2011, 3:43:39 PM12/15/11
to python...@googlegroups.com, Roger Binns
On 12/15/2011 10:54 AM, Roger Binns wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 15/12/11 07:16, b...@wbpsystems.com wrote:
>> Any ideas and what may be happening?
>
> Another common cause is values not being what you expected. Use typeof()
> to verify types and hex() to verify that strings don't have trailing
> spaces or other hidden characters:
>
> SELECT typeof(foo), hex(foo) FROM bar WHERE ....
>
> Roger
>

Reformatted the original query to follow the flow a little easier. I
think the GROUP BY at the bottom is meant to be an ORDER BY. Not sure it
makes a difference in this case, but I point it just in case.

SELECT
stud.wsu_id AS id
FROM
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
registration
GROUP BY
wsu_id) AS reg
CROSS JOIN
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
student
GROUP BY
wsu_id) AS stud
WHERE
reg.wsu_id=stud.wsu_id
AND
reg.c!=stud.c
GROUP BY
stud.wsu_id


--
Adrian Klaver
adrian...@gmail.com

Roger Binns

unread,
Dec 15, 2011, 8:07:52 PM12/15/11
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 15/12/11 14:39, Ben Smith wrote:
> Yeah; so you can't tell this from what I posted, but all the
> fields/data are ints.

And every time this happens on the sqlite-users mailing list we hear the
developer make that claim, finally agree to use typeof/hex and find out it
isn't true ...

> By virtual tables

SQLite already has a feature named virtual tables and nothing to do with
your usage:

http://www.sqlite.org/vtab.html

Since you aren't using those it would seem your issue has nothing to do
with Python. You'll find the sqlite-users mailing list a better place to
get answers. Be warned that the first thing they'll want verification on
is types and then a way to reproduce what you are seeing.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk7qmecACgkQmOOfHg372QR5OACffoKW2LTXN6NG2p0XWUbK3Gug
ySMAoLr/x+b8tjh/YV1J2hDYnza6WUE7
=H2/N
-----END PGP SIGNATURE-----

Ben Smith

unread,
Dec 16, 2011, 11:48:12 AM12/16/11
to python...@googlegroups.com
I've modified the code to:

SELECT stud.wsu_id AS id, typeof(stud.wsu_id), typeof(reg.wsu_id) FROM (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM registration GROUP BY wsu_id) AS reg CROSS JOIN (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM student GROUP BY wsu_id) AS stud ON reg.wsu_id=stud.wsu_id AND reg.c!=stud.c GROUP BY stud.wsu_id

The results as you can see show that they are integers (see attached image).  I've erased part of the output because it is a student number.

Ben
Capture.png

Ben Smith

unread,
Dec 16, 2011, 11:49:59 AM12/16/11
to python...@googlegroups.com, Roger Binns
That was actually my original version.  I modified it when that didn't work.  Anyhow, that produces identical results

Ben Smith

unread,
Dec 16, 2011, 11:51:44 AM12/16/11
to python...@googlegroups.com, Roger Binns
No group by is what I intended.  It may be unnecessary, but I just wanted to make sure there wasn't duplicates.  What this program does is produce a list of anomalies that someone has to then manually check.  So, with that scenario, I don't see a reason to produce duplicates.

Ben Smith

unread,
Dec 16, 2011, 3:16:02 PM12/16/11
to Adrian Klaver, python...@googlegroups.com
Adrian -

I've previously attempted all of those variations.

On Friday, December 16, 2011 at 12:13 PM, Adrian Klaver wrote:

> On Friday, December 16, 2011 8:48:12 am Ben Smith wrote:
> > I've modified the code to:
> >
> > SELECT stud.wsu_id AS id, typeof(stud.wsu_id), typeof(reg.wsu_id) FROM
> > (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM registration GROUP BY wsu_id)
> > AS reg CROSS JOIN (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM student
> > GROUP BY wsu_id) AS stud ON reg.wsu_id=stud.wsu_id AND reg.c!=stud.c GROUP
> > BY stud.wsu_id
>
>
>

> Actually the above is not quite what I suggested. See below for complete query,
> note the substitution of WHERE for AND:
>
> SELECT
> stud.wsu_id AS id


> FROM
> (SELECT
> wsu_id AS wsu_id, COUNT(*) AS c
> FROM
> registration
> GROUP BY
> wsu_id) AS reg
> CROSS JOIN
> (SELECT
> wsu_id AS wsu_id, COUNT(*) AS c
> FROM
> student
> GROUP BY
> wsu_id) AS stud
> ON
> reg.wsu_id=stud.wsu_id

> WHERE


> reg.c!=stud.c
> GROUP BY
> stud.wsu_id
>
> >
> > The results as you can see show that they are integers (see attached
> > image). I've erased part of the output because it is a student number.
>
>
>
>
>
> >
> > Ben
>

> --
> Adrian Klaver
> adrian...@gmail.com (mailto:adrian...@gmail.com)

Adrian Klaver

unread,
Dec 16, 2011, 3:13:21 PM12/16/11
to python...@googlegroups.com, Ben Smith
On Friday, December 16, 2011 8:48:12 am Ben Smith wrote:
> I've modified the code to:
>
> SELECT stud.wsu_id AS id, typeof(stud.wsu_id), typeof(reg.wsu_id) FROM
> (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM registration GROUP BY wsu_id)
> AS reg CROSS JOIN (SELECT wsu_id AS wsu_id, COUNT(*) AS c FROM student
> GROUP BY wsu_id) AS stud ON reg.wsu_id=stud.wsu_id AND reg.c!=stud.c GROUP
> BY stud.wsu_id

Actually the above is not quite what I suggested. See below for complete query,

note the substitution of WHERE for AND:

SELECT
stud.wsu_id AS id


FROM
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
registration
GROUP BY
wsu_id) AS reg
CROSS JOIN
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
student
GROUP BY
wsu_id) AS stud
ON
reg.wsu_id=stud.wsu_id

WHERE
reg.c!=stud.c
GROUP BY
stud.wsu_id

>
> The results as you can see show that they are integers (see attached
> image). I've erased part of the output because it is a student number.

>
> Ben

--
Adrian Klaver
adrian...@gmail.com

Adrian Klaver

unread,
Dec 16, 2011, 3:25:05 PM12/16/11
to Ben Smith, python...@googlegroups.com
On Friday, December 16, 2011 12:16:02 pm Ben Smith wrote:
> Adrian -
>
> I've previously attempted all of those variations.
>

So are you sure you are not getting the right answer. Maybe run the query as:

SELECT
stud.wsu_id AS id, reg.c AS reg_ct, stud.c AS stud_ct


FROM
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
registration
GROUP BY
wsu_id) AS reg
CROSS JOIN
(SELECT
wsu_id AS wsu_id, COUNT(*) AS c
FROM
student
GROUP BY
wsu_id) AS stud

WHERE
reg.wsu_id=stud.wsu_id
ORDER BY
stud.wsu_id

Note the ORDER BY. I know you want to eliminate duplicates, but I am trying to
eliminate the possibility that the outer aggregation is causing the issue.

--
Adrian Klaver
adrian...@gmail.com

Ben Smith

unread,
Dec 16, 2011, 5:08:08 PM12/16/11
to Adrian Klaver, python...@googlegroups.com
Ahh,

Yes I did do that and no, I only got 4 results (but when I ran the subqueries individually and matched the results using vlookup in excel I got the proper number).

Ben

On Friday, December 16, 2011 at 2:05 PM, Adrian Klaver wrote:

> On Friday, December 16, 2011 1:57:48 pm Ben Smith wrote:
> > Hi Adrian -
> >
> > Yes, I've tried it without the group by to see if that was the
> > problem. Yes I'm quite sure I'm not getting the right answer. I
> > should have about 50 results, I'm getting 1. Your sql isn't actually
> > what I'm trying to do (the condition that the "c" column in each table
> > be not equal to each other is important).
>
>
>
> I should have been clearer. The point was to run without the condition and
> eyeball the results to see if you are getting the raw data you think you are
> getting.
>
> >
> > Ben


>
> > > causing the issue.
> > >
> > > --
> > > Adrian Klaver

> > > adrian...@gmail.com (mailto:adrian...@gmail.com)

Ben Smith

unread,
Dec 16, 2011, 4:57:48 PM12/16/11
to Adrian Klaver, python...@googlegroups.com
Hi Adrian -

Yes, I've tried it without the group by to see if that was the
problem. Yes I'm quite sure I'm not getting the right answer. I
should have about 50 results, I'm getting 1. Your sql isn't actually
what I'm trying to do (the condition that the "c" column in each table
be not equal to each other is important).

Ben

Adrian Klaver

unread,
Dec 16, 2011, 5:05:33 PM12/16/11
to Ben Smith, python...@googlegroups.com
On Friday, December 16, 2011 1:57:48 pm Ben Smith wrote:
> Hi Adrian -
>
> Yes, I've tried it without the group by to see if that was the
> problem. Yes I'm quite sure I'm not getting the right answer. I
> should have about 50 results, I'm getting 1. Your sql isn't actually
> what I'm trying to do (the condition that the "c" column in each table
> be not equal to each other is important).

I should have been clearer. The point was to run without the condition and

eyeball the results to see if you are getting the raw data you think you are
getting.

>
> Ben
>

> > causing the issue.
> >
> > --
> > Adrian Klaver
> > adrian...@gmail.com

--
Adrian Klaver
adrian...@gmail.com

Reply all
Reply to author
Forward
0 new messages