Intersection of rows

134 views
Skip to first unread message

Dave S

unread,
Feb 6, 2018, 4:57:29 PM2/6/18
to web2py-users
I have an application (used for comparing routes) with 2 tables relevant to this question.  (Paraphrasing; code is at home,
where the router jammed.)

db.define_table('run',
                       
Field('when', 'date'),
                       
Field('description', 'string))

db.define_table('
segment',
                       Field('
waypoint', 'string'),
                        Field('
timeofday', 'time'),
                       Field('
odometer', 'decimal'),
                       Field('
partof', 'reference run'))

I wanted to find the runs that had X as a waypoint and also had Y as a waypoint, so I looked at
<URL:http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#Combining-rows>
and wrote:

def fromto():
   
# not shown -- X and Y from request.vars
    q1
= db.segment.waypoint = X
    q2
= db.segment.waypoint = Y
    runs1
= db(run.id == segment.partof & q1).select(...,distinct=True)
    runs2
= db(run.id == segment.partof & q2).select(...,distinct=True)
    theruns
= runs1 & runs2
   
print len(runs1) + len(runs2) + len(theruns)
   
return ("runs" = theruns)

For the first waypoints I tried, the print produced counts of 111 10  121
which is clearly NOT the intersection.  (I have verified that theruns includes  runs where Y is not included,
and the page shows the Y runs at the end of the X runs (that is, appended).)

If I go for union ('|' in line 7), the counts are 111 10 112.
(1 Y run did not go through X).

I add the "distinct=True" after the first attempt, in case multiple appearances of a row was confusing things.   It made no apparent difference.

I think this is on 2.14.6; I may be able to test newer web2py's tonight.  Am I doing something wrong, or was there a hiccup in intersection?

Dave S
/dps

Dave S

unread,
Feb 7, 2018, 2:54:23 AM2/7/18
to web...@googlegroups.com


On Tuesday, February 6, 2018 at 1:57:29 PM UTC-8, Dave S wrote:
[...] 
I think this is on 2.14.6; I may be able to test newer web2py's tonight.  Am I doing something wrong, or was there a hiccup in intersection?


Downloading 2.16.1 and blindly copying my 2.14.6 copy to the new environment, I get a ticket/traceback for 

ValueErrorunsupported format character ' ' (0x20at index 2

Lucky for me, I had seen this with 2.15.0b1, and changed 
Field("distance", 'decimal(5, 1)', default=None, writable=False),

to
Field("distance", 'decimal(5,1)', default=None, writable=False),

And the results with 2.16.1 are same as with 2.14.6:  union works, intersection fails by appending.

(Oh, I'm using sqlite3, this being a development machine.)

For the Field tweak, the traceback wasn't particularly helpful ... the model isn't mentioned, but that's where the error was.

Traceback (most recent call last):
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\restricted.py", line 219, in restricted
exec(ccode, environment)
File "C:/Users/Dave/web2py_win/web2py_win -2.16/web2py/applications/shortcuts/controllers/default.py", line 275, in <module>
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\globals.py", line 419, in <lambda>
self._caller = lambda f: f()
File "C:/Users/Dave/web2py_win/web2py_win -2.16/web2py/applications/shortcuts/controllers/default.py", line 198, in fromto
runs1 = db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\objects.py", line 2250, in select
return adapter.select(self.query, fields, attributes)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\sqlite.py", line 82, in select
return super(SQLite, self).select(query, fields, attributes)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 762, in select
return self._select_aux(sql, fields, attributes, colnames)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 741, in _select_aux
return processor(rows, fields, colnames, cacheable=cacheable)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 305, in parse
for row in rows
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 231, in _parse
value = self.parse_value(value, fit, ft, blob_decode)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 198, in parse_value
return self.parser.parse(value, field_itype, field_type)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 101, in parse
return self.registered[field_itype](value, field_type)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 76, in __call__
return self.call(value, field_type)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 70, in _call_with_extras
return self.f(self.parser, value, **extras)
File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\sqlite.py", line 19, in _decimal
value = ('%.' + decimals + 'f') % value

/dps
 

Anthony

unread,
Feb 8, 2018, 9:23:59 PM2/8/18
to web2py-users
Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.

If you do really have non-mutually-exclusive results, why not just generate the intersection with the SQL query (i.e., query1 & query2)?

Also, note, you queries should use "==", not a single "=".

Anthony

Anthony

unread,
Feb 8, 2018, 9:27:44 PM2/8/18
to web2py-users
On Thursday, February 8, 2018 at 9:23:59 PM UTC-5, Anthony wrote:
Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Actually, the book was originally correct, and at some point it was changed to incorrectly state that "&" was for intersection (it was always for a union without removing duplicates). See https://github.com/web2py/web2py-book/commit/1984ff40c73c6da17ff268090c8526e3375314b5#diff-5c07237eda807f68bdcaebb6e01fc58bL1534.

Anthony

Anthony

unread,
Feb 8, 2018, 9:36:53 PM2/8/18
to web2py-users

Dave S

unread,
Feb 9, 2018, 2:48:52 PM2/9/18
to web2py-users


On Thursday, February 8, 2018 at 6:23:59 PM UTC-8, Anthony wrote:
Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Okay, then my results match that!
 
Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.


A segment row will only have one waypoint, but multiple segment rows may have the same parent run.  I chose to make the run table very simple (and in particular, not to have a list of children), but the segment table points upward to the run.

If you do really have non-mutually-exclusive results, why not just generate the intersection with the SQL query (i.e., query1 & query2)?


I'm not very sophisticated at joining in.
 
Also, note, you queries should use "==", not a single "=".


erk
 
Anthony


/dps
 

Anthony

unread,
Feb 9, 2018, 8:26:35 PM2/9/18
to web2py-users
Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.


A segment row will only have one waypoint, but multiple segment rows may have the same parent run.  I chose to make the run table very simple (and in particular, not to have a list of children), but the segment table points upward to the run.

Your code doesn't show which columns are selected -- I suppose you are saying the "waypoint" column is not selected.

Anthony

Dave S

unread,
Feb 10, 2018, 3:56:51 AM2/10/18
to web2py-users
Well, here's the code actually being used, now that my home dev machine is  able to reach the interwebs again.

    segq1 = db.segment.waypoint.contains(request.args[0])
    segq2
= db.segment.waypoint.contains(request.args[1])

    runs1
= db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)

    runs2
= db((db.run.id == db.segment.partof) & segq2).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)


At this point in the code, I'm only interested in the waypoints as a link to the parent runs.  The goal is to make a list of runs that have both waypoints as children. (That is, runs that have at least 2 segments, and one segment contains waypoint X and one segment contains waypoint Y.)

That list will be displayed showing the run.id/s (as a link), the distance, and the duration.  Clicking on an id will show the details of that run by selecting ALL the segments.

Sample list of runs:

run.id        run.description        run.distance  run.duration
17        Work to 73 mrg via PP        7.9             9.0
19        Work to Home by AVP         19.5            21.0
20        Work to 73 mrg by La Paz     7.7            12.0
21        Work to Home by La Paz      19.2            24.0

Sample for details of one run  (sorry about the wrap):
segment        run                                waypoint        via            time        odometer        begins        ends        comments
42        2017-05-29, Work to 73 mrg via PP        Office         AVP          12:20:00        35.0        True        False        
43        2017-05-29, Work to 73 mrg via PP        Pacific Park   Pacific Park 12:21:00        36.5        False        False        
44        2017-05-29, Work to 73 mrg via PP        Moulton        Oso Parkway  12:24:00        38.4        False        False        
45        2017-05-29, Work to 73 mrg via PP        Cabot          I-5          12:26:00        40.4        False        False        
46        2017-05-29, Work to 73 mrg via PP        73S merge      I-5          12:29:00        42.9        False        True        

(I suppose I don't really need to display the IDs as well as the run "names", but it's useful during development)

/dps

Anthony

unread,
Feb 10, 2018, 9:05:31 AM2/10/18
to web2py-users
    segq1 = db.segment.waypoint.contains(request.args[0])
    segq2
= db.segment.waypoint.contains(request.args[1])
    runs1
= db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)
    runs2
= db((db.run.id == db.segment.partof) & segq2).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)


How about:

query = (db.run.id == db.segment.partof) & segq1 & segq2
theruns
= db(query).select("run.id", "run.description", "run.distance", "run.duration",
                           
orderby=db.run.id|db.run.duration, distinct=True)

Anthony



Dave S

unread,
Feb 13, 2018, 4:09:02 AM2/13/18
to web2py-users
That's yielding 0 rows from the run table (for my chosen args)..  It looks to me like it is requiring the segment row to match both arguments.  If I use
a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot.  But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

/dps

Anthony

unread,
Feb 13, 2018, 10:37:22 AM2/13/18
to web...@googlegroups.com
On Tuesday, February 13, 2018 at 4:09:02 AM UTC-5, Dave S wrote:
That's yielding 0 rows from the run table (for my chosen args)..  It looks to me like it is requiring the segment row to match both arguments.  If I use
a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot.  But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

OK, now I see how the data are structured. I think this should work:

seg1_run_ids = db(db.segment.waypoint.contains(request.args[0]))._select(db.segment.partof)
seg2_run_ids
= db(db.segment.waypoint.contains(request.args[1]))._select(db.segment.partof)
runs
= db(db.run.id.belongs(seg1_run_ids) & db.run.id.belongs(seg2_run_ids)).select()

seg1_run_ids and seg2_run_ids are sub-selects that return the run ids of segments that match each of the waypoints. The final query includes runs whose ids are in both sets of ids from the sub-selects. It will produce SQL like:

SELECT run.id, run.description, run.distance, run.duration
FROM run
WHERE run
.id IN (SELECT segment.partof FROM segment WHERE segment.waypoint LIKE '%[value 1]%')
AND run
.id IN (SELECT segment.partof FROM segment WHERE segment.waypoint LIKE '%[value 2]%')

This can also be done with a double join with aliases. I think it would be something like this:

seg1q = db.segment.with_alias('seg1').waypoint.contains(request.args(0))
seg2q
= db.segment.with_alias('seg2').waypoint.contains(request.args(1))
runs
= db(seg1q & seg2q).select(db.run.ALL,
                                join
=[db.seg1.on(db.seg1.partof == db.run.id),
                                      db
.seg2.on(db.seg2.partof == db.run.id)])

The above will produce SQL like:

SELECT run.id, run.description, run.distance, run.duration
FROM run
JOIN segment AS seg1 ON seg1
.partof = run.id
JOIN segment AS seg2 ON seg2
.partof = run.id
WHERE
seg1.waypoint like "%[value 1]%"
AND seg2
.waypoint like "%[value 2]%"

Anthony

Dave S

unread,
Feb 14, 2018, 4:02:52 PM2/14/18
to web...@googlegroups.com
On Tuesday, February 13, 2018 at 7:37:22 AM UTC-8, Anthony wrote:
On Tuesday, February 13, 2018 at 4:09:02 AM UTC-5, Dave S wrote:
That's yielding 0 rows from the run table (for my chosen args)..  It looks to me like it is requiring the segment row to match both arguments.  If I use
a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot.  But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

OK, now I see how the data are structured. I think this should work:


Thank you!  I meant to mention yesterday that this is looking good.  I'll add some comments shortly.

/dps
 

Dave S

unread,
Feb 15, 2018, 4:43:58 AM2/15/18
to web...@googlegroups.com
Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both.  The alias method, however, let a duplicate row sneak in.  I need to test more pairs, but I think you (Anthony) solved my problem.

BTW, can one do a groupby inappadmin/select?

/dps

Anthony

unread,
Feb 15, 2018, 1:44:46 PM2/15/18
to web2py-users
Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both.  The alias method, however, let a duplicate row sneak in.  I need to test more pairs, but I think you (Anthony) solved my problem.

If the alias method is producing duplicates, I'm guessing the run in question has the same waypoint repeated more than once. Just set distinct=True to remove the duplicates.
 

BTW, can one do a groupby inappadmin/select?

I don't think so.

Anthony

Dave S

unread,
Feb 15, 2018, 5:53:54 PM2/15/18
to web2py-users


On Thursday, February 15, 2018 at 10:44:46 AM UTC-8, Anthony wrote:
Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both.  The alias method, however, let a duplicate row sneak in.  I need to test more pairs, but I think you (Anthony) solved my problem.

If the alias method is producing duplicates, I'm guessing the run in question has the same waypoint repeated more than once. Just set distinct=True to remove the duplicates.

Ah, I'm using "contains", so (from memory) I have a waypoint of "Cabot", and then a waypoint of "I-5@Cabot", both of which match.  Using "distinct" is easier than tightening the condition, and I don't think runs with just "I-5@Cabot" will be a problem, but I might have some other waypoints where I could trick myself.

 

BTW, can one do a groupby inappadmin/select?

I don't think so.


I suspected that.
 
Anthony

/dps "thanks again"

 
Reply all
Reply to author
Forward
0 new messages