Intersection of rows

Dave S

unread,

Feb 6, 2018, 4:57:29 PM2/6/18

to web2py-users

I have an application (used for comparing routes) with 2 tables relevant to this question. (Paraphrasing; code is at home,
where the router jammed.)

db.define_table('run',
                       Field('when', 'date'),
                       Field('description', 'string))

db.define_table('segment',
                       Field('waypoint', 'string'),
                        Field('timeofday', 'time'),
                       Field('odometer', 'decimal'),
                       Field('partof', 'reference run'))

I wanted to find the runs that had X as a waypoint and also had Y as a waypoint, so I looked at
<URL:http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#Combining-rows>
and wrote:

def fromto():
    # not shown -- X and Y from request.vars
    q1 = db.segment.waypoint = X
    q2 = db.segment.waypoint = Y
    runs1 = db(run.id == segment.partof & q1).select(...,distinct=True)
    runs2 = db(run.id == segment.partof & q2).select(...,distinct=True)
    theruns = runs1 & runs2
    print len(runs1) + len(runs2) + len(theruns)
    return ("runs" = theruns)

For the first waypoints I tried, the print produced counts of 111 10 121
which is clearly NOT the intersection. (I have verified that theruns includes runs where Y is not included,
and the page shows the Y runs at the end of the X runs (that is, appended).)

If I go for union ('|' in line 7), the counts are 111 10 112.
(1 Y run did not go through X).

I add the "distinct=True" after the first attempt, in case multiple appearances of a row was confusing things. It made no apparent difference.

I think this is on 2.14.6; I may be able to test newer web2py's tonight. Am I doing something wrong, or was there a hiccup in intersection?

Dave S
/dps

Dave S

unread,

Feb 7, 2018, 2:54:23 AM2/7/18

to web...@googlegroups.com

On Tuesday, February 6, 2018 at 1:57:29 PM UTC-8, Dave S wrote:

[...]

I think this is on 2.14.6; I may be able to test newer web2py's tonight. Am I doing something wrong, or was there a hiccup in intersection?

Downloading 2.16.1 and blindly copying my 2.14.6 copy to the new environment, I get a ticket/traceback for

ValueError: unsupported format character ' ' (0x20) at index 2

Lucky for me, I had seen this with 2.15.0b1, and changed

Field("distance", 'decimal(5, 1)', default=None, writable=False),

to

Field("distance", 'decimal(5,1)', default=None, writable=False),

And the results with 2.16.1 are same as with 2.14.6: union works, intersection fails by appending.

(Oh, I'm using sqlite3, this being a development machine.)

For the Field tweak, the traceback wasn't particularly helpful ... the model isn't mentioned, but that's where the error was.

Traceback (most recent call last):
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\restricted.py", line 219, in restricted
    exec(ccode, environment)
  File "C:/Users/Dave/web2py_win/web2py_win -2.16/web2py/applications/shortcuts/controllers/default.py", line 275, in <module>
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\globals.py", line 419, in <lambda>
    self._caller = lambda f: f()
  File "C:/Users/Dave/web2py_win/web2py_win -2.16/web2py/applications/shortcuts/controllers/default.py", line 198, in fromto
    runs1 = db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\objects.py", line 2250, in select
    return adapter.select(self.query, fields, attributes)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\sqlite.py", line 82, in select
    return super(SQLite, self).select(query, fields, attributes)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 762, in select
    return self._select_aux(sql, fields, attributes, colnames)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 741, in _select_aux
    return processor(rows, fields, colnames, cacheable=cacheable)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 305, in parse
    for row in rows
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 231, in _parse
    value = self.parse_value(value, fit, ft, blob_decode)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\adapters\base.py", line 198, in parse_value
    return self.parser.parse(value, field_itype, field_type)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 101, in parse
    return self.registered[field_itype](value, field_type)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 76, in __call__
    return self.call(value, field_type)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\__init__.py", line 70, in _call_with_extras
    return self.f(self.parser, value, **extras)
  File "C:\Users\Dave\web2py_win\web2py_win -2.16\web2py\gluon\packages\dal\pydal\parsers\sqlite.py", line 19, in _decimal
    value = ('%.' + decimals + 'f') % value

/dps

Anthony

unread,

Feb 8, 2018, 9:23:59 PM2/8/18

to web2py-users

Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.

If you do really have non-mutually-exclusive results, why not just generate the intersection with the SQL query (i.e., query1 & query2)?

Also, note, you queries should use "==", not a single "=".

Anthony

unread,

Feb 8, 2018, 9:27:44 PM2/8/18

to web2py-users

On Thursday, February 8, 2018 at 9:23:59 PM UTC-5, Anthony wrote:

Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Actually, the book was originally correct, and at some point it was changed to incorrectly state that "&" was for intersection (it was always for a union without removing duplicates). See https://github.com/web2py/web2py-book/commit/1984ff40c73c6da17ff268090c8526e3375314b5#diff-5c07237eda807f68bdcaebb6e01fc58bL1534.

Anthony

unread,

Feb 8, 2018, 9:36:53 PM2/8/18

to web2py-users

Fixed: https://github.com/web2py/web2py-book/pull/373

Dave S

unread,

Feb 9, 2018, 2:48:52 PM2/9/18

to web2py-users

On Thursday, February 8, 2018 at 6:23:59 PM UTC-8, Anthony wrote:

Looks like the book is wrong -- according to the code, rows1 & rows2 simply combines the records from each Rows object -- it does not find the intersection.

Okay, then my results match that!

Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.

A segment row will only have one waypoint, but multiple segment rows may have the same parent run. I chose to make the run table very simple (and in particular, not to have a list of children), but the segment table points upward to the run.

If you do really have non-mutually-exclusive results, why not just generate the intersection with the SQL query (i.e., query1 & query2)?

I'm not very sophisticated at joining in.

Also, note, you queries should use "==", not a single "=".

erk

Anthony

/dps

Anthony

unread,

Feb 9, 2018, 8:26:35 PM2/9/18

to web2py-users

Anyway, given your code, it looks like we would not expect any results in the intersection of the two Rows objects, as the conditions of each query are mutually exclusive (i.e., "waypoint" either equals X or Y). Of course, if X == Y, then the two sets of Rows are identical, but I assume that isn't generally the case.

A segment row will only have one waypoint, but multiple segment rows may have the same parent run. I chose to make the run table very simple (and in particular, not to have a list of children), but the segment table points upward to the run.

Your code doesn't show which columns are selected -- I suppose you are saying the "waypoint" column is not selected.

Anthony

Dave S

unread,

Feb 10, 2018, 3:56:51 AM2/10/18

to web2py-users

Well, here's the code actually being used, now that my home dev machine is able to reach the interwebs again.

    segq1 = db.segment.waypoint.contains(request.args[0])
    segq2 = db.segment.waypoint.contains(request.args[1])


    runs1 = db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)


    runs2 = db((db.run.id == db.segment.partof) & segq2).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)

At this point in the code, I'm only interested in the waypoints as a link to the parent runs. The goal is to make a list of runs that have both waypoints as children. (That is, runs that have at least 2 segments, and one segment contains waypoint X and one segment contains waypoint Y.)

That list will be displayed showing the run.id/s (as a link), the distance, and the duration. Clicking on an id will show the details of that run by selecting ALL the segments.

Sample list of runs:

run.id run.description run.distance run.duration
17 Work to 73 mrg via PP 7.9 9.0
19 Work to Home by AVP 19.5 21.0
20 Work to 73 mrg by La Paz 7.7 12.0
21 Work to Home by La Paz 19.2 24.0

Sample for details of one run (sorry about the wrap):

segment run waypoint via time odometer begins ends comments
42 2017-05-29, Work to 73 mrg via PP Office AVP 12:20:00 35.0 True False
43 2017-05-29, Work to 73 mrg via PP Pacific Park Pacific Park 12:21:00 36.5 False False
44 2017-05-29, Work to 73 mrg via PP Moulton Oso Parkway 12:24:00 38.4 False False
45 2017-05-29, Work to 73 mrg via PP Cabot I-5 12:26:00 40.4 False False
46 2017-05-29, Work to 73 mrg via PP 73S merge I-5 12:29:00 42.9 False True

(I suppose I don't really need to display the IDs as well as the run "names", but it's useful during development)

/dps

Anthony

unread,

Feb 10, 2018, 9:05:31 AM2/10/18

to web2py-users

segq1 = db.segment.waypoint.contains(request.args[0]) segq2 = db.segment.waypoint.contains(request.args[1]) runs1 = db((db.run.id == db.segment.partof) & segq1).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True) runs2 = db((db.run.id == db.segment.partof) & segq2).select("run.id", "run.description", "run.distance","run.duration", orderby = db.run.id|db.run.duration, distinct=True)

How about:

query = (db.run.id == db.segment.partof) & segq1 & segq2
theruns = db(query).select("run.id", "run.description", "run.distance", "run.duration",
                           orderby=db.run.id|db.run.duration, distinct=True)

Anthony

Dave S

unread,

Feb 13, 2018, 4:09:02 AM2/13/18

to web2py-users

That's yielding 0 rows from the run table (for my chosen args).. It looks to me like it is requiring the segment row to match both arguments. If I use

a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot. But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

/dps

Anthony

unread,

Feb 13, 2018, 10:37:22 AM2/13/18

to web...@googlegroups.com

On Tuesday, February 13, 2018 at 4:09:02 AM UTC-5, Dave S wrote:

That's yielding 0 rows from the run table (for my chosen args).. It looks to me like it is requiring the segment row to match both arguments. If I use
a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot. But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

OK, now I see how the data are structured. I think this should work:

seg1_run_ids = db(db.segment.waypoint.contains(request.args[0]))._select(db.segment.partof)
seg2_run_ids = db(db.segment.waypoint.contains(request.args[1]))._select(db.segment.partof)
runs = db(db.run.id.belongs(seg1_run_ids) & db.run.id.belongs(seg2_run_ids)).select()

seg1_run_ids and seg2_run_ids are sub-selects that return the run ids of segments that match each of the waypoints. The final query includes runs whose ids are in both sets of ids from the sub-selects. It will produce SQL like:

SELECT run.id, run.description, run.distance, run.duration
FROM run
WHERE run.id IN (SELECT segment.partof FROM segment WHERE segment.waypoint LIKE '%[value 1]%')
AND run.id IN (SELECT segment.partof FROM segment WHERE segment.waypoint LIKE '%[value 2]%')

This can also be done with a double join with aliases. I think it would be something like this:

seg1q = db.segment.with_alias('seg1').waypoint.contains(request.args(0))
seg2q = db.segment.with_alias('seg2').waypoint.contains(request.args(1))
runs = db(seg1q & seg2q).select(db.run.ALL,
                                join=[db.seg1.on(db.seg1.partof == db.run.id),
                                      db.seg2.on(db.seg2.partof == db.run.id)])

The above will produce SQL like:

SELECT run.id, run.description, run.distance, run.duration
FROM run
JOIN segment AS seg1 ON seg1.partof = run.id
JOIN segment AS seg2 ON seg2.partof = run.id
WHERE seg1.waypoint like "%[value 1]%"
AND seg2.waypoint like "%[value 2]%"

Anthony

Dave S

unread,

Feb 14, 2018, 4:02:52 PM2/14/18

to web...@googlegroups.com

On Tuesday, February 13, 2018 at 7:37:22 AM UTC-8, Anthony wrote:

On Tuesday, February 13, 2018 at 4:09:02 AM UTC-5, Dave S wrote:
That's yielding 0 rows from the run table (for my chosen args).. It looks to me like it is requiring the segment row to match both arguments. If I use
a/c/f/Cabot/Cabot in the url, I get the 10 rows for runs that have Cabot. But a/c/f/office/Cabot give 0 rows, even though at least run.id == 20 (see the example) should show up.

OK, now I see how the data are structured. I think this should work:

Thank you! I meant to mention yesterday that this is looking good. I'll add some comments shortly.

/dps

Dave S

unread,

Feb 15, 2018, 4:43:58 AM2/15/18

to web...@googlegroups.com

Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both. The alias method, however, let a duplicate row sneak in. I need to test more pairs, but I think you (Anthony) solved my problem.

BTW, can one do a groupby inappadmin/select?

/dps

Anthony

unread,

Feb 15, 2018, 1:44:46 PM2/15/18

to web2py-users

Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both. The alias method, however, let a duplicate row sneak in. I need to test more pairs, but I think you (Anthony) solved my problem.

If the alias method is producing duplicates, I'm guessing the run in question has the same waypoint repeated more than once. Just set distinct=True to remove the duplicates.

BTW, can one do a groupby inappadmin/select?

I don't think so.

Anthony

Dave S

unread,

Feb 15, 2018, 5:53:54 PM2/15/18

to web2py-users

On Thursday, February 15, 2018 at 10:44:46 AM UTC-8, Anthony wrote:

Both methods work with my main test data (the source of 111 X and 10 Y), yielding 9 runs that match both. The alias method, however, let a duplicate row sneak in. I need to test more pairs, but I think you (Anthony) solved my problem.

If the alias method is producing duplicates, I'm guessing the run in question has the same waypoint repeated more than once. Just set distinct=True to remove the duplicates.

Ah, I'm using "contains", so (from memory) I have a waypoint of "Cabot", and then a waypoint of "I-5@Cabot", both of which match. Using "distinct" is easier than tightening the condition, and I don't think runs with just "I-5@Cabot" will be a problem, but I might have some other waypoints where I could trick myself.

BTW, can one do a groupby inappadmin/select?

I don't think so.

I suspected that.

Anthony

/dps "thanks again"

Reply all

Reply to author

Forward