Regression by 10892 for branches/0.12-stable

14 views
Skip to first unread message

Steffen Hoffmann

unread,
Jan 15, 2012, 3:35:02 PM1/15/12
to trac...@googlegroups.com, Odd Simon Simonsen
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

it has been brought to my attention by a request in IRC #trac today,
that a user is experiencing following issue with report 6 in default
configuration on latest stable 0.12:

'My Tickets' is shown _after_ 'Active Tickets'

Before the changeset it was ok. Work-around is to rename groups or use

ORDER BY (COALESCE(owner, '') = '$USER') DESC, ...

instead of

ORDER BY (owner = '$USER') DESC, ...

Hint: The underlying query seem to not get the DESC rule, when the
column is added the new way, as SQL logging reveals ([2] last line).
Well, looks like there is simply no way to attach a 'DESC' to the
__group__ directive, right?

Since I'm not a SQL guru, I start with discussion here rather than
filing a ticket right-away.

Sincerely,

Steffen Hoffmann


[1] http://trac.edgewall.org/changeset/10892
[2] http://pastebin.com/G3criDRQ
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8TOHMACgkQ31DJeiZFuHerswCgt0vEC5hw/eloK/i3UbNcxnG1
jFAAoOJpFbfqY9Qn/YhpkX7hrDH+KTEp
=W5uA
-----END PGP SIGNATURE-----

osimons

unread,
Jan 15, 2012, 6:17:45 PM1/15/12
to Trac Development
On Jan 15, 9:35 pm, Steffen Hoffmann <hoff...@web.de> wrote:
> Hello,
>
> it has been brought to my attention by a request in IRC #trac today,
> that a user is experiencing following issue with report 6 in default
> configuration on latest stable 0.12:
>
> 'My Tickets' is shown _after_ 'Active Tickets'
>
> Before the changeset it was ok. Work-around is to rename groups or use
>
> ORDER BY (COALESCE(owner, '') = '$USER') DESC, ...
>
> instead of
>
> ORDER BY (owner = '$USER') DESC, ...
>
> Hint: The underlying query seem to not get the DESC rule, when the
> column is added the new way, as SQL logging reveals ([2] last line).
> Well, looks like there is simply no way to attach a 'DESC' to the
> __group__ directive, right?
>
> Since I'm not a SQL guru, I start with discussion here rather than
> filing a ticket right-away.
>
> Sincerely,
>
> Steffen Hoffmann
>
> [1]http://trac.edgewall.org/changeset/10892
> [2]http://pastebin.com/G3criDRQ

Thanks Steffen,

It looks more like the change revealed another limitation of reports,
but in a sense it is also a regression. I'll try to summarize the
issue as I'm currently not quite sure how to solve it:

The issue adressed by r10892 is a problem introduced by report
pagination. Pagination essentially uses the "SELECT ..." report as a
subquery, and then adds a new level on top that does "SELECT * FROM
(<report sql>) LIMIT 100 OFFSET 0" (or whatever the current page and
size are). The problem is that even though the subquery is ordered
correctly, there is no guarantee that the order will be preserved when
the new SELECT grabs rows from it - if the toplevel SELECT wants order
it must explicitly state its own order.

Very often the order is as expected simply due to the efficient
caching nature of databases as we perform several quite similar
queries before the main report query - for instance a query to detect
the column names in use. But, the order is not always correct. For a
regular ordered query it often makes little difference if some rows
may have switched around a little - it can be quite hard to detect for
a normal human. However, with grouping (__group__) this becomes more
obvious as we switch group header each time a new value appears. So
what used to happen was that the same group could appear twice or more
in the report output. Very confusing.

So, the essence of the current situation is this: In
trac.ticket.report.ReportModule.execute_paginated_report() we detect
columns in use, and we detect if one of the columns is named
'__group__'. If so, we make sure it appears first in sort order. What
we cannot currently detect is if __group__ is sorted ASC or DESC by
the report SQL. Default report {8} "Active Tickets, Mine first"
includes "...ORDER BY (COALESCE(owner, '') = %s) DESC..". The SQL
could use and write anything for how it sorts, and all we know from
reading cursor description is that there is a '__group__'. But, at
cursor level we do not know that the grouping in {8} is based on
'owner'. The report could have said "...ORDER BY __group__ DESC..."
and it would be the same. Or it could have dropped ordering by
'__group__' as that would be inserted by the report pagination code
anyway following r10892 - but then without explicit order no DESC flag
could be added. In any report we have no control over how the ordering
is written or what fields it relates to, grouped or not.

No method that I'm aware of can make me detect the ASC/DESC order
specified for grouping - in a way that allows us to easily carry it
onto the toplevel select to maintain the expected order.

So:

1) Should we just ignore it? As I jokingly said on IRC, just rename
report {8} to "Active Tickets, Mine last"? :-)

2) Should we try to parse the report SQL and somehow extract any
underlying order specification?

3) Rewrite report pagination to not use a subquery?

4) Any other good ideas?



:::simon

https://www.coderesort.com
http://trac-hacks.org/wiki/osimons

Christian Boos

unread,
Jan 16, 2012, 4:46:44 PM1/16/12
to trac...@googlegroups.com
On 1/16/2012 12:17 AM, osimons wrote:
> ...

> The issue adressed by r10892 is a problem introduced by report
> pagination. Pagination essentially uses the "SELECT ..." report as a
> subquery, and then adds a new level on top that does "SELECT * FROM
> (<report sql>) LIMIT 100 OFFSET 0" (or whatever the current page and
> size are). The problem is that even though the subquery is ordered
> correctly, there is no guarantee that the order will be preserved when
> the new SELECT grabs rows from it - if the toplevel SELECT wants order
> it must explicitly state its own order.


Well, then it looks like the current way we do the pagination is
flawed. It's strange that we never noticed it so far. I wonder if
we couldn't adopt the approach taken in the custom queries and
simply append LIMIT ... OFFSET ... to the query?

>
> 1) Should we just ignore it? As I jokingly said on IRC, just rename
> report {8} to "Active Tickets, Mine last"? :-)
>

Would be nice if we could fix this for 0.12.3 as apparently
r10892 made the problem more or differently visible.

> 2) Should we try to parse the report SQL and somehow extract any
> underlying order specification?

It seems it could be "enough" to check if the SQL query is ending
with an `ORDER BY`, and in that case simply insert our extra
order columns, if any. But as that clause may contain arbitrary
expressions, we should at least check for an even count of
parentheses between the ORDER BY and the end of the query.

> 3) Rewrite report pagination to not use a subquery?

Yep

> 4) Any other good ideas?

Let's see if we can implement the above...

-- Christian

osimons

unread,
Jan 16, 2012, 5:24:49 PM1/16/12
to Trac Development
On Jan 16, 10:46 pm, Christian Boos <christian.b...@free.fr> wrote:
> On 1/16/2012 12:17 AM, osimons wrote:
>  > ...
>  > The issue adressed by r10892 is a problem introduced by report
>  > pagination. Pagination essentially uses the "SELECT ..." report as a
>  > subquery, and then adds a new level on top that does "SELECT * FROM
>  > (<report sql>) LIMIT 100 OFFSET 0" (or whatever the current page and
>  > size are). The problem is that even though the subquery is ordered
>  > correctly, there is no guarantee that the order will be preserved when
>  > the new SELECT grabs rows from it - if the toplevel SELECT wants order
>  > it must explicitly state its own order.
>
> Well, then it looks like the current way we do the pagination is
> flawed. It's strange that we never noticed it so far. I wonder if
> we couldn't adopt the approach taken in the custom queries and
> simply append LIMIT ... OFFSET ... to the query?
>
>  >
>  > 1) Should we just ignore it? As I jokingly said on IRC, just rename
>  > report {8} to "Active Tickets, Mine last"? :-)
>  >
>
> Would be nice if we could fix this for 0.12.3 as apparently
> r10892 made the problem more or differently visible.

Agree. Or just revert it if we find no good fix now.

>  > 2) Should we try to parse the report SQL and somehow extract any
>  > underlying order specification?
>
> It seems it could be "enough" to check if the SQL query is ending
> with an `ORDER BY`, and in that case simply insert our extra
> order columns, if any. But as that clause may contain arbitrary
> expressions, we should at least check for an even count of
> parentheses between the ORDER BY and the end of the query.

If we can safely extract and reuse the ORDER BY clause of the
underlying query, we have no additional columns to insert - we would
just reuse that at top level.

>  > 3) Rewrite report pagination to not use a subquery?
>
> Yep

Doing it in Python would likely mean making full query each time, but
just return the number of rows (limit + offset) requested by the user.

On the other hand, if we can be sure that the underlying report have
no LIMIT or OFFSET clauses we could in theory just append the clauses
to the main query and use that directly. There is no way of knowing
what SQL reports look like in the wild though. Doing it in SQL would
certainly be preferable.

>  > 4) Any other good ideas?
>
> Let's see if we can implement the above...

Yes, lets give it a shot. IRC?

:::simon

osimons

unread,
Jan 16, 2012, 5:41:00 PM1/16/12
to Trac Development


On Jan 16, 11:24 pm, osimons <oddsim...@gmail.com> wrote:
> On Jan 16, 10:46 pm, Christian Boos <christian.b...@free.fr> wrote:
> > On 1/16/2012 12:17 AM, osimons wrote:
> >  > 2) Should we try to parse the report SQL and somehow extract any
> >  > underlying order specification?
>
> > It seems it could be "enough" to check if the SQL query is ending
> > with an `ORDER BY`, and in that case simply insert our extra
> > order columns, if any. But as that clause may contain arbitrary
> > expressions, we should at least check for an even count of
> > parentheses between the ORDER BY and the end of the query.
>
> If we can safely extract and reuse the ORDER BY clause of the
> underlying query, we have no additional columns to insert - we would
> just reuse that at top level.

I just covered normal report execution when I said we could reuse it
as-is. However, if the user clicks on a column to report the report
manually (once to select a new sort column ASC, doing it twice to
switch it to DESC) we would need to completely replace the ORDER BY
inside the report and replace it with "ORDER BY [__group__, ] <field>
ASC|DESC" (__group__ only used if the report is grouped of course).


:::simon

osimons

unread,
Jan 16, 2012, 5:43:29 PM1/16/12
to Trac Development
Arrgh. Annoying typo:

...if the user clicks on a column to RE-SORT the report manually...


:::simon

osimons

unread,
Jan 17, 2012, 10:26:02 AM1/17/12
to Trac Development


On Jan 16, 11:41 pm, osimons <oddsim...@gmail.com> wrote:
> On Jan 16, 11:24 pm, osimons <oddsim...@gmail.com> wrote:
>
> > On Jan 16, 10:46 pm, Christian Boos <christian.b...@free.fr> wrote:
> > > On 1/16/2012 12:17 AM, osimons wrote:
> > >  > 2) Should we try to parse the report SQL and somehow extract any
> > >  > underlying order specification?
>
> > > It seems it could be "enough" to check if the SQL query is ending
> > > with an `ORDER BY`, and in that case simply insert our extra
> > > order columns, if any. But as that clause may contain arbitrary
> > > expressions, we should at least check for an even count of
> > > parentheses between the ORDER BY and the end of the query.
>
> > If we can safely extract and reuse the ORDER BY clause of the
> > underlying query, we have no additional columns to insert - we would
> > just reuse that at top level.
>
> I just covered normal report execution when I said we could reuse it
> as-is. However, if the user clicks on a column to re-sort the report
> manually (once to select a new sort column ASC, doing it twice to
> switch it to DESC) we would need to completely replace the ORDER BY
> inside the report and replace it with "ORDER BY [__group__, ] <field>
> ASC|DESC" (__group__ only used if the report is grouped of course).


I've spent some time thinking about this, and also had helpful
discussions with 'CareBear\' and 'shesek' today on #trac IRC.

The following is a summary of status and possible approaches:

1. REVERT TO INITIAL HANDLING
-----------------------------

Revert the patch, and leave it at that. Best described as the 2
fundamental issues apparent here:

a) It works, mostly. But in essence it returns rows in undefined
order, and we are depending on undefined order to paginate and render
results. Issues like groups appearing several times in reports cannot
be avoided. Rows may appear out of 'order'.

b) The order of groups as now reported as a 'regression' remains, but
only when you re-sort the report. Like our own /demo-0.12/report/8
("Active Tickets, Mine first") do this test;
- open report, observe that it says "My Tickets"
- click any column header to resort table (like "Component")
- observe that it now says "Active Tickets" as first group.
All r10892 did was make this bug apparent as without parsing the query
we can't know if the order we insert is supposed to be ASC or DESC.
After r10892 the bug will be clearly visible for all reports using
ORDER BY __group__ DESC.

2. DROP SUBQUERY AND PAGINATE IN PYTHON
---------------------------------------

Instead of LIMIT and OFFSET in SQL, return the full resultset and pick
the rows we want depending on paging status.

a) The sorting would be correct. But, it will come at the cost of
performance as the cursor will be populated with full data sets for
each page in the report. The performance must be measured and compared
for larger datasets as the penalty may be very noticeable.

b) Switching to Python paging does not help with detecting ASC or DESC
sorting of groups. Issue remains.

3. USE ROW_NUMBER IN SQL
------------------------

Supported for MySQL and Postgres, we could mofify the inner SQL to be
"SELECT ROW_NUMBER(), ...<remaing query>..." to have an ID to follow
each row that we then could reuse at outer level.

a) The sorting would be correct, but no solution in SQLite that does
not support it. Would need conditional code that uses initial
behaviour for SQLite, and other behaviour for those that are known to
support it. There should be no noticeable performance differences
compared to current status.

b) Again, no new knowledge about __group__ ASC or DESC.

4. PARSE QUERY AND EXTRACT ORDER
--------------------------------

Somehow parse the query and extract the details we need, really just
the ORDER BY clause content. If __group__ are part of current columns,
we would need to split the ORDER BY content into parts (can order by
more than one item of course) and detect if DESC is mentioned for
first order item. As the SQL in theory can contain further nested
subqueries, and "ORDER BY" text may otherwise appear in functions or
elsewhere, care must be taken to extract this correctly.

a) By appending the subquery ORDER BY to the outer query, the results
will be ordered correctly.

b) With the contents of the ORDER BY clause in hand, parsed and
correctly identified ASC/DESC of first item, we can also re-sort
grouped reports by doing "ORDER BY __group__ ASC|DESC, selected_column
ASC|DESC". The outer and inner order should also be correct.

5. JUST APPEND LIMIT AND OFFSET TO THE QUERY
--------------------------------------------

As Christian suggested, just drop the use of subquery and append
"LIMIT ... OFFSET .." to the original query.

a) The order would be correct. New issue may be that the report would
fail if the user-entered SQL already contains the same keywords.
Typically reports that contain the "top number of.." or the "most
recent.." that itself restricts the dataset. It would be a bug, and we
would need a fallback strategy - like for instance just use current
strategy if exception occurs (1.).

b) Re-sorting columns would not be possible seeing we would not be
able to change the ORDER BY clause. This would of be much worse than
the current/initial situation.

6. "YOUR" SUGGESTION?
---------------------

Feel free to suggest other approaches - or combination of approaches.

Christian Boos

unread,
Jan 17, 2012, 8:11:24 PM1/17/12
to trac...@googlegroups.com
On 1/17/2012 4:26 PM, osimons wrote:
...

>
> The following is a summary of status and possible approaches:
>
> 1. REVERT TO INITIAL HANDLING
> -----------------------------
>
> Revert the patch, and leave it at that. Best described as the 2
> fundamental issues apparent here:
>
> a) It works, mostly. But in essence it returns rows in undefined
> order, and we are depending on undefined order to paginate and render
> results. Issues like groups appearing several times in reports cannot
> be avoided. Rows may appear out of 'order'.

For me, the most serious problem here is this undefined order.

#216 was implemented on the wrong assumption that the outer query
will just keep the order given by the inner query, but this is
apparently wrong. Now that we know, we can't leave it like that.

>
> b) The order of groups as now reported as a 'regression' remains, but
> only when you re-sort the report. Like our own /demo-0.12/report/8
> ("Active Tickets, Mine first") do this test;
> - open report, observe that it says "My Tickets"
> - click any column header to resort table (like "Component")
> - observe that it now says "Active Tickets" as first group.
> All r10892 did was make this bug apparent as without parsing the query
> we can't know if the order we insert is supposed to be ASC or DESC.
> After r10892 the bug will be clearly visible for all reports using
> ORDER BY __group__ DESC.

We could introduce a __groupdesc__ keyword...

>
> 2. DROP SUBQUERY AND PAGINATE IN PYTHON
> ---------------------------------------
>
> Instead of LIMIT and OFFSET in SQL, return the full resultset and pick
> the rows we want depending on paging status.
>
> a) The sorting would be correct. But, it will come at the cost of
> performance as the cursor will be populated with full data sets for
> each page in the report. The performance must be measured and compared
> for larger datasets as the penalty may be very noticeable.

Pagination was precisely introduced to avoid doing this,
so I think this solution can't be used.

>
> b) Switching to Python paging does not help with detecting ASC or DESC
> sorting of groups. Issue remains.
>
> 3. USE ROW_NUMBER IN SQL
> ------------------------
>
> Supported for MySQL and Postgres, we could mofify the inner SQL to be
> "SELECT ROW_NUMBER(), ...<remaing query>..." to have an ID to follow
> each row that we then could reuse at outer level.
>
> a) The sorting would be correct, but no solution in SQLite that does
> not support it. Would need conditional code that uses initial
> behaviour for SQLite, and other behaviour for those that are known to
> support it. There should be no noticeable performance differences
> compared to current status.

Eek. SQL dialect-hell. I also first looked into this direction but
found nothing convincing.

>
> b) Again, no new knowledge about __group__ ASC or DESC.
>
> 4. PARSE QUERY AND EXTRACT ORDER
> --------------------------------
>
> Somehow parse the query and extract the details we need, really just
> the ORDER BY clause content. If __group__ are part of current columns,
> we would need to split the ORDER BY content into parts (can order by
> more than one item of course) and detect if DESC is mentioned for
> first order item. As the SQL in theory can contain further nested
> subqueries, and "ORDER BY" text may otherwise appear in functions or
> elsewhere, care must be taken to extract this correctly.

Maybe we could analyze further what are the columns inside the
ORDER BY, though that's probably difficult in the general case.
As a first step however, only locating the ORDER BY should be
enough (see below).

>
> a) By appending the subquery ORDER BY to the outer query, the results
> will be ordered correctly.
>
> b) With the contents of the ORDER BY clause in hand, parsed and
> correctly identified ASC/DESC of first item, we can also re-sort
> grouped reports by doing "ORDER BY __group__ ASC|DESC, selected_column
> ASC|DESC". The outer and inner order should also be correct.
>
> 5. JUST APPEND LIMIT AND OFFSET TO THE QUERY
> --------------------------------------------
>
> As Christian suggested, just drop the use of subquery and append
> "LIMIT ... OFFSET .." to the original query.

Not only append LIMIT / OFFSET, but also add an ORDER BY clause
or augment an existing top-level one.

>
> a) The order would be correct. New issue may be that the report would
> fail if the user-entered SQL already contains the same keywords.
> Typically reports that contain the "top number of.." or the "most
> recent.." that itself restricts the dataset. It would be a bug, and we
> would need a fallback strategy - like for instance just use current
> strategy if exception occurs (1.).

Ok, if there's already a LIMIT, we should not add one (and probably
disable pagination controls).

>
> b) Re-sorting columns would not be possible seeing we would not be
> able to change the ORDER BY clause. This would of be much worse than
> the current/initial situation.

Actually not... My solution even has the nice side-effect to
always make it possible to sort with an user selected column.


>
> 6. "YOUR" SUGGESTION?
> ---------------------
>
> Feel free to suggest other approaches - or combination of approaches.
>

Right, so my proposed solution is a kind of combined 4-5 approach.
You can see a "proof of concept" attached to the ticket I created
for tracking this issue:


http://trac.edgewall.org/attachment/ticket/10530/10530-rework-report-pagination-r10914.patch

The pagination is now done using a single query, and it seems
to work well with the current reports and I hope with most
existing ones. We can probably make it more even more robust.

The problem with the "Mine ticket" report is still there, but
starting from that patch it should be possible to try 4.b) or
my __groupdesc__ suggestion.

But other ideas welcome, of course.

-- Christian

osimons

unread,
Jan 18, 2012, 5:48:49 PM1/18/12
to Trac Development


On Jan 18, 2:11 am, Christian Boos <christian.b...@free.fr> wrote:
> On 1/17/2012 4:26 PM, osimons wrote:
> ...
>
> Right, so my proposed solution is a kind of combined 4-5 approach.
> You can see a "proof of concept" attached to the ticket I created
> for tracking this issue:
>
> http://trac.edgewall.org/attachment/ticket/10530/10530-rework-report-...
>
> The pagination is now done using a single query, and it seems
> to work well with the current reports and I hope with most
> existing ones. We can probably make it more even more robust.

Moving the issue and discussion there then.

> The problem with the "Mine ticket" report is still there, but
> starting from that patch it should be possible to try 4.b) or
> my __groupdesc__ suggestion.

Now, on the subject of re-sorting... Compare:

http://trac.edgewall.org/demo-0.12/report/7

with

http://trac.edgewall.org/demo-0.13/report/7

Our 0.12 demo allows re-sorting columns. The 0.13 demo has no such
feature. What's changed?


:::simon
Reply all
Reply to author
Forward
0 new messages