How to deal with unsorted output of dictionaries in doctests

184 views
Skip to first unread message

Nils Bruin

unread,
Jan 27, 2020, 5:01:04 PM1/27/20
to sage-devel
In current IPython, dictionaries are no longer sorted on output, because in Py3 there is a slightly more meaningful order to the dict: it's close to insertion order. So it seems reasonable to follow this order for printing too.
Yet, the order in not actually intrinsically meaningful for the dict.
We have previously relied on the sorting of dictionaries to make doctests of dictionaries work smoothly. We need to change that if we upgrade IPython. There seem to be two main options:
 1) we adjust the doctests to the new output order of IPython
 2) we adjust the doctests so that the output order is somehow sorted.

The problem with 1) is that it's a nasty change (it hits quite a few doctests), and it's rather fundamentally fragile: if some code changes somewhere (or, unlikely, Py3 changes dict ordering), doctests break again.

For 2) there are different possibilities.
 a) Something like sorted(D.iteritems()) works, but it means the doctest now displays a list of tuples rather than something that looks like a dict: not a problem for "TESTS:" but less than desirable for "EXAMPLES:".
 b) It looks like with `from pprint import pprint` we could just do `pprint(D)` to get a pretty-printed dict.
 c) we have a pretty_print(...) which could grow a `sort=True` attribute for dicts. However, its default currently is to output "latex" which is grossly inappropriate for doctests. So if we go with that, we should probably ensure that the "display manager" in force for doctests has text preference set to "plain" (and perhaps also ensure that this is the case when using the command line shell, where "latex" output is also far from pretty.

Generally I'm not so in favour of making special arrangements for the benefit of doctests (just write them so that they don't rely on arbitrary peculiarities, and use "#random" if you have to display such things in the `EXAMPLES` area), but in this case we're inheriting a problem from a change in IPython, so a more systematic approach seems appropriate. Since this affects a lot of developers, a sage-devel discussion seem appropriate. Go ahead!

Michael Orlitzky

unread,
Jan 27, 2020, 6:41:55 PM1/27/20
to sage-...@googlegroups.com
On 1/27/20 5:01 PM, Nils Bruin wrote:
> This came up in https://trac.sagemath.org/ticket/29042 :
> In current IPython, dictionaries are no longer sorted on output, because
> in Py3 there is a slightly more meaningful order to the dict: it's close
> to insertion order. So it seems reasonable to follow this order for
> printing too.
> Yet, the order in not actually intrinsically meaningful for the dict.
> We have previously relied on the sorting of dictionaries to make
> doctests of dictionaries work smoothly. We need to change that if we
> upgrade IPython. There seem to be two main options:
>  1) we adjust the doctests to the new output order of IPython
>  2) we adjust the doctests so that the output order is somehow sorted.
>
> The problem with 1) is that it's a nasty change (it hits quite a few
> doctests), and it's rather fundamentally fragile: if some code changes
> somewhere (or, unlikely, Py3 changes dict ordering), doctests break again.
>

The insertion order is part of the language as of python-3.7, and can't
be changed:

* https://mail.python.org/pipermail/python-dev/2017-December/151283.html
* https://docs.python.org/3.7/tutorial/datastructures.html#dictionaries

So while it sucks, I guess we should update all of the doctests to print
in that order. It was always a bad idea to rely on them being sorted,
and I don't think I agree with making the order part of the language,
but that's what we've got.

David Roe

unread,
Jan 27, 2020, 7:51:28 PM1/27/20
to sage-devel
On Mon, Jan 27, 2020 at 5:01 PM Nils Bruin <nbr...@sfu.ca> wrote:
For 2) there are different possibilities.
 a) Something like sorted(D.iteritems()) works, but it means the doctest now displays a list of tuples rather than something that looks like a dict: not a problem for "TESTS:" but less than desirable for "EXAMPLES:".
 b) It looks like with `from pprint import pprint` we could just do `pprint(D)` to get a pretty-printed dict.
 c) we have a pretty_print(...) which could grow a `sort=True` attribute for dicts. However, its default currently is to output "latex" which is grossly inappropriate for doctests. So if we go with that, we should probably ensure that the "display manager" in force for doctests has text preference set to "plain" (and perhaps also ensure that this is the case when using the command line shell, where "latex" output is also far from pretty.

Generally I'm not so in favour of making special arrangements for the benefit of doctests (just write them so that they don't rely on arbitrary peculiarities, and use "#random" if you have to display such things in the `EXAMPLES` area), but in this case we're inheriting a problem from a change in IPython, so a more systematic approach seems appropriate. Since this affects a lot of developers, a sage-devel discussion seem appropriate. Go ahead!

What about having another special doctest comment like "# sort-keys" that makes the doctest framework sort the keys before printing?  This could be then implemented in one of the above ways in the doctest code.
David

Nils Bruin

unread,
Jan 27, 2020, 8:48:01 PM1/27/20
to sage-devel
On Monday, January 27, 2020 at 4:51:28 PM UTC-8, David Roe wrote:

What about having another special doctest comment like "# sort-keys" that makes the doctest framework sort the keys before printing?  This could be then implemented in one of the above ways in the doctest code.
David
That would require the doctest framework to parse the comment before running the doctest and setting a displayhook before, so that the sorted output can be generated. It still requires us to supply the display hook.

I don't know if in the current setup it's easy to modify the doctest environment before running the test, based on comments in the doctest (I don't think it's an option to process the doctest output on string-level after it's been produced . That just sounds like a fragile nightmare). If so, it's an option to selectively activate the hook.

Nils Bruin

unread,
Jan 27, 2020, 11:08:48 PM1/27/20
to sage-devel
On Monday, January 27, 2020 at 2:01:04 PM UTC-8, Nils Bruin wrote:
For 2) there are different possibilities.
 a) Something like sorted(D.iteritems()) works, but it means the doctest now displays a list of tuples rather than something that looks like a dict: not a problem for "TESTS:" but less than desirable for "EXAMPLES:".


In Py3.7 we can use the insertion order preservation to write:

dict(sorted(D.items()))

or

{k:D[k] for k in sorted(D)}

It's still a bit circuitous to write compared to just

D

but it does express the intent and prints like a dictionary (because it is one). I'm not sure I'm in favour of it, but we should take it into account when deciding what we do (the thread on python-devel dealing with specifying insertion order preservation in Py7 has some lively discussion on `pprint` and its sorting behaviour).

Markus Wageringel

unread,
Jan 30, 2020, 3:52:06 PM1/30/20
to sage-devel
Am Dienstag, 28. Januar 2020 00:41:55 UTC+1 schrieb Michael Orlitzky:
The insertion order is part of the language as of python-3.7, and can't
be changed:

* https://mail.python.org/pipermail/python-dev/2017-December/151283.html
* https://docs.python.org/3.7/tutorial/datastructures.html#dictionaries

So while it sucks, I guess we should update all of the doctests to print
in that order. It was always a bad idea to rely on them being sorted,
and I don't think I agree with making the order part of the language,
but that's what we've got.

While the order of dictionaries is insertion order now, functions in Sage or in libraries that return dictionaries usually do not make any promise about the order in which elements are inserted into the dictionary. Thus, when doctesting such a function, the order of the result is not specified and might easily change by a seemingly unrelated change in our code base. This is not a huge problem, as we can update any failing doctests, but we could also make the doctests more robust in the first place, by sorting the output.



From the different options mentioned here, my preferred choices are

2c) `pretty_print(D, sort=True)` for clarity, or

2d) `dict(sorted(D.items()))` for simplicity, as this makes use of well-known functions and does not require any other changes.

As Nils explained, for 2c (and even independently) it would make sense to change the default display mode from `None` to `plain` in the command line, but this would lead to a small dichotomy between the command line and the notebook, which is why I am slightly hesistant about this. It would make it a bit more difficult to doctest the notebook behavior. Therefore, I am leaning toward 2d, which should be an uncontroversial change and is similar to the `sorted(...)` idiom already used in the doctests.
 

Nils Bruin

unread,
Jan 30, 2020, 4:17:14 PM1/30/20
to sage-devel
On Thursday, January 30, 2020 at 12:52:06 PM UTC-8, Markus Wageringel wrote:
From the different options mentioned here, my preferred choices are

2c) `pretty_print(D, sort=True)` for clarity, or

2d) `dict(sorted(D.items()))` for simplicity, as this makes use of well-known functions and does not require any other changes.

As Nils explained, for 2c (and even independently) it would make sense to change the default display mode from `None` to `plain` in the command line, but this would lead to a small dichotomy between the command line and the notebook, which is why I am slightly hesistant about this. It would make it a bit more difficult to doctest the notebook behavior.

I do not think that's an issue at all. Presently, on the command line we *literally* get

sage: pretty_print({1:2})
\newcommand{\Bold}[1]{\mathbf{#1}}\left\{1 : 2\right\}

I don't think that satisfies anybody's idea of "pretty" printing. If you do the same thing in the notebook, you get a nicely mathjax typeset expression. We can't replicate that behaviour on the command line anyway, so going with "plain" wouldn't break anything: commandline and notebook already behave very differently.

It is the case that

  get_display_manager().preferences.text is None

holds in both at the moment, but we have that

  get_display_manager()

prints as 'The Sage display manager using the IPython command line backend' and 'The Sage display manager using the IPython notebook backend' respectively, so they're already distinguishable. Setting different defaults in these different cases doesn't seem like a problem to me.

Nils Bruin

unread,
Jan 30, 2020, 4:21:29 PM1/30/20
to sage-devel
On Thursday, January 30, 2020 at 1:17:14 PM UTC-8, Nils Bruin wrote:
prints as 'The Sage display manager using the IPython command line backend' and 'The Sage display manager using the IPython notebook backend' respectively, so they're already distinguishable. Setting different defaults in these different cases doesn't seem like a problem to me.

Oops, and as the documentation of display_manager says, in doctests, you get back: 'The Sage display manager using the doctest backend', so there's a third one! Perhaps we should try and keep that one aligned with the command line, since those both communicate through character streams (whereas the notebook one communicates through a message passing interface)

Michael Orlitzky

unread,
Jan 30, 2020, 5:10:12 PM1/30/20
to sage-...@googlegroups.com
On 1/30/20 3:52 PM, Markus Wageringel wrote:
>
> While the order of dictionaries is insertion order now, functions in
> Sage or in libraries that return dictionaries usually do not make any
> promise about the order in which elements are inserted into the
> dictionary. Thus, when doctesting such a function, the order of the
> result is not specified and might easily change by a seemingly unrelated
> change in our code base. This is not a huge problem, as we can update
> any failing doctests, but we could also make the doctests more robust in
> the first place, by sorting the output.

This is a good point, but it's a problem we already have. We don't make
any promises about the precise message contained in e.g. a ValueError,
but we still frequently test the contents. You just update it when you
change the message. The doctest is more of a sanity check here, telling
you that you've got two different answers and to go figure out which one
is right. If the code is right, you update the doctest.

The downside to sorting everything in the tests it that running the
tests already takes several hours (8+ here). How many CPU hours do we
collectively burn by sorting something 1293752893 times instead of
updating the test once?

Markus Wageringel

unread,
Jan 30, 2020, 5:53:30 PM1/30/20
to sage-...@googlegroups.com
Rather than distinguishing the different backends, I was more worried about how to keep what related doctests currently give us. For example, in src/doc/fr/tutorial/latex.rst, pretty_print is advertised as a way to obtain nicely formatted MathJax output:

    sage: pretty_print(x^12)
    <html><script type="math/tex">\newcommand{\Bold}[1]{\mathbf{#1}}x^{12}</script></html>

I suppose this is exactly what is used in the notebook to format the output. If the default display mode was `plain`, I would not know how to rephrase this doctest other than changing the display mode before and after the test, which is suboptimal in the tutorials.

Admittedly though, with the command line backend, the same command outputs

    sage: pretty_print(x^12)
    \newcommand{\Bold}[1]{\mathbf{#1}}x^{12}

so arguably a dichotomy is already there. With display mode `plain`, we get

    sage: pretty_print(x^12)
    x^12

which is the same as just printing the object.

It seems generally difficult, if not impossible, to write tests for different backends, as we only use the doctest backend during doctests. Though, I assume that the MathJax output is sufficiently tested elsewhere, so changing the default behavior of pretty_print for the commandline and doctests should not be a problem after all.

Markus Wageringel

unread,
Jan 30, 2020, 5:56:16 PM1/30/20
to sage-...@googlegroups.com

> Am 30.01.2020 um 23:10 schrieb Michael Orlitzky <mic...@orlitzky.com>:
>
> This is a good point, but it's a problem we already have. We don't make
> any promises about the precise message contained in e.g. a ValueError,
> but we still frequently test the contents. You just update it when you
> change the message. The doctest is more of a sanity check here, telling
> you that you've got two different answers and to go figure out which one
> is right. If the code is right, you update the doctest.

I agree a doctest is easily updated when things change, but errors could easily be introduced as well. Ideally a test is written in a way that it does not test more than necessary – and the iteration order of dictionaries usually does not need to be tested.

> The downside to sorting everything in the tests it that running the
> tests already takes several hours (8+ here). How many CPU hours do we
> collectively burn by sorting something 1293752893 times instead of
> updating the test once?

Although there are many tests outputting dictionaries, these are usually very small, so I do not expect this to have a big impact on running times. Besides, dictionaries are currently sorted via IPython during testing as well.

Markus Wageringel

unread,
Feb 1, 2020, 5:06:02 PM2/1/20
to sage-...@googlegroups.com
I have opened https://trac.sagemath.org/ticket/29136 proposing to change the default text display preference to "plain" for the text-based backends.
Reply all
Reply to author
Forward
0 new messages