doctests compatibility for python 2 & python 3

Robin Becker

Jan 17, 2014, 6:16:17 AM
to pytho...@python.org
I have some problems making some doctests for python2 code compatible with
python3. The problem is that as part of our approach we are converting the code
to use unicode internally. So we allow either byte strings or unicode in inputs,
but we are trying to convert to unicode outputs.

That makes doctests quite hard as

def func(a):
    """
    >>> func(u'aaa')
    'aaa'
    """
    return a

fails in python2 whilst

def func(a):
    """
    >>> func(u'aaa')
    u'aaa'
    """
    return a

fails in python3. I could change the tests so they look like

"""
>>> func(u'aaa')==u'aaa'
True
"""

but that makes the tests less useful: if a test fails I don't see the actual
and expected outcomes, only "expected True, got False".

Is there an easy way to make these kinds of tests work in python 2 & 3?
--
Robin Becker

Chris Angelico

Jan 17, 2014, 6:24:18 AM
to pytho...@python.org
On Fri, Jan 17, 2014 at 10:16 PM, Robin Becker <ro...@reportlab.com> wrote:
> Aside from changing the tests so they look like
> """
> >>> func(u'aaa')==u'aaa'
> True
> """

Do your test strings contain any non-ASCII characters? If not, you
might be able to do this:

def func(a):
    """
    >>> str(func(u'aaa'))
    'aaa'
    """
    return a

which should work in both. In Py3, the str() call will do nothing, and
it'll compare correctly; in Py2, it'll convert it into a byte string,
which will repr() without the 'u'.

I don't think it's possible to monkey-patch unicode.__repr__ to not
put the u'' prefix on, but if there is a way, that'd possibly be more
convenient.

ChrisA
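
A different route, not proposed in the thread, is to leave unicode.__repr__ alone and instead normalise the u prefix at the point where doctest compares output. The sketch below assumes a hypothetical UnicodePrefixChecker built on the standard doctest.OutputChecker hook. The regular expression is only a heuristic: it would also rewrite a u' that happens to appear inside printed text.

```python
import doctest
import re


class UnicodePrefixChecker(doctest.OutputChecker):
    """Hypothetical checker that ignores a u/U prefix on string reprs."""

    # A u/U immediately before a quote, not preceded by a word character or quote.
    _prefix = re.compile(r"(?<![\w'\"])[uU](?=['\"])")

    def check_output(self, want, got, optionflags):
        want = self._prefix.sub("", want)
        got = self._prefix.sub("", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


def func(a):
    """
    >>> func(u'aaa')
    u'aaa'
    """
    return a


def run_with_checker(obj):
    """Run obj's docstring examples with the prefix-insensitive checker."""
    test = doctest.DocTestParser().get_doctest(
        obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(checker=UnicodePrefixChecker(), verbose=False)
    return runner.run(test)  # TestResults(failed, attempted)
```

Calling run_with_checker(func) should report no failures on either Python version, because both the expected and the actual text are stripped of the prefix before the normal comparison runs.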

Chris Angelico

Jan 17, 2014, 6:30:46 AM
to pytho...@python.org
On Fri, Jan 17, 2014 at 10:24 PM, Chris Angelico <ros...@gmail.com> wrote:
> Do your test strings contain any non-ASCII characters? If not, you
> might be able to do this:
>
> def func(a):
>     """
>     >>> str(func(u'aaa'))
>     'aaa'
>     """
>     return a

Actually, probably better than that:

def func(a):
    """
    >>> text(func(u'aaa'))
    'aaa'
    """
    return a

try:
    class text(unicode):  # Will throw NameError in Py3
        def __repr__(self):
            return unicode.__repr__(self)[1:]
except NameError:
    # Python 3 doesn't need this wrapper.
    text = str

Little helper class that does what I don't think monkey-patching will
do. I've tested this and it appears to work in both 2.7.4 and 3.4.0b2,
but that doesn't necessarily mean that the repr of any given Unicode
string will be exactly the same on both versions. It's likely to be
better than depending on the strings being ASCII, though.

ChrisA

Steven D'Aprano

Jan 17, 2014, 6:41:24 AM
On Fri, 17 Jan 2014 11:16:17 +0000, Robin Becker wrote:

> I have some problems making some doctests for python2 code compatible
> with python3. The problem is that as part of our approach we are
> converting the code to use unicode internally. So we allow either byte
> strings or unicode in inputs, but we are trying to convert to unicode
> outputs.

Alas, I think you've run into one of the weaknesses of doctest. Don't get
me wrong, I am a huge fan of doctest, but it is hard to write polyglot
string tests with it, as you have discovered.

However, you may be able to get 95% of the way by using print.

def func(a):
    """
    >>> print(func(u'aaa'))
    aaa
    """
    return a

ought to behave identically in both Python 2 and Python 3.3, provided you
only print one object at a time. This ought to work with both ASCII and
non-ASCII (at least in the BMP).



--
Steven
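
The "one object at a time" restriction exists because Python 2's print statement treats print(a, b) as printing a tuple, so the tuple's repr (with u prefixes) is what appears. One sketch that stays within that restriction is to format several values into a single string before printing. The names pair and run_examples are illustrative, not from the thread.

```python
import doctest


def pair(a, b):
    """
    >>> print(u'%s %s' % pair(u'aaa', u'bbb'))
    aaa bbb
    """
    return a, b


def run_examples(obj):
    """Run obj's docstring examples; returns TestResults(failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)
```

Because print receives exactly one string, the parentheses mean the same thing to the Python 2 statement and the Python 3 function.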

Robin Becker

Jan 17, 2014, 7:12:35 AM
to pytho...@python.org
On 17/01/2014 11:41, Steven D'Aprano wrote:
> def func(a):
>     """
>     >>> print(func(u'aaa'))
>     aaa
>     """
>     return a
I think this approach seems to work if I turn the docstring into unicode

def func(a):
    u"""
    >>> print(func(u'aaa\u020b'))
    aaa\u020b
    """
    return a
def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()

If I leave the u off the docstring it goes wrong in python 2.7. I also tried to
put an encoding onto the file and use the actual utf8 characters ie

# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a
def _doctest():
    import doctest
    doctest.testmod()

and that works in python3, but fails in python 2 with this
> (py27) C:\code\hg-repos>python tdt1.py
> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> **********************************************************************
> File "tdt1.py", line 4, in __main__.func
> Failed example:
> print(func(u'aaa\u020b'))
> Expected:
> aaaȋ
> Got:
> aaaȋ
> **********************************************************************
> 1 items had failures:
> 1 of 1 in __main__.func
> ***Test Failed*** 1 failures.


--
Robin Becker

Robin Becker

Jan 17, 2014, 7:14:51 AM
to pytho...@python.org
On 17/01/2014 11:30, Chris Angelico wrote:
> On Fri, Jan 17, 2014 at 10:24 PM, Chris Angelico <ros...@gmail.com> wrote:
>> Do your test strings contain any non-ASCII characters? If not, you
>> might be able to do this:
>>
>> def func(a):
>>     """
>>     >>> str(func(u'aaa'))
>>     'aaa'
>>     """
>>     return a
>
> Actually, probably better than that:
>
> def func(a):
>     """
>     >>> text(func(u'aaa'))
>     'aaa'
>     """
>     return a
>
> try:
>     class text(unicode):  # Will throw NameError in Py3
>         def __repr__(self):
>             return unicode.__repr__(self)[1:]
> except NameError:
>     # Python 3 doesn't need this wrapper.
>     text = str
>
> Little helper class that does what I don't think monkey-patching will
> do. I've tested this and it appears to work in both 2.7.4 and 3.4.0b2,
> but that doesn't necessarily mean that the repr of any given Unicode
> string will be exactly the same on both versions. It's likely to be
> better than depending on the strings being ASCII, though.
>
> ChrisA
>

I tried this approach with a few more complicated outcomes and they fail in
python2 or 3 depending on how I try to render the result in the doctest.
--
Robin Becker

Steven D'Aprano

Jan 17, 2014, 10:27:38 AM
On Fri, 17 Jan 2014 12:12:35 +0000, Robin Becker wrote:

> On 17/01/2014 11:41, Steven D'Aprano wrote:
>> def func(a):
>>     """
>>     >>> print(func(u'aaa'))
>>     aaa
>>     """
>>     return a
>
> I think this approach seems to work if I turn the docstring into unicode
>
> def func(a):
>     u"""
>     >>> print(func(u'aaa\u020b'))
>     aaa\u020b
>     """
>     return a

Good catch! Without the u-prefix, the \u... is not interpreted as an
escape sequence, but as a literal backslash-u.
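
The effect can be reproduced on Python 3 alone by using a raw literal as a stand-in for the Python 2 byte-string docstring, since neither interprets \u escapes. This is only an illustration of the escape handling, and the variable names are made up.

```python
# Python 3 (and u'' literals on Python 2): \u020b becomes one character.
interpreted = u"aaa\u020b"

# A raw literal keeps the backslash, as a plain "..." literal does on Python 2.
literal = r"aaa\u020b"

print(len(interpreted))  # 4 characters
print(len(literal))      # 9 characters: a, a, a, \, u, 0, 2, 0, b

# Decoding the escapes turns the literal form into the interpreted one.
assert literal.encode("ascii").decode("unicode_escape") == interpreted
```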


> If I leave the u off the docstring it goes wrong in python 2.7. I also
> tried to put an encoding onto the file and use the actual utf8
> characters ie
>
> # -*- coding: utf-8 -*-
> def func(a):
>     """
>     >>> print(func(u'aaa\u020b'))
>     aaaȋ
>     """
>     return a

There seems to be some mojibake in your post, which confuses issues.

You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
At least, that's what it ought to be. But in your post, it shows up as
the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
your posting software somehow got confused and inserted the two
characters which you would have got using cp-437 while claiming that they
are UTF-8. (Your post is correctly labelled as UTF-8.)

I'm confident that the problem isn't with my newsreader, Pan, because it
is pretty damn good at getting encodings right, but also because your
post shows the same mojibake in the email archive:

https://mail.python.org/pipermail/python-list/2014-January/664771.html

To clarify: you tried to show \u020B as a literal. As a literal, it ought
to be the single character ȋ which is a lower case I with curved accent on
top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
page is two characters ╚ ï.

py> '\u020b'.encode('utf8').decode('cp437')
'╚ï'

Hence, mojibake.
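
The same round trip can be spelled out step by step, and then reversed, which is a common way to check a mojibake diagnosis. A small sketch for Python 3 (variable names are illustrative):

```python
import unicodedata

original = "\u020b"
utf8_bytes = original.encode("utf-8")   # b'\xc8\x8b'
garbled = utf8_bytes.decode("cp437")    # the two characters seen in the post

# Name each character involved: the intended one, then the two garbled ones.
for ch in original + garbled:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# Undoing the wrong decode recovers the intended character.
repaired = garbled.encode("cp437").decode("utf-8")
assert repaired == original
```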


> def _doctest():
>     import doctest
>     doctest.testmod()
>
> and that works in python3, but fails in python 2 with this
>> (py27) C:\code\hg-repos>python tdt1.py
>> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as being
>> unequal
>>   if got == want:
>> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as being
>> unequal

I cannot replicate this specific exception. I think it may be a side-
effect of you being on Windows. (I'm on Linux, and everything is UTF-8.)

>> if got == want:
>> **********************************************************************
>> File "tdt1.py", line 4, in __main__.func
>> Failed example:
>> print(func(u'aaa\u020b'))
>> Expected:
>> aaaȋ
>> Got:
>> aaaȋ

The difficulty here is that it is damn near impossible to sort out which,
if any, bits are mojibake inserted by your posting software, which by
your editor, your terminal, which by Python, and which are artifacts of
the doctest system.

The usual way to debug these sorts of errors is to stick a call to repr()
just before the print.

print(repr(func(u'aaa\u020b')))



--
Steven
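
Building on the repr() idea, one portable variant (an assumption of this note, not something proposed in the thread) is to print an escaped form of the result and keep the docstring raw, so that the expected output is pure ASCII on both Python 2 and 3. The helpers escaped and run_examples are hypothetical.

```python
import doctest


def escaped(text):
    """Return text with non-ASCII characters written as backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


def func(a):
    r"""
    >>> print(escaped(func(u'aaa\u020b')))
    aaa\u020b
    """
    return a


def run_examples(obj):
    """Run obj's docstring examples; returns TestResults(failed, attempted)."""
    globs = {obj.__name__: obj, "escaped": escaped}
    test = doctest.DocTestParser().get_doctest(
        obj.__doc__, globs, obj.__name__, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)
```

The raw docstring keeps the backslash in both the example source and the expected output. The interpreter then expands the escape only when it compiles the example, and escaped() turns it back into ASCII before anything reaches the terminal or the comparison.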

Robin Becker

Jan 17, 2014, 11:17:27 AM
to pytho...@python.org
On 17/01/2014 15:27, Steven D'Aprano wrote:
..........
>>
>> # -*- coding: utf-8 -*-
>> def func(a):
>>     """
>>     >>> print(func(u'aaa\u020b'))
>>     aaaȋ
>>     """
>>     return a
>
> There seems to be some mojibake in your post, which confuses issues.
>
> You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
> At least, that's what it ought to be. But in your post, it shows up as
> the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
> RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
> your posting software somehow got confused and inserted the two
> characters which you would have got using cp-437 while claiming that they
> are UTF-8. (Your post is correctly labelled as UTF-8.)
>
> I'm confident that the problem isn't with my newsreader, Pan, because it
> is pretty damn good at getting encodings right, but also because your
> post shows the same mojibake in the email archive:
>
> https://mail.python.org/pipermail/python-list/2014-January/664771.html
>
> To clarify: you tried to show \u020B as a literal. As a literal, it ought
> to be the single character ȋ which is a lower case I with curved accent on
> top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
> page is two characters ╚ ï.

When I edit the file in vim with utf8 encoding I do see your ȋ as the literal.
However, as you note I'm on windows and no amount of cajoling will get it to
work reasonably so my printouts are broken. So on windows

(py27) C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaaȋ

on my linux

$ python2 -c"print(u'aaa\u020b')"
aaaȋ

$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison
failed to convert both arguments to Unicode - interpreting them as being unequal
if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison
failed to convert both arguments to Unicode - interpreting them as being unequal
if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
print(func(u'aaa\u020b'))
Expected:
aaaȋ
Got:
aaaȋ
**********************************************************************
1 items had failures:
1 of 1 in __main__.func
***Test Failed*** 1 failures.
robin@everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a
def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()
robin@everest ~/tmp:

so the error persists with or without copying errors.

Note that on my putty terminal I don't see the character properly (I see unknown
glyph square box), but it copies OK.
--
Robin Becker

Terry Reedy

Jan 17, 2014, 4:10:50 PM
to pytho...@python.org
On 1/17/2014 7:14 AM, Robin Becker wrote:

> I tried this approach with a few more complicated outcomes and they fail
> in python2 or 3 depending on how I try to render the result in the doctest.

I never got how you are using doctests. They were certainly not meant
for heavy-duty unit testing, but for testing combined with explanation.
Section 26.2.3.7, Warnings (in the 3.3 docs), warns that they are fragile
to even single char changes and suggests == as a workaround, as 'True' and
'False' will not change. So I would not reject that option.

--
Terry Jan Reedy

Albert-Jan Roskam

Jan 18, 2014, 3:39:25 AM
to pytho...@python.org, Terry Reedy
--------------------------------------------
On Fri, 1/17/14, Terry Reedy <tjr...@udel.edu> wrote:

Subject: Re: doctests compatibility for python 2 & python 3
To: pytho...@python.org
Date: Friday, January 17, 2014, 10:10 PM
=====> I used doctests in .txt files and I converted ALL of them when I wanted to make my code work for both Python 2 and 3. I tried to fix something like a dozen of them so they'd work in Python 2.7 and 3.3, but I found it just too cumbersome and time consuming. The idea of doctest is super elegant, but it is really only meant for testable documentation (maybe with sphinx). If you put all the (often boring, e.g. edge case) tests in docstrings, the .py file will look very cluttered. One thing that I missed in unittest was Ellipsis, but https://pypi.python.org/pypi/gocept.testing/1.6.0 offers assertEllipsis and other useful stuff.

Albert-Jan
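
For comparison, the same kind of check written with unittest behaves identically on Python 2 and 3, because assertEqual compares values rather than their printed repr, and a failure still reports both sides. A minimal sketch with hypothetical names (FuncTest, run_suite):

```python
import unittest


def func(a):
    return a


class FuncTest(unittest.TestCase):
    def test_text_is_returned_unchanged(self):
        # Values are compared directly, so no u'' prefix is involved.
        self.assertEqual(func(u"aaa\u020b"), u"aaa\u020b")


def run_suite():
    """Run FuncTest and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FuncTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```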


Robin Becker

Jan 20, 2014, 5:07:35 AM
to pytho...@python.org
On 17/01/2014 21:10, Terry Reedy wrote:
> On 1/17/2014 7:14 AM, Robin Becker wrote:
>
..........
> I never got how you are using doctests. They were certainly not meant for
> heavy-duty unit testing, but for testing combined with explanation. Section
> 26.2.3.7. (in 3.3) Warnings warns that they are fragile to even single char
> changes and suggests == as a workaround, as 'True' and 'False' will not change.
> So I would not reject that option.
>
I have used some 'robust' True/False equality tests and also tests which return
None or a string indicating the expected and observed outcomes eg

>
> def equalStrings(a,b,enc='utf8'):
>     return a==b if type(a)==type(b) else asUnicode(a,enc)==asUnicode(b,enc)
>
> def eqCheck(r,x):
>     if r!=x:
>         print('Strings unequal\nexp: %s\ngot: %s' % (ascii(x),ascii(r)))

of course I needed to import ascii from the future and the asUnicode function
has to be different for python 3 and 2.
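
A self-contained sketch of those helpers, so they can be run as-is. Here as_unicode is a hypothetical stand-in for the asUnicode function mentioned above (its real implementation is not shown in the thread), mismatch_report is an added helper that returns the message instead of printing it, and the future_builtins import is assumed to be the Python 2 source of ascii that the post alludes to.

```python
try:
    from future_builtins import ascii  # Python 2 only
except ImportError:
    pass  # Python 3: ascii() is already a builtin


def as_unicode(value, enc="utf8"):
    """Hypothetical stand-in: decode byte strings, pass text through."""
    return value.decode(enc) if isinstance(value, bytes) else value


def equal_strings(a, b, enc="utf8"):
    """Compare directly when the types match, otherwise compare as text."""
    if type(a) == type(b):
        return a == b
    return as_unicode(a, enc) == as_unicode(b, enc)


def mismatch_report(result, expected):
    """Return None when equal, else a message naming both values."""
    if equal_strings(result, expected):
        return None
    return "Strings unequal\nexp: %s\ngot: %s" % (ascii(expected), ascii(result))


def eq_check(result, expected):
    """Doctest-friendly wrapper: prints nothing when the strings are equal."""
    report = mismatch_report(result, expected)
    if report is not None:
        print(report)
```

In a doctest, a call such as eq_check(func(u'aaa'), u'aaa') expects no output at all, so the test reads the same on both versions. The message text itself still differs (Python 2 shows a u prefix in it), but it only appears when a test fails.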

Some of our code used doctests which are discovered by a file search.
--
Robin Becker
