Difference in encoding between old formatting %s and new formatting


Александр Илюшкин

Jun 20, 2016, 8:39:27 AM
to robotframework-users
1. I have a library with a keyword. The keyword writes a message to the test documentation.
2. The Python file is in `utf-8` and has the required header
`# -*- coding: utf-8 -*-`
3. The `*.robot` files are in `utf-8`

Executing this keyword from a robot file with non-ASCII characters gives:

1. If the keyword uses `"%s" % msg`: no error, and the log file shows the Russian message correctly.
2. If the keyword uses `"{}".format(msg)` or `"{!s}".format(msg)`: I get the error `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)`

As you can see, I only changed the old Python formatting to the new style. How can I fix this non-ASCII error with the new style, without going back to old-style formatting?

On SO I got an answer, but it does not work:

Try to use the Str.decode(encoding='UTF-8',errors='strict') method. See Python doc.

Example:

"{}".format(msg.decode(errors='replace'))

I checked this out and this is what I found:

1. I cannot use `errors='replace'`, because then all the Russian text is removed, and the reason for that is the next point.
2. The problem is that the string "{}" is somehow ASCII-encoded. That is why msg, which is a Unicode Russian string, cannot be converted to ASCII.
I've RTFM and I know that the header of the py file should work (I've set it to # -*- coding: utf-8 -*-), but strings in this file are still ASCII.
I can solve the problem by explicitly making the constant strings Unicode, writing them as u"some string".format(msg) or unicode("some string").format(msg); after that I don't get any error.

But why isn't the py file header working for constant strings? When I run the py file separately, it works as expected, but when I run the test in Robot Framework, it fails...

Pekka Klärck

Jun 21, 2016, 6:24:26 AM
to positi...@gmail.com, robotframework-users
Hello,

The reason for this behavior is an inconsistency in how Python 2
constructs strings from byte strings and Unicode strings. In most
cases such operations produce Unicode strings:

>>> 'foo' + u'bar'
u'foobar'
>>> 'foo' + u'\xe4'
u'foo\xe4'
>>> '%s' % u'foo'
u'foo'
>>> '%s' % u'\xe4'
u'\xe4'

As seen above, it's OK that the Unicode string contains non-ASCII
characters. If the byte string contains non-ASCII characters, you get
an error:

>>> '\xe4' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> '\xe4 %s' % u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

In other words, you can use the old string formatting with Unicode
strings being inserted into the template as long as the template
itself is pure ASCII. As a result you get a Unicode string.
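
If the template itself needs non-ASCII characters, one option is to make it a Unicode string before formatting, either with a u'' literal or by decoding the bytes explicitly. The following is only a sketch, not something tested with Robot Framework; the names `template_bytes`, `template` and `result` are illustrative, and UTF-8 is an assumption about how the bytes are encoded:

```python
# -*- coding: utf-8 -*-
# Hypothetical example: a byte-string template that contains non-ASCII data.
template_bytes = b'\xc3\xa4 %s'   # UTF-8 encoded bytes for u'\xe4 %s'

# Decode explicitly (assuming UTF-8) instead of relying on the implicit
# ASCII decoding that raises UnicodeDecodeError on Python 2.
template = template_bytes.decode('utf-8')

# Both operands are now Unicode, so the formatting succeeds.
result = template % u'foo'
```

The same lines run unchanged on Python 3, where `b'...'` is a `bytes` object.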

Unfortunately the new string formatting system doesn't work like that.
The outcome of `template.format()` depends on whether the template is
a byte string or a Unicode string:

>>> '{}'.format('foo')
'foo'
>>> '{}'.format(u'foo')
'foo'
>>> u'{}'.format('foo')
u'foo'

As seen above, inserted Unicode strings are converted to byte strings
if the template is a byte string. This obviously fails if the inserted
strings contain non-ASCII characters:

>>> '{}'.format(u'\xe4')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
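
The implicit conversion behaves roughly like encoding the inserted Unicode string with the ASCII codec, which is Python 2's default encoding unless it has been changed. As a minimal sketch, the same error can be triggered explicitly; the variable names are only for illustration:

```python
# -*- coding: utf-8 -*-
text = u'\xe4'   # a Unicode string with one non-ASCII character

try:
    # Roughly what a byte-string template does to its Unicode argument
    # on Python 2 (assumption: the default encoding is still 'ascii').
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__)
```

This also runs on Python 3, where the explicit `encode('ascii')` call fails in the same way.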

A workaround is using Unicode strings in templates:

>>> u'{}'.format(u'\xe4')
u'\xe4'
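
In a library keyword this could look something like the sketch below. The helpers `to_text` and `build_message` are hypothetical names, not part of Robot Framework, and decoding byte strings as UTF-8 is an assumption that matches the file encoding mentioned in the question:

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value unchanged unless it is a byte string, which is decoded."""
    # On Python 2 `bytes` is an alias of `str`, so this catches byte strings there too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_message(msg):
    # The u'' prefix keeps the template Unicode on Python 2;
    # on Python 3 it is accepted and has no effect.
    return u'Message: {}'.format(to_text(msg))


from_unicode = build_message(u'Привет')
from_bytes = build_message(u'Привет'.encode('utf-8'))
```

Both calls should give the same Unicode message, regardless of whether the argument arrives as Unicode or as UTF-8 encoded bytes.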

An alternative solution is just using the old string formatting
syntax. It is not deprecated, nor is it going to be deprecated in the
future.

Cheers,
.peke



--
Agile Tester/Developer/Consultant :: http://eliga.fi
Lead Developer of Robot Framework :: http://robotframework.org