On Tuesday, 6 December 2011 21:05:34 UTC+11, Áron wrote:
Short example attached. I read that openpyxl read cells as unicode, what am I doing wrong?
It is correct that openpyxl produces a unicode object when the cell contains text. Change your sample script to include the line
print row, type(text_in_cell), repr(text_in_cell)
just before the line containing the str() call. You will see that cell B4 contains
u'the next character is from extended ascii \u2265'
The error message from the next line (which you should have included in your first message) is
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 42: ordinal not in range(128)
This is quite expected. You are trying to bash into ASCII a character that can't be represented in ASCII.
The short answer to your problem is "Don't do that". Just omit the attempted str conversion.
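To make the failure concrete, here is a minimal sketch. On Python 2, str(some_unicode_object) implicitly encodes with the default codec, which is normally ASCII. The helper name try_ascii is made up for illustration (it is not part of openpyxl); it performs that encode explicitly, so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Same text as cell B4 in the sample file.
text_in_cell = u'the next character is from extended ascii \u2265'


def try_ascii(text):
    """Hypothetical helper: attempt the ASCII encode that Python 2's
    str(text) performs implicitly. Returns (ok, detail)."""
    try:
        return True, text.encode('ascii')
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that won't fit.
        return False, (exc.start, exc.reason)


ok, detail = try_ascii(text_in_cell)
print(ok, detail)

# Leaving the value as unicode needs no conversion at all.
still_unicode = text_in_cell
print(repr(still_unicode[42]))
```

The reported position and reason match the traceback, and nothing goes wrong if the value is simply left as unicode.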
Longer answer:
In general: get your input text data into Python unicode objects. (Any Excel file reader does this for you, without being told to).
Work in unicode. This includes using unicode literals u"etcetc" for at least all literals that won't fit in ASCII. You should really use unicode literals for all text literals if you are planning to use Python 3.X some time soon (if so, read http://python3porting.com/toc.html).
When it comes to output, ensure that the unicode objects are encoded in a manner appropriate to the receiver. Any Excel file writer does this for you, without being told to. Likewise any XML file writer. If your script runs in a Command Prompt window on Windows, you will have trouble writing to sys.stdout when your text includes characters that can't be encoded in the console's legacy encoding, e.g. cp850. That is a wholly separate issue, and doesn't obviate the general principles.
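The three steps above (decode on input, work in unicode, encode on output) can be sketched roughly as follows. The sample bytes and variable names are invented for illustration, and the choice of the 'replace' error handler for the console case is just one option, not a recommendation from the openpyxl docs.

```python
# -*- coding: utf-8 -*-
# Illustrative data only: pretend these bytes arrived from a UTF-8 source.
incoming_bytes = b'price \xe2\x89\xa5 10'

# 1. Input: decode once, at the boundary. (An Excel reader has already
#    done this step for you.)
text = incoming_bytes.decode('utf-8')

# 2. Work in unicode, using unicode literals.
message = u'checked: ' + text.upper()

# 3. Output: encode in whatever form the receiver expects.
for_utf8_file = message.encode('utf-8')
# cp850 has no GREATER-THAN OR EQUAL TO character, so a strict encode
# would raise; 'replace' substitutes '?' instead.
for_cp850_console = message.encode('cp850', 'replace')

print(repr(message))
print(repr(for_utf8_file))
print(repr(for_cp850_console))
```

The same unicode object yields different bytes for different receivers, which is why the encoding decision belongs at the output boundary and nowhere else.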
Hints:
1. Use repr() for debugging -- see above.
2. Use unicodedata to find out what an unfamiliar character is:
| >>> import unicodedata
| >>> unicodedata.name(text_in_cell[42])
| 'GREATER-THAN OR EQUAL TO'
3. Read these:
| http://www.joelonsoftware.com/articles/Unicode.html
| http://docs.python.org/howto/unicode.html

Hope this helps,
John