Writing label text to Excel when the text contains only numeric characters

806 views
Skip to first unread message

Paul Pharr

unread,
May 20, 2014, 10:13:11 AM5/20/14
to python...@googlegroups.com



I am trying to write a very simple Excel file containing several columns of ASCII text identifiers which sometimes contain only numeric characters. I am writing python strings to the cell and the intended value is properly stored in the Excel file, but by default Excel has the error checking preference "Flag numbers formatted as text" turned on. This causes every label cell with data that looks to Excel like a number to be flagged as an error.

If the cells are created as valid numbers stored as text labels in Excel such as infoSheet.row(6).write(1, "000555"), then if the user edits them, they will be turned into numbers. This can change the alignment formatting in unappealing ways and more importantly can strip leading zeros and interpret the number using scientific notation, making it incorrect as an identifier.

I tried submitting them with a leading apostrophe, which in the Excel GUI is the notation indicating that the text entered is to be preserved as text and not interpreted as a number. If I write text cells using xlwt with a leading apostrophe like infoSheet.row(6).write(1, "'000555"), then it shows up in Excel as having a leading apostrophe when the file is opened, but if the cell is edited by the user, the apostrophe disappears. So the apostrophe logic is not applied when the file is opened - only when cells are entered by the user. And even if a text cell is prefixed by an apostrophe, excel still flags the cell as having an error because it contains only numeric information (!!)

I have tried to format the cell as text using format like xlwt.easyxf(num_format_str="@"), but that leaves the error condition in place.

This leaves the old trick of setting a text field using a formula. So if I set a field to be ="000123" in excel or xlwt.Formula("\"000123\"") from xlwt, then it properly displays the label with no error condition, and if the user edits it, it is not accidentally interpreted numerically.

If I ignore the possibly inefficient nature of this for large spreadsheets of data, this has the significant downside that if this file created by xlwt is interpreted by a third party excel file reading library, it is usually displayed with empty cells where the formulas are used. I confirmed that gmail and apple iOS display of these files leaves all formulas blank, but if I open and save the file with Excel, then the "native" saved version shows all data in gmail and iOS. I assume this is because Excel writes cached results for formulas to the file, but my method of writing with xlwt does not.



textStyle = xlwt.easyxf(num_format_str="@")

infoSheet.row(5).write(1, "test", textStyle) #<== works as expected
infoSheet.row(6).write(1, "'000555", textStyle) #<== apostrophe displays in cell until cell is edited, then cell displays as an error cell because it contains a number as text
infoSheet.row(7).write(1, xlwt.Formula("\"000666\""), textStyle) #<== displays properly in Excel (though likely inefficient to store for large files), but cell looks empty in gmail or iOS until the file is recalculated and re-saved by Excel



All of my testing has been with:

Excel 2011 for Mac (14.3.9 (131030))
xlwt 0.7.4-1
Python 2.6.5 on Windows


All of these cases apply to any text data which looks like a number to Excel - including:

123
00123
123E6
123.45

and so on...


SO - to boil all of these observations down into a couple of questions:

First preference:

Has anyone developed a technique using xlwt by which I can write a text label to a cell which contains only numeric character data, and have it not be flagged as an error cell evin if the global "Flag numbers formatted as text" preference is turned on by the user? And further, does this technique prevent the cell from being parsed and turned into a value cell if the user edits it?


Second preference:

Has anyone developed a technique using xlwt by which I can somehow set the "cache" data for a formula cell so that I can use formulas for all my data and cause the file not to appear empty to a system like iOS or gmail that might examine it before it gets recalculated by Excel?



Thanks!


Paul

John Yeung

unread,
May 21, 2014, 2:23:00 PM5/21/14
to python-excel
On Tue, May 20, 2014 at 10:13 AM, Paul Pharr <paul...@gmail.com> wrote:
> And even if a text cell is prefixed by an apostrophe, excel
> still flags the cell as having an error because it contains only
> numeric information (!!)

For good or ill, that's just how Excel is.

> I have tried to format the cell as text using format like
> xlwt.easyxf(num_format_str="@"), but that leaves the error
> condition in place.

This is true, but it is resistant to simple user edits. (They would
have to choose a new cell format.)

> This leaves the old trick of setting a text field using a formula
> [which] displays the label with no error condition, and if
> the user edits it, it is not accidentally interpreted numerically.

It doesn't show the error, but it does show the formula if the cell is
selected. This may either be confusing to the user, or may tempt the
user to "correct" the cell by removing the weird characters.

> [Cells containing formulas may initially appear blank when read
> by software other than Excel.] I assume this is because Excel
> writes cached results for formulas to the file, but my method of
> writing with xlwt does not.

xlwt indeed does not write any calculation results. You will find
that almost all software which writes Excel files, other than Excel
itself, behaves similarly. This is because (1) it's highly nontrivial
to fully parse Excel formulas, (2) even if correctly parsed, it's
difficult to *exactly* duplicate the calculations that Excel itself
would perform; thus the results cannot be guaranteed to be "correct"
from an Excel point of view, and (3) because of the previous reasons,
Microsoft itself recommends that third-party software not attempt to
evaluate formulas.

That said, if you want to calculate formula results yourself and write
both the value and the formula, it is possible to do so. To my
knowledge, xlwt doesn't expose this functionality, but XlsxWriter
does. The thing about XlsxWriter is that it only generates the newer
Excel 2007+ (.xlsx) format. If that's a showstopper, then you have to
either patch xlwt yourself or live with one of the suboptimal choices
you've already found. Of the xlwt options, I personally prefer
setting the number format string to '@'.

One final caveat: I believe XlsxWriter currently has a bit of a bug
when it comes to writing formula values that look numeric but aren't
(your very situation). I will open a ticket for this if I can't find
that the issue is already open or fixed. The developer (John
McNamara) is usually quite responsive, but if time is critical, I can
point you to where you need to put the temporary duct tape in the
code.

John Y.
Reply all
Reply to author
Forward
0 new messages