[pyxl] Discussion of XlsxWriter write() behavior

791 views
Skip to first unread message

John Yeung

unread,
Jul 3, 2013, 5:09:11 PM7/3/13
to python-excel
One of the things that I believe makes xlwt Pythonic is its
fundamental stance that you've got *Python* data, i.e. strongly typed
data, and you're writing it out to Excel. Sure, maybe you originally
got your data into Python from a CSV, so everything is a string
whether it should be or not; but to xlwt, that's not its business.
That's between you and the CSV data, and xlwt isn't going to
implicitly convert characters to numbers.

This is the attitude that both OpenPyXL and XlsxWriter are missing,
perhaps because their languages of origin (PHP, Perl) are weakly
typed, and the prevailing philosophy is to try to silently coerce
where possible. Granted, this is also Excel's philosophy... but this
is precisely what most IT programmers REALLY, REALLY HATE about Excel.

(It also introduces a NaN bug, which I've posted to the tracker.)

I guess it's too late to recommend that XlsxWriter adopt the Pythonic,
strongly typed view as its default behavior, as it would probably
break a lot of code that is already out in the wild. (Even though it
would probably also FIX a lot of code that is out in the wild!) But I
think it would be well worth adding some xlwt-style write methods.
Perhaps call the basic one write_strict(), and the row-at-a-time one
write_row_strict(). I'm happy to bikeshed the names, which is part of
the reason I'm posting to this list instead of directly to the
tracker.

John Y.

John McNamara

unread,
Jul 19, 2013, 10:44:23 AM7/19/13
to python...@googlegroups.com


On Wednesday, 3 July 2013 22:09:11 UTC+1, John Yeung wrote:

This is the attitude that both OpenPyXL and XlsxWriter are missing,
perhaps because their languages of origin (PHP, Perl) are weakly
typed, and the prevailing philosophy is to try to silently coerce
where possible.  Granted, this is also Excel's philosophy... but this
is precisely what most IT programmers REALLY, REALLY HATE about Excel.




Hi,

The functionality of the XlsxWriter write() method doesn't have anything to do with the implementation language but rather the target file format. The docs are reasonably explicit about this:

    https://xlsxwriter.readthedocs.org/en/latest/worksheet.html#worksheet-write

 
"Excel makes a distinction between data types such as strings, numbers, blanks, formulas and hyperlinks. To simplify the process of writing data to an XlsxWriter file the write() method acts as a general alias for several more specific methods":

Also, it is covered in some detail in Tutorial 3 in the docs:

    https://xlsxwriter.readthedocs.org/en/latest/tutorial03.html
 
Basically, the write() method tries to write the type of data that Excel expects. Take for example the case you raise of CSV files. If you created a small csv to xlsx program to convert a file like the following to xlsx:

    Foo
    Bar
    1
    23
    456.789


For example with a program like the following that uses one the XlsxWriter type explicit functions since all of the read data will have a string type in Python:

    import csv
    from xlsxwriter.workbook import Workbook

    workbook = Workbook('file.xlsx')
    worksheet = workbook.add_worksheet()

    with open('file.csv', 'rb') as csvfile:
        csvreader = csv.reader(csvfile)
        for row_num, row_data in enumerate(csvreader):
            for col_num, data in enumerate(row_data):
                print "%d, %d, %s" % (row_num, col_num, data)
                worksheet.write_string(row_num, col_num, data)


The output would look like the following in Excel:


(Image above or below. It is a little hard to control).


This probably isn't what the end user wants. In fact I have several "bug reports" where people end up with Excel warnings like this for one reason or another.

The write() method is just some syntactic sugar to do what Excel wants. If the end-user wants something else then it is easily avoided or overridden.



>    (It also introduces a NaN bug, which I've posted to the tracker.)


 
(It also introduces a NaN bug, which I've posted to the tracker.)


Strictly speaking it was the method that I used in Python to determine if the data was a number that introduced the bug. The Perl version doesn't have this issue.
 

I guess it's too late to recommend that XlsxWriter adopt the Pythonic,
strongly typed view as its default behavior, as it would probably
break a lot of code that is already out in the wild.  (Even though it
would probably also FIX a lot of code that is out in the wild!)  But I
think it would be well worth adding some xlwt-style write methods.
Perhaps call the basic one write_strict(), and the row-at-a-time one
write_row_strict().

The best thing to do is to try this for a while and see if it is better. Neither method is very long and are easily implemented.

Regards,

John
--

John Yeung

unread,
Jul 19, 2013, 1:36:41 PM7/19/13
to python-excel
John,

Respectfully, I think you're missing my point entirely.


> The functionality of the XlsxWriter write() method doesn't have
> anything to do with the implementation language but rather the
> target file format.

I'm not saying write() is weakly typed due to implementation language.  Otherwise it would magically become strongly typed in Python.  I'm saying that the weak-typing philosophy of Perl perhaps colors the design of write() and of XlsxWriter in general.  I know that people who use and love PHP tend to *prefer* weak typing, and it's annoying to them that Python doesn't automatically coerce strings to numbers.

Your premise seems to be "let's work the way Excel works, because we're targeting Excel".  But xlwt's premise is "let's take advantage of the fact that we're not working *in* Excel, but rather only *targeting* Excel, to do things in a way that most Python programmers would prefer".

But even the "let's work the way Excel works" falls apart in one very fundamental way:

Excel honors its own types.

Yes, let me say that again:

Excel honors its own types.

And as you said yourself:


"Excel makes a distinction between data types such as strings, numbers, blanks, formulas and hyperlinks. To simplify the process of writing data to an XlsxWriter file the write() method acts as a general alias for several more specific methods"

So, while Excel does go out of its way to be "helpful" when encountering data that it doesn't know the type of, Excel knows *its own* types.  If you enter text consisting of digits in Excel (for example, by typing a leading apostrophe), then when it saves that cell, Excel marks it as text.  If you then open that file in Excel, lo and behold, it is still text.  Even to Excel.

And that right there is a big reason why many IT programmers like to save as Excel when they can, instead of CSV.  Because strings stay strings.

What xlwt is saying is:  "Hey, I don't know where you got your data.  It could have come from a CSV, but it could have come from Excel, or it could have come from a database, or it could be fabricated from scratch entirely in Python.  Doesn't matter to me.  How would I know how you got it, even if I did care?  The fact is, what you have *NOW* is Python data, and Python data is *strongly typed* and thus I will write *those types* as best I can to Excel."

XlsxWriter does me no favors by making me is_instance() my own data just to pick out the right type-specific write method.

Another way in which "let's work the way Excel works" falls apart in XlsxWriter is that write() doesn't even match Excel's coercion rules anyway.  For example, it doesn't automatically coerce '1/2/3' to the date 1/2/2003.  It doesn't automatically coerce 'tRuE' to the Boolean value TRUE.  Etc.

You seem to have tried to optimize write() for people who are using Python purely as glue from CSV to Excel.  It doesn't do a perfect job of that, and it doesn't cater to folks who are using Python as a strongly typed general-purpose programming language.  Or even folks who are only using Python as glue from a database to Excel.


> The best thing to do is to try this for a while and see if it is better.
> Neither method is very long and are easily implemented.

Presumably by "this" you mean write my own type-preserving version of write().  I certainly plan to at some point.  I just figured that if you are still actively developing or at least maintaining XlsxWriter, that you would have a good chance of being able getting to it before I do.

Also, your response makes it unclear to me that you even see the point of xlwt-style write behavior, thus making me discouraged that you would even accept a patch (adding this behavior, without removing the existing behavior) to XlsxWriter if it were presented to you.  So I guess you can chalk up the length and vehemence of my response to my hope that I won't have to maintain my own add-on module (or re-patch XlsxWriter with each release) to do this.

John Y.

John McNamara

unread,
Jul 19, 2013, 6:14:24 PM7/19/13
to python...@googlegroups.com
Hi John,

Again, I suggest you implement a write() method as you think it should be implemented and try it out to see if it works in practice.

Regards,

John. 
-- 




John Machin

unread,
Jul 19, 2013, 6:24:02 PM7/19/13
to python...@googlegroups.com


On Saturday, July 20, 2013 3:36:41 AM UTC+10, John Yeung wrote:
[snip]

Another way in which "let's work the way Excel works" falls apart in XlsxWriter is that write() doesn't even match Excel's coercion rules anyway.  For example, it doesn't automatically coerce '1/2/3' to the date 1/2/2003.  It doesn't automatically coerce 'tRuE' to the Boolean value TRUE.  Etc.

And nobody in their right mind would actually want a writer that emulated Excel reading once-were-dates from a CSV file. Is that 2nd January or  1st February?

I've seen this scenario more than once: User A in MDY land saves a file containing a column of dates as CSV, and transmits it to user B in DMY land. User B opens the CSV file with Excel. If the original day was <= 12, the result is an Excel date which is wrong except of course when day == month. The remainder (original day > 12) are left as Excel text because the presumed input month was > 12. No warning.

Another common problem is IDs (not numbers that you can do arithmetic on) that are converted to Excel numbers and thus are sure to lose their leading zeroes further downstream.


[snip]
Presumably by "this" you mean write my own type-preserving version of write().  I certainly plan to at some point.  I just figured that if you are still actively developing or at least maintaining XlsxWriter, that you would have a good chance of being able getting to it before I do.

Also, your response makes it unclear to me that you even see the point of xlwt-style write behavior, thus making me discouraged that you would even accept a patch (adding this behavior, without removing the existing behavior) to XlsxWriter if it were presented to you.  So I guess you can chalk up the length and vehemence of my response to my hope that I won't have to maintain my own add-on module (or re-patch XlsxWriter with each release) to do this.

+100, John Y

John McNamara

unread,
Jul 19, 2013, 7:09:32 PM7/19/13
to python...@googlegroups.com
On Friday, 19 July 2013 23:24:02 UTC+1, John Machin wrote:
> And nobody in their right mind would actually want a writer that emulated Excel reading once-were-dates from a CSV file. Is that 2nd January or  1st February?

Hi John,

Agreed. And I hope it is clear to the participants here that this isn't something that XlsxWriter does.

Regards,

John.

John Yeung

unread,
Jul 22, 2013, 1:30:17 AM7/22/13
to python-excel
On Fri, Jul 19, 2013 at 6:14 PM, John McNamara <jmcn...@cpan.org> wrote:
> Again, I suggest you implement a write() method as you think it should be
> implemented and try it out to see if it works in practice.

Of course it works in practice. It has worked in xlwt for many years now.

John Y.

John Yeung

unread,
Jul 22, 2013, 1:36:10 AM7/22/13
to python-excel
In case it was not clear: What I meant was that xlwt's write() method
(not implemented by me, but with the behavior I am seeking) has been
in existence for many years, during which time it has proven itself
robust and intuitive.

John Y.

John McNamara

unread,
Jul 22, 2013, 3:35:39 AM7/22/13
to python...@googlegroups.com

Hi John,

The XlsxWriter/Excel::Writer::XLSX/Spreadsheet::WriteExcel style interface has been in use for a lot longer than xlwt and it has also been robust and intuitive.

Even for just the XlsxWriter this has been in use for 6 months and this is the only complaint about the nature of the write() method that I have received.

I have a rationale for the current behaviour of write() and I feel the documentation is clear about it.

Nevertheless, I may be wrong. I'm open to that discussion and I'll try to restart it in another reply.

Regards,





 

John McNamara

unread,
Jul 22, 2013, 4:08:40 AM7/22/13
to python...@googlegroups.com


On Wednesday, 3 July 2013 22:09:11 UTC+1, John Yeung wrote:
I guess it's too late to recommend that XlsxWriter adopt the Pythonic,
strongly typed view as its default behavior, as it would probably
break a lot of code that is already out in the wild.  (Even though it
would probably also FIX a lot of code that is out in the wild!)  But I
think it would be well worth adding some xlwt-style write methods.
Perhaps call the basic one write_strict(), and the row-at-a-time one
write_row_strict().  I'm happy to bikeshed the names, which is part of
the reason I'm posting to this list instead of directly to the
tracker.

Hi John,


Up until just before the 0.0.1 release of XlsxWriter the default behaviour of write() was to strictly rely on the type of the data. In many ways it would have been easier to leave it like that since it shifts the responsibility of matching Excel's types onto the end user (a point you made in another thread I think). However, I changed the interface before release when I started using XlsxWriter to port some applications that I had and ran into the Excel "numbers stored as text" warning when reading from data source that didn't identify the data type (such as CSV and csv.py).


Nevertheless, that may have been the wrong decision.


One alternative for fixing this would be to add a constructor option to turn on the strict type maintenance in write(). Another alternative would be to make strict types the default in write() and add a constructor option to turn it off. The latter obviously has issues of backward compatibility but that may not affect many people. I'll create one or two branches on GitHub and let people try the options.

Regards,

John
--

John McNamara

unread,
Jul 24, 2013, 7:18:38 PM7/24/13
to python...@googlegroups.com


On Wednesday, 3 July 2013 22:09:11 UTC+1, John Yeung wrote:
I guess it's too late to recommend that XlsxWriter adopt the Pythonic,
strongly typed view as its default behavior, as it would probably
break a lot of code that is already out in the wild.  (Even though it
would probably also FIX a lot of code that is out in the wild!)  But I
think it would be well worth adding some xlwt-style write methods.
Perhaps call the basic one write_strict(), and the row-at-a-time one
write_row_strict().  I'm happy to bikeshed the names, which is part of
the reason I'm posting to this list instead of directly to the
tracker.


Hi,

I've added a constructor option to override the write() behaviour so that it doesn't convert strings to numbers. There changes are on the master branch on GitHub:


You can now get strict handling of strings as follows:

    import xlsxwriter

    workbook = xlsxwriter.Workbook('hello.xlsx', {'strings_to_numbers': False})
    worksheet = workbook.add_worksheet()

    worksheet.write(0, 0, 'Hello')
    worksheet.write(1, 0, '123')

    workbook.close()

The output is two strings instead or a string and a number.

Try it out and let me know if you encounter any issues. The documents have been updated so you rebuild them with "make docs". I'm still proof reading them so make allowances.

If there is interest I can create a branch where the default option is "False". It will require fixing some of the tests that read data from text files such as the tests for the autofilter examples.

There is also a tracker for the feature:


Regards,

John

Reply all
Reply to author
Forward
0 new messages