python decimal.Decimal objects and XlsxWriter 0.3.6

951 views
Skip to first unread message

Josh English

unread,
Aug 14, 2013, 12:54:43 PM8/14/13
to python...@googlegroups.com
Was there a change in XlsxWriter I'm not seeing? My report generator is parsing a bunch of text files, using the decimal.Decimal object from Python. They've worked just fine creating Excel reports.

I upgraded to XlsxWriter 0.3.6, and now my script generates an "TypeError: Unsupported data type in write()" error.

I have attached a script that gets this error.

I want to make sure others aren't getting this before I submit a bug report.

Josh English
decimalwritebug.py

John Yeung

unread,
Aug 14, 2013, 4:18:40 PM8/14/13
to python-excel
There was indeed a change in 0.3.6. The Worksheet.write() method used
to check whether the input was numeric by attempting to coerce to
float (via Python's built-in float() function). Now, it checks the
type via isinstance(), and the only types that register (directly) as
numeric are float and int (and long, if you're on Python 2). Other
than those types, it only handles None, datetimes, and strings.

For now, you have a few choices to work around this:

(1) Use the write_number() method instead of write()
(2) Coerce your data to float yourself (which would let you use write())
(3) Patch your version of xlsxwriter/worksheet.py to make write()
accept Decimal as a numeric type

One thing I have to ask is: Why are you using Decimal in the first
place? Not that there is anything wrong with Decimal. But if your
answer is something like "because I need exact decimal values, not
float approximations" then you are wasting your effort, because Excel
only stores floats anyway. If instead your answer is more like
"because I'm using a database package to pull data out of a database,
and it's automatically converting the SQL_DECIMAL type into Python's
Decimal" then that's understandable.

John Y.

Josh English

unread,
Aug 14, 2013, 5:51:32 PM8/14/13
to python...@googlegroups.com
I use Decimal because the reports I was building were running into rounding errors. Decimal fixed that. I read a bunch of .xls files with xlrd, and the conversion to float sometimes left nasty side effects. 

What's worse, now that the write_number() method solves some problems, I have merged fields in this report that I create with "merge_range" and now those don't work with the Decimal objects, so I have to force convert everything to strings (because if the number doesn't exist, it fills in "missing" or some other clue).

Thanks for confirming that I wasn't going mad. An update broke my code, so I have to fix the code. No worries.

Josh


John Yeung

unread,
Aug 14, 2013, 6:57:16 PM8/14/13
to python-excel
On Wed, Aug 14, 2013 at 5:51 PM, Josh English
<joshua.r...@gmail.com> wrote:
> I use Decimal because the reports I was building were running into rounding
> errors. Decimal fixed that. I read a bunch of .xls files with xlrd, and the
> conversion to float sometimes left nasty side effects.

The only numeric type in Excel is float (IEEE double-precision, same
as Python's float). If there are conversion problems, it's because of
Excel itself.

> What's worse, now that the write_number() method solves some problems, I
> have merged fields in this report that I create with "merge_range" and now
> those don't work with the Decimal objects, so I have to force convert
> everything to strings (because if the number doesn't exist, it fills in
> "missing" or some other clue).

I don't really understand the last part of that. But, I guess if you
can deal with it, great.

John Y.

John McNamara

unread,
Aug 18, 2013, 4:06:39 PM8/18/13
to python...@googlegroups.com


On Wednesday, 14 August 2013 22:51:32 UTC+1, Josh English wrote:
What's worse, now that the write_number() method solves some problems, I have merged fields in this report that I create with "merge_range" and now those don't work with the Decimal objects, so I have to force convert everything to strings (because if the number doesn't exist, it fills in "missing" or some other clue).


Hi Josh,

As John correctly points out the default write() behaviour got a little stricter in version 0.3.6 (and more so in 0.3.7).

I'm not sure how Decimal.decimal() should be handled in the XlsxWriter write() method. The most obvious approach would be to convert it to the Python float type. This would probably be the right thing most of the time but there may be times that the user wants it handled as a string.

So the options are:

1. Maintain the current behaviour and leave it up to the end user to map Decimal.decimal() to the type they want prior to using write().
2. Map Decimal.decimal() to float in write() with explicit documentation.
3. Something else?

Opinions, ideas?

John




John Yeung

unread,
Aug 18, 2013, 7:54:50 PM8/18/13
to python-excel
On Sun, Aug 18, 2013 at 4:06 PM, John McNamara <jmcn...@cpan.org> wrote:
>
> I'm not sure how Decimal.decimal() should be handled in the XlsxWriter
> write() method. The most obvious approach would be to convert it to the
> Python float type. This would probably be the right thing most of the time
> but there may be times that the user wants it handled as a string.
>
> So the options are:
>
> 1. Maintain the current behaviour and leave it up to the end user to map
> Decimal.decimal() to the type they want prior to using write().
> 2. Map Decimal.decimal() to float in write() with explicit documentation.
> 3. Something else?
>
> Opinions, ideas?

My opinion is that Decimal (the module is decimal, the class is
Decimal) should be accepted automatically as numeric, as it is a
numeric Python type. So my vote is for option 2.

Now, if decimal.Decimal is automatically converted to float, then it
probably makes sense to do the same for fractions.Fraction.

As for the concern that "there may be times that the user wants it
handled as a string", I would consider this a nonissue. It's
completely natural and intuitive to a Python programmer that if you
want a string representation of a number, you have to get it
explicitly.

By the way, thanks for changing the strings_to_numbers default in
0.3.7. I was just about to point out that your option 1 requires the
additional step of passing a keyword argument to the constructor if
the user truly does want the Decimal handled as a string; but then I
checked the release notes and discovered this was no longer the case.

John Y.

Josh English

unread,
Aug 19, 2013, 6:50:21 PM8/19/13
to python...@googlegroups.com


On Sunday, August 18, 2013 1:06:39 PM UTC-7, John McNamara wrote:

Hi Josh,

As John correctly points out the default write() behaviour got a little stricter in version 0.3.6 (and more so in 0.3.7).

I'm not sure how Decimal.decimal() should be handled in the XlsxWriter write() method. The most obvious approach would be to convert it to the Python float type. This would probably be the right thing most of the time but there may be times that the user wants it handled as a string.

So the options are:

1. Maintain the current behaviour and leave it up to the end user to map Decimal.decimal() to the type they want prior to using write().
2. Map Decimal.decimal() to float in write() with explicit documentation.
3. Something else?

Opinions, ideas?

John

The second options seems the most logical to me. I agree with John Y that if I want a string, I'll convert it myself.

I use Decimal because I need to-the-penny accuracy with my source material. I don't know how other coders use it, but accurate floating point precision seems to be the point of the thing.

So I'd assume Decimal objects are floats.

Josh 

John McNamara

unread,
Aug 21, 2013, 3:23:13 PM8/21/13
to python...@googlegroups.com


On Monday, 19 August 2013 00:54:50 UTC+1, John Yeung wrote:

Now, if decimal.Decimal is automatically converted to float, then it
probably makes sense to do the same for fractions.Fraction.


Hi John,

Yes that seems reasonable.

 

By the way, thanks for changing the strings_to_numbers default in 0.3.7.

In retrospect it was probably the right way to handle it. I am still figuring out what Python's "Principle of Least Astonishment" is as I go.

Speaking of which, someone wrote to say that XlsxWriter 0.3.6/7 broke handling of a range of Numpy types. The previous behaviour was to try convert tokens using float() in a try/except which meant that a large number of Numpy types were handled without additional work on their side.

It would be possible to have a compromise solution where unhandled types where evaluated with float() and then possibly with str() to see if they could be handled before rejecting the type. However, I'm not sure if that is swapping one can of worms for another.

John.

John Yeung

unread,
Aug 21, 2013, 7:33:37 PM8/21/13
to python-excel
On Wed, Aug 21, 2013 at 3:23 PM, John McNamara <jmcn...@cpan.org> wrote:
>
> someone wrote to say that XlsxWriter 0.3.6/7 broke
> handling of a range of Numpy types. The previous behaviour was to try
> convert tokens using float() in a try/except which meant that a large number
> of Numpy types were handled without additional work on their side.

I have been wondering about this myself. Not NumPy specifically, but
whether to test types by whitelist or by some other means. Clearly, I
am not in favor of using float() as a filter for deciding whether
arbitrary data is numeric. But I'm actually in favor of letting
float() do its thing, as long as we can be reasonably sure we're only
giving it numeric data.

My guess is that coming up with a "reasonably sure" whitelist of
numeric types is going to be harder than blacklisting string types.
I'm not a NumPy user, but just now I tried the obvious-looking string
types (namely, [x for x in dir(numpy) if x.startswith('str')]) and
they all appear to be identical to or subclasses of Python's string
types, in both Python 2 and Python 3. So I think my strategy would be
first to blacklist str_types, then try float(), as you did previously.
Maybe something like

if not isinstance(token, str_types):
try:
float(token)
return self.write_number(row, col, *args)
except ValueError:
# Not a (convertible) number. Continue to the checks below.
pass

Note that the complex type is definitely numeric, but not convertible to float.

Which brings me to another thing I've been wondering. My own
"principle of least astonishment" for strings is probably that they
stay strings, period, not just that they avoid being converted to
numbers. If I've got some product IDs (or funky IDs of whatever kind)
that I'm pulling from a database, they may well start with '=' or be
surrounded by curly braces, but I would be astonished indeed for them
to become formulas. And most of the time, they would be invalid
formulas.

(Relying on "magic" content is the source of the infamous Excel SYLK
bug: If the first two characters of your data are 'ID', Excel thinks
it's a SYLK file. Predictably, there are a good number of CSVs whose
first column is named ID, and they cannot be opened in Excel!)

Also, even if you are going to automatically linkify strings that seem
like URLs, I think '[fh]tt?ps?://' is a little too cute. Why not
'(ftp|http)s?://', which is both clearer and more accurate?

So, I didn't mean to completely gut your write() logic. Admittedly I
would expect very few false positives when automatically choosing
write_url, and I would expect those false positives to be quite
benign. But yeah, I'm pretty sure I'd prefer that formulas only get
written explicitly. The curly braces in particular strike me as too
common for uses other than formulas.

My guess is that there are folks who would like write() to be as much
like the way Excel treats CSV as possible (so not just strings to
numbers but the whole ball of wax), and other folks who would like
write() to respect Python's types as much as possible.

John Y.

John McNamara

unread,
Aug 21, 2013, 9:32:51 PM8/21/13
to python...@googlegroups.com


On Thursday, 22 August 2013 00:33:37 UTC+1, John Yeung wrote:
So I think my strategy would be 
first to blacklist str_types, then try float(), as you did previously.
 Maybe something like

    if not isinstance(token, str_types):
        try:
            float(token)
            return self.write_number(row, col, *args)
        except ValueError:
            # Not a (convertible) number. Continue to the checks below.
            pass


Hi John,

That is more or less the strategy I was thinking of too: handle the standard explicit types first and then fall through to float(), then to str() and then give up.

I've pushed a quick first pass to GitHub if you (or anyone else) want to have a look.


 
Note that the complex type is definitely numeric, but not convertible to float.


Yes. The complex types are definitely caveat emptor.


 
Which brings me to another thing I've been wondering.  My own
"principle of least astonishment" for strings is probably that they
stay strings, period, not just that they avoid being converted to
numbers.  If I've got some product IDs (or funky IDs of whatever kind)
that I'm pulling from a database, they may well start with '=' or be
surrounded by curly braces, but I would be astonished indeed for them
to become formulas.  And most of the time, they would be invalid
formulas.

Yes. That is the unfortunate (for me) logical conclusion to this. I has a TODO to tighten the array formulas to "{= }" but even so it isn't fool-proof. So that functionality will probably have to go as well.

I like the xlwt approach of using a Formula() class to designate a formula. I could add that and a URL() class for explicit handling of those types within write(). Or maybe no-one would miss the current behaviour if it were removed. There would be some knock on effects to a few methods that use write() internally but that is fixable.

Regards,

John.


 

John McNamara

unread,
Aug 23, 2013, 10:42:33 AM8/23/13
to python...@googlegroups.com


On Wednesday, 14 August 2013 17:54:43 UTC+1, Josh English wrote:
Was there a change in XlsxWriter I'm not seeing? My report generator is parsing a bunch of text files, using the decimal.Decimal object from Python. They've worked just fine creating Excel reports.

I upgraded to XlsxWriter 0.3.6, and now my script generates an "TypeError: Unsupported data type in write()" error.


Hi Josh,

Support for decimal.Decimal has been fixed/reinstated in version 0.3.8 of XlsxWriter.

    https://pypi.python.org/pypi/XlsxWriter/

Regards,

John.


Reply all
Reply to author
Forward
0 new messages