On Wed, Aug 21, 2013 at 3:23 PM, John McNamara <
jmcn...@cpan.org> wrote:
>
> someone wrote to say that XlsxWriter 0.3.6/7 broke
> handling of a range of Numpy types. The previous behaviour was to try
> convert tokens using float() in a try/except which meant that a large number
> of Numpy types were handled without additional work on their side.
I have been wondering about this myself. Not NumPy specifically, but
whether to test types by whitelist or by some other means. Clearly, I
am not in favor of using float() as a filter for deciding whether
arbitrary data is numeric. But I'm actually in favor of letting
float() do its thing, as long as we can be reasonably sure we're only
giving it numeric data.
My guess is that coming up with a "reasonably sure" whitelist of
numeric types is going to be harder than blacklisting string types.
I'm not a NumPy user, but just now I tried the obvious-looking string
types (namely, [x for x in dir(numpy) if x.startswith('str')]) and
they all appear to be identical to or subclasses of Python's string
types, in both Python 2 and Python 3. So I think my strategy would be
first to blacklist str_types, then try float(), as you did previously.
Maybe something like
if not isinstance(token, str_types):
try:
float(token)
return self.write_number(row, col, *args)
except ValueError:
# Not a (convertible) number. Continue to the checks below.
pass
Note that the complex type is definitely numeric, but not convertible to float.
Which brings me to another thing I've been wondering. My own
"principle of least astonishment" for strings is probably that they
stay strings, period, not just that they avoid being converted to
numbers. If I've got some product IDs (or funky IDs of whatever kind)
that I'm pulling from a database, they may well start with '=' or be
surrounded by curly braces, but I would be astonished indeed for them
to become formulas. And most of the time, they would be invalid
formulas.
(Relying on "magic" content is the source of the infamous Excel SYLK
bug: If the first two characters of your data are 'ID', Excel thinks
it's a SYLK file. Predictably, there are a good number of CSVs whose
first column is named ID, and they cannot be opened in Excel!)
Also, even if you are going to automatically linkify strings that seem
like URLs, I think '[fh]tt?ps?://' is a little too cute. Why not
'(ftp|http)s?://', which is both clearer and more accurate?
So, I didn't mean to completely gut your write() logic. Admittedly I
would expect very few false positives when automatically choosing
write_url, and I would expect those false positives to be quite
benign. But yeah, I'm pretty sure I'd prefer that formulas only get
written explicitly. The curly braces in particular strike me as too
common for uses other than formulas.
My guess is that there are folks who would like write() to be as much
like the way Excel treats CSV as possible (so not just strings to
numbers but the whole ball of wax), and other folks who would like
write() to respect Python's types as much as possible.
John Y.