Hiya John,
Am .06.2015, 16:04 Uhr, schrieb John Hays <
femur...@gmail.com>:
> So this is may sound like a wonky super-specific issue but it's a big
> deal
> in my particular implementation if the issue happens to be deeper than
> just
> this particular currency. Hopefully I'm just being dumb. As part of a
> larger program implementing openpyxl, I've included the ability to
> perform
> regex queries against cell values as read by openpyxl. It's been working
> fine against most spreadsheets but I ran across a strange problem with
> cells formatted as Bashkir currency. I checked to see if it happened with
> any numbers formatted as currency and I didn't have this problem with the
> few other formats I tested (Turkish, Konkani, Dollars). I don't
> necessarily
> need the ability to read Bashkir currency values...I just happened to
> stumble across this while I testing to see if the different Excel number
> formats would affect openpyxl and regex cell interpretation.
All these things are wonky until you come across in your work.
> I'm using Python 3.4.3, Excel 2013, and openpyxl 2.2.3.
> Here's the code I used to re-create the issue outside of my program:
> wb = load_workbook(filename='myfile.xlsx',guess_types=False) # False has
> been the default for a while
> ws = wb.worksheets[0]
> for row in ws.rows: # for row in ws: also works
> So while the *end* result looks fine, I use my regex query to determine
> if the cells get written to a new worksheet based on their cell.value.
> If they get mis-interpreted as dates, my regex queries won't match
> against values
> I'm needing to extract. I've attached the source worksheet .xml if it
> helps.
Unfortunately, you come across one of the bigger turds in the
specification. It makes me weep when I think of it: dates and times are
implemented in Excel via the number format. On their own they are just
numbers relevant to particular epochs. FWIW in response to a bug on the
lack of proper support for datetimes in the format I recently got this
response:
"""
Hi Charlie - after much discussion of your defect report
(
https://onedrive.live.com/view.aspx/Public%20Documents/2014/DR-14-0012.docx?cid=C8BA0861DC5E4ADC),
WG4 have decided to close it without action. Honestly, though, I do feel
your pain about this one as the standard is somewhat confusing…
"""
The strict specification does actually support a datetime cell type but
you still have to assign a format to it and very few applications actually
support it. :-( The side-effect of this is a significant performance
impact for all these cells.
So, when reading a cell you *have* to take the formatting to account.
There are some (locale-specific for extra fun) builtins that correspond to
datetimes otherwise you have to rely on the power your regex to determine
whether a number format actually is datetime or just some special
accounting formatting from Timbuktu, Tashkent, Ouagadougou, or wherever!
It's almost certain that this is at fault here. Your sample sheet.xml is
of no use, puny earthling! You also need to look at the styles.xml and
work your way through the style chain. Or, simply look at the number
format:
cell.number_format
and whether openpyxl has decided this is a date or not
cell.is_date
In which the culprit is a sneaky function:
from openpyxl.styles.numbers import is_date_format
The code and associated regex could almost certainly be improved…
OTOH it's perfectly okay to use internal_value, if that's what you want.
It's a public attribute that most people won't want but it's there for a
reason.
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel:
+49-211-600-3657
Mobile:
+49-178-782-6226