Hi all,
I just thought you might all be interested to know, I’ve implemented Excel output on the fly in Django, using xlwt. The snippet is posted here: http://www.djangosnippets.org/snippets/1151/
Any input is welcome!
Joseph Wilhelm
SMP Communications Corp.
7626 E. Greenway Rd. Suite 100
Scottsdale, AZ 85260
480.905.4100 F: 480.905.7110
This message is intended only for the use of the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the sender of this E-Mail by return E-Mail or by telephone.
> I just thought you might all be interested to know, I’ve implemented
> Excel output on the fly in Django, using xlwt. The snippet is posted
> here: http://www.djangosnippets.org/snippets/1151/
> Any input is welcome!
if use_xls is True:
Wah!
book = xlwt.Workbook()
sheet = book.add_sheet('Sheet 1')
for rowx, row in enumerate(data):
for colx, col in enumerate(row):
s/col/value/ ... it's not a "column"
sheet.write(rowx, colx, str(col))
Using str(col) means that you are writing all cells as text cells (i.e.
losing all information as to whether the original data was
number/text/datetime/whatever) and possibly losing about 4 digits of
precision on your numbers. Have you tried not using str()? Sheet.write()
does the "right thing" for the following Python types:
* bool
* int, long, float, decimal.Decimal
* unicode
* str (decoded as specified in the xlwt.Workbook() encoding arg (default
is 'ascii')
* None
* datetime.(datetime|date|time)
HTH,
John
I believe "Wah!" is Australian for:
You should have used:
if use_xls:
;-)
Also, why drop back to csv?
Just add a new sheet each time you it 64000 rows.
cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
Oops.
> book = xlwt.Workbook()
> sheet = book.add_sheet('Sheet 1')
> for rowx, row in enumerate(data):
> for colx, col in enumerate(row):
> s/col/value/ ... it's not a "column"
Ah yes, true enough. This bit of code here also brings up the question,
is there a way to write an entire row at once? Given what's being done
here, I would be a little happier writing a row at a time rather than a
cell at a time. I'm generating some spreadsheets with 15+ columns and
35k+ rows, so that's a whole lot of calls to write() which could be cut
down drastically by row writes.
> sheet.write(rowx, colx, str(col))
> Using str(col) means that you are writing all cells as text cells
(i.e.
> losing all information as to whether the original data was
> number/text/datetime/whatever) and possibly losing about 4 digits of
> precision on your numbers. Have you tried not using str()?
Sheet.write()
> does the "right thing" for the following Python types:
Honestly, I can't remember why exactly I did that... but I'm sure I had
a reason that sounded good to myself at the time!
I'll do some more poking at it without the str() call and see how it
behaves.
Chris Withers wrote:
> Also, why drop back to csv?
> Just add a new sheet each time you it 64000 rows.
True, I could do that... it feels a little dirty to me though. In my
mind, what's being passed to this is a single list, and it should be
output as such. To me, separate worksheets represent separate lists.
Thanks for the help, guys! Those semantic issues are fixed and the
others will be addressed soon as well.
--Joey
Ah ha. I pulled off the str() call and immediately remembered my
reasoning. It appears that xlwt is not, in fact, doing the "right thing"
for datetime.datetime data. An example record in my output contains the
following data:
[datetime.datetime(2005, 10, 31, 0, 0), datetime.datetime(2005, 9, 19,
15, 34), datetime.datetime(2005, 10, 1, 0, 0)]
In the resulting xls file, these are producing, respectively:
38565, 38614.65, 38626
Perhaps a lurking xlwt bug?
--Joey
>> book = xlwt.Workbook()
>> sheet = book.add_sheet('Sheet 1')
>> for rowx, row in enumerate(data):
>> for colx, col in enumerate(row):
>> s/col/value/ ... it's not a "column"
>
> Ah yes, true enough. This bit of code here also brings up the question,
> is there a way to write an entire row at once? Given what's being done
> here, I would be a little happier writing a row at a time rather than a
> cell at a time. I'm generating some spreadsheets with 15+ columns and
> 35k+ rows, so that's a whole lot of calls to write() which could be cut
> down drastically by row writes.
What do you imagine a "row write" would or could do, other than
iterating over the individual cell values as you have done?
That's exactly why I was asking; I wasn't certain if there would be an
easy way to add the records in a batch rather than individually. It
would be nice in theory, but judging from your reaction it sounds like
it's far from feasible in practice. :)
Appearances can be deceptive. It is better when you suspect a problem to
report it so that you get a fix or a work-around or enlightenment as to
why it is not a bug, rather than belting it with a sledge-hammer till it
appears to go away.
> An example record in my output contains the
> following data:
>
> [datetime.datetime(2005, 10, 31, 0, 0), datetime.datetime(2005, 9, 19,
> 15, 34), datetime.datetime(2005, 10, 1, 0, 0)]
>
> In the resulting xls file, these are producing, respectively:
> 38565, 38614.65, 38626
If the first one is really 38565 and not mangled manual reporting of
38656, then you *have* found a bug. The second one is actually
38614.64861111.... (0.65 corresponds to 15:36, not 15:34).
> Perhaps a lurking xlwt bug?
Or perhaps not. Try this: Open your output spreadsheet in Excel or OOo
Calc or Gnumeric. Select the offending cells. Click on
Format/Cells/Number/Custom. Type yyyy-mm-dd hh:mm:ss into the text box.
Click on OK. What do you see?
For background, consider reading the section on dates up near the start
of the xlrd documentation (http://www.lexicon.net/sjmachin/xlrd.html).
You may like to try examining the type information available from your
Django query in order to set up an XFStyle object for each column. As a
first attempt, this could be xlwt.easyxf(num_format_str="yyyy-mm-dd
hh:mm:ss") for datetime objects, and xlwt.Style.default_style for all
others.
HTH,
John
Agreed, this was just a sledgehammer which I threw into the mix to shove
it along and make it work, and apparently forgot about until it came
crashing down on my head.
> If the first one is really 38565 and not mangled manual reporting of
> 38656, then you *have* found a bug. The second one is actually
> 38614.64861111.... (0.65 corresponds to 15:36, not 15:34).
Yes, the first one really is 38565; checked and double checked. You are
right on the value of the second though.
> Or perhaps not. Try this: Open your output spreadsheet in Excel or OOo
> Calc or Gnumeric. Select the offending cells. Click on
> Format/Cells/Number/Custom. Type yyyy-mm-dd hh:mm:ss into the text
box.
> Click on OK. What do you see?
>
> For background, consider reading the section on dates up near the
start
> of the xlrd documentation (http://www.lexicon.net/sjmachin/xlrd.html).
>
> You may like to try examining the type information available from your
> Django query in order to set up an XFStyle object for each column. As
a
> first attempt, this could be xlwt.easyxf(num_format_str="yyyy-mm-dd
> hh:mm:ss") for datetime objects, and xlwt.Style.default_style for all
> others.
See, I was hoping you weren't going to say that. :) By the "right thing"
I was hoping you meant it would do this formatting for you. I suppose it
would make sense for my class to have this bit of style detection
though.
Thanks again for the tips!
--Joey
>
>> If the first one is really 38565 and not mangled manual reporting of
>> 38656, then you *have* found a bug. The second one is actually
>> 38614.64861111.... (0.65 corresponds to 15:36, not 15:34).
>
> Yes, the first one really is 38565; checked and double checked. You are
> right on the value of the second though.
I cannot reproduce this. 38565 corresponds to 2005-08-01.
8<---
import datetime
import xlwt
book = xlwt.Workbook()
sheet = book.add_sheet('dates')
# first 3 items copy/pasted from Joey's message
data =[datetime.datetime(2005, 10, 31, 0, 0), datetime.datetime(2005, 9, 19,
15, 34), datetime.datetime(2005, 10, 1, 0, 0),
# added by SJM
datetime.datetime(2005, 8, 1, 0, 0),
]
for colx, value in enumerate(data):
print colx, repr(value)
sheet.write(0, colx, value)
book.save('joey_dates.xls')
8<---
Result in the xls, formatted to 9 decimal places:
38656.000000000 38614.648611111 38626.000000000 38565.000000000
Please send me a reproducible demonstration of the problem -- a minimal
one (i.e. doesn't need Django). Also exactly which version of xlwt and
Python, and what platform you ran it on.
Isn't this the way that Excel stores dates? These values are "a float
representing the number of days since (typically) 1899-12-31T00:00:00,
under the pretence that 1900 was a leap year" according to the xlwt
docs. Perhaps Excel needs to be told they are dates?
/alex
I just ran the above example verbatim, viewed it in Excel 2003 SP3, and
came up with:
38656, 38614.6486111111, 38626, 38565
When I open it via xlrd, I get:
[number:38656.0, number:38614.648611111108, number:38626.0,
number:38565.0]
I'm running this on:
Windows XP SP2 (32-bit)
Python 2.5.2
xlwt 0.7.0
xlrd 0.6.1
Apart from cosmetics, this is *EXACTLY* what I got. You have *NOT*
reproduced your alleged problem. Your allegation, which you later said
you checked and double-checked, was:
"""
An example record in my output contains the
following data:
[datetime.datetime(2005, 10, 31, 0, 0), datetime.datetime(2005, 9, 19,
15, 34), datetime.datetime(2005, 10, 1, 0, 0)]
In the resulting xls file, these are producing, respectively:
38565, 38614.65, 38626
"""
i.e. that datetime.datetime(2005, 10, 31, 0, 0) produced 38565 in the
resulting file.
Again, you are correct, my apologies. I initially made a typo (38565 vs
38656) which I did check and double check; I was just checking for the
wrong thing, as I misunderstood what you were looking at. I had thought
this was a problem with the precision not being generated. (38565 vs
38656.000000000) rather than the number actually being that far off.
Sorry about that, it's been a long week! My head must not be fully
functioning today.
Either way, despite me managing to waste far too much of our time today
by being blind, I have updated the snippet for which I started this
thread to incorporate the suggestions brought up here and also add an
optional encoding parameter (defaulted to utf8, as Django attempts to
use Unicode everywhere).
--Joey