> what's the best way to take the comma-
> separated tokens in the RAWDATA string below and correctly set their
> datatype? If I do nothing, everything is copied as-is as strings into
> the spreadsheet, which I don't want. The script below produces the
> result I'm looking for, but I wonder if there is a more idiomatic way
> using some built-in methods in xlwt for writing the correct data
> type.
In general, when you /know/ what type of Excel cell you want to write,
it is better to use directly the Row.set_cell_XXXX(col_index, value,
style) methods. Note that sheet.write() calls one of these after a fair
amount of testing to determine which one to call.
The values of XXXX for the 5 basic cell types are: number, text,
boolean, blank, and error.
"number" means float. There is no point in testing your input for int
separately unless you want to format int and float data cells differently.
"date" is NOT a basic type. A date is just a float number of days since
one of two epochs plus a "number format" that will display it as a date.
See the xlrd docs for the gory details. You can write a date cell by
using Row.set_cell_date(colx, a_datetime.datetime_object,
an_appropriate_style).
> #
> import re
> from xlwt import Workbook
from xlwt import Workbook, easyxf
int_style = easyxf('', num_format_str='#,##0')
float_style = easyxf('', num_format_str='#,##0.00')
>
> def doworksheet():
> '''Load a worksheet with data'''
> RAWDATA = 'A,1,3 more words,2.34501'
> wb = Workbook()
> ws = wb.add_sheet('Data')
> tokens = RAWDATA.split(',')
Consider using the csv module even when you think you don't need it.
r = ws.row(0)
> for c,data in enumerate(tokens):
> if re.match('^\d+$',data):
Negative numbers?
r.set_cell_number(c, float(data), int_style)
> elif re.match('^\d+\.\d*$',data):
The above pattern is not very general. Apart from negatives, it doesn't
handle .123 and 1e42. Consider stealing this from ExcelFormulaLexer.py:
flt_const_pattern = r"""
(?:
(?: \d* \. \d+ ) # .1 .12 .123 etc 9.1 etc 98.1 etc
|
(?: \d+ \. ) # 1. 12. 123. etc
)
# followed by optional exponent part
(?: [Ee] [+-]? \d+ ) ?
"""
and prepend "-?" to catch negatives.
r.set_cell_number(c, float(data), float_style)
> else:
> r.set_cell_text(c, data)
You may wish to consider using try/except instead of regular
expressions. For smallish input, speed doesn't matter. Ease of
understanding your code a week later is not enhanced by using that
float-matching regex :-)
With largish data (e.g. database query output) and a knowledge of what
kind of data is present in each column, a different approach is possible:
[OTTOMH, untested, caveat lector]
"""
Date,Ticker,Qty,Price
2009-09-14,BHP,123,456.789
"""
# setup
def date_conv(s):
return datetime.strpdate(s, '%Y-%m-%d')
cdate = (dateconv, Row.set_cell_date, date_style)
ctext = (None, Row.set_cell_text, text_style)
cint = (int, Row.set_cell_number, date_style)
cprice = (float, Row.set_cell_number, price_style)
config = (cdate, ctext, cint, cprice)
# for each row
row = ws.row(row_index)
for colx, data in enumerate(tokens):
convfunc, setfunc, style = config[c]
if convfunc:
data = convfunc(data)
setfunc(row, colx, data, style)
HTH,
John
For most people's uses, I don't think the overhead is actually worth
worrying about compared to the simplicity of just having to remember to
do sheet.write(...)
>> for c,data in enumerate(tokens):
>> if re.match('^\d+$',data):
>
> Negative numbers?
>
> r.set_cell_number(c, float(data), int_style)
>> elif re.match('^\d+\.\d*$',data):
>
> The above pattern is not very general. Apart from negatives, it doesn't
> handle .123 and 1e42. Consider stealing this from ExcelFormulaLexer.py:
Maybe simpler, how about just doing the try-except dance:
for type in (int,blah...):
try:
value = type(string)
except:
pass
else:
break
I'd also highly recommend this excellent time/date parser for spotting
dates:
http://labix.org/python-dateutil#head-1443e0f14ad5dff07efd465e080d1110920673d8-2
cheers,
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk