pywin32 is available for both Python 2 and Python 3. It allows driving
Excel via COM i.e. you would be writing Visual Basic in Python.
xlrd (read) and xlwt (write) are more-or-less complementary modules but
are only available for Python 2. I have done two trial ports of each to
Python 3 but 0 <= demand <= epsilon.
> What we really need is to read and write columnar data with types that
> isn't (fully) based on type inference.
Please explain.
> My project is currently written for Python 2, mostly because pylint
> still only supports Python 2. But we've been trying to only take
> dependencies that won't restrict our options in the future with regard
> to Python 2 vs 3. And others at this location writing new code in
> Python 3.
>
> BTW, on the subject of Python 3 ports: It appears that sa2to3 (as
> opposed to 2to3, which sa2to3 wraps) is a decent way of accomplishing
> a single-code-base port.
I found that 2to3 did a reasonable job when I did the trial ports; has
it become worse?
If you want to directly read and write Excel 97-2003 files with
Python, the only "sturdy" (set of) packages I'm aware of are xlrd,
xlwt, and their accompanying xlutils. These are all Python 2 only.
If you want to directly read and write the newer file format used by
Excel 2007 and later with Python, I would guess openpyxl is the most
promising (for now), though it doesn't meet your definition of sturdy,
and it's also only Python 2.
If you want to automate an instance of Excel running in Windows, you
can use pywin32, which has versions for Python 2 and 3, and can
naturally work with any file format that the Excel instance can. Of
course, this requires Windows and Excel for Windows.
John Y.
> If you want to directly read and write the newer file format used by
> Excel 2007 and later with Python, I would guess openpyxl is the most
> promising (for now), though it doesn't meet your definition of sturdy,
> and it's also only Python 2.
Would you like a copy of the xlsxrd kit for reading Excel 2007 xlsx/xlsm
files using the xlrd interface? It's only alpha but it's certainly
sturdy (IMHO).
In general: alpha testers and reviewers are welcome.
Cheers,
John
>
>>> What we really need is to read and write columnar data with types that
>>> isn't (fully) based on type inference.
>>
>> Please explain.
>
> We just need to be able to read and write data that is organized into
> columns with each column having an internally-homogeneous type. But
> we have some data for which column-type-guessing doesn't work that
> well. So CSV isn't out of the question, but many CSV tools either
> treat everything as strings, or try to guess types.
>
> We might make our second row a bunch of types and convert as
> specified, after reading everything in as a bunch of strings.
Sounds like you need (and are reinventing) a database, one that rigidly
enforces column homogeneity (i.e. unlike sqlite3). I can't imagine why
you'd consider the Excel format, where each cell carries its own type
information (more or less), similar to sqlite3.
I don't understand reference to column-guessing, nor the "So" at the
start of the next sentence. Surely "guessing" the type of a column would
need to be done once only, when the data first enters your universe, not
every time you read it, irrespective of storage format (Excel/database/CSV).
On 6/10/2010 8:58 AM, Dan Stromberg wrote:
On Oct 5, 2:39 pm, John Machin<sjmac...@lexicon.net> wrote:
On 6/10/2010 7:17 AM, Dan Stromberg wrote:
Sounds like you need (and are reinventing) a database, one that rigidly enforces column homogeneity (i.e. unlike sqlite3). I can't imagine why you'd consider the Excel format, where each cell carries its own type information (more or less), similar to sqlite3.
What we really need is to read and write columnar data with types that
isn't (fully) based on type inference.
Please explain.
We just need to be able to read and write data that is organized into
columns with each column having an internally-homogeneous type. But
we have some data for which column-type-guessing doesn't work that
well. So CSV isn't out of the question, but many CSV tools either
treat everything as strings, or try to guess types.
We might make our second row a bunch of types and convert as
specified, after reading everything in as a bunch of strings.
I don't understand reference to column-guessing, nor the "So" at the start of the next sentence. Surely "guessing" the type of a column would need to be done once only, when the data first enters your universe, not every time you read it, irrespective of storage format (Excel/database/CSV).
"""Excel will allow us to save types""": basic cell types are
empty/blank: no data at all
text: restricted to the Unicode BMP, max len is 32767 characters
number: IEEE 64-bit float
boolean: true/false
error: codes for not applicable, zero division, illegal cell reference, etc
date is not a basic type, it is inferred from number with a date-like
display format; refer to the xlrd docs.
You need to be careful when reading Excel files. Some readers
transmogrify data in interesting fashions:
E.g. COM changes number cells with a "currency" display format from x to
long(x * 1000) or a decimal.Decimal with 4 decimal places (thus losing
information).
E.g. openpyxl deliberately converts what Excel has plainly labelled as a
text cell with contents u'50%' into a Python float 0.5 ...
I'm a little embarrassed to say that I haven't tried openpyxl yet
myself; I only "recommended" it because it was the most
complete-looking and mature-looking .xlsx-handler for Python I'd
found. (Yes, I have heard the saying regarding books and covers.)
The behavior you describe strikes me as something that might be
acceptable and indeed even expected in the PHP community. I guess
this isn't too surprising given that openpyxl is billed as a port of
PHPExcel.
John Y.
Hmm... Dan did seem to define sturdy as "non-beta"... ;)
But seriously, I would gladly accept a copy of xlsxrd. I can't
promise that I'll really put it through its paces, as I am a much
heavier writer of Excel files than a reader. But my office did
recently switch to MS Office 2010 (from 2000), so I do encounter both
.xls and .xlsx files.
John Y.
Why???
There's more:
A text cell containing u'678.9' is reported as a number cell, value
678.89 ... at least consistentent with the percentage caper.
A text cell containing u'1.2345e-4' is not molested, because their regex
for floatables doesn't grok suchlike.
A NUMBER cell whose value text is u'1.234567E+20' is reported as a TEXT
cell with that value.
It looks like they are passing the file text values through a routine
like Excel would use for assigning types to USER INPUT: test with
various regexes, if none match, treat it as text. The data-type flags
from the file appear to be overridden by these deduced types.
> I guess
> this isn't too surprising given that openpyxl is billed as a port of
> PHPExcel.
Is it acceptable and even expected to ignore the "several chunks" warning in
http://docs.python.org/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.characters
?
I am not a PHP programmer, but the impression I get from reading
people talk about various programming languages is that PHP is
notorious for its free-wheeling implicit type conversions. It's often
used as the x in "like x, Python has dynamic typing; however, unlike
x, Python has relatively strong typing". PHP and Excel both seem to
me as though they are eager to guess on behalf of (or in spite of?)
the user.
> It looks like they are passing the file text values through a routine like
> Excel would use for assigning types to USER INPUT: test with various
> regexes, if none match, treat it as text. The data-type flags from the file
> appear to be overridden by these deduced types.
Your examples suggest that under openpyxl, .xlsx amounts to a form of
Dan's "CSV-with-type-inference".
> Is it acceptable and even expected to ignore the "several
> chunks" warning in
>
> http://docs.python.org/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.characters
>
> ?
I must admit I know basically nothing of such things. I suppose
you're implying that openpyxl ignores the warning, and is thus
unsturdy? (You already sold me on the sketchy data type handling.)
John Y.
How is the data being exported from the iSeries? Are the Excel files
being generated on the iSeries itself?
John Y.