Python 3?

529 views
Skip to first unread message

Dan Stromberg

unread,
Oct 5, 2010, 4:17:59 PM10/5/10
to python-excel

Are there any sturdy (non-beta) modules for both Python 2 and Python 3
that support both reading and writing excel data (a pair of
complementary modules is fine)?

What we really need is to read and write columnar data with types that
isn't (fully) based on type inference.

My project is currently written for Python 2, mostly because pylint
still only supports Python 2. But we've been trying to only take
dependencies that won't restrict our options in the future with regard
to Python 2 vs 3. And others at this location writing new code in
Python 3.

BTW, on the subject of Python 3 ports: It appears that sa2to3 (as
opposed to 2to3, which sa2to3 wraps) is a decent way of accomplishing
a single-code-base port.

John Machin

unread,
Oct 5, 2010, 5:39:54 PM10/5/10
to python...@googlegroups.com
On 6/10/2010 7:17 AM, Dan Stromberg wrote:
>
> Are there any sturdy (non-beta) modules for both Python 2 and Python 3
> that support both reading and writing excel data (a pair of
> complementary modules is fine)?

pywin32 is available for both Python 2 and Python 3. It allows driving
Excel via COM i.e. you would be writing Visual Basic in Python.

xlrd (read) and xlwt (write) are more-or-less complementary modules but
are only available for Python 2. I have done two trial ports of each to
Python 3 but 0 <= demand <= epsilon.

> What we really need is to read and write columnar data with types that
> isn't (fully) based on type inference.

Please explain.

> My project is currently written for Python 2, mostly because pylint
> still only supports Python 2. But we've been trying to only take
> dependencies that won't restrict our options in the future with regard
> to Python 2 vs 3. And others at this location writing new code in
> Python 3.
>
> BTW, on the subject of Python 3 ports: It appears that sa2to3 (as
> opposed to 2to3, which sa2to3 wraps) is a decent way of accomplishing
> a single-code-base port.

I found that 2to3 did a reasonable job when I did the trial ports; has
it become worse?


John Yeung

unread,
Oct 5, 2010, 5:45:12 PM10/5/10
to python...@googlegroups.com
On Tue, Oct 5, 2010 at 4:17 PM, Dan Stromberg <dstromb...@gmail.com> wrote:
>
> Are there any sturdy (non-beta) modules for both Python 2
> and Python 3 that support both reading and writing excel
> data (a pair of complementary modules is fine)?

If you want to directly read and write Excel 97-2003 files with
Python, the only "sturdy" (set of) packages I'm aware of are xlrd,
xlwt, and their accompanying xlutils. These are all Python 2 only.

If you want to directly read and write the newer file format used by
Excel 2007 and later with Python, I would guess openpyxl is the most
promising (for now), though it doesn't meet your definition of sturdy,
and it's also only Python 2.

If you want to automate an instance of Excel running in Windows, you
can use pywin32, which has versions for Python 2 and 3, and can
naturally work with any file format that the Excel instance can. Of
course, this requires Windows and Excel for Windows.

John Y.

John Machin

unread,
Oct 5, 2010, 6:00:42 PM10/5/10
to python...@googlegroups.com
On 6/10/2010 8:45 AM, John Yeung wrote:

> If you want to directly read and write the newer file format used by
> Excel 2007 and later with Python, I would guess openpyxl is the most
> promising (for now), though it doesn't meet your definition of sturdy,
> and it's also only Python 2.

Would you like a copy of the xlsxrd kit for reading Excel 2007 xlsx/xlsm
files using the xlrd interface? It's only alpha but it's certainly
sturdy (IMHO).

In general: alpha testers and reviewers are welcome.

Cheers,
John

Dan Stromberg

unread,
Oct 5, 2010, 5:58:24 PM10/5/10
to python-excel

On Oct 5, 2:39 pm, John Machin <sjmac...@lexicon.net> wrote:
> On 6/10/2010 7:17 AM, Dan Stromberg wrote:
> > Are there any sturdy (non-beta) modules for both Python 2 and Python 3
> > that support both reading and writing excel data (a pair of
> > complementary modules is fine)?
>
> pywin32 is available for both Python 2 and Python 3. It allows driving
> Excel via COM i.e. you would be writing Visual Basic in Python.

Interesting.

> xlrd (read) and xlwt (write) are more-or-less complementary modules but
> are only available for Python 2. I have done two trial ports of each to
> Python 3 but 0 <= demand <= epsilon.

The interest at my site is greater than epsilon. It's not 100% my
decision, but for my part I'd want xlrd and xlwt to at least have a
3.x roadmap/timeline before selecting them for a new project when
slightly-augmented-CSV or .ods might be reasonable alternatives.

> > What we really need is to read and write columnar data with types that
> > isn't (fully) based on type inference.
>
> Please explain.

We just need to be able to read and write data that is organized into
columns with each column having an internally-homogeneous type. But
we have some data for which column-type-guessing doesn't work that
well. So CSV isn't out of the question, but many CSV tools either
treat everything as strings, or try to guess types.

We might make our second row a bunch of types and convert as
specified, after reading everything in as a bunch of strings.

> > My project is currently written for Python 2, mostly because pylint
> > still only supports Python 2.  But we've been trying to only take
> > dependencies that won't restrict our options in the future with regard
> > to Python 2 vs 3.  And others at this location writing new code in
> > Python 3.
>
> > BTW, on the subject of Python 3 ports: It appears that sa2to3 (as
> > opposed to 2to3, which sa2to3 wraps) is a decent way of accomplishing
> > a single-code-base port.
>
> I found that 2to3 did a reasonable job when I did the trial ports; has
> it become worse?

2to3 is pretty good, but it doesn't always do everything, which can
mean ending up maintaining two codebases: one for 2, one for 3. And
naturally they will tend to diverge over time. For the exceptions,
where 2to3 isn't quite enough and hence manual tweaks would otherwise
be necessitated, sa2to3 provides a way of putting python 2-specific
and python 3-specific code in the same file, and choosing between them
- basically you have python 2 not-commented-out, and python 3
commented-out. So you automatically generate python 3 code from your
python 2 code by letting sato3 comment out the 2-specific code and
uncommenting the 3-specific code.

One example that, IME, 2to3 doesn't do that well with: Figuring out
whether a 2.x string should converted to bytes or unicode.

John Machin

unread,
Oct 5, 2010, 6:38:11 PM10/5/10
to python...@googlegroups.com
On 6/10/2010 8:58 AM, Dan Stromberg wrote:
>
> On Oct 5, 2:39 pm, John Machin<sjmac...@lexicon.net> wrote:
>> On 6/10/2010 7:17 AM, Dan Stromberg wrote:

>
>>> What we really need is to read and write columnar data with types that
>>> isn't (fully) based on type inference.
>>
>> Please explain.
>
> We just need to be able to read and write data that is organized into
> columns with each column having an internally-homogeneous type. But
> we have some data for which column-type-guessing doesn't work that
> well. So CSV isn't out of the question, but many CSV tools either
> treat everything as strings, or try to guess types.
>
> We might make our second row a bunch of types and convert as
> specified, after reading everything in as a bunch of strings.

Sounds like you need (and are reinventing) a database, one that rigidly
enforces column homogeneity (i.e. unlike sqlite3). I can't imagine why
you'd consider the Excel format, where each cell carries its own type
information (more or less), similar to sqlite3.

I don't understand reference to column-guessing, nor the "So" at the
start of the next sentence. Surely "guessing" the type of a column would
need to be done once only, when the data first enters your universe, not
every time you read it, irrespective of storage format (Excel/database/CSV).

Dan Stromberg

unread,
Oct 5, 2010, 7:02:02 PM10/5/10
to python...@googlegroups.com
On Tue, Oct 5, 2010 at 3:38 PM, John Machin <sjma...@lexicon.net> wrote:
On 6/10/2010 8:58 AM, Dan Stromberg wrote:

On Oct 5, 2:39 pm, John Machin<sjmac...@lexicon.net>  wrote:
On 6/10/2010 7:17 AM, Dan Stromberg wrote:


What we really need is to read and write columnar data with types that
isn't (fully) based on type inference.

Please explain.

We just need to be able to read and write data that is organized into
columns with each column having an internally-homogeneous type.  But
we have some data for which column-type-guessing doesn't work that
well.  So CSV isn't out of the question, but many CSV tools either
treat everything as strings, or try to guess types.

We might make our second row a bunch of types and convert as
specified, after reading everything in as a bunch of strings.

Sounds like you need (and are reinventing) a database, one that rigidly enforces column homogeneity (i.e. unlike sqlite3). I can't imagine why you'd consider the Excel format, where each cell carries its own type information (more or less), similar to sqlite3.
That's pretty true, but I'm up against a "cultural issue": The people in this company are very accustomed to pulling up this data in a spreadsheet and doing some rather heavy analysis on the columns.  So we need something that'll allow them to continue doing that when necessary.
 
I don't understand reference to column-guessing, nor the "So" at the start of the next sentence. Surely "guessing" the type of a column would need to be done once only, when the data first enters your universe, not every time you read it, irrespective of storage format (Excel/database/CSV).
Well, Excel, I'm told, will allow us to save types, but CSV normally does not - so round tripping to/from an unaugmented CSV file doesn't work unless the CSV package correctly guesses the column types when reading the data back in.

--
Dan Stromberg

John Machin

unread,
Oct 5, 2010, 8:39:58 PM10/5/10
to python...@googlegroups.com
On 6/10/2010 10:02 AM, Dan Stromberg wrote:
>
> On Tue, Oct 5, 2010 at 3:38 PM, John Machin <sjma...@lexicon.net
> <mailto:sjma...@lexicon.net>> wrote:
>
> On 6/10/2010 8:58 AM, Dan Stromberg wrote:
>
>
> On Oct 5, 2:39 pm, John Machin<sjmac...@lexicon.net

"""Excel will allow us to save types""": basic cell types are

empty/blank: no data at all
text: restricted to the Unicode BMP, max len is 32767 characters
number: IEEE 64-bit float
boolean: true/false
error: codes for not applicable, zero division, illegal cell reference, etc

date is not a basic type, it is inferred from number with a date-like
display format; refer to the xlrd docs.

You need to be careful when reading Excel files. Some readers
transmogrify data in interesting fashions:

E.g. COM changes number cells with a "currency" display format from x to
long(x * 1000) or a decimal.Decimal with 4 decimal places (thus losing
information).

E.g. openpyxl deliberately converts what Excel has plainly labelled as a
text cell with contents u'50%' into a Python float 0.5 ...

John Yeung

unread,
Oct 5, 2010, 11:28:40 PM10/5/10
to python...@googlegroups.com
On Tue, Oct 5, 2010 at 8:39 PM, John Machin <sjma...@lexicon.net> wrote:
> You need to be careful when reading Excel files. Some
> readers transmogrify data in interesting fashions:
>
> E.g. openpyxl deliberately converts what Excel has
> plainly labelled as a text cell with contents u'50%' into
> a Python float 0.5 ...

I'm a little embarrassed to say that I haven't tried openpyxl yet
myself; I only "recommended" it because it was the most
complete-looking and mature-looking .xlsx-handler for Python I'd
found. (Yes, I have heard the saying regarding books and covers.)

The behavior you describe strikes me as something that might be
acceptable and indeed even expected in the PHP community. I guess
this isn't too surprising given that openpyxl is billed as a port of
PHPExcel.

John Y.

John Yeung

unread,
Oct 5, 2010, 11:49:10 PM10/5/10
to python...@googlegroups.com
On Tue, Oct 5, 2010 at 6:00 PM, John Machin <sjma...@lexicon.net> wrote:
>
> Would you like a copy of the xlsxrd kit for reading
> Excel 2007 xlsx/xlsm files using the xlrd interface?
> It's only alpha but it's certainly sturdy (IMHO).

Hmm... Dan did seem to define sturdy as "non-beta"... ;)

But seriously, I would gladly accept a copy of xlsxrd. I can't
promise that I'll really put it through its paces, as I am a much
heavier writer of Excel files than a reader. But my office did
recently switch to MS Office 2010 (from 2000), so I do encounter both
.xls and .xlsx files.

John Y.

John Machin

unread,
Oct 6, 2010, 4:21:51 AM10/6/10
to python...@googlegroups.com
On 6/10/2010 2:28 PM, John Yeung wrote:
> On Tue, Oct 5, 2010 at 8:39 PM, John Machin<sjma...@lexicon.net> wrote:
>> You need to be careful when reading Excel files. Some
>> readers transmogrify data in interesting fashions:
>>
>> E.g. openpyxl deliberately converts what Excel has
>> plainly labelled as a text cell with contents u'50%' into
>> a Python float 0.5 ...
>
> I'm a little embarrassed to say that I haven't tried openpyxl yet
> myself; I only "recommended" it because it was the most
> complete-looking and mature-looking .xlsx-handler for Python I'd
> found. (Yes, I have heard the saying regarding books and covers.)
>
> The behavior you describe strikes me as something that might be
> acceptable and indeed even expected in the PHP community.

Why???

There's more:

A text cell containing u'678.9' is reported as a number cell, value
678.89 ... at least consistentent with the percentage caper.

A text cell containing u'1.2345e-4' is not molested, because their regex
for floatables doesn't grok suchlike.

A NUMBER cell whose value text is u'1.234567E+20' is reported as a TEXT
cell with that value.

It looks like they are passing the file text values through a routine
like Excel would use for assigning types to USER INPUT: test with
various regexes, if none match, treat it as text. The data-type flags
from the file appear to be overridden by these deduced types.

> I guess
> this isn't too surprising given that openpyxl is billed as a port of
> PHPExcel.


Is it acceptable and even expected to ignore the "several chunks" warning in

http://docs.python.org/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.characters

?


John Yeung

unread,
Oct 6, 2010, 12:25:46 PM10/6/10
to python...@googlegroups.com
On Wed, Oct 6, 2010 at 4:21 AM, John Machin <sjma...@lexicon.net> wrote:
> On 6/10/2010 2:28 PM, John Yeung wrote:
>> The behavior you describe strikes me as something that might be
>> acceptable and indeed even expected in the PHP community.
>
> Why???

I am not a PHP programmer, but the impression I get from reading
people talk about various programming languages is that PHP is
notorious for its free-wheeling implicit type conversions. It's often
used as the x in "like x, Python has dynamic typing; however, unlike
x, Python has relatively strong typing". PHP and Excel both seem to
me as though they are eager to guess on behalf of (or in spite of?)
the user.

> It looks like they are passing the file text values through a routine like
> Excel would use for assigning types to USER INPUT: test with various
> regexes, if none match, treat it as text. The data-type flags from the file
> appear to be overridden by these deduced types.

Your examples suggest that under openpyxl, .xlsx amounts to a form of
Dan's "CSV-with-type-inference".

> Is it acceptable and even expected to ignore the "several
> chunks" warning in
>
> http://docs.python.org/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.characters
>
> ?

I must admit I know basically nothing of such things. I suppose
you're implying that openpyxl ignores the warning, and is thus
unsturdy? (You already sold me on the sketchy data type handling.)

John Y.

Anders K

unread,
Oct 11, 2010, 4:10:19 AM10/11/10
to python-excel
Chiming in with another site with interest > epsilon.
The need here is so far only reading, ie we use xlrd.
We essentially read columns of data from one or more sheets in a
workbook to use as parameters for simulations.
This means that the "bells and whistles" factor in the books is low,
and even a limited set of functionality on
Python 3 would be a big help.

Ralf Kelzenberg

unread,
Nov 8, 2010, 5:06:36 AM11/8/10
to python-excel
We would also be intrested in working python3 xlwt and xlrd.
We are using excel as a storage file for exported Iseries (really old
System/36 fromat) data, export it to a temporary SQLITE3 db for
processing and output the result in excel files. Because of "cultural
issues" we have to stick to that.
Better xslx is also of interest.

John Yeung

unread,
Nov 8, 2010, 10:30:25 AM11/8/10
to python...@googlegroups.com
On Mon, Nov 8, 2010 at 5:06 AM, Ralf Kelzenberg
<ralf.ke...@googlemail.com> wrote:
> We would also be intrested in working python3 xlwt and xlrd.
> We are using excel as a storage file for exported Iseries (really old
> System/36 fromat) data, export it to a temporary SQLITE3 db for
> processing and output the result in excel files. Because of "cultural
> issues" we have to stick to that.
> Better xslx is also of interest.

How is the data being exported from the iSeries? Are the Excel files
being generated on the iSeries itself?

John Y.

PythonDevDang

unread,
May 25, 2012, 10:21:37 AM5/25/12
to python...@googlegroups.com
I'm another party interested in Python 3 support for xlwrt and xlrd for working with the .xlsx format.
Right now, I just use Excel's SaveAs CSV or Tab-delimited, then process the data with Python,
and then try to import it back into Excel. I just noticed that I can save as OpenDocument
spreadsheet (ods) format. I don't have OpenDocument on this system (and can't install it), but
I will look to see if there are Python 3 modules for working with that format (assuming it doesn't
lose the cell type information).

Reply all
Reply to author
Forward
0 new messages