testing xlwt and xlrd

Chris Withers

Oct 9, 2008, 12:06:47 PM
to python...@googlegroups.com
Hi All,

It seems we should start writing some unit tests for xlrd and xlwt.
I know this has been discussed before, I just can't remember if it was
on this list or not.

Anyway, a couple of things need to be decided from my perspective...

First up, how we test. So, for xlrd, there are two parts: BIFF parsing,
and xlrd object behaviour.

For the first one, I guess we have no choice but to have a batch of
micro .xls files that exhibit the behaviour under test and fire them at
the BIFF parser and check we get the right stuff back.

For the object behaviour, my recommendation would be to instantiate the
objects individually and test that their behaviour has the desired effect.

So, pseudocoding, for the Sheet class we might have a SheetTests class:

from unittest import TestCase
from xlrd import Sheet

class SheetTests(TestCase):

    def setUp(self):
        self.sheet = Sheet(...)

    def test_some_method(self):
        self.assertEqual(self.sheet.some_method(...), ...)

...and so on.

For xlwt, which is where I'm going to be working first, I guess we have
two classes of tests as well.

Again, we'll need to test that the .xls files generated match up to some
specs of what we're expecting. Yes, we could just do a binary file
comparison, but that seems like a bad idea. Is there a really low-level
BIFF dumper script anywhere that we could include in xlwt, independent of
xlrd, that we could use for unit testing?
(i.e. we test that the series of BIFF records generated is as expected,
and provide some helpful diff-like error messages when it's not)
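
As a rough illustration of the "diff-like error messages" idea, here is a minimal sketch using only the standard library. It assumes each BIFF record has already been rendered as one line of text by some dumper; the names `diff_records` and `assertRecordsEqual` are hypothetical and are not part of xlrd or xlwt.

```python
import difflib
import unittest


def diff_records(expected, actual):
    """Return a unified diff of two lists of one-line record dumps.

    An empty string means the two lists are identical.
    """
    return "\n".join(
        difflib.unified_diff(
            expected, actual, fromfile="expected", tofile="actual", lineterm=""
        )
    )


class RecordDiffMixin(unittest.TestCase):
    """Hypothetical base class for tests that compare record dumps."""

    def assertRecordsEqual(self, expected, actual):
        # Fail with a readable diff rather than a raw byte mismatch.
        difference = diff_records(expected, actual)
        if difference:
            self.fail("BIFF records differ:\n" + difference)
```

A test would then build the expected list by hand (or load it from a text file kept next to the test) and pass it to `assertRecordsEqual` together with the dump of the file xlwt produced.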

For xlwt object functionality, I guess we'd want to call the methods and
then check that the response is as expected and that xlwt's internal
data structures are as expected (sometimes seen as naughty by testing
bods, so we might want to keep this to a minimum).

Right, what do people feel so far?

OK, the next step is what test runner to use. My favourite would be
zope.testing, and the easiest way to use it is to move to zc.buildout. If
we do this, you'd do the following to get a "working xlwt checkout":

svn co https://...
python bootstrap.py
bin\buildout

Then, to run the tests, you'd just do:

bin\test -m xlwt

Moving to buildout would also mean that we can set it up such that
documentation can be generated from source just by doing:

bin\docs

...or similar.

(and before you ask, yes the above will all work on Windows *and* Linux)

Sorry this has been long, hope you guys get through it all. In the event
of silence, I'll set up a buildout for each of xlwt and xlrd as and when
I need to do development, and test any changes I make, with the aim of
running the tests with zope.testing.

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

John Machin

Oct 16, 2008, 8:09:14 AM
to python...@googlegroups.com
Chris Withers wrote:
> Hi All,
>
> It seems we should start writing some unit tests for xlrd and xlwt.
> I know this has been discussed before, I just can't remember if it was
> on this list or not.
>
> Anyway, a couple of things need to be decided from my perspective...
>
> First up, how we test. So, for xlrd, there's two parts: BIFF parsing,
> and xlrd object behaviour.
>
> For the first one, I guess we have no choice but to have a batch of
> micro .xls files that exhibit the behaviour under test and fire them at
> the BIFF parser and check we get the right stuff back.
>
> For the object behaviour, my recommendation would be to instantiate the
> objects individually and test their behaviour has the desired effect.

I don't see two parts at all. There is no such thing as a "BIFF parser".
xlrd.open_workbook reads the BIFF records in the xls file, unpacks them,
and stuffs the results into objects (book, sheets, rows, columns, cells,
XFs, etc etc). Is that what you call "parsing"? Those objects have
methods whose "behaviour" is rarely more than getting the values of the
attributes of the objects -- IOW, exactly what you need in order to test
open_workbook.

>
> So, pseudocoding, for the Sheet class we might have a SheetTests class:
>
> from unittest import TestCase
> from xlrd import Sheet
>
> class SheetTests(TestCase):
>
>     def setUp(self):
>         self.sheet = Sheet(...)
>
>     def test_some_method(self):
>         self.assertEqual(self.sheet.some_method(...), ...)
>
> ...and so on.

I'd really like to see some examples before I'd be convinced that this
is the way to go.

>
> For xlwt, which is where I'm going to be working first, I guess we have
> two classes of tests as well.

What two?

>
> Again, we'll need to test that the .xls's generated match up to some
> specs of what we're expecting. Yes, we could just do a binary file
> comparison, but that seems like a bad idea. Is there a really low level
> biff dumper script anywhere that we could include in xlwt independent of
> xlrd that we could use for unit testing?

runxlrd.py biff_dump your_test_output.xls >the_dump.txt

Would need options to change from current format e.g.
448: 0031 FONT len = 001a (26)
452: c8 00 04 00 ff 7f 90 01 00 00 01 02 00 00 05 01  ?~?~????~~??~~??
468: 41 00 72 00 69 00 61 00 6c 00                    A~r~i~a~l~
478: 041e FORMAT len = 0018 (24)
482: 05 00 13 00 00 22 24 22 23 2c 23 23 30 3b 5c 2d  ?~?~~"$"#,##0;\-
498: 22 24 22 23 2c 23 23 30                          "$"#,##0
to something more diff-friendly e.g. one line per record
FONT c8 00 04 00 ff 7f 90 01 00 00 01 02 00 00 05 01 41 00 72 00 69 00 61 00 6c 00
FORMAT 05 00 13 00 00 22 24 22 23 2c 23 23 30 3b 5c 2d 22 24 22 23 2c 23 23 30
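
To make the one-line-per-record format above concrete, here is a minimal standalone sketch. It is not how runxlrd.py does it. It assumes the workbook stream has already been extracted from the .xls (OLE2) container into a bytes object, and it relies only on the BIFF record layout: a 2-byte record code and a 2-byte data length, both little-endian, followed by the data. The `RECORD_NAMES` table and the function names are hypothetical, and the table covers only the two records shown in the dump.

```python
import struct

# Assumed, deliberately tiny name table: just the two records in the dump above.
RECORD_NAMES = {0x0031: "FONT", 0x041E: "FORMAT"}


def iter_records(stream):
    """Yield (code, data) for each BIFF record in a raw workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: 2-byte code, 2-byte data length, little-endian.
        code, length = struct.unpack_from("<HH", stream, pos)
        pos += 4
        yield code, stream[pos:pos + length]
        pos += length


def dump_lines(stream):
    """Render each record as 'NAME hh hh hh ...' on a single line."""
    lines = []
    for code, data in iter_records(stream):
        # Fall back to the hex record code when the name is not in the table.
        name = RECORD_NAMES.get(code, "%04x" % code)
        hex_bytes = " ".join("%02x" % byte for byte in data)
        lines.append((name + " " + hex_bytes).rstrip())
    return lines
```

Because the output is a plain list of strings, it can be compared against an expected list with any line-based diff tool.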

> (ie: we test that the series of biff records generated is as expected,
> and provide some helpful diff-like error messages when they're not)
>
> For xlwt object functionality, I guess we'd want to call the methods and
> then check that the response is as expected and that xlwt's internal
> data structures are as expected (sometimes seen as naughty by testing
> bods so we might want to keep this to a minimum).
>
> Right, what do people feel so far?

Let's see how some actual tests for a non-trivial piece of functionality
in each of xlrd and xlwt shape up before we lurch on to the z*.* stuff.

Chris Withers

Oct 16, 2008, 9:30:01 AM
to python...@googlegroups.com
John Machin wrote:
>> For the object behaviour, my recommendation would be to instantiate the
>> objects individually and test their behaviour has the desired effect.
>
> I don't see two parts at all. There is no such thing as a "BIFF parser".
> xlrd.open_workbook reads the BIFF records in the xls file, unpacks them,
> and stuffs the results into objects (book, sheets, rows, columns, cells,
> XFs, etc etc). Is that what you call "parsing"?

Yes, especially as that process could theoretically be used to fire off
parser-like events, as we've discussed elsewhere ;-)

> Those objects have
> methods whose "behaviour" is rarely more than getting the values of the
> attributes of the objects -- IOW what you need to test open_workbook.

OK, so yes, for xlrd, it would appear there only needs to be one type of
test. Do you see any way of doing this other than starting with a set of
(small!) .xls files and checking they end up with the right xlrd objects?

>> class SheetTests(TestCase):
>>
>>     def setUp(self):
>>         self.sheet = Sheet(...)
>>
>>     def test_some_method(self):
>>         self.assertEqual(self.sheet.some_method(...), ...)
>>
>> ...and so on.

Is it pyUnit (aka unittest) that you're unsure about, or something else?

>> For xlwt, which is where I'm going to be working first, I guess we have
>> two classes of tests as well.
>
> What two?

There definitely are two types of test here, as a trivial example:

- checking that xlwt.Workbook().add_sheet('Sheet 1') results in the
Workbook remembering that it has the sheet, and checking that the sheet
object that's returned has its name set to 'Sheet 1'

- checking that when book.save('file.xls') is called, it outputs the
correct .xls file.

>> comparison, but that seems like a bad idea. Is there a really low level
>> biff dumper script anywhere that we could include in xlwt independent of
>> xlrd that we could use for unit testing?
>
> runxlrd.py biff_dump your_test_output.xls >the_dump.txt

Can that be easily split out from xlrd?

> Would need options to change from current format e.g.
> 448: 0031 FONT len = 001a (26)
> 452: c8 00 04 00 ff 7f 90 01 00 00 01 02 00 00 05 01  ?~?~????~~??~~??
> 468: 41 00 72 00 69 00 61 00 6c 00                    A~r~i~a~l~
> 478: 041e FORMAT len = 0018 (24)
> 482: 05 00 13 00 00 22 24 22 23 2c 23 23 30 3b 5c 2d  ?~?~~"$"#,##0;\-
> 498: 22 24 22 23 2c 23 23 30                          "$"#,##0
> to something more diff-friendly e.g. one line per record
> FONT c8 00 04 00 ff 7f 90 01 00 00 01 02 00 00 05 01 41 00 72 00 69 00 61 00 6c 00
> FORMAT 05 00 13 00 00 22 24 22 23 2c 23 23 30 3b 5c 2d 22 24 22 23 2c 23 23 30

Indeed, although it might be sensible to have the sanity-quoted raw
binary there too, since A~r~i~a~l~ might give a hint if it was supposed
to be T~i~m~e~s~.
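
A minimal sketch of what keeping the sanity-quoted text next to the hex might look like. The quoting rule here is inferred from the dump quoted above and is an assumption, not xlrd's documented behaviour: NUL bytes become `~`, printable ASCII is kept as-is, and everything else becomes `?`. The function names are hypothetical.

```python
def sanity_quote(data):
    """Render bytes as text: NUL -> '~', printable ASCII kept, others -> '?'."""
    chars = []
    for byte in data:
        if byte == 0:
            chars.append("~")
        elif 0x20 <= byte <= 0x7E:
            chars.append(chr(byte))
        else:
            chars.append("?")
    return "".join(chars)


def record_line(name, data):
    """One line per record: name, hex bytes, then the sanity-quoted text."""
    hex_bytes = " ".join("%02x" % byte for byte in data)
    return "%s %s  %s" % (name, hex_bytes, sanity_quote(data))
```

With the text on the same line, a diff between an expected and an actual FONT record shows A~r~i~a~l~ against T~i~m~e~s~ directly, without anyone having to decode the hex by hand.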

> Let's see how some actual tests for a non-trivial piece of functionality
> in each of xlrd and xlwt shape up before we lurch on to the z*.* stuff.

The stuff below is purely an easy way to get a test runner that runs
pyUnit tests in a handy way... (side effects are easy docgen script
creation, repeatability, etc)

As always, I wish zc.buildout's docs didn't suck quite so badly. I was
scared off for a good 2 years or so before finally realising how easy it
is to get going and how good a tool it is. It goes quite some way to
un-suck the results of setuptools' infestation of the Python community:

http://www.simplistix.co.uk/presentations/python_package_management_08/python_package_management_08.pdf
