I suggest it's not worth spending too much time on this.
The last stage of our port to Python 3.3 compatibility is to change
the ParaParser to work on something available in both Python 2.7 and
3.3 and get rid of sgmlop/xmllib forever. I think we will be there
within two weeks.
The problem is that we need to parse a lot of little chunks of text,
and the available C-based parsers need some expensive setup (e.g. to
set up all the entities); then we loop over them in Python anyway.
After various speed experiments, I have concluded that there is no
performance benefit to messing around with expat/etree/lxml/pyRXP, so
I'm currently trying to rewrite paraparser.py using the html.parser in
Python's standard library. This will allow us to be fairly tolerant
of poor markup, and to initialize a parser object quickly. And we
can get rid of sgmlop/xmllib forever. I would hope that a parser in
the standard library is leak-free; if not, at least it's Somebody
Else's Problem ;-)
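
To illustrate the idea, here is a minimal sketch, not the actual
paraparser code. It assumes a current Python 3 (in 2.7 the module is
called HTMLParser, and the convert_charrefs argument does not exist in
older versions). FragmentParser and parse_fragment are hypothetical
names made up for this example. It creates a fresh parser per chunk and
flattens the markup into runs of text, each tagged with the tags open
at that point:

```python
from html.parser import HTMLParser


class FragmentParser(HTMLParser):
    """Hypothetical helper: turn one markup chunk into (text, open_tags) runs."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &amp; etc. for us
        HTMLParser.__init__(self, convert_charrefs=True)
        self.stack = []   # tags currently open, outermost first
        self.runs = []    # collected (text, tuple_of_open_tags) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Be tolerant of poor markup: ignore a close tag that was never
        # opened, and close any unclosed inner tags along the way.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.runs.append((data, tuple(self.stack)))


def parse_fragment(text):
    """Parse one small chunk with a freshly created parser object."""
    parser = FragmentParser()
    parser.feed(text)
    parser.close()
    return parser.runs
```

Calling parse_fragment('Hello <b>bold <i>both</i></b> &amp; done') gives
a list of runs whose text joins back to 'Hello bold both & done'. A
stray close tag such as the </i> in 'a</i>b' is simply ignored.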
Once we get this done, we hope to 'juggle branches' so that the
default code is running the new paraparser and work towards a release
in January or early February.
- Andy
--
Andy Robinson
Managing Director
ReportLab Europe Ltd.
Thornton House, Thornton Road, Wimbledon, London SW19 4NG, UK
Tel +44-20-8405-6420