The Future of Pisa


geraldcor

Oct 19, 2010, 3:41:41 PM
to Pisa XHTML2PDF Support
Hello all,

Dirk has said on the Google Code Issue tracker that he is no longer
supporting Pisa and has suggested forking the current release that is
on github. Is anyone interested in forking this? I think if we get a
large enough team we can make it work and create a really stable,
highly documented and feature rich package.

This is by far the easiest (and cheapest) way to create PDF documents
for web based reporting etc. I am a strong web programmer with python
but a little weak in larger python applications like Pisa. I would
love to get something started so that Pisa can live on and make life
easier for those of us that need to make A LOT of reports via web
apps.

Hope there is interest to keep this alive.

Greg

Luc Saffre

Oct 19, 2010, 7:52:56 PM
to xhtm...@googlegroups.com
Hello Greg,

for the moment I'd prefer to follow this project passively until my own
project Lino <http://code.google.com/p/lino/> has started to use Pisa.

I'd also suggest discussing a related topic: Does it make sense to
keep Pisa alive? What alternatives to Pisa exist?

Luc

Pascal Bach

Oct 20, 2010, 3:58:10 AM
to xhtm...@googlegroups.com
Hello Luc,

I recently switched to wkhtmltopdf
http://code.google.com/p/wkhtmltopdf/ for performance reasons.
It is based on WebKit and Qt and lets you use JavaScript and
all the other web technologies available in a modern rendering engine.

It has worked quite well for me so far. There are even plans to implement
a Python interface for it.

Pascal


Danny Adair

Oct 20, 2010, 4:33:19 AM
to xhtm...@googlegroups.com
On Wed, Oct 20, 2010 at 20:58, Pascal Bach <pasci...@gmail.com> wrote:
>[...]

> I recently switched to wkhtmltopdf
>[...]

> For me it worked quite well so far. There are even plans to implement
> a python interface for it.

As per bottom of http://code.google.com/p/wkhtmltopdf/

http://github.com/mreiferson/py-wkhtmltox

Cheers,
Danny

Luc Saffre

Oct 20, 2010, 8:50:28 AM
to xhtm...@googlegroups.com
Looks interesting. I'll try wkhtmltopdf asap. Any other opinions about
these or any other alternatives are welcome to this thread.

As long as our search for alternatives has not turned up a replacement
that is clearly better, let's assume that it *does* make sense to
continue working on Pisa.

So another important question is: how many users does Pisa have? Who
wants Pisa to live? Please post your statements here in order to
encourage Greg's idea!

Luc

Olivier Deckmyn

Oct 20, 2010, 9:08:28 AM
to xhtm...@googlegroups.com
Dear all,

I've been using ReportLab for years. It's a very valuable and mature solution, but a bit difficult to use. Thanks to Pisa, using it is much easier and much more flexible. Pisa is a smart idea with a brilliant implementation.

However, after using it intensively for a few weeks, I've found it has a major drawback: it's slow. I spent some time profiling it, and the slowness appears to be mainly due to the parser, which doesn't use any cache. For example, methods for parsing CSS atoms are invoked with the same arguments thousands of times per report (in my case).
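
To illustrate the caching idea, here is a minimal sketch of memoizing a parse helper that gets called repeatedly with the same arguments. `parse_css_length` is a hypothetical stand-in, not a function from Pisa's parser; the sketch only assumes that such helpers are pure functions of their string argument.

```python
from functools import lru_cache


# Hypothetical stand-in for one of the parser's "CSS atom" helpers;
# this is not Pisa's actual code.
@lru_cache(maxsize=None)
def parse_css_length(value):
    """Convert a CSS length such as '12pt' or '1.5cm' to points."""
    units = {
        "pt": 1.0,
        "px": 0.75,          # 1px = 1/96 inch = 0.75pt
        "in": 72.0,
        "cm": 72.0 / 2.54,
        "mm": 72.0 / 25.4,
    }
    text = value.strip().lower()
    for unit, factor in units.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * factor
    return float(text)  # bare number: treated as points in this sketch


# Simulate a report that asks for the same value many times.
for _ in range(1000):
    parse_css_length("12pt")

info = parse_css_length.cache_info()
print(info.hits, info.misses)  # 999 hits, 1 miss
```

Whether this helps in practice would depend on how much of the profile such helpers actually account for.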

Two days ago, when I ran into this issue, I looked for an alternative and also found wkhtmltopdf. It's a much more complex implementation and very difficult to install (which is why binaries are offered for direct download). The Python binding is very weak for the moment (AFAICT, you have to give it files, not strings or StringIOs). More importantly, given the same XHTML file I use for Pisa, wkhtmltopdf produces a very different PDF :(
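
One way to work around a file-only interface is to write the string to a temporary file and hand that to the `wkhtmltopdf` command-line tool. The sketch below assumes the binary is on the PATH and uses only its basic `wkhtmltopdf input.html output.pdf` form; `html_string_to_pdf` and its `run` parameter are illustrative names, not part of any existing binding.

```python
import os
import subprocess
import tempfile


def html_string_to_pdf(html, pdf_path, binary="wkhtmltopdf", run=subprocess.run):
    """Render an HTML string to pdf_path by way of a temporary file.

    `run` is injectable (an assumption of this sketch) so the command
    can be inspected in tests without the binary being installed.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(html)
        tmp_path = tmp.name
    try:
        # Basic CLI form: wkhtmltopdf <input.html> <output.pdf>
        run([binary, tmp_path, pdf_path], check=True)
    finally:
        # Always clean up the temporary input file.
        os.remove(tmp_path)
```

A call such as `html_string_to_pdf("<h1>Report</h1>", "report.pdf")` would then behave like a string-based API, at the cost of one extra file write per document.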

So, for me, Pisa needs to go on living. It's a simple and maintainable solution. Its slowness is a real issue for us right now (generating a 90-page report takes 2 minutes on a Core i5/SSD machine... impossible in a web context).

I can give some of my time to investigate a little, but:
1/ I'm not sure I'm skilled enough to optimise Pisa today (read: I'm sure I'm not)
2/ it's impossible for us to maintain this solution ourselves, as it's far outside our business scope; we need to rely on someone else or a community to maintain our PDF tool (Pisa today)

Hope this helps,
Olivier D.



geraldcor

Oct 20, 2010, 11:25:51 AM
to Pisa XHTML2PDF Support
I think these are all valid points. There are definitely other solutions out
there, but I have yet to find one that simply takes my (X)HTML
and CSS and, with a few page layout items, creates a wonderful PDF
just as I intended in my simple-to-lay-out HTML document. This is why I
want XHTML2PDF (Pisa) to live. It takes something we do every day and
gives us the ability to translate it into printable/savable documents.
I will look at wkhtmltopdf and see if it satisfies these criteria,
but I have saved so much time by being able to just convert an HTML
page directly into a PDF (Preview/Print type functionality).

Pisa works. However, the top problems include all of the Google Code
issues, performance, and Python compatibility. If we get these as solid as
we can, it will be a near-perfect solution. The only problem is that it's
a relatively large program, and it will take a dedicated team to debug
the code and do serious profiling to search out problems and
subsequently optimize the code.

I have no problem using another solution if it is just as quick and
easy; otherwise, I am really intent on making this a useful and
powerful tool.

Greg


Olivier Deckmyn

Oct 20, 2010, 11:28:47 AM
to xhtm...@googlegroups.com
Dirk, would you help us address the main issues (a few bugs, performance issues) if we help with the day-to-day work (tickets, mailing list, tests, etc.)?



David Bolton

Oct 20, 2010, 11:59:27 AM
to xhtm...@googlegroups.com
I have looked at xhtmlrenderer/Flying Saucer in the past. It is
Java-based and has better CSS support than Pisa.
https://xhtmlrenderer.dev.java.net/

However, Flying Saucer did not support Chinese, Japanese, Arabic, and
Hebrew. I use Pisa for producing the PDF handbooks for MuseScore in
Chinese, Japanese, and 17 other languages. I am also looking for a
solution that supports right-to-left languages such as Arabic and
Hebrew.
I don't remember if I looked at wkhtmltopdf when I was reviewing
available options. I will have to investigate.

David

Raoul Snyman

Oct 20, 2010, 12:30:20 PM
to xhtm...@googlegroups.com
On 19 October 2010 21:41, geraldcor wrote:
> This is by far the easiest (and cheapest) way to create PDF documents
> for web based reporting etc. I am a strong web programmer with python
> but a little weak in larger python applications like Pisa. I would
> love to get something started so that Pisa can live on and make life
> easier for those of us that need to make A LOT of reports via web
> apps.

I make use of PISA/xhtml2pdf on both a professional and a personal
level. A number of the (web based) applications at work use PISA, so
we'd really like to see its continued support (if not development).
Unfortunately I don't have the time to contribute to it, but I do have
an interest in making sure it stays around.

wkhtmltopdf won't work for me or the company I work for, because we're
writing web-based software, and there's no ways the sysadmins will
install WebKit and X11 on the servers just for a PDF writer.


--
Raoul Snyman
B.Tech Information Technology (Software Engineering)
E-Mail:   raoul....@gmail.com
Web:      http://www.saturnlaboratories.co.za/
Blog:      http://blog.saturnlaboratories.co.za/
Mobile:   082 550 3754
Registered Linux User #333298 (http://counter.li.org)

geraldcor

Oct 22, 2010, 5:58:34 PM
to Pisa XHTML2PDF Support
I've looked at those other utilities, and none of them is quite as
convenient as Pisa, but they may become necessary if we can't keep
Pisa alive. Dirk, it would help if you could point us in the right
direction to tackle some of the more major outstanding problems. If I am
put on the right track, I might be able to fix some of the bugs that
are creeping around out there. So let's try to keep it going. If it is a
simple, easy and well-done piece of software, more people will use it,
and that will justify not only the time Dirk has put into it but also
any time we put into making it a better piece of software. End Preach.

Greg


Olivier Deckmyn

Oct 22, 2010, 6:00:55 PM
to xhtm...@googlegroups.com
+1


Dirk Holtwick

Oct 25, 2010, 5:05:07 AM
to xhtm...@googlegroups.com
Hi,

it is very nice to see so much interest in keeping Pisa alive. Thanks for this and the positive feedback.

As the author of the software I think I should summarize the pros and cons I have in mind.


THE PROS:

- Easy to learn: The main idea behind Pisa is that a person with HTML and CSS skills (so most of us) is able to produce a PDF.

- Optimized for PDF: Pisa enhances HTML and CSS with some print specific features like headers and footers. It still tries to be compatible with all standards.

- Integration into processes: Pisa is great for dynamic generated content. It can be used directly via Python modules or via command line tools. It also integrates well with web frameworks. And it (somehow) works on Google App Engine ;)


THE CONS:

- Pisa is not very fast (caching may help here a lot)

- Pisa currently depends on ReportLab

- Pisa is not fully compatible with HTML and CSS specifications


THE ALTERNATIVES:

- wkhtmltopdf <http://code.google.com/p/wkhtmltopdf/> This is a great tool. If I were to write Pisa again, I would probably also start with the WebKit rendering engine. It is fast, reliable and portable. It also seems to support some print-specific features http://madalgo.au.dk/~jakobt/wkhtmltopdf-0.9.9-doc.html

- FOP <http://xmlgraphics.apache.org/fop/> This may be the best choice for production environments, though the XSL-FO format is not easy to understand. This project <http://html2fo.sourceforge.net/> might be helpful to get around this.

- In the PHP world there are some nice projects doing the job of Pisa too. Here are some of them in random order: http://mpdf.bpm1.com/, http://code.google.com/p/dompdf/, http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf

- PrinceXML <http://www.princexml.com/> If you are ready to invest a lot of money, this may also be a working solution.


THE FUTURE:

With these strong competitors around, for me personally it does not make sense to put a lot of work into the Pisa project any more. To make Pisa more stable and faster, it would need a complete rewrite, and I would avoid using ReportLab and all the other dependencies. HTML and CSS are evolving technologies, and it is almost impossible to keep up with all their features in a one-person project.

Anyway, for those who have already integrated Pisa into their projects, or for people who are looking for a simple Python-only solution, Pisa is still a good choice.

My proposal for an ad hoc renovation of the project would be that someone takes the following steps:


Step 1:

- Clean up the module hierarchy: Eliminate the 'sx' and 'ho' namespaces and introduce 'xhtml2pdf' or something similar

- Integrate the HTML parser: The parser is evolving. To make it work nicely with the current Pisa version it would help to integrate http://code.google.com/p/html5lib/ directly into Pisa. This would also make installation easier. To make things faster, lxml or similar C-based tools might be integrated. The same goes for the CSS parsing.

- Integrate PDFRW: http://code.google.com/p/pdfrw/ seems to be a great PDF toolset. It is also under the MIT license and could be integrated for PDF background images and other features. Otherwise pyPDF might be used instead.

- Replace ReportLab's Paragraph implementation: I did start this work already. You'll find the code here: http://code.google.com/p/xhtml2pdf-base/source/list

- Creating a stable testing environment: The most important thing to keep in mind is that testing is crucial. Pisa worked on Windows, Mac and Linux and testing was always the hardest part. I started to write a test suite that also did visual comparing on Windows.
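
As a rough illustration of visual comparison in a test suite, the sketch below compares two rendered pages as flat grayscale pixel buffers and reports the fraction of pixels that differ. It assumes the pages have already been rasterized to equal-sized buffers by some other tool; `pixel_diff_ratio`, `pages_match`, and the thresholds are hypothetical and not taken from Pisa's test suite.

```python
def pixel_diff_ratio(reference, candidate, tolerance=8):
    """Fraction of pixels whose grayscale values differ by more than tolerance.

    Both arguments are flat sequences of 0-255 values (for example bytes).
    """
    if len(reference) != len(candidate):
        raise ValueError("images must have the same dimensions")
    if not reference:
        return 0.0
    changed = sum(
        1 for a, b in zip(reference, candidate) if abs(a - b) > tolerance
    )
    return changed / len(reference)


def pages_match(reference, candidate, max_ratio=0.01):
    """Treat two pages as visually equal if at most max_ratio of pixels differ."""
    return pixel_diff_ratio(reference, candidate) <= max_ratio
```

A small tolerance and ratio leave room for anti-aliasing differences between platforms while still catching layout regressions; suitable values would have to be found by experiment.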


Step 2:

- Become independent from ReportLab


As I mentioned already, the source code is available at http://github.com/holtwick/xhtml2pdf and ready to be forked. My latest research on the future of Pisa is available here: http://code.google.com/p/xhtml2pdf-base/source/list


I'm sorry that I cannot help here, because I estimate that doing all these things would be a full-time job for several months for one person, and then it would still need a lot of maintenance. Personally I cannot spend that time on it, and for myself I don't see a good reason to, given that good alternatives exist.


But I will not discourage people who have an interest in keeping the project alive. It is a very, very fascinating project and you can learn a lot about the basic technologies that make the web and print work. I think the basic technical approach that Pisa uses is also smart and extensible. The only drawback is the dependency on 3rd-party software that often changes and often doesn't do what you expect it to do (yes, I'm ranting about ReportLab ;) ). I spent several months fixing bugs of this kind, and this is discouraging.


If you have any concrete questions where I may help, please let me know. I would be very happy to hand over the project and everything I know about it to people who would like to continue the work. I may also hand over the domains xhtml2pdf.com and htmltopdf.org to others if I see a big engagement in the open source project.

Thanks again for all the interest in Pisa, I really appreciate it!

Cheers
Dirk

Greg Corey

Oct 25, 2010, 11:47:00 AM
to xhtm...@googlegroups.com
Thanks Dirk! That is a great basis to begin thinking about how to actually work on this project. Personally, it will take a good deal of research to understand everything that is going on here, but if I can get past that first hurdle, I am more than happy to start forking this and keep it going. Whether there are good alternatives or not, I love the simplicity of this package (to use) so I am going to do what I can to make it work. It may take a year, but I am going to try.

I am going to start walking through the code and try to figure out exactly how a pdf is generated. Once I get a grasp on that, I will make some changes and if I successfully *improve* the software, I will start a fork and go from there. That is my current plan of attack. I may have to ask some specific questions once I start getting elbow deep into it Dirk, and I appreciate your willingness to help out.

Any others that want to start working on this software with me, feel free to email me directly and we can start coordinating.

Greg

alanjds

Oct 25, 2010, 1:15:45 PM
to Pisa XHTML2PDF Support
+1 to maintain this project.

I hope I'm not late here. I have been using PISA for up to 2 years at my
start-up and it solves my problems. It has some rough edges but,
having no company/foundation behind it, does a good job.

But there are a lot of bugs waiting. We can at least fix these bugs
with some guidance from the core developers (yes, this is my free-time
offer).


Jared

Nov 5, 2010, 5:56:38 PM
to Pisa XHTML2PDF Support
I'm a django programmer, and my software currently depends on pisa. I
agree that although there are alternatives this is the best I can
find. I won't be able to do a lot, but count me in and assign me
simple tasks and I'll see if I can do them. Thanks Dirk and Greg.


Aziz Bookwala

Nov 6, 2010, 4:05:25 AM
to xhtm...@googlegroups.com
Hello All
I too have used pisa for a couple of projects, and it's the simplest HTML-to-PDF app to plug into an existing application. Anyway, count me in too. I'm not too confident in what I can accomplish, but like the earlier poster said, assign me a simple task and we can take it from there.

--
- Aziz M. Bookwala

Gerard @ Gmail

Nov 8, 2010, 3:31:37 AM
to xhtm...@googlegroups.com
Hi all,

(+1 on supporting a fork of pisa)

I developed an invoice system that previously generated invoices in HTML.
When I wanted PDF-based invoices, pisa was the exact tool to fill that gap,
especially because I'm using Django, so feeding rendered XHTML to pisa is
very doable.

If I can help with a fork of the project I'd gladly do that. I am however
not a hardcore coder by origin. But I do try to comply with the PEPs and
other conventions in this world .. Oh and I get the job done! :)

So please inform me (on this list or directly) about a possible group that
will arise working on a fork. Although my time is limited I'd definitely
like to help.

Regards,

Gerard.



--
self.url = www.gerardjp.com

James Paige

Nov 30, 2010, 11:31:58 AM
to Pisa XHTML2PDF Support
I would also be happy to see a continuation of pisa. The alternatives
listed are all unsuitable for my purposes.

On Oct 25, 1:05 am, Dirk Holtwick <dirk.holtw...@gmail.com> wrote:
> THE ALTERNATIVES:
>
> - wkhtmltopdf <http://code.google.com/p/wkhtmltopdf/> This is a great tool. If I were to write Pisa again, I would probably also start with the WebKit rendering engine. It is fast, reliable and portable. It also seems to support some print-specific features http://madalgo.au.dk/~jakobt/wkhtmltopdf-0.9.9-doc.html

Webkit+Qt is too heavy of a requirement for me. Especially Qt, since I
am trying to print from a gtk application :P

> - FOP <http://xmlgraphics.apache.org/fop/> This may be the best choice for production environments though the XSL-FO format is not easy to understand. This project <http://html2fo.sourceforge.net/> might be helpful to get around this.

FOP is java based, and adding a Java requirement to my python
application turns distribution from mostly-easy to frighteningly
complex

> - In the PHP world there are some nice projects doing the job of Pisa too. Here are some of them in random order: http://mpdf.bpm1.com/, http://code.google.com/p/dompdf/, http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf

I have used some PHP based html-to-pdf converters before, and have
been happy with the results, but using a PHP tool in a non-web
application can sometimes be problematic.

> - PrinceXML <http://www.princexml.com/> If you are ready to invest a lot of money, this may also be a working solution.

$495 single user, $3800 server license. need I say more? ;)

Right now, if pisa becomes unmaintained to the point where I cannot
use it anymore (hopefully that won't happen anytime soon) I would
probably have to resort to printing to html and sending it to the
user's installed web browser, requiring them to print from the browser
(to a PDF printer) as a separate step which is kludgy, and almost as
bad as the alternatives listed above ;)

---
James Paige

Greg Corey

Dec 1, 2010, 6:21:37 PM
to xhtm...@googlegroups.com
OK. Work has been extremely busy these last months, but I am getting closer to having time to fork Pisa and get things going. Here is what I would like to get a feel for.
  1. Name. Dirk, what name would be good to fork the project to? Should I just use xhtml2pdf under my own git user?
  2. Current knowledge. Dirk (again), at some point it would be nice to have a complete rundown of what you already know and of the problems already present beyond what you listed in your previous posting. Also, the flow of the code, what depends on what, what is outdated, etc.
  3. People's expertise. Everyone, what are each of you able to do? What are your strengths and weaknesses? The problem with forking this project is that I don't know that any of us is quite as capable as Dirk, even as a team, so it will take some research to get up to speed on all of these different technologies and how they work.
  4. Assigning work. How should work be assigned? Just wait until a fork is complete and go from there?

FOP and wkhtmltopdf are by far the closest to what I would want. It is just really nice to have a Python library for those of us who love Django and Python for our web applications, and for those like James who are creating desktop apps and don't want a lot of dependencies.

Once I hear from Dirk, I will start this process along and see what I can do about development documentation so that I can be the go to guy for any development questions.

Dirk, if you are game, we can talk via Skype or similar if that would help in the exchange of knowledge. I leave it up to you.

Whew. I hope this works :)

Greg



Phoebe Bright gmail

Dec 2, 2010, 4:10:05 AM
to xhtm...@googlegroups.com
My django skills are basic but put me down for some testing if that helps?

Phoebe.

Dirk Holtwick

Dec 2, 2010, 5:34:56 AM
to xhtm...@googlegroups.com
Hi Greg,

> Ok. Work has been extremely busy these last months, but I am getting closer to having time to fork Pisa and get things going. Here is what I would like to get a feel for.
> • Name. Dirk, what name would be good to fork the project to? should I just use xhtml2pdf under my own git user?

I think xhtml2pdf is well established, you should use that name. Best way would be to fork from my GitHub repository, so people can easily find the repository etc. Others participating in development may also fork from that repository, that should make collaboration easy.

> • Current Knowledge. Dirk (again), sometime it would be nice to have a complete rundown of what you already know and the problems already present beyond what you listed in your previous posting. Also, the flow of the code and what depends on what and what is outdated etc.

Ok, I'll see if I find time for doing that. I'll keep you updated.

> • Assigning work. How to assign work? Just wait until a fork is complete and go from there?

Forking and merging should be the best approaches. For new features you might want to add branches. If most contributors prefer another versioning system you might want to move to another platform. Mercurial is also a good choice. SVN is deprecated in my opinion for agile development.

> Dirk, if you are game, we can talk via Skype or similar if that would help in the exchange of knowledge. I leave it up to you.

I tried to contact you via Skype. My Skype name is my last name.

> Whew. I hope this works :)

It will, I'm optimistic :)

Dirk

Greg Corey

Dec 2, 2010, 11:30:28 AM
to xhtm...@googlegroups.com
I received your Skype contact this morning. I wasn't by a computer all night.

I think Git is the best choice for version control as well so I will just fork for now and go from there.

Thanks for your assistance.

Greg



alanjds

Dec 5, 2010, 6:51:39 AM
to Pisa XHTML2PDF Support
+1 to fork on github

I prefer Mercurial, but using it with GitHub via the hg-git plugin has
been a breeze for me, and GitHub may well be the best code collaboration
platform available. In fact, I already have a bugfix to "request
pull" :)


Philippe Raoult

Jan 13, 2011, 4:36:16 PM
to Pisa XHTML2PDF Support
Hello all,

I'm joining the discussion a bit late but here are a few random
thoughts:
- I use pisa in a commercial project. It has served us very well so
far, and I'm already putting some time into it to maintain our own
version
- performance is a bit disappointing, and I've been fooling around to
figure out how to improve it. I'm willing to share patches and tips.
- regarding the future, I think pisa fills a good niche. Reportlab is
a great tool but it cannot work from (x)html. pisa fills that gap and
has virtually no dependencies. Before using pisa I had a custom made
parser/reportlab but it only supported a few tags.
- regarding the next steps, I think the priority should be to replace
the html5lib parser by something else. There are some very fast and
mature parsing modules for python out there, and reusing one of them
will save us some work (lxml, cElementTree).
- performance-wise, I've found that the current use of pypdf for
backgrounds is very inefficient. I have some alternative code that
produces much smaller PDFs (reuse of images).
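
To illustrate what "reuse of images" can mean in practice, here is a minimal sketch. It is not Philippe's code and not pisa or reportlab API: `ImageStore` is a hypothetical helper that keys each image by a hash of its bytes, so a logo repeated on every page is stored only once.

```python
import hashlib


class ImageStore:
    """Keeps one copy of each distinct image, keyed by a hash of its bytes."""

    def __init__(self):
        self._by_digest = {}

    def add(self, data):
        """Return a stable key for the image bytes, storing them only once."""
        digest = hashlib.sha256(data).hexdigest()
        self._by_digest.setdefault(digest, data)
        return digest

    def __len__(self):
        return len(self._by_digest)


store = ImageStore()
logo = b"\x89PNG fake logo bytes"  # made-up sample data
keys = [store.add(logo) for _ in range(50)]  # same logo used on 50 pages
keys.append(store.add(b"\x89PNG another image"))
```

Every use of the logo gets the same key back, so the output only needs to embed two image objects no matter how many pages reference them.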

Regards,
Philippe

Greg Corey

Jan 13, 2011, 4:43:14 PM
to xhtm...@googlegroups.com
Philippe, you just may be the person to help this get off the ground. I have been so busy with my daily work that I have had hardly any time to work on this. I haven't even made a fork yet on github :). So hopefully, with your advanced understanding already, we can make some real progress on this project.

I will contact you when it is forked and ready for work.

Thank you so much for responding.

Greg


ChrisGlass

May 18, 2011, 11:15:27 AM
to xhtm...@googlegroups.com
Hi all,

We use Pisa pretty extensively too, and think it should live.

I started a fork merging in some of the work done by the community already, and created an IRC channel on freenode (#xhtml2pdf).

The repository is at: https://github.com/chrisglass/xhtml2pdf

Please feel free to fork it, and send pull requests against it.

Also, here is a list of tasks (low-level) I think need to be worked on, and that's where I'll start myself too:

The easy ones (no need for wizard programming skills!):
* Writing docs. There is an enormous need for documentation. While it would make sense to use pisa to document itself, I think we're better off using something a little easier, so as to lower the barrier of entry and let as many people as possible write docs. Please write rst formatted text (someone needs to make a decision, so here, my fork will use rst).
* Writing tests. Let's not aim for the fancy super duper top to bottom test suite, but rather try to get at least _some_ unit tests going, so that future additions can be tested for regression and/or performance.
* Code cleanup. There is _lots_ of cleanup to do: unused variables, imports, dead code, stuff that is commented out... Anybody can help with this, but to make the merger's life easier, please make _small_ and _encapsulated_ commits, and try to limit yourself to one type of fix per pull request (it's not like I will refuse commits, but it's just good practice)

The bigger ones (programming involved, lone programmers welcome):

* Refactor the module structure. ho and sx need to go, and all code should live in an xhtml2pdf folder/package. This obviously means fixing all the relevant import statements, too :(
* Kill wild imports (wild imports are "from X import * "). It's a little more complicated than unused imports since you need to make decisions about which exact class you want to import (and there can be several called the same way).
* Demining (document the code with docstrings and inline comments). It's not going to make the program faster or better, but it'll help other developers enormously. Don't write tons of useless comments, but try to be as clear as possible in as few words as possible to describe what a function does, what a variable is, etc... If you feel variables are misnamed, go ahead and rename them to make them more explicit

The big ones (please drop a mail on the mailing list first, maybe come talk about it on IRC):

* Integrate the HTML parser. While I'm not certain including dependencies is a good idea, it might be nice to look at the html5lib, eventually at lxml too and see what there is to do to make things work better together.
* Integrate with pdfrw.
* Get free of reportlab (not sure why, but well, the original author is more savvy than me on the issue, for sure)

That I think is a good starting list.
Obviously, you're free to work on whatever you wish, that's just a list of stuff that needs to be done in my humble opinion.

Hope to see this project alive again soon!

Best regards,

- Chris

PS: Sorry if you already received this email, but google groups is not my friend, apparently :(

Dirk Holtwick

May 18, 2011, 11:29:31 AM
to xhtm...@googlegroups.com
Hi Chris,

Thank you very much for your effort and your great looking agenda. As the original author of XHTML2PDF, also known as 'Pisa', I would like to encourage everyone interested in the project to support Chris.

I personally cannot continue to support the project as much as I did before, therefore I'm happy to see that Chris is willing to take the lead. I agree that documentation, testing and modernization are the primary tasks and I think that after having accomplished that, the maintenance, development and use of the project should become easier for everyone.

Thanks to Chris and all of you guys for supporting XHTML2PDF!

Cheers,
Dirk

Tribaal

May 19, 2011, 4:21:19 AM
to xhtm...@googlegroups.com
So, as you might have noticed, I merged the "merging" branch back into my "master" and started "demining", as I call it: going over the code and adding #TODO statements where I feel they are necessary.

Most of these are really simple tasks like killing wild imports, so don't hesitate to send in pull requests if you have fixed some of them already.

Also, I memoized some heavily used functions in pisa_utils. This should mean a performance improvement for people with complex layouts and many pages, I'd love to hear some feedback about it, if you happen to have such a setup.
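
For anyone unfamiliar with memoization, here is a minimal sketch of the idea. It is not the code from the fork: `to_points` is a hypothetical stand-in for a frequently called helper, and the sketch uses `functools.lru_cache` from the Python 3 standard library.

```python
from functools import lru_cache

# Counts real computations so the effect of the cache is visible.
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def to_points(value):
    """Convert a CSS-like length string such as '2cm' or '10pt' to points.

    Hypothetical helper for illustration; not the actual pisa_utils code.
    """
    CALLS["count"] += 1  # only runs on a cache miss
    units = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # bare numbers are treated as points


# A long document asks for the same conversion over and over.
for _ in range(1000):
    to_points("1in")
```

The function body runs once for each distinct argument; every later call with the same argument is answered from the cache, which is where the gain on long, repetitive documents would come from.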

Cheers,

- Chris

Philippe Raoult

May 19, 2011, 7:42:10 AM
to xhtm...@googlegroups.com
Hi all,

I'm not really familiar with github so I'll just post my patches here.
Feel free to use them in any way you see fit. Here is a quick
description of what each of them does:

- hr patch: allow a "width" attribute for hr tags. My only use case
for this was with "XX%" values so I'm not sure it works for absolute
pixels/points values.
- hash image: reverse a workaround for a (supposedly) fixed bug in
reportlab. Including the same image many times resulted in N copies
being added to the pdf. With the patch each individual image will be
included once, regardless of the number of uses.
- img px spec: allow img tags to have "XXpx" width/height attributes.
In my tests only numbers worked (px unit was implicit). This was a big
deal to me because ckeditor uses the "XXpx" attribute format.
- nb pages: add a page counter to the pisa doc instance. Use it like this:
    doc = pisa.CreatePDF(xhtml, buffer, blah blah)
    print doc._pisa_page_counter

Page counting and adding a "Page X of Y" label to each page with
reportlab/pisa has been an issue for some time. There are many ways to
skin that particular cat. My first approach was to run pisa to
generate the pdf first just to count the pages, and then add the
("Page <pdf:pagenumber/> of %d" % doc._pisa_page_counter) frame to the
html body and run pisa again. This was by far the simplest and worked
quite well, but performance was very poor for large documents. My
strategy is now to generate a new pdf with only the background and the
page counter frame and to merge it with the previously generated
content (pypdf). Before this patch, I had to count pages with pypdf
which is relatively slow because it needed to parse everything.
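
The first, two-pass approach described above can be sketched roughly as follows. This is a simplified illustration, not the patch itself: `render` is a hypothetical callable standing in for the PDF generation step and is assumed to return the PDF bytes together with the page count, `fake_render` is a made-up stand-in so the sketch runs without pisa installed, and the footer is appended as a plain div rather than a proper frame.

```python
def render_with_page_total(body_html, render):
    """Two-pass rendering: count pages first, then render with 'Page X of Y'.

    `render` is a hypothetical callable: html string -> (pdf_bytes, page_count).
    """
    # Pass 1: render only to learn how many pages the document has.
    _, total = render(body_html)
    # Pass 2: render again with the footer, now that the total is known.
    footer = "<div>Page <pdf:pagenumber/> of %d</div>" % total
    return render(body_html + footer)


def fake_render(html):
    """Stand-in renderer for demonstration: pretends each <p> fills one page."""
    pages = max(1, html.count("<p>"))
    return html.encode("utf-8"), pages


pdf_bytes, page_count = render_with_page_total(
    "<p>a</p><p>b</p><p>c</p>", fake_render
)
```

The whole document is rendered twice, which is consistent with the poor performance on large documents mentioned above.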


Also, I did a bit of benchmarking back when I was working with
500-page PDFs with images, and it seems the parsing does take quite a
bit of time (and memory!). I'd advise against using the current parser
in pisa. lxml can do the job faster with less memory.

Also the biggest performance gain I got was from adding xhtml = 1 to
the CreatePDF arguments. Without it the parser considered each
<pdf:nextpage/> as an opening tag and put everything following inside.

I hope some of you will find this to be helpful. Pisa is deployed on
my projects and will continue to be used for the foreseeable future.
I'd also like to take this opportunity to thank Dirk for his work.

Best regards,
Philippe

pisa_patch_hr.diff
pisa_patch_hash_image.diff
pisa_patch_img_px_spec.diff
pisa_patch_nb_pages.diff

Tribaal

May 19, 2011, 8:53:51 AM
to xhtm...@googlegroups.com
Hi Philippe,

Thanks a lot for your patches and precious advice!

I merged your patches in after a quick review and testing. Replacing the parser should indeed be relatively high on the priorities list, but I will first focus on cleaning up the code. lxml looks like it's doing a really good job, so I'll naturally start playing around with that.

The short-term goal for me is to refactor the internal structure to be all in one, tidy package. Since it is highly backwards-incompatible, I'll take the opportunity to rename the package to xhtml2pdf (so everybody's projects don't suddenly start breaking when the automatic deployment scripts start pulling version 2.0 or whatever).
While I'm at it, I'll most likely change the default to be xhtml=True instead. This makes the most sense, and as you said might increase performance.

Thanks again for your contribution!

- Chris

Dirk Holtwick

May 19, 2011, 10:37:54 AM
to Pisa XHTML2PDF Support
Hi,

A long time ago I decided to use html5lib for a few reasons:

1. It was available in pure Python, so fewer portability problems
2. It was able to handle dirty HTML
3. It had a liberal license

I think 'lxml' is a great choice, but remember it depends on some C
code. Anyway I would encourage you to go this way, but maybe it could
be done in some pluggable way, since simpler pure Python solutions
should also work for the parsing. If you are planning to only support
XHTML in the default variant even a 2-liner using
http://docs.python.org/library/xml.dom.minidom.html would be sufficient
to get the DOM you need.
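
A minimal sketch of that minidom approach, assuming the input is well-formed XHTML (minidom raises an error on anything else). The sample markup is made up for illustration.

```python
from xml.dom import minidom

xhtml = "<html><body><h1>Report</h1><p>First</p><p>Second</p></body></html>"

# The "2-liner": parse the string and take the root element of the DOM.
doc = minidom.parseString(xhtml)
root = doc.documentElement

# From here on the tree can be walked like any other DOM.
paragraphs = [p.firstChild.data for p in root.getElementsByTagName("p")]
```

This covers only the strict XHTML case; dirty HTML would still need a forgiving parser such as html5lib.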

Another beast is the CSS part. I think if you are getting your hands on
the HTML part you should also have a look at this one. My personal
opinion is that you should throw away the old implementation and write
a new one or use something else. The http://cthedot.de/cssutils/
project looks good, but the LGPL license will cause headaches for
those who are using xhtml2pdf in commercial projects. Maybe there is
something else available that would fit. Anyway it seems to be a good
approach to 1. Parse HTML and CSS, 2. Create a DOM, 3. Apply CSS to
DOM.

Whichever way you go, you should have a look at this http://lxml.de/dev/cssselect.html
It should also work without lxml. I used this XPath magic in my Pyxer
project together with Genshi without lxml. I think this should do
half the work you need to replace the current CSS solution ;)
https://github.com/holtwick/pyxer/tree/master/src/pyxer/template
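
To make the CSS-to-XPath idea concrete, here is a toy sketch that uses only the standard library. It is not cssselect's API and not the Pyxer code: `simple_css_to_xpath` is a hypothetical helper that handles only `tag`, `.class`, `tag.class`, `#id` and `tag#id`, and it relies on the limited XPath subset supported by `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET


def simple_css_to_xpath(selector):
    """Translate a very simple CSS selector into an ElementTree path.

    Toy translator for illustration; the class match is exact, so it does
    not handle elements that carry several classes.
    """
    match = re.fullmatch(r"(\w+)?(?:([.#])([\w-]+))?", selector)
    if not selector or not match:
        raise ValueError("unsupported selector: %r" % selector)
    tag, kind, name = match.groups()
    path = ".//" + (tag or "*")
    if kind == ".":
        path += "[@class='%s']" % name
    elif kind == "#":
        path += "[@id='%s']" % name
    return path


root = ET.fromstring(
    "<html><body>"
    "<p class='note'>one</p><p>two</p><div id='footer'>three</div>"
    "</body></html>"
)
notes = [el.text for el in root.findall(simple_css_to_xpath("p.note"))]
footer = [el.text for el in root.findall(simple_css_to_xpath("#footer"))]
```

Once selectors can be turned into paths, applying a stylesheet comes down to running each rule's path against the DOM and attaching the declarations to the matching elements.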

Please remember that with http://code.google.com/p/xhtml2pdf-base/source/browse/#hg%2Fsrc%2Fxhtml2pdf
there is already a template of what the new structure of xhtml2pdf
could look like. Maybe this is a source of inspiration for a major
reconstruction of the project's structure.

Cheers,
Dirk

Tribaal

May 20, 2011, 6:06:37 AM
to xhtm...@googlegroups.com
So I have refactored most of the code (at least the main code, not the tests and examples etc...). It lives in a branch on my repository for the time being, but I will probably merge it into my master, then push it to pypi as the "xhtml2pdf" package (with a glorious version number of 0.0.1).

Please have a look at it, comments welcome!

https://github.com/chrisglass/xhtml2pdf/tree/xhtml2pdf-refactor

Cheers!

- Chris



YongTai Liang

May 23, 2012, 1:37:47 AM
to xhtm...@googlegroups.com
pdf:pagecount has finally been added in this issue:



Philippe Raoult

May 23, 2012, 2:58:48 AM
to xhtm...@googlegroups.com
Hey,

I checked the patch, thanks for adding that.

I also noticed that you removed the current page_counter. Is there a
specific reason for that? Is there still a way to get the number of
pages afterwards with your version?

Regards,
Philippe