Alternate to Conversion API


aswath satrasala

Aug 21, 2012, 7:00:57 AM
to google-a...@googlegroups.com
Hello,
We rely heavily on the Conversion API for HTML-to-PDF conversion. I have just received the email from Google about the plan to decommission it starting in Nov 2012.

Does anyone have suggestions for doing HTML-to-PDF conversion in a way that is compatible with Google App Engine for Java?


Regards
-Aswath

Jeff Schnitzer

Aug 21, 2012, 12:46:58 PM
to google-a...@googlegroups.com
We're planning to offload this to a service running on Heroku (we
convert the first page of a PDF to PNG). We need to do it anyways
because the Conversion API has a 2M document limit that we keep
bumping into. There really aren't good tools for doing this in Java
(there's one library that claims to be comprehensive, but it's
thousands of dollars) but it's easy with any platform that supports
ImageMagick.

I might write up the process when we get around to implementing it,
but it probably won't be until the last minute.
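
A minimal sketch of the ImageMagick step described above (first page of a PDF to PNG), as it might run on an external host. The function names and the 150 DPI default are illustrative assumptions, not taken from this thread. It assumes the ImageMagick `convert` binary and Ghostscript are installed; newer ImageMagick releases ship the same tool as `magick`.

```python
import subprocess


def first_page_command(pdf_path, png_path, density=150):
    """Build the ImageMagick argv that renders page 1 of a PDF as a PNG."""
    # "[0]" after the file name selects the first page (zero-based index).
    # -density comes before the input so the PDF is rasterized at that DPI.
    return ["convert", "-density", str(density), pdf_path + "[0]", png_path]


def render_first_page(pdf_path, png_path, density=150):
    """Run ImageMagick; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(first_page_command(pdf_path, png_path, density))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with uploaded file names.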

Jeff
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.

aswath satrasala

Aug 22, 2012, 7:24:27 AM
to google-a...@googlegroups.com
Has anyone tried working with flying-saucer?

Will it work on App Engine for Java?

-Aswath

Jeff Schnitzer

Aug 22, 2012, 11:36:16 AM
to google-a...@googlegroups.com
Since it uses Swing, probably not.

Jeff

Daniel Florey

Aug 22, 2012, 1:02:48 PM
to google-a...@googlegroups.com, aswath satrasala
Same problem here. We have invested a lot of time (=money) into Conversion API.
So I am very frustrated and need to find a replacement.
From what I can tell, App Engine has been using PrinceXML for the conversion.
Unfortunately, PrinceXML will not run on App Engine and costs a lot of €€€$$$.
So I'm basically lost right now.
Any suggestions?
The Conversion API worked perfectly fine for us :-(

Daniel

Gianni

Aug 22, 2012, 2:38:54 PM
to google-a...@googlegroups.com
I suppose HTML-to-PDF is the feature whose absence will be felt most. The most widely used free alternative in Python is ReportLab (perhaps it could be added to the supported third-party libraries at this point). But if the Conversion API is still under discussion, I think the low utilization stems from the tricky usage of paged-media CSS and not from the Conversion API itself.

-- Gianni




timh

Aug 25, 2012, 1:15:57 AM
to google-a...@googlegroups.com, aswath satrasala
The Conversion API seems to map to functionality in Google Docs.

Could the facilities in Google Docs for upload/download and conversion be used? (Possibly not, depending on what your app does.)
Or are they being deprecated as well?

T

Richard Watson

Aug 25, 2012, 2:42:08 AM
to google-a...@googlegroups.com, aswath satrasala
Yup, here's an example of how to create a doc on Docs and export it as a PDF:

I'm almost 100% sure that functionality is safe, as it's being used by many more people. If you try it out, please tell us how it goes.

timh

Aug 25, 2012, 3:24:27 AM
to google-a...@googlegroups.com
The main problem with this approach for dynamic generation will be the requirement to put content into Docs, pull it back out with a conversion, and then probably delete the documents once generated. So if it's being used as part of an anonymous service, Docs will be given a serious workout ;-)

Richard Watson

Aug 25, 2012, 4:08:28 AM
to google-a...@googlegroups.com
Truly. Personally I'm inclined to use direct PDF generation, which is what I currently need. Fewer moving parts and dependencies.

timh

Aug 25, 2012, 9:01:33 PM
to google-a...@googlegroups.com
For me, PDF generation was never really the problem; generating PDF is fairly straightforward. It was the other facilities in the Conversion API I was interested in ;-)

T

Bryce Cutt

Mar 7, 2013, 3:56:59 PM
to google-a...@googlegroups.com, aswath satrasala
A lot of this can be done in Python, and since you need this functionality in Java, you can set up a Python version or backend wrapped in a RESTful API that uses xhtml2pdf and then consume that API from your Java code.
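
A rough sketch of what such a wrapper could look like, using only the standard WSGI calling convention. The names `make_pdf_app` and `convert` are hypothetical; `convert` stands for any callable that takes an HTML string and returns PDF bytes (for example, one built on xhtml2pdf). A real deployment would also need authentication and request size limits.

```python
def make_pdf_app(convert):
    """Return a WSGI app: POST an HTML body in, get PDF bytes back.

    `convert` is any callable taking an HTML string and returning PDF bytes.
    """
    def app(environ, start_response):
        # Only POST is meaningful for this endpoint.
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        # Read exactly CONTENT_LENGTH bytes of the request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        html = environ["wsgi.input"].read(length).decode("utf-8")
        pdf = convert(html)
        start_response("200 OK", [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf))),
        ])
        return [pdf]
    return app
```

The Java side then only needs to POST the HTML to this endpoint and read the response body as bytes, so the Java app carries no PDF library dependency.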

For HTML/image to PDF, I use the Python library xhtml2pdf (http://www.xhtml2pdf.com/), which uses ReportLab, pyPdf, and html5lib and runs on GAE. I have been using it to generate very nice article PDFs with embedded images, and once I figured out how to get the page size right, I found it to be a very good library.
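
A minimal sketch of an xhtml2pdf conversion with an explicit page size, assuming the `xhtml2pdf` package is installed. The helper names, the A4 default, and the 2cm margin are illustrative assumptions, not the settings used in the setup described above.

```python
import io


def wrap_with_page_css(body_html, size="a4 portrait", margin="2cm"):
    """Wrap an HTML fragment in a document that declares the page size."""
    # xhtml2pdf reads the page geometry from a CSS @page rule.
    return (
        "<html><head><style>"
        "@page { size: %s; margin: %s; }"
        "</style></head><body>%s</body></html>" % (size, margin, body_html)
    )


def html_to_pdf(html):
    """Render HTML to PDF bytes with xhtml2pdf; raise on conversion errors."""
    from xhtml2pdf import pisa  # third-party; imported lazily

    out = io.BytesIO()
    status = pisa.CreatePDF(html, dest=out)
    if status.err:
        raise ValueError("xhtml2pdf reported %d error(s)" % status.err)
    return out.getvalue()
```

For example, `html_to_pdf(wrap_with_page_css("<h1>Invoice</h1>", size="letter"))` would produce a US Letter document.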

For PDF to image, I have been using ImageMagick on an EC2 instance, and for extracting text I have used Apache PDFBox (Java) and pdfminer (Python) on GAE. I have to convert CMYK PDFs into RGB JPGs with watermarks. ImageMagick makes this nice and easy but of course cannot be run on GAE, so I wrapped it in an API that I consume from GAE. I mostly use PDFBox to extract text for my search index and have no experience trying to get a nicely formatted text version from a PDF, but I know pdfminer will give you a formatted HTML version of a PDF.
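
A small sketch of text extraction for a search index in Python. It assumes the maintained fork `pdfminer.six`, whose `pdfminer.high_level.extract_text` helper did not exist in the original pdfminer of the time; the function names and the whitespace normalization are illustrative assumptions.

```python
import re


def normalize_for_index(text):
    """Collapse whitespace runs so extracted text is one searchable string."""
    return re.sub(r"\s+", " ", text).strip()


def pdf_text_for_index(pdf_path):
    """Extract text from a PDF and normalize it for a search index."""
    from pdfminer.high_level import extract_text  # pdfminer.six; lazy import

    return normalize_for_index(extract_text(pdf_path))
```

Normalizing whitespace discards the page layout, which is fine for indexing but not for producing a readable text version.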

- Bryce