PDF generation, conversion to PDF, and merging PDFs in Java environment

Don M

Apr 30, 2008, 5:07:41 PM
to The Java Posse, ds...@stanford.edu
Hi,

This isn't Java Posse related, but I'm looking for suggestions or
recommendations from anyone that has had to work with generating PDFs,
converting existing documents to PDF, and merging multiple PDFs into
one new PDF.

I'm currently running in a Java 5, Apache Tomcat 5.0, Oracle, Solaris
environment, and have the budget to explore other applications and
hardware including non-Java products as long as I can integrate them
in our existing Java environment.

We basically have a "document management" application that has been
allowing people to upload Word, PDF, RTF, and text documents. Related
documents are stored together with other metadata about them. They
would like to have all of these documents converted and merged into a
single PDF with some cover pages, appropriate headers and footers, and
some custom pages sprinkled throughout based on the metadata. The
final document will be under 100 pages in almost all cases and
probably around 30 pages.
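
One way to picture the first step of such a pipeline is a small router that decides, per upload, whether the file can go straight into the merge or has to be converted first. This is only a sketch under assumptions: the class name, the `Route` enum, and the extension-to-route mapping are hypothetical and not taken from any product mentioned in this thread, and a real system would more likely inspect stored content types than file names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class UploadRouter {
    /** What has to happen to an upload before it can be merged (hypothetical categories). */
    public enum Route { PASS_THROUGH, CONVERT_TO_PDF, UNSUPPORTED }

    /** Picks a route from the file extension, ignoring case. */
    public static Route routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":
                return Route.PASS_THROUGH;      // already a PDF, merge as-is
            case "doc":
            case "rtf":
            case "txt":
                return Route.CONVERT_TO_PDF;    // needs a renderer/converter first
            default:
                return Route.UNSUPPORTED;
        }
    }

    /** Builds an ordered plan so the merge keeps the upload order. */
    public static Map<String, Route> plan(List<String> fileNames) {
        Map<String, Route> plan = new LinkedHashMap<>();
        for (String name : fileNames) {
            plan.put(name, routeFor(name));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up file names, purely for illustration.
        System.out.println(plan(Arrays.asList("summary.doc", "scan.PDF", "notes.txt", "photo.png")));
    }
}
```

Keeping the routing separate from the conversion itself means the converter (whatever it turns out to be) can be swapped without touching the merge logic.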

Has anyone done anything like this and have some experiences to
share? Creating new PDFs from metadata in Java is something we
already do in a few apps, but converting different document types to
PDF and merging them together is new to us.

Don

Casper Bang

Apr 30, 2008, 6:04:16 PM
to The Java Posse
For PDF generation, have a look at Ujac and Jasper Reports. I like
Ujac, it's simple and I have yet to see something that can't be
generated with it. It uses iText underneath, which you can always
resort to should you need to do something special, like pre- or
postprocessing (merge, stamp, weave, etc.). Although the API is quite
weird to work with, iText performs very well (it's stream-based) and 100
pages is definitely a non-issue. As to conversion, have a look at
Apache POI.

/Casper

Peter Becker

Apr 30, 2008, 6:24:23 PM
to java...@googlegroups.com
Hmmm...

With iText (http://www.lowagie.com/iText/ ) and PDFBox (http://www.pdfbox.org/ ) you have two pretty good tools on the PDF side -- I've used both over the years (iText to save Graphics2D rendering into PDF, PDFBox for text extraction). Merging PDFs should certainly be possible: PDFBox lists that feature on its front page, and for iText you can find this via Google: http://java-x.blogspot.com/2006/11/merge-pdf-files-with-itext.html
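
Whichever library does the actual merging, footers of the "Page X of Y" kind need to know where each source document starts in the combined file. The following is a minimal sketch of that bookkeeping in plain Java; it deliberately uses no iText or PDFBox calls, and the class name, method names, and page counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedPageNumbers {
    /** Returns the 1-based first page of each source document in the merged file. */
    public static List<Integer> firstPages(List<Integer> pageCounts) {
        List<Integer> starts = new ArrayList<>();
        int next = 1;
        for (int count : pageCounts) {
            starts.add(next);
            next += count; // the following document starts right after this one ends
        }
        return starts;
    }

    /** Total number of pages in the merged file. */
    public static int totalPages(List<Integer> pageCounts) {
        int total = 0;
        for (int count : pageCounts) {
            total += count;
        }
        return total;
    }

    /** Footer text for one page of the merged file. */
    public static String footer(int page, int total) {
        return "Page " + page + " of " + total;
    }

    public static void main(String[] args) {
        // Cover page plus three sources; the counts are invented.
        List<Integer> counts = Arrays.asList(1, 12, 3, 14);
        System.out.println(firstPages(counts));               // [1, 2, 14, 17]
        System.out.println(footer(14, totalPages(counts)));   // Page 14 of 30
    }
}
```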

The big problem you will find is that to get from Word or RTF into PDF you need a renderer. With RTF to PDF you might have some chance (Apache FOP maybe?), although I seem to recall that RTF is not always RTF -- particularly not if it comes out of Word. But I'm not sure about that last point. There is some RTF in Swing, but I would not expect that to be good enough -- at least not if the HTML rendering is any type of indication (http://java.sun.com/j2se/1.4.2/docs/api/javax/swing/text/rtf/package-summary.html ).
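
To make the Swing option concrete, here is a small sketch that uses `javax.swing.text.rtf.RTFEditorKit` from the JDK to pull the plain text out of an RTF stream. It only illustrates parsing: nothing here lays out pages or produces a PDF, which is exactly the gap described above. The class and method names are hypothetical, the sample document is a trivial hand-written one, and RTF produced by Word may well trip the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfTextExtractor {
    /** Parses RTF from the stream and returns its text content without formatting. */
    public static String extractText(InputStream rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(rtf, doc, 0); // fill the document starting at offset 0
        return doc.getText(0, doc.getLength());
    }

    /** Convenience wrapper for in-memory RTF that rethrows failures unchecked. */
    public static String extractTextFromString(String rtf) {
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII))) {
            return extractText(in);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse RTF", e);
        }
    }

    public static void main(String[] args) {
        // Minimal hand-written RTF; real uploads will be far messier.
        String sample = "{\\rtf1\\ansi Hello from RTF.\\par}";
        System.out.println(extractTextFromString(sample).trim());
    }
}
```

Text recovered this way could be fed to a PDF library as plain paragraphs, but fonts, tables, and images from the original would be lost.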

To render Word documents the only thing that comes to my mind is OpenOffice.org with its SDK (http://udk.openoffice.org/ ). OpenOffice can render Word documents and can export PDF, so you should be able to use it as a Word->PDF conversion tool driven by Java code. It is not Java itself, though, so separate deployment will be necessary. And even OOo doesn't get all Word documents right; it's pretty good, but I think to render Word documents correctly you need a lot of insanity built into your rendering architecture, so only Word itself has any chance to do it. I remember reading an interview with an AbiWord developer years ago who said that understanding the Word format is easy, it's figuring out what Word does with it that's hard.

I had planned to play with the OOo SDK a while ago (using it for text extraction in my little project: http://tockit.sourceforge.net/docco/ ), but I never got beyond a quick look. Not that it seemed that hard, I just got distracted. Opening a document and saving it as PDF should be reasonably easy. I'd be happy to hear your experiences if you should try since I haven't given up on that old plan of mine yet :-)

  Peter

Viktor Klang

May 1, 2008, 6:16:01 AM
to java...@googlegroups.com
I've successfully used BFO PDF ( big.faceless.org ) for a couple of years.
Used iText like 4 years ago but it was really cumbersome.

The good part with BFO PDF is that you can use AcroForms, so you can let a designer do the static part of your PDF, and then define AcroForms on it and then programmatically inject data into it from Java/Scala/what-have-you.

Cheers,
-V
--
Viktor Klang
Rogue Software Architect

Todd Costella

May 1, 2008, 11:48:21 AM
to java...@googlegroups.com

You may also want to check out
http://icesoft.com/products/icepdf.html

Jack

May 2, 2008, 7:23:54 AM
to The Java Posse
I've just recently had to use iText, and I'm glad I'm not the only
person who thinks the API is really unusual (from a Java perspective).

Don M

May 2, 2008, 11:54:16 AM
to The Java Posse
Thanks for the replies. This was a lot of good information. I'm also
looking at Adobe's LiveCycle product
(http://www.adobe.com/products/livecycle/pdfgenerator/). Has anyone tried
that? I'm hoping to talk to the Adobe folks at JavaOne about this.

Peter Becker

May 5, 2008, 2:59:58 AM
to java...@googlegroups.com
If I'm not utterly mistaken POI will only read the files, not render their content. The latter is a much harder job for which I don't know a pure Java solution. I'd be happy to stand corrected, though.

  Peter

Casper Bang

May 5, 2008, 6:45:45 AM
to The Java Posse
You are right Peter, I forgot, POI will not help in this respect.
There is an open source rendering kit available which Josh Marinacci
helped push out from Sun internals:
http://weblogs.java.net/blog/joshy/archive/2007/12/the_big_secret.html

...but it's rather buggy (I actually had some javabean wrappers in the
works to go on top of it, but had to abandon the idea since it failed
to render my files).

/Casper

sherod

May 5, 2008, 7:00:34 AM
to The Java Posse
BFO has a licensable component which provides a PDF viewer with
printing capability.

I prototyped up a solution in it and was quite happy with the outcome.

If you have a lot of seats (> 5000) you are looking at mid to high 5
figures in cost however.

sherod

May 5, 2008, 7:11:08 AM
to The Java Posse
I've not used LiveCycle in anger, but have evaluated it (on paper) and
sniffed around it a few times.

1. Makes enormous PDF doco's by the time you put all the logic in
(like 5MB or more) - the demo person sidestepped that 'minor' issue.
2. Is very expensive: at the time we looked at it, it was like $10K AU
to enable a single 'smart' form for LiveCycle, and the platform costs
like $100K.
3. Seemed quite closed - you seemed to have to buy into the entire
platform with workflow etc.

It seemed to be aimed at sites with the need to 'make smart' existing
paper forms for high value transactions that need to operate in a
mixed online/printed world.

I believe the NSW government uses the technology for divorce
applications (it's a reference site they brought up, it's not my
personal experience :) ) and I've also bumped into another NSW Govt
department that was pushing it for insurance claims.

All of this is from distant memories (a couple of years ago); it may
have changed since that time.

Lars Westergren

May 5, 2008, 3:20:21 AM
to java...@googlegroups.com
On Mon, May 5, 2008 at 8:59 AM, Peter Becker <peter.b...@gmail.com> wrote:
> If I'm not utterly mistaken POI will only read the files, not render their
> content. The latter is a much harder job for which I don't know a pure Java
> solution. I'd be happy to stand corrected, though.

iText. Haven't used it myself, but I think it is fairly popular.
http://www.lowagie.com/iText/

Cheers,
Lars

Lars Westergren

May 5, 2008, 3:24:11 AM
to java...@googlegroups.com
Bah. Casper already mentioned iText. Sorry, Monday morning, brain not
woken up totally yet.

Peter Becker

May 5, 2008, 7:49:19 PM
to java...@googlegroups.com
iText creates the PDF for you, POI can read some MS Office documents for you, but you still need something that knows how to take the information in the files and render a picture from it. If you can do it on screen, iText makes sure you can do it in PDF -- but I don't know of any Java tool that can actually display an MS Office document. There's a huge gap there and I wouldn't get my hopes up that there is some little gem to fill it, it's a bloody hard job.

I still keep my original advice: OpenOffice is probably the only decent candidate for the job.
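
For what it's worth, one way to drive an office suite from Java without touching the UNO SDK is to shell out to its command line. The sketch below assumes a LibreOffice `soffice` binary (LibreOffice being the later fork of OpenOffice.org), whose command line accepts `--headless --convert-to pdf --outdir`; it is not the SDK route described above, and the OpenOffice.org releases current at the time of this thread may not accept these options. The class name, method names, and paths are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OfficeToPdf {
    /** Builds the command line for a headless conversion (assumes LibreOffice-style flags). */
    public static List<String> buildCommand(String sofficeBinary, Path input, Path outDir) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",               // no GUI
                "--convert-to", "pdf",      // target format
                "--outdir", outDir.toString(),
                input.toString());
    }

    /** Runs the conversion and returns the process exit code. */
    public static int convert(String sofficeBinary, Path input, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, input, outDir))
                .inheritIO()                // show the converter's output in our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call convert(...) on a machine where soffice is installed.
        System.out.println(buildCommand("soffice", Paths.get("uploads", "report.doc"), Paths.get("converted")));
    }
}
```

Because the office suite runs as a separate process, it still has to be deployed alongside the Java application, and a failed or hung conversion needs to be handled by checking the exit code and adding a timeout.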

  Peter

FMSantos

May 6, 2008, 4:55:58 AM
to The Java Posse
If you want to merge PDFs in Java, you should think about PDFsam
(PDF Split and Merge).

It is also able to split PDFs.

Fer Out

Viktor Klang

May 6, 2008, 11:39:13 AM
to java...@googlegroups.com
BFO PDF also supports merging and splitting PDFs.

Seriously, it's the best PDF framework I've seen so far.
The only drawback is that it is not free as in beer for developers of OSS-apps.

They (BFO) should really consider going the route of offering it for free for non-commercial usage.

Cheers,
-V

Ranga

May 7, 2008, 6:19:04 AM
to The Java Posse
Hello,

I work for a company called ceTe Software, which has a product that
lets you create PDF documents programmatically from scratch, merge or
append existing PDF documents, add content to existing PDF files, stamp
PDFs, fill forms, rotate and scale PDFs, etc.

You can merge existing PDF documents without any problem, and you can
also modify them. You can create a PDF from text files, but there is no
direct conversion for this.

Converting Word and RTF files into PDF documents is not currently
possible, but we are developing a product that can convert these files
into PDF.

You can add headers and footers to the PDF, and you can also use the
Template element for this. Creating a 100-page PDF is not a problem with
our product.

You can refer to our website at http://www.cete.com.

Thanks,

Ranganadh.

