PDF: Creating links that open embedded files


Robert Vogel

Jul 19, 2012, 6:27:10 AM
to flying-sa...@googlegroups.com
Hello!

When creating PDFs with xhtmlrenderer I often attach files at the document level using the iText PdfStamper class in a second processing pass. This works pretty well, but I wondered if I could improve that process.

The idea is to add special non-HTML tags to the source which reference a file to embed. For example <embedfile src="/path/to/file.docx">Some cool file</embedfile>. I could then use a ReplacedElement to embed the file as a FileAttachment annotation at the page level, and place a link (Annotation) to the embedded file at the position of the element within the normal text. This could all be done at rendering time, couldn't it?

Does anybody have experience with stuff like this? Maybe there is already an implementation? Help and comments would be much appreciated. Thanks!

Best regards,
Robert

Peter Brant

Jul 19, 2012, 4:13:46 PM
to flying-sa...@googlegroups.com
You might be able to piggy back off the existing link support too (e.g. something like

<a href="..../foo.docx" data-attach-file="true">...</a>

)

The ITextReplacedElement does have access to the underlying PdfWriter so you could add the file attachment there.  Let me know if you run into problems.  I'm a little fuzzy on the iText API for this, but could certainly brush up if necessary.

Pete

Robert Vogel

Jul 20, 2012, 4:17:03 AM
to flying-sa...@googlegroups.com
Hi Pete,

thanks for your advice! I think you are right, it would be better to use an existing HTML element instead of creating my own. And now that I've recently learned that only inline-block or block elements can be replaced (Yay, I finally read the manual :) ) I see a good chance that it might work.

Maybe I can ask you for another hint?

As far as I understand the API, I have to create a class that implements the ReplacedElementFactory interface. In its methods I can implement my own logic and return objects derived from ReplacedElement. I then set my ReplacedElementFactory via <ITextRenderer>.getSharedContext().setReplacedElementFactory(). But there can only be one ReplacedElementFactory, can't there? So either my ReplacedElementFactory extends ITextReplacedElementFactory, or I use a pattern like the ChainedReplacedElementFactory suggested in the R8 user guide (http://flyingsaucerproject.github.com/flyingsaucer/r8/guide/users-guide-R8.html#xil_18) and the SVG demo. Otherwise I'd lose the functionality of the original ITextReplacedElementFactory.
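
A minimal sketch of the chaining idea, assuming nothing about the real API: ElementFactory below is a stand-in reduced to a single method (the real ReplacedElementFactory has a larger, different signature), and ChainedElementFactory is a hypothetical name. It only illustrates how several factories can sit behind the single slot that setReplacedElementFactory() offers.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for ReplacedElementFactory, reduced to one method for illustration only.
interface ElementFactory {
    /** Returns a replacement for the tag name, or null if this factory does not handle it. */
    String create(String tagName);
}

// Hypothetical chain: asks each registered factory in turn.
final class ChainedElementFactory implements ElementFactory {
    private final List<ElementFactory> delegates = new ArrayList<>();

    ChainedElementFactory add(ElementFactory factory) {
        delegates.add(factory);
        return this;
    }

    @Override
    public String create(String tagName) {
        for (ElementFactory factory : delegates) {
            String result = factory.create(tagName);
            if (result != null) {
                return result; // the first factory that answers wins
            }
        }
        return null; // nobody handled it: the element is laid out normally
    }
}
```

With this shape, a custom factory can be registered ahead of the stock one, so the stock behaviour stays available for everything the custom factory declines.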

Can you also give me a hint where to find the general replacement of anchor tags in the library? I already browsed through the code but got a little confused :)

Thanks in advance!

Greetings,
Robert

Peter Brant

Jul 21, 2012, 8:07:59 AM
to flying-sa...@googlegroups.com
Yeah, links and anchors are handled differently.  getLinkUri in the user agent is consulted to see if the resulting box should be treated as a link.  For PDFs, it's then processed by processLink() in ITextOutputDevice.  I think the data-* attribute idea is unobtrusive enough and the functionality useful enough that we could put the implementation directly into the project if you want.  Maybe we should namespace the attribute to avoid collisions (data-fs-embed-file="true")?
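
To make the attribute idea concrete, here is a minimal sketch that scans an XHTML fragment for anchors carrying data-fs-embed-file="true". It uses only the JDK's DOM parser; EmbedLinkScanner and findEmbedHrefs are made-up names for illustration and are not part of Flying Saucer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Flying Saucer.
final class EmbedLinkScanner {
    static final String EMBED_ATTR = "data-fs-embed-file";

    /** Returns the href of every <a> whose data-fs-embed-file attribute is "true". */
    static List<String> findEmbedHrefs(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            // DOM getAttribute returns "" (not null) when the attribute is absent.
            if ("true".equalsIgnoreCase(a.getAttribute(EMBED_ATTR))) {
                hrefs.add(a.getAttribute("href"));
            }
        }
        return hrefs;
    }
}
```

Ordinary links without the attribute are skipped, so existing documents would keep their current behaviour.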

Pete

Robert Vogel

Jul 23, 2012, 4:07:50 AM
to flying-sa...@googlegroups.com
Hi Pete,

okay thanks! That should get me started. I think something like data-fs-embed-file would be perfect...

Greetings, Robert

Robert Vogel

Jul 24, 2012, 5:46:28 AM
to flying-sa...@googlegroups.com
Hi,

I just wrote a little test program. I extended the ITextReplacedElementFactory class and reimplemented the createReplacedElement method. But it seems to me that the method gets called twice for each element. I'm a little confused.

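import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextOutputDevice;
import org.xhtmlrenderer.pdf.ITextReplacedElementFactory;
import org.xhtmlrenderer.render.BlockBox;

public class FileEmbeddingITextReplacedElementFactory extends ITextReplacedElementFactory {
    public FileEmbeddingITextReplacedElementFactory(ITextOutputDevice outputDevice) {
        super(outputDevice);
    }

    @Override
    public ReplacedElement createReplacedElement(LayoutContext lc, BlockBox bb, UserAgentCallback uac, int cssWidth, int cssHeight) {
        Element e = bb.getElement();
        if (e == null) {
            return null;
        }

        // Debug output: dump the relevant attributes of every anchor that reaches the factory.
        String nodeName = e.getNodeName();
        if (nodeName.equals("a")) {
            System.out.println("data-fs-embed-file");
            System.out.println(e.getAttribute("data-fs-embed-file"));
            System.out.println(e.getAttribute("href"));
        }

        // Delegate to the stock iText factory, passing width and height through.
        return super.createReplacedElement(lc, bb, uac, cssWidth, cssHeight);
    }
}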

But following your hint I now think it might be better to extend ITextOutputDevice and override the processLink method. In that case, though, I would also have to create my own ITextRenderer that sets the correct OutputDevice. Maybe I should just get used to git, fork the xhtmlrenderer repo and modify the code directly instead of trying to extend it... :)

greetings,
Robert

Peter Brant

Jul 26, 2012, 11:12:39 AM
to flying-sa...@googlegroups.com
I'm not sure why that would be, but I think you would be better off hacking on processLink directly.  Links are usually inline content and replaced elements can only be blocks.

Pete

Jens Rutschmann

Aug 10, 2012, 11:11:44 AM
to flying-sa...@googlegroups.com
Hi Robert,

I was wondering what the current state of your patch looks like.

Actually I'm also interested in this functionality as a customer asked for it. Do you need any help in merging the
change into a git branch and sending a pull request? I can also help you with coding, but I'm not yet familiar with the
iText functionality required to add attachments.

We could also do this off-list and in German if you prefer.

Cheers,
Jens

Robert Vogel

Nov 20, 2012, 12:07:28 PM
to flying-sa...@googlegroups.com
Hi everybody!

At last I've found some time to work on this topic. I've committed a little patch to my fork repo on github (https://github.com/osnard/flyingsaucer/commit/e4b9966a55e7d88520e68203f76e52ed61aca4c9).

These are my changes in org.xhtmlrenderer.pdf.ITextOutputDevice:

    [...]
    private void processLink(RenderingContext c, Box box) {
        Element elem = box.getElement();
        if (elem != null) {
            NamespaceHandler handler = _sharedContext.getNamespaceHandler();
            String uri = handler.getLinkUri(elem);
            if (uri == null) {
                return;
            }

            String doEmbedFile = handler.getAttributeValue( elem, "data-fs-embed-file" );
            if ("true".equalsIgnoreCase(doEmbedFile)) { // null-safe, case-insensitive check
                String fileName = new File( uri ).getName();
                try {
                    com.lowagie.text.Rectangle targetArea = checkLinkArea(c, box);
                    if (targetArea == null) {
                        return;
                    }

                    byte[] fileBytes = _sharedContext.getUac().getBinaryResource(uri);
                    PdfFileSpecification fs;
                    fs = PdfFileSpecification.fileEmbedded(
                        _writer,
                        null,
                        fileName,
                        fileBytes
                    );
                   
                    String titleAttribute = handler.getAttributeValue( elem, "title" );
                    fs.addDescription(
                        titleAttribute,
                        true
                    );

                    targetArea.setBorder(0);
                    targetArea.setBorderWidth(0);
                   
                    PdfAnnotation annot = PdfAnnotation.createFileAttachment(
                        _writer,
                        targetArea,
                        titleAttribute,
                        fs
                    );
                    annot.setBorderStyle(new PdfBorderDictionary(0.0f, 0));
                    annot.setBorder(new PdfBorderArray(0.0f, 0.0f, 0));
                    _writer.addAnnotation(annot);
                } catch (IOException ex) {
                    XRLog.render(Level.INFO, "Could not embed file " + fileName );
                }
            } else if (uri.length() > 1 && uri.charAt(0) == '#') { //internal jumplink
                String anchor = uri.substring(1);
                [...]


As you can see, I use the UserAgentCallback in the SharedContext to retrieve the binary data of the file. Afterwards I create a FileSpecification and a FileAttachment annotation. In a first approach I tried to attach the file at document level using the _writer.addFileAttachment(...) interface and link to it with a PdfAction.gotoEmbedded(...) action, but this didn't work. I think it might have been because I was attaching Excel files ("PDF supports 'embedded go-to actions' for PDFs within PDFs. If your embedded file isn't a PDF, that won't work... and it doesn't look like iText supports them at a higher level.").

There is still one big problem: the link now gets covered by the default "PushPin" icon for file attachments. Does anybody know how to get rid of it?
And I'm not sure what happens if several links point to the same file. I guess it will be attached multiple times.

Btw.: I'd love to have some feedback on the code in general. If you like it I can make a pull request (after fixing the mentioned problems of course).

best regards,
Robert



Robert Vogel

Nov 26, 2012, 8:06:28 AM
to flying-sa...@googlegroups.com
Hi, it's me again :)

I've tried various ways now, unfortunately without success so far. After failing to create a FileAttachment annotation without a PushPin (or similar) icon, I tried to create a "Launch" PdfAction.

The following ways didn't work for me (for context please see last post):

1)
PdfAction action = PdfAction.createLaunch(fileName, null, null, null);
PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(), targetArea.getRight(), targetArea.getTop(), action);
annot.put(PdfName.SUBTYPE, PdfName.LINK);


2)
PdfAction action = new PdfAction();
action.put( PdfName.S, PdfName.LAUNCH );
action.put( PdfName.F, fs ); //Also tried fs.getReference() instead of just fs as second parameter
PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(), targetArea.getRight(), targetArea.getTop(), action);
annot.put(PdfName.SUBTYPE, PdfName.LINK);


I've attached an example outline made with iText RUPS. The PDF content seems correct to me. The annotation gets rendered as a clickable link, but when clicked nothing happens. I think it might be a problem with the Launch action. Maybe the reader does not know which application to use with the file.

There are Windows/Unix/Mac specific parameters described in http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf on page 521. Maybe I have to add some of these.

Does anybody have experience with this? Any hints? Am I heading in the right direction?

greetings,
Robert
iText RUPS_2012-11-26_13-45-23.png

Robert Vogel

Nov 26, 2012, 8:29:40 AM
to flying-sa...@googlegroups.com
Yes, me again :)

I've found the following two sources:

The suggested method here is to make a JavaScript action:

PdfAction action = PdfAction.javaScript("this.exportDataObject({cName:\""+fileName+"\", nLaunch:2});", _writer);

I'm not quite sure if the JS call has to be on this or on doc. But I'll figure it out :).
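
One caveat with the string concatenation above: a file name containing a double quote or a backslash would break the generated script. A minimal sketch of a helper that escapes the name first; ExportScript and build are hypothetical names, and only the exportDataObject call itself is taken from the snippet above.

```java
// Hypothetical helper; only the exportDataObject call comes from the thread.
final class ExportScript {
    /** Builds the JavaScript that asks the viewer to open an attachment by name. */
    static String build(String fileName) {
        StringBuilder sb = new StringBuilder("this.exportDataObject({cName:\"");
        for (char ch : fileName.toCharArray()) {
            // Escape characters that would end or corrupt the JS string literal.
            if (ch == '"' || ch == '\\') {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.append("\", nLaunch:2});").toString();
    }
}
```

The resulting string would then be handed to PdfAction.javaScript(script, _writer) in place of the inline concatenation.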

--
Robert

Robert Vogel

Nov 26, 2012, 8:51:07 AM
to flying-sa...@googlegroups.com
OMG it works :D

I'm going to clean up the code, make some tweaks and commit it to the github.com fork repo on https://github.com/osnard/flyingsaucer

Please give me some feedback :)

--
Robert

Jens Rutschmann

Nov 26, 2012, 8:53:51 AM
to flying-sa...@googlegroups.com
Hi Robert,

great news!

I tried your previous patches last Friday and did some other tests.
I'll post some feedback on that (and your latest changes) later.

Cheers,
Jens

Robert Vogel

Nov 26, 2012, 9:35:17 AM
to flying-sa...@googlegroups.com
Hi Jens!

Yes I'm very happy too. I've put my recent work on github:

https://github.com/osnard/flyingsaucer/commit/efc975bc323da615d966e87199e9dfee48b6677e

The changes are few but the diff view shows a lot of lines because the indenting changed.

I will have to optimize the code, for example by using the full file URI as the identifier in the _attachments array, and maybe even exposing the _attachments field with a getter/setter. It would also be great to use the value of the anchor tag's title attribute as the description for the FileSpecification. And so on...
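
A rough sketch of what keying the bookkeeping by full URI could look like. This is not the code from the commit: AttachmentRegistry, register and nameFor are hypothetical names, and the "1_" prefix for clashing file names is just one possible convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the _attachments bookkeeping; not the code from the commit.
final class AttachmentRegistry {
    // Full URI -> name under which the file is stored in the PDF.
    private final Map<String, String> nameByUri = new LinkedHashMap<>();

    /** Returns true if this URI is new and the caller still has to embed the file. */
    boolean register(String uri, String fileName) {
        if (nameByUri.containsKey(uri)) {
            return false; // already embedded; only another link is needed
        }
        String name = fileName;
        int n = 1;
        // Two different URIs may share a file name; keep the embedded names unique.
        while (nameByUri.containsValue(name)) {
            name = n++ + "_" + fileName;
        }
        nameByUri.put(uri, name);
        return true;
    }

    /** Name to use in the link action for this URI, or null if it was never registered. */
    String nameFor(String uri) {
        return nameByUri.get(uri);
    }
}
```

Keyed this way, several links to the same URI share one embedded file, while two different files that happen to share a name do not overwrite each other.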

Thanks in advance for any feedback!

--
Robert

Jens Rutschmann

Nov 28, 2012, 9:00:09 AM
to flying-sa...@googlegroups.com
Hi,

I had a look at your changes today and must say they work pretty well. I also experimented with the different kinds of
annotations last week. The FILEEMBEDDED annotations always had the drawback that the files were listed twice in Adobe
Reader, even if the annotation and the document-level attachment pointed to the same file spec. So I like the solution
with the PdfAction a lot (even if it only seems to work in Adobe products).

I added some remarks inline here: https://github.com/osnard/flyingsaucer/commit/efc975bc323da615d966e87199e9dfee48b6677e

Adding support for 'data-description' or title and maybe 'data-display-filename' attributes would also be a nice thing
in my opinion.

Jens


Robert Vogel

Dec 5, 2012, 3:01:44 AM
to flying-sa...@googlegroups.com
Hi!

Thanks for your comments! It's a pity that the JavaScript Annotation solution works only in Adobe products. Maybe a FileAttachment Annotation with a PushPin icon would be better. But in this case I'd have to find a way to have multiple Annotations point to the same FileSpecification/FileAttachment. I also worry about the position of the PushPin within the text.

Btw.: I've found that using new File(uri).getName(); should work for different protocols. I'm using JRE 1.7:

new File("http://www.domain.tld/some/File.ext").getName(); // "File.ext"
new File("ftp://www.domain.tld/some/File.ext").getName(); // "File.ext"
new File("gopher://www.domain.tld/some/File.ext").getName(); // "File.ext"
new File("//www.domain.tld/some/File.ext").getName(); // "File.ext"
new File("/some/File.ext").getName(); // "File.ext"
new File("../some/File.ext").getName(); // "File.ext"
new File("some/File.ext").getName(); // "File.ext"
new File("file://some/File.ext").getName(); // "File.ext"
new File("C:\\some\\File.ext").getName(); // "File.ext" on Windows only; elsewhere the backslash is not a path separator
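
Note that new File(...) treats the string as a plain path, so it keeps any query string or fragment and depends on the platform's separator. A hedged alternative sketch using java.net.URI; UriFileNames and fileNameOf are hypothetical names, and the helper falls back to the File approach for strings that are not valid URIs (such as Windows paths with backslashes).

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for deriving an attachment name from a link target.
final class UriFileNames {
    static String fileNameOf(String uri) {
        try {
            // getPath() drops ?query and #fragment and decodes %XX escapes.
            String path = new URI(uri).getPath();
            if (path != null && !path.isEmpty()) {
                return path.substring(path.lastIndexOf('/') + 1);
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. a Windows path); fall through to File.
        }
        return new File(uri).getName();
    }
}
```

For a target such as http://www.domain.tld/some/File.ext?version=2 this yields File.ext, whereas the File-based variant would keep the query string in the name.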

Jens Rutschmann

Dec 5, 2012, 3:58:14 AM
to flying-sa...@googlegroups.com
Hi Robert,

you can try something like the code below. The file is embedded only once but available using a pushpin annotation and
on document level. This also works at least in my KDE PDF viewer.

Unfortunately it has another drawback: Adobe Reader lists such embedded files twice in its attachments tab.

As long as we don't find a way to make Adobe Reader 'merge' its entries I think the PdfAction approach is better, given
that Adobe Reader is probably much more widespread.

Cheers,
Jens


public static void main(String[] args) throws DocumentException, IOException {
    String filename = "/home/jens/tmpfs/test.pdf";
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
    document.open();

    String filenameInPdf = "test.txt";
    // Passing null as the last argument makes iText read the bytes from the given path.
    PdfFileSpecification spec = PdfFileSpecification.fileEmbedded(writer,
            "/home/jens/test.txt", filenameInPdf, null);

    // Attach the file on document-level.
    writer.addFileAttachment("Attached to chapter \"Process Description\"", spec);

    document.add(new Paragraph("Attached files:"));

    // Add a "link" to the attachment on the current page.
    // The annotation reuses the same file spec, so the file is stored only once.
    PdfAnnotation annotation = new PdfAnnotation(writer, null);
    annotation.put(PdfName.SUBTYPE, PdfName.FILEATTACHMENT);
    annotation.put(PdfName.CONTENTS, new PdfString(filenameInPdf, PdfObject.TEXT_UNICODE));
    annotation.put(PdfName.FS, spec.getReference());

    // Two non-breaking spaces give the pushpin annotation a chunk to attach to.
    Chunk linkChunk = new Chunk("\u00a0\u00a0");
    linkChunk.setAnnotation(annotation);
    Phrase phrase = new Phrase(filenameInPdf);
    phrase.add(linkChunk);
    document.add(new Paragraph(phrase));

    document.close();
}