How to test/check contents of pdf files with robotframework?


mr

Feb 13, 2014, 12:11:32 PM
to robotframe...@googlegroups.com
Hi everybody!

I have to test whether some PDF files contain certain strings.
What is the best strategy to do it with the help of robot framework?

Regards!

Kevin O.

Feb 13, 2014, 1:08:23 PM
to robotframe...@googlegroups.com
After evaluating the PDF tools native to Python, we decided to go with an external tool, pdftotext, a component of Xpdf.
Attached is the library, which we have barely used so far. It is a thin wrapper around the tool.
As is, the library will only work on Windows with pdftotext.exe placed in a folder called xpdf that is in the same folder as the library. We put pdftotext.exe in SCM and do not need cross-platform support yet, but it would not be too hard to fix that.
I hope it is helpful.
I am curious what others have done.

Kevin
PdfLibraryPOC.py
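
The attachment itself is not reproduced here. As a rough idea of what such a thin wrapper could look like, here is a minimal sketch, not Kevin's actual code. It assumes pdftotext is available on the PATH (instead of in an xpdf folder next to the library), which would also make it work outside Windows. The function names are my own: the first keyword name is chosen to match the one used later in this thread, while `build_pdftotext_command` and `text_should_contain` are hypothetical helpers.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, executable="pdftotext", encoding=None):
    """Build the argument list for one pdftotext call.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    command = [executable]
    if encoding:
        # -enc selects the output text encoding of pdftotext.
        command += ["-enc", encoding]
    command += [pdf_path, "-"]
    return command


def convert_pdf_to_text_using_pdftotext(pdf_path, executable="pdftotext",
                                        encoding=None):
    """Run pdftotext on pdf_path and return the extracted text as bytes."""
    resolved = shutil.which(executable)
    if resolved is None:
        raise RuntimeError("pdftotext executable not found: %s" % executable)
    completed = subprocess.run(
        build_pdftotext_command(pdf_path, resolved, encoding),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return completed.stdout


def text_should_contain(text, expected):
    """Fail with a readable message if expected is not part of text."""
    if expected not in text:
        raise AssertionError("Text does not contain '%s'" % expected)
```

Saved as a module and imported with the Library setting, the public functions should show up in Robot Framework as keywords such as Convert Pdf To Text Using Pdftotext and Text Should Contain. The conversion keyword returns bytes, so the result still has to be decoded before comparing it with a string.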

David

Feb 13, 2014, 6:44:52 PM
to robotframe...@googlegroups.com
If one is not restricted to Python only, then there are more options.

On the Java side, one could look into Apache PDFBox or iText. I don't think anyone has written RF wrapper libraries for them yet, but it should be straightforward to do, following the RF documentation and the example docs for the Java PDF libraries. These could be run over Jython, switching to a Java remote server if the Jython route is a no-go.

I've used Apache PDFBox to extract text from a PDF for test verification. But we don't use RF at my organization, so I just had it as a regular Java wrapper class library that wasn't built with RF in mind.

FYI, there's a list of good PDF tools that one can evaluate for this job, as well as for integration with RF. I haven't checked which are Python and which are not.

mr

Feb 27, 2014, 4:41:35 AM
to robotframe...@googlegroups.com
@Kevin: Thanks for the library. It works well.

A hint: I had some encoding/decoding problems that could not easily be solved with the "-enc" parameter of pdftotext. Therefore I used the String library to decode the output:
${content}=    Convert Pdf To Text Using Pdftotext    somePdfFile.pdf
${string}=    Decode Bytes To String    ${content}    ISO-8859-1
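
For anyone who wants to do the same step inside a Python library instead, decoding comes down to `bytes.decode`. A small sketch, assuming the tool's output really is ISO-8859-1 as in the case above; `decode_pdf_bytes` is a hypothetical helper name and the sample bytes are made up for illustration.

```python
def decode_pdf_bytes(raw, codec="ISO-8859-1"):
    """Turn the raw bytes returned by the converter into a text string."""
    return raw.decode(codec)


# Made-up sample: Latin-1 encoded text with umlauts, standing in for
# the bytes that a pdftotext call might return.
raw = "Prüfbericht für Müller".encode("ISO-8859-1")

text = decode_pdf_bytes(raw)

# Decoding the same bytes as UTF-8 fails, because the single byte used
# for "ü" in Latin-1 is not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

If the wrong codec is assumed, the decode either raises an error like this or silently produces wrong characters, so it is worth checking one known string from the PDF first.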

David

Apr 20, 2014, 4:05:48 PM
to robotframe...@googlegroups.com
FYI, I recently came across this, which could be an option as well:


It is a Java-based library, so it has to be run with Jython or a Java remote server. It uses the iText library.