Convert documents (PDF too) to PDF/A with a web service

JB

Nov 4, 2011, 5:31:00 AM
to JODConverter
Hi

See the thread “2.2.2 vs 3.0b4 in multi-thread stress test” for the web
service issue.
So I’ve got my web service doing ANY2PDF.

1/ Convert to PDF/A.
The only solution I found is to modify the core.
In DefaultDocumentFormatRegistry, I added this to set the FilterData
options:

LinkedHashMap<String,Object> aFilterMap = new LinkedHashMap<String,Object>();
aFilterMap.put("FilterName", "writer_pdf_Export");
aFilterMap.put("FilterData",
    Collections.singletonMap("SelectPdfVersion", Integer.valueOf(1)));
pdf.setStoreProperties(DocumentFamily.TEXT, aFilterMap);

The same goes for "calc_pdf_Export", "impress_pdf_Export" and "draw_pdf_Export".
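
To avoid repeating that block four times, the maps can be built by a small helper. This is only a sketch: PdfAStoreProperties and its methods are hypothetical names, not part of JODConverter, and the family names are plain strings standing in for the DocumentFamily constants so that the example runs with nothing but the JDK. The value 1 for SelectPdfVersion is the one that requests PDF/A-1 from the export filter.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JODConverter: builds the store
// properties once per export filter instead of copying the block.
final class PdfAStoreProperties {

    // SelectPdfVersion = 1 asks the PDF export filter for PDF/A-1.
    static final int PDF_A_1 = 1;

    // Builds the FilterName/FilterData pair for one export filter.
    static Map<String, Object> forFilter(String filterName) {
        Map<String, Object> props = new LinkedHashMap<String, Object>();
        props.put("FilterName", filterName);
        props.put("FilterData",
                Collections.singletonMap("SelectPdfVersion", Integer.valueOf(PDF_A_1)));
        return props;
    }

    // Keys are plain strings here; they mirror the DocumentFamily constants.
    static Map<String, Map<String, Object>> byFamily() {
        Map<String, Map<String, Object>> all =
                new LinkedHashMap<String, Map<String, Object>>();
        all.put("TEXT", forFilter("writer_pdf_Export"));
        all.put("SPREADSHEET", forFilter("calc_pdf_Export"));
        all.put("PRESENTATION", forFilter("impress_pdf_Export"));
        all.put("DRAWING", forFilter("draw_pdf_Export"));
        return all;
    }
}
```

Each map returned by forFilter(...) can then be handed to pdf.setStoreProperties(...) for the matching family, exactly like aFilterMap above.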

With the DumpJsonDefaultDocumentFormatRegistry utility, I checked the
result:
[
  {
    "extension": "pdf",
    "mediaType": "application/pdf",
    "name": "Portable Document Format",
    "storePropertiesByFamily": {
      "DRAWING": {
        "FilterData": {"SelectPdfVersion": 1},
        "FilterName": "draw_pdf_Export"
  ...
]

Everything’s fine: my WS converts to PDF/A (checked with
http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx)

The DocumentFormat.xml file of 2.2.2 has disappeared (and I’m not sure
the FilterData worked there anyway, see the XmlDocumentFormatRegistry
utility), so this is the way I found to add the FilterData. A config
file might be better, but I saw nothing for the web service context (the
JSON file is only for the command line).
http://shervinasgari.blogspot.com/search/label/pdfa
explains much the same thing, but I found it very complex.

2/ Convert PDF to PDF/A
Add the oracle-pdfimport .oxt extension to your OO and the WS can convert
PDF to PDF/A.

A CONFIG FILE IN THE JODCONVERTER CORE WOULD BE A GOOD ITEM FOR THE
ROADMAP: TO ADD FILTER DATA, TO SET UP MULTIPLE INSTANCES, TO CONFIGURE
THE OFFICEMANAGER…

Bye.

Fran Díaz

Nov 8, 2011, 7:56:17 AM
to JODConverter
Hi Julien, thank you for your response. I replied only to you by
mistake. Now I'll share what I tried.

Following what you explained in your post (but using the JODConverter API
directly from within my application instead of the web service), this
is what I do:

I extend DefaultDocumentFormatRegistry:

public MyDocumentFormatRegistry() {
    DocumentFormat pdf = new DocumentFormat("Portable Document Format",
        "pdf", "application/pdf");

    LinkedHashMap<String,Object> aFilterMap = new LinkedHashMap<String,Object>();
    aFilterMap.put("FilterName", "writer_pdf_Export");
    aFilterMap.put("FilterData",
        Collections.singletonMap("SelectPdfVersion", Integer.valueOf(1)));
    pdf.setStoreProperties(DocumentFamily.TEXT, aFilterMap);

    aFilterMap = new LinkedHashMap<String,Object>();
    aFilterMap.put("FilterName", "calc_pdf_Export");
    aFilterMap.put("FilterData",
        Collections.singletonMap("SelectPdfVersion", Integer.valueOf(1)));
    pdf.setStoreProperties(DocumentFamily.SPREADSHEET, aFilterMap);

    aFilterMap = new LinkedHashMap<String,Object>();
    aFilterMap.put("FilterName", "impress_pdf_Export");
    aFilterMap.put("FilterData",
        Collections.singletonMap("SelectPdfVersion", Integer.valueOf(1)));
    pdf.setStoreProperties(DocumentFamily.PRESENTATION, aFilterMap);

    aFilterMap = new LinkedHashMap<String,Object>();
    aFilterMap.put("FilterName", "draw_pdf_Export");
    aFilterMap.put("FilterData",
        Collections.singletonMap("SelectPdfVersion", Integer.valueOf(1)));
    pdf.setStoreProperties(DocumentFamily.DRAWING, aFilterMap);

    addFormat(pdf);
}

And then I call the converter this way:

converter = new OfficeDocumentConverter(officeManager,
    new MyDocumentFormatRegistry());
converter.convert(inputFile, outputFile);

The issue is that I'm getting an unreadable PDF file, as if
JODConverter were reading the input file as an ASCII file and dumping
every character to the output PDF, instead of reading the input as a
PDF file.

Can you see what I'm doing wrong?

As additional information for other readers, I also tried what I wrote
in the following post:

http://groups.google.com/group/jodconverter/browse_thread/thread/9412400df0cf059a


Regards and thanks in advance.

JB

Nov 9, 2011, 6:19:26 AM
to JODConverter
In your case addFormat(pdf) would need to be a replaceFormat(pdf):
DefaultDocumentFormatRegistry has already called addFormat(pdf) in its
constructor, so yours adds a second pdf entry to the list (see
DefaultDocumentFormatRegistry).

But there’s no accessor to the list in DefaultDocumentFormatRegistry,
so replacing the pdf entry isn’t possible.

So I’m afraid you have to either modify the core (the
DefaultDocumentFormatRegistry class), like I did, to add the FilterData,
or copy the whole DefaultDocumentFormatRegistry into your own Java class.
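
The duplicate-entry effect can be reproduced with a tiny stand-in. This is not JODConverter's code: FirstMatchRegistry, Format and countFor are hypothetical names, and the sketch assumes what the explanation above implies, namely that formats are kept in a list, that addFormat only appends, and that lookup by extension returns the first match.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is NOT JODConverter's registry.
// Assumption: formats live in a list and lookup returns the first match.
final class FirstMatchRegistry {

    static final class Format {
        final String extension;
        final String label;

        Format(String extension, String label) {
            this.extension = extension;
            this.label = label;
        }
    }

    private final List<Format> formats = new ArrayList<Format>();

    // Appends to the list; an existing entry with the same extension stays.
    void addFormat(Format format) {
        formats.add(format);
    }

    // Returns the first entry with a matching extension, or null if none.
    Format getFormatByExtension(String extension) {
        for (Format format : formats) {
            if (format.extension.equals(extension)) {
                return format;
            }
        }
        return null;
    }

    // Counts how many entries share the given extension.
    int countFor(String extension) {
        int count = 0;
        for (Format format : formats) {
            if (format.extension.equals(extension)) {
                count++;
            }
        }
        return count;
    }
}
```

Under that assumption, adding a "pdf" entry in a subclass constructor after the parent has already added one leaves two "pdf" entries, and the lookup still hands back the parent's entry without the FilterData.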

Also verify that you have installed the PDF import extension, which is
needed to read PDF:
http://extensions.services.openoffice.org/project/pdfimport

At this point you have Java problems rather than JODConverter problems.