Issue 168 in xdocreport: PDF Converter loses formatting


xdocr...@googlecode.com

Oct 17, 2012, 9:27:12 AM
to xdocr...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 168 by Mr.M.McM...@gmail.com: PDF Converter loses formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

The problem is described below, but firstly can I request that you:

1. Provide a list of the PDF converter improvements that you mention
in Issue 159 ("I suggest you to use 1.0.0-SNAPSHOT (use maven for
that or download at hand on the for docx converter because teh
converter is very very lot improved (I'm improving again). The
docx converter 0.9.8 is very bad.").

2. Indicate when 1.0.0-SNAPSHOT will be released - presumably as
version 1.0.0.

This would be very useful info for me in deciding whether to use xdocreport
going forwards.

What steps will reproduce the problem?

1. Run FormattingTests.docx through the PDF converter code (e.g. see
attached modified Java JUnit test and associated docx file).

2. Observe the output in the PDF conversion (see attached pdf file).

What is the expected output?

. It is expected that the pdf formatting matches the docx exactly. The
following is an analysis of the differences. Note that in addition to
these, header and footer formatting did not work at all well.

. Tables:
. Row height of less than 1cm is converted to 1cm.
. A table which is not of full page width will be centred in
the page.
. Coloured table borders are converted to black.
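
To check the row-height item with numbers rather than by eye, the height stored in the docx can be converted to the units used in the PDF. This is a minimal sketch, assuming the height is read from the row's w:trHeight value, which OOXML stores in twentieths of a point. The class and method names are hypothetical and are not part of XDocReport.

```java
/** Hypothetical unit helpers for comparing docx row heights with PDF output. */
final class RowHeightUnits {
    // OOXML row heights are in twentieths of a point (twips); PDF uses points.
    static final double TWIPS_PER_POINT = 20.0;
    static final double POINTS_PER_INCH = 72.0;
    static final double CM_PER_INCH = 2.54;

    /** Converts a w:trHeight value (twips) to PDF points. */
    static double twipsToPoints(long twips) {
        return twips / TWIPS_PER_POINT;
    }

    /** Converts PDF points to centimetres. */
    static double pointsToCm(double points) {
        return points / POINTS_PER_INCH * CM_PER_INCH;
    }

    /** Converts a w:trHeight value (twips) directly to centimetres. */
    static double twipsToCm(long twips) {
        return pointsToCm(twipsToPoints(twips));
    }
}
```

For example, a row height of 283 twips is about 0.5cm, so a row of that height measuring 1cm in the PDF would confirm the behaviour described above.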

. Free text:
. The number of characters per line appears to have increased
between docx and pdf. The font size produced in the PDF appears
to be slightly larger than the source. This is difficult to
determine and requires further analysis to confirm what the
exact nature of the difference is.

. Font/style:
. The PDF and DOCX rendering of the different fonts and sizes
differs slightly (as mentioned in the free text section).
. Header 3 styling appears to be too small.
. Strikethrough appears as normal text.
. Subscript appears as normal text.
. Superscript appears as normal text.
. Highlighting is lost.

. Bullets:
. Microsoft Word bullets are lost
. Microsoft Word numbering is lost
. Microsoft Word multilevel lists lose numbering and indentation
beyond the first item.

Note however that all of these bullet representations can be
reproduced as non-Microsoft bullets using normal text and will
survive the pdf translation (see example in attached files).

. Tabs:
. Tabs within text are lost.


. Images:
. Text alongside an image results in both the text and the image
being misplaced.

What do you see instead?

. See above.

What version of the product are you using?

. XDocReport 0.9.8

On what operating system?

. Windows XP.

Please provide any additional information below.



Attachments:
DocxProjectWithVelocity2PDF.java 9.9 KB
FormattingTests.docx 22.3 KB
FormattingTests.pdf 13.6 KB

xdocr...@googlecode.com

Oct 18, 2012, 5:39:57 AM
to xdocr...@googlegroups.com
Updates:
Status: Accepted
Owner: angelo.z...@gmail.com

Comment #1 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi,

First of all, many thanks for your docx. I have added it to our JUnit
docx->pdf converter tests:
http://code.google.com/p/xdocreport/source/browse/#git%2Fthirdparties-extension%2Forg.apache.poi.xwpf.converter.pdf%2Fsrc%2Ftest%2Fresources%2Forg%2Fapache%2Fpoi%2Fxwpf%2Fconverter%2Fcore

To answer your two questions:

> 1. Provide a list of the PDF converter improvements that you mention

See the issues in http://code.google.com/p/xdocreport/issues/list whose
titles start with docx->converter.

See also the link above, where you can find the docx files which we use to
test our converter. Some problems remain, but the results are starting to
be good.

> 2. Indicate when 1.0.0-SNAPSHOT will be released - presumable as version
> 1.0.0.

I don't know. I would like to finish improving the docx->pdf converter so
that it handles the common cases (for instance, shapes will not be supported
in 1.0.0), but developing a docx->pdf converter is a very hard task for me
(I'm not an expert with iText or with docx). We develop XDocReport in our
spare time, so I cannot tell you when it will be released (I hope we will be
able to do this release within one month, but I cannot promise that).

If you want to test the 1.0.0 docx->pdf converter, you can:

1) test it with our live demo at
http://xdocreport-converter.opensagres.cloudbees.net/
2) get the sources from Git and build it yourself.
3) use Maven to get the docx->pdf converter from Maven Central with version
1.0.0-SNAPSHOT.


> . Tables:
> . Row height of less than 1cm is converted to 1cm.
fixed. Try it with the live demo.
> . A table which is not of full page width will be centred in
> the page.
fixed. Try it with the live demo.
> . Coloured table borders are converted to black.
fixed. Try it with the live demo.

Tables are much improved, but now there is a problem with inner borders,
which are drawn twice. I must handle that by implementing the algorithm for
conflicting adjacent borders.
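
To illustrate what resolving conflicting adjacent borders means: when two neighbouring cells each define a border on their shared edge, only one of the two should be drawn. Below is a minimal sketch under simplified, assumed rules (a missing border loses, then the wider border wins, then the darker colour wins). It is neither the full rule set of the OOXML specification nor the XDocReport implementation, and all names are hypothetical.

```java
/** Hypothetical sketch: choose one border for an edge shared by two cells. */
final class BorderConflict {

    /** Minimal border description; a width of 0 means "no border". */
    static final class Border {
        final double widthPt;
        final int rgb;

        Border(double widthPt, int rgb) {
            this.widthPt = widthPt;
            this.rgb = rgb;
        }

        boolean isVisible() {
            return widthPt > 0;
        }

        /** Sum of the red, green and blue components; lower means darker. */
        int brightness() {
            return ((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF);
        }
    }

    /** Returns the border to draw on the shared edge, or null if there is none. */
    static Border resolve(Border a, Border b) {
        if (a == null || !a.isVisible()) {
            return (b != null && b.isVisible()) ? b : null;
        }
        if (b == null || !b.isVisible()) {
            return a;
        }
        if (a.widthPt != b.widthPt) {
            return a.widthPt > b.widthPt ? a : b; // assumed rule: wider wins
        }
        return a.brightness() <= b.brightness() ? a : b; // assumed rule: darker wins
    }
}
```

With this, the converter would draw resolve(rightBorderOfLeftCell, leftBorderOfRightCell) once, instead of drawing both borders next to each other.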

> . Free text:
> . The number of characters per line appears to have increased
> between docx and pdf. The font size produced in the PDF appears
> to be slightly larger than the source. This is difficult to
> determine and requires further analysis to confirm what the
> exact nature of the difference is.
It's fixed. The problem was that the default font (Calibri) was not
retrieved. It works in my local JUnit test, but not in the live demo. I
don't know why.
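
The idea of retrieving the default font can be sketched as a lookup chain: use the font set on the run itself, else the font of its style, else the document default, and only then a built-in fallback. This is a minimal sketch of that idea, not the converter's actual code. The class, method and parameter names are hypothetical, and how each value is read from the docx is left out.

```java
import java.util.Objects;
import java.util.stream.Stream;

/** Hypothetical sketch of a font family lookup chain with a last-resort fallback. */
final class FontResolution {

    /**
     * Returns the first non-blank family among the run font, the style font and
     * the document default font, in that order; returns builtIn if none is set.
     */
    static String resolveFamily(String runFont, String styleFont,
                                String docDefaultFont, String builtIn) {
        return Stream.of(runFont, styleFont, docDefaultFont)
                .filter(Objects::nonNull)               // skip levels that set no font
                .filter(name -> !name.trim().isEmpty()) // skip blank names
                .findFirst()
                .orElse(builtIn);
    }
}
```

If the document default is skipped, a text run with no explicit font falls through to the built-in font, whose metrics differ from Calibri. That would be consistent with the changed number of characters per line reported above.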

> . Font/style:
> . The PDF and DOCX rendering of the different fonts and sizes
> differs slightly (as mentioned in the free text section).
> . Header 3 styling appears to be too small.
It's the same problem as above: the default font Calibri was not applied.
> . Strikethrough appears as normal text.
OK, I have created issue
http://code.google.com/p/xdocreport/issues/detail?id=169
> . Subscript appears as normal text.
OK, I have created issue
http://code.google.com/p/xdocreport/issues/detail?id=170
> . Superscript appears as normal text.
OK, I have created issue
http://code.google.com/p/xdocreport/issues/detail?id=171
> . Highlighting is lost.
OK, I have created issue
http://code.google.com/p/xdocreport/issues/detail?id=172

> . Bullets:
> . Microsoft Word bullets are lost
> . Microsoft Word numbering is lost
> . Microsoft Word multilevel lists lose numbering and indentation
> beyond the first item.

Bulleted/numbered lists are not handled yet. See
http://code.google.com/p/xdocreport/issues/detail?id=151
> Note however that all of these bullet representations can be
> reproduced as non-Microsoft bullets using normal text and will
> survive the pdf translation (see example in attached files).

> . Tabs:
> . Tabs within text are lost.
Tabs are very complex to handle. I have started to work on them. See
http://code.google.com/p/xdocreport/issues/detail?id=164

> . Images:
> . Text alongside an image results in both the text and the image
> being misplaced.
Image support is basic for the moment, but it seems that 1.0.0 partly
resolves your problem (only a problem with the space before the image
remains).

Don't hesitate to create more issues and attach docx samples as you have
done here. I will add them to our JUnit docx tests. The more docx samples we
have to convert, the more the converter will improve.

Many thanks.

Regards Angelo

xdocr...@googlecode.com

Oct 22, 2012, 11:17:32 AM
to xdocr...@googlegroups.com

Comment #2 on issue 168 by Mr.M.McM...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi Angelo

Thanks for the response. That's excellent news.

The document I provided was generated from MS Word. Have you performed any
testing with docx produced via other technologies such as LibreOffice?

Do you know if docx from other technologies also currently exhibit the same
formatting issues?

Thanks again.

Mike



xdocr...@googlecode.com

Oct 22, 2012, 11:24:45 AM
to xdocr...@googlegroups.com

Comment #3 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi Mike,

> The document I provided was generated from MS Word. Have you performed
> any testing with docx produced via other technologies such as
> LibreOffice?

A little, but I have not done extensive testing. I think there should be no
problem with LibreOffice, because it should follow the OOXML specification.

> Do you know if docx from other technologies also currently exhibit the
> same formatting issues?

I'm not an expert on that, but I think there should be no problem if
LibreOffice follows the OOXML specification.

You are welcome to test that :)

Regards Angelo


xdocr...@googlecode.com

Oct 31, 2012, 1:02:08 PM
to xdocr...@googlegroups.com

Comment #4 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting

For your information, I have improved the positioning of images (though it
still needs more work). With your docx sample, the images are now well
positioned.

The next step is to improve table borders (to avoid doubled borders).

Regards Angelo



xdocr...@googlecode.com

Nov 5, 2012, 2:09:12 PM
to xdocr...@googlegroups.com

Comment #5 on issue 168 by Mr.M.McM...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi Angelo

Thanks for the update. Do you have a release date in mind?

xdocr...@googlecode.com

Nov 6, 2012, 3:38:04 AM
to xdocr...@googlegroups.com

Comment #6 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi Mike,

Before doing the release, I would like to handle:

* hyperlinks
* table borders
* bulleted/numbered lists

I hope to finish this release this month.

Regards Angelo

xdocr...@googlecode.com

Aug 8, 2013, 2:21:26 AM
to xdocr...@googlegroups.com

Comment #7 on issue 168 by oueslati...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hello Angelo,

Have you managed to find a solution for the bullets without changing them to
normal text?

Thanks,
Bilel


xdocr...@googlecode.com

Aug 8, 2013, 3:14:08 AM
to xdocr...@googlegroups.com

Comment #8 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
http://code.google.com/p/xdocreport/issues/detail?id=168

Hi Bilel,

XDocReport 1.0.3 (not yet released) greatly improves font handling (line
height, symbol fonts, Asian fonts, etc.). Tell me if it works for you.

Regards Angelo

xdocr...@googlecode.com

Mar 10, 2015, 2:27:06 AM
to xdocr...@googlegroups.com

Comment #9 on issue 168 by surabhi....@gmail.com: PDF Converter loses
formatting
https://code.google.com/p/xdocreport/issues/detail?id=168

Hi Angelo,

Has the issue reported above been resolved? I am using version 1.0.2 of
XDocReport.

xdocr...@googlecode.com

Mar 10, 2015, 4:00:57 AM
to xdocr...@googlegroups.com

Comment #10 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
https://code.google.com/p/xdocreport/issues/detail?id=168

A lot of the issues were fixed, but not all of them. I currently have no
time to support this converter.

Note that the latest version is 1.0.5.

xdocr...@googlecode.com

Mar 13, 2015, 2:57:48 AM
to xdocr...@googlegroups.com

Comment #11 on issue 168 by surabhi....@gmail.com: PDF Converter loses
formatting
https://code.google.com/p/xdocreport/issues/detail?id=168

Hi Angelo, I have used the latest version, 1.0.5. I find that the bullet
issue has still not been resolved, and the formatting of the generated PDF
is not the same as the source document when it is generated using
XDocReport.

xdocr...@googlecode.com

Mar 13, 2015, 4:28:38 AM
to xdocr...@googlegroups.com

Comment #12 on issue 168 by angelo.z...@gmail.com: PDF Converter loses
formatting
https://code.google.com/p/xdocreport/issues/detail?id=168

> i find that bullet issue has not been resolved

If I remember correctly, it's a font problem: the font must be installed on
the computer which converts the docx to PDF.
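
One way to check this before converting is to compare the fonts the document needs with the fonts known on the conversion machine. This is a minimal sketch, assuming both lists are already available. The class and method names are hypothetical, and the fonts your bullets need depend on the document (Word commonly uses Symbol or Wingdings for its default bullets).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: report required font families that are not installed. */
final class MissingFonts {

    /** Returns the required families absent from installed, ignoring case. */
    static List<String> missing(Collection<String> required, Collection<String> installed) {
        Set<String> known = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        known.addAll(installed);
        List<String> result = new ArrayList<>();
        for (String family : required) {
            if (!known.contains(family)) {
                result.add(family); // keep the order in which the fonts were requested
            }
        }
        return result;
    }
}
```

The installed list could come, for example, from java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames(). Whether the converter sees exactly the same fonts as AWT is an assumption to verify on your machine.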

> also the formatting of generated pdf is not same as source document when
> it is being generated using xdocreport.

Yes, I know it's not perfect, but I currently have no time to work on this
topic. Any contributions are welcome!