PDF creation takes a lot of time on Linux


Anand

Oct 17, 2011, 5:45:38 AM
to Flying Saucer Users
I am using XmlRenderer to generate a PDF from an XHTML template, but it
takes a lot of time on a Linux server. In fact it takes almost
50 minutes, while the same program and the same template take just
1 minute on a Windows machine. Am I missing something? Does it have any
specific library requirements from an OS perspective?
I have copied the code snippet below. The call to
ITextRenderer#layout() takes most of the time, and it seems that the JRE
is stuck there doing nothing. I also took a thread dump, and it showed
that the Java2D daemon thread is in the WAIT state.

Linux server details:
Red Hat Enterprise Linux Server release 5.1 (Tikanga)


Code snippet
ITextRenderer localITextRenderer = new ITextRenderer();

ByteArrayOutputStream localByteArrayOutputStream = new ByteArrayOutputStream();

DocumentBuilderFactory localDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder localDocumentBuilder = localDocumentBuilderFactory.newDocumentBuilder();

localDocumentBuilder.setEntityResolver(FSEntityResolver.instance());

Document localDocument = localDocumentBuilder.parse(new StringBufferInputStream(paramString));

localITextRenderer.setDocument(localDocument, null);

localITextRenderer.layout();

Peter Brant

Oct 17, 2011, 7:47:51 AM
to flying-sa...@googlegroups.com
There isn't any fundamental reason that performance on Linux should be
any different than Windows. It's likely your Linux process just has a
lower effective heap size than the Windows one (either because you're
not setting an explicit heap size and the default sizes are different
or you're running 32-bit Windows and 64-bit Linux with the same heap
size). The JVM is supposed to notice when it's doing nothing but
running the garbage collector, but my experience has been that this
doesn't work very well especially if the program in question is
creating lots and lots of small objects.

In short, the simple answer is to throw more memory at the problem.
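
One quick way to check the heap theory is to print the heap ceiling the JVM actually ended up with on each machine and compare the two numbers. A minimal sketch follows; the class name HeapReport is made up for illustration and is not part of Flying Saucer:

```java
// Hypothetical helper (not part of Flying Saucer): reports the heap limits
// of the JVM it runs in, so the Linux and Windows processes can be compared.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (set by -Xmx or the platform default). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    static String describe() {
        return "max heap = " + maxHeapMiB() + " MiB, used = " + usedHeapMiB() + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the Linux process reports a noticeably smaller maximum, raising it with the standard -Xmx option (for example -Xmx1024m) is the first thing to try.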

Pete

Patrick Wright

Oct 17, 2011, 7:55:18 AM
to flying-sa...@googlegroups.com
Also, please test with a large document with no external resources
(CSS, Images) and see how that performs, to exclude any potential
network/network configuration problems.
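
To find out which external references a document carries before stripping them, a rough scan along these lines may help. It is a regex-based sketch, not a real XHTML/CSS parser, and RemoteReferenceScanner is a hypothetical name. Note that it also reports namespace URIs such as the xmlns value, which are identifiers and are not fetched:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists absolute http(s) URLs that appear anywhere in the
// markup (DOCTYPE system IDs, href/src attributes, CSS url(...) values).
final class RemoteReferenceScanner {
    // A URL runs until whitespace, a quote, an angle bracket or a closing parenthesis.
    private static final Pattern REMOTE_URL =
            Pattern.compile("https?://[^\\s\"'<>)]+", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct remote URLs in order of first appearance. */
    static Set<String> findRemoteUrls(String xhtml) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = REMOTE_URL.matcher(xhtml);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

Anything in the returned set that points at a slow or unreachable host is a candidate for being inlined or served locally.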


Patrick

strap

Nov 18, 2011, 6:21:52 AM
to Flying Saucer Users
Hi,

I had a similar problem. The code I used is like yours, and my
machines are Linux based.
In my code I used a file stream, not a string stream, but it's the
same.
So... my creative solution:

Given an XHTML document like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<!-- ... document content ... -->
</html>

Download from w3 the following files:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Put them into a directory on your web server and change the doctype
declaration in your XHTML so that it points to the DTD 'locally'
instead of at w3.org:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://yourdomain/DTD/xhtml1-strict.dtd">
<html>
<!-- ... document content ... -->
</html>

You may notice that these files download slowly from w3.org. The parser
has to retrieve them to do its job, but if you have the files
'locally'... it's a little bit faster ;)
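
The same idea can be applied without touching the DOCTYPE by giving the parser an EntityResolver that serves the downloaded files from a directory. This is only a sketch under assumptions: LocalDtdResolver and the directory layout are hypothetical, and Flying Saucer's own FSEntityResolver (used in the first post) is meant to play a similar role with its bundled copies.

```java
import java.io.File;
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps any requested DTD/entity file to a copy with the
// same file name inside a local directory, so the parser never goes to w3.org.
final class LocalDtdResolver implements EntityResolver {
    private final File dtdDir;

    LocalDtdResolver(File dtdDir) {
        this.dtdDir = dtdDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // nothing to map; let the parser decide
        }
        // Keep only the file name, e.g. "xhtml1-strict.dtd" or "xhtml-lat1.ent".
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(dtdDir, name);
        if (local.isFile()) {
            InputSource source = new InputSource(local.toURI().toString());
            source.setPublicId(publicId);
            return source;
        }
        // Assumption: for files we do not have, an empty entity is preferable
        // to a network fetch. Returning null here would allow the fetch.
        return new InputSource(new StringReader(""));
    }
}
```

It would be installed with documentBuilder.setEntityResolver(new LocalDtdResolver(new File("/path/to/DTD"))), where the path is a placeholder for wherever the four files were saved.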
Enjoy!

HTH
Cheers
Strap

Patrick Wright

Feb 26, 2013, 7:18:23 AM
to flying-sa...@googlegroups.com
Hi

It's possible, though I think unlikely, that your page includes some very rare combination of layout characteristics that causes the layout routines to spin madly. I doubt it. I think if that were the case we'd see more reports of this, as so many people use FS with so many different combinations of HTML and CSS.

What is much more likely is
- you have too little memory assigned to the process - try configuring your process to dump a GC log (see http://stackoverflow.com/questions/895444/java-garbage-collection-log-messages or other documentation online) to see how often the collector is running, then follow tuning advice (also online)

- you have some reference to resources online (XML/HTML entity declarations, CSS, images) where the remote server is slow or blocked - to isolate this, try removing *all* remote references, inline all CSS, and if that fixes it, use a process of elimination to track down which resources are loading slowly. 
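
To tell these two cases apart without setting up full GC logging, the slow call can be wrapped in a small probe that compares wall-clock time with the time the JVM spent collecting garbage. A sketch using the standard java.lang.management API; the GcProbe class itself is hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: measures how much of a task's wall-clock time was
// accompanied by garbage collection in this JVM.
final class GcProbe {
    /** Accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the task and returns { wall-clock millis, GC millis during the task }. */
    static long[] measure(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000L;
        return new long[] { wallMillis, totalGcMillis() - gcBefore };
    }
}
```

Wrapping renderer.layout() in GcProbe.measure(...) gives a rough picture: GC time close to the wall-clock time points at too little memory, while a long wall-clock time with little GC suggests the process is waiting on something else, such as a remote resource.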


HTH
Patrick 


On Tue, Feb 26, 2013 at 1:03 PM, Asad Abbas <asada...@gmail.com> wrote:
Any success with the issue Anand?

I am facing the same issue: the layout method takes the most time.

--
You received this message because you are subscribed to the Google Groups "Flying Saucer Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flying-saucer-u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Alex GMail

Feb 26, 2013, 10:03:10 AM
to flying-sa...@googlegroups.com
I faced the same problem. I am using Flying Saucer to generate PDF from
JSF (xhtml) pages.
The reference to external on-line resources is indeed what was slowing
down my generation. In my case that was the <!DOCTYPE> with schema
declarations and xml name space references.

So I included some initial processing of the text before I give it to
Flying Saucer for rendering:

I create the export using a servlet filter. Here is what my filter code
looks like:

@WebFilter(filterName="Exporter", urlPatterns="*.xhtml", asyncSupported=true)
public class RendererFilter implements Filter {
    @Override
    public void init(FilterConfig config) throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp,
            FilterChain filterChain) throws IOException, ServletException {

        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;
        String pageName = request.getRequestURI().substring(
                request.getRequestURI().indexOf("/") + 1,
                request.getRequestURI().length());
        // Check to see if this filter should apply.
        String renderType = request.getParameter("RenderType");
        // This is to render only a single DOM element, if present.
        String renderId = request.getParameter("RenderId");
        // Sometimes there is an SVG chart on the page. Its data is processed separately.
        String chartData = request.getParameter("chartData");
        if (renderType != null && renderType.equals("pdf")) {
            // Capture the content for this request.
            String baseURL = request.getRequestURL().toString();
            // This class wraps the servlet response and captures the output from the filter chain.
            ContentCaptureServletResponse capContent = new ContentCaptureServletResponse(response);
            filterChain.doFilter(request, capContent);

            try {
                // Parse the XHTML content to a document that is readable by the XHTML renderer.
                String content = capContent.getContent();
                StringBuffer resContent = new StringBuffer();
                // Remove the DOCTYPE tag: schema references in it result in extremely slow processing.
                Pattern p = Pattern.compile("<[ ]*![ ]*DOCTYPE[^>]*>");
                content = p.matcher(content).replaceAll("");
                // Remove the <script> elements, including their content.
                p = Pattern.compile("<[ ]*script[^>]*>[^<]*<[ ]*/[ ]*script[^>]*>");
                content = p.matcher(content).replaceAll("");
                if (chartData != null && !chartData.isEmpty()) {
                    // If we have SVG chart data, find where to render it and add it to the DOM.
                    p = Pattern.compile("(chartContainer[^>]*>)(\\.\\.\\.)(</div>)");
                    Matcher m = p.matcher(content);
                    if (m.find()) {
                        // quoteReplacement keeps any '$' or '\' in the chart data literal.
                        m.appendReplacement(resContent, "$1" + Matcher.quoteReplacement(chartData) + "$3");
                    }
                    m.appendTail(resContent);
                    content = resContent.toString();
                    // Do some SVG clean-up that otherwise will result in incorrect rendering.
                    p = Pattern.compile("NaN");
                    content = p.matcher(content).replaceAll("0");
                    resContent.delete(0, resContent.length());
                    p = Pattern.compile("(<[ ]*svg)");
                    m = p.matcher(content);
                    if (m.find()) {
                        // Make sure the SVG elements have style="display:block;" - needed for proper SVG rendering.
                        m.appendReplacement(resContent, "$1" + " style=\"display:block;\" ");
                    }
                    m.appendTail(resContent);
                } else {
                    resContent.append(content);
                }

                InputSource source = new InputSource(
                        new ByteArrayInputStream(resContent.toString().getBytes()));
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder documentBuilder = factory.newDocumentBuilder();
                Document xhtmlContent = documentBuilder.parse(source);
                if (renderId != null) {
                    // If we want to render only a single DOM element, we extract it into a new document.
                    XPath xpath = XPathFactory.newInstance().newXPath();
                    XPathExpression expr;
                    try {
                        expr = xpath.compile("//*[contains(@id, '" + renderId + "')]");
                        NodeList elem1List = (NodeList) expr.evaluate(xhtmlContent, XPathConstants.NODESET);
                        if (elem1List.getLength() > 0) {
                            xhtmlContent = documentBuilder.newDocument();
                            xhtmlContent.setDocumentURI(baseURL);
                            xhtmlContent.adoptNode(elem1List.item(0));
                            xhtmlContent.appendChild(elem1List.item(0));
                        }
                    } catch (XPathExpressionException ex) {
                        Logger.getLogger(RendererFilter.class.getName()).log(Level.SEVERE, null, ex);
                    }
                }

                if (renderType.equals("pdf")) {
                    // And finally do the rendering.
                    ITextRenderer renderer = new ITextRenderer();
                    SharedContext ctx = renderer.getSharedContext();
                    ctx.setUserAgentCallback(new JSFUserAgent());

                    ChainedReplacedElementFactory cef = new ChainedReplacedElementFactory();
                    cef.addFactory(new ITextReplacedElementFactory(renderer.getOutputDevice()));
                    cef.addFactory(new SVGReplacedElementFactory());
                    ctx.setReplacedElementFactory(cef);

                    renderer.setDocument(xhtmlContent, baseURL);
                    renderer.layout();

                    response.setContentType("application/pdf");
                    response.setHeader("Content-Disposition", "attachment; filename=" + pageName + ".pdf");
                    OutputStream browserStream = response.getOutputStream();

                    try {
                        renderer.createPDF(browserStream);
                    } catch (com.lowagie.text.DocumentException ex) {
                        Logger.getLogger(RendererFilter.class.getName()).log(Level.SEVERE, null, ex);
                    }
                }
            } catch (ParserConfigurationException ex) {
                throw new ServletException(ex);
            } catch (SAXException ex) {
                throw new ServletException(ex);
            }
        } else {
            // Normal processing
            filterChain.doFilter(request, response);
        }
    }

    @Override
    public void destroy() {
    }
}

The JSFUserAgent class provides caching for the image data and Faces
resources.
The SVGReplacedElement class and its supporting factory class provide
rendering for the SVG data. They render the data to a separate PDF element,
which is then embedded in the main document.
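
An alternative to stripping the DOCTYPE with a regex is to tell the parser not to fetch the external DTD at all. The sketch below relies on a Xerces feature URI that the parser bundled with the JDK understands; other JAXP implementations may reject it. OfflineXhtmlParser is a hypothetical name. The same caveat as with removing the DOCTYPE applies: named entities that are only defined in the DTD are no longer declared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses XHTML text into a DOM without loading the
// external DTD named in the DOCTYPE.
final class OfflineXhtmlParser {
    // Xerces-specific feature; assumed to be supported by the parser in use.
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Throws ParserConfigurationException if the parser does not know the feature.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }
}
```

The resulting Document can be handed to renderer.setDocument(...) in the same way as the one built in the filter.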

Regards,
Aleksandar Nikolov



Alex GMail

Feb 27, 2013, 6:44:29 AM
to flying-sa...@googlegroups.com
Do you use a user agent to implement caching and external resource
resolution, like I do? During the layout() call it will try to calculate
the size and position of each element on your page. It will also try to
load any external resource you reference.

Initially I did not use a custom user agent, and the layout was slow - not
that slow, but still slow.
Then, after I implemented the user agent, the slowest part of my code was
the parsing of the text into a DOM tree. That is where I implemented the
"cleaning" of the text from irrelevant data - the <!DOCTYPE> and <script>
tags.

I also run on Linux, and currently my conversion takes about 1 sec for 2-3
page documents, whereas before the custom user agent and the text cleaning
it could take up to 10-11 min for the same document. It was so slow, I
thought the code was going into an infinite loop :)
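
The caching part of such a user agent boils down to remembering each resource by its URI so that it is fetched only once. Below is a library-independent sketch of that idea. ResourceCache and its loader function are hypothetical, and wiring it into Flying Saucer's UserAgentCallback is not shown:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: loads the bytes behind a URI once and serves every
// later request for the same URI from memory.
final class ResourceCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader;

    /** @param loader does the real (possibly slow) fetch for a URI */
    ResourceCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached bytes, calling the loader only on the first request. */
    byte[] get(String uri) {
        // If the loader returns null, nothing is stored and null is returned.
        return cache.computeIfAbsent(uri, loader);
    }
}
```

A cache like this has no eviction, so it suits a bounded set of images and stylesheets rather than arbitrary user-supplied URLs.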

Regards,
Aleksandar Nikolov


On Wed, 27 Feb 2013 10:09:59 +0200, Asad Abbas <asada...@gmail.com>
wrote:

> Thanks Alex for your reply.
>
> But my issue is still there after adding this preprocessing step. It
> seems I have the same issue as the OP: one call of the layout() method
> took around 190 minutes; all other steps are fast enough for my
> requirements.

Chetan c

Apr 29, 2015, 3:43:32 AM
to flying-sa...@googlegroups.com
Hi,

I am facing a similar issue. A fairly rich HTML document (with 1-2 images) takes 2-3 seconds in my local Windows environment but about 20 minutes in a Unix environment. Any input here?

Thanks in advance.