Sitemesh 2 performance improvements

James Roper

Nov 11, 2011, 1:25:56 PM
to SiteMesh 2 Users
Hello Sitemesh users,

I have made a number of performance improvements to Sitemesh 2, which
have now been accepted into the main repository. This post details
some of these performance improvements:

Single buffer parsing

Previously, the HTMLPageParser copied large amounts of content into
new buffers, for example the contents of the body, head and content
tags. Additionally, Sitemesh copied the original buffer into a new
buffer and unnecessarily converted character arrays to strings, which
is effectively another copy. The new implementation keeps all content
in the original buffer and stores metadata about which areas of the
buffer have been removed or inserted into. Extracted parts such as the
body and head simply refer to a section of the original buffer. This
should improve performance in two ways: first, Sitemesh copies far
less content between buffers; second, it generates less garbage, so
the garbage collector has less work to do. Any Sitemesh plugins, for
example custom filters, may break with this change, because they need
to be able to handle the new buffer data structure. The FastPageParser
does not use single buffer parsing.
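
To illustrate the idea of extracted parts referring to the original
buffer, here is a minimal sketch. It is not Sitemesh's actual code:
the names SingleBufferSketch, Fragment, between and writeWithout are
hypothetical, and plain string matching stands in for the real tag
parsing.

```java
/** Hypothetical sketch of single buffer parsing; not Sitemesh's real classes. */
class SingleBufferSketch {

    /** A view onto part of the shared buffer. It holds offsets, not a copy. */
    static final class Fragment {
        final char[] buffer;
        final int start;
        final int end; // exclusive

        Fragment(char[] buffer, int start, int end) {
            this.buffer = buffer;
            this.start = start;
            this.end = end;
        }

        /** Writes this section straight from the shared buffer. */
        void writeTo(StringBuilder out) {
            out.append(buffer, start, end - start);
        }

        @Override
        public String toString() {
            return new String(buffer, start, end - start);
        }
    }

    /** Finds needle in buf at or after from, without converting buf to a String. */
    static int indexOf(char[] buf, String needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= buf.length - needle.length(); i++) {
            for (int j = 0; j < needle.length(); j++) {
                if (buf[i + j] != needle.charAt(j)) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Returns a view of the text between open and close, or an empty view if absent. */
    static Fragment between(char[] buf, String open, String close) {
        int openAt = indexOf(buf, open, 0);
        if (openAt < 0) {
            return new Fragment(buf, 0, 0);
        }
        int start = openAt + open.length();
        int end = indexOf(buf, close, start);
        return end < 0 ? new Fragment(buf, 0, 0) : new Fragment(buf, start, end);
    }

    /** Writes the whole buffer except the removed range; the buffer itself is untouched. */
    static void writeWithout(char[] buf, Fragment removed, StringBuilder out) {
        out.append(buf, 0, removed.start);
        out.append(buf, removed.end, buf.length - removed.end);
    }

    public static void main(String[] args) {
        char[] page =
            "<html><head><title>T</title></head><body><p>x</p></body></html>".toCharArray();
        Fragment head = between(page, "<head>", "</head>");
        Fragment body = between(page, "<body>", "</body>");
        System.out.println(head); // <title>T</title>
        System.out.println(body); // <p>x</p>

        StringBuilder out = new StringBuilder();
        writeWithout(page, head, out);
        System.out.println(out); // <html><head></head><body><p>x</p></body></html>
    }
}
```

The only data created per extracted part is a pair of offsets, which
is where the reduction in copying and garbage comes from.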

Chained buffers

If you have nested decorations of Sitemesh content, chained buffers
save Sitemesh from having to write each nested buffer into the next
buffer, and from having to reparse the entire content of the nested
buffers. When a nested buffer is written out to its parent buffer,
Sitemesh only records the insertion point; later, when the parent
buffer is written out to the stream, the nested buffer is written at
the correct location.
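
A rough sketch of how recording insertion points might look follows.
This is an illustration under my own assumptions, not the Sitemesh
implementation: ChainedBufferSketch, Buffer, chain and render are
hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of chained buffers; not Sitemesh's real classes. */
class ChainedBufferSketch {

    static final class Buffer {
        private final StringBuilder chars = new StringBuilder();
        // Parallel lists: children.get(i) belongs at offset positions.get(i).
        private final List<Integer> positions = new ArrayList<>();
        private final List<Buffer> children = new ArrayList<>();

        Buffer append(String text) {
            chars.append(text);
            return this;
        }

        /** Records where the child belongs instead of copying its characters. */
        Buffer chain(Buffer child) {
            positions.add(chars.length());
            children.add(child);
            return this;
        }

        /** Streams own content, emitting each child at its recorded offset. */
        void writeTo(Writer out) throws IOException {
            int written = 0;
            for (int i = 0; i < children.size(); i++) {
                int position = positions.get(i);
                out.append(chars, written, position);
                children.get(i).writeTo(out);
                written = position;
            }
            out.append(chars, written, chars.length());
        }

        /** Convenience for the example: renders the whole chain to a String. */
        String render() {
            StringWriter out = new StringWriter();
            try {
                writeTo(out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        Buffer inner = new Buffer().append("<p>content</p>");
        Buffer middle = new Buffer().append("<div>").chain(inner).append("</div>");
        Buffer outer = new Buffer().append("<body>").chain(middle).append("</body>");
        System.out.println(outer.render()); // <body><div><p>content</p></div></body>
    }
}
```

Each nested buffer's characters are written exactly once, at output
time, however deeply the decorations are nested.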

PartialPageParser

This is a new parser that, as well as using only a single buffer, does
not parse the entire page, but only the <head> section. It does this
by stopping when it reaches the <body> tag, and then scanning
backwards from the end of the content to find the closing </body> tag.
This is a massive performance improvement, suitable for anyone who
only uses Sitemesh to decorate pages and does not extract or transform
any of the body content. For large pages, it is 99% faster than the
HTMLPageParser. Because it only does a partial parse, it is not a
full-featured parser and will not be suitable for all use cases. To
use it, configure it in the sitemesh.xml descriptor:

<page-parsers>
    <parser content-type="text/html"
            class="com.opensymphony.module.sitemesh.parser.PartialPageParser" />
</page-parsers>
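
For readers curious about the forward-then-backward scan described
above, here is a simplified sketch of locating the body without
visiting its contents. It is not the PartialPageParser source:
PartialParseSketch, matchesAt and bodyRange are hypothetical names,
and the naive matching on "<body" would be fooled by, say, a
"<bodyguard>" tag or a "<body" inside a comment.

```java
/** Hypothetical sketch of a partial parse; not the real PartialPageParser. */
class PartialParseSketch {

    /** Case-insensitive check that lowerNeedle occurs in buf at pos. */
    static boolean matchesAt(char[] buf, int pos, String lowerNeedle) {
        if (pos < 0 || pos + lowerNeedle.length() > buf.length) {
            return false;
        }
        for (int j = 0; j < lowerNeedle.length(); j++) {
            if (Character.toLowerCase(buf[pos + j]) != lowerNeedle.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns {start, end} of the body content (end exclusive),
     * or null if no complete opening body tag is found.
     */
    static int[] bodyRange(char[] page) {
        // Forward scan: stop at the first opening body tag.
        int start = -1;
        for (int i = 0; i < page.length; i++) {
            if (matchesAt(page, i, "<body")) {
                int close = i;
                while (close < page.length && page[close] != '>') {
                    close++;
                }
                if (close == page.length) {
                    return null; // tag never closed
                }
                start = close + 1;
                break;
            }
        }
        if (start < 0) {
            return null;
        }
        // Backward scan: the closing tag is normally near the end of the
        // page, so the body content in between is never examined.
        int end = page.length; // fall back to end of content if there is no </body>
        for (int i = page.length - "</body>".length(); i >= start; i--) {
            if (matchesAt(page, i, "</body>")) {
                end = i;
                break;
            }
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        char[] page = ("<html><head><title>T</title></head>"
            + "<body class=\"x\"><p>Hi</p></body></html>").toCharArray();
        int[] range = bodyRange(page);
        System.out.println(new String(page, range[0], range[1] - range[0])); // <p>Hi</p>
    }
}
```

The cost of the two scans depends on the size of the head and of
whatever follows </body>, not on the size of the body, which is
consistent with the gain being largest on large pages.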

These new performance improvements have been in place in JIRA 4.4 and
Confluence 4.0, so they should be considered quite stable. They will
be included in the next release of Sitemesh.