Hi Carlos,
We've noticed an issue in XWiki: after we run CSS4J on our XHTML, HTML entities are removed from the output.
Example input to CSS4J:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>
Main - Home
</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type" />
<meta content="en" name="language" />
</head><body class="exportbody" id="body" pdfcover="0" pdftoc="0">
<div id="xwikimaincontainer">
<div id="xwikimaincontainerinner">
<div id="xwikicontent">
<p>Cl&eacute;ment Aubin</p>
</div>
</div>
</div>
</body></html>
Output after applying CSS4J (0.41.3):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head style="display: none; ">
<title style="display: none; ">
Main - Home
</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type" style="display: none; "/>
<meta content="en" name="language" style="display: none; "/>
</head><body class="exportbody" id="body" pdfcover="0" pdftoc="0" style="display: block; margin-top: 8px; margin-right: 8px; margin-bottom: 8px; margin-left: 8px; unicode-bidi: embed; ">
<div id="xwikimaincontainer" style="display: block; unicode-bidi: embed; ">
<div id="xwikimaincontainerinner" style="display: block; unicode-bidi: embed; ">
<div id="xwikicontent" style="display: block; unicode-bidi: embed; ">
<p style="display: block; margin-top: 3pt; margin-bottom: 3pt; unicode-bidi: embed; ">Clment Aubin</p>
</div>
</div>
</div>
</body></html>
Notice that the é has been removed: "Clément" became "Clment".
Here is our code:
String applyCSS(String html, String css, XWikiContext context)
{
    LOGGER.debug("Applying the following CSS [{}] to HTML [{}]", css, html);
    try {
        //System.setProperty("org.w3c.css.sac.parser", "org.apache.batik.css.parser.Parser");
        // Prepare the input
        Reader re = new StringReader(html);
        InputSource source = new InputSource(re);
        SAXReader reader = new SAXReader(XHTMLDocumentFactory.getInstance());
        reader.setEntityResolver(new DefaultEntityResolver());
        XHTMLDocument document = (XHTMLDocument) reader.read(source);
        // Set the base URL so that CSS4J can resolve URLs in CSS. Use the current document in the XWiki Context
        document.setBaseURL(new URL(context.getDoc().getExternalURL("view", context)));
        // Apply the style sheet
        document.addStyleSheet(new org.w3c.css.sac.InputSource(new StringReader(css)));
        applyInlineStyle(document.getRootElement());
        OutputFormat outputFormat = new OutputFormat("", false);
        if ((context == null) || (context.getWiki() == null)) {
            outputFormat.setEncoding("UTF-8");
        } else {
            outputFormat.setEncoding(context.getWiki().getEncoding());
        }
        StringWriter out = new StringWriter();
        XMLWriter writer = new XMLWriter(out, outputFormat);
        writer.write(document);
        String result = out.toString();
        // Debug output
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("HTML with CSS applied [{}]", result);
        }
        return result;
    } catch (Exception e) {
        LOGGER.warn("Failed to apply CSS [{}] to HTML [{}]", css, html, e);
        return html;
    }
}
We use CSS4J's DefaultEntityResolver class, which registers xhtml-lat1.ent:
public DefaultEntityResolver() {
    this.dtdNameToFilename.put("-//W3C//DTD XHTML 1.0 Strict//EN", "w3c/xhtml1-strict.dtd");
    this.dtdNameToFilename.put("-//W3C//DTD XHTML 1.0 Transitional//EN", "w3c/xhtml1-transitional.dtd");
    this.dtdNameToFilename.put("-//W3C//DTD XHTML 1.1//EN", "w3c/xhtml11.dtd");
    this.dtdNameToFilename.put("-//W3C//ENTITIES Latin 1 for XHTML//EN", "w3c/xhtml-lat1.ent");
    this.dtdNameToFilename.put("-//W3C//ENTITIES Symbols for XHTML//EN", "w3c/xhtml-symbol.ent");
    this.dtdNameToFilename.put("-//W3C//ENTITIES Special for XHTML//EN", "w3c/xhtml-special.ent");
    this.dtdNameToURL.put("-//W3C//DTD XHTML 1.0 Strict//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
    this.dtdNameToURL.put("-//W3C//DTD XHTML 1.0 Transitional//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
    this.dtdNameToURL.put("-//W3C//DTD XHTML 1.1//EN", "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd");
    this.dtdNameToURL.put("-//W3C//ENTITIES Latin 1 for XHTML//EN", "http://www.w3.org/TR/xhtml11/DTD/xhtml-lat1.ent");
    this.dtdNameToURL.put("-//W3C//ENTITIES Symbols for XHTML//EN", "http://www.w3.org/TR/xhtml11/DTD/xhtml-symbol.ent");
    this.dtdNameToURL.put("-//W3C//ENTITIES Special for XHTML//EN", "http://www.w3.org/TR/xhtml11/DTD/xhtml-special.ent");
}
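To help narrow this down, here is a minimal JDK-only sketch of the mechanism we think is involved. It is not our production code and does not use CSS4J or dom4j. It assumes the input carries the é as the named entity &eacute;. The class name EntityCheck, the MINI_DTD string (a one-line stand-in for xhtml-lat1.ent) and the miniResolver() helper are all hypothetical names made up for this illustration. The idea is that the parser can only expand &eacute; if the resolver hands it a DTD that declares the entity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class EntityCheck {
    // Hypothetical stand-in for xhtml-lat1.ent: declares only the entity used below.
    static final String MINI_DTD = "<!ENTITY eacute \"&#233;\">";

    // Same DOCTYPE as in our report, with the name written as a named entity (assumption).
    static final String XHTML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<p>Cl&eacute;ment Aubin</p></body></html>";

    // Resolver that answers every DTD lookup with the local mini DTD (no network access).
    static EntityResolver miniResolver() {
        return (publicId, systemId) -> new InputSource(new StringReader(MINI_DTD));
    }

    // Parses XHTML with the given resolver and returns the text of the first <p>.
    static String parseParagraph(EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(XHTML)));
        return doc.getElementsByTagName("p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseParagraph(miniResolver()));
    }
}
```

If the resolver is consulted and the entity is declared, the paragraph text should come back with the é intact. If the same kind of check against our real setup loses the character, that would suggest the entity declarations are not reaching the parser, rather than a problem in the style computation itself.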
What do you think? Is there something we're not doing correctly?
Thanks a lot