[mule-dev] Problem using XMLUtils with CDATA


Pascal Combescot

Apr 4, 2011, 10:05:04 AM
to d...@mule.codehaus.org
Hello,

I'm trying to read an XMLStreamReader (a DepthXMLStreamReader received from the HTTP connector) using the static methods of org.mule.module.xml.util.XMLUtils, and I found that CDATA text sections are simply ignored. I wrote a simple Main class that shows the problem:

public class Main {

    public static void main(String[] args) throws Exception {

        InputStreamReader isr = new InputStreamReader(ClassLoader.getSystemResourceAsStream("my_file.xml"));
        XMLInputFactory xmlif = XMLInputFactory.newInstance();
        XMLStreamReader xmlsr = xmlif.createXMLStreamReader(isr);
        DepthXMLStreamReader reader = new DepthXMLStreamReader(xmlsr);

        Document doc = XMLUtils.toW3cDocument(reader);

        System.out.println("<" + doc.getChildNodes().item(0).getNodeName() + ">");
        for (int i = 0; i < doc.getChildNodes().item(0).getChildNodes().getLength(); i++) {
            // Skip whitespace and other plain text: NodeType=#text
            if(doc.getChildNodes().item(0).getChildNodes().item(i).getNodeType() != Node.TEXT_NODE){
                System.out.print("\t<" + doc.getChildNodes().item(0).getChildNodes().item(i).getNodeName() + ">");
                System.out.print(doc.getChildNodes().item(0).getChildNodes().item(i).getTextContent());
                System.out.println("</" + doc.getChildNodes().item(0).getChildNodes().item(i).getNodeName() + ">");
            }
        }
        System.out.println("</" + doc.getChildNodes().item(0).getNodeName() + ">");
    }
}

My XML file looks like this:
<ser:messageTag xmlns:ser="http://test.com/">
<ser:message>normalTEXT<![CDATA[<Info>Test</Info>]]>normalTEXT</ser:message>
<!--Optional:-->
<ser:testFlag>Test</ser:testFlag>
</ser:messageTag>

And I get the following result (all the CDATA part is lost):

<ser:messageTag>
<ser:message>normalTEXTnormalTEXT</ser:message>
<#comment>Optional:</#comment>
<ser:testFlag>Test</ser:testFlag>
</ser:messageTag>

I get the same result in a Mule environment and with the "XMLUtils.toDocument" method.

Do you have any idea why the XMLUtils methods do not work with CDATA text sections?

Thanks in advance
Pascal

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email
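
For readers reproducing this outside Mule, the sketch below shows how a plain JDK StAX reader can surface CDATA content. It is only an illustration under assumptions: the class name StaxCdataDemo and the helper allText are hypothetical, and nothing in this thread confirms how XMLUtils itself consumes the stream. The point is that a StAX parser may deliver a CDATA section as its own CDATA event rather than as CHARACTERS, so code that copies only CHARACTERS events would miss it. Whether that is what happens here is an untested hypothesis.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; it does not use any Mule API.
class StaxCdataDemo {

    static final String XML =
        "<ser:message xmlns:ser=\"http://test.com/\">"
        + "normalTEXT<![CDATA[<Info>Test</Info>]]>normalTEXT"
        + "</ser:message>";

    // Concatenates every piece of character data, whether the parser
    // reports it as a CHARACTERS event or as a separate CDATA event.
    static String allText(String xml, boolean coalescing) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // With coalescing on, adjacent character data (including CDATA)
            // is merged and reported as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(allText(XML, false));
        System.out.println(allText(XML, true));
    }
}
```

If the CDATA content shows up here but not after XMLUtils.toW3cDocument, enabling XMLInputFactory.IS_COALESCING on the factory that creates the reader may be worth trying before handing the reader to Mule.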


Pascal Combescot

Apr 4, 2011, 10:13:28 AM
to d...@mule.codehaus.org
Sorry, my code got mangled by the formatting:

public class Main {

    public static void main(String[] args) throws Exception {

        InputStreamReader isr = new InputStreamReader(ClassLoader.getSystemResourceAsStream("my_file.xml"));
        XMLInputFactory xmlif = XMLInputFactory.newInstance();
        XMLStreamReader xmlsr = xmlif.createXMLStreamReader(isr);
        DepthXMLStreamReader reader = new DepthXMLStreamReader(xmlsr);

        Document doc = XMLUtils.toW3cDocument(reader);

        System.out.print("<");
        System.out.print(doc.getChildNodes().item(0).getNodeName());
        System.out.println(">");

        for (int i = 0; i < doc.getChildNodes().item(0).getChildNodes().getLength(); i++) {
            // Skip whitespace and other plain text: NodeType=#text
            if(doc.getChildNodes().item(0).getChildNodes().item(i).getNodeType() != Node.TEXT_NODE){

                System.out.print("\t<");
                System.out.print(doc.getChildNodes().item(0).getChildNodes().item(i).getNodeName());
                System.out.print(">");
                System.out.print(doc.getChildNodes().item(0).getChildNodes().item(i).getTextContent());
                System.out.print("</");
                System.out.print(doc.getChildNodes().item(0).getChildNodes().item(i).getNodeName());
                System.out.println(">");
            }
        }
        System.out.print("</");
        System.out.print(doc.getChildNodes().item(0).getNodeName());
        System.out.println(">");
    }
}

My XML is also well formed, but it got mangled too :)

<ser:messageTag xmlns:ser="http://test.com/">
<ser:message>NoCDATA<![CDATA[<Info>CDATA</Info>]]>NoCDATA</ser:message>
<!--Optional:-->
<ser:testFlag>Test</ser:testFlag>
</ser:messageTag>


Chris Campos

Apr 4, 2011, 11:52:21 AM
to d...@mule.codehaus.org
I had the same problem when parsing data from whitepages.com. Mule's default XML parser does not cope with a lot of messy XML. Your best bet is to write a custom XML parser. I recommend the SAX XML parser (http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/package-summary.html). It is event driven, so it is very flexible with messy XML. I could give you some sample code to get you started if needed.


Chris
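
As a rough idea of what the SAX approach suggested above could look like, here is a minimal sketch using only the JDK's SAX classes. It is not the sample code Chris offered, and the class name SaxCdataDemo and the helper messageText are hypothetical. It collects the text of the ser:message element from the sample file; SAX delivers CDATA content through the same characters() callback as ordinary text, so nothing special is needed to keep it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it does not use any Mule API.
class SaxCdataDemo {

    static final String NS = "http://test.com/";

    static final String XML =
        "<ser:messageTag xmlns:ser=\"http://test.com/\">"
        + "<ser:message>normalTEXT<![CDATA[<Info>Test</Info>]]>normalTEXT</ser:message>"
        + "<!--Optional:-->"
        + "<ser:testFlag>Test</ser:testFlag>"
        + "</ser:messageTag>";

    // Returns the character data found inside {http://test.com/}message.
    static String messageText(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inMessage;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (NS.equals(uri) && "message".equals(localName)) {
                    inMessage = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (NS.equals(uri) && "message".equals(localName)) {
                    inMessage = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Called for plain text and for CDATA content alike.
                if (inMessage) {
                    text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(messageText(XML)); // prints normalTEXT<Info>Test</Info>normalTEXT
    }
}
```

If the CDATA boundaries themselves matter, org.xml.sax.ext.LexicalHandler provides startCDATA() and endCDATA() callbacks that can be registered in addition to the handler above.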

Pascal Combescot

Apr 4, 2011, 12:48:59 PM
to d...@mule.codehaus.org
Thank you for your answer, but I think something in Mule is already able to read CDATA; I just don't know what it is. To be clear, I'm using Mule 3, and if I add an XSLT transformer before the CXF proxy, all the CDATA gets re-encoded ("<" becomes "&lt;" and so on) and I find my complete message at the end. So something in the XSLT transformer is able to read CDATA.

<custom-transformer name="xmlStreamReaderToDocumentTransformer" class="com.test.mule.util.XmlStreamReaderToDocumentTransformer" />
<custom-transformer name="domDocumentToXml" class="org.mule.module.xml.transformer.DomDocumentToXml" />
<mule-xml:xslt-transformer xsl-file="void.xslt" name="voidXsltTransformer" />
<flow name="flow">
    <http:inbound-endpoint address="http://host:port/path"
            exchange-pattern="request-response">
        <transformer ref="voidXsltTransformer" />
        <cxf:proxy-service wsdlLocation="wsdl/TIPS_messageIREService.wsdl" service="MessageIREService"
                namespace="http://test.com/" payload="body" />
        <transformer ref="xmlStreamReaderToDocumentTransformer" />
        <transformer ref="domDocumentToXml" />
    </http:inbound-endpoint>

    <vm:outbound-endpoint path="perturbation-to-sharp_vm" exchange-pattern="request-response" />
</flow>

My XmlStreamReaderToDocumentTransformer class simply does "return XMLUtils.toDocument(src, muleContext);"
If I remove <transformer ref="voidXsltTransformer" />, I lose the CDATA part, as I said in the previous post.

I attached the xslt file.

Mike Schilling

Apr 4, 2011, 2:02:33 PM
to d...@mule.codehaus.org, Pascal Combescot
The problem is the line

if(doc.getChildNodes().item(0).getChildNodes().item(i).getNodeType() != Node.TEXT_NODE)

CDATA sections create CDATA nodes, not TEXT nodes. Try replacing it with

if (! (doc.getChildNodes().item(0).getChildNodes().item(i) instanceof Text))

which works correctly, because the interface CDATASection extends Text.

Mike
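
The difference between the two checks above can be seen with the JDK's own DOM parser. This is a minimal sketch, not Mule code: the class name CdataNodeTypes and its helpers are hypothetical, and it parses the sample message with DocumentBuilderFactory rather than with XMLUtils, so it only demonstrates the DOM node types involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class; it does not use any Mule API.
class CdataNodeTypes {

    static final String XML =
        "<ser:messageTag xmlns:ser=\"http://test.com/\">"
        + "<ser:message>normalTEXT<![CDATA[<Info>Test</Info>]]>normalTEXT</ser:message>"
        + "</ser:messageTag>";

    // Parses with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // false is the default: CDATA sections stay separate nodes.
            factory.setCoalescing(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    // The check used in the original Main class.
    static boolean isTextByNodeType(Node node) {
        return node.getNodeType() == Node.TEXT_NODE;
    }

    // The check suggested in the reply; CDATASection extends Text.
    static boolean isTextByInstanceof(Node node) {
        return node instanceof Text;
    }

    public static void main(String[] args) {
        Node message = parse(XML).getDocumentElement().getFirstChild();
        NodeList children = message.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            System.out.println(child.getNodeName()
                + " nodeTypeCheck=" + isTextByNodeType(child)
                + " instanceofCheck=" + isTextByInstanceof(child));
        }
        // getTextContent() concatenates plain text and CDATA children.
        System.out.println(message.getTextContent());
    }
}
```

With a tree built this way, getTextContent() on ser:message includes the CDATA content, so comparing this output with the result of XMLUtils.toW3cDocument can help tell whether the CDATA node is missing from the tree or merely filtered out by the printing loop.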

Chris Campos

Apr 4, 2011, 2:53:59 PM
to d...@mule.codehaus.org
One way I have been successful in mapping XML (in other ESBs) is using a visual GUI. There is one in Data Integrator for Eclipse. Have you tried it out? I'll probably give it a run later.

Chris
