How to Improve XML parsing performance


Subzerothegreat

Dec 3, 2007, 11:37:09 AM
to Google Web Toolkit
I have some simple XML data that is output by a PHP server. The XML has
the following format (don't ask why, but this is the schema I get),
e.g.:
<data>
<row>
<Address1> 149 XYZ street </Address1>
<Address2> ST JPY </Address2>
<Address3> XML WAY </Address3>
<Address4>HEllo WOrld </Address4>
<City> LONDON</City>
<County> LONDON</County>
<Pcode1> E1 </Pcode1>
<Pcode1> W1 </Pcode1>
</row>
</data>


Now I am displaying a table and everything works fine. However, the problem
is performance: with 200 or 300 records it takes ages to process them,
literally more than 20 seconds. I am using the DOM parser to get the
children and a FlexTable to display the data.
I have found that when I iterate through each child, the program
really slows down and hogs memory. Is there a SAX or simpler
way to do it than the DOM approach?


Cheers
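For reference, this is roughly what the event-driven (SAX) style asked about looks like on a standard JVM. It is only a sketch: GWT's client-side JRE emulation does not include javax.xml or org.xml.sax, so code like this runs on a server or in a plain Java program, not in compiled GWT client code. The class and method names (SaxRowReader, readRows) are illustrative; the element names come from the sample above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRowReader {
    /** Builds one field-name -> text map per <row>, without building a DOM tree. */
    static class RowHandler extends DefaultHandler {
        final List<Map<String, String>> rows = new ArrayList<>();
        private Map<String, String> currentRow;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {
                currentRow = new LinkedHashMap<>();
            }
            text.setLength(0); // start collecting text for the element just opened
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // may be called several times for one text run, so accumulate
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("row".equals(qName)) {
                rows.add(currentRow);
                currentRow = null;
            } else if (currentRow != null) {
                // repeated fields such as the two <Pcode1> elements are joined with a comma
                currentRow.merge(qName, text.toString().trim(), (a, b) -> a + "," + b);
            }
            text.setLength(0);
        }
    }

    public static List<Map<String, String>> readRows(String xml) {
        try {
            RowHandler handler = new RowHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><row><Address1> 149 XYZ street </Address1>"
                + "<Pcode1> E1 </Pcode1><Pcode1> W1 </Pcode1></row></data>";
        System.out.println(readRows(xml)); // [{Address1=149 XYZ street, Pcode1=E1,W1}]
    }
}
```

The parser only ever holds the current row in memory, which is the main difference from walking a fully built DOM.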

Reinier Zwitserloot

Dec 3, 2007, 1:17:27 PM
to Google Web Toolkit
Not really. 200 to 300 is too much; split it up and paginate. JS can't
handle this in general; it's not really a GWT problem. You can try and paginate
through your data (e.g. get a big sack of JSON, eval() it, and work
through it). GWT doesn't support this natively, but some JSNI magic
can help out. Still, if you're going to go through that kind of
trouble, pagination is a better idea.
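Pagination would ideally happen on the server, one page per request. Since the original poster does not control the server, a client-side variant is to slice the parsed rows and only build table rows for the page currently shown. A minimal sketch of that slicing follows; Paginator, page and pageCount are hypothetical names, and it uses modern Java generics, which GWT 1.4-era client code (like the example later in this thread) would have to drop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Paginator {
    /** Returns the rows of one 0-based page, or an empty list past the end. */
    public static <T> List<T> page(List<T> rows, int pageIndex, int pageSize) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("need pageSize > 0 and pageIndex >= 0");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int start = (int) from;
        int end = (int) Math.min(from + pageSize, rows.size());
        // copy so the page does not depend on later changes to the full list
        return new ArrayList<>(rows.subList(start, end));
    }

    /** Number of pages needed for totalRows rows (ceiling division). */
    public static int pageCount(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalRows + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 270; i++) {
            rows.add(i);
        }
        System.out.println(pageCount(rows.size(), 50)); // 6
        System.out.println(page(rows, 5, 50).size());   // 20
    }
}
```

With 50 rows per page, only 50 table rows ever exist at once, regardless of how many records the server returns.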

L Frohman

Dec 3, 2007, 2:00:18 PM
to Google-We...@googlegroups.com
If it works for your case, you can just use String manipulation to
process the XML string, for example:

public String getXmlTag(String xml, String tagName) {
    String beginTag = "<" + tagName + ">";
    int beginIndex = xml.indexOf(beginTag);
    if (beginIndex == -1)
        return null;
    int contentStart = beginIndex + beginTag.length();
    String endTag = "</" + tagName + ">";
    // look for the end tag only after the start tag
    int endIndex = xml.indexOf(endTag, contentStart);
    if (endIndex == -1)
        return null; // unclosed tag
    return xml.substring(contentStart, endIndex);
}

Then:
// tag names are case-sensitive: "Address1", not "address1"
String address1 = getXmlTag(xmlString, "Address1");
This is fast even for large files.
One problem: if, for some reason, you have something like
<Pcode1> old="</Pcode1>" E1 </Pcode1>
or some other weird case where the exact end tag text is
duplicated before the real end tag, it won't work.
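Note that getXmlTag returns only the first match, so with 200 to 300 rows it needs a starting offset. Below is a minimal sketch of a variant that collects every occurrence, under the same assumptions (no attributes on the tag, no nesting of the same tag, no CDATA sections or entities to decode). getAllXmlTags is a hypothetical name. It only uses String and ArrayList methods, so the same idea should carry over to GWT client code, minus the generics for GWT 1.4.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlStringScanner {
    /**
     * Returns the trimmed text of every <tagName>...</tagName> element, in
     * document order. Assumes plain tags: no attributes, no nesting of the
     * same tag, no CDATA sections or entities that would need decoding.
     */
    public static List<String> getAllXmlTags(String xml, String tagName) {
        String beginTag = "<" + tagName + ">";
        String endTag = "</" + tagName + ">";
        List<String> values = new ArrayList<>();
        int from = 0;
        while (true) {
            int begin = xml.indexOf(beginTag, from);
            if (begin == -1) {
                break; // no more occurrences
            }
            int contentStart = begin + beginTag.length();
            int end = xml.indexOf(endTag, contentStart); // search only after this start tag
            if (end == -1) {
                break; // unclosed tag: stop instead of throwing
            }
            values.add(xml.substring(contentStart, end).trim());
            from = end + endTag.length(); // resume scanning after this element
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<data><row><Pcode1> E1 </Pcode1><Pcode1> W1 </Pcode1></row>"
                + "<row><Pcode1> N7 </Pcode1></row></data>";
        System.out.println(getAllXmlTags(xml, "Pcode1")); // [E1, W1, N7]
    }
}
```

Each call makes a single left-to-right pass over the string, so the cost grows linearly with the document size.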

mP

Dec 3, 2007, 4:38:12 PM
to Google Web Toolkit
Firstly, why not send back JSON? It is more native to JavaScript than
XML and can be processed more efficiently using eval().

Subzerothegreat

Dec 4, 2007, 4:28:12 AM
to Google Web Toolkit
Well, I would love to use JSON, but I am not in charge of what the
server side returns. Although I get the XML as a JSON string, that is
about it.
Now I will try the string-parsing approach first and then let you
guys know the results.

Peter Blazejewicz

Dec 4, 2007, 7:26:46 PM
to Google Web Toolkit
Hi,

wouldn't it be better if you told us how you tested that? In hosted
mode or in real browsers?
Just for reference, I've made a quick test using your data (XML); here
is an example.
It does not display an HTML table at the end but simply puts DIVs with
text data on the page, because you've concentrated on parsing speed.
However, it does convert the raw data into VO objects.

XML FILE:
consists of your example data copied 270 times (270 rows)
<data>
<row>
<Address1> 149 XYZ street </Address1>
<Address2> ST JPY </Address2>
<Address3> XML WAY </Address3>
<Address4>HEllo WOrld </Address4>
<City> LONDON</City>
<County> LONDON</County>
<Pcode1> E1 </Pcode1>
<Pcode1> W1 </Pcode1>
</row>
<!-- the remaining 269 rows of the same data for tests -->
<row>
.......
</data>


Implementation (please feel free to improve it; it uses incremental
commands to spread the work out and keep the GUI responsive):


package com.mycompany.project.client;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.core.client.GWT;
import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;
import com.google.gwt.user.client.Command;
import com.google.gwt.user.client.DeferredCommand;
import com.google.gwt.user.client.IncrementalCommand;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.ClickListener;
import com.google.gwt.user.client.ui.Label;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.Widget;
import com.google.gwt.xml.client.Document;
import com.google.gwt.xml.client.Element;
import com.google.gwt.xml.client.Node;
import com.google.gwt.xml.client.NodeList;
import com.google.gwt.xml.client.XMLParser;

/**
 * Entry point classes define <code>onModuleLoad()</code>.
 */
public class TestModule implements EntryPoint {
    /**
     * This class works as a JavaBean for values read from XML.
     */
    public class AddressVO {
        public String addressLineFour;
        public String addressLineOne;
        public String addressLineThree;
        public String addressLineTwo;
        public String city;
        public String county;
        public List/* <String> */postalCode;

        public AddressVO() {
        }

        /* @Override */
        public String toString() {
            StringBuffer asString = new StringBuffer();
            asString.append("AddressVO:");
            asString.append("Address lines: ").append(addressLineOne);
            asString.append(" ").append(addressLineTwo).append(" ");
            asString.append(addressLineThree).append(" ").append(addressLineFour);
            asString.append(" City:").append(city).append(" County:").append(county);
            asString.append(" Postal codes: ");
            if (postalCode != null) {
                Iterator iter = postalCode.iterator();
                while (iter.hasNext()) {
                    asString.append((String) iter.next()).append(", ");
                }
            } else {
                asString.append("n/d");
            }
            return asString.toString();
        }
    }

    /**
     * This class loads the local XML file and passes the result to the XML
     * processing command.
     */
    class LoadXMLDataCommand implements Command, RequestCallback, ClickListener {
        long startTime;

        /* @Override */
        public void execute() {
            startTime = System.currentTimeMillis();
            RootPanel.get().add(new Label("loading xml ..."));
            RequestBuilder.Method method = RequestBuilder.GET;
            String url = GWT.getModuleBaseURL() + "data.xml";
            RequestBuilder rb = new RequestBuilder(method, url);
            try {
                rb.sendRequest(null, this);
            } catch (RequestException e) {
                RootPanel.get().add(new Label("cannot load xml: " + e.getMessage()));
            }
        }

        /* @Override */
        public void onClick(Widget sender) {
            RootPanel.get().remove(clickMeButton);
            DeferredCommand.addCommand(this);
        }

        /* @Override */
        public void onError(Request request, Throwable exception) {
            RootPanel.get().add(new Label("Error loading xml: " + exception.getMessage()));
        }

        /* @Override */
        public void onResponseReceived(Request request, Response response) {
            final String xmlString = response.getText();
            final long endTime = System.currentTimeMillis() - startTime;
            RootPanel.get().add(new Label("XML loaded in " + endTime + "ms"));
            IncrementalCommand parsingCommand = new XMLParsingCommand(xmlString);
            DeferredCommand.addCommand(parsingCommand);
        }
    }

    /**
     * This class parses XML node lists (rows) into VO objects and fills the
     * application-defined List variable with the results.
     */
    class XMLNodeParsingCommand implements IncrementalCommand {
        /**
         * Helper class that parses NodeList values into a VO.
         */
        class AddressVOBuilder {
            private static final String ADDRESS_FOUR_NODE = "Address4";
            private static final String ADDRESS_ONE_NODE = "Address1";
            private static final String ADDRESS_THREE_NODE = "Address3";
            private static final String ADDRESS_TWO_NODE = "Address2";
            private static final String CITY_NODE = "City";
            private static final String COUNTY_NODE = "County";
            private static final String POSTAL_CODE_NODE = "Pcode1";

            AddressVO buildVOFromNodes(final NodeList nodes) {
                AddressVO address = new AddressVO();
                int numberOfNodes = nodes.getLength();
                String nodeName;
                String nodeValue;
                Node currentNode;
                for (int i = 0; i < numberOfNodes; i++) {
                    currentNode = nodes.item(i);
                    // we are not interested in whitespace text nodes
                    if (currentNode.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    nodeName = currentNode.getNodeName();
                    // we are not interested in empty nodes (invalid data)
                    if (currentNode.getFirstChild() == null) {
                        continue;
                    }
                    nodeValue = currentNode.getFirstChild().getNodeValue();
                    if (ADDRESS_ONE_NODE.equals(nodeName)) {
                        address.addressLineOne = nodeValue;
                    } else if (ADDRESS_TWO_NODE.equals(nodeName)) {
                        address.addressLineTwo = nodeValue;
                    } else if (ADDRESS_THREE_NODE.equals(nodeName)) {
                        address.addressLineThree = nodeValue;
                    } else if (ADDRESS_FOUR_NODE.equals(nodeName)) {
                        address.addressLineFour = nodeValue;
                    } else if (CITY_NODE.equals(nodeName)) {
                        address.city = nodeValue;
                    } else if (COUNTY_NODE.equals(nodeName)) {
                        address.county = nodeValue;
                    } else if (POSTAL_CODE_NODE.equals(nodeName)) {
                        if (address.postalCode == null) {
                            address.postalCode = new ArrayList/* <String> */();
                        }
                        address.postalCode.add(nodeValue);
                    }
                }
                return address;
            }
        }

        AddressVO address;
        AddressVOBuilder addressVOBuilder;
        int currentRow = 0;
        int numberOfRows;
        final Element root;
        NodeList rowItems;
        NodeList rowsNodes;

        long startTime;

        public XMLNodeParsingCommand(final Element root) {
            this.root = root;
        }

        /* @Override */
        public boolean execute() {
            if (rowsNodes == null) {
                startTime = System.currentTimeMillis();
                rowsNodes = root.getChildNodes();
                numberOfRows = rowsNodes.getLength();
                return true;
            }
            if (currentRow < numberOfRows) {
                if (addresses == null) {
                    addresses = new ArrayList/* <AddressVO> */();
                }
                // whitespace text nodes between <row> elements have no children
                // and are skipped by the length check below
                rowItems = rowsNodes.item(currentRow).getChildNodes();
                if (rowItems != null && rowItems.getLength() > 0) {
                    if (addressVOBuilder == null) {
                        addressVOBuilder = new AddressVOBuilder();
                    }
                    address = addressVOBuilder.buildVOFromNodes(rowItems);
                    addresses.add(address);
                }
                currentRow++;
                return true;
            }
            long endTime = System.currentTimeMillis() - startTime;
            RootPanel.get().add(new Label("XML data parsed in " + endTime + "ms"));
            // guard against an empty result before reading the first element
            if (addresses != null && !addresses.isEmpty()) {
                RootPanel.get().add(
                        new Label("Number of rows parsed to VO: " + addresses.size()));
                RootPanel.get().add(
                        new Label("Example VO: " + addresses.get(0).toString()));
            }

            return false;
        }

    }

    /**
     * This class converts the string data into an XML document and then
     * passes the resulting document on to node processing.
     */
    class XMLParsingCommand implements IncrementalCommand {
        private Document document;
        long startTime;
        final String xmlData;

        public XMLParsingCommand(final String xmlData) {
            this.xmlData = xmlData;
        }

        /* @Override */
        public boolean execute() {
            if (document == null) {
                startTime = System.currentTimeMillis();
                document = XMLParser.parse(xmlData);
                return true;
            }
            long endTime = System.currentTimeMillis() - startTime;
            RootPanel.get().add(new Label("XML parsed in " + endTime + "ms"));
            IncrementalCommand nodeParsingCommand = new XMLNodeParsingCommand(
                    document.getDocumentElement());
            DeferredCommand.addCommand(nodeParsingCommand);
            return false;
        }

    }

    /**
     * List of row data as VOs.
     */
    private List/* <AddressVO> */addresses;

    private Button clickMeButton;

    public void onModuleLoad() {
        RootPanel rootPanel = RootPanel.get();
        clickMeButton = new Button("Start parsing");
        clickMeButton.addClickListener(new LoadXMLDataCommand());
        rootPanel.add(clickMeButton);
    }
}



Browser speeds:

Safari 3 (Win):

loading xml ...
XML loaded in 20ms
XML parsed in 1852ms
XML data parsed in 1462ms
Number of rows parsed to VO: 270
Example VO: AddressVO:Address lines: 149 XYZ street ST JPY XML WAY
HEllo WOrld City:LONDON County:LONDON Postal codes: E1, W1,


Firefox (Win):
loading xml ...
XML loaded in 50ms
XML parsed in 20ms
XML data parsed in 2604ms
Number of rows parsed to VO: 270
Example VO: AddressVO:Address lines: 149 XYZ street ST JPY XML WAY
HEllo WOrld City:LONDON County:LONDON Postal codes: E1, W1,

Internet Explorer 7.0 (Win):
loading xml ...
XML loaded in 130ms
XML parsed in 10ms
XML data parsed in 1442ms
Number of rows parsed to VO: 270
Example VO: AddressVO:Address lines: 149 XYZ street ST JPY XML WAY
HEllo WOrld City:LONDON County:LONDON Postal codes: E1, W1,

Wow, Safari and IE are pretty fast.

And in hosted mode:

loading xml ...
XML loaded in 160ms
XML parsed in 50ms
XML data parsed in 21681ms
Number of rows parsed to VO: 270
Example VO: AddressVO:Address lines: 149 XYZ street ST JPY XML WAY
HEllo WOrld City:LONDON County:LONDON Postal codes: E1, W1,


Doesn't "21681ms" sound similar to your 20 seconds?

regards,
Peter
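Peter's XMLNodeParsingCommand converts exactly one row per execute() call, so every row pays whatever per-call overhead the deferred-command scheduling adds. A common variation is to convert a small batch of rows per call. The sketch below is plain Java so it can run outside GWT: IncrementalCommand is a local interface with the same boolean execute() shape as GWT's com.google.gwt.user.client.IncrementalCommand, runToCompletion is a hypothetical stand-in for DeferredCommand's scheduling, and the batch size of 25 is an arbitrary assumption. Whether batching actually helps in a given browser is something to measure, not assume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchedParsing {
    /** Local stand-in with the same shape as GWT's IncrementalCommand. */
    interface IncrementalCommand {
        /** Returns true if there is more work and execute() should be called again. */
        boolean execute();
    }

    /** Converts up to batchSize rows per execute() call instead of exactly one. */
    static class BatchedRowCommand<R, V> implements IncrementalCommand {
        private final List<R> rows;
        private final Function<R, V> converter;
        private final int batchSize;
        private final List<V> results = new ArrayList<>();
        private int next = 0;

        BatchedRowCommand(List<R> rows, Function<R, V> converter, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.rows = rows;
            this.converter = converter;
            this.batchSize = batchSize;
        }

        @Override
        public boolean execute() {
            int stop = Math.min(next + batchSize, rows.size());
            for (; next < stop; next++) {
                results.add(converter.apply(rows.get(next)));
            }
            return next < rows.size(); // true means "call me again later"
        }

        List<V> getResults() {
            return results;
        }
    }

    /** Stand-in for the scheduler: calls execute() until it returns false, counting calls. */
    static int runToCompletion(IncrementalCommand command) {
        int calls = 0;
        boolean more;
        do {
            more = command.execute();
            calls++;
        } while (more);
        return calls;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 270; i++) {
            rows.add("row " + i);
        }
        BatchedRowCommand<String, Integer> command =
                new BatchedRowCommand<>(rows, String::length, 25);
        System.out.println(runToCompletion(command));    // 11
        System.out.println(command.getResults().size()); // 270
    }
}
```

In the GWT version, the converter would play the role of AddressVOBuilder.buildVOFromNodes applied to each row's child NodeList, and the command would be handed to DeferredCommand.addCommand just as Peter does. Larger batches mean fewer calls but longer pauses between UI updates.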


Subzerothegreat

Dec 5, 2007, 9:58:55 AM
to Google Web Toolkit
Thanks for all the replies; they have been really helpful. The DOM
approach is a bit slower than other XML parsing schemes because of its
recursive nature, which consumes a lot of memory. Yes, there is a
difference between hosted mode and compiled mode, but it gives you a
consistent ratio: e.g. if you process 20 lines in 200 ms (hosted) and
20 lines in 50 ms (compiled), then 40 lines would take roughly double
that time (or more) in each mode. Since my data had lots of whitespace,
I had to remove it using XMLParser.removeWhitespace(), and going through
the document took around 7 s in hosted mode. However, calling this
function on smaller chunks instead (one child node per iteration)
reduced that to 2 s. This may be a hack, but it really made a
difference. Using different techniques and the usual variable-declaration
optimizations, I was able to reduce the total time from 14898 ms to
9192 ms, and this makes a difference in compiled mode as well (only a
subtle one, though).
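The per-chunk whitespace trick can be sketched with the JDK's own DOM API so it runs outside GWT: instead of cleaning the whole document up front, each row element is cleaned just before it is read. The removeWhitespace method below is a hand-written stand-in for GWT's XMLParser.removeWhitespace, not its actual implementation, and childCountsAfterCleaning is a hypothetical helper used only to show the effect. The speed-up reported above is the poster's own hosted-mode measurement; this sketch does not demonstrate it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RowWhitespace {
    /** Recursively removes whitespace-only text nodes below the given node. */
    static void removeWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before child is possibly removed
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespace(child);
            }
            child = next;
        }
    }

    /** Cleans each <row> on its own, right before use, and reports its child count. */
    static int[] childCountsAfterCleaning(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList rows = doc.getElementsByTagName("row");
            int[] counts = new int[rows.getLength()];
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                removeWhitespace(row); // only this row's subtree is walked here
                counts[i] = row.getChildNodes().getLength();
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data>\n <row>\n  <City> LONDON</City>\n  <Pcode1> E1 </Pcode1>\n </row>\n</data>";
        System.out.println(Arrays.toString(childCountsAfterCleaning(xml))); // [2]
    }
}
```

Without the cleaning step, the row in main would report 5 children (three whitespace text nodes plus the two elements).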



