XMLParser EndOfFile question


coffeMan

Oct 14, 2011, 10:00:04 AM
to Google Web Toolkit
I am retrieving XML from a servlet and parsing through it. The XML
is one large file delivered through the servlet. I can parse
through it easily and get my results, but I am not sure when to stop.
I keep getting a NullPointerException when the loop reaches the
end. I never know when it is going to end because every file that I
parse is a different length.

I am using the DOM parser:

Document messageDom = XMLParser.parse(srv.getXmlObject());

String name =
messageDom.getElementsByTagName("name").item(n).getFirstChild().getNodeValue();

Here n is an integer variable that increases after each loop iteration.

Thanks

Colin Alworth

Oct 14, 2011, 10:35:18 AM
to google-we...@googlegroups.com
NodeList, the return value of getElementsByTagName, has a method called getLength(). The standard way of iterating through the contents is a for loop whose condition ensures that n never reaches getLength().
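
A minimal sketch of that loop follows. It is an illustration, not the original poster's code: it uses the JDK's org.w3c.dom classes so that it runs outside a browser, on the assumption that GWT's com.google.gwt.xml.client.NodeList offers the same getLength() and item(int) methods. The class name NameReader and the method readNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, named for illustration only.
final class NameReader {

    /** Returns the first child's value for every <name> element, in document order. */
    static List<String> readNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }

        // Look the list up once, then let getLength() bound the loop.
        NodeList names = doc.getElementsByTagName("name");
        List<String> result = new ArrayList<>();
        for (int n = 0; n < names.getLength(); n++) {
            Node firstChild = names.item(n).getFirstChild();
            // An empty element such as <name/> has no child, so guard against null.
            result.add(firstChild == null ? "" : firstChild.getNodeValue());
        }
        return result;
    }
}
```

Because the loop condition compares n with getLength(), the loop ends on its own regardless of how many elements a given file contains.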

coffeMan

Oct 14, 2011, 10:36:31 AM
to Google Web Toolkit
Great, thanks. I'll give it a try.

Jeffrey Chimene

Oct 14, 2011, 10:40:18 AM
to google-we...@googlegroups.com

A couple of questions come to mind:

1) Are you sure the document is valid? Does there exist an XML schema
against which you can test this document instance? If not, you might
consider creating an XML schema, a sample document, and running the pair
through a validating parser such as xmllint. Such validation tests can
be a useful part of your overall product verification/validation regime.

2) Have you considered using GQuery to produce nodelists? For complex,
valid documents it can be a useful tool.

3) Consider using loops controlled by NodeList.getLength() instead of
chaining method calls, builder-style, to walk the tree. In my experience,
loops yield fewer surprises at runtime than chained calls. I
realize there's a "cool factor" to chaining those method calls, but it
usually results in issues such as the one you're now trying to resolve.
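
To illustrate point 3, here is a hedged sketch of the chained expression split into separate steps, each checked before the next is taken. In the W3C DOM, item(n) returns null once n is past the end of the list, which is presumably what triggered the NullPointerException in the original chain. The sketch uses the JDK's org.w3c.dom classes rather than GWT's client-side ones, and the names SafeNodeAccess, parse, and nameAt are hypothetical.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, named for illustration only.
final class SafeNodeAccess {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    /** Returns the value of the index-th <name> element's first child, if there is one. */
    static Optional<String> nameAt(Document doc, int index) {
        NodeList names = doc.getElementsByTagName("name");
        // Step 1: is the index inside the list at all?
        if (index < 0 || index >= names.getLength()) {
            return Optional.empty();
        }
        // Step 2: does the element have a child to read?
        Node firstChild = names.item(index).getFirstChild();
        if (firstChild == null) {
            return Optional.empty();
        }
        // Step 3: the value itself may still be null (e.g. for an element child).
        return Optional.ofNullable(firstChild.getNodeValue());
    }
}
```

Each step that can fail is visible on its own line, so a failure points at one specific assumption about the document instead of at a long chain.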

Good luck,
jec

coffeMan

Oct 14, 2011, 12:32:51 PM
to Google Web Toolkit
I got the problem resolved. I am parsing over 11,000 different
file types, and it is going slowly with the DOM XML parser. Any ideas on
how to improve performance?

I cannot think of any other way to parse it.

Jeff Chimene

Oct 14, 2011, 9:51:11 PM
to google-we...@googlegroups.com
On 10/14/2011 09:32 AM, coffeMan wrote:
> I got the solution resolved.....i am parsing over 11,000 different
> file types...it is going slow using the DOM Xml Parser...any ideas on
> how to improve performance?
>
> I cannot think of any other way to parse it

Well, first things first: let's clear up the terminology. I think you
mean 11x10^3 different documents, all conforming to the same schema.
You only have /one/ document type: XML.

Please correct my impression otherwise.

Short answer: Form a NodeList of "interesting" leaf nodes, and don't
worry about the path from the document root to each leaf.

Long answer follows.

11x10^3 different documents is not unusual in a production environment.
For example, consider the single DocBook schema, and the count of
documents derived from that single schema.

Apparently, all you know is that the current document is well-formed.
You do not know if it's valid. Some might argue that you do not even
know if the document is well-formed, but let's assume the document was
produced mechanically, and that all elements, attributes, and PCDATA are
well-formed.

So, you should only write code that relies on the document's physical
structure, not its logical structure.
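
The difference between well-formed and valid can be sketched on the server side with the JDK's own XML APIs. This is only an illustration under assumptions: the class XmlChecks and its two methods are hypothetical, and the schema is passed in as a string because no schema for the poster's documents is known.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, named for illustration only.
final class XmlChecks {

    /** Well-formed: the text parses as XML at all (tags balanced, one root, and so on). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Valid: the document also obeys the rules of the given W3C XML Schema. */
    static boolean isValid(String xml, String xsd) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A document can pass the first check and fail the second, which is why code that sees only well-formed input should not assume that any particular element is present.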

I think that the best you can do is to treat the document as a "flat
space". Go directly to the child nodes of interest. There's probably
nothing to gain by parsing the document as though it were a tree (which
it is, I know...). In other words, given what little I know about your
specific problem, I believe you are probably just interested in leaf
nodes. So, form a NodeList of those leaf nodes, and don't worry about
the path from the document root to each leaf. The leaf nodes in the list
will probably have different parents, but I don't think that matters in
this instance.
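
The "flat space" approach might look like the sketch below. It is written against the JDK's org.w3c.dom API as a stand-in for GWT's client-side DOM classes, and the names LeafCollector, parse, and collect are hypothetical. It also looks each NodeList up once per tag rather than once per loop iteration; repeating the lookup inside the loop, as the original expression does, is one plausible contributor to the slowness, although that has not been measured here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, named for illustration only.
final class LeafCollector {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    /** Collects leaf values per tag name, ignoring where each leaf sits in the tree. */
    static Map<String, List<String>> collect(Document doc, String... tagNames) {
        Map<String, List<String>> valuesByTag = new LinkedHashMap<>();
        for (String tag : tagNames) {
            NodeList leaves = doc.getElementsByTagName(tag); // one lookup per tag
            int count = leaves.getLength();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                Node firstChild = leaves.item(i).getFirstChild();
                if (firstChild != null) { // skip empty leaves
                    values.add(firstChild.getNodeValue());
                }
            }
            valuesByTag.put(tag, values);
        }
        return valuesByTag;
    }
}
```

The leaves collected for one tag may come from different parents at different depths; the result simply lists them in document order.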

Forget my earlier advice about GQuery. It's probably overkill, given
what little I know about the problem you're trying to solve.

Good luck,
jec

J.Ganesan

Oct 17, 2011, 8:33:17 AM
to Google Web Toolkit
An alternative is to use JAXB on the server side, convert the XML
documents into an object hierarchy, and fetch the objects to the client
by RPC. It is likely to be much faster, since the client no longer parses
XML strings. Besides, you get first-class objects on the client side.

J.Ganesan
www.DataStoreGwt.com
Persist objects directly in App Engine

Ahmet Dakoglu

Oct 18, 2011, 2:40:15 AM
to google-we...@googlegroups.com
Jaxb +1

--
Ahmet DAKOĞLU