[groovy-user] Is XmlSlurper buggy when using replaceBody?


Luis Muniz

Feb 12, 2012, 5:48:03 PM
to us...@groovy.codehaus.org
Hi,

I am using Groovy 1.7.8 and seeing really strange behaviour when manipulating a GPathResult.

I have tried to find solutions on the interwebs, but they all seem to use XmlParser for some reason. I don't have a choice; I have to use XmlSlurper.

So this is an example script that shows the strange behaviour:


import groovy.xml.*


def xml='''
<a>
    <b>Hello</b>
    <list>
        <c>1</c> <c>2</c> <c>3</c> <c>4</c>
        </list>
</a>
'''

def n=new XmlSlurper().parseText(xml)

println "initial XML"
println XmlUtil.serialize(n)


n.list.replaceBody('')


println "after replaceBody"
println XmlUtil.serialize(n)

n.list << new XmlSlurper().parseText('<c>5</c>')

println "after adding a <c>"
println XmlUtil.serialize(n)

println "Printing children of <list> (it is empty?!)"
n.list.children().each{println it}


n=new XmlSlurper().parseText(XmlUtil.serialize(n))

println "After re-parsing"
println XmlUtil.serialize(n)

println "Printing children of <list>"
n.list.children().each{println it}


So basically I want to replace the contents of the <list> element with my own.

The output of the script above is as follows:

initial XML
<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>Hello</b>
  <list>
    <c>1</c>
    <c>2</c>
    <c>3</c>
    <c>4</c>
  </list>
</a>

after replaceBody
<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>Hello</b>
  <list/>
</a>

after adding a <c>
<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>Hello</b>
  <list>
    <c>5</c>
  </list>
</a>

Printing children of <list> (it is empty?!)
After re-parsing
<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>Hello</b>
  <list>
    <c>5</c>
  </list>
</a>

Printing children of <list>
5


So XmlUtil.serialize does print out the correct XML, but whenever I try to traverse the GPathResult with the GPath methods (children(), etc.) I don't get the correct result. I think the internal structure of the tree is somehow corrupted.


If anyone has solved this in the past or can explain why this is not a bug but a weird side effect, please help.

Luis

Paul King

Feb 12, 2012, 7:37:33 PM
to us...@groovy.codehaus.org

This is by-design behavior for XmlSlurper. Its internal structures
are mostly immutable and queries and updates are carried out in
a lazy fashion. To cut a long story short, when you attempt to
make changes, rather than actually making them on the immutable
tree structure, it simply remembers the changes you want to make.
Eventually when you have finished all of your changes you serialize
it and all the changes are made at once - and if you re-parse you
then have another (mostly) immutable tree data type.
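
A minimal sketch of that "record now, apply on serialize" behaviour, assuming a cut-down version of the XML from the original post. The variable names root and updated are only illustrative. The import groovy.xml.* line is there for XmlUtil (and for XmlSlurper on newer Groovy versions, where it lives in groovy.xml).

```groovy
import groovy.xml.*

def root = new XmlSlurper().parseText('<a><b>Hello</b><list><c>1</c><c>2</c></list></a>')

// Record a replacement for <list>; the parsed tree itself is not rewritten here.
root.list.replaceNode {
    list {
        c(5)
    }
}

// GPath queries on the original result still see the old content.
assert root.list.c.size() == 2

// Serializing applies the recorded change; re-parsing gives a tree that reflects it.
def updated = new XmlSlurper().parseText(XmlUtil.serialize(root))
assert updated.list.c.size() == 1
assert updated.list.c.text() == '5'
```

The round trip through XmlUtil.serialize is the same step the original script performs under "After re-parsing".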

Incidentally you can also use replaceNode in your example:

n.list.replaceNode {
    list {
        c(5)
    }
}

And of course, it too would need to be serialized before you would
see the changes. XmlParser and DOMCategory update the tree directly.
So they are your options if you aren't in a position to re-serialize.
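
For comparison, a rough sketch of the XmlParser route, assuming you are free to switch parsers (the original poster said he was not). XmlParser yields groovy.util.Node objects whose child lists can be edited in place, so the change is visible without a serialize and re-parse step. The name listNode is only illustrative.

```groovy
import groovy.xml.*

def root = new XmlParser().parseText('''
<a>
    <b>Hello</b>
    <list>
        <c>1</c> <c>2</c> <c>3</c> <c>4</c>
    </list>
</a>
''')

def listNode = root.list[0]

// Drop the existing <c> children, then add the new one directly to the tree.
listNode.children().clear()
listNode.appendNode('c', '5')

// The change can be queried immediately.
assert listNode.children().size() == 1
assert listNode.c[0].text() == '5'

println XmlUtil.serialize(root)
```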

Cheers, Paul.




Luis Muniz

Feb 13, 2012, 3:45:14 AM
to us...@groovy.codehaus.org
Hi Paul,

Thanks for the quick reply! Good to know that messing with an XmlSlurper tree via the GPathResult API is not a free lunch.
I understand and appreciate that the parser API tries to delay getting its hands dirty for as long as possible (that's a killer feature),
but it seems the GPath API is not able to analyze a mutated tree.

So XmlUtil.serialize finds the 'right' way to traverse the tree, but GPathResult and its compañeros are not able to?
I looked at the source code and, man, is this bit complicated.
It is not often that I throw my arms up in the air and give up trying to understand a portion of code...

Thanks anyway