golang based xml manipulation package


Tong Sun

Nov 27, 2015, 3:23:06 PM
to golang-nuts

Echoing the request from http://stackoverflow.com/questions/31104777/golang-xml-processing, but need more manipulating power. 


I often found myself wanting to generate or process an XML document without having to rely on marshaling or encoding/decoding logic. 

However, I don't know if beevik's etree can answer my request. My actual request is much more complicated to describe, so I am using the Stack Overflow question above to simplify it. The simplified version of my request is:

I am trying to process XML files with a complicated structure in Go, change the values of a couple of nodes, and save the result to a modified file while preserving the rest. For example:


<description>
    <title-info>
        <genre>Comedy</genre>
        <author>
            <first-name>Kevin</first-name>
            <last-name>Smith</last-name>
        </author>
        <movie-title>Clerks</movie-title>
        <annotation>
            <p>!!!</p>
        </annotation>
        <keywords>comedy,jay,bob</keywords>
        <date></date>
    </title-info>
</description>

And many more fields. 


I would like to change the node:


<author>
    <first-name effect_range="1999-2011">Sam</first-name>
    <first-name effect_range="2012-">Kevin</first-name>
    <last-name>Smith</last-name>
    <full-name></full-name>
</author>


to


<author>
    <first-name effect_range="1999-2011">Sam</first-name>
    <first-name effect_range="2012-">Kevin</first-name>
    <last-name>Smith</last-name>
    <full-name>Kevin Smith</full-name>
</author>


I.e., I want to manipulate one XML node according to XML nodes met earlier, or according to node attributes somewhere above it. However, since the files have a massive number of tags, I really don't want to describe the complete structure.


I know etree (https://github.com/beevik/etree) puts a DOM on top of the standard library's XML processing and has a basic XPath syntax for selecting nodes. However, is it possible for me to use etree to:


  • locate all the <full-name> nodes
  • update the node according to values from the nearest nodes (of first-name, last-name, etc)
If not, can some other package (e.g., https://github.com/PuerkitoBio/goquery) do it?

Thanks



Tamás Gulácsi

Nov 27, 2015, 4:47:55 PM
to golang-nuts
Why not just iterate over the tokens, echo what you don't want to change, store what you need, and echo a modified token when you reach that specific tag?
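
The suggestion above can be sketched with only the standard library. This is a minimal illustration, not code from the thread: the helper name fillFullName is hypothetical, it assumes no namespaces, and it picks the most recent first-name in the enclosing author element. Because xml.Encoder re-serializes every token, the output is equivalent XML rather than a byte-for-byte copy (for example, tabs and quotes in text come out as character references).

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// fillFullName (hypothetical helper) copies src token by token and rewrites
// the text of every <full-name> element from the most recent <first-name>
// and <last-name> seen inside the same <author>.
func fillFullName(src string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out bytes.Buffer
	enc := xml.NewEncoder(&out)

	var first, last string // text remembered from earlier tokens
	var current string     // element just opened; cleared on any end tag
	inFullName := false    // true between <full-name> and </full-name>

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			current = t.Name.Local
			switch current {
			case "author":
				first, last = "", "" // forget values from a previous author
			case "full-name":
				inFullName = true
			}
		case xml.EndElement:
			if t.Name.Local == "full-name" {
				// Emit the computed text just before the closing tag.
				if err := enc.EncodeToken(xml.CharData(first + " " + last)); err != nil {
					return "", err
				}
				inFullName = false
			}
			current = ""
		case xml.CharData:
			if inFullName {
				continue // drop the old text of <full-name>
			}
			switch current {
			case "first-name":
				first = string(t) // string() copies the decoder's buffer
			case "last-name":
				last = string(t)
			}
		}
		// Everything that was not dropped is echoed unchanged.
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const src = `<author>
    <first-name effect_range="1999-2011">Sam</first-name>
    <first-name effect_range="2012-">Kevin</first-name>
    <last-name>Smith</last-name>
    <full-name></full-name>
</author>`
	out, err := fillFullName(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Selecting by effect_range instead of "latest wins" would mean inspecting t.Attr in the StartElement case; the rest of the loop would stay the same.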

Tong Sun

Nov 27, 2015, 8:45:55 PM
to golang-nuts
On Fri, Nov 27, 2015 at 4:47 PM, Tamás Gulácsi <tgula...@gmail.com> wrote:
Why not just iterate over the tokens, echo what you don't want to change, store what you need, and echo a modified token when you reach that specific tag?

Yeah, that's exactly what I thought to be the best approach, but the question is, how? -- I don't see any Go packages that allow me to do XML DOM-level manipulation the way I want. Or are you talking about something entirely different?

I meant, check over here,
https://groups.google.com/d/msg/golang-nuts/v9SDlW3kDeo/elIa_3JyDQAJ

when I'm at a certain token, and want to print out the entire raw xml as-is, the standard encoding/xml lib doesn't even allow me to. 
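
One possible workaround for the raw-output problem, offered as a sketch under assumptions rather than as something proposed in this thread: Decoder.InputOffset reports where the next token begins, so if the whole document is held in memory, the original bytes of an element can be sliced out of the source instead of being re-encoded. The helper name rawElement is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rawElement (hypothetical helper) returns the original, unmodified text of
// the first element with the given local name, including its children.
func rawElement(src, name string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		start := dec.InputOffset() // offset where the next token begins
		tok, err := dec.Token()
		if err != nil {
			return "", err // io.EOF here means the element was not found
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == name {
			// Skip consumes everything up to the matching end element.
			if err := dec.Skip(); err != nil {
				return "", err
			}
			return src[start:dec.InputOffset()], nil
		}
	}
}

func main() {
	const src = `<a><b x='1'>hi <i>there</i></b><c/></a>`
	raw, err := rawElement(src, "b")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(raw)
}
```

The same offsets could be used to copy untouched regions of the input verbatim and splice in only the modified parts, at the cost of keeping the source in memory.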


Tamás Gulácsi

Nov 28, 2015, 12:22:10 PM
to golang-nuts
I thought that the standard library's encoding/xml.Decoder.Token() would work, calling it in a loop.

Tong Sun

Nov 28, 2015, 1:28:40 PM
to Tamás Gulácsi, golang-nuts
Nah, I guess you never tried it yourself, and just thought it would work.

Check out this code
and the whole thread over here,


I.e., looping over xml.Decoder.Token() won't cut it. Moreover, it doesn't even allow me to print out the entire raw XML as-is.





On Sat, Nov 28, 2015 at 12:22 PM, Tamás Gulácsi <tgula...@gmail.com> wrote:
I thought that the standard library's encoding/xml.Decoder.Token() would work, calling it in a loop.


Gulácsi Tamás

Nov 28, 2015, 1:47:26 PM
to Tong Sun, golang-nuts
Exactly :)

But http://play.golang.org/p/2oHEoq_PcH seems to do what I want: it just parses and outputs everything. It's rough in a lot of places, but from here, adding namespace handling and the filtering logic would be easy.

Tong Sun

Nov 28, 2015, 3:13:15 PM
to Gulácsi Tamás, golang-nuts

On Sat, Nov 28, 2015 at 1:46 PM, Gulácsi Tamás wrote:

But http://play.golang.org/p/2oHEoq_PcH seems to do what I want: just parses & outputs everything.

Oops, silly me. Thanks. That's working fine.

It's rough in a lot of places, but from here, adding namespace handling and the filtering logic would be easy.

Adding the namespace handling seems to be a challenging part to me. I looked through my http://localhost:6060/pkg/encoding/xml/, and found only one spot mentioning "namespace", 

If the XMLName field has an associated tag of the form "name" or "namespace-URL name", the XML element must have the given name...

As such, the code here is totally broken because of the lack of namespace handling.

Do you think it is an easy fix please? 
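
For context on what the standard library does with namespaces, here is a small sketch (the helper name firstStart and the sample SOAP envelope are illustrative assumptions). Decoder.Token resolves a prefix to its namespace URL in xml.Name.Space, while Decoder.RawToken leaves the prefix itself in Space, so a pass-through program has to decide which of the two views it echoes.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// firstStart (hypothetical helper) returns the name of the first start
// element in src, read either with Token (prefix resolved to its namespace
// URL) or with RawToken (prefix kept as written).
func firstStart(src string, raw bool) (xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		var tok xml.Token
		var err error
		if raw {
			tok, err = dec.RawToken()
		} else {
			tok, err = dec.Token()
		}
		if err != nil {
			return xml.Name{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Name, nil
		}
	}
}

func main() {
	const src = `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body/></soap:Envelope>`
	resolved, err := firstStart(src, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	rawName, err := firstStart(src, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Token:    Space=%q Local=%q\n", resolved.Space, resolved.Local)
	fmt.Printf("RawToken: Space=%q Local=%q\n", rawName.Space, rawName.Local)
}
```

Matching on the resolved URL is the robust way to find elements, since documents are free to choose any prefix for the same namespace.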

Thanks

David Arroyo

Dec 28, 2015, 8:27:27 PM
to Tong Sun, golang-nuts
On Nov 28, 2015, at 1:12 PM, Tong Sun <sunto...@gmail.com> wrote:

Adding the namespace handling seems to be a challenging part to me. I looked through my http://localhost:6060/pkg/encoding/xml/, and found only one spot mentioning "namespace", 

There are other more powerful packages, but I wrote aqwari.net/xml/xmltree, which may do what you need. I wrote it for the purpose of writing a set of packages for working with XML Schema, WSDL, and SOAP. Specifically, it handles namespaces “correctly”, to the best of my understanding of the XML spec. One thing that was difficult with encoding/xml alone was decoding QNames in attribute values, e.g. the “type” attribute in

<myelem type="ns1:int">3</myelem>

This does not seem to be too important for your use case, but this should do what you need; see http://play.golang.org/p/Cb1fvUxieE . For more examples, see the documentation for Search and SearchFunc: https://godoc.org/aqwari.net/xml/xmltree#Element.SearchFunc
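
To illustrate why QNames in attribute values are awkward with encoding/xml alone: the decoder resolves prefixes in element and attribute names, but an attribute value such as ns1:int is plain text to it, so the caller has to track the xmlns declarations. The sketch below is an assumption-laden illustration, not how xmltree does it; the helper name resolveTypeAttrs and the sample namespace URL are made up for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// resolveTypeAttrs (hypothetical helper) returns, for every unprefixed
// "type" attribute in src, its value expanded to a namespace-qualified name.
// Simplification: prefix bindings live in one flat map, so a prefix declared
// on a nested element stays bound after that element ends. A fuller version
// would push and pop bindings per element.
func resolveTypeAttrs(src string) ([]xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	prefixes := map[string]string{} // prefix -> namespace URL
	var out []xml.Name
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		// Token reports xmlns:p="u" as an attribute with Space "xmlns".
		for _, a := range start.Attr {
			if a.Name.Space == "xmlns" {
				prefixes[a.Name.Local] = a.Value
			}
		}
		for _, a := range start.Attr {
			if a.Name.Space != "" || a.Name.Local != "type" {
				continue
			}
			i := strings.Index(a.Value, ":")
			if i < 0 {
				out = append(out, xml.Name{Local: a.Value})
				continue
			}
			out = append(out, xml.Name{Space: prefixes[a.Value[:i]], Local: a.Value[i+1:]})
		}
	}
}

func main() {
	const src = `<root xmlns:ns1="http://www.w3.org/2001/XMLSchema"><myelem type="ns1:int">3</myelem></root>`
	names, err := resolveTypeAttrs(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Printf("Space=%q Local=%q\n", n.Space, n.Local)
	}
}
```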

cheers,
-David

Tong Sun

Dec 28, 2015, 10:25:50 PM
to David Arroyo, golang-nuts


On Wed, Dec 23, 2015 at 10:21 PM, David Arroyo  wrote:

On Nov 28, 2015, at 1:12 PM, Tong Sun wrote:

Adding the namespace handling seems to be a challenging part to me. I looked through my http://localhost:6060/pkg/encoding/xml/, and found only one spot mentioning "namespace", 

There are other more powerful packages, but I wrote aqwari.net/xml/xmltree, which may do what you need. I wrote it for the purpose of writing a set of packages for working with XML Schema, WSDL, and SOAP. Specifically, it handles namespaces “correctly”, to the best of my understanding of the XML spec.

Super!
 
One thing that was difficult with encoding/xml alone was decoding QNames in attribute values, e.g. the “type” attribute in

<myelem type="ns1:int">3</myelem>

This does not seem to be too important for your use case,

Yes, I deal with SOAP too, so I very much need it.

this should do what you need; see http://play.golang.org/p/Cb1fvUxieE . For more examples, see the documentation for Search and SearchFunc: https://godoc.org/aqwari.net/xml/xmltree#Element.SearchFunc

Thanks David, I took a look at the documentation and noticed this at the very end:

The Unmarshal method unmarshals an XML fragment as it was returned by the Parse method; further modifications to a tree of Elements are ignored by the Unmarshal method.

That sounds like the package only does parsing, not much XML manipulation, and even if it does, the Unmarshal method will not honor the changes. Am I interpreting it correctly? That's really a pity, because I do need to do XML manipulation.

Thanks anyway.

