XML processing on a Maven POM: attributes and namespaces on a root element

Mark Petrovic

Feb 19, 2015, 3:09:13 PM
to golan...@googlegroups.com
Hello.  Thank you for Go.  It makes me feel young again.

I know this topic has come up in the past, but in what I believe are slightly different forms than mine.  I'm trying to unmarshal a Maven POM into a struct.  I'm having problems with the root element, which has a namespace, a namespace definition and an attribute that uses that namespace definition.


My use case is to read the POM into a struct, make minor changes to it, and write it back to disk.  When I unmarshal and then marshal the struct, the additional namespace declaration (xmlns:xsi) does not come back in the form of the original input, and I do not know how to write a struct tag that captures the xsi:schemaLocation attribute so that it marshals correctly as well.

So I'm not so concerned that I cannot read these attributes faithfully, because I really don't care about their values during my in-memory manipulation of the struct.  It's the marshaling back to disk that concerns me, because a well-formed Maven POM should have these attributes and namespaces in its root element.

PTAL if you would be so kind.  There are a tremendous number of Maven POMs in nature, and recording a solution to this here will surely help me, and posterity.

Thank you.




roger peppe

Feb 19, 2015, 3:37:45 PM
to Mark Petrovic, golang-nuts
Please could you try with Go tip, as I've pushed some changes to the XML
package that affect marshaling of XML with name spaces.

If that works for you and if you don't want to remain on tip (probably
not a great idea) you can potentially use github.com/juju/xml as
a temporary replacement.

cheers,
rog.

Mark Petrovic

Feb 19, 2015, 4:43:52 PM
to golan...@googlegroups.com, mspet...@gmail.com
Thank you for the courteous and civil reply.  I never take that for granted.

I built tip, then built my code.  The result is essentially unchanged.

root@ae2e73a9ee7c:~/code/src/poms# go version
go version devel +6c4b54f Thu Feb 19 20:46:59 2015 +0000 linux/amd64

root@ae2e73a9ee7c:~/code/src/poms# ./poms 
Why is p.XSI empty?
<project xmlns:XMLSchema-instance="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0" XMLSchema-instance:xsi="" xsi:schemaLocation=""><modelVersion>4.0.0</modelVersion></project>

Based on a quick visual inspection, should the tags in my playground code work?  The XMLName and XSI fields look right to me, but again only the former seems to work.  I considered the SchemaLocation field with the xsi:schemaLocation tag a total Hail Mary.

roger peppe

Feb 20, 2015, 5:29:52 AM
to Mark Petrovic, golang-nuts
As far as I can tell, encoding/xml is behaving as intended here.

For XML that looks like that, I'd define the structure slightly differently:

http://play.golang.org/p/5rHSgihcH-

Note that all the names in the Go type must define the full namespace
URI, not the prefix - the XML package should cope with all the
namespace aliasing logic.
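
For readers who cannot open the playground link, here is a minimal sketch of the kind of struct that advice implies. It is not the contents of the linked playground; the field names, the samplePOM constant and the decodeProject helper are assumptions made for illustration. Each tag names the full namespace URI followed by the local name, never the xsi prefix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// samplePOM is a hypothetical, cut-down POM used only for this sketch.
const samplePOM = `<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion></project>`

// Project names namespaces by URI; the decoder resolves the xsi prefix
// to its URI before matching attributes against the tags.
type Project struct {
	XMLName        struct{} `xml:"http://maven.apache.org/POM/4.0.0 project"`
	SchemaLocation string   `xml:"http://www.w3.org/2001/XMLSchema-instance schemaLocation,attr"`
	ModelVersion   string   `xml:"modelVersion"`
}

// decodeProject is a hypothetical helper that unmarshals a POM document.
func decodeProject(doc string) (Project, error) {
	var p Project
	err := xml.Unmarshal([]byte(doc), &p)
	return p, err
}

func main() {
	p, err := decodeProject(samplePOM)
	if err != nil {
		fmt.Println("unmarshal:", err)
		return
	}
	fmt.Println("schemaLocation:", p.SchemaLocation)
	fmt.Println("modelVersion:", p.ModelVersion)

	// Marshal back; the prefix chosen for the xsi namespace is up to
	// encoding/xml and may differ from the prefix in the input.
	out, err := xml.Marshal(p)
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
}
```

The exact marshaled form depends on the Go version in use, so the sketch prints it rather than promising a particular output.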

Does that fit your needs?

cheers,
rog.

Mark Petrovic

Feb 20, 2015, 7:47:21 AM
to golan...@googlegroups.com, mspet...@gmail.com
That does fit my needs, thank you.

I now see how the unmarshal handles the aliasing.  In the case of a traditional POM, it changes the name of the attributed namespace, but I think that's ok for what I need.  I doubt anyone will even notice, in fact, were they to read the POM.

One last question:  I notice you changed the type of XMLName from xml.Name to struct{}.  What is the motivation for doing that?

Thanks for the time you put into this.  Much appreciated.

roger peppe

Feb 20, 2015, 10:40:06 AM
to Mark Petrovic, golang-nuts
On 20 February 2015 at 12:47, Mark Petrovic <mspet...@gmail.com> wrote:
> That does fit my needs, thank you.

Great!

>
> I now see how the unmarshal handles the aliasing. In the case of a
> traditional POM, it changes the name of the attributed namespace, but I
> think that's ok for what I need. I doubt anyone will even notice, in fact,
> were they to read the POM.

Yes, that's a downside of the current Go approach. From the
XML point of view, the namespace alias shouldn't matter at all,
but it would perhaps be nice if Go had a better algorithm for
choosing a nice-looking one.

> One last question: I notice you changed the type of XMLName from xml.Name
> to struct{}. What is the motivation for doing that?

The xml.Name type is useful for holding an XML name. In your
case, you don't need any data there, so using a struct{}
provides a zero-overhead way of setting your XML name space,
while also reassuring the reader that there is nothing to
be concerned about in the XMLName field contents.
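
A small sketch of that difference, assuming two hypothetical types (WithName, WithEmpty) and helpers (rootName, accepts) that are not from this thread: both tags constrain the root element, but only the xml.Name field stores the decoded name.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// WithName constrains the root element and also records its name.
type WithName struct {
	XMLName xml.Name `xml:"http://maven.apache.org/POM/4.0.0 project"`
}

// WithEmpty constrains the root element but stores nothing.
type WithEmpty struct {
	XMLName struct{} `xml:"http://maven.apache.org/POM/4.0.0 project"`
}

// rootName decodes doc and returns the recorded element name.
func rootName(doc string) (xml.Name, error) {
	var v WithName
	err := xml.Unmarshal([]byte(doc), &v)
	return v.XMLName, err
}

// accepts reports whether doc's root matches the tag on WithEmpty.
func accepts(doc string) error {
	var v WithEmpty
	return xml.Unmarshal([]byte(doc), &v)
}

func main() {
	good := `<project xmlns="http://maven.apache.org/POM/4.0.0"/>`
	bad := `<other xmlns="http://maven.apache.org/POM/4.0.0"/>`

	name, err := rootName(good)
	fmt.Println(name.Space, name.Local, err)
	fmt.Println("matching root:", accepts(good))
	fmt.Println("mismatched root:", accepts(bad))
}
```

In both types the tag does the work of fixing the element name and name space; the field type only decides whether the decoded name is kept.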

cheers,
rog.

Mark Petrovic

Feb 20, 2015, 1:38:50 PM
to golan...@googlegroups.com, mspet...@gmail.com
Got it.

The final nit:  a POM has a <properties> element that contains elements whose names cannot be predicted.  It looks like this:

   <properties>
      <runtimeArguments xmlns="http://maven.apache.org/POM/4.0.0">-Xms256m -Xmx256m -XX:PermSize=128m -XX:MaxPermSize=128m</runtimeArguments>
      <some.other.property>blah</some.other.property>
   </properties>

That's output actually produced by the next iteration of the code you helped make right.  

Here are the data structures that describe those properties:

// Top level struct
type Project struct {
        XMLName                struct{}               `xml:"http://maven.apache.org/POM/4.0.0 project"`
        // elided fields
        Properties             Properties             `xml:"properties"`
}

type Property struct {
        XMLName xml.Name `xml:""`
        Value   string   `xml:",chardata"`
}

type Properties struct {
        Values []Property `xml:",any"`
}

I can live with the namespace the marshaller adds to these "dynamic" elements.  I'll probably run the Go output through sed so the final output does not end up surprising my users - no one expects to see namespaces on Maven property values, even though it's technically valid.  But is there a way to suppress the namespace on these "any"-type elements to make the solution visually cleaner?
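
For what it's worth, the sed pass could also be done in Go just before writing the file. This is only a sketch of that post-processing idea, not a way to make the marshaller itself omit the namespace. It assumes the first xmlns declaration for the POM namespace belongs to the root element and that every later occurrence repeats the exact same attribute text; stripRepeatedNS is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRepeatedNS keeps the first xmlns="ns" attribute (assumed to be on
// the root element) and removes every later, identical occurrence.
// Descendants then inherit the default namespace from the root.
func stripRepeatedNS(doc, ns string) string {
	attr := ` xmlns="` + ns + `"`
	i := strings.Index(doc, attr)
	if i < 0 {
		return doc // nothing to strip
	}
	head := doc[:i+len(attr)]
	tail := strings.ReplaceAll(doc[i+len(attr):], attr, "")
	return head + tail
}

func main() {
	const ns = "http://maven.apache.org/POM/4.0.0"
	in := `<project xmlns="` + ns + `"><properties>` +
		`<runtimeArguments xmlns="` + ns + `">-Xms256m</runtimeArguments>` +
		`</properties></project>`
	fmt.Println(stripRepeatedNS(in, ns))
}
```

Because this is plain string replacement, it would also touch a matching attribute inside a comment or CDATA section, so it only suits output whose shape is known in advance.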