[Haskell-cafe] xml conduit

grant

Feb 8, 2013, 6:31:12 PM
to haskel...@haskell.org
Hi,

Is there a nice way to update XML? I want to be able to use xml-conduit
to find a location in the XML and then add to or update that node.

E.g. take the XPath //d/e/f and then change the content at 'f' or add a new node:

<a>
...
<d>
<e>
<f>some data to change
</f>
</e>
</d>
...
</a>


Thanks for any help,
Grant


_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Mateusz Kowalczyk

Feb 9, 2013, 1:10:27 AM
to haskel...@haskell.org
I don't know about xml-conduit, but I know that such a thing is possible in
HXT. See the `Modifying a Node' section at [1] for a trivial example.
You will probably have to read the whole page to understand what's going
on, though.

[1] - http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html

grant

Feb 9, 2013, 10:30:15 AM
to haskel...@haskell.org
Mateusz Kowalczyk <fuuzetsu <at> fuuzetsu.co.uk> writes:

>
> I don't know about xml-conduit but I know that such thing is possible in
> HXT. See the `Modifying a Node' section at [1] for a trivial example.
> You probably will have to read the whole page to somewhat understand
> what's going on though.
>
> [1] - http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html
>
>

Thanks for the tip, which looks really promising. The problem is that
I cannot get large XML files to load with HXT.
I tried a 24 MB file and it ran out of memory, whereas with xml-conduit
it took 15 seconds to load. Is there something I'm missing?

Mateusz Kowalczyk

Feb 9, 2013, 10:37:26 AM
to haskel...@haskell.org
Hmm, I just tried with a 112MB file and I ended up having to kill it
after it chewed through 3GB of memory. That's rather worrying. Hopefully
someone on cafe can point out whether it's an inherent issue with the
package, a bug or whether we're just doing something wrong.

Michael Snoyman

Feb 9, 2013, 12:13:52 PM
to grant, Haskell Cafe
Hi Grant,

As you might expect from immutable data structures, there's no way to update in place. The approach is the one you'd take with XSLT: traverse the tree, check each node, and output a new structure. I put together the following as an example, but I could certainly imagine adding more combinators to the Cursor module to make something like this more convenient.

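{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (readFile, writeFile)
import Text.XML
import Text.XML.Cursor

main :: IO ()
main = do
    -- Keep the prologue, epilogue, root name and root attributes as they
    -- are; only the root's children are rebuilt.
    doc@(Document pro (Element name attrs _) epi) <- readFile def "test.xml"
    let nodes = fromDocument doc $/ update
    writeFile def "output.xml" $ Document pro (Element name attrs nodes) epi
  where
    -- Rebuild the node under the cursor, returning its replacement(s).
    update c =
        case node c of
            -- An <f> whose parent is <e> and whose grandparent is <d>:
            -- keep the attributes, replace the content.
            NodeElement (Element "f" attrs _)
                | parentIsE c && gparentIsD c ->
                    [ NodeElement $ Element "f" attrs
                        [ NodeContent "New content"
                        ]
                    ]
            -- Any other element: keep it and recurse into its children.
            NodeElement (Element name attrs _) ->
                [NodeElement $ Element name attrs $ c $/ update]
            -- Text, comments and instructions pass through unchanged.
            n -> [n]
    parentIsE c = not $ null $ parent c >>= element "e"
    gparentIsD c = not $ null $ parent c >>= parent >>= element "d"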
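
One possible generalization of the example above, as a rough sketch only: instead of hard-coding the parent and grandparent checks, take the path as a list of element names and rebuild the tree directly on the Text.XML types. The names rewriteAnywhere, rewriteFrom, onElement, sample and updated are hypothetical helpers for illustration, not part of xml-conduit. The sketch assumes the same `Element name attrs children` shape used above, with attributes stored in a Map, and it handles matches nested inside other matches only naively.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as Map
import Text.XML

-- Hypothetical helper: apply a function to an element node and leave
-- text, comments and instructions untouched.
onElement :: (Element -> Element) -> Node -> Node
onElement g (NodeElement e) = NodeElement (g e)
onElement _ n = n

-- Hypothetical helper: follow the path starting exactly at this element.
-- When the last name in the path matches, its children are replaced by
-- applying f to them. Elements that do not match are returned unchanged.
rewriteFrom :: [Name] -> ([Node] -> [Node]) -> Element -> Element
rewriteFrom [] _ el = el
rewriteFrom (n : rest) f el
    | elementName el /= n = el
    | null rest = el { elementNodes = f (elementNodes el) }
    | otherwise =
        el { elementNodes = map (onElement (rewriteFrom rest f)) (elementNodes el) }

-- Hypothetical helper: try the path at every depth, roughly like the
-- leading // in the XPath //d/e/f.
rewriteAnywhere :: [Name] -> ([Node] -> [Node]) -> Element -> Element
rewriteAnywhere path f el =
    let el' = rewriteFrom path f el
    in el' { elementNodes = map (onElement (rewriteAnywhere path f)) (elementNodes el') }

-- The document from the original question, built in memory.
sample :: Element
sample =
    Element "a" Map.empty
        [ NodeElement $ Element "d" Map.empty
            [ NodeElement $ Element "e" Map.empty
                [ NodeElement $ Element "f" Map.empty
                    [NodeContent "some data to change"]
                ]
            ]
        ]

-- Replace the content of every f reached via d/e/f.
updated :: Element
updated = rewriteAnywhere ["d", "e", "f"] (const [NodeContent "New content"]) sample
```

To add a node rather than replace the content, the function passed in could append instead, for example `(++ [NodeContent "more"])`.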

Michael

grant

Feb 9, 2013, 9:04:16 PM
to haskel...@haskell.org
Michael Snoyman <michael <at> snoyman.com> writes:

>
>
> Hi Grant,
> As you might expect from immutable data structures, there's no way to
> update in place. The approach you'd take to XSLT: traverse the tree,
> check each node, and output a new structure. I put together the following
> as an example, but I could ...

Thanks a lot, I really appreciate it,

grant

Feb 10, 2013, 1:51:56 PM
to haskel...@haskell.org
Michael Snoyman <michael <at> snoyman.com> writes:

>

Hi Michael,

Just one last thought. Does it make any sense that xml-conduit could be
rewritten as a lens instead of a cursor? Or leverage the lens package somehow?

Michael Snoyman

Feb 10, 2013, 11:34:12 PM
to grant, Haskell Cafe
On Sun, Feb 10, 2013 at 8:51 PM, grant <the...@hotmail.com> wrote:
> Hi Michael,
>
> Just one last thought. Does it make any sense that xml-conduit could be
> rewritten as a lens instead of a cursor? Or leverage the lens package somehow?


That's a really interesting idea, I'd never thought about it before. It's definitely something worth playing around with. However, I think in this case the Cursor is providing a totally different piece of functionality than what lenses would do. The Cursor is really working as a Zipper, allowing you to walk the node tree and do queries about preceding and following siblings and ancestors.
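
To make the zipper idea concrete, here is a minimal sketch of a tree zipper in plain Haskell. It is an illustration only, not how Cursor is implemented: Tree is a simplified stand-in for an XML element tree, and the Maybe-returning functions firstChild, followingSibling and parent are hypothetical names that only loosely mirror the Cursor axes. Unlike Cursor, this sketch also supports modifying the focus and rebuilding the tree.

```haskell
-- A tiny rose tree standing in for an XML element tree (labels only).
data Tree = Tree String [Tree]
    deriving (Eq, Show)

-- One step of context: the parent's label, the siblings to the left of
-- the focus (nearest first), and the siblings to the right.
data Crumb = Crumb String [Tree] [Tree]
    deriving (Eq, Show)

-- A zipper: the focused subtree plus the path back up to the root.
data Zipper = Zipper Tree [Crumb]
    deriving (Eq, Show)

fromTree :: Tree -> Zipper
fromTree t = Zipper t []

-- Move to the first child, if there is one.
firstChild :: Zipper -> Maybe Zipper
firstChild (Zipper (Tree l (c : cs)) crumbs) =
    Just (Zipper c (Crumb l [] cs : crumbs))
firstChild _ = Nothing

-- Move to the next sibling on the right, if there is one.
followingSibling :: Zipper -> Maybe Zipper
followingSibling (Zipper t (Crumb l ls (r : rs) : crumbs)) =
    Just (Zipper r (Crumb l (t : ls) rs : crumbs))
followingSibling _ = Nothing

-- Move up, rebuilding the parent from the (possibly modified) focus.
parent :: Zipper -> Maybe Zipper
parent (Zipper t (Crumb l ls rs : crumbs)) =
    Just (Zipper (Tree l (reverse ls ++ t : rs)) crumbs)
parent _ = Nothing

-- Change the focused subtree without touching the context.
modify :: (Tree -> Tree) -> Zipper -> Zipper
modify f (Zipper t crumbs) = Zipper (f t) crumbs

-- Walk back to the root and return the whole rebuilt tree.
toTree :: Zipper -> Tree
toTree z = maybe (focus z) toTree (parent z)
  where
    focus (Zipper t _) = t

-- Walk to the second child of the root, rename it, and rebuild.
example :: Maybe Tree
example = do
    z1 <- firstChild (fromTree (Tree "a" [Tree "b" [], Tree "c" []]))
    z2 <- followingSibling z1
    pure (toTree (modify (\(Tree _ cs) -> Tree "C" cs) z2))
```

Here `example` evaluates to `Just (Tree "a" [Tree "b" [], Tree "C" []])`: the change made at the focus is carried back up to the root.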

Now given that every time I'm on #haskell someone mentions zippers in the context of lens, maybe lens *would* solve this use case as well, but I'm still a lens novice (if that), so I can't really speak on the matter. Maybe someone with more lens experience could provide some insight.

Either way, some kind of lens add-on sounds really useful.

Michael 

Michael Sloan

Feb 11, 2013, 12:40:37 AM
to Michael Snoyman, grant, Haskell Cafe
I'm no lens authority by any means, but indeed, it looks like something like Cursor / Axis could be done with the lens zipper.

https://github.com/snoyberg/xml/blob/0367af336e86d723bd9c9fbb49db0f86d1f989e6/xml-enumerator/Text/XML/Cursor/Generic.hs#L38

This cursor datatype is very much like the (:>) zipper type (I'm linking to old code, because that's when I understood it - the newer stuff is semantically the same, but more efficient, more confusing, and less directly relatable):

https://github.com/ekmett/lens/blob/f8dfe3fd444648f61b8594cd672c25e70c8a30ff/src/Control/Lens/Internal/Zipper.hs#L317

Which is built out of the following two datatypes:

1) parent (and the way to rebuild the tree on the way back up) is provided by this datatype:

https://github.com/ekmett/lens/blob/f8dfe3fd444648f61b8594cd672c25e70c8a30ff/src/Control/Lens/Internal/Zipper.hs#L74

2) precedingSibling / followingSibling / node is provided by this datatype (which is pretty much the familiar list zipper!):

https://github.com/ekmett/lens/blob/f8dfe3fd444648f61b8594cd672c25e70c8a30ff/src/Control/Lens/Internal/Zipper.hs#L317
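
For readers who have not met it, the "familiar list zipper" can be sketched in a few lines of plain Haskell. This is an illustration under my own naming (ListZipper, left, right, setFocus, rebuild are hypothetical), not the datatype from the lens source linked above.

```haskell
-- Items before the focus (nearest first), the focus, and items after it.
data ListZipper a = ListZipper [a] a [a]
    deriving (Eq, Show)

-- Focus on the first item; an empty list has nothing to focus on.
fromList :: [a] -> Maybe (ListZipper a)
fromList [] = Nothing
fromList (x : xs) = Just (ListZipper [] x xs)

-- Move the focus one step to the right, if possible.
right :: ListZipper a -> Maybe (ListZipper a)
right (ListZipper ls x (r : rs)) = Just (ListZipper (x : ls) r rs)
right _ = Nothing

-- Move the focus one step to the left, if possible.
left :: ListZipper a -> Maybe (ListZipper a)
left (ListZipper (l : ls) x rs) = Just (ListZipper ls l (x : rs))
left _ = Nothing

-- The three pieces of information named in the message above.
focus :: ListZipper a -> a
focus (ListZipper _ x _) = x

precedingSiblings :: ListZipper a -> [a]
precedingSiblings (ListZipper ls _ _) = ls   -- nearest sibling first

followingSiblings :: ListZipper a -> [a]
followingSiblings (ListZipper _ _ rs) = rs

-- Replace the focused item.
setFocus :: a -> ListZipper a -> ListZipper a
setFocus y (ListZipper ls _ rs) = ListZipper ls y rs

-- Turn the zipper back into an ordinary list.
rebuild :: ListZipper a -> [a]
rebuild (ListZipper ls x rs) = reverse ls ++ x : rs
```

For instance, `fmap (rebuild . setFocus 'X') (fromList "abc" >>= right)` gives `Just "aXc"`.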


One way that this would be powerful is that some of the Axis constructors could return a zipper.  In particular, all of the axis yielding functions except the following would be supported:

parent, precedingSibling, followingSibling, ancestor, descendant, orSelf, check

This is because zippers can be used for modification, which doesn't work out very well when you can navigate to something outside of your focus's children.  If we had a new datatype that represents a node's payload, then we could conceivably represent all of the axis-yielding operations except for parent / ancestor.  However, those operations would be navigations to payloads: further navigation at the level of the XML hierarchy would be impossible because you'd no longer have references to children.  (Further navigation into payloads, on the other hand, would still be possible.)

So, that's just my thoughts after looking at it a bit - I hope it's comprehensible / helpful!  An XML zipper would be pretty awesome.

-Michael


Michael Sloan

Feb 11, 2013, 1:09:48 AM
to Michael Snoyman, grant, Haskell Cafe
I realized that the term "payload" wouldn't make much sense in the context of XML.  What I meant was "elementName" with "elementAttributes" (but not "elementNodes" - that's the point).  So, such navigations could yield a datatype containing those.

-Michael

Michael Snoyman

Feb 11, 2013, 11:53:22 AM
to Michael Sloan, grant, Haskell Cafe
OK, after some experimentation, I've decided that this would be something really cool, but I don't have the experience with lens to do it myself. Here's what I came up with so far:


(Also available on School of Haskell[1], but our version of lens is too old for this snippet.)

So if someone wants to pursue this, I'd be really interested to see the results.

Michael
