New API methods

Blago

Aug 9, 2010, 11:57:15 PM
to libxmljs
First and foremost, thanks for all the hard work making libxmljs stable
and fun to use.

While working on a libxmljs-based DOM library I found that there are a
few pieces of missing functionality that make implementation all but
impossible without resorting to ugly hacks. I propose the following
API enhancements that should be useful to other people as well:
* TextNode
* CommentNode
* DocumentFragment
* Element.type()
* Element.clone()
* Versions of Element.addChild, Element.addPrevSibling,
Element.addNextSibling that don't coalesce adjacent text nodes (this
is caused by xmlAddChild, xmlAddPrevSibling, xmlAddNextSibling)

Needless to say, I'm willing to help.

Marco Rogers

Aug 10, 2010, 10:12:13 PM
to libxmljs
Thanks for the feedback. Definitely want to add the rest of the node
types. Here's what I'm thinking.

Document
Namespace
Node->Element
Node->Attribute
Node->TextNode
Node->TextNode->Comment
Node->TextNode->CData
Node->DocumentFragment

This will be the major update in version 0.5.0. The element.text()
function will be upgraded. If you pass in a string, it'll append a
new TextNode with that value to the children. Probably also
element.comment() and element.cdata().
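
A minimal sketch of the hierarchy above in plain JavaScript, assuming the proposed text(), comment() and cdata() setters each append a new child. The classes here are a hypothetical model for illustration, not libxmljs code, and the 0.5.0 API may look different.

```javascript
// Hypothetical plain-JS model of the proposed node hierarchy; not libxmljs code.
class Node {
  constructor(type) {
    this._type = type;
    this.parent = null;
  }
  type() { return this._type; }
}

class TextNode extends Node {
  constructor(value, type = 'text') {
    super(type);
    this.value = value;
  }
}

// Comment and CData specialize TextNode, as in the list above.
class Comment extends TextNode {
  constructor(value) { super(value, 'comment'); }
}

class CData extends TextNode {
  constructor(value) { super(value, 'cdata'); }
}

class Element extends Node {
  constructor(name) {
    super('element');
    this.name = name;
    this.children = [];
  }
  _append(node) {
    node.parent = this;
    this.children.push(node);
    return this; // allow chaining
  }
  // No argument: return the concatenated text of direct text/cdata children.
  // String argument: append a new TextNode with that value (assumed behavior).
  text(value) {
    if (value === undefined) {
      return this.children
        .filter((c) => c instanceof TextNode && !(c instanceof Comment))
        .map((c) => c.value)
        .join('');
    }
    return this._append(new TextNode(String(value)));
  }
  comment(value) { return this._append(new Comment(String(value))); }
  cdata(value) { return this._append(new CData(String(value))); }
}

const lat = new Element('lat');
lat.text('44.94').comment('rounded').text('28975');
console.log(lat.text());                         // 44.9428975
console.log(lat.children.map((c) => c.type()));  // [ 'text', 'comment', 'text' ]
```

Note that this model never merges adjacent text nodes, which is exactly the point under discussion further down.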

If you want to take a stab at adding the new text types, please feel
free. I haven't really decided how to approach document fragment.
It's probably harder than I think it is. I want it to behave as much
as possible like a mini document. Except it has no root.

Also, what problem are you having with text nodes? Why is it a
problem that libxml coalesces them? I'm curious.

:Marco

Blagovest Dachev

Aug 11, 2010, 3:31:43 AM
to libx...@googlegroups.com
From a practical point of view, I would expect addChild to simply insert the TEXT node at the specified point in the tree, rather than merge its content into the neighboring node and free it. That last part is especially problematic for libxmljs: the example program below crashes with a SEGFAULT.

I'm starting to think that perhaps a DOM API would be better suited to a scripting language than staying true to libxml.

#!/usr/bin/env node

var http     = require('http'),
    events   = require('events'),
    inherits = require('sys').inherits,
    libxml   = require('libxmljs');

function WebClient(host, path) {
    var self       = this,
        transport  = http.createClient(80, host),
        request    = transport.request('GET', path, {'host': host});

    request.end();

    request.on('response', function (response) {
        if (response.statusCode != 200) {
            self.emit('done', response.statusCode, '');
        }
        else {
            var text = '';

            response.setEncoding('utf8');
            response.on('data', function (chunk) {
                text += chunk;
            });
            response.on('end', function () {
                self.emit('done', 200, text);
            });
        }
    });
}
inherits(WebClient, events.EventEmitter);

var client = new WebClient('maps.google.com', '/maps/api/geocode/xml?address=salem&sensor=false');
client.on('done', function(status, xml) {
    if (status != 200) {
        throw new Error('unable to load URL');
    }
    
    // the XML looks something along the lines of
    // <geometry>
    //   <location>
    //     <lat>44.9428975</lat>
    //     ...
    //   </location>
    // </geometry>
    // <geometry>
    //   <location>
    //     <lat>45.0060447</lat>
    //     ...
    //   </location>
    // </geometry>
    
    var doc  = libxml.parseXmlString(xml),
        lats = doc.find('//lat'),
        lat1 = lats[0],
        lat2 = lats[2],
        txt1 = lat1.childNodes()[0],
        txt2 = lat2.childNodes()[0];
    
    console.log(txt1.toString());
    console.log(txt2.toString());
    console.log('-----------');
    
    // this line will call xmlAddChild which will
    //  1) add txt2 to the content of lat1
    //  2) merge the (now two) TEXT nodes inside lat1
    //  3) free the C struct representing txt2
    lat1.addChild(txt2);
    
    // and... this line will cause a SEGFAULT because
    // libxmljs will try to access the freed struct
    txt2.toString();
});
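
The failure mode can also be reproduced without the network or libxml. Below is a plain-JS model of it, assuming xmlAddChild behaves as described in the comments above (merge into the trailing text node, then free the added one). Every name in it is hypothetical, and the "freed" flag only stands in for memory that C code has released.

```javascript
// Plain-JS model of the failure mode; every name here is hypothetical.
// A "struct" stands in for the C node that libxml owns.
function makeNode(kind, content) {
  return { kind: kind, content: content, children: [], parent: null, freed: false };
}

// Detach a node from its current parent, if it has one.
function unlink(node) {
  if (node.parent) {
    const siblings = node.parent.children;
    siblings.splice(siblings.indexOf(node), 1);
    node.parent = null;
  }
}

// Mimics the merge described above: a text node added after an existing
// trailing text node is folded into it and then freed.
function addChildCoalescing(parent, child) {
  unlink(child);
  const last = parent.children[parent.children.length - 1];
  if (child.kind === 'text' && last && last.kind === 'text') {
    last.content += child.content;
    child.freed = true;    // the C code would release the struct here
    child.content = null;
    return last;
  }
  child.parent = parent;
  parent.children.push(child);
  return child;
}

// Stand-in for the JS wrapper object that keeps pointing at the struct.
class Wrapper {
  constructor(struct) { this.struct = struct; }
  toString() {
    if (this.struct.freed) {
      // Native code has no such check; it would read freed memory instead.
      throw new Error('wrapper points at a freed node');
    }
    return this.struct.content;
  }
}

const lat1 = makeNode('element', null);
const lat2 = makeNode('element', null);
addChildCoalescing(lat1, makeNode('text', '44.9428975'));
addChildCoalescing(lat2, makeNode('text', '45.0060447'));

const txt2 = new Wrapper(lat2.children[0]);
addChildCoalescing(lat1, txt2.struct);

console.log(lat1.children.length);      // 1
console.log(lat1.children[0].content);  // 44.942897545.0060447

let failed = false;
try { txt2.toString(); } catch (e) { failed = true; }
console.log(failed);                    // true
```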

Marco Rogers

Aug 11, 2010, 9:07:01 AM
to libxmljs
Ah, now I finally understand one of the problems. That is indeed a
sticky issue. It's actually a mismatch between how libxml wants to
handle nodes and how a dynamic language like javascript expects them
to be handled.

As things go in and out of the tree, libxml expects to be able to
change them. In C land this is fine because you can only deal with
inputs and outputs of functions. But when you attach these to
persistent references in js space, things get complicated. This
library started as a js version of the ruby gem nokogiri. When I took
over the project, I frequently wondered why nokogiri had to be so
complicated. Now I begin to understand.

Okay, so the first question is, how would you expect this to work?
You want to just not coalesce the text nodes? That leaves you open
for other issues. Imagine in your example above, you've modified lat1
and now it has two adjacent text nodes. If someone now does
lat1.childNodes()[0], they are not getting that full text the way they
might expect. Dealing with lots of extraneous text nodes is a pain to
me. I would expect to have one contiguous text node between each
element node.
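
One middle ground, sketched here purely as an illustration: append without merging, and merge adjacent text nodes only on request, in the spirit of the DOM's Node.normalize(). The helper names addChildRaw and normalize below are hypothetical and the nodes are plain objects, not libxml structs.

```javascript
// Hypothetical model: nodes are plain objects, not libxml structs.
function text(value) { return { kind: 'text', value: value }; }
function element(name) { return { kind: 'element', name: name, children: [] }; }

// Append without merging, so the caller's reference stays valid.
function addChildRaw(parent, child) {
  parent.children.push(child);
  return child;
}

// Merge runs of adjacent text nodes on demand. Only the first node of each
// run stays in the tree; later ones are dropped from it but not destroyed.
function normalize(parent) {
  const merged = [];
  for (const child of parent.children) {
    const prev = merged[merged.length - 1];
    if (child.kind === 'text' && prev && prev.kind === 'text') {
      prev.value += child.value;
    } else {
      merged.push(child);
    }
  }
  parent.children = merged;
  return parent;
}

const lat = element('lat');
const a = addChildRaw(lat, text('44.'));
const b = addChildRaw(lat, text('9428975'));
console.log(lat.children[0].value);  // 44.        (only the first fragment)
normalize(lat);
console.log(lat.children[0].value);  // 44.9428975 (the full text)
console.log(b.value);                // 9428975    (detached, still readable)
```

The trade-off is the one raised above: until normalize() runs, childNodes()[0] style access sees only a fragment of the text.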

So the next question is, how should txt2 behave after it has been
added and coalesced? Using V8 I can actually change the value of that
reference, to say null or undefined. But that would be some awful
usability :)
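
To make the options concrete, here is a rough sketch of two ways a wrapper could react after its node is absorbed: drop the reference so later calls throw a clear JS error, or re-point it at the node that survived the merge. NodeHandle and its methods are hypothetical names for illustration and say nothing about how libxmljs is or will be implemented.

```javascript
// Hypothetical wrapper; libxmljs does not necessarily work this way.
class NodeHandle {
  constructor(struct) { this._struct = struct; }

  // Option 1: drop the reference, so later calls fail with a clear JS error.
  invalidate() { this._struct = null; }

  // Option 2: point at the node that absorbed the text.
  retarget(struct) { this._struct = struct; }

  isValid() { return this._struct !== null; }

  toString() {
    if (this._struct === null) {
      throw new Error('node was merged into a sibling and no longer exists');
    }
    return this._struct.content;
  }
}

const survivor = { content: '44.9428975' };
const absorbed = { content: '45.0060447' };
const handleA = new NodeHandle(absorbed);
const handleB = new NodeHandle(absorbed);

// Simulate the merge that libxml performs.
survivor.content += absorbed.content;

handleA.invalidate();        // option 1
handleB.retarget(survivor);  // option 2

console.log(handleA.isValid());   // false
console.log(handleB.toString());  // 44.942897545.0060447
```

Option 2 avoids the crash but silently changes what the variable refers to, which may be just as surprising as null.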

I don't know. Let's think on it some more. I'm going to test out some
other libraries that wrap libxml and see what they do. Do you have
other suggestions?

:Marco

Marco Rogers

Aug 11, 2010, 9:09:21 AM
to libxmljs
Also, I went ahead and committed your patch on master. It won't go
out in a release until I'm satisfied that it's the right approach.
But you can go forward with it for now.

:Marco

Blago

Aug 13, 2010, 8:09:09 PM
to libxmljs
I think I'm going to start an experimental branch implementing the
classical DOM API. I can use the PHP implementation as a reference:
http://github.com/php/php-src/blob/master/ext/dom/node.c

Blago

Marco Rogers

Aug 14, 2010, 4:29:26 PM
to libx...@googlegroups.com
I've been thinking a lot about where I want this lib to go.  I actually want less C/C++ and more js.  I want a small tight binding to the basic functions in libxml that allow parsing, creating and manipulating xml trees.  And then write lots of extensions on top of that in js.

If you feel we are missing things from the C/C++ API, let's talk about that.  The functions for text nodes, comments, etc. are a good example of this.  But if you want an actual DOM spec on top, I would prefer that to be in pure js using these bindings.

Does that make sense?  I'm open to discussion about it.
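
As a rough idea of what that layering could look like, here is a DOM-flavoured facade written purely in js on top of a small binding. The stub below only imitates a native node: childNodes(), addChild() and text() follow the method names used in this thread, name() is an assumption, and DomNode is a hypothetical class.

```javascript
// Stubs standing in for native binding objects; not real libxmljs nodes.
function stubElement(name) {
  const kids = [];
  return {
    name: () => name,
    childNodes: () => kids.slice(),
    addChild: (child) => { kids.push(child); return child; },
    text: () => kids.map((k) => k.text()).join(''),
  };
}

function stubText(value) {
  return { name: () => 'text', childNodes: () => [], text: () => value };
}

// DOM-style facade written purely in JS on top of the binding.
class DomNode {
  constructor(native) { this.native = native; }
  get nodeName() { return this.native.name(); }
  get textContent() { return this.native.text(); }
  get firstChild() {
    const first = this.native.childNodes()[0];
    return first ? new DomNode(first) : null;
  }
  appendChild(domNode) {
    this.native.addChild(domNode.native);
    return domNode;
  }
}

const latNode = new DomNode(stubElement('lat'));
latNode.appendChild(new DomNode(stubText('44.9428975')));
console.log(latNode.firstChild.nodeName);  // text
console.log(latNode.textContent);          // 44.9428975
```

The facade holds no state beyond the native object, so the C/C++ side only has to expose the handful of primitives it calls.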

:Marco

--
Marco Rogers
marco....@gmail.com

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz