It's functional, it's young, it validates, it handles compressed
documents, it's C, it's quick (at least as far as XML parsers go), and
it's feature rich (or will be: I haven't turned on XInclude, XPointer,
Schemas, DocBook, or a handful of others that are available). The two
biggest things that libxml is lacking right now are documentation
and a more complete API: features are coming and there likely won't be
any shortage of them. Here's a sample script that shows some of the
API:
  xd = XML::Document.file('some_xml_file.xml')
  xd.xpath_find('/my/xpath/query').each do |node|
    puts "Filename: #{node.child('filename')}"
    puts "Mode: #{node.child('mode')}"
    puts "Content: #{node.child('content')}"
  end
And the corresponding test doc:
  <my>
    <xpath>
      <query>
        <foo>uga</foo>
        <bar>ble</bar>
        <baz>sweet</baz>
      </query>
      <query>
        <foo>uga</foo>
        <bar>ble</bar>
        <baz>sweet</baz>
      </query>
    </xpath>
  </my>
Simple for the most part.
<advanced> If someone gets curious and runs the "document_self.rb"
script, they'll notice a whole ton of methods. This API doesn't hide
any of the internals of XML from the user, which is good and bad. For
example, in the XML bit "<foo>bar</foo>", there are technically two
nodes. The <foo> tag is one, and its content is another. The
other interesting thing about this is that in the following XML block,
there are _five_ nodes.
  <foo>bar</foo>
  <baz>asd</baz>

  1) <foo>
  2) bar
  3) [white space between </foo> and <baz>]
  4) <baz>
  5) asd
While this is technically the correct behavior, it's unintuitive to
most people and isn't what they want. As a result, node.to_s is the
exact same as node.child.content. In the API, I'm making things nice
and friendly though so no one will notice (or so the theory goes).
Anyway, point being that there's a lot of power at the moment for
those who want to have full access to the XML document. </advanced>
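
For anyone who wants to see the five-node count for themselves, here's a
minimal sketch. It uses REXML from Ruby's standard distribution rather
than the libxml binding (a substitution, since the binding's node-walking
methods aren't shown here), and it wraps the fragment in a hypothetical
<root> element so that it parses as one document. The descendants and
describe helpers are made-up names for illustration only.

```ruby
require 'rexml/document'

# The two-element fragment from the post, wrapped in a hypothetical <root>
# element so that it parses as a well-formed document.
FRAGMENT = "<root><foo>bar</foo>\n<baz>asd</baz></root>"

# Collect every node below +parent+, depth first, in document order.
def descendants(parent)
  parent.children.flat_map do |node|
    node.is_a?(REXML::Parent) ? [node, *descendants(node)] : [node]
  end
end

# Short label for a node: the tag name for elements, inspected text otherwise.
def describe(node)
  node.is_a?(REXML::Element) ? "<#{node.name}>" : node.value.inspect
end

root = REXML::Document.new(FRAGMENT).root
descendants(root).each_with_index do |node, i|
  puts "#{i + 1}) #{describe(node)}"
end
```

With REXML's default settings, whitespace-only text is kept, so the
newline between </foo> and <baz> shows up as its own node in the listing.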
Anyway, that's my bit for now. I'm looking for new users/suggestions:
both are very welcome. -sc
--
Sean Chittenden
[libxml]
great
> DocBook
what's to be implemented for DocBook?
> xd = XML::Document.file('some_xml_file.xml')
> xd.xpath_find('/my/xpath/query').each do |node|
> puts "Filename: #{node.child('filename')}"
> puts "Mode: #{node.child('mode')}"
> puts "Content: #{node.child('content')}"
> end
what do the puts put?
Tobi
Honestly, I'm not 100% sure yet. libxml's huge and there's lots
there. I'm still uncovering features and nifty ways of doing things.
My best guess is that it's basically a Schema/DTD for the DocBook
markup (ie, won't let you write a bad/invalid docbook document).
> >xd = XML::Document.file('some_xml_file.xml')
> >xd.xpath_find('/my/xpath/query').each do |node|
> > puts "Filename: #{node.child('filename')}"
> > puts "Mode: #{node.child('mode')}"
> > puts "Content: #{node.child('content')}"
> >end
>
>
> what do the puts put?
In this case, the contents of the node's child. With the XML below
(and the amended example: sorry, was doing a little too much
copy/pasting from the rubynet test examples), it would produce:
Filename: uga
Mode: ble
Content: sweet
# Amended example
  xd = XML::Document.file('some_xml_file.xml')
  xd.xpath_find('/my/xpath/query').each do |node|
    puts "Filename: #{node.child('foo')}"
    puts "Mode: #{node.child('bar')}"
    puts "Content: #{node.child('baz')}"
  end
# XML snippet
  <my>
    <xpath>
      <query>
        <foo>uga</foo>
        <bar>ble</bar>
        <baz>sweet</baz>
      </query>
    </xpath>
  </my>
node.to_s == node.child.content
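
A self-contained way to check that output, as a sketch only: this uses
REXML (shipped with Ruby) instead of the binding's XML::Document.file and
xpath_find, so the method names differ from the amended example. SAMPLE
and report are hypothetical names, and the XML is inlined with <bar>
closed by </bar> so that the parser accepts it.

```ruby
require 'rexml/document'

# Inline copy of the test document (assumption: <bar> closed by </bar>).
SAMPLE = <<~XML
  <my>
    <xpath>
      <query>
        <foo>uga</foo>
        <bar>ble</bar>
        <baz>sweet</baz>
      </query>
    </xpath>
  </my>
XML

# Build one report line per labelled child of every /my/xpath/query node.
def report(xml)
  doc = REXML::Document.new(xml)
  labels = { 'Filename' => 'foo', 'Mode' => 'bar', 'Content' => 'baz' }
  REXML::XPath.match(doc, '/my/xpath/query').flat_map do |node|
    labels.map { |label, tag| "#{label}: #{node.elements[tag].text}" }
  end
end

puts report(SAMPLE)
```

Here node.elements[tag].text plays the role that node.child('foo') plays
in the amended example: it returns the text content of the named child.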
-sc
--
Sean Chittenden
>>what's to be implemented for DocBook?
>>
> Honestly, I'm not 100% sure yet. libxml's huge and there's lots
> there. I'm still uncovering features and nifty ways of doing things.
> My best guess is that it's basically a Schema/DTD for the DocBook
> markup (ie, won't let you write a bad/invalid docbook document).
Well, there are various DTDs for DocBook; authors want to choose one
and switch it independently of the toolkit. I don't think it should be an
integrated part of an XML parser or toolkit unless that offers some
advantage over simply using a DTD that didn't come with the XML kit.
Tobi
> Sean Chittenden wrote:
>
> [libxml]
>
> great
>
> > DocBook
>
> what's to be implemented for DocBook?
IIRC, libxml includes a limited "SGML" parser which is designed
specifically to parse DocBook/SGML documents, and no other kind of
SGML.
--
Pierre-Charles David (pcdavid <at> tiscali <dot> fr)
Computer Science PhD Student, École des Mines de Nantes, France
Homepage: http://purl.org/net/home/pcdavid
> IIRC, libxml includes a limited "SGML" parser which is designed
> specifically to parse DocBook/SGML documents, and no other kind of
> SGML.
Ah OK. I use the XML DTDs, so I can use any XML tool; actually, any SGML
tool should be able to deal with it as well.
Tobi