XML namespaces in nokogiri

Jonathan Rochkind

Oct 26, 2010, 12:59:25 PM

to nokogiri-talk

I'm getting really confused with how to do certain things with
namespaces in nokogiri, and if anyone can give me some help, I would
GREATLY appreciate it. Here is a whole bunch of questions, sorry but
I've got a lot, and nokogiri documentation on namespace-related things
is pretty light. (If I actually manage to figure this out, I could try
to write some for you).

0. Time-saver in #xpath namespaces.

Every time I call #xpath, I've got to pass in a hash of namespaces, so
I can use the namespace prefixes. This starts to get tiresome. Is
there any way to set default xpath namespaces on the Document or Node
itself, that will be used as default on every subsequent call to
#xpath that doesn't have a namespace argument? Would make my code a
lot cleaner. But I think there is not.

1. #collect_namespaces. Depending on what instance of RDocs I'm
looking at, this is documented as a method on XML::Document or
XML::Node.

http://nokogiri.org/Nokogiri/XML/Document.html#method-i-collect_namespaces
http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000302

But it doesn't seem to exist in 1.4.3.1 on either XML::Document or
XML::Node.... oh wait, that's not right, it does exist on
XML::Document! Okay, answered that one myself. I guess the other
rdocs were outdated, it used to be on node instead?

#collect_namespaces does seem to be awfully slow, yes?

2. You don't have to supply a namespace hash if you are using a
namespace defined on the very same node you are calling #xpath on. But
this does NOT work on an XML::Document, nor does it work on a child
node. Is this correct?

document.namespaces => {"example" => "http://example.org"}
document.root.namespaces => {"example" => "http://example.org"}

document.xpath("example:something") => [] # Note, NOT an exception, but it doesn't find the node
document.xpath("unregistered:something") => raise SyntaxError: Undefined namespace prefix

document.root.xpath("example:something") => non-empty array

It finds it only on document.root.xpath, not on document.xpath. But it
doesn't raise an exception on document.xpath, like it does for an
unregistered namespace, it just gives an empty set result. This
seems.... inconsistent. Thoughts?

I believe, although I'm getting confused, you also can't count on the
default #namespaces from a _parent_ node, they have to be on the very
same node you are calling #xpath on. Is this correct/intended?

2. Serialization #to_xml

It is unclear to me how to get xmlns attributes into the output
#to_xml. If I have an XML::Node, and it has #namespaces, if I call
node.to_xml.... those namespaces still do not appear in the XML.
Whether I added them myself with add_namespace, or whether they were
there from parsing of an XML string.... still can't manage to get em
in the output #to_xml. What am I missing?

If I actually add it myself with node["xmlns:prefix"] = "http://example.org"

then it does show up in the serialized #to_xml, but is NOT included in
#namespaces. Do I really need to manually add it both ways for a
consistent graph?

3. Node#namespace_definitions vs #namespaces

why does this sometimes return an empty array, when
same_node#namespaces does not? Where are each of these actually
coming from?

4. #namespace_scopes

and then there's #namespace_scopes too. This DOES seem to find all
namespaces declared on parent nodes too. I'd love to then be able to
pass this to #xpath, to tell it to use ALL the operative namespaces
for that node. But the #xpath namespace argument is a hash, and
#namespace_scopes returns an array of Nokogiri::XML::Namespace. Do I
just need to manually turn this into a hash myself, or is there
something I'm missing?

Or maybe I want to use document.collect_namespaces... but then I've
still got to go remove all the "xmlns:" prefixes from its keys before
I can just pass it to #xpath. I'm getting confused by all the
different methods for interrogating or supplying namespaces, which are
not consistent with each other in the data structures they use to
represent namespaces.

5. Putting it all together: My actual challenge

Here's what I'm actually trying to do, which led me to investigate all
that namespace API. I've got a somewhat complex nokogiri document,
which has namespace declarations on several nodes, all over the
place.

I want to take an _excerpt_ of this document, some particular
nokogiri_node and all its descendants, and serialize this to XML. But
this serialization needs to include all the operative namespace
declarations from its parents, but in the excerpted serialization
they need to be actually redeclared on the new parent node, right?

I can't quite figure out how to do this, without knowing all the
namespaces in advance -- or even WITH knowing all the namespaces in
advance, since I'm having trouble even figuring out how to explicitly
add a namespace such that its xmlns declaration actually gets
serialized.

Phew, that's a whole bunch, sorry, GREATLY appreciate any tips.

Jonathan Rochkind

Oct 26, 2010, 1:55:21 PM

to nokogiri-talk

Mike Dalessio

Oct 26, 2010, 4:17:52 PM

to nokogi...@googlegroups.com

Greetings.

On Tue, Oct 26, 2010 at 12:59 PM, Jonathan Rochkind <roch...@jhu.edu> wrote:

> I'm getting really confused with how to do certain things with
> namespaces in nokogiri, and if anyone can give me some help, I would
> GREATLY appreciate it. Here is a whole bunch of questions, sorry but
> I've got a lot, and nokogiri documentation on namespace-related things
> is pretty light. (If I actually manage to figure this out, I could try
> to write some for you).
>
> 0. Time-saver in #xpath namespaces.
>
> Every time I call #xpath, I've got to pass in a hash of namespaces, so
> I can use the namespace prefixes. This starts to get tiresome. Is
> there any way to set default xpath namespaces on the Document or Node
> itself, that will be used as default on every subsequent call to
> #xpath that doesn't have a namespace argument? Would make my code a
> lot cleaner. But I think there is not.

Funny story: tenderlove cornered Tim Bray at the bar at Rubyconf 2009 and berated him about how broken namespaces are in XML. And Tim didn't argue with him.

One option, if you don't care about namespaces, is to remove them all from the document:

http://nokogiri.org/search?q=remove_namespaces#method-i-remove_namespaces%21

Then you don't need to pass in the namespace bindings all the time. I'm only half-joking. This works for some people.

The other alternative is to suggest an API change for how this would work. We're open to ideas, and if this is a pain point for you, this is an opportunity for you to effect change.

> 1. #collect_namespaces. Depending on what instance of RDocs I'm
> looking at, this is documented as a method on XML::Document or
> XML::Node.
>
> http://nokogiri.org/Nokogiri/XML/Document.html#method-i-collect_namespaces
> http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Node.html#M000302
>
> But it doesn't seem to exist in 1.4.3.1 on either XML::Document or
> XML::Node.... oh wait, that's not right, it does exist on
> XML::Document! Okay, answered that one myself. I guess the other
> rdocs were outdated, it used to be on node instead?

http://nokogiri.org/search?q=collect_namespaces#method-i-collect_namespaces is correct. Rubyforge docs are out of date, Nokogiri.org is always up to date.

> #collect_namespaces does seem to be awfully slow, yes?

It is, by definition, a recursive function that must collect namespaces from every node in the document, so you can't do better than O(n), though you might be able to implement it in C (and Java) if it's a particular pain point for you. Patches are welcome.

> 2. You don't have to supply a namespace hash if you are using a
> namespace defined on the very same node you are calling #xpath on. But
> this does NOT work on an XML::Document, nor does it work on a child
> node. Is this correct?
>
> document.namespaces => {"example" => "http://example.org"}
> document.root.namespaces => {"example" => "http://example.org"}
>
> document.xpath("example:something") => [] # Note, NOT an exception, but it doesn't find the node
> document.xpath("unregistered:something") => raise SyntaxError: Undefined namespace prefix
>
> document.root.xpath("example:something") => non-empty array
>
> It finds it only on document.root.xpath, not on document.xpath. But it
> doesn't raise an exception on document.xpath, like it does for an
> unregistered namespace, it just gives an empty set result. This
> seems.... inconsistent. Thoughts?

document.xpath should probably pick up namespaces on the root node. If you open a github issue for this, with a reproducible test case attached, someone will look into it. Sadly, it will probably be me. :-\

On a related note, you may want to check out guidelines for bug reporting at http://nokogiri.org/tutorials/getting_help.html

> I believe, although I'm getting confused, you also can't count on the
> default #namespaces from a _parent_ node, they have to be on the very
> same node you are calling #xpath on. Is this correct/intended?

Intended. Anything else would require a partial collection of namespaces which, I believe you already pointed out, is a slow operation, and would be a performance hit for people who know in advance what namespace bindings they care about when they write their xpath query.

> 2. Serialization #to_xml
>
> It is unclear to me how to get xmlns attributes into the output
> #to_xml. If I have an XML::Node, and it has #namespaces, if I call
> node.to_xml.... those namespaces still do not appear in the XML.
> Whether I added them myself with add_namespace, or whether they were
> there from parsing of an XML string.... still can't manage to get em
> in the output #to_xml. What am I missing?
>
> If I actually add it myself with node["xmlns:prefix"] = "http://example.org"
>
> then it does show up in the serialized #to_xml, but is NOT included in
> #namespaces. Do I really need to manually add it both ways for a
> consistent graph?

This behavior may depend on the version of libxml you're running. If you can devote more time to investigating, feel free to open a github issue that follows the guidelines referred to earlier (http://nokogiri.org/tutorials/getting_help.html).

> 3. Node#namespace_definitions vs #namespaces
>
> why does this sometimes return an empty array, when
> same_node#namespaces does not? Where are each of these actually
> coming from?

I'm not sure I understand the question. Are the rdocs not clear? If so, how? Can you rephrase the question?

If it helps: http://github.com/tenderlove/nokogiri/blob/master/ext/nokogiri/xml_node.c#L792-813 and http://github.com/tenderlove/nokogiri/blob/master/lib/nokogiri/xml/node.rb#L527-538

> 4. #namespace_scopes
>
> and then there's #namespace_scopes too. This DOES seem to find all
> namespaces declared on parent nodes too. I'd love to then be able to
> pass this to #xpath, to tell it to use ALL the operative namespaces
> for that node. But the #xpath namespace argument is a hash, and
> #namespace_scopes returns an array of Nokogiri::XML::Namespace. Do I
> just need to manually turn this into a hash myself, or is there
> something I'm missing?

#namespaces turns #namespace_scopes into a hash. This is what it does. This is what its rdoc string says it does. If it doesn't do what you want it to, can you provide a concrete use case (read: codes) so we can see what you're trying to accomplish?

> Or maybe I want to use document.collect_namespaces... but then I've
> still got to go remove all the "xmlns:" prefixes from its keys before
> I can just pass it to #xpath. I'm getting confused by all the
> different methods for interrogating or supplying namespaces, which are
> not consistent with each other in the data structures they use to
> represent namespaces.

Funny story, tenderlove cornered Tim Bray at the bar at Rubyconf ... oh wait, I already told that story.

Look, the namespace API is complicated because namespaces are complicated, and people use them in complicated ways. If you have suggestions on how to improve or simplify, we're all ears.

Because (and I think I can speak for Aaron here, too) I deal with namespaces as rarely as I possibly can, I rely on users of Nokogiri to make helpful suggestions, or provide patches, or at the very least sparkling conversation about drunken encounters with members of the XML specification committee.

> 5. Putting it all together: My actual challenge
>
> Here's what I'm actually trying to do, which led me to investigate all
> that namespace API. I've got a somewhat complex nokogiri document,
> which has namespace declarations on several nodes, all over the
> place.
>
> I want to take an _excerpt_ of this document, some particular
> nokogiri_node and all its descendants, and serialize this to XML. But
> this serialization needs to include all the operative namespace
> declarations from its parents, but in the excerpted serialization
> they need to be actually redeclared on the new parent node, right?
>
> I can't quite figure out how to do this, without knowing all the
> namespaces in advance -- or even WITH knowing all the namespaces in
> advance, since I'm having trouble even figuring out how to explicitly
> add a namespace such that its xmlns declaration actually gets
> serialized.

Let's iterate on the questions above and see if we can't get you there, eventually.

Jonathan Rochkind

Oct 26, 2010, 4:47:16 PM

to nokogi...@googlegroups.com

Awesome, thanks for your quick response. I'm totally happy to file
issues, to suggest API changes, and even to supply patches when feasible
for me. But I'm still a bit confused, in some cases, about what is
"expected" behavior, and if I'm understanding the API correctly.

[And I agree that XML namespaces are a serious pain, but they are also
the only way to do certain things in the domain I work in, I can't just
strip em out, have to deal with it. Can't live with em, can't live
without em. ]

Thoughts and further questions inline....

Mike Dalessio wrote:
>
> 0. Time-saver in #xpath namespaces.
>

Will think about a useful API change, and try to submit a patch for it,
if I can find the time. Actually, I could think of the API change right
now, I'll try to file a ticket on it, understanding that it'll get dealt
with a lot faster if I can supply a patch too, which I may or may not be
able to get to.

>
> 1. #collect_namespaces. Depending on what instance of RDocs I'm
> looking at, this is documented as a method on XML::Document or
> XML::Node.
>

Okay, collect_namespaces _is_ on Document, _not_ on Node. And is
expectedly slow because it's recursively travelling the tree to find em.

One thought: When parsing the XML document, nokogiri is already
travelling past any xmlns declarations, would it be possible for it
simply to notice these as they go by, and build a cached
collected_namespaces hash then, so a dynamic traversing isn't required
on every call? One complication there is that a nokogiri
document/nodeset is editable after initial creation, so the cached
collected_namespaces would have to be updated if there's a namespace
change. Not necessarily infeasible, since every node already knows its
#document though.

If you think this seems possible at all, I will add it to my list to try
and find time to create a patch.

>
>
>
> document.xpath("example:something") => [] # Note, NOT an exception,
> but it doesn't find the node
> document.xpath("unregistered:something") => raise SyntaxError
> Undefined namespace prefix
>
> document.root.xpath("example:something") => non-empty array
>

I will file a ticket for this one.

>
> 2. Serialization #to_xml
>
> It is unclear to me how to get xmlns attributes into the output
> #to_xml.

It's still unclear to me what the expected behavior is here. I'll file a
bug and/or try to come up with a patch once I understand it.

some_element # an XML::Node, not a document
some_element.to_xml => no problem
some_element.add_namespace("example", "http://www.example.org")

some_element.to_xml

Am I wrong to expect that the namespace you added with add_namespace
will be present in that #to_xml as an xmlns attribute on the
some_element tag? Or should I file a ticket suggesting that it ought
to be?

You say it may depend on the version of libxml I'm running -- I'm not
sure how to tell that (I just did a "gem install nokogiri" and it
worked), but first thing is figuring out what the expected/desired
behavior here is, I think?

>
> 3. Node#namespace_definitions vs #namespaces
>
> why does this sometimes return an empty array, when
> same_node#namespaces does not? Where are each of these actually
> coming from?
>
> I'm not sure I understand the question. Are the rdocs not clear? If so, how? Can you rephrase the question?
>

Yes, the rdocs are not clear. Actually, I'm having trouble loading
nokogiri.org right now, but the comments from the source you helpfully
linked to, one ruby one C (thanks!):

#namespaces: "Get a hash containing the Namespace definitions for this
Node."
#namespace_definitions: "returns a list of Namespace nodes defined on
_self_"

From this, I'd think that #namespaces and #namespace_definitions should
always return the same data, just in different form: Hash of
prefixes=>values in one case, and an array of XML::Namespace in the other.

However, I am finding that this is not always so, sometimes #namespaces
gives you a non-empty hash, but #namespace_definitions gives you an
empty array.

So sounds like a bug I should file?

>
>
> #namespaces turns #namespace_scopes into a hash. This is what it does. This is what its rdoc string says it does. If it doesn't do what you want it to, can you provide a concrete use case (read: codes) so we can see what you're trying to accomplish?
>

Sweet, this clarifies things. I did NOT see that in the rdoc. The RDoc
simply says "Get a hash containing the Namespace
<http://nokogiri.org/Nokogiri/XML/Namespace.html> definitions for this
Node <http://nokogiri.org/Nokogiri/XML/Node.html>", it doesn't say
anything about being based on #namespace_scopes.

So I think that means my above bug about #namespaces returning something
different than #namespace_definitions is NOT a bug, it makes sense
now. #namespaces is the hash version of #namespace_scopes, which
includes ancestor namespaces. While #namespace_definitions is
only those defined on self, not including ancestors. Okay, makes sense
now.

Are you seeing different RDoc than me?
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-namespaces
"Get a hash containing the Namespace
<http://nokogiri.org/Nokogiri/XML/Namespace.html> definitions for this
Node <http://nokogiri.org/Nokogiri/XML/Node.html>".

Suggest this should say something like: "Returns a Hash of prefix=>value
definitions for all namespaces on this node and its ancestors. A Hash
version of #namespace_scopes".

Okay, this will get me started. The only thing I'm still really
uncertain about is the 'right' way to modify an XML::Node to add a
namespace that will actually get serialized in #to_xml. Or am I wrong
to be trying to take an individual node and serialize #to_xml in the
first place, do I need to add it to a brand new XML::Document instead,
as root node, and serialize the whole Document? You follow?

Thanks again for your help, very helpful.

Mike Dalessio

Oct 27, 2010, 9:13:43 AM

to nokogi...@googlegroups.com

On Tue, Oct 26, 2010 at 4:47 PM, Jonathan Rochkind <roch...@jhu.edu> wrote:

> Awesome, thanks for your quick response. I'm totally happy to file issues, to suggest API changes, and even to supply patches when feasible for me. But I'm still a bit confused, in some cases, about what is "expected" behavior, and if I'm understanding the API correctly.
>
> [And I agree that XML namespaces are a serious pain, but they are also the only way to do certain things in the domain I work in, I can't just strip em out, have to deal with it. Can't live with em, can't live without em. ]
>
> Thoughts and further questions inline....
>
> Mike Dalessio wrote:
>
> > 0. Time-saver in #xpath namespaces.
>
> Will think about a useful API change, and try to submit a patch for it, if I can find the time. Actually, I could think of the API change right now, I'll try to file a ticket on it, understanding that it'll get dealt with a lot faster if I can supply a patch too, which I may or may not be able to get to.
>
> > 1. #collect_namespaces. Depending on what instance of RDocs I'm
> > looking at, this is documented as a method on XML::Document or
> > XML::Node.
>
> Okay, collect_namespaces _is_ on Document, _not_ on Node. And is expectedly slow because it's recursively travelling the tree to find em.
>
> One thought: When parsing the XML document, nokogiri is already travelling past any xmlns declarations, would it be possible for it simply to notice these as they go by, and build a cached collected_namespaces hash then, so a dynamic traversing isn't required on every call? One complication there is that a nokogiri document/nodeset is editable after initial creation, so the cached collected_namespaces would have to be updated if there's a namespace change. Not necessarily infeasible, since every node already knows its #document though.

Nope. libxml is actually doing the parsing, not Nokogiri. And actually, part of why collect_namespaces is so slow is that Ruby objects are being created for every element (and namespace, and attribute) in the document before we can get the namespaces. There may be a faster way to do this in C without creating Ruby objects (patches welcome), but then keeping the namespaces in sync with the changing document turns this into a cache expiry problem, which is not in the scope of what Nokogiri should be doing as a parsing library.

> If you think this seems possible at all, I will add it to my list to try and find time to create a patch.
>
> > document.xpath("example:something") => [] # Note, NOT an exception, but it doesn't find the node
> > document.xpath("unregistered:something") => raise SyntaxError: Undefined namespace prefix
> >
> > document.root.xpath("example:something") => non-empty array
>
> I will file a ticket for this one.
>
> > 2. Serialization #to_xml
> >
> > It is unclear to me how to get xmlns attributes into the output
> > #to_xml.
>
> It's still unclear to me what the expected behavior is here. I'll file a bug and/or try to come up with a patch once I understand it.
>
> some_element # an XML::Node, not a document
> some_element.to_xml => no problem
> some_element.add_namespace("example", "http://www.example.org")
>
> some_element.to_xml
>
> Am I wrong to expect that the namespace you added with add_namespace will be present in that #to_xml as an xmlns attribute on the some_element tag? Or should I file a ticket suggesting that it ought to be?
>
> You say it may depend on the version of libxml I'm running -- I'm not sure how to tell that

"nokogiri -v" or inside ruby, "puts Nokogiri::VERSION_INFO.inspect"

> (I just did a "gem install nokogiri" and it worked), but first thing is figuring out what the expected/desired behavior here is, I think?

It sounds like you know what the desired behavior is -- namespaces should be in the output.

> > 3. Node#namespace_definitions vs #namespaces
> >
> > why does this sometimes return an empty array, when
> > same_node#namespaces does not? Where are each of these actually
> > coming from?
> >
> > I'm not sure I understand the question. Are the rdocs not clear? If so, how? Can you rephrase the question?
>
> Yes, the rdocs are not clear. Actually, I'm having trouble loading nokogiri.org right now,

Heroku had some downtime yesterday.

> but the comments from the source you helpfully linked to, one ruby one C (thanks!):
>
> #namespaces: "Get a hash containing the Namespace definitions for this Node."
> #namespace_definitions: "returns a list of Namespace nodes defined on _self_"
>
> From this, I'd think that #namespaces and #namespace_definitions should always return the same data, just in different form: Hash of prefixes=>values in one case, and an array of XML::Namespace in the other.
>
> However, I am finding that this is not always so, sometimes #namespaces gives you a non-empty hash, but #namespace_definitions gives you an empty array.
>
> So sounds like a bug I should file?

Yes please. With working code as per the guidelines. Thanks!

> > #namespaces turns #namespace_scopes into a hash. This is what it does. This is what its rdoc string says it does. If it doesn't do what you want it to, can you provide a concrete use case (read: codes) so we can see what you're trying to accomplish?
>
> Sweet, this clarifies things. I did NOT see that in the rdoc. The RDoc simply says "Get a hash containing the Namespace <http://nokogiri.org/Nokogiri/XML/Namespace.html> definitions for this Node <http://nokogiri.org/Nokogiri/XML/Node.html>", it doesn't say anything about being based on #namespace_scopes.
>
> So I think that means my above bug about #namespaces returning something different than #namespace_definitions is NOT a bug, it makes sense now. #namespaces is the hash version of #namespace_scopes, which includes ancestor namespaces. While #namespace_definitions is only those defined on self, not including ancestors. Okay, makes sense now.
>
> Are you seeing different RDoc than me? http://nokogiri.org/Nokogiri/XML/Node.html#method-i-namespaces
> "Get a hash containing the Namespace <http://nokogiri.org/Nokogiri/XML/Namespace.html> definitions for this Node <http://nokogiri.org/Nokogiri/XML/Node.html>".
>
> Suggest this should say something like: "Returns a Hash of prefix=>value definitions for all namespaces on this node and its ancestors. A Hash version of #namespace_scopes".

captured this suggestion in http://github.com/tenderlove/nokogiri/issues/issue/358

> Okay, this will get me started. The only thing I'm still really uncertain about is the 'right' way to modify an XML::Node to add a namespace that will actually get serialized in #to_xml. Or am I wrong to be trying to take an individual node and serialize #to_xml in the first place, do I need to add it to a brand new XML::Document instead, as root node, and serialize the whole Document? You follow?
>
> Thanks again for your help, very helpful.

--
You received this message because you are subscribed to the Google Groups "nokogiri-talk" group.
To post to this group, send email to nokogi...@googlegroups.com.
To unsubscribe from this group, send email to nokogiri-tal...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nokogiri-talk?hl=en.

Jonathan Rochkind

Oct 27, 2010, 10:17:09 AM

to nokogi...@googlegroups.com

I'm going to prepare a bunch of documentation improvement suggestions,
now that I think I understand what's going on, would like to provide
some slightly improved documentation to save those following in my path
some time. Would you like that in one combined ticket, can I just add it
to the ticket you already created?

I'm still a bit confused about whether add_namespace _should_ add an
attribute though. I'm going to play around with all the various nokogiri
namespace options and how they interact, before feeling more sure of
what I think the behavior should be.


Mike Dalessio

Oct 27, 2010, 10:48:07 AM

to nokogi...@googlegroups.com

On Wed, Oct 27, 2010 at 10:17 AM, Jonathan Rochkind <roch...@jhu.edu> wrote:

> I'm going to prepare a bunch of documentation improvement suggestions, now that I think I understand what's going on, would like to provide some slightly improved documentation to save those following in my path some time. Would you like that in one combined ticket, can I just add it to the ticket you already created?

I'd really like it if you rewrote the doc strings in a fork and submitted a pull request. :) :) :)

Jonathan Rochkind

Oct 27, 2010, 11:02:43 AM

to nokogi...@googlegroups.com

Mike Dalessio wrote:
> I'd really like it if you rewrote the doc strings in a fork and submitted a pull request. :) :) :)
>

Will do, no problem.

I feel like at some point I saw a general overview on working with XML
namespaces in nokogiri -- but now I can't find it. In my memory it
wasn't just part of the rdocs, but a separate website page or
something. But maybe that was just someone's personal blog, and not
actually part of the official nokogiri docs? I think a paragraph of
intro on working with namespaces in nokogiri would be helpful, any
suggestion of where it should go?

>
> I'm still a bit confused about whether add_namespace _should_ add an attribute though. I'm going to play around with all the various nokogiri namespace options and how they interact, before feeling more sure of what I think the behavior should be.
>
> Mike Dalessio wrote:

> Yes, the rdocs are not clear. Actually, I'm having trouble loading nokogiri.org<http://nokogiri.org><http://nokogiri.org> right now,

>
>
> Heroku had some downtime yesterday.
>
> but the comments from the source you helpfully linked to, one ruby one C (thanks!):
>
> #namespaces: "Get a hash containing the Namespace definitions for this Node."
> #namespace_definitions: "returns a list of Namespace nodes defined on _self_"
>
> From this, I'd think that #namespaces and #namespace_definitions should always return the same data, just in different form: a Hash of prefixes=>values in one case, and an array of XML::Namespace in the other.
>
> However, I am finding that this is not always so, sometimes #namespaces gives you a non-empty hash, but #namespace_definitions gives you an empty array.
>
> So sounds like a bug I should file?
>
> Yes please. With working code as per the guidelines. Thanks!
>
>
>
>
> #namespaces turns #namespace_scopes into a hash. This is what it does. This is what its rdoc string says it does. If it doesn't do what you want it to, can you provide a concrete use case (read: codes) so we can see what you're trying to accomplish?
>
> Sweet, this clarifies things. I did NOT see that in the rdoc. The RDoc simply says "Get a hash containing the Namespace definitions for this Node"; it doesn't say anything about being based on #namespace_scopes.
>
> So I think that means my above bug about #namespaces returning something different than #namespace_definitions is NOT a bug, it makes sense now. #namespaces is the hash version of #namespace_scopes, which includes ancestor namespaces, while #namespace_definitions is only those defined on self, not including ancestors. Okay, makes sense now.
>
> Are you seeing different RDoc than me? http://nokogiri.org/Nokogiri/XML/Node.html#method-i-namespaces
> "Get a hash containing the Namespace definitions for this Node".
>
> Suggest this should say something like: "Returns a Hash of prefix=>value definitions for all namespaces on this node and its ancestors. A Hash version of #namespace_scopes".
>
> captured this suggestion in http://github.com/tenderlove/nokogiri/issues/issue/358
>
>
> Okay, this will get me started. The only thing I'm still really uncertain about is the 'right' way to modify an XML::Node to add a namespace that will actually get serialized in #to_xml. Or am I wrong to be trying to take an individual node and serialize #to_xml in the first place, do I need to add it to a brand new XML::Document instead, as root node, and serialize the whole Document? You follow?
>
> Thanks again for your help, very helpful.

Mike Dalessio

Oct 27, 2010, 11:06:22 AM

to nokogi...@googlegroups.com

On Wed, Oct 27, 2010 at 11:02 AM, Jonathan Rochkind <roch...@jhu.edu> wrote:

> Mike Dalessio wrote:
>
>> I'd really like it if you rewrote the doc strings in a fork and submitted a pull request. :) :) :)
>
> Will do, no problem.
> I feel like at some point I saw a general overview on working with XML namespaces in nokogiri -- but now I can't find it. In my memory it wasn't just part of the rdocs, but a separate website page or something. But maybe that was just someone's personal blog, and not actually part of the official nokogiri docs? I think a paragraph of intro on working with namespaces in nokogiri would be helpful. Any suggestion of where it should go?

Maybe http://nokogiri.org/tutorials/searching_a_xml_html_document.html ?

These tutorials are open-sourced as well: http://github.com/flavorjones/nokogiri.org-tutorials

Again, a fork and a pull request would be greatly appreciated. If you have any questions about how to operate the tutorials' ad-hoc ruby code templating, feel free to ask, either on this list or off-list.

Jonathan Rochkind

Oct 27, 2010, 3:37:15 PM

to nokogi...@googlegroups.com

So as I'm trying to go through all the namespace related methods and
understand them, with the plan of documenting them a bit better, I'm
running into a few things that still confuse me, or that seem missing.

There is a #default_namespace= method, but there is no #default_namespace
method. There seems to be no way to find out what the default namespace
is currently set to. Am I missing something, or should there be one,
and should I file a patch for it?

Note that #namespace_definitions, #namespace_scopes, and #namespaces do
NOT include any default namespaces. So that's not a way to find the
default namespace. I am not sure if they _should_ include
default_namespaces or not.

But I get at this point that Mike does not use XML namespaces, and has
basically no opinion on what the API should be. I guess if anyone else
happens to be reading this, and has an opinion, please share it.
Otherwise I guess I'll just think some more, and file a ticket if I wind
up leaning toward a change being advisable. Just floating it on the
listserv first.

Jonathan Rochkind

Oct 27, 2010, 3:49:47 PM

to nokogi...@googlegroups.com

Oops, I'm wrong, I think I just discovered it. #namespace will show you
the default namespace for a node, as a Namespace object.

So we have #namespace and #namespace= to get and set the default
namespace as a Namespace object.

And then we have #default_namespace= to set the default namespace as a
String. But we're missing #default_namespace to get the default
namespace as a String. So I'm thinking that should be there for full
parallelism, to make things work how a developer would expect.

I think the method name choices are confusing; it's hard to
guess/remember that "namespace" means "default namespace as a Namespace
obj" and "default_namespace" means "default namespace as a String."
This makes things hard on a developer figuring out/remembering
the API.

But I'm not sure what can be done here, without breaking backwards
compatibility. Maybe other than providing additional aliases for one or
both or something. Mike, how would you feel about a ticket to set up
some method name aliases to provide some more consistent naming of a
bunch of namespace related methods? This issue isn't the only place
that parallelism/obviousness in method names kind of breaks down.
