Namespaced XML support


Herwig Hochleitner

Apr 1, 2014, 6:41:23 AM
to cloju...@googlegroups.com
Hello Everyone,

I'm writing a little WebDAV frontend for a custom storage server, so I decided to push data.xml's namespaced XML support up to snuff.

There is a ticket and design page for this:

However, I've taken a fresh approach, which, I think, will fit well with how things are done in clojure. Namespaced xml currently is the sore thumb of clojure data support and as you will see, it's easily fixed if we manage to agree on an interface. In addition to feedback on my current work, I'm soliciting propositions on which API we want _specifically_ for namespaced XML (Step 3), because the rest is pretty much reasoned from design constraints.

I'll summarize my current work (https://github.com/bendlas/data.xml) here and if people think it's the right direction, I'll update the design page and finish up.

Step 1: Getting namespaced XML to roundtrip properly
-----------

This is done by just mapping keyword namespaces <-> xmlns prefixes.
With that it's possible to emit (or parse) broken XML with bogus namespaces and you have complete control over the generated prefixes. This is pretty much the current interface, bugfixed and I've already attached a patch for this to DXML-4.

e.g.
<element xmlns:A="AURI:"><A:foo B:attr="..." /></element>
<-->
{:tag :element :attrs {:xmlns/A "AURI:"} :content [{:tag :A/foo :attrs {:B/attr "..."}}]}

Note how this is already sufficient to do any kind of namespaced XML processing and it's a great baseline representation because of full roundtripability, but it requires consumers to resolve prefixes. So we need another representation, where names are qualified by URI.

Step 2: Representing uri-namespaced names directly
----------

Current work uses a custom defrecord XmlName as the data structure, but javax.xml.namespace.QName seems fully appropriate, so will use that as of next revision.

URI-namespaced names can be used in place of keywords in an xml tree; for emitting, appropriate xmlns* declarations have to be in place in order to assign a prefix to the name.

When parsing, keywords can be replaced with those names, either by parser option or by a separate tree walker, that keeps track of xmlns* attributes.
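
To make the "separate tree walker" idea concrete, here is a minimal sketch of a walker that tracks xmlns* attributes and replaces prefixed keyword tags with QNames. It is an illustration only, not the code in the fork: the names xmlns-bindings, resolve-kw and resolve-tree are hypothetical, attribute names are left unresolved, and it assumes the {:tag :attrs :content} shape from the Step 1 example.

```clojure
(import 'javax.xml.namespace.QName)

;; Hypothetical helpers for illustration; not the fork's actual API.
(defn xmlns-bindings
  "Collect prefix->uri bindings declared in an attribute map.
  :xmlns/A binds prefix \"A\"; a plain :xmlns binds the default namespace (\"\")."
  [attrs]
  (reduce-kv (fn [env k v]
               (cond
                 (= "xmlns" (namespace k)) (assoc env (name k) v)
                 (= :xmlns k)              (assoc env "" v)
                 :else                     env))
             {} attrs))

(defn resolve-kw
  "Turn a prefixed keyword into a QName, using the prefix->uri map env."
  [env kw]
  (let [prefix (or (namespace kw) "")
        uri    (get env prefix)]
    ;; An unprefixed name without a default namespace is fine; an unbound prefix is not.
    (when (and (nil? uri) (not= "" prefix))
      (throw (ex-info "Unbound prefix" {:prefix prefix})))
    (QName. (or uri "") (name kw) prefix)))

(defn resolve-tree
  "Walk an element tree, resolving tags against the xmlns* attributes in scope."
  ([tree] (resolve-tree {} tree))
  ([env tree]
   (if (map? tree)
     (let [env (merge env (xmlns-bindings (:attrs tree)))]
       (assoc tree
              :tag     (resolve-kw env (:tag tree))
              ;; map is lazy, so children are resolved only when consumed
              :content (map #(resolve-tree env %) (:content tree))))
     tree))) ; text nodes pass through unchanged

(def doc {:tag :element :attrs {:xmlns/A "AURI:"}
          :content [{:tag :A/foo :attrs {} :content []}]})

(-> (resolve-tree doc) :content first :tag .getNamespaceURI)
;; => "AURI:"
```

Inner declarations shadow outer ones because each element merges its own bindings over the inherited environment before its children are visited.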

Step 3: Helpers
-----------

Currently, I've implemented the following helpers for namespaced XML:

- resolve-name: generate uri-namespaced name from prefixed name
- walk-resolve-names: the tree walker mentioned in Step 2
- walk-cleanup-prefixes: remove redundant prefixes (multiple bound to same uri)
- with-xmlns (macro): syntactically replace namespaced keywords with xml-names in source code
      The primary purpose of this is to denote xml-names in query functions et al. To generate straight namespaced xml fragments, just declare appropriate xmlns* attributes on it.

More possibilities:

- find the minimal set of necessary xmlns uris to declare on a root element, in order to be able to emit the whole fragment. This will be non-lazy, since it adds a pass before actually emitting. For large xml data, it's recommended to statically add possible xmlns* attributes and emit lazily with walk-cleanup-prefixes.

- an xml zipper giving access to the namespace environment at each location

Thanks for taking the time to review this, let's make clojure the best choice for XML processing as well!

cheers

Ryan Senior

Apr 1, 2014, 8:28:38 AM
to cloju...@googlegroups.com
Hi Herwig,

I'm excited to see this work being done! There have been several people looking for it. I have some questions in-line below. I will pull down your code and give it a try soon.


On Tue, Apr 1, 2014 at 5:41 AM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
Note how this is already sufficient to do any kind of namespaced XML processing and it's a great baseline representation because of full roundtripability, but it requires consumers to resolve prefixes.


You are saying that consumers need to resolve prefixes: given a position in the XML tree and a namespaced keyword (i.e. :xmlns/A), what does that resolve to at this position in the document? Is there no document-level map for this resolution? I'm mostly trying to figure out how you're dealing with the same namespace prefix defined multiple times (with different URIs) in a document.
 

So we need another representation, where names are qualified by URI.

Step 2: Representing uri-namespaced names directly
----------

Current work uses a custom defrecord XmlName as the data structure, but javax.xml.namespace.QName seems fully appropriate, so will use that as of next revision.

URI-namespaced names can be used in place of keywords in an xml tree; for emitting, appropriate xmlns* declarations have to be in place in order to assign a prefix to the name.

When parsing, keywords can be replaced with those names, either by parser option or by a separate tree walker, that keeps track of xmlns* attributes.


I like this. So the user will be able to opt-in to getting full URIs rather than the namespaced keywords from the "parser". Although looking at the URIs is a bit ugly, I think it's the only way we can support both streaming and namespaces 

 

Step 3: Helpers
-----------

Currently, I've implemented the following helpers for namespaced XML:

- resolve-name: generate uri-namespaced name from prefixed name


Maybe this is the function that answers my question from step 1?

 
- walk-resolve-names: the tree walker mentioned in Step 2
- walk-cleanup-prefixes: remove redundant prefixes (multiple bound to same uri)
- with-xmlns (macro): syntactically replace namespaced keywords with xml-names in source code
      The primary purpose of this is to denote xml-names in query functions et al. To generate straight namespaced xml fragments, just declare appropriate xmlns* attributes on it.

More possibilities:

- find the minimal set of necessary xmlns uris to declare on a root element, in order to be able to emit the whole fragment. This will be non-lazy, since it adds a pass before actually emitting. For large xml data, it's recommended to statically add possible xmlns* attributes and emit lazily with walk-cleanup-prefixes.


How does emitting XML that contains namespace work now in your fork?
 

Alex Miller

Apr 1, 2014, 9:26:05 AM
to cloju...@googlegroups.com
Thanks for working on this Herwig, very glad to see someone digging in on it.

While I have not looked at the code much, the words in Step 2 make me ask whether there should be a QualifiedName protocol that is extended to QName instead of a specific concrete type?

In Step 3, hearing tree walkers sounds like something that precludes streaming, which is something the current impl can do (and is really important). To what extent has that been affected for non-namespaced XML and to what extent is it now possible with namespaced XML?



Herwig Hochleitner

Apr 1, 2014, 10:46:45 AM
to cloju...@googlegroups.com
Hello Alex, Ryan,

thanks for taking me up on this!
First, a question: I'd like to implement an #xml/name reader tag, in order to be able to directly emit QNames from the with-xmlns macro.
Can you advise on how to deliver a reader tag with a contrib library?

Concerning the issue with tree walkers vs. streaming: those seem mutually exclusive at first, but consider that long xml documents generally don't grow in depth as more data is added; there are only more (often flat) sibling elements added, at some maximum depth. So it's basically a lazy-seq within a couple of containers. A lazy tree walker should behave nicely for a one-pass algorithm, no?

2014-04-01 14:28 GMT+02:00 Ryan Senior <senio...@gmail.com>:

You are saying that consumers need to resolve prefixes: given a position in the XML tree and a namespaced keyword (i.e. :xmlns/A), what does that resolve to at this position in the document? Is there no document-level map for this resolution? I'm mostly trying to figure out how you're dealing with the same namespace prefix defined multiple times (with different URIs) in a document.

:xmlns/A in particular would resolve to the name A in the namespace "http://www.w3.org/2000/xmlns/" (because the "xmlns" prefix is not assignable)
In general, one would need a namespace context to resolve the uri of that name. More on namespace contexts below.

The deal with multiple prefixes as such is, that in the base representation, names are just prefixed keywords, like in serialized xml.
In the second representation, names are resolved to QName, which considers just the name and uri for equality, thus you can compare names from different documents and such.
When serializing a QName, the method NamespaceContext.getPrefix is used to select a currently active prefix for the uri.
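
The equality behaviour of javax.xml.namespace.QName described here can be checked at the REPL. A small sketch (the prefixes "D" and "dav" are arbitrary examples):

```clojure
(import 'javax.xml.namespace.QName)

;; Same namespace URI and local name, different prefixes.
(def a (QName. "DAV:" "propfind" "D"))
(def b (QName. "DAV:" "propfind" "dav"))

(= a b)               ; true: equals covers URI and local part only
(= (hash a) (hash b)) ; true: hashCode ignores the prefix as well
(.getPrefix a)        ; "D": the prefix is retained, but only as a hint
```

So two resolved names compare equal no matter which prefix their source document happened to use, and both can serve as the same map key.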

Thus, one can arbitrarily combine resolved xml fragments and emit them. Generally, a user would make sure to introduce all ns uris in the root element.
I'm considering making it an error to emit a name whose uri has not been introduced. It would also work to allow it and just introduce the uri with the element itself, but that could cause massive overhead if you include a long list of children from a foreign document. It will at least be a warning.

I like this. So the user will be able to opt-in to getting full URIs rather than the namespaced keywords from the "parser". Although looking at the URIs is a bit ugly, I think it's the only way we can support both streaming and namespaces

As I said above, I think most streaming use cases could be satisfied with just the tree walker, but with an added parser option we can save on allocation.

- resolve-name: generate uri-namespaced name from prefixed name


Maybe this is the function that answers my question from step 1?

In part. walk-resolve-names is the tree walker that reimplements namespacing from just xmlns* attrs and prefixed keywords.
The parser and emitter are able to go from/to representation 2 directly. The tree walkers are for users that need sophisticated namespacing without going through the parser, i.e. want to go to representation 2 from 1 and vice versa.
That also means a custom Namespace Context implementation, that can be updated by tree walkers.
 
How does emitting XML that contains namespace work now in your fork?

In the base case, prefixed keywords get emitted verbatim.
So it's possible to generate ill-formed xml by using undeclared prefixes. That's half a feature (the actual feature is backwards compatibility, as well as tight control).
Luckily, the only case where this is necessary is when you actually want to use undeclared prefixes.
If you just want to rename prefixes, you can resolve the names, assoc new xmlns* attributes onto the root element and use walk-cleanup-prefixes. The emitter will then emit the resolved names with the right prefix. Of course, there can be a compound operation in the API.

2014-04-01 15:26 GMT+02:00 Alex Miller <al...@puredanger.com>:
While I have not looked at the code much, the words in Step 2 make me ask whether there should be a QualifiedName protocol that is extended to QName instead of a specific concrete type?

Funny that you mention that. I have just introduced an XmlName protocol into my local branch, to handle strings, keywords and QNames. Will ping here when I push it.
 
In Step 3, hearing tree walkers sounds like something that precludes streaming, which is something the current impl can do (and is really important). To what extent has that been affected for non-namespaced XML and to what extent is it now possible with namespaced XML?

A basic tree walker follows the form

(fn walk [tree op]
  (if (map? tree)
    {:tag     (op :tag (:tag tree))
     :attrs   (op :attrs (:attrs tree))
     ;; children live under :content, as in the Step 1 example
     :content (map #(walk % op) (:content tree))}
    tree)) ; text nodes pass through unchanged

Notice the implicit lazy-seq in map. As mentioned above, that should behave well for documents with bounded depth, as long as the algorithm can let go of the old element before processing children.
Still, the tree walkers are optional passes and not necessary for resolving names when parsing an xml stream.
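
A quick way to see the laziness argument in action is to walk an element with an unbounded number of flat children and consume only a few of them. This is just a sketch: it uses the :content key from the Step 1 example, and upcase-tag and big are made-up names for the demonstration.

```clojure
(defn walk [tree op]
  (if (map? tree)
    {:tag     (op :tag (:tag tree))
     :attrs   (op :attrs (:attrs tree))
     ;; map returns a lazy seq, so children are walked on demand
     :content (map #(walk % op) (:content tree))}
    tree))

;; A root element with an endless stream of flat children.
(def big {:tag :root :attrs {}
          :content (map (fn [i] {:tag :item :attrs {:n (str i)} :content []})
                        (range))})

;; Example op: upper-case tag names, leave attributes alone.
(defn upcase-tag [k v]
  (if (= k :tag)
    (keyword (.toUpperCase ^String (name v)))
    v))

(->> (walk big upcase-tag) :content (take 2) (map :tag))
;; => (:ITEM :ITEM)
```

The call returns even though the input never ends, because only the consumed prefix of :content is realized. Note that binding the tree to a var, as done here for brevity, holds on to the head; a real streaming consumer would avoid that.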

kind regards

Chouser

Apr 1, 2014, 11:27:55 AM
to cloju...@googlegroups.com
Herwig, thanks for stepping up and tackling this problem. After a long
break from it, I've been delving into the realm of XML again recently,
and have needed namespace support. Unfortunately for all of you, that
means I have opinions to impose on the conversation.

First, I very much like the idea of using the fully qualified URI in
tag names (and sometimes in attributes as well, but I'll come to that
in a sec). One key benefit is that a fully qualified node can be
lifted out of one XML snippet and dropped into another without
becoming undefined or incorrect. Another way of saying this is that
any two elements or trees of elements in Clojure would give the same
answer to Clojure's = as xpath's deep-equal function. The deep-equal
function (http://www.w3.org/TR/2005/CR-xpath-functions-20051103/#func-deep-equal)
is the most widely used sense of XML equality, and I think matching
that is a valuable property.

So, as soon as a code base starts manipulating any XML data that uses
just xmlns aliases (prefixes) without the full URI being attached, it
now has two *different kinds* of XML data that will behave
differently, respond to equality comparisons differently, etc. Note
that the "lifting" I referred to in the previous paragraph can be as
innocent an operation as taking an XML element and returning one of
its child-elements. So I'm nervous about *ever* producing aliased
(prefixed) tag names. If we must do so, I think there should be big
warnings in the docstrings of the related functions about their
unsuitability for use in various circumstances.

Attributes are a little different in that they live in a map and a
MapEntry is not a normal thing to pass around in an application.
Combined with the fact that the namespace of an attribute is almost
always the same as that of its element, perhaps the namespace of
attributes can be elided except when they differ from their element.
If this is done consistently, then we still have a canonical format
such that Clojure equality will match xpath equality.

Now for the element and attribute names themselves--why not continue to
use keywords? :http://foo.bar/namespace/tag is a valid Clojure
keyword, already has reader and printer support. And perhaps more
importantly keywords already have alias support, so that we can use
Clojure's existing reader aliases to say ::foo/tag *anywhere* instead
of having to wrap uses of aliases in a call to a macro like
with-xmlns. This would also obviate the need for a new protocol of the
sort Alex suggested (XmlName). It is unfortunate that the original
alias or prefix supplied by a parsed document can't be hung off the
keyword itself since they don't support metadata, but I think metadata
on the element object itself would be sufficient to collect all the
relevant namespace prefixes used to support round-tripped XML reusing
the same aliases as the original document. What do you think?

--Chouser

Herwig Hochleitner

Apr 1, 2014, 5:13:45 PM
to cloju...@googlegroups.com
2014-04-01 17:27 GMT+02:00 Chouser <cho...@n01se.net>:
Unfortunately for all of you, that
means I have opinions to impose on the conversation.

Au contraire, I highly value your opinion.

First, I very much like the idea of using the fully qualified URI in
tag names (and sometimes in attributes as well, but I'll come to that
in a sec). One key benefit is that a fully qualified node can be
lifted out of one XML snippet and dropped into another without
becoming undefined or incorrect. Another way of saying this is that
any two elements or trees of elements in Clojure would give the same
answer to Clojure's = as xpath's deep-equal function. The deep-equal
function (http://www.w3.org/TR/2005/CR-xpath-functions-20051103/#func-deep-equal)
is the most widely used sense of XML equality, and I think matching
that is a valuable property.

I completely agree with you on this, and I envision almost all use cases that involve parsing to use name resolution.
I've been thinking about xml tree equality, and the main issue I see with it is different or unused xmlns attributes. Do you know how xpath's deep-equal handles those?
 
So, as soon as a code base starts manipulating any XML data that uses
just xmlns aliases (prefixes) without the full URI being attached, it
now has two *different kinds* of XML data that will behave
differently, respond to equality comparisons differently, etc. Note
that the "lifting" I referred to in the previous paragraph can be as
innocent an operation as taking an XML element and returning one of
its child-elements. So I'm nervous about *ever* producing aliased
(prefixed) tag names. If we must do so, I think there should be big
warnings in the docstrings of the related functions about their
unsuitability for use in various circumstances.

The decision to introduce two different kinds of XML was deliberate, the reason is twofold:

1) XML applications actually deal with two kinds of XML: The serialized stream with prefixes and the tree of resolved elements.
  I agree that for regular use cases, the serialization should be abstracted away.
  On the other hand it's not unheard of to need to produce or consume "other" kinds of xml.
  e.g. I've encountered wsdls where prefixes were bound to the wrong uri, or one might need to assign specific prefixes for legacy consumers.
  I felt it was more natural to deal with such issues at the serialization level

2) Emitting straight prefixes for keyword names is current behavior, if we were to store ns-uris in keyword namespaces, it would be a breaking change.

The decision to allow emitting mixed forms was made because, as detailed earlier, producers will mostly deal with the set of imported ns-uris at the root element, so the user should be conscious of the fact that if they emit straight prefixes, they had better be defined.
 
Attributes are a little different in that they live in a map and a
MapEntry is not a normal thing to pass around in an application.
Combined with the fact that the namespace of an attribute is almost
always the same as that of its element, perhaps the namespace of
attributes can be elided except when they differ from their element.
If this is done consistently, then we still have a canonical format
such that Clojure equality will match xpath equality.

The issue goes away in resolved xml. When dealing with unresolved xml, prefixes are represented as is. If there is an unprefixed name, the current behavior is to emit/resolve the name in the default namespace (the innermost xmlns="..." attribute), which matches xml behavior.

Now for the element and attribute names themselves--why not continue to
use keywords? :http://foo.bar/namespace/tag is a valid Clojure
keyword, already has reader and printer support.

- A keyword like this is literally un-read-able (with the intended meaning)
  (namespace :http://foo.bar/namespace/tag) => "http:"
  (name :http://foo.bar/namespace/tag) => "/foo.bar/namespace/tag"

- :prefix/bar has established meaning to emit and it's useful to have control at this level

- If the user types xml into her .clj file, she uses prefixes, much like with a verbatim xml string. e.g. in hiccup syntax
  [:D/propfind {:xmlns/D "DAV:"}
    [:D/prop ...] ...]
or
  [:propfind {:xmlns "DAV:"}
    [:prop ...] ...]

And perhaps more
importantly keywords already have alias support, so that we can use
Clojure's existing reader aliases to say ::foo/tag *anywhere* instead
of having to wrap uses of aliases in a call to a macro like
with-xmlns. This would also obviate the need for a new protocol of the
sort Alex suggested (XmlName). It is unfortunate that the original
alias or prefix supplied by a parsed document can't be hung off the
keyword itself since they don't support metadata, but I think metadata
on the element object itself would be sufficient to collect all the
relevant namespace prefixes used to support round-tripped XML reusing
the same aliases as the original document. What do you think?

The alias support is an interesting possibility, I hadn't thought of (TBH, I didn't even know of that).
Still even without the metadata issue, using keywords to represent resolved xml names feels too much
like pounding a square peg into a round hole.
Keyword is a perfect fit for representing a literal D:foo as :D/foo,
but it doesn't cut it for denoting qualified names.

QName is a perfect match for that, it's:
- immutable
- understood by other xml libraries
- capable of storing the original prefix as a hint (for a prefix-set-finding tree-walker?)

Also with the appropriate reader tag, it will be
(= #xml/name {:uri "DAV:" :prefix "D" :name "propfind"}
   #xml/name "DAV:D:propfind"
   #xml/name "DAV:propfind")

Your example name would be #xml/name "http://foo.bar/namespace/tag"

I'd like to answer the point about metadata on the element object instead of directly representing resolved names with an echo of your own sentiment "... to say ::foo/tag *anywhere* instead of having to wrap uses ...": In this context it means: From the point that an xml tree is resolved onwards, it makes sense to be able to just take an element tag and use it in a new context, without having to carry over metadata.

Ad with-xmlns: As demonstrated above, denoting a namespaced xml-tree doesn't need to use with-xmlns

with-xmlns is meant for generating arbitrary clojure code with embedded QNames.
Think enlive's tag= function et al. Wrapping such code in (with-xmlns {"D" "DAV:"} ...) at toplevel makes the meaning very explicit.

In summary, I think namespaced keywords should continue to be accepted as prefixed xml-names by data.xml, and resolved names should get a new datatype with a reader tag.
Prefixes should be controlled by setting regular xmlns attributes into the tree, and the user should use :prefix/names outside of a lexically enclosing :xmlns* at his own peril.

Does that make sense?

kind regards 

Herwig Hochleitner

Apr 1, 2014, 6:37:51 PM
to cloju...@googlegroups.com
Pardon me, the example: #xml/name "DAV:D:propfind" is bogus. It could be represented as #xml/name ["D" "DAV:propfind"], but not with a variant of uri syntax.

It's debatable whether #xml/name "DAV:propfind" should parse
as #xml/name {:prefix "DAV" :name "propfind"}
or #xml/name {:uri "DAV:" :name "propfind"},
but definitely not both.

Another addendum:
A change, I didn't mention in this thread, because I did it as part of the roundtrip-fixes already attached to DXML-4:
IIRC, in the current release, {:tag "http://foo.bar/namespace/tag"} actually does parse as "http:" "/foo.bar/namespace/tag", which I changed to a .lastIndexOf

Since this is a breaking change to how data.xml interprets strings as xml-names, maybe we can find a good string encoding for uri + name + optional prefix, that is not a hack?

kind regards 

Christophe Grand

Apr 2, 2014, 3:45:55 AM
to clojure-dev
On Tue, Apr 1, 2014 at 5:27 PM, Chouser <cho...@n01se.net> wrote:
First, I very much like the idea of using the fully qualified URI in
tag names (and sometimes in attributes as well, but I'll come to that
in a sec). One key benefit is that a fully qualified node can be
lifted out of one XML snippet and dropped into another without
becoming undefined or incorrect. Another way of saying this is that
any two elements or trees of elements in Clojure would give the same
answer to Clojure's = as xpath's deep-equal function. The deep-equal
function (http://www.w3.org/TR/2005/CR-xpath-functions-20051103/#func-deep-equal)
is the most widely used sense of XML equality, and I think matching
that is a valuable property.

I totally second Chouser here and that's why the mapping between uris and attributes was put in metadata in http://dev.clojure.org/display/DXML/Fuller+XML+support (please see my recent comment, as I recently tried to implement the proposal and hit a limitation)
 
Attributes are a little different in that they live in a map and a
MapEntry is not a normal thing to pass around in an application.
Combined with the fact that the namespace of an attribute is almost
always the same as that of its element, perhaps the namespace of
attributes can be elided except when they differ from their element.

Attributes are even more special: unlike elements, non-prefixed attributes do not resolve to the default ns or even to their element's ns: they should stay unqualified.
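
The element/attribute asymmetry can be spelled out as a small resolution function. This is a sketch under assumptions: resolve-pair is a hypothetical name, env is a plain prefix->uri map with "" standing for the default namespace, and the result is a [uri local-name] pair rather than any particular name type.

```clojure
;; Hypothetical helper for illustration only.
(defn resolve-pair
  "Resolve a possibly prefixed keyword to [uri local-name].
  kind is :element or :attribute."
  [env kind kw]
  (let [prefix (namespace kw)]
    (cond
      prefix            [(get env prefix) (name kw)]   ; explicit prefix: look it up
      (= kind :element) [(get env "" "") (name kw)]    ; elements use the default ns
      :else             ["" (name kw)])))              ; unprefixed attributes stay unqualified

(def env {"" "DAV:" "D" "DAV:"})

(resolve-pair env :element :prop)     ; => ["DAV:" "prop"]
(resolve-pair env :attribute :name)   ; => ["" "name"]
(resolve-pair env :attribute :D/name) ; => ["DAV:" "name"]
```

Only the second case differs between the two kinds: a default namespace declaration never applies to an unprefixed attribute.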
 
Now for the element and attribute names themselves--why not continue to
use keywords? :http://foo.bar/namespace/tag is a valid Clojure
keyword, already has reader and printer support.

I haven't checked recently (I have vague memories of having looked this up ages ago but don't remember the conclusion) Are all URIs (IRIs would be even cooler) readable as namespaces in a keyword?

Right now it's quite an exercise: you have to create a namespace with an unreadable symbol just to be able to create the alias (funny that you recently asked for being able to create aliases without namespaces :-))

Christophe


--
On Clojure http://clj-me.cgrand.net/
Clojure Programming http://clojurebook.com
Training, Consulting & Contracting http://lambdanext.eu/

Herwig Hochleitner

Apr 2, 2014, 9:07:17 AM
to cloju...@googlegroups.com
2014-04-02 9:45 GMT+02:00 Christophe Grand <chris...@cgrand.net>:

On Tue, Apr 1, 2014 at 5:27 PM, Chouser <cho...@n01se.net> wrote:
First, I very much like the idea of using the fully qualified URI in
tag names (and sometimes in attributes as well, but I'll come to that
in a sec). One key benefit is that a fully qualified node can be
lifted out of one XML snippet and dropped into another without
becoming undefined or incorrect. Another way of saying this is that
any two elements or trees of elements in Clojure would give the same
answer to Clojure's = as xpath's deep-equal function. The deep-equal
function (http://www.w3.org/TR/2005/CR-xpath-functions-20051103/#func-deep-equal)
is the most widely used sense of XML equality, and I think matching
that is a valuable property.

I totally second Chouser here and that's why the mapping between uris and attributes was put in metadata in http://dev.clojure.org/display/DXML/Fuller+XML+support (please see my recent comment, as I recently tried to implement the proposal and hit a limitation)

Yes, supporting proper equality within XML is the main motivation for the work I'm doing.

Could you detail on why you thought the namespace info needs to live in metadata?
Also I'm not sure I understand the limitation you ran into, from reading the page, but I have the impression that most issues previously discussed go away with the two tier model I implemented.
Note that this is decidedly not the proposal from the old thread about representing plain names with one type and namespaced names with another. Both tiers can represent both kinds of names.

Concerning the case `multiple prefix <-> same uri`, my implementation uses the *outermost* prefix for a given uri when emitting a *resolved* name. Other modes could be supported. Can you tell me of a use case where one would want to emit a resolved name and still control the prefix on a per-name basis?

For reference, here is my current implementation of a namespace context to be used in walkers outside of the parser/emitter, just a bijection:  https://github.com/bendlas/data.xml/blob/a0ba58aa253f5423f1b9260223acb42b70101534/src/main/clojure/clojure/data/xml/impl.clj#L27
  
Attributes are even more special: unlike elements, non-prefixed attributes do not resolve to the default ns or even to their element's ns: they should stay unqualified.

Thanks for mentioning that, I had overlooked that tidbit. The impact for my implementation is limited: The lowlevel representation stays unchanged (as it should be), I just need to adjust the translation between the two tiers. 
 
Now for the element and attribute names themselves--why not continue to
use keywords? :http://foo.bar/namespace/tag is a valid Clojure
keyword, already has reader and printer support.

I haven't checked recently (I have vague memories of having looked this up ages ago but don't remember the conclusion) Are all URIs (IRIs would be even cooler) readable as namespaces in a keyword?

Right now it's quite an exercise: you have to create a namespace with an unreadable symbol just to be able to create the alias (funny that you recently asked for being able to create aliases without namespaces :-))

It's even worse: if the qualified name http://my-ns/foo/name will be represented as (keyword "http://my-ns/foo" "name"), then what is the qualified name DAV:propfind going to be?
(keyword "DAV" "propfind")? (keyword "DAV:" "propfind")?

If there is just one thing that I can add to previous conversations, it's this: Don't represent uri-qualified names as keywords. It doesn't fit.
OTOH, as mentioned before, representing a prefixed name as a keyword, the way data.xml does currently, fits well.

kind regards

Christophe Grand

Apr 2, 2014, 10:41:25 AM
to clojure-dev
On Wed, Apr 2, 2014 at 3:07 PM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
Could you detail on why you thought the namespace info needs to live in metadata?

It's an expedient way to get them out of the equality scope :-) To me a namespace-aware tool should ignore aliases and xmlns attributes, focusing only on resolved names.

The two tiers of your model are: representation tier and model tier -- and only the model tier is (and should be) namespace aware. It should be made clear (through api more than through doc) to users that the representation tier should be avoided since you can emit broken XML.

Concerning the case `multiple prefix <-> same uri`, my implementation uses the *outermost* prefix for a given uri when emitting a *resolved* name. Other modes could be supported. Can you tell me of a use case where one would want to emit a resolved name and still control the prefix on a per-name basis?

Broken consumers working at the representation level :-) The "Fuller XML Support" proposal tried hard (too hard?) to get by using only default data structures. However something like QName solves most (all?) issues I had (the alias is part of the name but not used for equality).
Additionally a reader tag may be introduced.


It's even worse: if the qualified name http://my-ns/foo/name will be represented as (keyword "http://my-ns/foo" "name"), then what is the qualified name DAV:propfind going to be?
(keyword "DAV" "propfind")? (keyword "DAV:" "propfind")?

First thing, a:name with a mapped to http://my-ns/foo DOES NOT resolve to http://my-ns/foo/name; it resolves to the pair [http://my-ns/foo name]. But using / to separate namespaces from names adds more confusion.

=> :http://example.org/?stupid#test/name
:http://example.org/?stupid#test/name
=> [(namespace *1) (name *1)]
["http://example.org/?stupid#test" "name"]

DAV:propfind would just be :DAV/propfind -- no ambiguity: DAV has no scheme hence it's not a URI. (I'm still uneasy about shoehorning URIs in namespaces)

I really think a tagged literal with a custom (implementing Named) or existing (QName) type which embeds the alias without using it for equality may be a sensible middle ground.
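As a small illustration of "embeds the alias without using it for equality", here is how the existing `javax.xml.namespace.QName` behaves when used from Clojure. The `DAV:` namespace and the prefixes `D` and `E` are just sample values.

```clojure
(import 'javax.xml.namespace.QName)

;; The three-argument constructor takes (namespaceURI, localPart, prefix).
(def d-propfind (QName. "DAV:" "propfind" "D"))
(def e-propfind (QName. "DAV:" "propfind" "E"))

;; equals/hashCode only look at the namespace URI and the local part.
(println (= d-propfind e-propfind))

;; The prefix is still carried along for emitting.
(println (.getPrefix d-propfind) (.getPrefix e-propfind))

;; toString uses the {uri}local notation seen elsewhere in this thread.
(println (str d-propfind))
```

Because the hash code ignores the prefix as well, both values collapse into one entry when put into a set or used as map keys.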

Chouser

Apr 3, 2014, 1:30:42 AM4/3/14
to cloju...@googlegroups.com
On Tue, Apr 1, 2014 at 5:13 PM, Herwig Hochleitner
<hhochl...@gmail.com> wrote:
>
> I completely agree with you on this, and I envision almost all use cases
> that involve parsing to use name resolution.
> I've been thinking about xml tree equality, and the main issue I see with it
> are different or unused xmlns attributes. Do you know how xpath's
> deep-equals handles those?

Unused xmlns attributes do not affect equality (just like metadata).
In fact I don't think any xmlns attributes affect equality, except to
the extent that they're used to set the namespaces of other things.

> The decision to introduce two different kinds of XML was deliberate, the
> reason is twofold:
>
> 1) XML applications actually deal with two kinds of XML: The serialized
> stream with prefixes and the tree of resolved elements.
> I agree that for regular use cases, the serialization should be abstracted
> away.
> On the other hand it's not unheard of to need to produce or consume
> "other" kinds of xml.
> e.g. I've encountered wsdls where prefixes were bound to the wrong uri, or
> one might need to assign specific prefixes for legacy consumers.
> I felt it was more natural to deal with such issues at the serialization
> level

I hear regexes are great for parsing malformed xml. I kid--I do see
your point. I'm still concerned about the loss of composability when
some xml functions take or return one kind of xml and others another,
and the potential for confusion when these are sometimes but not
always compatible.

> 2) Emitting straight prefixes for keyword names is current behavior, if we
> were to store ns-uris in keyword namespaces, it would be a breaking change.
>
> The decision to allow emitting mixed forms was made because as detailed
> earlier, producers will mostly deal with the set of imported ns-uris at the
> root element, so the user should be conscious about the fact that if they
> emit straight prefixes, they better be defined.

I think it's pretty common for a function to return an xml element or
fragment that is not at the root of the final document, and we should
strive to make it convenient for such code to be explicit about
namespaces rather than encourage it to rely on aliases provided in
some other lexical scope.

I'm not sure if I'm disagreeing with you on that point or not. :-)
I agree there is some awkwardness trying to use Clojure keywords here,
and I hope we can find some solution that is less so. But to reiterate,
one key benefit is how aliases would be resolved.

It would be unfortunate for every literal element name in Clojure
source to require a full uri. It would also be unfortunate for any
alias or prefix that's used instead to be resolved in some kind of
reverse dynamic scope, depending on the xmlns maps set up in some xml
document where the literal at hand will eventually be embedded. This
leaves, I think, (1) keywords using standard Clojure ns aliases, (2)
wrapping every instance of a literal element name in a macro that
expands custom aliases, or (3) a tagged literal that is picking up on
some kind of alias-map state somewhere if such is possible.

> I'd like to answer the point about metadata on the element object instead of
> directly representing resolved names with an echo of your own sentiment "...
> to say ::foo/tag *anywhere* instead of having to wrap uses ...": In this
> context it means: From the point that an xml tree is resolved onwards, it
> makes sense to be able to just take an element tag and use it in a new
> context, without having to carry over metadata.

Ah, I agree. I think when building xml trees (either with literals or
while parsing) the prefix map should be propagated inward so that
every node has a full set of the prefixes that were in scope when it
was created.

> Ad with-xmlns: As demonstrated above, denoting a namespaced xml-tree doesn't
> need to use with-xmlns
>
> with-xmlns is meant for generating arbitrary clojure code with embedded
> QNames.
> Think enlive's tag= function et al. Wrapping such code in (with-xmlns {"D"
> "DAV:"} ...) at toplevel makes the meaning very explicit.
>
> In summary, I think namespaced keywords should continue to be accepted as
> prefixed xml-names by data.xml, and resolved names should get a new datatype
> with a reader tag.
> Prefixes should be controlled by setting regular xmlns attributes into the
> tree and the user should use :prefix/names outside of a lexically
> enclosing :xmlns* at his own peril.

This is a great peril and I hope we can find a way to make it very
easy to avoid.

Chouser

Apr 3, 2014, 1:41:59 AM4/3/14
to cloju...@googlegroups.com
On Wed, Apr 2, 2014 at 3:45 AM, Christophe Grand <chris...@cgrand.net> wrote:
> I haven't checked recently (I have vague memories of having looked this up
> ages ago but don't remember the conclusion). Are all URIs (IRIs would be even
> cooler) readable as namespaces in a keyword?

When you put it that way, it does seem unlikely, doesn't it.

> Right now it's quite an exercise: you have to create a namespace with an
> unreadable symbol just to be able to create the alias (funny that you
> recently asked for being able to create aliases without namespaces :-))

Heh, you caught me. This is exactly what I was thinking about when I asked that.

In fact, if that ends up working out, then under most circumstances,
the full URI or IRI wouldn't need to be readable. The alias could be
set up using a string or some such for the URI, and then the rest of
the code could use a readable alias.

So even if we don't use keywords for element names, we can still use
the Clojure's regular alias mechanism. A tagged literal reader has
access to *ns* and it's alias map. This would allow something like:

(alias 'myns "http://long-url-to-my-ns") ;; hypothetical future alias fn

Then later in the same namespaces,

#xml/name html/h1

Could be read as {:uri "http://long-url-to-my-ns" :prefix "html" :name
"h1"} or whatever canonical representation we end up with.

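A rough sketch of how such a tagged literal could resolve against an alias table. Everything named here is hypothetical: `xml-aliases`, `xml-alias!` and `read-xml-name` are made up for illustration, and a plain atom stands in for the "future alias fn", since today's `alias` only accepts existing Clojure namespaces. Only `*data-readers*`, `read-string` and `QName` are existing APIs.

```clojure
(import 'javax.xml.namespace.QName)

;; Hypothetical alias table: prefix string -> namespace URI string.
(def xml-aliases (atom {}))

(defn xml-alias!
  "Registers prefix (symbol or string) as an alias for uri."
  [prefix uri]
  (swap! xml-aliases assoc (name prefix) uri))

(defn read-xml-name
  "Candidate reader function for a tag like #xml/name. Receives the symbol
  that follows the tag and resolves its namespace part via the alias table."
  [sym]
  (let [prefix (namespace sym)
        uri    (get @xml-aliases prefix)]
    (when-not uri
      (throw (ex-info "Unbound XML alias" {:prefix prefix :name (name sym)})))
    (QName. uri (name sym) prefix)))

(xml-alias! 'html "http://long-url-to-my-ns")

;; Calling the reader function directly ...
(def h1 (read-xml-name 'html/h1))

;; ... or letting the Clojure reader call it for the tagged literal.
(def h1-from-reader
  (binding [*data-readers* {'xml/name read-xml-name}]
    (read-string "#xml/name html/h1")))

(println (str h1) (.getPrefix h1))
```

In this sketch the alias is resolved at read time, so the rest of the program only ever sees a fully resolved name.
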
Ryan Senior

Apr 3, 2014, 12:19:27 PM4/3/14
to cloju...@googlegroups.com
I'm getting a little confused, though I'm no namespaces expert. It seems like there are two things we are talking about here: a format that will need to handle parsing XML with crazy namespacing, and some syntax sugar around writing XML in Clojure code. This syntax sugar won't work for parsed documents, right? XML like the example below makes a single global map of namespaces impossible, right?

<root xmlns:f="http://foo1">
  <f:outer>
    <f:inner xmlns:f="http://foo2">
      <f:inner-inner>
        <f:some-val xmlns:f="http://foo3">10</f:some-val>
      </f:inner-inner>
    </f:inner>
  </f:outer>
</root>

So we would have to have something that can handle the above, even if we have some sugar around writing xml with reasonable namespaces in clojure, or am I missing something?
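To make the scoping concrete, here is a small sketch that feeds the document above through the JDK's StAX parser (`javax.xml.stream`, namespace-aware by default) and lists what each start tag resolves to. `resolved-names` is a helper written for this example only and is not part of data.xml.

```clojure
(import '(javax.xml.stream XMLInputFactory XMLStreamConstants)
        '(java.io StringReader))

(def nested-xml
  (str "<root xmlns:f=\"http://foo1\">"
       "<f:outer>"
       "<f:inner xmlns:f=\"http://foo2\">"
       "<f:inner-inner>"
       "<f:some-val xmlns:f=\"http://foo3\">10</f:some-val>"
       "</f:inner-inner></f:inner></f:outer></root>"))

(defn resolved-names
  "Returns a vector of [local-name namespace-uri] pairs, one per start tag,
  in document order."
  [xml]
  (let [factory (XMLInputFactory/newInstance)
        reader  (.createXMLStreamReader factory (StringReader. xml))]
    (loop [acc []]
      (if (.hasNext reader)
        (let [event (.next reader)]
          (recur (if (= event XMLStreamConstants/START_ELEMENT)
                   (conj acc [(.getLocalName reader) (.getNamespaceURI reader)])
                   acc)))
        acc))))

;; Skip the unprefixed root and print the four f:-prefixed elements.
(doseq [[local uri] (rest (resolved-names nested-xml))]
  (println local "->" uri))
```

The same prefix `f` ends up meaning three different URIs depending on depth, which is why resolution has to follow the element nesting rather than one flat prefix map.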
 

--Chouser

Herwig Hochleitner

Apr 3, 2014, 4:27:38 PM4/3/14
to cloju...@googlegroups.com
Hi Everybody,

I've just pushed to https://github.com/bendlas/data.xml after having made some progress. Changes include:

- walk-resolve-names now has an option to remove xmlns attributes
  it then attaches the full ns-context (not just xmlns attributes from the node) into metadata.
  This information is not yet used by the emitter, but with it, it will be possible to add unresolved names into a resolved, lifted fragment and have the prefixes resolved correctly.
- namespace contexts now keep track of every prefix that is pointing to a URI in the order that it was added.
  Removing a prefix (xmlns:P="") restores an alternate prefix for the uri.
- Unprefixed attributes are now in the empty namespace as per standard
- There is an #xml/name{:uri :name :prefix} reader tag, which maps to QName, plus corresponding print-dup code.
- walkers refactored, it's now easy to write a namespace-aware, lazy tree walker
- experimental namespace aware xml-zip
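
One way to picture the prefix tracking from the list above is a map from URI to the prefixes bound to it, oldest first. This is only a sketch under that assumption; `bind-prefix`, `unbind-prefix` and `current-prefix` are made-up names and not functions from the fork.

```clojure
(defn unbind-prefix
  "Removes prefix from whatever URI it is bound to; URIs left without any
  prefix are dropped from the context."
  [ctx prefix]
  (into {}
        (for [[uri prefixes] ctx
              :let [remaining (vec (remove #{prefix} prefixes))]
              :when (seq remaining)]
          [uri remaining])))

(defn bind-prefix
  "Binds prefix to uri, keeping the order in which prefixes were added.
  A prefix points at one URI at a time, so any older binding is removed."
  [ctx prefix uri]
  (-> ctx
      (unbind-prefix prefix)
      (update-in [uri] (fnil conj []) prefix)))

(defn current-prefix
  "The prefix to emit for uri: the oldest binding still in place."
  [ctx uri]
  (first (get ctx uri)))

(def ctx
  (-> {}
      (bind-prefix "D" "DAV:")
      (bind-prefix "E" "DAV:")))

(println (current-prefix ctx "DAV:"))
;; Undeclaring D, as xmlns:D="" would, falls back to the alternate prefix.
(println (current-prefix (unbind-prefix ctx "D") "DAV:"))
```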

Further plans:

- Update walk-emit-prefixes to
  - reintroduce xmlns bindings from changes in the ns-context metadata, but bind every ns-uri to at most one prefix at a time
  - resolve unresolved names in the original context and emit them with their possibly new prefix <-- this is important and solves some composability issues
- When walk-resolve-names is stable and agreed upon, integrate its behavior into parse with a :resolve option, that can be :full or :keep-xmlns
- When walk-emit-prefixes is stable and agreed upon, integrate its behavior into the emitter; not sure if this should do the auto-resolving thing.
  Currently, the emitter requires that xmlns attributes be present and it ignores metadata, which suffices for roundtripping through :resolve :keep-xmlns

Replies:

2014-04-02 16:41 GMT+02:00 Christophe Grand <chris...@cgrand.net>:
It's an expedient way to get them out of the equality scope :-) To me, a namespace-aware tool should ignore aliases and xmlns attributes, focusing only on resolved names.

The two tiers of your model are the representation tier and the model tier -- and only the model tier is (and should be) namespace aware. It should be made clear (through the API more than through docs) to users that the representation tier should be avoided, since it lets you emit broken XML.
 
2014-04-03 7:30 GMT+02:00 Chouser <cho...@n01se.net>:
Unused xmlns attributes do not affect equality (just like metadata).
In fact I don't think any xmlns attributes affect equality, except to
the extent that they're used to set the namespaces of other things.

Christophe, you are talking about the XPath model here, which specifically ignores xmlns declarations. The XML Infoset (kind of the uber spec) counts xmlns attributes as regular attributeinfo nodes. (Yes, I've read up a bit, thanks chouser for giving me the push).

I agree, that the XPath model is what we want to work with in most cases of namespaced xml, currently implemented by walk-resolve-names with xmlns attribute removal enabled.

2014-04-02 16:41 GMT+02:00 Christophe Grand <chris...@cgrand.net>:
Broken consumers working at the representation level :-) The "Fuller XML Support" proposal tried hard (too hard?) to get by using only default data structures. However something like QName solves most (all?) issues I had (the alias is part of the name but not used for equality).
Additionally a reader tag may be introduced.

I think another part is solved by a protocol XmlNamespace, which unifies javax.xml.namespace.NamespaceContext and a custom deftype used in tree walkers.
 
First thing, a:name with a mapped to http://my-ns/foo DOES NOT resolve to http://my-ns/foo/name; it resolves to the pair [http://my-ns/foo name]. But using / to separate namespaces from names adds more confusion.

=> :http://example.org/?stupid#test/name
:http://example.org/?stupid#test/name
=> [(namespace *1) (name *1)]
["http://example.org/?stupid#test" "name"]

If only it were just confusion about the separator: Clojure's reader splits at the first /


DAV:propfind would just be :DAV/propfind -- no ambiguity: DAV has no scheme hence it's not a URI. (I'm still uneasy about shoehorning URIs in namespaces)

I was talking about the namespaced/resolved name {DAV:}propfind, not the prefixed name DAV:propfind, sorry, my fault!

I really think a tagged literal with a custom (implementing Named) or existing (QName) type which embeds the alias without using it for equality may be a sensible middle ground. 

That's what I went for, see release notes above. I'd be glad about feedback, which syntax besides #xml/name{:uri :name :prefix} should be accepted.
#xml/name "uri#name" ? what about symbols? 

 2014-04-03 7:30 GMT+02:00 Chouser <cho...@n01se.net>:
I'm still concerned about the loss of composability when
some xml functions take or return one kind of xml and others another,
and the potential for confusion when these are sometimes but not
always compatible.

The work on fragments to retain their full namespace-environment in metadata should help with that, I think.
 
I think it's pretty common for a function to return an xml element or
fragment that is not at the root of the final document, and we should
strive to make it convenient for such code to be explicit about
namespaces rather than encourage it to rely on aliases provided in
some other lexical scope.

I'm not sure if I'm disagreeing with you on that point or not. :-)

Technically, clojure namespaces + vars, hence aliases aren't lexically scoped, so being fully explicit about prefixes requires a construct like with-xmlns or always putting those xmlns:* attributes in there.
I do see your point, that, if we were to have some notion of global xml prefixes for our program, it should tie in to clojure's namespacing facilities. More on that below.

It would be unfortunate for every literal element name in Clojure
source to require a full uri. It would also be unfortunate for any
alias or prefix that's used instead to be resolved in some kind of
reverse dynamic scope, depending on the xmlns maps set up in some xml
document where the literal at hand will eventually be embedded.

The way I originally envisioned this, the user would run walk-resolve-names at namespace context boundaries. I agree that this could be made easier, hence my plan to use the original namespace context to resolve prefixes; there would then be no need to run walk-resolve-names and everything would be pretty much DWIM.
e.g.
Fragment1 comes from a context with xmlns:D="NSA:", user embeds [:D/tag ..] somewhere inside it,
Fragment2 comes from a context with xmlns:D="NSB:", user embeds [:D/tag ..] somewhere inside it
Fragment 1 and 2 are embedded into a context with xmlns:E="NSA:" and xmlns:F="NSB:" and eventually emitted. First occurrence will be rewritten to E:tag, second to F:tag.
If Fragment 1 or 2 had lost its metadata in some transformation, the emitter could pick up on it and warn/error.
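The rewriting step in this example can be sketched as follows, assuming both embedded tags have already been resolved to QNames against their fragment's own context. `emitted-name` and the uri->prefix map are hypothetical helpers for illustration, not the actual walk-emit-prefixes implementation.

```clojure
(import 'javax.xml.namespace.QName)

(defn emitted-name
  "Returns the prefixed name to write for qname in an emitting context,
  given as a map from namespace URI to the prefix in scope there.
  The prefix stored inside qname is ignored."
  [uri->prefix ^QName qname]
  (let [uri    (.getNamespaceURI qname)
        prefix (get uri->prefix uri)]
    (when-not prefix
      (throw (ex-info "No prefix in scope for namespace" {:uri uri})))
    (str prefix ":" (.getLocalPart qname))))

;; [:D/tag ..] resolved inside Fragment 1 (xmlns:D="NSA:")
(def tag-from-fragment-1 (QName. "NSA:" "tag" "D"))
;; [:D/tag ..] resolved inside Fragment 2 (xmlns:D="NSB:")
(def tag-from-fragment-2 (QName. "NSB:" "tag" "D"))

;; The context both fragments end up embedded in.
(def outer-context {"NSA:" "E", "NSB:" "F"})

(println (emitted-name outer-context tag-from-fragment-1))
(println (emitted-name outer-context tag-from-fragment-2))
```

The exception branch corresponds to the case where the namespace information is missing at emit time; a real emitter might instead introduce a new xmlns declaration.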
 
This leaves, I think, (1) keywords using standard Clojure ns aliases, (2)
wrapping every instance of a literal element name in a macro that
expands custom aliases, or (3) a tagged literal that is picking up on
some kind of alias-map state somewhere if such is possible.

I'd vote 3, make that alias-map namespace local and define aliases with (xml-prefix *ns* "D" "DAV:" ...) with *ns* being default.
This still doesn't help with plain keywords, unfortunately.
 
> Prefixes should be controlled by setting regular xmlns attributes into the
> tree and the user should use :prefix/names outside of a lexically
> enclosing :xmlns* at his own peril.

This is a great peril and I hope we can find a way to make it very
easy to avoid.

Fully agree, I admit that till now, I focussed on two use cases:
- Resolving a full xml document with xmlns and then, selecting and bending the nodes to your will, aka enlive style 
- Instantiating XML tidbits within code (with-xmlns)

Come to think about it, there is quite a bit of middle ground, mainly generating xml with a mix of literal chunks and code, aka hiccup style.
This style is too fine-grained for declaring xmlns attributes everywhere and it's too coarse-grained for enclosing everything in a with-xmlns.

So even if we don't use keywords for element names, we can still use
the Clojure's regular alias mechanism. A tagged literal reader has
access to *ns* and its alias map. This would allow something like:

  (alias 'html "http://long-url-to-my-ns") ;; hypothetical future alias fn

Then later in the same namespace,

   #xml/name html/h1

Could be read as {:uri "http://long-url-to-my-ns" :prefix "html" :name
"h1"} or whatever canonical representation we end up with.

I've got a proposal: Declare long program-global prefixes + mapping to xml-namespaces, alias those long prefixes to short ones.
Even better, the mapping to xml-namespace could be declared within the target namespace with the empty prefix:

(in-ns 'user.dav)
(xml-prefix "" "DAV:")

Then :user.dav/propfind would always emit as {DAV:}propfind
What do you think about that approach?

kind regards

Christophe Grand

Apr 4, 2014, 5:03:16 AM4/4/14
to clojure-dev
Hi Herwig,

On Thu, Apr 3, 2014 at 10:27 PM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
Christophe, you are talking about the XPath model here, which specifically ignores xmlns declarations. The XML Infoset (kind of the uber spec) counts xmlns attributes as regular attributeinfo nodes. (Yes, I've read up a bit, thanks chouser for giving me the push).

They are attributeinfo nodes too but they are segregated from regular attributes:

[attributes] An unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. Namespace declarations do not appear in this set. If the element has no attributes, this set has no members.
[namespace attributes] An unordered set of attribute information items, one for each of the namespace declarations (specified or defaulted from the DTD) of this element. Declarations of the form xmlns="" and xmlns:name="", which undeclare the default namespace and prefixes respectively, count as namespace declarations. Prefix undeclaration was added in Namespaces in XML 1.1. By definition, all namespace attributes (including those named xmlns, whose [prefix] property has no value) have a namespace URI of http://www.w3.org/2000/xmlns/. If the element has no namespace declarations, this set has no members.
 

Herwig Hochleitner

Apr 8, 2014, 6:44:20 AM4/8/14
to cloju...@googlegroups.com
I've laid out a proposal, that hopefully addresses most issues that came up so far: http://dev.clojure.org/display/DXML/Namespaced+XML
Please let me know what you think.

kind regards


--

Chouser

Apr 9, 2014, 8:56:27 AM4/9/14
to cloju...@googlegroups.com
On Tue, Apr 8, 2014 at 6:44 AM, Herwig Hochleitner
<hhochl...@gmail.com> wrote:
> I've laid out a proposal, that hopefully addresses most issues that came up
> so far: http://dev.clojure.org/display/DXML/Namespaced+XML
> Please let me know what you think.

Excellent work on the write up, thanks! Moreover, I like all of it.

A couple thoughts:

What do you think of using namespaced keywords in place of :in-scope
and :namespace-attrs? :clojure.data.xml/in-scope, for example. My
thinking here is that these are not going to be typed out by users very
often, and others might have reason to add metadata that we wouldn't
want to collide.
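
A short sketch of what that buys, using plain maps. The key `:clojure.data.xml/in-scope` is the one suggested above; the shape of its value and the `:my.app/source` key are invented for the example.

```clojure
(def element {:tag :propfind :attrs {} :content []})

;; Library metadata and application metadata sit side by side
;; because each key lives under its own namespace.
(def annotated
  (with-meta element
    {:clojure.data.xml/in-scope {"D" "DAV:"} ; hypothetical value shape
     :my.app/source            "example"}))  ; hypothetical user key

;; Metadata never takes part in equality, so both compare equal.
(println (= element annotated))

(println (:clojure.data.xml/in-scope (meta annotated)))
```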

Your proposal for pseudo-raw names is clever, and I think I mean that
in the good way. It's interesting how it leverages Clojure's reader
namespace aliases without requiring the Clojure alias to ever deal
with the full URI. Seems like it's the best idea yet proposed for how
to refer to full URIs in prefixes to names in Clojure code.

Now, while your description of how to use aliases from other Clojure
namespaces helps illustrate how the mechanism actually works, I think
we'd generally want to discourage this. Other aliases in Clojure are
private to the namespace declaring them, not part of their public API.
The full URI is the public interface for XML tags, and having each
namespace spend a line to declare its own alias would surely be worth
the decoupling that it buys.

I think your security concerns are interesting as well, and a wise
thing to consider early. So to make sure I understand, we're talking
about untrusted data that may have once been a string (like
user-entered form data) but which has now been edn/read and is Clojure
data, right? Then providing that as input to a resolving function
could return XML elements of various namespaces. But wouldn't such edn
data be able to specify any namespace it wants anyway?

One way to help plug any security hole that may exist there would be
to only do the alias resolution in the context of the #xml/name reader
macro, and not provide that to your edn reader when reading untrusted
data. When used this way, they could be called "literal resolved
names" or "tagged resolved names" or something, since they are fully
qualified resolved QName's by the time the reader is done with them.
They are much safer to use than the fully-raw names, so "pseudo-raw"
sounds perhaps scarier than necessary.

And thanks again for writing this up! I'm really looking forward to
this proper namespace support trickling up through all the Clojure
XML-related libraries.

--Chouser

Christophe Grand

Apr 9, 2014, 10:13:38 AM4/9/14
to clojure-dev
Hi Herwig,

Thanks for pulling Clojure XML support out of the dark ages :-)

On Wed, Apr 9, 2014 at 2:56 PM, Chouser <cho...@n01se.net> wrote:
What do you think of using namespaced keywords in place of :in-scope
and :namespace-attrs? :clojure.data.xml/in-scope, for example. My
thinking here is that these are not going to be typed out by users very
often, and others might have reason to add metadata that we wouldn't
want to collide.

I second this argument.
 
Now, while your description of how to use aliases from other Clojure
namespaces helps illustrate how the mechanism actually works, I think
we'd generally want to discourage this. Other aliases in Clojure are
private to the namespace declaring them, not part of their public API.
The full URI is the public interface for XML tags, and having each
namespace spend a line to declare its own alias would surely be worth
the decoupling that it buys.

What bothers me is that the distinction between raw and pseudo-raw names is not static: you can't know whether :x/foo is raw or pseudo-raw without looking up the namespace and then the mapping in the ns. Plus, raw name prefixes may collide with single-segment (Clojure) namespaces.
 
Christophe

Herwig Hochleitner

Apr 10, 2014, 11:39:13 AM4/10/14
to cloju...@googlegroups.com
Thank you for the encouragement!

2014-04-09 14:56 GMT+02:00 Chouser <cho...@n01se.net>:

What do you think of using namespaced keywords in place of :in-scope
and :namespace-attrs? :clojure.data.xml/in-scope, for example. My
thinking here is that these are not going to be typed out by users very
often, and others might have reason to add metadata that we wouldn't
want to collide.

2014-04-09 16:13 GMT+02:00 Christophe Grand <chris...@cgrand.net>:
I second this argument.

Yes, namespaced metadata keys are the way to go.

Now, while your description of how to use aliases from other Clojure
namespaces helps illustrate how the mechanism actually works, I think
we'd generally want to discourage this. Other aliases in Clojure are
private to the namespace declaring them, not part of their public API.
The full URI is the public interface for XML tags, and having each
namespace spend a line to declare its own alias would surely be worth
the decoupling that it buys. 

I'm with you there.

I think your security concerns are interesting as well, and a wise
thing to consider early.

TBH, I'm more worried about leak-proof abstractions, I think security is a good way to communicate about it, because it conveys the gravity of the issue and makes it easier to imagine failure scenarios.
 
So to make sure I understand, we're talking
about untrusted data that may have once been a string (like
user-entered form data) but which has now been edn/read and is Clojure
data, right?

That + clojure data, read by data.xml/read (or a new read-raw function, if we break API). The worst we could do is to have emit-raw do some resolve-pseudo-prefixes + emit-appropriate-prefix magic. Such a thing would break simple raw-tier roundtrips for some prefixes and would leak info about your namespace layout to an attacker.
 
Then providing that as input to a resolving function
could return XML elements of various namespaces. But wouldn't such edn
data be able to specify any namespace it wants anyway?

You're right and the same principle, of course, applies to data from data.xml/read.
The attack scenario becomes more realistic if you assume some xml-processing policy enforcer in front of your consumer, which doesn't know that the tag of
<com.example.danger/operation xmlns:com.example.danger="com.example.harmless"> might actually mean {com.example.danger}operation instead of {com.example.harmless}operation to your application.
Now, if you say "a ha, but a bug-free data.xml would know to prefer the xmlns declaration to the global prefix map", I'd argue that if we allowed that kind of mixed usage (raw with pseudo-raw) at all, the global prefix map would have to have precedence, because otherwise somebody will change the meaning of your composed fragments by adding xmlns declarations.

The bottom line with respect to consistency and security: If we overload keyword namespaces to mean either raw prefixes referring to an xmlns decl or pseudo-raw prefixes referring into the live state of a clojure process, we must ensure that data, carrying different meanings, can never meet. 

One way to help plug any security hole that may exist there would be
to only do the alias resolution in the context of the #xml/name reader
macro, and not provide that to your edn reader when reading untrusted
data. When used this way, they could be called "literal resolved
names" or "tagged resolved names" or something, since they are fully
qualified resolved QName's by the time the reader is done with them.
They are much safer to use than the fully-raw names, so "pseudo-raw"
sounds perhaps scarier than necessary.

If a reader tag or explicit API transforms them to QNames, they are not pseudo-raw anymore. Pseudo-raw refers to when they are still keywords. I'm open-minded about terminology, but the short name for this needs to be chatroom compatible, since that's what people will write (and the reader will read in the first step).

Naming issues aside, I'm gravitating towards the same solution here: Only allow them within a reader tag (and the with-xmlns macro), i.e. on data that can be assumed to be pre-approved (i.e. debugged) by the programmer. That would also help with =.

`#xml/name ::dav/propfind` doesn't buy us much over `#xml/name {:uri dav-uri :name "propfind"}`, so I'm thinking about another reader tag like #xml/literal (called it #xml/resolve in the proposal), that can be applied to xml trees (maybe arbitrary data structures, like with-xmlns).

2014-04-09 16:13 GMT+02:00 Christophe Grand <chris...@cgrand.net>:

What bothers me is that the distinction between raw and pseudo-raw names is not static: you can't know whether :x/foo is raw or pseudo-raw without looking up the namespace and then the mapping in the ns. Plus, raw name prefixes may collide with single-segment (Clojure) namespaces.

That's my concern as well. I think limiting the use of pseudo-raw names to tagged literals (and macros) might solve that. That would mean striking the respective "embedding pseudo-raw within resolved fragments" passage.

On a similar issue: How do you feel about explicit vs implicit tier transitions? Should mixing of resolved and raw content be forbidden as well?

What do you think?

thanks,

Herwig

Joshua Griffith

Sep 17, 2015, 11:30:21 AM9/17/15
to Clojure Dev
Thank you for this patch. What's the status of this issue? Is there anything else required to get this merged?

Thanks,

Joshua Griffith

Ryan Senior

Oct 7, 2015, 9:57:19 AM10/7/15
to cloju...@googlegroups.com
Herwig sent an email to the clojure-users list looking for some help testing it out. If you've got some real use cases for it, would be good to see how they work with the new code. I was planning to take a look next week.

My guess is we'll be ready to get it merged in pretty soon.

-Ryan

--