What is the status of data.xml, and how can I help?
Much of the effort so far seems to have gone into parsing XML. I
spent the last couple of days, and a couple of days a while back,
implementing the generation of XML, especially with support for
namespaces[1].
This work currently lives under https://github.com/pepijndevos/ArmageDOM
but I'd be happy to contribute it to core.xml instead, or work with
what's there, depending on the state of the current code[2].
One philosophical difference I should point out is that core.xml seems
to use [tag attr content] for syntax, while I use ^{attr} [tag
content], since attributes are in fact exactly that: metadata.
Pepijn
[1]: https://github.com/clojure/data.xml/issues/2
[2]: at first glance, mine doesn't have 20-line functions.
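To make the difference between the two shapes concrete, here is a minimal sketch that converts [tag attr content] into ^{attr} [tag content]. The function name attrs->meta is hypothetical and is not taken from either library. One thing the sketch makes visible: Clojure metadata does not take part in equality, so two elements that differ only in their attributes compare equal in the metadata form.

```clojure
(defn attrs->meta
  "Hypothetical helper: turns [tag attrs & content] into a [tag & content]
  vector that carries attrs as metadata. Non-vector nodes (text) pass through."
  [node]
  (if (vector? node)
    (let [[tag attrs & content] node]
      (with-meta (into [tag] (map attrs->meta) content) attrs))
    node))

(def link (attrs->meta [:a {:href "x"} "text" [:b {} "bold"]]))

;; The attributes are still reachable, but only through meta.
(meta link)

;; Metadata is ignored by =, so these two elements compare equal
;; even though their href attributes differ.
(= (attrs->meta [:a {:href "x"}])
   (attrs->meta [:a {:href "y"}]))
```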
Anyway, data.xml is owned by Chouser, with some help from me. Right
now what it really needs is an official release so that it can get
some field testing. And what's holding that up is that one test, the
indentation test, is failing. This is because, apparently, Java has
pluggable XML handlers, which all have different ways to configure
things like whitespace handling. The tests work for me and Chouser,
but the indentation test fails on the CI machine, probably because it
has a different indenter. So if someone could figure out a reliable
way to detect what handler we're using and how to set up indentation,
a release would magically make its way to some maven repo.
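As a starting point for that investigation, here is a rough sketch that reports which JAXP implementation is actually loaded and asks it to indent. It assumes the indenter is obtained through javax.xml.transform; the function names are made up for illustration. The {http://xml.apache.org/xslt}indent-amount key is an implementation-specific property that Xalan-derived transformers are generally understood to honor; other implementations may silently ignore it, which is exactly the kind of difference that could explain the CI failure.

```clojure
(import '(javax.xml.transform TransformerFactory OutputKeys)
        '(javax.xml.transform.stream StreamSource StreamResult)
        '(java.io StringReader StringWriter))

(defn transformer-info
  "Returns the concrete class names of the factory and transformer in use."
  []
  (let [factory (TransformerFactory/newInstance)]
    {:factory     (.getName (class factory))
     :transformer (.getName (class (.newTransformer factory)))}))

(defn indent-xml
  "Runs xml through an identity transform with indentation requested."
  [^String xml]
  (let [transformer (.newTransformer (TransformerFactory/newInstance))
        out         (StringWriter.)]
    (doto transformer
      ;; Standard JAXP property: ask for indentation.
      (.setOutputProperty OutputKeys/INDENT "yes")
      ;; Assumption: implementation-specific key, ignored where unsupported.
      (.setOutputProperty "{http://xml.apache.org/xslt}indent-amount" "2"))
    (.transform transformer
                (StreamSource. (StringReader. xml))
                (StreamResult. out))
    (str out)))

;; Compare these on a developer machine and on the CI machine.
(println (transformer-info))
(println (indent-xml "<a><b/></a>"))
```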
data.xml certainly attempts to be namespace-aware, but I don't think
there were any tests for that yet. I added a test just now, at
https://github.com/clojure/data.xml/tree/namespaces, but it fails. I
suspect that's because my test is incorrect, though: I don't know a
lot about namespaces. You could probably also help by getting that
straightened out.
On Nov 23, 8:12 am, "pepijn (aka fliebel)" <pepijnde...@gmail.com>
wrote:
> Hi,
>
> What is the status of data.xml, and how can I help?
>
> Much of the effort so far seems to have been put into parsing XML, I
> spend the last couple of days and a couple of days a while back
> implementing the generation of XML, especially with support for
> namespaces[1].
>
> This work currently lives under https://github.com/pepijndevos/ArmageDOM
The problem with namespaces is that most (all) Clojure XML libs are oblivious to the semantics of xmlns and xmlns:* attributes.
In your test, :api/method, api is an alias for http://some.uri.com/location. Aliases are serialization artifacts and, as such, I think they should not be part of the Clojure XML model.
It also follows that a node is not context-free anymore: you need to know its parents to resolve prefixes.
One way to fix this issue would be, in the presence of namespaces, to qualify everything[1] with the full URI (keyword (munge uri) name) and add the alias mapping as metadata (which implies, I think, the removal of xmlns and xmlns:* attributes from the model):
^{:xmlns {"api" "http://some.uri.com/location"}} {:tag :body :content [{:tag :http://some.uri.com/location/method :attrs {:args "" :ret ""}}]}
Hence nodes would be equal even if parsed from serializations using different aliases, and a node could be serialized correctly (by gensyming aliases) even if its context has been lost (e.g. an XML node copied from one doc to another).
What do you think?
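A minimal sketch of the idea, under some assumptions: the alias map lives in :xmlns metadata as in the example above, prefixed tags are written as namespaced keywords like :api/method, and the munge step is skipped for brevity. The names qualify-tag and qualify-tree are hypothetical.

```clojure
(defn qualify-tag
  "Replaces the alias in a tag with the URI it is bound to in prefix->uri.
  Tags whose prefix is unbound are returned unchanged."
  [prefix->uri tag]
  (let [prefix (or (namespace tag) "")   ; "" stands for the default namespace
        uri    (get prefix->uri prefix)]
    (if uri
      (keyword uri (name tag))
      tag)))

(defn qualify-tree
  "Walks a {:tag :attrs :content} tree, resolving aliases against the
  :xmlns metadata found on each element and on its ancestors."
  ([node] (qualify-tree {} node))
  ([inherited node]
   (if (map? node)
     (let [scope (merge inherited (:xmlns (meta node)))]
       (-> node
           (update :tag #(qualify-tag scope %))
           (update :content #(mapv (partial qualify-tree scope) %))))
     node)))

;; The same document, serialized with two different aliases.
(def doc-a
  ^{:xmlns {"api" "http://some.uri.com/location"}}
  {:tag :body :content [{:tag :api/method :attrs {:args "" :ret ""}}]})

(def doc-b
  ^{:xmlns {"loc" "http://some.uri.com/location"}}
  {:tag :body :content [{:tag :loc/method :attrs {:args "" :ret ""}}]})

;; Once qualified, the alias no longer matters for equality.
(= (qualify-tree doc-a) (qualify-tree doc-b))
```

Attribute names are left alone here; qualifying them would follow the same pattern, except that unprefixed attributes do not pick up the default namespace.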
While I agree that what you propose is the right way to do it, I would also like simple things to remain simple. As an example, I don't want to be forced to change simple code like this:
(if (= (:tag e) :a) ...)
to something like:
(if (= (:tag e) :http://www.w3.org/1999/xhtml/a) ...)
In many cases - especially where there is only one namespace, with no prefix, and it's on the root element - it's nice to be able to simply ignore namespaces altogether.
I'm not sure what API I would want to support this, but it may be as simple as a flag to xml/parse to flip it back to the current, blissfully namespace-unaware, behavior.
(0) This representation will break round-trip-ability for some XML
vocabularies.
The intention of declared namespace prefixes seems to have been as a
lexically scoped shortcut to keep element and attribute names compact
while using multiple namespaces. This scenario would otherwise require
a plethora of xmlns="..." declarations throughout the tree, or as one
early proposal had it: "{" some-ns-uri "}" local-name.
This implies that namespace prefixes are arbitrary and need not be
preserved by an XML parser. Round-tripping is preserved by simply
inserting synthesized xmlns declarations as necessary when writing the
DOM tree back out again.
Unfortunately, the "inspired" designers of XSD and XSLT chose to use
these namespace prefixes not just in names, but in *content*.
(Attribute values, specifically). Now suddenly, parsers must preserve
the particular prefixes chosen by the document author for fear of
otherwise breaking the document because those prefixes might be used
in content that is opaque to a general XML parser.
This presents another problem because while the prefix must be
preserved for the sake of sick vocabularies like XSD and XSLT it
shouldn't participate in determining equality of qualified names. So,
a Clojure representation of qualified names that preserves NS URI,
local name, and prefix would either need to define its own
qualified-xml-names-are-equal? predicate and use it in place of =
where required, or it would have to store the *prefix* as meta-data
attached to an object representing the tuple (namespace-url,
local-name). And be careful not to lose it during transformations and
such.
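The second option can be sketched roughly as follows. The constructor name qname is hypothetical; the point is only that Clojure's = ignores metadata, so the prefix rides along without affecting equality, and that an ordinary transformation can drop it without warning.

```clojure
(defn qname
  "Hypothetical constructor: a (namespace-uri, local-name) tuple that
  remembers the author's prefix as metadata."
  [uri local prefix]
  (with-meta [uri local] {:prefix prefix}))

(def a (qname "http://some.uri.com/location" "method" "api"))
(def b (qname "http://some.uri.com/location" "method" "loc"))

;; Equal despite different prefixes, because metadata is not compared.
(= a b)

;; The prefix is still available for serialization.
(:prefix (meta a))

;; Rebuilding the vector produces a fresh one without the metadata,
;; which is the "be careful not to lose it" hazard.
(meta (into [] a))
```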
(2) It doesn't seem advisable to mash the NS URI and the local name
together in this fashion as it's common to want to get at them
separately. Also, this prevents us from storing the NS URI only once
on the heap and reusing it for every element instead of duplicating
the potentially lengthy NS URI as part of every element name.
This is what you're already doing with element names. Why are
attribute names being handled differently? Is it because attribute
names are generally not namespace qualified? (The fact that they are
scoped lexically to the element within which they occur generally
makes that unnecessary, I guess.)
(3) At first glance :http://other.uri.com/security/role looks like it
could be potentially ambiguous. Local name must contain no slashes
(this is true [1]). The ns URI should not end in a slash (this is not
promised by RFC3986 3.3 [2]).
[1] http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name
[2] http://www.rfc-editor.org/rfc/rfc3986.txt
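A small sketch of the string-level question in (3), assuming the mashed name is always built as uri + "/" + local and split again at the last slash. The helper names mash and unmash are made up. Because a local name cannot contain a slash, this particular encoding recovers both parts even when the URI itself ends in a slash; whether the resulting text is pleasant to use as a printed keyword is a separate matter.

```clojure
(require '[clojure.string :as str])

(defn mash
  "Joins a namespace URI and a local name with a single slash."
  [uri local]
  (str uri "/" local))

(defn unmash
  "Splits a mashed name at its last slash into [uri local]."
  [s]
  (let [i (str/last-index-of s "/")]
    [(subs s 0 i) (subs s (inc i))]))

;; URI without a trailing slash.
(unmash (mash "http://other.uri.com/security" "role"))

;; URI with a trailing slash: the mashed form contains "//",
;; and the split still returns the original URI.
(unmash (mash "http://other.uri.com/security/" "role"))
```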
(p.s. I've been searching for a tool that lets me do non-lossy
editing/transformation of XML in Clojure; I've even sketched out ideas
for a few approaches -- but nothing has really gelled yet. One factor
is that the current conventional XML representation established
by Clojure core isn't very smart about namespaces. p.p.s. XML makes
me grumpy; sorry.)
// Ben
> Feedback welcome.
>
> Christophe
>
> --
> You received this message because you are subscribed to the Google Groups
> "Clojure Dev" group.
> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to
> clojure-dev...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/clojure-dev?hl=en.
>
> with namespaces into:
>
> ^{:xmlns {"api" "http://some.uri.com/location"
>           "s" "http://other.uri.com/security"}}
> {:tag :body
>  :content [{:tag :method
>             :ns "http://some.uri.com/location"
>             :attrs {:args "" :ret "int"
>                     :http://other.uri.com/security/role "*"}}]}
This is a fantastic discussion. Keep it up! I'd love to see data.xml
have solid support for namespaces, comments, processing instructions,
etc. Several of you clearly understand the issues involved here
better than I do. It sounds like you're close to a plan, so don't
stop now.
I just want to clarify a non-technical point: I have no particular
interest in being the owner (whatever that means) of data.xml. When I
wrote the old contrib lib it was because I needed it. I'm not in
particular need of xml processing these days, which unfortunately
comes out in my poor stewardship of the lib. Both Allan Malloy and
Ryan Senior have done more recently to add features and push data.xml
toward a release.
Please don't let me be any kind of impediment to getting the design
and implementation you need, and pushing out releases.
Let me know how I can help, as needs arise.
--Chouser
1) I agree that xmlns metadata is needed on every element. But it would also be nice to be able to do exact, textual round-tripping of XML, which means knowing on which elements the namespaces were actually declared in the original.
2) Should the xmlns metadata have the keys and values the other way around, mapping uri -> prefix? That's what is needed for writing out the XML, anyway.
3) I sometimes need to keep comments and processing instructions, and possibly cdata sections too, when I work with XML. Any opinions on supporting these? (maybe too much for now? get the basics first?)
Well, earlier in this thread I argued in favor of keeping all attribute keys as keywords for uniformity's sake. But uniformity doesn't buy us anything: you can't destructure :http://other.uri.com/security/role with :keys and nobody in his right mind is going to type such a keyword. The only thing that uniformity gives us is access to the local name through #'name (namespace access requires unmunging -- unless one decides to roll out a new type implementing c.l.Named, but I don't think it's a good idea for a broadly used schema to use custom datatypes). It's better/simpler to have some xml/local-name and xml/namespace functions.
All that to say I have changed my mind and ^{:prefix prefix} [localName nsUri] seems ok (I like that localName comes first) albeit at first I would have discarded the prefix metadata.
However I still think that non-namespaced attributes should remain keywords and that [localName], [localName nil] or [localName ""] should not be legal representations of such attribute names. (For implementors, Postel's Law may apply, but better not to mandate it at the schema level.)
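For illustration, the two accessors could look roughly like this, assuming exactly the representation described above: plain keywords for non-namespaced attribute names and ^{:prefix prefix} [localName nsUri] vectors for namespaced ones. The names local-name and namespace-uri are placeholders, not an existing API.

```clojure
(defn local-name
  "Local part of an attribute name, for both representations."
  [attr-name]
  (if (keyword? attr-name)
    (name attr-name)
    (first attr-name)))

(defn namespace-uri
  "Namespace URI of an attribute name, or nil for a plain keyword."
  [attr-name]
  (when (vector? attr-name)
    (second attr-name)))

;; An attribute map mixing both kinds of names.
(def attrs
  {:args ""
   (with-meta ["role" "http://other.uri.com/security"] {:prefix "s"}) "*"})

(map (juxt local-name namespace-uri) (keys attrs))
```

Vectors work as map keys because they have value equality, and the :prefix metadata does not interfere with lookup.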
On Fri, Nov 25, 2011 at 3:07 PM, Chris Perkins <chrispe...@gmail.com> wrote:
> 1) I agree that xmlns metadata is needed on every element. But it would also be nice to be able to do exact, textual round-tripping of XML, which means knowing on which elements the namespaces were actually declared in the original.
Exact round-tripping is a harsh requirement. How exact do you want to be? Down to the entities used? Attribute order?
However your point regarding elementtree is important, and maintaining the original ns aliases (and the place where they are declared) as far as possible should be a goal.
> 2) Should the xmlns metadata have the keys and values the other way around, mapping uri -> prefix? That's what is needed for writing out the XML, anyway.
It should, definitely -- I got it backwards.
> 3) I sometimes need to keep comments and processing instructions, and possibly cdata sections too, when I work with XML. Any opinions on supporting these? (maybe too much for now? get the basics first?)
I already have support for comments in Enlive: it's a necessity in HTML, between JavaScript in comments and IE conditional comments. It's simply {:type :comment :data "xxx"}.
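A small sketch of how that comment shape could sit alongside ordinary nodes in a :content vector. Only the {:type :comment :data "xxx"} map comes from the message above; the helper names are invented for the example and are not Enlive functions.

```clojure
(defn comment-node?
  "True for maps of the form {:type :comment :data ...}."
  [node]
  (= :comment (:type node)))

(defn emit-comment
  "Serializes a comment node. No validation of the data is attempted."
  [node]
  (str "<!--" (:data node) "-->"))

(defn strip-comments
  "Removes comment nodes from a sequence of content nodes."
  [nodes]
  (vec (remove comment-node? nodes)))

(def content
  ["text" {:type :comment :data "[if IE]>...<![endif]"} {:tag :p :content ["hi"]}])

(emit-comment (second content))
(strip-comments content)
```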
> > 2) Should the xmlns metadata have the keys and values the other way around, mapping uri -> prefix? That's what is needed for writing out the XML, anyway.
> It should, definitely -- I got it backwards.
Is it legal to have the same uri referred to by different aliases? If so, the prefix -> uri mapping may be a better fit?
- Chris
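To illustrate why the direction of the map matters in that case, here is a sketch assuming one URI is bound to several prefixes. Inverting prefix -> uri into a plain uri -> prefix map keeps only one prefix per URI; a uri -> set-of-prefixes map (built by the hypothetical uri->prefixes below) keeps them all.

```clojure
(require '[clojure.set :as set])

;; "" stands for the default namespace in this sketch.
(def prefix->uri
  {""     "http://foo"
   "foo"  "http://foo"
   "foo2" "http://foo"})

(defn uri->prefixes
  "Groups prefixes by the URI they are bound to."
  [prefix->uri]
  (reduce-kv (fn [acc prefix uri]
               (update acc uri (fnil conj #{}) prefix))
             {}
             prefix->uri))

;; A naive inversion collapses three bindings into a single entry.
(count (set/map-invert prefix->uri))

;; Grouping keeps every prefix.
(uri->prefixes prefix->uri)
```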
In those cases though presumably the assembled XML will be sent to
some other system and it's unlikely that the other system would
appreciate different prefixes for the same URI.
--
Cosmin Stejerean
http://offbytwo.com
It's not clear to me what sort of munging and unmunging would be necessary. Isn't (keyword "anything-at-all" "localName") legal Clojure? Why do you need to munge anything?
Regardless, I agree that using a vector of [localName uri] would probably be more convenient.
> All that to say I have changed my mind and ^{:prefix prefix} [localName nsUri] seems ok (I like that localName comes first) albeit at first I would have discarded the prefix metadata.
I don't think you need the prefix metadata because that information is in the element's metadata (assuming you don't need to be able to write out or otherwise process an attribute independent of its containing element).
> However I still think that non-namespaced attributes should still be keywords and that [localName], [localName nil] or [localName ""] should not be legal representations of such attributes names. (For implementors, Postel's Law may apply but better not to mandate it at the schema level).
I don't quite follow what you're saying here. Postel's Law is "be liberal in what you accept...", so why should [localName], [localName nil], and [localName ""] be illegal?
Or do you just mean that a parser should be required not to emit those forms for unprefixed attributes, but that other code (e.g. code to write out XML) should accept them?
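One way to read that second interpretation, sketched under the assumption that namespaced names are [localName nsUri] vectors and non-namespaced names are keywords: a writer could accept the loose forms and normalize them before use, while a parser only ever emits the strict ones. The function name normalize-attr-name is hypothetical.

```clojure
(defn normalize-attr-name
  "Accepts a keyword, [local], [local nil], [local \"\"] or [local uri]
  and returns the strict form: a keyword when there is no namespace,
  otherwise the vector unchanged."
  [attr-name]
  (if (vector? attr-name)
    (let [[local uri] attr-name]
      (if (or (nil? uri) (= "" uri))
        (keyword local)
        attr-name))
    attr-name))

;; Loose inputs all collapse to the same keyword.
(map normalize-attr-name [:role ["role"] ["role" nil] ["role" ""]])

;; A genuinely namespaced name is left alone.
(normalize-attr-name ["role" "http://other.uri.com/security"])
```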
Yes, you can have the same uri referred to by different prefixes:
<root xmlns="http://foo" xmlns:foo="http://foo" xmlns:foo2="http://foo">
<element/>
<foo:element/>
<foo2:element/>
</root>
Is perfectly legal. All three "element" elements belong to the
"http://foo" namespace (as does the "root" element).
I don't believe that's accurate.
<element xmlns:foo="http://foo" a="" foo:a=""/>
- element belongs to no namespace
- attribute a belongs to no namespace (but is contained in element)
- attribute foo:a belongs to namespace "http://foo" and has the local name "a"
<element xmlns="http://foo" a=""/>
- element belongs to namespace "http://foo"
- attribute a belongs to no namespace.
(if you require an attribute belong to a particular namespace, the only way
to accomplish this is to declare a prefix for that namespace and use this
prefix.)
At least that's how I have always understood section 6.2
http://www.w3.org/TR/REC-xml-names/#defaulting:
# A default namespace declaration applies to all unprefixed element names
# within its scope. Default namespace declarations do not apply directly
# to attribute names; the interpretation of unprefixed attributes is
# determined by the element on which they appear.
See also 6.3: http://www.w3.org/TR/REC-xml-names/#uniqAttrs which gives
the following example:
# However, each of the following is legal, the second because the default
# namespace does not apply to attribute names:
#
# <!-- http://www.w3.org is bound to n1 and is the default -->
# <x xmlns:n1="http://www.w3.org"
# xmlns="http://www.w3.org" >
# <good a="1" b="2" />
# <good a="1" n1:a="2" />
# </x>
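The same behavior can be checked against a namespace-aware parser. This is only a sketch using the JDK's DOM API through interop; the function name namespace-report is made up, and the input is a one-element variant of the spec's second example.

```clojure
(import '(javax.xml.parsers DocumentBuilderFactory)
        '(java.io ByteArrayInputStream)
        '(org.w3c.dom Document Element Attr))

(defn namespace-report
  "Parses xml with namespace awareness switched on and reports the
  namespace of the root element and of its attributes named a."
  [^String xml ^String uri]
  (let [^DocumentBuilderFactory factory (doto (DocumentBuilderFactory/newInstance)
                                          (.setNamespaceAware true))
        ^Document doc   (.parse (.newDocumentBuilder factory)
                                (ByteArrayInputStream. (.getBytes xml "UTF-8")))
        ^Element root   (.getDocumentElement doc)
        ^Attr plain     (.getAttributeNode root "a")        ; unprefixed a
        ^Attr prefixed  (.getAttributeNodeNS root uri "a")] ; n1:a
    {:element-ns          (.getNamespaceURI root)
     :plain-attr-ns       (.getNamespaceURI plain)
     :prefixed-attr-ns    (.getNamespaceURI prefixed)
     :prefixed-attr-value (.getValue prefixed)}))

(namespace-report
 "<good xmlns=\"http://www.w3.org\" xmlns:n1=\"http://www.w3.org\" a=\"1\" n1:a=\"2\"/>"
 "http://www.w3.org")
```

The element picks up the default namespace, the unprefixed attribute reports no namespace, and the prefixed one reports the URI bound to n1, so the two a attributes are distinct.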
>>
>> However I still think that non-namespaced attributes should still be
>> keywords and that [localName], [localName nil] or [localName ""] should not
>> be legal representations of such attributes names. (For implementors,
>> Postel's Law may apply but better not to mandate it at the schema level).
>>
> I don't quite follow what you're saying here. Postel's Law is "be liberal
> in what you accept...", so why should [localName], [localName nil], and
> [localName ""] be illegal? Or do you just mean that a parser should be
> required not to emit those forms for unprefixed attributes, but that other
> code (eg: code to write out XML) should accept them?
>
> - Chris
The page seems not to be public right now.
Indeed you need to be logged in :-/ I didn't find where to change that on the page -- I think the whole DXML space is behind a login.
Can someone with more Confluence skills help?