Thanks for this spec, which has been wanted for a long time.
Here are a few comments:
1) There is no reference to ZIP? Which implementation of ZIP are you
referring to? Can we use UTF-8 in file names?
2) Can we handle JAR/WAR/EAR?
3) Can we handle Widget [1]?
3) EXProc already has an unzip step [2]. How does it relate?
4) MarkLogic already has a ZIP library [3]. How does it relate?
5) You refer to XQuery, XPath and XSLT; please add explicit references.
6) You reference HTML Document? You could probably add the HTML5
algorithm defined here [4].
7) Why isn't there any function to access a zip:entry when you
have already called zip:entries()?
[1] http://www.w3.org/TR/2010/WD-widgets-20101005/
[2] http://exproc.org/proposed/steps/other.html#unzip
[3] http://developer.marklogic.com/pubs/4.0/apidocs/package.html
[4] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing
Regards,
Xmlizer
Thanks for your comments, my replies below:
On Tue, Oct 12, 2010 at 2:56 PM, mozer <xml...@gmail.com> wrote:
> Hi Phil,
>
> Thanks for this spec which is wanted for a long time
>
> Here a few comments
>
> 1) There is no reference to ZIP ?
A1.1) Agreed that one should be added; is [1] OK?
> Which implementation of ZIP are you refering to ?
A1.2) This is difficult because I believe there are many ZIP
implementations. I would recommend that we reference a specific ZIP
specification as a minimum and leave support for later versions
to individual EXPath implementations. Any suggestions?
> Can we use UTF-8 in file name ?
A1.3) I'm assuming you're referring to the values used in the 'href'
attribute of the 'zip:file' element and the $href argument in the
entry extraction functions? If so, my view is that it's probably best
for the character set supported for the name (I don't think this is an
encoding issue) to be implementation-defined, and even then, on
platform-independent solutions (like Florent's), it may differ
across operating systems.
> 2) Can we handle JAR/WAR/EAR ?
A2) The need to support packaging formats/conventions like
the ones you mention should feed into the decision on the
minimum ZIP spec supported (see A1.2).
> 3) Can we handle Widget[1] ?
A3.1) My view is the same as for A2.
> 3) EXProc has already an unzip step [2]. How does it relate ?
A3.2) Probably, but I'm not sure how. Should this be mentioned in the
spec as an acknowledgement? Anyone?
> 4) Marklogic has already a ZIP library [3] ? How does it related ?
A4) I don't know, but Florent might.
> 5) You refer to XQuery, XPath and XSLT please make explicit reference
A5) Agreed. I'll add references.
> 6) You reference HTML Document ? probably you can add HTML5 algorithm
> defined here [4] ?
A6) You're referencing the HTML5 parsing algorithm section, so I'm
guessing this is related to the zip:html-entry() function (in Section
2.1) which is about parsing HTML so that it can be returned as an XML
document node. I can add the HTML algorithm you reference as an
aspiration, but so far as I can tell (from a quick glance), this
simply creates a DOM. Wouldn't we then still need to have a defined
way of mapping the DOM to an XML document node?
> 7) Why aren't there any function to access to a zip:entry when you
> already called zip:entries() ?
A7) I'm not quite sure what the question is. Is it:
(A7a) that you want the ability to insert a ZIP compressed file as a
ZIP entry within a ZIP file?
   - You can, by using a URI value locating the ZIP file for
     the 'src' attribute in the zip:entry element (section 4.3)
or (A7b) that you want to read an entry (or get details on the
structure) of a ZIP file that you've already created with the
zip:entries() call?
   - In this case, it could be awkward, as I don't think you
     can guarantee the order of execution and you would have to start
     relying on side-effects.
If I've misunderstood the question, perhaps you could be more specific
on what you want the specification to allow you to do, with a small
use case?
>
>
> [1] http://www.w3.org/TR/2010/WD-widgets-20101005/
> [2] http://exproc.org/proposed/steps/other.html#unzip
> [3] http://developer.marklogic.com/pubs/4.0/apidocs/package.html
> [4] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing
>
[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
Regards
Phil Fearon
http://qutoric.com
Phil,
> I'm pleased to announce the first draft for the ZIP Module
> specification is now available at:
Thank you so much for taking over this spec! It's been due for
a long time :-) A few random comments about the current draft:
1/ It is maybe a good time to rename zip:zip-file() as
zip:create-file() or something? I really hate renaming things,
but we are still at an early stage, and "zip-file()" is kind of
ambiguous. Does it create one zip file, read one, etc.?
2/ In zip-file() and update-zip() (aka the creation functions),
we should probably add a param $contents (like $bodies in the
HTTP Client). To be able to provide some entries' content as
individual items/nodes, without having to embed them in the
zip:file element.
3/ In the creation functions, should we add a param $href? In
order to be able to change the destination URI without having to
copy the whole zip:file element.
4/ What if an entry does not exist in a ZIP file when calling
one of the zip:*-entry() functions? Probably return the empty
sequence. But their return type is not optional. So we should
probably change their return type (e.g. for zip:text-entry()) to
"xs:string?" instead of "xs:string", and say the empty sequence
is returned when the entry does not exist.
5/ Should we add a function zip:entry-exists()? (returning true
or false)
6/ And a function zip:xml-entry-available(), to be consistent
with fn:doc-available()? (The entry not only has to exist, but
also has to be "available": well-formed, etc.)
7/ Besides the core functionalities, we should probably also
provide some helper functions (can be done in plain XPath, but
convenient to have them), for instance:
   (: $entry is either a zip:dir or a zip:entry, returns the
      path of $entry in its ancestor zip:file :)
   zip:entry-path($entry as element()) as xs:string

   (: return either a zip:dir or a zip:entry, a descendant of
      $zip, corresponding to $path :)
   zip:entry-descriptor($zip as element(zip:file),
                        $path as xs:string) as element()?
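
As a rough illustration that both helpers can be written in plain XQuery, here is a minimal sketch. It assumes the manifest nests zip:dir and zip:entry elements, each with a @name, under zip:file, and that the zip prefix is bound to http://expath.org/ns/zip. The local: function names and the sample manifest are hypothetical, not part of the draft.

```xquery
xquery version "1.0";
declare namespace zip = "http://expath.org/ns/zip";  (: assumed namespace URI :)

(: Path of a zip:dir or zip:entry inside its ancestor zip:file,
   built from the @name of each enclosing zip:dir plus its own. :)
declare function local:entry-path($entry as element()) as xs:string
{
  string-join(
    $entry/ancestor-or-self::*[self::zip:dir or self::zip:entry]/@name,
    '/')
};

(: The zip:dir or zip:entry found at $path under $zip, or the
   empty sequence when nothing matches. :)
declare function local:entry-descriptor(
    $zip  as element(zip:file),
    $path as xs:string) as element()?
{
  ($zip//*[self::zip:dir or self::zip:entry]
         [local:entry-path(.) eq $path])[1]
};

(: Hypothetical manifest, shaped like the result of zip:entries(). :)
let $zip :=
  <zip:file href="sample.zip">
    <zip:dir name="docs">
      <zip:entry name="README"/>
    </zip:dir>
  </zip:file>
return
  local:entry-path(local:entry-descriptor($zip, 'docs/README'))
```

With this sample manifest the query round-trips the path, so it should evaluate to the string docs/README. A zip:entry-exists() along the lines of point 5/ could be layered on the same lookup with fn:exists(), at least for entries listed in the manifest.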
Thanks again for the draft, regards,
--
Florent Georges
http://fgeorges.org/
Hi,
> Thanks for your comments, my replies below:
Instead of responding separately to Mozer and to you, I'll
respond to both of you in the same email.
>> 1) There is no reference to ZIP ?
> A1.1). Agree that one should be added, is[1] ok?
It makes sense to me. That's also the one referred to by the
"Widget Packaging and Config" spec, as well as by ODF. Who
knows, we're maybe going to have an ISO ZIP spec one day? ;-)
>> Which implementation of ZIP are you referring to ?
What do you mean Mozer, by "referring to an implementation of
ZIP" in the context of this spec?
>> Can we use UTF-8 in file name ?
> A1.3) I'm assuming you're refering to the values used in the
> 'href' attribute of the 'zip:file element' and the $href
> argument
I might be wrong, but I think he refers instead to the name of
the entries in the ZIP file (zip:dir and zip:entry's @name).
>> 2) Can we handle JAR/WAR/EAR ?
>> 3) Can we handle Widget[1] ?
As those are ZIP files, yes, we should be able to read them.
At the ZIP layer level of course (nothing specific to JAR files
or widgets themselves). But these are both interesting use cases.
Validating that we can support all options in the JAR and Widget
specs is an interesting indicator (e.g. the signing stuff).
>> 3) EXProc has already an unzip step [2]. How does it relate ?
> A3.2) Probably, but I'm not sure how. Should this be in the
> spec as an acknowledgement anyone?
It does not relate at all. Thanks for the link Moz, I've never
seen this one before. Is it a new step? It seems to only return
either the manifest (if that is what we can call the zip:file
element representing the structure of a ZIP file) or a specific
entry as XML or binary (so like zip:entries(), zip:xml-entry()
and zip:binary-entry()).
It also shows more information about each entry (like
timestamps and sizes), which would be interesting to add also in
the ZIP module.
Maybe we should coordinate with EXProc: tell them about the new
ZIP draft and see if they want to share the effort or to keep two
separate specs. I think it'd make sense for EXProc to refer to
the EXPath spec and say that some functions are actually provided
as steps instead, and just define the interface of those steps
and how they map to the corresponding function definition.
>> 4) Marklogic has already a ZIP library [3] ? How does it
>> related ?
> A4) I don't know, but Florent might.
Depends on your definition of "relate", Moz :-) Maybe more of
interest is [A] actually. Where xdmp:zip-manifest() is the
equivalent of zip:entries(), xdmp:zip-get() the equivalent of the
various zip:*-entry() functions, and xdmp:zip-create() the
equivalent of zip:zip-file().
A big difference is that they use (or resp. generate) a binary
item (a proprietary item type of MarkLogic) instead of reading
(or resp. creating) files identified by a URI. Which could make
sense in some cases/environments. Or not.
>> 6) You reference HTML Document ? probably you can add HTML5
>> algorithm defined here [4] ?
> A6) You're referencing the HTML5 parsing algorithm section, so
> I'm guessing this is related to the zip:html-entry() function
> (in Section 2.1) which is about parsing HTML so that it can be
> returned as an XML document node. I can add the HTML algorithm
> you reference as an aspiration
Yes, I think you are right (Phil). Personally I would be
reluctant to introduce a normative reference to that algorithm.
The initial references to Tag Soup and HTML Tidy come from the
XProc spec, which does the same thing for the p:http-request
step.
We can probably add a reference to the HTML algorithm as one of
the possible ways to do it (forbidding the evaluation of any
script; <script>document.write('<p>');</script> MUST be returned
as one element with one text node: "document.write('<p>');").
>> 7) Why aren't there any function to access to a zip:entry when
>> you already called zip:entries() ?
> A7) I'm not quite sure what the question is.
I guess the question is: "when I have already read a ZIP file
using zip:entries(), why do I have to read the file again by
providing its URI and an entry path, instead of just
providing a zip:entry?", or something like that.
Two points here. First, while it is true that you have to
access the ZIP file twice if you call zip:entries() then, say,
zip:text-entry(), reusing the zip:entry element wouldn't change
that; and there is no overhead, as the ZIP file is not read in its
entirety by zip:entries(), only the parts needed to generate
the manifest.
Second, from a user point of view, it would probably make sense
to provide an overload of the zip:*-entry() functions to accept a
zip:entry element instead of both $href and $path:
   (: $entry must be a descendant of a zip:file returned by
      zip:entries() :)
   zip:text-entry($entry as element(zip:entry)) as xs:string
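
To make the idea concrete: such an overload would only need to recover the two existing arguments from the element itself. Below is a minimal sketch in plain XQuery of that derivation, assuming the manifest keeps @href on zip:file and @name on each zip:dir and zip:entry, and that the zip prefix is bound to http://expath.org/ns/zip. The function local:entry-location and the sample manifest are hypothetical.

```xquery
xquery version "1.0";
declare namespace zip = "http://expath.org/ns/zip";  (: assumed namespace URI :)

(: Returns the pair ($href, $path) that the existing two-argument
   zip:*-entry() functions would need in order to locate $entry. :)
declare function local:entry-location(
    $entry as element(zip:entry)) as xs:string+
{
  (
    (: URI of the ZIP file, taken from the enclosing zip:file :)
    string($entry/ancestor::zip:file[1]/@href),
    (: path of the entry, from the names of its enclosing zip:dir
       elements followed by its own name :)
    string-join(
      $entry/ancestor-or-self::*[self::zip:dir or self::zip:entry]/@name,
      '/')
  )
};

(: Hypothetical manifest, shaped like the result of zip:entries(). :)
let $manifest :=
  <zip:file href="archive.zip">
    <zip:dir name="docs">
      <zip:entry name="README"/>
    </zip:dir>
  </zip:file>
return
  local:entry-location($manifest//zip:entry[@name eq 'README'])
```

The one-argument form could then simply pass these two values on to the two-argument function. This also shows why $entry has to stay attached to its zip:file: a detached copy of the zip:entry has no ancestor from which to read @href.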
>> [1] http://www.w3.org/TR/2010/WD-widgets-20101005/
>> [2] http://exproc.org/proposed/steps/other.html#unzip
>> [3] http://developer.marklogic.com/pubs/4.0/apidocs/package.html
>> [4] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing
> [1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
Thanks for your comments and responses!
Regards,
--
Florent Georges
http://fgeorges.org/
[A] http://developer.marklogic.com/pubs/4.2/apidocs/Document-Conversion.html
This would be my main comment too. It's not nice to have to save
documents to the filesystem/database in order to create a zip file of them.
John
--
John Snelson, Senior Engineer http://twitter.com/jpcs
MarkLogic Corporation http://www.marklogic.com
Hi,
>> 2/ In zip-file() and update-zip() (aka the creation
>> functions), we should probably add a param $contents (like
>> $bodies in the HTTP Client). To be able to provide some
>> entries' content as individual items/nodes, without having to
>> embed them in the zip:file element.
> This would be my main comment too. It's not nice to have to
> save documents to the filesystem/database in order to create a
> zip file of them.
It's nice to see we agree on this one, but I am not sure we
are talking about the same thing actually :-) For now, we have:
   zip:zip-file(
     <zip:file href="new-file.zip">
       <zip:entry name="README">This is a sample.</zip:entry>
     </zip:file>)
I.e. the content of README (that is, the content of what will
be the entry 'README' in the new ZIP file) is part of the
zip:file element. It is never serialized prior to the creation
of the ZIP file. An alternative way, in order to use existing
static files, is to use @src:
   zip:zip-file(
     <zip:file href="new-file.zip">
       <zip:entry src="some/where/logo.png"/>
     </zip:file>)
What I suggest here, is to be able to do something like the
following (as this is already the case in the HTTP Client, see
http://expath.org/spec/http-client, look for param "$bodies"):
   let $readme := 'This is a sample.'
     return
       zip:zip-file(
         <zip:file href="new-file.zip">
           <zip:entry name="README"/>
         </zip:file>,
         $readme)
I see at least two reasons for that. If we generate large
content, for some implementations it could be difficult to be
efficient and prevent unnecessary copies of the content just to
add it to zip:file. But furthermore the content can be changed
as a side-effect of adding it to zip:file. For instance, the
following two examples won't produce the same entry in the ZIP file:
   let $xml := <hello>World!</hello>
     return
       zip:zip-file(
         <zip:file href="one-file.zip">
           <zip:entry name="hello.xml"> {
               $xml
             }
           </zip:entry>
         </zip:file>)

   let $xml := <hello>World!</hello>
     return
       zip:zip-file(
         <zip:file href="another-file.zip">
           <zip:entry name="hello.xml"/>
         </zip:file>,
         $xml)
When serializing the element 'hello' into the entry hello.xml,
the former will have to add a binding for the zip namespace.
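
The namespace effect can be observed without any ZIP implementation at all. The following is a minimal sketch in plain XQuery, assuming the zip prefix is bound to http://expath.org/ns/zip and setting the copy-namespaces mode explicitly rather than relying on the processor's default. The wrapper elements original and embedded are only there to label the two results.

```xquery
xquery version "1.0";
declare namespace zip = "http://expath.org/ns/zip";  (: assumed namespace URI :)
declare copy-namespaces preserve, inherit;            (: set explicitly for this sketch :)

let $xml  := <hello>World!</hello>
(: Embedding $xml copies it under zip:entry, as in the first example. :)
let $file :=
  <zip:file href="one-file.zip">
    <zip:entry name="hello.xml">{ $xml }</zip:entry>
  </zip:file>
return (
  (: prefixes in scope on the original element :)
  <original>{
    string-join(in-scope-prefixes($xml), ' ')
  }</original>,
  (: prefixes in scope on the copy that now lives inside zip:entry :)
  <embedded>{
    string-join(in-scope-prefixes($file/zip:entry/hello), ' ')
  }</embedded>
)
```

Under the inherit setting, the embedded copy should list the zip prefix in addition to xml, while the original does not. That inherited binding is what would end up in the serialized hello.xml entry, and passing the content separately avoids it.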
Regards,
I hadn't realised that from my skim of the spec.
I agree.
Xmlizer