New EXPath module for generic input/output

Rositsa Shadura

Nov 3, 2011, 5:45:27 PM
to EXPath
Hi everyone,

we at BaseX are currently working on a specification for an EXPath
module for generic input and output, and we would like to share some
details so that you can give us your opinion.
We say "generic" because a single XQuery module would include several
functions for converting data from a non-XML format to XML and,
vice versa, from XML to some other format.
Our decision to propose such a module was driven partly by the intense
discussion on separating HTML parsing from the HTTP module and by the
lack of an EXPath module for parsing different formats. Another
reason is that many XQuery processors currently offer functions
for parsing HTML, CSV, etc. and, on the other hand, functions for
serializing and transforming XML. Given these two sets of
functions, one for input to XML and one for output from XML, it would
be really convenient to also have functions which, independently of
the data format, can convert data to XML and vice versa. Such a module
would not be tied to any particular data format or set of
formats and thus would not be affected when new formats appear.
Moreover, it would not force processors to rewrite their existing
parsing, serialization, or transformation logic; it could be used as a
wrapper around it.

What do you think about such a module?

I will be glad to receive your feedback!

Regards,
Rositsa

Adam Retter

Nov 3, 2011, 6:01:32 PM
to exp...@googlegroups.com
I would like to see a few proposed function signatures, but I also
recognise that the complexity is not in the function concepts
themselves but in documenting the mapping.

Also, I wonder if this should be one module, or rather separate
modules for each format under a common base namespace
http://expath.org/ns/conversion/<thing>, e.g.
http://expath.org/ns/conversion/csv and
http://expath.org/ns/conversion/json


--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Matthias Brantner

Nov 3, 2011, 7:30:14 PM
to exp...@googlegroups.com
I really like this idea. In fact, Zorba already implements this generic mapping.

For example, there are a couple of data-converter modules:
- json
- html
- csv

Each of them implements at least a parse (to XDM) and a serialize (to text or binary)
function. Additional functions accept further parameters to pass options to the
conversion process. This is analogous to the fn:parse-xml and fn:serialize functions.

This, for example, matches very nicely with the file and http modules. Those provide
the corresponding read and write functions (one for text and one for binary). See
- http-client
- file

Best

Matthias

Claudius Teodorescu

Nov 4, 2011, 4:51:42 AM
to exp...@googlegroups.com
Hi,

Such a module is a great idea, and a needed one.

I started a module for eXist, designed for digital publishing, with the intention of allowing conversion between XML and various formats (docbook, csv, text, pdf, openxml, dita, etc.) and conversion among these formats, along with some manipulation functions for digital publishing.

As this could be a huge module, I also thought of splitting it into sub-modules.

Claudius

Rositsa Shadura

Nov 4, 2011, 7:43:07 AM
to EXPath
Hi Adam,

our idea is not to have separate modules for each format, e.g. one for
CSV and one for JSON, but to have one module that can convert any
format to and from XML. The reason is that implementations for parsing
and serializing particular data formats already exist. For example, we
at BaseX have the functions json:parse and json:serialize, and Zorba
offers similar ones, as Matthias mentioned. However, the signatures of
these functions differ between BaseX and Zorba, as there is currently
no established standard for them. Standards will appear sooner or
later and, apart from that, new formats will appear. One generic
module for converting to/from XML would not be affected by this
inevitable development, because it would be used as a wrapper around
both the existing functionality and whatever is newly added.

Regards,
Rositsa

Adam Retter

Nov 4, 2011, 9:39:48 AM
to exp...@googlegroups.com
At the moment I don't understand why you would keep all this in one
module, but don't worry ;-)

I will sit tight and see what it is that you are proposing as the
details emerge, and then comment when things are a bit less ethereal.

Imsieke, Gerrit, le-tex

Nov 3, 2011, 6:13:05 PM
to exp...@googlegroups.com

On 2011-11-03 22:45, Rositsa Shadura wrote:
> serialization and transformation of XML. Having these two sets of
> functions - one for input to XML and one for output from XML, it will
> be really convenient to have also functions which independently from
> the data format can convert data to XML and vice versa. Such a module
> will not be associated with any particular data format or set of
> formats and thus will not be influenced when new formats appear.

"which independently from the data format can convert data to XML and
vice versa"
Frankly, I don't get it. Any converter from XML and to XML has to
convert to/from /some/ format. If your planned module doesn't cater to
concrete existing formats such as CSV, JSON etc., what's its
target/source representation then?


Another important area for a module that is going to cover common XML
input/output tasks: reading and writing binary data (that is carried
along in a base64 representation while in XML documents).

Geert Josten just had a related question on the XProc list (writing
base64 encoded inline images found in Word files). And I was surprised a
while ago when I learned that I couldn't check the mere existence of a
binary file from within an XSLT processor, without resorting to Java
extension functions.

Gerrit

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit....@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler

Imsieke, Gerrit, le-tex

Nov 3, 2011, 7:59:37 PM
to exp...@googlegroups.com

On 2011-11-04 00:30, Matthias Brantner wrote:
> This, for example, matches very nicely with the file and http module.
> Those provide
> the corresponding read and write functions (one for text and one for
> binary). See

> - file
> <http://www.zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/expath.org_ns_file.html>

Ok, this answers my request for reading/writing binary files.

But given that both the file and http modules perform I/O and won't be
obsoleted by what Rositsa suggested, I think what she suggested should
rather be called 'dataformat' or 'converter' instead of input/output.

Claudius Teodorescu

Nov 5, 2011, 12:54:55 AM
to exp...@googlegroups.com
Hi,

I wonder whether it would be worthwhile for such a module to have a single function, with the following signature:

transform($transformation-formula as xs:string, $data-to-transform as item(), $transformation-parameters) as item(),

where:
$transformation-formula can be 'xml to csv', 'xml to html', etc.

This could be very concise, and would avoid defining many functions, one per format.
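
To make the single-function idea concrete, here is a minimal sketch in Python rather than XQuery. It is only an illustration under stated assumptions: the registry, the `register` decorator, the XML layout (`csv`/`record`/`entry` elements) and the `separator` parameter are all hypothetical and are not taken from any proposal in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical registry: maps a formula such as "xml to csv" to a converter.
CONVERTERS = {}


def register(formula):
    """Register a converter function under a transformation formula."""
    def decorator(func):
        CONVERTERS[formula] = func
        return func
    return decorator


@register("xml to csv")
def xml_to_csv(data, params):
    # Assumed mapping: each child of the root is a record,
    # and each child of a record is one field.
    root = ET.fromstring(data)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=params.get("separator", ","),
                        lineterminator="\n")
    for record in root:
        writer.writerow([field.text or "" for field in record])
    return out.getvalue()


@register("csv to xml")
def csv_to_xml(data, params):
    # Inverse of the assumed mapping above.
    root = ET.Element("csv")
    reader = csv.reader(io.StringIO(data),
                        delimiter=params.get("separator", ","))
    for row in reader:
        record = ET.SubElement(root, "record")
        for value in row:
            ET.SubElement(record, "entry").text = value
    return ET.tostring(root, encoding="unicode")


def transform(formula, data, params=None):
    """Single entry point: look up the formula and run its converter."""
    try:
        converter = CONVERTERS[formula]
    except KeyError:
        raise ValueError(f"unsupported transformation formula: {formula!r}")
    return converter(data, params or {})
```

In this sketch a new format only adds an entry to the registry, so the signature of `transform` never changes; the cost is that the formula string is checked only at run time.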

Claudius

Matthias Brantner

Nov 5, 2011, 1:12:26 AM
to exp...@googlegroups.com
I would prefer a separate module for each data converter.
Mainly to keep each module independent and small in terms of size
and dependencies. For example, instead of having one module which
depends on a shapefile, a tidy, and a json library, we should define
three modules where each only requires a single such library dependency.
Additionally, other transformations could be implemented entirely in
XQuery/XSLT and don't require any external dependency at all.

Does this make sense?

Best regards

Matthias


Claudius Teodorescu

Nov 5, 2011, 1:51:56 AM
to exp...@googlegroups.com
Yes, this makes full sense, and I completely agree with fully separating the dependencies for each transformation one needs.

But I think that can easily be done, as Adam suggested above, by using the concept of sub-modules, so that one can load the sub-modules one wants and use the above-mentioned function transform() as a sort of master function.

The specification and module would be very agile, leaving implementations free to add as many transformation formulas as they want.

Of course, the specification has to state clear guidelines for some sensitive formats (I am thinking of converting XML to JSON, for instance).

Normally, the formats to convert XML to and from are well defined, as we are talking about HTML, CSV, JSON. I dare to add some further formats here, like docbook, csv, text, pdf, openxml, dita, etc., as I stated above, and I wonder why we would need a separate function for each of the possible conversions, not only to and from XML, but also between the various other formats (for instance, I have already implemented in eXist a conversion from HTML to XSL-FO, using the excellent tool CSS2XSLFO).

Such a separate module for doing transformations between various formats is a brilliant idea, and I would like to see it as a flexible spec that will give us freedom and power.


Claudius