http-client html parsing dependency

Max Toro

Sep 15, 2011, 12:28:12 AM

to exp...@googlegroups.com

Hello Florent,
I propose removing the HTML parsing functionality from the HTTP Client.
Instead, the client would just return an xs:string, and you could use a
separate function (e.g. xx:parse-html(xs:string)) if you want to parse
it. To me it's a very different concern that should not be tied to the
HTTP Client, and it's a burden for implementers. Also, I suspect
most users won't need it.

Talking about spec changes, any plans for a new draft? I sent you a
text that fixed 3 major issues a while ago.
--
Max Toro

William Candillon

Sep 15, 2011, 2:32:25 AM

to exp...@googlegroups.com

Hi,

It looks like Zorba has the same concern and decoupled the HTML parser
from the HTTP client in this module:
http://zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_converters_html.html

Best,

William

> --
> You received this message because you are subscribed to the Google Groups "EXPath" group.
> To post to this group, send email to exp...@googlegroups.com.
> To unsubscribe from this group, send email to expath+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/expath?hl=en.
>
>

Matthias Brantner

Sep 19, 2011, 2:32:57 PM

to exp...@googlegroups.com

Max,

thanks for bringing this up. I agree that the tidy functionality shouldn't
be hard-coded in the http-client. IMHO, there is too much happening
implicitly over which the user doesn't (but wants to) have control.
For example, there are various tidy implementations and most of them
allow the user to configure the tidy process. In the http-client case, it's not
clear how the user can configure such options. Dealing with the content is
orthogonal to http communication.

As William already pointed out, Zorba decoupled the http-client functionality
from parsing the contents. In Zorba, there is a variety of parse modules for
parsing different formats (e.g. HTML, CSV, JSON).

EXPath should specify more modules that deal with parsing other data
formats. Doing so allows those modules to be reused independent of
where the contents came from (e.g. http or file).

Does this make sense?

Best regards

Matthias

Rositsa Shadura

Sep 19, 2011, 5:35:04 PM

to EXPath

Hi,

I also think that it is a nice idea to have a separate function which
takes care of parsing HTML. However, I am not quite sure if it is
necessary to change this part of the current specification. Here are
my reasons:

1. According to the current version of the specification, the HTML
parsing process is implementation defined. This gives a lot of freedom:
for example, if content cannot be tidied up because a processor
lacks such functionality, the error HC002 (Error parsing the entity
content as XML or HTML) can be thrown. On the other hand, if such
functionality is already implemented or available through a library,
it will be used and the content will be parsed to a document node.

2. With the current version of the specification, HTML content can
always be represented as a string:
- in the case of an HTTP request with an HTML payload, if the user specifies
the serialization parameter method to be "text", then the request
payload will be serialized as text
- in the case of an HTTP response with an HTML payload, if the user has
set the attribute "override-media-type" of http:request to
"text/plain", then again the payload will be returned as a string.

What do you think?

Rositsa

Max Toro

Sep 19, 2011, 10:43:34 PM

to exp...@googlegroups.com

Rositsa,
Yes, you can work around the limitation of an implementation that
doesn't provide HTML parsing. But having HTML parsing in the spec is
the wrong design. To make an implementation compliant you must
support HTML parsing, which means you must depend on an HTML parsing
library, which you must distribute with your implementation, etc.,
even if the user has no plans to use it. And making it optional
makes no sense since the whole point of EXPath is portability.
Matthias also makes good points: you have no control over the parsing
and no way to customize it, and different parsing libraries can produce
different results.

I agree, if you are working with HTML a lot then it's very convenient
to have the HTTP Client convert it to XHTML for you, freeing you from
having to call xx:parse-html() every time. But a better design is to
have a way to extend an HTTP Client with different converters (HTML,
JSON, etc.), and this should be a separate spec.
--
Max Toro

Florent Georges

Sep 20, 2011, 6:42:08 AM

to exp...@googlegroups.com

On 15 September 2011 06:28, Max Toro wrote:

Hi,

> I propose removing the HTML parsing functionality from the HTTP
> Client. Instead, the client would just return an xs:string, and you
> could use a separate function (e.g. xx:parse-html(xs:string))
> if you want to parse it.

There are various reasons the feature was designed this way:

- it is based on the similar functionality in XProc (smart
people have already discussed those details thoroughly;
besides, it is easier for a user to switch from one to the
other)

- the serialization part is based on the W3C's serialization
spec for XSLT and XQuery

- the parsing part is mirroring the serialization part to be
consistent

- if XML content is parsed automatically to a document node,
the principle of least surprise is to have HTML parsed also to
a document node

> To me it's a very different concern that should not be tied to
> the HTTP Client, and it's a burden for implementers.

Interesting. From my experience, it is way easier to implement
the spec as it is than to return a string and provide an HTML
parsing function that still streams the server's response and
parses it on the fly (instead, of course, of actually building a
string from the HTTP response and then parsing it completely
separately).

> Talking about spec changes, any plans for a new draft? I sent
> you a text that fixed 3 major issues a while ago.

This is still in progress :-/ I hope to release a new revision
soon though...

Thanks for your feedback!

Regards,

--
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/

Florent Georges

Sep 20, 2011, 6:45:20 AM

to exp...@googlegroups.com

On 15 September 2011 08:32, William Candillon wrote:

Hi,

> It looks like zorba has the same concern and decoupled the HTML
> parser from the HTTP client in this module:
> http://zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_converters_html.html

That's not what I understand from this doc. As far as I can see,
there is a separate html:parse() function, but this does not
change anything about their implementation of the EXPath HTTP
Client (it is not "decoupled", this is just one additional
extension provided in an additional module...)

Markus Pilman

Sep 20, 2011, 7:24:59 AM

to exp...@googlegroups.com

Hi,

Yes, in Zorba the http client is decoupled from the HTML parsing (this
is also why the EXPath http-client is not a core module anymore).
Zorba has its own http-client and the EXPath http-client is
implemented in pure XQuery using Zorba's internal http-client. This way
a user does not need an installed libtidy to use Zorba (and Zorba's
http-client). The only reason why the EXPath client relies on the
HTML parsing module is that it has to (because of the spec).

Best Markus

Florent Georges

Sep 20, 2011, 8:59:45 AM

to exp...@googlegroups.com

On 19 September 2011 20:32, Matthias Brantner wrote:

Hi,

> For example, there are various tidy implementations and most of
> them allow the user to configure the tidy process. In the
> http-client case, it's not clear how the user can configure
> such options. Dealing with the content is orthogonal to http
> communication.

Well, following this reasoning, the response of the HTTP
request should be only a binary stream (even the headers for
instance shouldn't be parsed?). But if we look at HTTP from a
user point of view, called from within an XPath / XDM environment
(like XQuery and XSLT), it makes perfect sense to provide XML
content (labelled as such) as an XDM document node to the caller.

The case of HTML is a bit special, as it is not always valid
lexical XML, even though it can be mapped easily to XDM nodes.
But this shouldn't be exposed to the user IMHO. Of course, if
the user wants something more specific than the default
behaviour, he/she can always request the string value and use
another extension (like html:parse()) to have greater control.

> EXPath should specify more modules that deal with parsing other
> data formats.

You're welcome to submit a proposal ;-) An interesting proposal
would be to define "standard" (in EXPath) new serialization
methods for e.g. JSON, EXI or HTML5, usable in xsl:output/@method,
in the XQuery output:method option, and in http:body/@method.

Florent Georges

Sep 20, 2011, 9:08:42 AM

to exp...@googlegroups.com

On 20 September 2011 13:24, Markus Pilman wrote:

Hi Markus,

> Zorba has its own http-client and the EXPath http-client is
> implemented in pure XQuery using Zorba's internal http-client.

Thanks for the details! So if I understand correctly, Zorba
provides two of its own extensions: an HTTP client and an HTML
parser. And those are used together in an additional XQuery
module (a "plain" module using those extensions, not a C++
module): the Zorba implementation of the EXPath HTTP Client.

I thought William said Zorba's EXPath HTTP Client did have the
HTML parsing capability, which is quite different ;-)

Florent Georges

Sep 20, 2011, 9:24:16 AM

to exp...@googlegroups.com

On 20 September 2011 04:43, Max Toro wrote:

Hi,

> To make an implementation compliant you must support HTML
> parsing, which means you must depend on an HTML parsing
> library, which you must distribute with your implementation

Well, we are defining an HTTP client. Having to parse some
HTML as part of this module is not a big dependency compared to
the complexity of HTTP itself. I mean, if the user is ready to
accept the extra code (library or whatever) to handle HTTP,
parsing HTML should not be a problem then.

> I agree, if you are working with HTML a lot then it's very
> convenient to have the HTTP Client convert it to XHTML for you,
> freeing you from having to call xx:parse-html() every time.
> But a better design is to have a way to extend an HTTP Client
> with different converters (HTML, JSON, etc.)

Sure. It would be nice to be able to configure the HTTP client
to have it parse the content differently if the content is JSON
or CSV. But this is not the case yet. I guess this could be in
a version 2.0, by providing a map from Content-Type strings to
function items... Anyway, any proposal welcome ;-)
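
The idea of a map from Content-Type strings to function items can be illustrated outside XQuery. The following is only a minimal Python sketch, not part of any EXPath draft: the names CONVERTERS, parse_csv and convert_body are hypothetical, and JSON and CSV are used simply because the Python standard library can parse both.

```python
import csv
import io
import json
from typing import Any, Callable, Dict

# Hypothetical registry: media type -> function that converts the body text.
Converter = Callable[[str], Any]


def parse_csv(text: str) -> list:
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


CONVERTERS: Dict[str, Converter] = {
    "application/json": json.loads,
    "text/csv": parse_csv,
}


def convert_body(content_type: str, text: str) -> Any:
    """Apply the converter registered for the media type.

    If no converter is registered, the text is returned unchanged, so the
    caller can still parse it with a function of their own choice.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    converter = CONVERTERS.get(media_type)
    return converter(text) if converter else text
```

With such a registry, adding HTML support would mean registering one more entry rather than changing the client itself, and a caller who distrusts the declared Content-Type can bypass the map entirely.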

Daniela Florescu

Sep 20, 2011, 12:32:08 PM

to exp...@googlegroups.com

>>
>
> There are various reasons the feature was designed this way:
>
> - it is based on the similar functionality in XProc (smart
> people have already discussed those details thoroughly;
> besides, it is easier for a user to switch from one to the
> other)
>
> - the serialization part is based on the W3C's serialization
> spec for XSLT and XQuery
>
> - the parsing part is mirroring the serialization part to be
> consistent
>
> - if XML content is parsed automatically to a document node,
> the principle of least surprise is to have HTML parsed also to
> a document node

Florent,

I am not sure how many smart people designed that, but
I think I am in agreement with everybody on this list who supports
decoupling.

HTTP is a protocol for transporting bits.

HTML is a way of organizing the bits.

People should be able to use the two functionalities/modules (get the
bits, or parse the bits into something
understandable) in an orthogonal way.

HTTP is not only transporting HTML, and HTML doesn't always come from
HTTP.

Best
Dana

Max Toro

Sep 20, 2011, 2:10:34 PM

to exp...@googlegroups.com

> There are various reasons the feature was designed this way:
>
> - it is based on the similar functionality in XProc (smart
> people have already discussed those details thoroughly;
> besides, it is easier for a user to switch from one to the
> other)

I was reading http://www.w3.org/TR/xproc/#c.http-request and it says:

"Given the above description, any content identified as text/html will
be encoded as (escaped) text or base64-encoded in the c:body element,
as HTML isn't always well-formed XML. A user can attempt to convert
such content into XML using the p:unescape-markup step."

> - the serialization part is based on the W3C's serialization
> spec for XSLT and XQuery

Yes, but how is this related to the topic we are discussing?

> - the parsing part is mirroring the serialization part to be
> consistent

Same as above.

> - if XML content is parsed automatically to a document node,
> the principle of least surprise is to have HTML parsed also to
> a document node

Maybe the same is true if we give HTML to an XML parser, but it just
doesn't work.
--
Max Toro

Matthias Brantner

Oct 12, 2011, 11:35:11 PM

to exp...@googlegroups.com

Florent,

Any update on this? It seems that everybody is in agreement
except you. ;-)

Best regards

Matthias

Daniela Florescu

Oct 13, 2011, 12:37:30 AM

to exp...@googlegroups.com

Florent,

is this because "smart people thought about this before" ? :-)))

That's definitely a bad reason -- see our today's discussion.

Thanks
Dana

Adam Retter

Oct 13, 2011, 2:42:16 PM

to EXPath

> I am not sure how many smart people designed that, but
> I think I am in agreement with everybody on this list who supports
> decoupling.
>
> HTTP is a protocol for transporting bits.

Indeed.

> HTML is a way of organizing the bits.

Indeed.

> People should be able to use the two functionalities/modules (get the
> bits, or parse the bits into something
> understandable) in an orthogonal way.
>
> HTTP is not only transporting HTML, and HTML doesn't always come from
> HTTP.

But do we really have to be so strict? If we follow your point to its
logical conclusion you are saying that if we do a request for an XML
document with the http-client then we should get back either an
xs:string or xs:base64Binary?

Surely it is preferable to apply some sanity for the user, and be a
little bit intelligent here so they don't have to chain lots of
functions together, and say: they always have the option of getting
back the raw data, but it's most likely that they want a
document-node() if they requested a URI that points at an XML document, and
the same really applies for an HTML document.

>
> Best
> Dana

Daniela Florescu

Oct 13, 2011, 2:53:13 PM

to exp...@googlegroups.com

>>
>
> But do we really have to be so strict? If we follow your point to its
> logical conclusion you are saying that if we do a request for an XML
> document with the http-client then we should get back either an
> xs:string or xs:base64Binary?

Yes.

>
> Surely it is preferable to apply some sanity for the user, and be a
> little bit intelligent here so they don't have to chain lots of
> functions together, and say: they always have the option of getting
> back the raw data, but it's most likely that they want a
> document-node() if they requested a URI that points at an XML document, and
> the same really applies for an HTML document.

It's a matter of simple design and separation of concerns.

The raw HTTP module should have the raw HTTP call.

The HTML converter module should have the parse and tidy call.

After that you can provide as many convenience modules on top of the
simple building blocks as you wish... (those are all XQuery, and
presumably easy to write and portable from one engine to the other).

Raw building blocks and convenience modules are orthogonal.
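
The layering described here (a raw HTTP block, a separate HTML block, and convenience functions on top) might look like this in Python. This is only an illustration under stated assumptions: http_get, parse_html_tags and get_html_tags are made-up names, and the standard library's lenient html.parser stands in for a real tidy implementation, which it is not.

```python
from html.parser import HTMLParser
from typing import Callable, List
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Building block 1: transport only. Returns the raw response bytes."""
    with urlopen(url) as response:
        return response.read()


class _TagCollector(HTMLParser):
    """Records the name of every start tag, tolerating unclosed elements."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_html_tags(markup: str) -> List[str]:
    """Building block 2: parsing only. Accepts text from any source."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def get_html_tags(url: str, fetch: Callable[[str], bytes] = http_get) -> List[str]:
    """Convenience layer: composes the two blocks without owning either."""
    return parse_html_tags(fetch(url).decode("utf-8", errors="replace"))
```

Only get_html_tags depends on both blocks; http_get has no HTML dependency, and parse_html_tags works equally well on text read from a file.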

The main point is that it seems bizarre to oblige the HTTP client
module to have a built-in dependency on an external HTML tidying
package. After that, why not a CSV parser? Or a JSON parser?
Or a shapefile reader?

It's a slippery slope...

Doesn't sound like good design to me.

Best
Dana

Max Toro

Oct 13, 2011, 2:59:18 PM

to exp...@googlegroups.com

>> People should be able to use the two functionalities/modules (get the
>> bits, or parse the bits into something
>> understandable) in an orthogonal way.
>>
>> HTTP is not only transporting HTML, and HTML doesn't always come from
>> HTTP.
>
> But do we really have to be so strict? If we follow your point to its
> logical conclusion you are saying that if we do a request for an XML
> document with the http-client then we should get back either an
> xs:string or xs:base64Binary?
>
> Surely it is preferable to apply some sanity for the user, and be a
> little bit intelligent here so they don't have to chain lots of
> functions together, and say: they always have the option of getting
> back the raw data, but it's most likely that they want a
> document-node() if they requested a URI that points at an XML document, and
> the same really applies for an HTML document.

EXPath defines modules for XPath and XPath-based languages, such as
XSLT and XQuery. These languages work with XML, and the
implementations depend on an XML parser. If you implement an EXPath
module that targets a specific processor you get XML parsing for free;
it's part of the processor dependencies. On the other hand, HTML has
nothing to do with XML, XPath, XSLT or XQuery, with the exception of
the html serialization method, but that has nothing to do with
parsing. So, if I want to implement an EXPath module, why should it
support HTML parsing? Just because it's convenient?

--
Max Toro

Adam Retter

Oct 13, 2011, 3:06:56 PM

to exp...@googlegroups.com

I suspect we are getting into the region of RISC vs CISC arguments here.

I can see the reasoning for small clean building blocks of course and
appreciate Dana's point of view, but I also understand that XQuery is
great for high-level 'non-programmers' who want to get things done,
and not have to chain hundreds of functions together for something
that is simple. I don't want XQuery to become the next assembly.

For example, okay, we could externalise the HTML parsing, but then you
might say, well, we can't have a function that just does HTML parsing, we
have to have parser functions and lexer functions, etc. You
get the idea!

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Daniela Florescu

Oct 13, 2011, 3:11:56 PM

to exp...@googlegroups.com

This is where good taste in API design comes into the picture :-)

Dana


Max Toro

Oct 13, 2011, 3:16:55 PM

to exp...@googlegroups.com

The separation of concerns is one argument. The other is that I, as an
EXPath module implementor, have no access to HTML parsing! It's not
part of the processor dependencies, nor of the processor dependencies'
dependencies, etc.
--
Max Toro

Adam Retter

Oct 13, 2011, 4:05:31 PM

to exp...@googlegroups.com

> The separation of concerns is one argument. The other is that I, as an
> EXPath module implementor, have no access to HTML parsing! It's not
> part of the processor dependencies, nor of the processor dependencies'
> dependencies, etc.

There may be a function under consideration within the W3C XQuery WG
for fn:parse-html() as a standard part of XQuery and XPath Functions
and Operators. Does that mean that you would not implement this, even
though it could be part of the standard?

Daniela Florescu

Oct 13, 2011, 4:11:00 PM

to exp...@googlegroups.com

Hmm... Is it really?

I think I missed this as part of the W3C discussions.

That's a bad idea. Every XQuery processor will now be bloated with
another huge dependency.

That's not good.

That's what external modules are for.

That's why Java is a well designed language. Core is simple and has
everything you need to build libraries on top.

XQuery wants to be the kitchen sink...

Best
Dana


Adam Retter

Oct 16, 2011, 3:12:00 PM

to exp...@googlegroups.com

> Hmm... Is it really ?
>
> I think I missed this as part of the W3C discussions.
>
> That's a bad idea. Every XQuery processor will now be bloated with another
> huge dependency.
>
> That's not good.
>
> That's what external modules are for.

If you think that is bad, there may also be a function for invoking an
XSLT from XQuery. So now your XQuery processor maybe HAS to have an
XSLT implementation?

Daniela Florescu

Oct 16, 2011, 3:18:22 PM

to exp...@googlegroups.com

1. I checked with the W3C specs. There is no such function in XQuery.

2. Both XSLT and XQuery are just two different syntaxes for the same
language. The same runtime is enough for both, with two different
external hats.

3. In general, I am not advocating adding anything to the core XQuery.
That's what (EXPath) modules are made for. Users can pick and choose
what is useful, and hence which dependencies they are willing to live with.

Making XQuery the kitchen sink is bad design.

Best
Dana

Adam Retter

Oct 16, 2011, 4:06:07 PM

to exp...@googlegroups.com

On 16 October 2011 19:18, Daniela Florescu <dflo...@mac.com> wrote:
> 1, I checked with the W3C specs. There is no such function in XQuery.

I did not say it was in the spec. My point was that these things are
under discussion, check the mail archive of the WG discussions.

> 2. Both XSLT and XQuery are just two different syntaxes for the same
> language.
> The same runtime is enough for both, with two different external hats.

That seems a bit of a flippant remark! Yes there are a LOT of
similarities, but they are certainly not the same language. If it was
that simple every vendor with an XQuery implementation would also have
an XSLT implementation and vice-versa. They do not because it is not a
trivial undertaking.

Michael Kay

Oct 16, 2011, 5:57:38 PM

to exp...@googlegroups.com

On 13/10/2011 21:11, Daniela Florescu wrote:
> Hmm... Is it really ?
>
> I think I missed this as part of the W3C discussions.
>
> That's a bad idea. Every XQuery processor will now be bloated with
> another huge dependency.
>
> That's not good.
>
> That's what external modules are for.
>
> That's why Java is a well designed language. Core is simple and has
> everything you need to build libraries on top.
>

I would disagree. Algol 60 failed because they thought you didn't need
to standardise the function library. Java succeeded in good measure
because it has a vast function library that is standardised across all
implementations. The XQuery function library in comparison is tiny.

(That's independent of the merits of adding any particular function, of
course. And it doesn't rule out having standard interfaces for functions
that are optional features, like XQJ is optional in Java, or that are
standardised by separate bodies. But in general, I think a rich and
standardised function library aids adoption.)

Michael Kay
Saxonica

Adam Retter

Oct 16, 2011, 6:04:26 PM

to exp...@googlegroups.com

> But in general, I think a rich and
> standardised function library aids adoption.)

Agreed :-)

Daniela Florescu

Oct 16, 2011, 10:16:23 PM

to exp...@googlegroups.com

>>
> I would disagree. Algol 60 failed because they thought you didn't
> need to standardise the function library. Java succeeded in good
> measure because it has a vast function library that is standardised
> across all implementations. The XQuery function library in
> comparison is tiny.

Michael,

you really don't pay attention to what I write, or I am not that
clear. Sorry if it is the latter.

I NEVER said that we shouldn't standardize libraries. I always said that
YES, we should standardize libraries.

But that's different from putting them in the CORE of the programming
language.

Dana

Dmitriy Shabanov

Oct 16, 2011, 11:38:21 PM

to exp...@googlegroups.com

Can you download pure Java? Just the "core"?

--
Dmitriy Shabanov

Matthias Brantner

Oct 17, 2011, 12:09:10 AM

to exp...@googlegroups.com

I think that the W3C has to choose carefully which extensions to the function library
are useful in the core of the language and which are not. Certainly, many useful
functions are still missing and it's definitely useful to add them (e.g. the variety of maths
functions that are going to be in 3.0).

However, I'm not sure whether extensions such as tidy should belong to the core.
I would assume that the behavior of this extension would be mostly implementation
dependent because there is no standard for tidy that the W3C could refer to.
Adding more and more of those implementation-dependent features doesn't increase the
interoperability of implementations.

IMHO, EXPath is the right place to develop such an extension and agree (among implementations)
on its features. I would vote for developing a new EXPath tidy module which is not part of the
http-client module.

Matthias

kit.wallace

Oct 17, 2011, 3:02:36 AM

to EXPath

+1 for separating HTTP from HTML tidy

Function design is of course a Goldilocks problem, but this separation
feels right to me. My work involves as much parsing of the content as
CSV or JSON as it does HTML. Standard transforms to XML for both CSV
and JSON are themselves clearly needed in addition to the HTML tidy,
but they need to take a string input. A wrapper to combine the two
functions is trivial, although I guess there is a slight performance
hit in the separation.

Chris

Markus Pilman

Oct 17, 2011, 4:17:32 AM

to exp...@googlegroups.com

Just my 2 cents to this discussion:

First of all, I am not sure whether EXPath is the right place for an
extended XPath/XQuery/XSLT library. It claims to be a community
project, but in fact it is not. It is simply a place where people can
drop a spec while still keeping control over it. There is nothing like
a community process. So in the end, the http-client specification will
change if and only if Florent thinks it needs to be changed. This is
not an attack against Florent, since he does not have any tools to
know what the community's opinion is (and counting emails is probably
not the way he should do it). Therefore: as long as EXPath does not
have a defined process for how something must be specified, I do not think
that EXPath will be successful in the long term.

Then there is a discussion about whether a standard library should be
minimal or maximal. I don't really have a strong opinion on that.
There are successful examples of both approaches (C for sure is minimal
but still the most used programming language out there, C++ is a bit
less minimal but the std namespace functionality is still quite small
compared with java.*, Java is probably quite maximal and so is
Python).

But to the original question: I really think that concepts need to be
separated, for these reasons:
1. The design is simpler and a user does not have to think too much
about one function call (a function
"change_the_world_and_make_everything_better" is normally not what a
user can easily understand).
2. It is not about whether there is a dependency on another library or not,
but about making stuff as simple as possible. I also would vote for
not letting the http-client parse XML automatically. The reason is
quite simple: there is no way the http-client can know what kind of
data it is receiving. If you are now thinking about the "Content-Type"
header, you should just run a few tests: the Twitter API often sends
JSON and declares the content type "text/html", probably more than
half of the stuff that has the content type "text/xhtml" is not
well-formed XML, etc. The simple fact is: the real world is ugly and
programmers are too lazy to do stuff correctly. So a user has no other
way to determine the content type of a URL or web service than by
looking into the documentation or by trial and error; the http-client
cannot do that for him!
3. Most successful, well designed and widely used libraries separate
concepts. Let's take sorting as an example: in the STL there is no
std::vector::sort because it is pointless; there is a std::sort which
takes two iterators as arguments. In Java this separation is done with
interfaces and abstract classes. Often it is hard to say where to draw
the line (should we first specify a socket module and then
design the http-client on top of that? Do we have to think about
streaming?).

My approach to making everything more user-friendly would be to just
give a function item (or a sequence of function items for multipart
responses) as an optional argument to the http-client:send-request
function, which handles the data. Then a user could just give
fn:parse for XML data and do really fancy stuff (parse CSV, get
metadata from a video and generate XML from it, etc.). But of course
for that solution, XQuery 3.0 is needed. For XQuery 1.0: just
separate the concerns and let the http-client return a string.
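
To make the handler idea concrete without relying on XQuery 3.0, here is a rough Python analogue. It is a sketch only: apply_handlers is a hypothetical helper that models the step after the response has been received, with one optional handler per body part; it is not the send-request signature of any existing implementation.

```python
import json
from typing import Any, Callable, List, Optional, Sequence

# A handler turns the raw bytes of one body part into whatever the caller wants.
Handler = Callable[[bytes], Any]


def apply_handlers(
    parts: Sequence[bytes],
    handlers: Optional[Sequence[Handler]] = None,
) -> List[Any]:
    """Pass each body part through the handler supplied by the caller.

    With no handlers, the parts are returned untouched as raw bytes, which
    corresponds to the plain "separate the concerns" behaviour.
    """
    if handlers is None:
        return list(parts)
    if len(handlers) != len(parts):
        raise ValueError("expected one handler per body part")
    return [handler(part) for handler, part in zip(handlers, parts)]
```

For a two-part response a caller could pass [json.loads, bytes.decode] to get a parsed JSON value and a decoded string. The client never needs to know which formats exist, since the caller brings the parser.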

Best Markus

Dmitriy Shabanov

Oct 17, 2011, 4:50:27 AM

to exp...@googlegroups.com

Yep, it looks simple to create a top-level function to combine http+html or http+json, etc. And those functions should be part of an http+json or http+html package. It is wrong to make them part of the json or html package, if any, IMHO.

My two cents: have pure HTTP functions and mixed ones.

On Mon, Oct 17, 2011 at 12:02 PM, kit.wallace <kit.w...@gmail.com> wrote:

+1 for separating HTTP from HTML tidy

Function design is of course a goldilocks problem but this separation
feels right to me. My work involves as much parsing of the content as
CSV or JSON as it does HTML. Standard transforms to XML for both CSV
and JSON are themselves clearly needed in addition to the HTML tidy
but need to take a string input. A wrapper to combine the two
functions is trivial although I guess there is a slight performance
hit in the separation.

--
Dmitriy Shabanov

Christian Grün

Oct 17, 2011, 5:00:10 AM

to exp...@googlegroups.com

> Just my 2 cents to this discussion:

A fine reply, and I widely agree. Just one issue that kept me thinking:

> let the http-client return a string.

To be consistent, we'd have to return the binary representation here;
otherwise, we'd have to limit ourselves to a specific encoding, such
as UTF-8. And as we all know, the real world is ugly.. ;)

Thanks,
Christian

Markus Pilman

Oct 17, 2011, 5:06:38 AM

to exp...@googlegroups.com

> To be consistent, we'd have to return the binary representation here;
> otherwise, we'd have to limit ourselves to a specific encoding, such
> as UTF-8. And as we all know, the real world is ugly.. ;)

You are right, I thought about that 5 minutes after I sent the mail.
There is also not only the problem with encoding, but also with binary
data (you can not put a jpeg into an xs:string..). So of course EXPath
would also need functionality (in the form of modules) to handle these
binary responses..
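
To make the encoding and binary point concrete, here is a small Python sketch. The helper name `decode_body` and the UTF-8 fallback are assumptions for illustration only; nothing here comes from the http-client spec. It decodes a body to a string only when the Content-Type says it is text, and otherwise hands back the bytes untouched.

```python
from email.message import Message
from typing import Union


def decode_body(body: bytes, content_type: str) -> Union[str, bytes]:
    """Decode text bodies using the declared charset; leave other bodies as bytes."""
    header = Message()
    header["Content-Type"] = content_type
    if header.get_content_maintype() != "text":
        return body  # e.g. image/jpeg: there is no meaningful string form
    # Assumption: fall back to UTF-8 when the server declares no charset.
    charset = header.get_content_charset() or "utf-8"
    return body.decode(charset)


latin1_bytes = "Grün".encode("iso-8859-1")
text = decode_body(latin1_bytes, "text/html; charset=ISO-8859-1")
image = decode_body(b"\xff\xd8\xff", "image/jpeg")

# Forcing a single encoding on every response breaks on real-world data:
try:
    latin1_bytes.decode("utf-8")
    forced_utf8_ok = True
except UnicodeDecodeError:
    forced_utf8_ok = False
```

The last few lines show why a client that always returns a UTF-8 string cannot be lossless: the same bytes that decode fine as ISO-8859-1 are rejected as UTF-8.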

Markus

Adam Retter

Oct 17, 2011, 8:41:25 AM

to exp...@googlegroups.com

> A wrapper to combine the two
> functions is trivial although I guess there is a slight performance
> hit in the separation.

Not necessarily!
For example, if it's a base64binary data type that it returns, then in
eXist-db you are just passing around a stream until some action is
taken on the stream; this is very efficient.
If it's an xs:string, then a vendor could actually create a streaming
implementation of this in the same way that I have with base64binary;
in fact it's something I have been considering for some time for
eXist-db.
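
The "passing around a stream until some action is taken" idea can be sketched generically in Python. This is not eXist-db's implementation; `body_stream` and `count_lines` are hypothetical names, and an in-memory buffer stands in for the network connection.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def body_stream(source: BinaryIO, chunk_size: int = 4) -> Iterator[bytes]:
    """Generator: no bytes are read from the source until iteration starts."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def count_lines(chunks: Iterable[bytes]) -> int:
    """A consumer (stand-in for a parser) that pulls the chunks one by one."""
    return sum(chunk.count(b"\n") for chunk in chunks)


source = io.BytesIO(b"a\nb\nc\n")      # stand-in for the response body
stream = body_stream(source)           # the "client" returns immediately
position_before = source.tell()        # nothing has been consumed yet
lines = count_lines(stream)            # the "parser" drives the reading
position_after = source.tell()         # now the whole body has been read
```

Because the client only hands over a lazy handle, wrapping it in a separate parsing function adds no extra copy of the body.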

--

Adam Retter

Oct 17, 2011, 8:46:03 AM

to exp...@googlegroups.com

> First of all I am not sure, whether expath is the right place for an
> extended XPath/XQuery/XSLT library. It claims to be a community
> project, but in fact it is not. It is simply a place where people can
> drop a spec while still keeping control over it. There is nothing like
> a community process. So in the end, the http-client specification will
> change, if and only if Florent thinks it needs to be changed. This is
> not an attack against Florent, since he does not have any tools to
> know what the community's opinion is (and counting email is probably
> not the way he should do it).

I think Florent is more of an arbitrator than an overlord ;-)

> Therefore: as long as expath does not
> have a defined process for how something must be specified, I do not think
> that expath will be successful in the long term.

Okay, but perhaps you forget that EXPath is formed by the spare-time
contributions of individuals; as far as I am aware, no one has been
paid to work on EXPath. Also, there is no entrance fee to participate in
EXPath, whereas the entrance fee for W3C is substantial and rules out
many.
Perhaps if there was some function for EXPath (FLWOR foundation and
others?) then more time and resources could be spent on
administration?

> My approach to make everything more user-friendly would be, to just
> give a function item (or a sequence of function items for multi part
> responses) as an optional argument to the http-client:send-request
> function which handles the data. Then a user could just give
> fn:parse for xml data and do really fancy stuff (parse CSV, get
> metadata from a video and generate an XML from it etc). But of course
> for that solution, XQuery 3.0 is needed - for XQuery 1.0: just
> separate the concerns and let the http-client return a string.

I do really like this idea of passing in a parsing function :-)

Adam Retter

Oct 17, 2011, 8:47:44 AM

to exp...@googlegroups.com

> Perhaps if there was some function for EXPath (FLWOR foundation and
> others?) then more time and resources could be spent on
> administration?

Sorry, a typo slipped through, that should read -

Perhaps if there was some funding for EXPath (FLWOR foundation and
others?) then more time and resources could be spent on
administration?

- There is obviously a function for EXPath, otherwise we would not be
discussing it!

Markus Pilman

Oct 17, 2011, 9:58:36 AM

to exp...@googlegroups.com

Yes, money is always a problem. I don't know which
companies/organisations would be interested in funding a project like
EXPath (Oracle, IBM and MarkLogic probably not (yet); maybe the FLWOR
Foundation, 28msec, BaseX. I am just guessing, but these
organizations probably won't be able to bring in much money). But even
without money, it would be possible to define a community process:

- define a root set of members (I call people who are allowed to vote
on specification changes and the introduction of new specifications
members). This is probably the most difficult part; maybe everybody
who contributed a spec + one from each team behind an XQuery/XPath/XSLT
implementation (one from the BaseX team, one from the eXist team, one
from the Zorba team, etc.)
- all discussion is still done publicly over the mailing list, so
everybody can join the discussion
- at some point the members will vote (for example by using
http://www.doodle.com)
- every member can propose someone to be a new member; if two thirds
of the members vote for it, the proposed person also becomes a member
- for each spec there will be someone coordinating the discussions and
the votes (preferably the one who initially wrote the first version
of the spec)

This is just a brainstorming idea and for sure not perfect. But with
something like that, we would have a real community process (instead
of just discussions and no one knows whether there will be a result
after it).

If others also would like to see something like that, we should
probably start a new thread. @Florent: what do you think?

Best Markus

Daniela Florescu

Oct 17, 2011, 11:10:41 AM

to exp...@googlegroups.com

Just as a point of reference in the discussion, maybe it is
interesting to look at the way we did it in Zorba.

1. Core Zorba has a simple HTTP package (proprietary -- because we
couldn't convince the rest of the people to take tidy out):
http://www.zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_http-client.html
(gets string and binary)

3. Then we implemented the current EXPath proposal by combining in
XQuery the two above:
http://www.zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/expath.org_ns_http-client.html
real code is here http://www.zorba-xquery.com/site2/doc/latest/zorba/xqdoc/xhtml/modules/expath.org_ns_http-client.html

I think all three steps should be "standardized" by EXPath, not only
step 3.

Best
Dana

Daniela Florescu

Oct 17, 2011, 11:41:10 AM

to exp...@googlegroups.com

>>
>> Perhaps if there was some funding for EXPath (FLWOR foundation and
>> others?) then more time and resources could be spent on
>> administration?

I already offered that to Florent. He is supposed to get back to me.

But yes, we can afford some $$ for some administration, and/or hire
some technical people
to take care of the specs, because:
1. specs written by developers are not of the best quality in general
2. money isn't as much the issue as the time...I guess we are all
overloaded ....

Tests would also be helpful, too...

I strongly believe that XQuery adoption will only go as fast as we
manage to "standardize" libraries.

Best
Dana

Jonathan Robie

Oct 17, 2011, 10:04:34 AM

to exp...@googlegroups.com

On Mon, Oct 17, 2011 at 3:02 AM, kit.wallace <kit.w...@gmail.com> wrote:

+1 for separating HTTP from HTML tidy

And another +1 for separating HTTP from HTML Tidy.

Jonathan

Adam Retter

Oct 17, 2011, 1:32:48 PM

to exp...@googlegroups.com

Yes, I agree 'it's possible' without money, but who will do the work?
Who will write this all up and publish it to the EXPath website, and who
will ensure the process is adhered to?

If I only have a few hours here and there to spend on EXPath,
personally I will be reading the mailing list, contributing specs and
comments on specs. So it won't be 'me' in the short term; is it 'you'?

Adam Retter

Oct 17, 2011, 1:35:00 PM

to exp...@googlegroups.com

>>> Perhaps if there was some funding for EXPath (FLWOR foundation and
>>> others?) then more time and resources could be spent on
>>> administration?
>
> I already offered that to Florent. He is supposed to get back to me.
>
> But yes, we can afford some $$ for some administration, and/or hire some
> technical people
> to take care of the specs, because:
> 1. specs written by developers are not of the best quality in general
> 2. money isn't as much the issue as the time...I guess we are all overloaded
> ....

As we are all overloaded, if there is money to pay for it, could we
not get someone who isn't 'us' to do this? They need to have a
background in XQuery and community working. It would also be good if
that someone was not attached to any particular vendor or
implementation, so that they could remain neutral as is the spirit of
EXPath.

> Tests would also be helpful, too...

Completely agree.

> I strongly believe that XQuery adoption will only go as fast as we manage to
> "standardize" libraries.
>
> Best
> Dana
>

Daniela Florescu

Oct 17, 2011, 1:42:19 PM

to exp...@googlegroups.com

Adam,

I'll tell you what I've been playing with recently. Maybe it's a
solution to the (a) lots of work + (b) not enough money problem.

IT crowdsourcing. Maybe we should be the first community that takes this
seriously and makes it scale.

I heard that it works in general, but there are reasons why this might
work even BETTER in the XML community: because there are a lot of skilled
people distributed all over the world and unemployed (each of us lost
his/her job at least several times in the last 15 years because of the
XMLophobia in industry... so you know what I mean...)

http://www.innocentive.com/

I don't really believe in centralized decisions ( -- not in this case
at least :-).

Just thinking out loud,
Dana

Dmitriy Shabanov

Oct 18, 2011, 12:56:10 AM

to exp...@googlegroups.com

Here is another problem too: (c) no rules (transparency) for the path from proposal to part of the specification.

(a) is a problem if only one person works on (writes) the specification; that should be extended to 2 or 3, IMHO. Plus better control over issues.

If there are any problems with hosting or the web interface ... the community will be able to solve them without any money.

On Mon, Oct 17, 2011 at 10:42 PM, Daniela Florescu <dflo...@mac.com> wrote:

I'll tell you what I've been playing with recently. Maybe it's a solution to the
(a) lots of work + (b) not enough money problem.

IT crowdsourcing. Maybe we should be the first community that takes this
seriously and makes it scale.

I heard that it works in general, but there are reasons why this might
work even BETTER in the XML community: because there are a lot of skilled
people distributed all over the world and unemployed (each of us lost his/her job at least
several times in the last 15 years because of the XMLophobia in industry... so you know
what I mean...)

http://www.innocentive.com/

I don't really believe in centralized decisions ( -- not in this case at least :-).

centralized decisions .... does it exist? =)

--
Dmitriy Shabanov

Daniela Florescu

Oct 18, 2011, 1:00:56 AM

to exp...@googlegroups.com

>
> centralized decisions .... does it exist? =)

Yeah.. at least W3C's trying to !? :-))

I got asked to leave the W3C meeting -- that I pay legal fees to !!
-- , remember !?

Asked to leave because "we'll not gone accept everybody's uncle" by
Sharon Adler:-))))

Because I didn't agree with XSLT maps ...

Hi. Hi.

Centralized decisions....:-)))

I don't believe in that.

Dana
