Re: [expath] MongoDB Module: Working Draft


Hans-Juergen Rennau

Mar 8, 2015, 6:46:19 PM
to christi...@gmail.com, public...@w3.org, exp...@googlegroups.com, dan...@exist-db.org
Hi Christian,

I would like to make a few suggestions concerning the new MongoDB module: one major, several minor.

index management
===============
First of all, I feel we need two functions for creating and dropping indexes, something similar to this:

mongodb:createIndex($client-id as xs:string, $database as xs:string, $collection as xs:string,
$indexDescriptor as map(*), $options as map(*)) as map(*)

mongodb:dropIndex($client-id as xs:string, $database as xs:string, $collection as xs:string,
$indexDescriptor as map(*)) as map(*)

[The returned map is meant to report the success.]

Of course, an index info function would also be useful, like this:
mongodb:getIndexes($client-id as xs:string, $database as xs:string, $collection as xs:string) as map(*)
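
To make the two map arguments more concrete, here is a minimal sketch. It assumes the maps would mirror what the mongo shell accepts for index creation (a key specification plus an options document); the field name "status" and the index name are purely illustrative, and the call shown in the comment is the proposal above, not an existing function.

```xquery
xquery version "3.1";

(: Assumed shape of $indexDescriptor: field name mapped to sort direction
   (1 = ascending, -1 = descending), as in the mongo shell. :)
let $indexDescriptor := map { "status": 1 }

(: Assumed shape of $options: standard MongoDB index options. :)
let $options := map { "unique": false(), "name": "status_1" }

(: With the proposed function, the call might then read:
   mongodb:createIndex($client-id, $database, $collection, $indexDescriptor, $options) :)
return [ $indexDescriptor, $options ]
```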

mongodb:update / return value
========================
When an 'update' call uses the "upsert" option, it may create a new document whose _id should be returned. Therefore I suggest that the return value of mongodb:update should not be empty-sequence(), but either xs:string? or, better, a map containing the values supplied by the 'WriteResult' object when using the mongo shell, for instance:
WriteResult({ "nMatched" : 25, "nUpserted" : 0, "nModified" : 24 })
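
As a rough illustration of the map variant, the sketch below hard-codes a result map with the same counters as the WriteResult above and shows how a caller might inspect it. The map layout is an assumption, not something the draft defines.

```xquery
xquery version "3.1";

(: Hypothetical result map, mirroring the WriteResult fields quoted above. :)
declare variable $result as map(*) := map {
  "nMatched": 25,
  "nUpserted": 0,
  "nModified": 24
};

(: A caller can read individual counters, or ignore the map entirely. :)
if ($result?nUpserted gt 0)
then "a new document was created"
else "matched " || $result?nMatched || ", modified " || $result?nModified
```

With the values shown, this evaluates to the string "matched 25, modified 24".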

mongodb:find / options
==================
The spec says about "fields":
   "Restricts the returned fields. The file _id will always be returned."

However, supposing that the XQuery function is equivalent to db.collection.find(), the _id field can be suppressed by the "fields" entry "_id: 0", which can be combined with inclusive and exclusive field selection alike. I suppose this issue concerns only the spec text, not the intended functionality?
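
For illustration, an options map that suppresses _id while selecting two fields might look like the sketch below. Only the "fields" entry comes from the spec text; the field names "title" and "year" are made up, and the exact layout of the options map is an assumption.

```xquery
xquery version "3.1";

(: Assumed options map for mongodb:find; "title" and "year" are illustrative. :)
let $options := map {
  "fields": map { "_id": 0, "title": 1, "year": 1 }
}
(: Shell equivalent: db.collection.find({}, { _id: 0, title: 1, year: 1 }) :)
return $options
```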

~ ~ ~

Finally an editorial suggestion. Later, the spec might perhaps for each mongodb function specify the equivalence of functionality provided by the function on one hand, and a well-defined MongoDB operation on the other hand: for example the equivalence between mongodb:find and db.collection.find, including the alignment of function parameters and operation parameters. Having established such an equivalence, the normative semantics could be reduced to "providing the result/effect one would have obtained/produced by using the equivalent MongoDB operation"; of course the spec should continue to provide a brief description, but it would be non-normative.

Cheers,
Hans-Jürgen


Dannes Wessels

Mar 9, 2015, 4:19:36 AM
to Hans-Juergen Rennau, christi...@gmail.com, public...@w3.org, exp...@googlegroups.com
Hi,

On Sun, Mar 8, 2015 at 11:46 PM, Hans-Juergen Rennau <hre...@yahoo.de> wrote:

> index management
> ===============
> First of all, I feel we need two functions for creating and dropping indexes, something similar to this:

I intentionally left these kinds of functions out of 'Mongrel' because I think the driver/spec should focus on operational features only. Configuring the database (creation of indices) is not part of that, in my view. IMO 'native' tooling can and should be used for these things.

regards

Dannes


--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624

Hans-Juergen Rennau

Mar 9, 2015, 5:08:59 PM
to Dannes Wessels, christi...@gmail.com, public...@w3.org, Dannes Wessels, exp...@googlegroups.com
Dannes,

the question is the scope of XQuery applications which we want to enable.

Scope 1 is read and write access to existing MongoDB collections.

Scope 2 is the *active* use of MongoDB technology as an extension of XQuery technology - for instance in order to overcome scaling problems. The scope includes pure XQuery applications which create, use and delete MongoDB collections at their own discretion, putting MongoDB at XQuery service.

The creation of collections is already enabled by mongodb:insert and mongodb:update. But collections without indexes are not fit for production use. So renouncing a "mongodb:createIndex" function would render XQuery applications creating collections downright incomplete. Such applications would have to execute some "phase 1", in which documents are inserted but indexes do not yet exist; then stop or pause, only to be resumed after the completion of some "out of scope" actions. This significantly reduces the value of the MongoDB module. The limitation can be removed by the addition of a couple of XQuery functions which simply pass their arguments through to the MongoDB database. These functions enable XQuery applications to achieve truly functional collections, by simply adding a few lines of code.

A different way of putting it: support for the creation of indexes *is* an operational feature, it is not "administrative" work to be distinguished from "operational" work. It is a key aspect of the operation "create collection", and this operation is in turn an important feature of any usage of MongoDB. I am worried about the severe operational limitation which the current version of the spec implies. 

Hans-Jürgen





--
You received this message because you are subscribed to the Google Groups "EXPath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to expath+un...@googlegroups.com.
To post to this group, send email to exp...@googlegroups.com.
Visit this group at http://groups.google.com/group/expath.
For more options, visit https://groups.google.com/d/optout.


Christian Grün

Mar 9, 2015, 6:15:53 PM
to Hans-Juergen Rennau, public...@w3.org, exp...@googlegroups.com, dan...@exist-db.org
Hi Hans-Jürgen,

Thanks for your initial feedback (and thanks to others writing me in
private. If possible, please always write to the list!).

> index management

In my opinion, the addition of these functions would make sense and
should be pretty straightforward (but I can also relate to Dannes'
perspective on this). By the way, right now you can also use
mongodb:command for handling indexes [1].

> mongodb:update / return value

I am not quite sure on this one: If someone is not interested in the
"write result" of an update operation, this result would need to be
explicitly ignored in some way in the XQuery expression. What do you
think would be the most elegant way to write an updating MongoDB query
without returning any result as query output?

> mongodb:find / options
> However, supposing that the XQuery function is equivalent to
> db.collection.find(), the _id field can be suppressed by the "fields" entry
> "_id: 0"

Thanks for the hint! This comment should simply be dropped.

> Later, the spec might perhaps for each mongodb function specify
> the equivalence of functionality provided by the function on one
> hand, and a well-defined MongoDB operation on the other hand

This would be nice indeed. However, my personal experience so far is
that the existing MongoDB API is not as consistent and stable as we
would wish, at least not in its current state, so I assume that
if we aligned our module more closely to the current API, we would
arguably need to update this module more frequently than we would
like to.

Surprisingly, I noticed that even the official MongoDB Java driver
(which we and eXist use in our implementations) differs from the shell
API in various aspects, often more than our own driver does, and I'm
not even fully sure if we could map all features of the MongoDB shell
API with this driver.

More feedback is welcome,
Christian

[1] http://docs.mongodb.org/manual/reference/command/createIndexes/#dbcmd.createIndexes
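
As a rough sketch of what such a command document could look like when expressed as an XQuery map, following the createIndexes reference [1]: the collection, field and index names are illustrative, and how mongodb:command expects its argument (map, JSON string, ...) is an assumption that the draft would have to settle. Note also that XQuery maps are unordered, so the module would have to make sure the command name is sent as the first key.

```xquery
xquery version "3.1";

(: Command document following the createIndexes reference [1].
   Collection name, field name and index name are illustrative. :)
let $command := map {
  "createIndexes": "orders",
  "indexes": [
    map { "key": map { "status": 1 }, "name": "status_1" }
  ]
}
return $command
```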

Hans-Juergen Rennau

Mar 11, 2015, 7:23:29 AM
to exp...@googlegroups.com, christi...@gmail.com, public...@w3.org, dan...@exist-db.org
Hi Christian,

very glad that index management is at any rate enabled, one way or another! I would still vote for explicit functions, but this is not a critical issue.

Something you wrote puzzles me:
> I am not quite sure on this one: If someone is not interested in the
> "write result" of an update operation, this result would need to be
> explicitly ignored in some way in the XQuery expression. What do you
> think would be the most elegant way to write an updating MongoDB query
> without returning any result as query output?

Could you give an example where a non-empty value of the function call might disturb the client? If he is not interested, he either does not assign the value to a variable, or, if the call is within a let or for clause, simply does not use the bound variable. I just have not yet understood the issue.

And I have a very general question concerning the order of evaluations in a "for" expression. In principle, I thought, the processor is free to choose any evaluation order as long as the dependencies implied by "return value of expr#1 = input value of expr#2" are respected (i.e., expr#1 is guaranteed to be resolved before expr#2). Therefore I wonder about FLWOR clauses using expressions with a value which is statically known to be an empty sequence - can I still rely on any order? This may of course be crucial in case of side-effects, as there are with mongo (or SQL) updating expressions.

Cheers,
Hans-Jürgen




Dannes Wessels

Mar 11, 2015, 5:05:54 PM
to Christian Grün, Hans-Juergen Rennau, public...@w3.org, exp...@googlegroups.com
Hi,

> On 9 Mar 2015, at 23:15 , Christian Grün <christi...@gmail.com> wrote:
>
>> index management
>
> In my opinion, the addition of these functions would make sense and
> should be pretty straightforward (but I can also relate to Dannes'
> perspective on this). By the way, right now you can also use
> mongodb:command for handling indexes [1].


Well, I am sensitive to the arguments as stated. :-) I'll give it another thought.

regards

Dannes

Christian Grün

Mar 15, 2015, 8:04:23 AM
to Hans-Juergen Rennau, exp...@googlegroups.com, public...@w3.org, dan...@exist-db.org
Hi Hans-Jürgen,

Finally some feedback:

> Could you give an example where a non-empty value of the function call might
> disturb the client? If he is not interested, he either does not assign the
> value to a variable, or, if the call is within a let or for clause, simply
> does not use the bound variable. I just have not yet understood the issue.

One general problem with all non-deterministic and side-effecting
expressions in XQuery is that their behavior is not formalized in the
specification of XQuery (and I doubt that anyone will tackle this in
the foreseeable future).

Let's take an example. In the following query...

let $a := local:a()
let $b := local:b()
return ()

...a query processor may decide to first evaluate local:b() and then
local:a() without violating the rules of the language. It may also
skip the evaluation of all expressions, because the result will be
an empty sequence anyway.

In the MongoDB spec, we added the simple sentence, saying that "A
query processor must ensure that non-deterministic functions are not
relocated or rewritten in the query, and that its results are not
cached at runtime.". This rule is by no means complete, but it is
supposed to indicate that the order in which a user has written down
function calls in the original query should be preserved. Currently,
it does not say whether an expression must be evaluated, i.e. that it
must not be optimized away. That could be added,
but in the end I believe that a general EXPath document may be a
better place for that.

Back to the original problem statement: We observed that a client may
not be interested in any result output if a query is only updating. If
a user inserts some data and runs a MongoDB query, the expression
could look as follows:

let $id := mongo:connect(...)
return (mongo:insert(...), mongo:find(...))

We could "swallow" the result of an updating function by e.g. adding a
false() predicate to the insert expression...

mongo:insert(...)[false()]

…but we then need to be sure that it is not optimized away by the
query processor. The same applies if we bind the result of the
updating function to a dummy variable:

let $id := mongo:connect(...)
let $_ := mongo:insert(...)
return mongo:find(...)

I need to add, though, that it could even apply in my first example (a
query processor could check if the result of a function call will be
an empty sequence, and skip the call), but it may not be as obvious.

In a nutshell: The general challenge here is much broader. My
practical approach would be to define functions in a way that results
will only be returned if functions are non-updating. This is the way
it has been done in the other EXPath modules so far.

Feedback is welcome as usual,
Christian

Hans-Juergen Rennau

Mar 15, 2015, 12:51:22 PM
to exp...@googlegroups.com, christi...@gmail.com, public...@w3.org, dan...@exist-db.org
This is indeed a very subtle and, I think, very important problem. Important because the original view that XQuery is not concerned with side effects would block really important developments of XQuery. (For example, it would be absurd to say that XQuery does not need to write files - it is a crucial feature from an operational point of view.) And I feel that sooner or later this must be tackled and the freedom of optimization must be restricted in a well-defined way in response to well-defined function properties.

For the time being, a very practical approach might deal with bogus tokens which may enforce the sequence, like this:

let $fooToken := local:foo(...)
let $barToken := local:bar($fooToken, ...)
return
   mongodb:find($barToken, ...)

Therefore, in general, the most problematic case is a function returning the empty sequence, because this removes any chance to enforce anything.
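
A self-contained version of the token idea could be sketched as follows. The bodies of local:foo and local:bar are stand-ins for side-effecting calls (here they only build strings), so the sketch shows the data dependency, not real side effects; whether a given processor honours this ordering for non-deterministic functions is still an assumption.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins for side-effecting steps; each returns a token. :)
declare function local:foo() as xs:string {
  "foo-done"
};

(: Takes the previous token as input, so it cannot be evaluated before it. :)
declare function local:bar($token as xs:string) as xs:string {
  $token || ",bar-done"
};

let $fooToken := local:foo()
let $barToken := local:bar($fooToken)
return $barToken
```

The query returns "foo-done,bar-done": the second call consumes the value of the first, which is the only ordering guarantee plain XQuery gives.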

Cheers,
Hans-Jürgen




Christian Grün

Mar 16, 2015, 9:10:02 AM
to Liam R. E. Quin, Hans-Juergen Rennau, exp...@googlegroups.com, public...@w3.org, dan...@exist-db.org
> The only satisfactory answer I've seen is that XQuery was designed to
> be embedded in other languages, rather like SQL, and not as a complete
> system. When you try and introduce side-effects you have problems.

My perspective on this is: The existence of the EXPath modules and
many other side-effecting modules in XQuery processors (eXist, Saxon,
Zorba, MarkLogic, our own, possibly others) indicates that there *is*
a need to do more with the language than just using it as a focused
query language. XProc seems like a useful extension to me, but I
couldn't imagine writing full applications with it. And from the
implementation perspective, my experience is that it is fine to have
side-effecting functions as long as they are consistently dealt with.

However, it would be quite some effort at this stage to define rules
that all implementors will agree with. But the major issue is that we
will hardly find anyone who would be willing to take care of this. But
I may be wrong… Any volunteers out there?

> At any rate I don't see the XQuery Working Group making much more
> progress in the area of managing side-effects.

I agree.

Christian