Semantics of the "JOIN" comma in a FROM list

22 views
Skip to first unread message

Roland Bouman

unread,
Aug 10, 2011, 12:14:57 PM8/10/11
to UnQL
Hi!

in the syntax notes, it reads:

"The data-source field specifies one or more collections. A join is
performed if two or more collections are specified."

Since there doesn't seem to be any explicit JOIN...ON syntax, I assume
that a list of multiple data-sources in the FROM list results in a
(conceptual, not per-se materialized) cartesian product.

I wonder how I should picture that. Maybe I'm wrong to compare with
SQL in this respect, but since the syntax is so alike it seems natural
to do so.

In SQL, the elements in a FROM list are tables (be it base tables,
views, sub-selects and whatnot), and the result of a JOIN or ,
operation is itself also a table (regardless if it is materialized -
it is at least a tabular structre). In the SQL case the resulting
structure of

T1 (t1c1, t1c2), T2 (t2c1, t2c2)

will be:

T3 (t1c1, t1c2, t2c1, t2c2)

Now, in UNQL we don't have tables but collections of documents. I am
now wondering about two things:

1) will the "join" or , operation on two collections of documents
yield one collection of documents (analogous to the SQL case)?
2) If so, then what are the semantics of the operation that merges the
documents from the two collections into the documents of the resultant
collection? (I can think of several ways that this could be done, but
I'm curious as to what the designers have in mind exactly)

Of course, if I completely misunderstood the effect of multiple data
sources in the FROM list, then I'd love to understand how it is
supposed to work.

Thanks in advance and kind regards,

Roland.

Mike Malone

unread,
Aug 10, 2011, 1:26:35 PM8/10/11
to un...@googlegroups.com
Wouldn't it be:

  collection1: {doc1: {c1prop1: val1, c2prop2: val2}}, collection2: {doc1: {c2prop1: val1, c2prop2: val2}, doc2: {c2prop1: val3, c2prop2: val4}}

to:

  {collection1: {doc1: {c1prop1: val1, c2prop2: val2}}, collection2: {doc1: {c2prop1: val1, c2prop2: val2}}}
  {collection1: {doc1: {c1prop1: val1, c2prop2: val2}}, collection2: {doc2: {c2prop1: val3, c2prop2: val4}}}

In other words, it's just a pairwise production of each document from collection1 paired with each document from collection2. In fact, a select from a single collection is just a special case of this same operation.

Mike

Roland Bouman

unread,
Aug 10, 2011, 5:41:53 PM8/10/11
to UnQL
Well, it could be, but it could equally well be an array.
Going by the current spec, there's just no telling - the syntax alone
doesn't explain.

Mike Malone

unread,
Aug 10, 2011, 6:46:23 PM8/10/11
to un...@googlegroups.com
On Wed, Aug 10, 2011 at 2:41 PM, Roland Bouman <roland...@gmail.com> wrote:
Well, it could be, but it could equally well be an array.
Going by the current spec, there's just no telling - the syntax alone
doesn't explain.

I agree that it needs some clarification, but if it was an array then it would make the sub-query syntax a little bit weird wouldn't it? The original query would use the collection name to dereference items in the result document while outer queries would need to use the array index. Turning them into documents makes these two queries identical:

  SELECT [collection1.c1prop1, collection2.c2prop1] FROM collection1, collection2;
  SELECT [collection1.c1prop1, collection2.c2prop1] FROM (SELECT FROM collection1, collection2);

Without an "as" clause in the second form though it's not obvious if / how to reference items in the subquery. My best guess is that the top-level objects from the inner query become available in the outer query..? But then what happens if you do:

 SELECT FROM (SELECT FROM collection1, collection2), collection1;

or:
 
 SELECT FROM (SELECT {collection1.c1prop1: collection2.c1prop2} FROM collection1, collection2);

Definitely needs some more thought. I think the subquery might need a name.

Mike

Hans Marggraff

unread,
Aug 29, 2011, 10:11:49 AM8/29/11
to UnQL
Why on earth is there no explicit join clause?

On 11 Aug., 00:46, Mike Malone <m...@simplegeo.com> wrote:

Mike Malone

unread,
Aug 29, 2011, 12:52:42 PM8/29/11
to un...@googlegroups.com
On Mon, Aug 29, 2011 at 7:11 AM, Hans Marggraff <hans.ma...@googlemail.com> wrote:
Why on earth is there no explicit join clause?

Is there really a great reason for one? A reasonable optimizer can work things out on its own. So why bother with the additional syntax? The only argument I've heard for one is that a full cartesian join is almost never what you want, and an explicit join clause makes it less likely that you'll get one accidentally. But it also clutters things up. And it's sort of akin to optimizer hinting, which is definitely a double-edged sword.

Roland Bouman

unread,
Aug 29, 2011, 7:37:37 PM8/29/11
to UnQL
Hi All, Hans, Mike,

On Aug 29, 6:52 pm, Mike Malone <m...@simplegeo.com> wrote:
> Is there really a great reason for one? A reasonable optimizer can work
> things out on its own. So why bother with the additional syntax? The only
> argument I've heard for one is that a full cartesian join is almost never
> what you want, and an explicit join clause makes it less likely that you'll
> get one accidentally. But it also clutters things up. And it's sort of akin
> to optimizer hinting, which is definitely a double-edged sword.

uhm, no.

The argument I hear often, and which also appeals to me, is that
explicit JOIN syntax is a more explicit clarification of the intent of
the programmer.
By writing only those conditions in the ON condition that express the
relationship between the two join operands, the query becomes a lot
clearer.

Any conditions that just restrict the sets to rows of arbitrary
interest should go into the WHERE.

There is more to it than just convention though.
If you allow for OUTER joins, then the semantics of conditions in the
ON clause are different from the conditions in the WHERE clause, in
the sense that those in the ON clause take higher precedence than
those in the WHERE clause.

This leads us to a wholly different topic: should UnQL support OUTER
joins? Even in SQL dialects that do not support an explicit JOIN
clause (like ancient Oracle SQL syntax) there is some element to
indicate an outer join.
AFAICT, No such element is present in UnQL, so my assumption at this
point is that UnQL currently does not support it.

kind regards,

Roland.

Dustin

unread,
Aug 30, 2011, 2:50:08 AM8/30/11
to un...@googlegroups.com

On Monday, August 29, 2011 4:37:37 PM UTC-7, Roland Bouman wrote:
 
The argument I hear often, and which also appeals to me, is that
explicit JOIN syntax is a more explicit clarification of the intent of
the programmer.
By writing only those conditions in the ON condition that express the
relationship between the two join operands, the query becomes a lot
clearer.

  That's exactly where the JOIN operator breaks down for me.  It gets non-obvious when you're doing a five table join centered around 
 
Any conditions that just restrict the sets to rows of arbitrary
interest should go into the WHERE.

  IMO, that sounds like a good reason to never do any filtering in the FROM clause.

    FROM == find sources of data            (make result longer)
    WHERE == trim data down                 (make result shorter)
    SELECT == transform and emit data   (make results narrower)

Hans Marggraff

unread,
Aug 31, 2011, 5:12:42 AM8/31/11
to UnQL
I find explicit joins much easier to read. Without them I am forced to
search for the join condition among the filter conditions.
An explicit join is also more inline with the pipeline interpretation
of query statements:

from
collection_x
-- return an iterator over all elements in x
join collection_y on collection_x.id =
collection_y.id -- return an iterator of x and y
combined (widens result)
where collection_y.date > yesterday and many_more
-- shortens result
select
whatever
-- narrows

Dustin

unread,
Sep 2, 2011, 1:01:11 AM9/2/11
to un...@googlegroups.com

On Wednesday, August 31, 2011 2:12:42 AM UTC-7, Hans Marggraff wrote:
I find explicit joins much easier to read. Without them I am forced to
search for the join condition among the filter conditions.
An explicit join is also more inline with the pipeline interpretation
of query statements:

  I disagree that the inline join is aligned with the pipeline interpretation unless you want multiple where clauses as well, because that's pretty well what ON is, isn't it?  You're effectively saying combine both of these things.  Then you're effectively filtering *both* of them by a particular condition.  I think it's easy to look at this either way.  It would, however, be difficult to *disallow* filtering joins in the WHERE clause, so making more than one way to do it seems a bit undesirable.

Mike Malone

unread,
Sep 2, 2011, 12:44:46 PM9/2/11
to un...@googlegroups.com
On Mon, Aug 29, 2011 at 4:37 PM, Roland Bouman <roland...@gmail.com> wrote:
Hi All, Hans, Mike,

On Aug 29, 6:52 pm, Mike Malone <m...@simplegeo.com> wrote:
> Is there really a great reason for one? A reasonable optimizer can work
> things out on its own. So why bother with the additional syntax? The only
> argument I've heard for one is that a full cartesian join is almost never
> what you want, and an explicit join clause makes it less likely that you'll
> get one accidentally. But it also clutters things up. And it's sort of akin
> to optimizer hinting, which is definitely a double-edged sword.

uhm, no.

The argument I hear often, and which also appeals to me, is that
explicit JOIN syntax is a more explicit clarification of the intent of
the programmer.
By writing only those conditions in the ON condition that express the
relationship between the two join operands, the query becomes a lot
clearer.

There are other ways to be explicit:

  SELECT {c.name: c.age} FROM Parent AS p, p.children AS c WHERE p.name = "Bob";

Mike

Roland Bouman

unread,
Sep 2, 2011, 7:29:32 PM9/2/11
to UnQL
Dustin,

On Sep 2, 7:01 am, Dustin <dsalli...@gmail.com> wrote:
>   I disagree that the inline join is aligned with the pipeline
> interpretation unless you want multiple where clauses as well, because
> that's pretty well what ON is, isn't it?  You're effectively saying combine
> both of these things.  Then you're effectively filtering *both* of them by a
> particular condition.  I think it's easy to look at this either way.  It
> would, however, be difficult to *disallow* filtering joins in the WHERE
> clause, so making more than one way to do it seems a bit undesirable.

If all joins are "inner" joins, then you're indeed looking at a
combination of all collections with each other, and in that case I
agree that it only boils down to preference where you write the
conditions.

But if it should be possible to have outer joins like in SQL, then
suddenly the order of the collections does matter, and the semantics
of conditions appearing in the ON clause are slightly different then
those appearing in the WHERE clause.

I mean this, suppose we have
collection A: [{a: 1}, {a: 2}] and collection B [{b: 1}]

then:

SELECT {a: A.a, b: B.b}
FROM A, B
WHERE A.a = B.b

would give only one result {a: 1, b: 1}, because only {a: 1} matches
{b: 1} and a syntax using JOIN..ON like this

SELECT {a: A.a, b: B.b}
FROM A JOIN B ON A.a = B.b

would yield exactly the same result. But if it would be an outer join
the reslt would be different:

SELECT {a: A.a, b: B.b}
FROM A LEFT JOIN B ON A.a = B.b

would give two result documents, [{a:1, b:1}, {a: 2, b: null}]

And if we would add a WHERE like this:

SELECT {a: A.a, b: B.b}
FROM A LEFT JOIN B ON A.a = B.b
WHERE B.b === null

then the result would be {a:2, b:null},

This outcome is very different from the case where both conditions
would appear in either the ON or the WHERE:

SELECT {a: A.a, b: B.b}
FROM A LEFT JOIN B ON A.a = B.b && B.b === null

would yield no data at all, because there is no document in B with b
=== null.
And for the same reason,

SELECT {a: A.a, b: B.b}
FROM A LEFT JOIN B
WHERE A.a = B.b && B.b === null

would also return no data.

So if we put aside all arguments about explicitness, the explicit
JOIN...ON syntax is functionally different from the comma join, at
least in case UnQL is to support outer joins.

kind regards,

Roland



Roland Bouman

unread,
Sep 2, 2011, 7:31:34 PM9/2/11
to UnQL
Mike,

I'm sorry - I'm afraid I don't understand your example. How is your
example an illustration of an explicit join?

On Sep 2, 6:44 pm, Mike Malone <m...@simplegeo.com> wrote:
Reply all
Reply to author
Forward
0 new messages