Algebricks Logical Operators question

44 views
Skip to first unread message

Mihail Vieru

unread,
Apr 24, 2014, 3:42:23 PM4/24/14
to hyrack...@googlegroups.com
Hi,

what is the best way to understand the semantics of the Algebricks Logical Operators (package edu.uci.ics.hyracks.algebricks.core.algebra.operators.logical)?

For instance I don't completely understand how the following operators exactly work on the data flow, while used being used in Hyracks logical plans: EmptyTupleSourceOperator, UnnestOperator, ExchangeOperator, DistributeResultOperator.

How do these work in the logical plans for the equi-join query of the TinySocial example?

(plan attached in 3 forms: unoptimized, logically optimized, physically&logically optimized)

Cheers,
Mihail
unoptimized logical plan _annotated.png
logical optimized logical plan _annotated.png
physical optimized logical plan _annotated.png

Vinayak Borkar

unread,
Apr 24, 2014, 4:44:47 PM4/24/14
to hyrack...@googlegroups.com
Hi Mihail,

Preston at UC Riverside has been creating a document that describes the
semantics of Algebricks operators. (@Preston, can you send a draft to
the mailing list?).

Your pictures ignore the arguments provided to each of the operators -
These arguments are very important in understanding the query
expression. Imagine trying to understand the meaning of a SQL query
(select a,b from R where c > 5) using the relational algebra
representation of Project(Select(R)). Without the arguments, it is
impossible to fully understand the dataflow in an Algebricks plan.

Preston's document should help you clarify the meaning of the operators
you mention, but here is a short description:


1. EmptyTupleSource: Creates a single empty tuple (A tuple with no fields).

2. Unnest: For each input tuple, unnest evaluates the provided
expression that returns a collection, and for each item in the
collection, produces a tuple that includes all fields in the input tuple
and additionally the field specified as an argument to the unnest.

3. Exchange: Exchange is a physical operator that is used to repartition
data between the input operator partitions and the output operator
partitions. The arguments to the exchange operator describe the
semantics of the repartitioning step.

4. DistributeResult: This operator is used to denote the production of
results from a query. Every Algebricks plan has to be terminated at a
sink operator. In case of queries, DistributeResult serves as the sink.
If one were to construct an Algebricks plan to represent a statement
similar to the SQL statement:

insert into T select * from R where c > 5

then such a statement would not be terminated by a DistributeResult.
Instead we would use the Insert Operator as the sink.

Vinayak
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com
> <mailto:hyracks-user...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Mihail Vieru

unread,
Apr 25, 2014, 2:02:26 PM4/25/14
to hyrack...@googlegroups.com
Hi Vinayak,

thank you for the quick answer and for the explanations!
You are of course right about the operator arguments, I have understated their importance.

Regarding Unnest: For unnest_1 the provided expression is:

function-call: asterix:dataset, Args:[AString: {FacebookUsers}]


And it's input is a single empty tuple received from the EmptyToupleSource.

The evaluation of the provided expression would return the collection of data instances of the dataset FacebookUsers.
For each of these data instances, a tuple is produced with all fields of the input tuple, here none, and additionally all fields of the dataset FacebookUsers. For n data instances of FacebookUsers that will make n produced tuples. Correct?

So, after taking also unnest_2 into consideration, we'll have n*m tuples produced (m data instances of FacebookMessages) and every tuple having all the fields from both FacebookUsers and FacebookMessages. Did I understood it right?

It would be awesome if Preston's document would be made available as soon as possible, as I am trying to understand all 30 Algebricks logical operators and this would be of great help.

Best regards,
Mihail

Eldon Carman

unread,
Apr 25, 2014, 2:26:01 PM4/25/14
to hyrack...@googlegroups.com



For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "hyracks-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hyracks-users+unsubscribe@googlegroups.com.

Vinayak Borkar

unread,
Apr 25, 2014, 2:29:18 PM4/25/14
to hyrack...@googlegroups.com
On 4/25/14, 11:02 AM, Mihail Vieru wrote:
> Hi Vinayak,
>
> thank you for the quick answer and for the explanations!
> You are of course right about the operator arguments, I have understated
> their importance.
>
> *Regarding Unnest*: For unnest_1 the provided expression is:
>
> function-call: asterix:dataset, Args:[AString: {FacebookUsers}]
>
> And it's input is a single empty tuple received from the EmptyToupleSource.
>
> The evaluation of the provided expression would return the collection of
> data instances of the dataset FacebookUsers.
> For each of these data instances, a tuple is produced with all fields of
> the input tuple, here none, and additionally all fields of the dataset
> FacebookUsers. For n data instances of FacebookUsers that will make n
> produced tuples. Correct?

Correct!



>
> So, after taking also unnest_2 into consideration, we'll have n*m tuples
> produced (m data instances of FacebookMessages) and every tuple having
> all the fields from both FacebookUsers and FacebookMessages. Did I
> understood it right?

Correct again!

>
> It would be awesome if Preston's document would be made available as
> soon as possible, as I am trying to understand all 30 Algebricks logical
> operators and this would be of great help.

Watch this space for the document. I just got confirmation from Preston
that he will publish the document on this list soon.


Vinayak
> > an email to hyracks-user...@googlegroups.com <javascript:>
> > <mailto:hyracks-user...@googlegroups.com <javascript:>>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com
> <mailto:hyracks-user...@googlegroups.com>.

Mihail Vieru

unread,
Apr 25, 2014, 2:32:13 PM4/25/14
to hyrack...@googlegroups.com
Thank you, Preston! I have something to read now :)

And thanks for the clarifications, Vinayak!

Mihail
>      > <mailto:hyracks-users+unsub...@googlegroups.com <javascript:>>.
>      > For more options, visit https://groups.google.com/d/optout
>     <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com

Mihail Vieru

unread,
May 4, 2014, 5:39:49 PM5/4/14
to hyrack...@googlegroups.com
Hi,

I have some new questions regarding operators not described in the Algebricks Operators Draft Documentation.

1. What are the differences between the operators: Unnest vs. UnnestMap, Write vs. WriteResult and DistributeResult vs. Sink?
The latter not being described in the documentation/user group yet.

2. Can you please provide a query, which generates a logical plan, which uses the PartitioningSplit operator? I'm interested to know when this operator is used.

Best regards,
Mihail

Mihail Vieru

unread,
May 24, 2014, 10:12:18 AM5/24/14
to hyrack...@googlegroups.com
Hi,
I'm still very interested in finding the answers to the above questions. I would appreciate if somebody would find the time to answer those.
Best regards,
Mihail

Vinayak Borkar

unread,
May 24, 2014, 12:47:51 PM5/24/14
to hyrack...@googlegroups.com
Hi Mihail,


1.
The Unnest operator accepts a sequence generating expression and for
each item in the sequence, produces a tuple with 1 variable bound to
that item. The UnnestMap is similar in that it accepts a sequence
generating function. However, the sequence generating function produces
tuples and for each tuple, the UnnestMap produces a tuple with multiple
variables (each variable is bound to the corresponding field in the
tuple produced by the sequence expression). You can see this operator
being used in AsterixDB to access data stored in indexes. The sequence
expression in this case is the "search" function that probes the index
and produces [Key, Value] tuples. The UnnestMap then binds two
variables, one for the key and one for the value.

The Write/WriteResult operators are not named well. The Write operator
was used in the past to produce results of queries in the past. Systems
sing Algebricks should start moving away from Write to using
DistributeResults for result handling. DistributeResults provides a more
general framework for capturing results of a query and then making it
available to a client (over the network). The Write operator was a
limited way of handing results because it needed to be scheduled at a
location where the client could then read the file that was produced.

The WriteResults operator is used in "INSERT INTO foo SELECT FROM"
scenarios to produce a new dataset in the system. It has nothing to do
with query results (although the name suggests that it does).

The Sink operator is a no-op operator that accepts and silently discards
data. Every well-formed algebricks expression must be terminated by an
operator that consumes data but does not produce data. The Sink operator
is used to terminate plans which do not use
Write/WriteResults/DistributeResults operators. You can see this in
AsterixDB for insert pipelines where the plan inserts data into one or
more indexes. Each index inserting operator consumes tuples and forwards
the tuples to the next operator in the plan. So we can have as many
index inserting operators as indexes that need to be updated. Such a
plan is finally terminated with a Sink operator.

2. I do not think we have a scenario where the PartitioningSplit
operator is used yet.


Vinayak
> > > <mailto:hyracks-user...@googlegroups.com
> <javascript:>>.
> > > For more options, visit
> https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>
> > <https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>>.
> >
> > --
> > You received this message because you are subscribed to
> the Google
> > Groups "hyracks-users" group.
> > To unsubscribe from this group and stop receiving emails
> from it, send
> > an email to hyracks-user...@googlegroups.com
> > <mailto:hyracks-user...@googlegroups.com>.
> > For more options, visit
> https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com
> <mailto:hyracks-user...@googlegroups.com>.

Mihail Vieru

unread,
Jun 3, 2014, 4:36:00 PM6/3/14
to hyrack...@googlegroups.com
Hi Vinayak,

You haven't mentioned the input tuple collection for Unnest & UnnestMap and I'm not fully certain I understood everything right. Everything else is now clear to me. Thanks!

The Unnest operator's semantics in relational algebra would be equivalent to that of:
R x Project(S)

where R is the input tuple collection, S the tuple collection generated by the evaluation of the provided expression. And where only one field is projected from S.

UnnestMap's would be:
R x Project(Select(S))

R and S same as above and where several fields are projected from the selection from the tuple collection S, generated by the evaluation of the provided function. Correct?

And regarding usage: Unnest is used for full dataset scans and UnnestMap is used for index searches only. Correct?

Regards,
Mihail
>              >      > <mailto:hyracks-users+unsub...@googlegroups.com
>             <javascript:>>.
>              >      > For more options, visit
>             https://groups.google.com/d/optout
>             <https://groups.google.com/d/optout>
>              >     <https://groups.google.com/d/optout
>             <https://groups.google.com/d/optout>>.
>              >
>              > --
>              > You received this message because you are subscribed to
>             the Google
>              > Groups "hyracks-users" group.
>              > To unsubscribe from this group and stop receiving emails
>             from it, send
>              > an email to hyracks-user...@googlegroups.com
>              > <mailto:hyracks-users+unsub...@googlegroups.com>.
>              > For more options, visit
>             https://groups.google.com/d/optout
>             <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com

Vinayak Borkar

unread,
Jun 3, 2014, 6:38:58 PM6/3/14
to hyrack...@googlegroups.com
>
> The *Unnest* operator's semantics in relational algebra would be
> equivalent to that of:
> R x Project(S)

Close. However, S can be a function of R. Same applies to UnnestMap.

>
> where R is the input tuple collection, S the tuple collection generated
> by the evaluation of the provided expression. And where only one field
> is projected from S.
>
> *UnnestMap*'s would be:
> <mailto:hyracks-user...@googlegroups.com <javascript:>
> > <javascript:>>.
> > > > For more options, visit
> > https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>
> > <https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>>
> > > <https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>
> > <https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>>>.
> > >
> > > --
> > > You received this message because you are
> subscribed to
> > the Google
> > > Groups "hyracks-users" group.
> > > To unsubscribe from this group and stop receiving
> emails
> > from it, send
> > > an email to hyracks-user...@googlegroups.com
> > > <mailto:hyracks-user...@googlegroups.com
> <javascript:>>.
> > > For more options, visit
> > https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>
> > <https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>>.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "hyracks-users" group.
> > To unsubscribe from this group and stop receiving emails from it,
> send
> > an email to hyracks-user...@googlegroups.com <javascript:>
> > <mailto:hyracks-user...@googlegroups.com <javascript:>>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "hyracks-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to hyracks-user...@googlegroups.com
> <mailto:hyracks-user...@googlegroups.com>.
Reply all
Reply to author
Forward
0 new messages