Hi Sandeep,
Activities in Hyracks are used to define blocking dependencies between
tasks within operators. For example, in a Hash Join operator, the probe
activity cannot start until the build activity is complete. If there are
no blocking dependencies between tasks, one would not use multiple
activities within that operator. Remember that a single activity can
accept multiple inputs.
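
To make the build/probe dependency concrete, here is a minimal sketch in plain Python. It is not Hyracks code; the function and variable names are made up for illustration. It only shows why the probe phase cannot start until the build phase has consumed its whole input.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Toy in-memory hash join with two phases (illustrative, not Hyracks)."""
    # Build phase: the whole build input must be consumed first.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: streams the probe input against the finished table.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append(match + row)
    return out

# Hypothetical sample data: (dept_id, dept_name) and (emp_name, dept_id).
depts = [(1, "eng"), (2, "ops")]
emps = [("ann", 1), ("bob", 2), ("cy", 3)]
joined = hash_join(depts, emps, build_key=0, probe_key=1)
```

If the probe loop ran before the build loop finished, it could miss matches, which is the kind of ordering constraint that activities express.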
Having said that, can you explain a little why you would like to put
all the tasks of a division into one operator? It might be easier
to build the operation by composing other operators. In Hyracks
there is no performance penalty for having a chain of operators, since
the runtime merges all operators that are connected by 1:1 edges and
run at the same site into a super operator. One other thing you will
have to work out is how this operator will run in a parallel setting.
How does the data need to be partitioned when it is fed into the
operator, and is there a need to repartition data at some point within
the divide operation (I have a feeling you will need that)? If
repartitioning is required, division cannot be done as a single operator.
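
As a rough illustration of the composition idea, the definition in the quoted message below can be sketched in plain Python as a chain of projection, cross product and set difference steps. This is not Hyracks code; the helper names and the index-based way of naming attributes are assumptions made only for this example, and it ignores partitioning entirely.

```python
def project(rel, idxs):
    """Projection with duplicate elimination; idxs are attribute positions."""
    return {tuple(t[i] for i in idxs) for t in rel}

def cross(a, b):
    """Cross product of two relations given as sets of tuples."""
    return {x + y for x in a for y in b}

def divide(r, s, rs_idxs, s_idxs):
    """r divideby s. rs_idxs: positions of the R-S attributes in r;
    s_idxs: positions of the S attributes in r, in the same order as in s."""
    candidates = project(r, rs_idxs)         # Project R-S (r)
    all_pairs = cross(candidates, s)         # Project R-S (r) x s
    present = project(r, rs_idxs + s_idxs)   # Project R-S,S (r)
    missing = all_pairs - present            # pairs that r does not contain
    disqualified = project(missing, range(len(rs_idxs)))
    return candidates - disqualified         # final set difference

# Hypothetical data: which students completed which courses.
r = {("ann", "db"), ("ann", "os"), ("bob", "db")}
s = {("db",), ("os",)}
result = divide(r, s, rs_idxs=[0], s_idxs=[1])
```

Each helper corresponds to a step that could be a separate operator, and each set difference is a point where the parallel version has to decide how its two inputs are partitioned.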
Thanks,
Vinayak
On 10/10/13 6:30 AM, Sandeep Kale wrote:
>
> Hi,
> Thank you for the quick reply.
>
> I am implementing a relational algebra Division operation (say,
> DivisionOperator).
> DivisionOperator has input arity 2 and output arity 1.
>
> This can be done as a sequence of operators, but we want to wrap all the
> activities into one operator.
>
> Division operation (more about the division operation:
> http://en.wikipedia.org/wiki/Relational_algebra)
> Definition in terms of the basic algebra operations:
> Let r(R) and s(S) be relations, and let S subsetof R
>
> r divideby s = Project R-S (r) - Project R-S ((Project R-S (r) x s) -
> Project R-S,S (r))
> (I could not write actual relational algebra symbol here. Sorry for that)
>
>
> The following are the activities of my operator:
>
> A_SELECT_R_MINUS_S_FROM_r_ACTIVITY_ID = 0; // Project R-S (r)
> B_SELECT_R_MINUS_S_S_FROM_r_ACTIVITY_ID = 1; // Project R-S,S (r)
>
> C_JOIN_A_WITH_S_PHASE1_ACTIVITY_ID = 2; // join of (Project R-S (r)) and s
> C_JOIN_A_WITH_S_PHASE2_ACTIVITY_ID = 3;
>
> D_FIND_C_SET_DIFF_B_PHASE1_ACTIVITY_ID = 4; // set difference
> D_FIND_C_SET_DIFF_B_PHASE2_ACTIVITY_ID = 5;
>
> // Project R-S ((Project R-S (r) x s) - Project R-S,S (r))
> E_SELECT_R_MINUS_S_FROM_D_ACTIVITY_ID = 6;
> // activities 7 and 0 are the same; we can use the output of activity 0
> // instead of having activity 7
> F_SELECT_R_MINUS_S_FROM_r_ACTIVITY_ID = 7;
>
> // set difference of
> // Project R-S (r) - Project R-S ((Project R-S (r) x s) - Project R-S,S (r))
> G_FIND_F_SET_DIFF_E_PHASE1_ACTIVITY_ID = 8;
> G_FIND_F_SET_DIFF_E_PHASE2_ACTIVITY_ID = 9;
>
> The join activities will be the same as a nested loop join, and the set
> difference activities will be like Hybrid Hash Join, where a probed tuple
> is added to the output only if it is not present in the build input.
>
> Activities 5 and 6, 0 and 2, and 7 and 8 can be merged, but to keep it
> simple we have separated them for more visibility. Later we will merge
> each pair into one activity.
>
> The following figure gives the dependencies between activities.
>
>
> <https://lh4.googleusercontent.com/-pSP9Uy6voBU/UlapT7tCI0I/AAAAAAAAADA/u3JGwrSDSMU/s1600/ActivityDiagram.jpeg>