How to give the outputs of two different activities as input to a third activity

Sandeep Kale

Oct 10, 2013, 5:30:46 AM
to hyrack...@googlegroups.com
Hi Team,

I am implementing an operator where I need to give the outputs of two different activities as input to a third activity.
I understood from other operator implementations that we can do this by maintaining a task state object.

The input arity of my operator is 2.

In the contributeActivities(IActivityGraphBuilder builder) method of the OperatorDescriptor,
we add a source edge as

builder.addSourceEdge(int operatorInputIndex, IActivity task, int taskInputIndex)

My third activity does not need the input given to the operator; instead, I want it to use the outputs of the other activities (through the task state object) as its input.

Is there an alternative way to do this instead of using the addSourceEdge method?
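The task-state pattern described above can be sketched in plain Java. Everything here is hypothetical: `TaskStateStore` and the activity methods are illustrative stand-ins, not Hyracks classes. Activities A and B each store their output, and activity C, which has no operator input of its own, reads both once they have finished. In Hyracks that ordering would presumably be expressed as blocking dependencies between the activities rather than by call order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-task state; NOT the Hyracks API.
class TaskStateStore {
    private final Map<String, List<String>> states = new HashMap<>();

    void setState(String key, List<String> tuples) {
        states.put(key, new ArrayList<>(tuples));
    }

    List<String> getState(String key) {
        List<String> tuples = states.get(key);
        if (tuples == null) {
            // The consumer ran before its producer finished.
            throw new IllegalStateException("state not produced yet: " + key);
        }
        return tuples;
    }
}

class ThreeActivitySketch {
    // Activity A: reads operator input 0 and saves it as state "A".
    static void activityA(TaskStateStore store, List<String> input0) {
        store.setState("A", input0);
    }

    // Activity B: reads operator input 1 and saves it as state "B".
    static void activityB(TaskStateStore store, List<String> input1) {
        store.setState("B", input1);
    }

    // Activity C: no operator input; combines the saved outputs of A and B
    // (as a cross product here, purely for illustration).
    static List<String> activityC(TaskStateStore store) {
        List<String> out = new ArrayList<>();
        for (String a : store.getState("A")) {
            for (String b : store.getState("B")) {
                out.add(a + "|" + b);
            }
        }
        return out;
    }
}
```

If C is invoked before A and B have stored their state, `getState` fails loudly, which is the situation a blocking dependency is meant to prevent.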

Sandeep

Vinayak Borkar

Oct 10, 2013, 5:38:03 AM
Can you give more details about your operator? What are the three
activities? Is there a certain sequencing you want to enforce between
the execution of the three activities?

Vinayak

Sandeep Kale

Oct 10, 2013, 9:30:03 AM

Hi,
Thank you for the quick reply.

I am implementing the relational algebra Division operation (say, DivisionOperator).
DivisionOperator has input arity 2 and output arity 1.

This could be done as a sequence of operators, but we want to wrap all the activities into one operator.

Division operation (more about division: http://en.wikipedia.org/wiki/Relational_algebra)
Definition in terms of the basic algebra operations:
Let r(R) and s(S) be relations, and let $S \subseteq R$.

$$ r \div s = \pi_{R-S}(r) - \pi_{R-S}\big( (\pi_{R-S}(r) \times s) - \pi_{R-S,S}(r) \big) $$
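The formula can be checked on small in-memory relations. This plain-Java sketch is not Hyracks code and assumes a two-column model: each tuple of r is a pair (a, b), where a stands for the R-S attributes and b for the S attributes, so $\pi_{R-S,S}(r)$ is simply r itself.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Division {
    // r: set of (a, b) pairs; s: set of b values. Returns r divided by s.
    static Set<String> divide(Set<Map.Entry<String, String>> r, Set<String> s) {
        // pi_{R-S}(r)
        Set<String> piA = new HashSet<>();
        for (Map.Entry<String, String> t : r) {
            piA.add(t.getKey());
        }
        // pi_{R-S}((pi_{R-S}(r) x s) - r): every a that is missing some b of s
        Set<String> disqualified = new HashSet<>();
        for (String a : piA) {
            for (String b : s) {
                if (!r.contains(new SimpleEntry<>(a, b))) {
                    disqualified.add(a);
                }
            }
        }
        // pi_{R-S}(r) minus the disqualified values
        Set<String> result = new HashSet<>(piA);
        result.removeAll(disqualified);
        return result;
    }
}
```

For example, with r = {(s1, c1), (s1, c2), (s2, c1)} and s = {c1, c2}, only s1 is paired with every value of s, so the result is {s1}. If s is empty, nothing is disqualified and the result is all of $\pi_{R-S}(r)$, as the formula implies.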


The following are the activities of my operator:

A_SELECT_R_MINUS_S_FROM_r_ACTIVITY_ID = 0;      // project R-S (r)
B_SELECT_R_MINUS_S_S_FROM_r_ACTIVITY_ID = 1;   // project R-S,S (r)
   
C_JOIN_A_WITH_S_PHASE1_ACTIVITY_ID = 2;      // join of (project R-S (r)) and s
C_JOIN_A_WITH_S_PHASE2_ACTIVITY_ID = 3; 
   
D_FIND_C_SET_DIFF_B_PHASE1_ACTIVITY_ID = 4;       // set difference
D_FIND_C_SET_DIFF_B_PHASE2_ACTIVITY_ID = 5;     
   
E_SELECT_R_MINUS_S_FROM_D_ACTIVITY_ID = 6;     // Project R-S ( ( Project R-S (r ) x s ) – Project R-S,S(r ))
F_SELECT_R_MINUS_S_FROM_r_ACTIVITY_ID = 7;        // activities 7 and 0 are the same; we can use the output of activity 0 instead of having activity 7
   
G_FIND_F_SET_DIFF_E_PHASE1_ACTIVITY_ID = 8;    // set difference of  Project R-S (r ) – Project R-S ( ( Project R-S (r ) x s ) – Project R-S,S(r ))
G_FIND_F_SET_DIFF_E_PHASE2_ACTIVITY_ID = 9;

The join activities will work like a nested-loop join, and the set-difference activities will work like a hybrid hash join, except that a probe tuple is added to the output only if it is not present in the build input.
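The two activity shapes just described can be sketched in memory. This is a plain-Java illustration, not Hyracks code: tuples are plain strings, and it omits the spilling to disk that makes a hash join "hybrid". The set difference builds a hash set from the build input and emits each probe tuple that is not found, so D would correspond to `setDifference(C, B)` and G to `setDifference(F, E)`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ActivitySketches {
    // Nested-loop join with no predicate, i.e. a cross product (activities 2/3):
    // phase 1 materializes the inner input, phase 2 loops over it per outer tuple.
    static List<String> crossProduct(List<String> outer, List<String> inner) {
        List<String> out = new ArrayList<>();
        for (String o : outer) {
            for (String i : inner) {
                out.add(o + "|" + i);
            }
        }
        return out;
    }

    // Hash-based set difference "probe minus build" (activities 4/5 or 8/9):
    // phase 1 builds a hash set from the build input, phase 2 probes it and
    // emits each probe tuple that is NOT found, once (set semantics).
    static List<String> setDifference(List<String> probe, List<String> build) {
        Set<String> table = new HashSet<>(build);   // build phase
        Set<String> emitted = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String t : probe) {                     // probe phase
            if (!table.contains(t) && emitted.add(t)) {
                out.add(t);
            }
        }
        return out;
    }
}
```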

Activities 5 and 6, 0 and 2, and 7 and 8 could be merged, but to keep things simple we have kept them separate for visibility. Later we will merge each pair into one activity.

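The following figure shows the dependencies between activities.

[Figure: activity dependency diagram, https://lh4.googleusercontent.com/-pSP9Uy6voBU/UlapT7tCI0I/AAAAAAAAADA/u3JGwrSDSMU/s1600/ActivityDiagram.jpeg]
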
I am implementing this operator as part of a course project in Database Systems.

Thanks

Sandeep

Vinayak Borkar

Oct 10, 2013, 12:55:42 PM
Hi Sandeep,


Activities in Hyracks are used to define blocking dependencies between
tasks within operators. For example, in a Hash Join operator, the probe
activity cannot start until the build activity is complete. If there are
no blocking dependencies between tasks, one would not use multiple
activities within that operator. Remember that a single activity can
accept multiple inputs.
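The idea that activities exist to express blocking dependencies can be made concrete with a small planner that turns blocking edges into execution stages. This is a hypothetical plain-Java sketch, not how the Hyracks scheduler is implemented: an activity's stage is one more than the latest stage among its blockers, and activities in the same stage have no blocking dependency between them. For the hash join example, build lands in stage 0 and probe in stage 1; two activities that both block a third land in stage 0 and the third in stage 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: activities are numbered 0..n-1 and each blocking edge
// {blocker, blocked} means "blocked" may start only after "blocker" completes.
class StagePlanner {
    // Returns activity -> stage, where stage = 1 + max(stage of its blockers).
    static Map<Integer, Integer> plan(int n, int[][] blockingEdges) {
        List<List<Integer>> dependents = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            dependents.add(new ArrayList<>());
        }
        int[] unfinishedBlockers = new int[n];
        for (int[] e : blockingEdges) {
            dependents.get(e[0]).add(e[1]);
            unfinishedBlockers[e[1]]++;
        }
        Map<Integer, Integer> stage = new HashMap<>();
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (unfinishedBlockers[i] == 0) {
                stage.put(i, 0);
                ready.add(i);
            }
        }
        int scheduled = 0;
        while (!ready.isEmpty()) { // Kahn's algorithm (topological order)
            int a = ready.poll();
            scheduled++;
            for (int b : dependents.get(a)) {
                stage.merge(b, stage.get(a) + 1, Math::max);
                if (--unfinishedBlockers[b] == 0) {
                    ready.add(b);
                }
            }
        }
        if (scheduled != n) {
            throw new IllegalArgumentException("blocking edges form a cycle");
        }
        return stage;
    }
}
```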

Having said that, can you explain a little bit why you would like to put
all the tasks of doing a division into one operator? It might be easier
to create the operation through composing other operators. In Hyracks
there is no performance penalty to having a chain of operators, since
the runtime merges all operators that are connected by 1:1 edges and
running at the same site into a super operator. One other thing you will
have to work out is how will this operator run in a parallel setting?
How does data need to be partitioned feeding into the operator and is
there a need to repartition data at some point within the divide
operation (I have a feeling you will need that)? If repartitioning is
required, division cannot be done as a single operator.
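The point about a chain of 1:1-connected operators behaving like one merged operator can be sketched with a push-based model. `BatchWriter` and `Pipeline` are hypothetical names chosen for this illustration; Hyracks has its own interfaces for pushing data between operators, which this does not reproduce. Each stage pushes its output batch straight into the next stage, so the whole chain runs as a single pass over each batch with no exchange between stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical push interface: a stage receives one batch of tuples at a time.
interface BatchWriter {
    void next(List<String> batch);
}

class Pipeline {
    // Projection stage: applies f to every tuple, then pushes downstream.
    static BatchWriter project(Function<String, String> f, BatchWriter downstream) {
        return batch -> {
            List<String> out = new ArrayList<>();
            for (String t : batch) {
                out.add(f.apply(t));
            }
            downstream.next(out);
        };
    }

    // Selection stage: keeps tuples satisfying p, then pushes downstream.
    static BatchWriter select(Predicate<String> p, BatchWriter downstream) {
        return batch -> {
            List<String> out = new ArrayList<>();
            for (String t : batch) {
                if (p.test(t)) {
                    out.add(t);
                }
            }
            downstream.next(out);
        };
    }

    // Terminal stage: collects everything it receives.
    static BatchWriter collect(List<String> sink) {
        return sink::addAll;
    }
}
```

A chain such as `select(p, project(f, collect(sink)))` is built once and then fed batch by batch; no stage waits for another stage to finish, which is why such chains need no separate activities.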

Thanks,
Vinayak

Sandeep Kale

Oct 10, 2013, 2:45:22 PM
Hi,
We want to implement division as a single operator because we want to bundle all the sub-operations performed for division into one operator.

In Hyracks, how can we write a new operator by composing other operators and connecting them 1:1?

Yes, we will need repartitioning, as there are two set-difference operations in division.


Thanks
--
Sandeep

Vinayak Borkar

Oct 16, 2013, 3:49:10 AM
Hi Sandeep,

Are you still stuck on this issue? As I mentioned earlier, you will need
to split the operator into several operators since you need to perform
data redistribution along the way.

Vinayak

Sandeep Kale

Oct 16, 2013, 2:27:38 PM

Hi Vinayak,

Currently we are looking for an alternative that avoids redistributing data along the way.
After the first set difference is done using partitioning, the tuples needed for the second set difference are already on the same partition, so no repartitioning is required; the tuples are available locally. We have not done the proof of concept yet; we hope to finish it by Oct 24.
I will formalize the concept and discuss it with you soon.
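The claim that the second set difference needs no further repartitioning can be checked with a small simulation under one assumption the thread does not state: r is hash-partitioned on its R-S attributes and s is replicated to every partition. All tuples sharing an R-S value then live on one partition, so both set differences can run locally, and the union of the partition results should equal the global quotient. This is a plain-Java sketch, not Hyracks code; each tuple of r is modelled as a pair (a, b), with a standing for the R-S attributes and b for the S attributes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PartitionedDivision {
    // Division on one partition, following the two-set-difference formula.
    static Set<String> localDivide(Set<Map.Entry<String, String>> r, Set<String> s) {
        Set<String> piA = new HashSet<>();
        for (Map.Entry<String, String> t : r) {
            piA.add(t.getKey());
        }
        Set<String> result = new HashSet<>(piA);        // pi_{R-S}(r)
        for (String a : piA) {
            for (String b : s) {
                // first difference: (a, b) is in pi_{R-S}(r) x s but not in r
                if (!r.contains(new SimpleEntry<>(a, b))) {
                    result.remove(a);                    // second difference
                }
            }
        }
        return result;
    }

    static Set<String> divide(Set<Map.Entry<String, String>> r, Set<String> s, int numPartitions) {
        List<Set<Map.Entry<String, String>>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new HashSet<>());
        }
        for (Map.Entry<String, String> t : r) {
            // Hash-partition on the R-S attribute so all tuples for one a co-locate.
            int p = Math.floorMod(t.getKey().hashCode(), numPartitions);
            parts.get(p).add(t);
        }
        Set<String> result = new HashSet<>();
        for (Set<Map.Entry<String, String>> part : parts) {
            // Assumption: every partition sees the whole of s (replicated).
            result.addAll(localDivide(part, s));
        }
        return result;
    }
}
```

If s were instead partitioned like r, a partition could miss some values of s and wrongly accept an a, so the replication assumption is what makes the local computation correct in this sketch.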

Thank you

Sandeep