Jagmohan Chauhan
Mar 21, 2012, 12:05:31 AM
  to s4-project
Hi,
We are using Yahoo S4 for a project. Our application is simple: it
reads words from a file, assembles them into sentences, and prints the
sentences on the console. We have written two PEs for it. The first PE
receives the words sent by the client adapter, looks for a ".", which
marks the end of a sentence, forms the sentence, and sends it to the
next PE. The second PE takes the sentence and prints it on the
console. The file that our client application reads and feeds to the
adapter is 1 GB. The first PE is keyless; for the second PE we ran
experiments with the same key as well as with different keys.
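
To make the first PE's logic concrete, here is a minimal sketch in plain
Java. It does not use the S4 API at all; the class name SentenceAssembler
and its methods are hypothetical, and it assumes the "." arrives attached
to the last word of a sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the first PE's logic; plain Java, no S4 API.
class SentenceAssembler {
    // Words collected so far for the sentence in progress.
    private final StringBuilder current = new StringBuilder();

    // Accepts one word. Returns the completed sentence when the word ends
    // with '.', otherwise returns null (assumption: the period is attached
    // to the final word of the sentence).
    String accept(String word) {
        if (word == null || word.isEmpty()) {
            return null;
        }
        if (current.length() > 0) {
            current.append(' ');
        }
        current.append(word);
        if (word.endsWith(".")) {
            String sentence = current.toString();
            current.setLength(0); // start a fresh sentence
            return sentence;
        }
        return null;
    }

    // Convenience helper: feeds every word in order and collects the sentences.
    static List<String> assemble(String[] words) {
        SentenceAssembler assembler = new SentenceAssembler();
        List<String> sentences = new ArrayList<>();
        for (String word : words) {
            String sentence = assembler.accept(word);
            if (sentence != null) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String[] words = {"We", "read", "words.", "We", "print", "sentences."};
        // The second PE's job in this sketch is just printing.
        for (String sentence : assemble(words)) {
            System.out.println(sentence);
        }
    }
}
```

Note that the assembler keeps state between words, so in this sketch all
words of one sentence have to reach the same instance in order.
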
We are seeing an unexpected result when we try different node
configurations. We run the application on a cluster of 4 machines:
1 machine for the client adapter and the other three as processing
nodes. As the number of processing nodes increases, the execution
time for the same data set (file) also increases.
Here are some statistics:
1-node configuration: 2 min 10 sec
2-node configuration: 2 min 30 sec
3-node configuration: 2 min 40 sec
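
For reference, these timings can be turned into throughput and relative
slowdown with a few lines of Java. This is only arithmetic on the numbers
above; the class name RunStats is hypothetical, and it assumes the 1 GB
file is 1024 MB.

```java
// Hypothetical helper: arithmetic on the measured run times, nothing more.
class RunStats {
    // Assumption: "1 GB" is taken as 1024 MB.
    static final double FILE_MB = 1024.0;

    // Average input throughput in MB per second for one run.
    static double throughputMBps(double megabytes, int seconds) {
        return megabytes / seconds;
    }

    // Run time relative to the baseline run; values above 1.0 mean slower.
    static double relativeTime(int baseSeconds, int seconds) {
        return (double) seconds / baseSeconds;
    }

    public static void main(String[] args) {
        // 2 min 10 sec, 2 min 30 sec, 2 min 40 sec
        int[] seconds = {130, 150, 160};
        for (int i = 0; i < seconds.length; i++) {
            System.out.printf("%d node(s): %.2f MB/s, %.2fx the 1-node time%n",
                    i + 1,
                    throughputMBps(FILE_MB, seconds[i]),
                    relativeTime(seconds[0], seconds[i]));
        }
    }
}
```

Under that assumption the throughput is roughly 7.9, 6.8 and 6.4 MB/s for
1, 2 and 3 nodes, so the 3-node run takes about 23% longer than the 1-node
run.
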
We cannot explain this, as we expected the execution time to improve
with more nodes. Can anyone please shed some light on this? Is the
overhead of distributing events so high that adding nodes does not
improve the execution time?
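
One thing we are unsure about is how the key of the second PE affects
where events end up. The sketch below is not S4 code and we have not
confirmed that S4 routes events this way; it only assumes that keyed
events are assigned to nodes by hashing the key value. The class name
KeyRouting and its methods are hypothetical.

```java
// Hypothetical illustration of hash-based routing; not the S4 API.
class KeyRouting {
    // Assumption: a keyed event goes to node hash(key) mod numNodes.
    static int nodeFor(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Counts how many events each node would receive for the given keys.
    static int[] eventsPerNode(String[] keys, int numNodes) {
        int[] counts = new int[numNodes];
        for (String key : keys) {
            counts[nodeFor(key, numNodes)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numNodes = 3;
        String[] sameKey = {"k", "k", "k", "k", "k", "k"};
        String[] differentKeys = {"k0", "k1", "k2", "k3", "k4", "k5"};
        System.out.println("same key:       "
                + java.util.Arrays.toString(eventsPerNode(sameKey, numNodes)));
        System.out.println("different keys: "
                + java.util.Arrays.toString(eventsPerNode(differentKeys, numNodes)));
    }
}
```

If routing works like this, a single key value would send every sentence
to one node no matter how many nodes are running, while different key
values would spread the sentences but add network transfers between nodes.
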
Thanks and Regards
Jagmohan Chauhan
MSc student,CS
Univ. of Saskatchewan