Performance issue in Yahoo S4


Jagmohan Chauhan

Mar 21, 2012, 12:05:31 AM
to s4-project
Hi

We are working with Yahoo S4 for a project. Our test application is
simple: it reads words from a file, assembles them into sentences, and
prints the sentences to the console. We have written two PEs for it.
The first PE receives the words sent by the client adapter, looks for
a "." marking the end of a sentence, forms the sentence, and sends it
to the next PE. The second PE takes the sentence and prints it to the
console. The input file that our client application reads and feeds to
the adapter is 1 GB. The first PE is keyless; for the second PE we ran
experiments with a single shared key as well as with different keys.
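
To make the pipeline described above concrete, here is a minimal sketch of the sentence-forming step in plain Java. It does not use any S4 classes; the name SentenceAssembler and its methods are hypothetical stand-ins for what the first PE does with each incoming word, and the sketch assumes a sentence ends at any word whose last character is ".".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the first PE's logic; not S4 API code. */
public class SentenceAssembler {
    // Words collected so far for the sentence currently being built.
    private final StringBuilder current = new StringBuilder();

    /**
     * Accepts one word. Returns the completed sentence when the word
     * ends with '.', otherwise returns null and keeps buffering.
     */
    public String accept(String word) {
        if (current.length() > 0) {
            current.append(' ');
        }
        current.append(word);
        if (word.endsWith(".")) {
            String sentence = current.toString();
            current.setLength(0); // start a fresh sentence
            return sentence;
        }
        return null;
    }

    /** Runs a list of words through one assembler and collects the sentences. */
    public static List<String> assemble(List<String> words) {
        SentenceAssembler assembler = new SentenceAssembler();
        List<String> sentences = new ArrayList<>();
        for (String word : words) {
            String sentence = assembler.accept(word);
            if (sentence != null) {
                sentences.add(sentence); // in the real application this goes to the second PE
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> words = List.of("We", "use", "S4.", "It", "streams", "events.");
        for (String sentence : assemble(words)) {
            System.out.println(sentence); // the second PE's job: print to console
        }
    }
}
```

Note that the buffer is per-instance state: every word of a sentence has to reach the same instance, in order, for the sentence to come out intact.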

We are seeing an unusual result when we vary the number of nodes. We
run the application on a cluster of 4 machines: 1 machine runs the
client adapter and the other three are processing nodes. As we
increase the number of processing nodes, the execution time increases
for the same data set (file).

Here are some statistics:

1-node configuration: 2 min 10 sec
2-node configuration: 2 min 30 sec
3-node configuration: 2 min 40 sec
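
For reference, the timings above can be converted into an effective input throughput and a slowdown relative to the 1-node run. This is only arithmetic on the figures in this post; it assumes 1 GB = 1024 MB and that each measured time covers the whole file. The class name ThroughputEstimate is made up for the example.

```java
/** Arithmetic on the reported timings; assumes 1 GB = 1024 MB. */
public class ThroughputEstimate {
    /** Effective throughput in MB/s for a file of sizeMB processed in the given seconds. */
    static double throughputMBps(double sizeMB, int seconds) {
        return sizeMB / seconds;
    }

    /** How many times slower a run is than the baseline run. */
    static double slowdown(int seconds, int baselineSeconds) {
        return (double) seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        double sizeMB = 1024.0;          // assumption: 1 GB = 1024 MB
        int[] seconds = {130, 150, 160}; // 2:10, 2:30, 2:40 for 1, 2, 3 nodes
        for (int i = 0; i < seconds.length; i++) {
            System.out.printf("%d node(s): %.2f MB/s, %.2fx the 1-node time%n",
                    i + 1,
                    throughputMBps(sizeMB, seconds[i]),
                    slowdown(seconds[i], seconds[0]));
        }
    }
}
```

Under those assumptions the 1-node run works out to roughly 7.9 MB/s and the 3-node run to 6.4 MB/s, about 1.23 times the 1-node time.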


We cannot explain this result, since we expected the execution time
to improve as nodes were added. Can anyone please shed some light on
it? Is the overhead of disseminating events so high that adding nodes
does not improve the execution time?

Thanks and Regards
Jagmohan Chauhan
MSc student, CS
Univ. of Saskatchewan
