Running Stratosphere Job on Cluster

Janani

Jun 2, 2014, 9:40:56 AM
to stratosp...@googlegroups.com
Hi,

I was running a simple map reduce job (stratosphere-0.5-rc2) on an IBM cluster (9 slaves) over a 7GB Twitter dataset, extracting only the unique vertex IDs <src> from an edges file <src,tar>. The job has been executing for more than 3 hours and still has not completed. What would be the approximate estimated running time on a 10-node cluster for around 7GB of data? Kindly help me find the problem in my case. Here are my simple user-defined map and reduce functions:
Map:
public Tuple2<Long, Long> map(String value) throws Exception {
    String[] array = value.split(this.delim);
    Tuple2<Long, Long> emit = new Tuple2<Long, Long>();
    emit.f0 = Long.parseLong(array[0]);
    emit.f1 = Long.parseLong(array[1]);
    return emit;
}

Reduce:
public void reduce(Iterator<Tuple2<Long, Long>> values,
        Collector<Long> out) throws Exception {
    Long srcKey = values.next().f0;
    out.collect(srcKey);
}

Java main method:
DataSet<String> text = env.readTextFile(inputfilepath);
DataSet<Long> result = text.map(new TextMapper(fieldDelimiter)).groupBy(0).reduceGroup(new Reducer());
result.writeAsText(outputfilepath, WriteMode.OVERWRITE);
env.execute();

Thanks,
Janani

Fabian Hueske

Jun 2, 2014, 9:52:50 AM
to stratosp...@googlegroups.com
Hi,

The program looks good to me at first sight. There are a few ways to tweak it (emit only the source ID from the mapper, use a ReduceFunction or a combinable GroupReduceFunction, use the mutable *Value data types), but it should run as it is, in my opinion.
7GB of input data on 10 powerful nodes should not be a problem.
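
For illustration, here is a minimal sketch of the first two tweaks, assuming the Stratosphere 0.5 Java API (the class names SrcIdMapper and DedupReducer are made up):

import eu.stratosphere.api.java.functions.MapFunction;
import eu.stratosphere.api.java.functions.ReduceFunction;
import eu.stratosphere.api.java.tuple.Tuple1;

// Emit only the source ID; the target ID is never used downstream,
// so dropping it roughly halves the data that gets shuffled.
public static class SrcIdMapper extends MapFunction<String, Tuple1<Long>> {
    private final String delim;

    public SrcIdMapper(String delim) { this.delim = delim; }

    @Override
    public Tuple1<Long> map(String value) throws Exception {
        return new Tuple1<Long>(Long.parseLong(value.split(this.delim)[0]));
    }
}

// A ReduceFunction is applied as a combiner on the map side before the
// shuffle, so most duplicate IDs are eliminated early.
public static class DedupReducer extends ReduceFunction<Tuple1<Long>> {
    @Override
    public Tuple1<Long> reduce(Tuple1<Long> v1, Tuple1<Long> v2) {
        return v1; // both records carry the same key, so either one will do
    }
}

DataSet<Tuple1<Long>> result = text.map(new SrcIdMapper(fieldDelimiter))
                                   .groupBy(0)
                                   .reduce(new DedupReducer());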

Did you set up and configure Stratosphere yourself? Is it maybe running with the default configuration?



Robert Metzger

Jun 2, 2014, 9:55:16 AM
to stratosp...@googlegroups.com
I would expect the job to finish in a few minutes.
Have you looked into the log files?
As Fabian said, you at least have to set the TaskManager heap size to get good performance.
I would suggest using 80% of each node's memory for the TM's heap space.
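
For example, on a node with 32 GB of RAM dedicated to Stratosphere (a hypothetical figure), 80% comes out to roughly this setting in stratosphere-conf.yaml:

taskmanager.heap.mb: 26000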

Stephan Ewen

Jun 2, 2014, 10:01:08 AM
to stratosp...@googlegroups.com
Please check the following:

  1) How much memory do the TaskManagers get (JVM heap space)? The default is very small.
  2) Where is the configured temp directory? Is it a slow default temp directory on an OS partition, or a fast temp space on the data disks?

Janani

Jun 2, 2014, 10:22:10 AM
to stratosp...@googlegroups.com
Hi,

Yes, I set it up and configured it myself. I changed the default values of the following parameters:

jobmanager.heap.mb: 1024

taskmanager.heap.mb: 25000

But I didn't change the following two parameters.

taskmanager.network.numberOfBuffers: 2048

taskmanager.network.bufferSizeInBytes: 32768

Thanks,
Janani

Janani

Jun 2, 2014, 10:31:15 AM
to stratosp...@googlegroups.com, se...@apache.org
Hi,

I changed the TaskManager heap space to 25000 MB. For the second question, do you mean the tmp configuration in Hadoop's core-site.xml? If so, here are the configuration settings:

 <name>hadoop.tmp.dir</name>
 <value>/hadoop/hdfs/data/1/user/hadoop/hadoop1.2/tmp-dir/hadoop-${user.name}</value>

Thanks,
Janani

Fabian Hueske

Jun 2, 2014, 10:33:40 AM
to stratosp...@googlegroups.com
The parameters are not too bad.
You could increase the numberOfBuffers to 8192 or so.

This is the tmp dir parameter in stratosphere-conf.yaml.
taskmanager.tmp.dirs: <dir1>:<dir2>:<dir3>:...

<dir1>, <dir2>, and <dir3> are directories where Stratosphere writes its temp files. Ideally, each directory is on a separate physical disk, or all of them together are on a fast RAID.
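
A hypothetical example with three data disks (the paths are made up; adjust them to your mount points):

taskmanager.network.numberOfBuffers: 8192
taskmanager.tmp.dirs: /data/1/stratosphere-tmp:/data/2/stratosphere-tmp:/data/3/stratosphere-tmp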

What's the DOP (degree of parallelism) you are running the job with?

Ufuk Celebi

Jun 2, 2014, 10:33:57 AM
to stratosp...@googlegroups.com, se...@apache.org
The network buffers should be OK.

Do you see any progress in the web interface (port 8081)?

On 02 Jun 2014, at 16:31, Janani <janani...@gmail.com> wrote:
> I changed the TaskManager heap space to 25000 MB. For the second question, do you mean the tmp configuration in Hadoop's core-site.xml? If so, here are the configuration settings:
>
> <name>hadoop.tmp.dir</name>
> <value>/hadoop/hdfs/data/1/user/hadoop/hadoop1.2/tmp-dir/hadoop-${user.name}</value>

No, I think Stephan means the "taskmanager.tmp.dirs" config key in [1] (see line 69).

[1] https://github.com/stratosphere/stratosphere/blob/master/stratosphere-dist/src/main/stratosphere-bin/conf/stratosphere-conf.yaml

Janani

Jun 2, 2014, 11:21:58 AM
to stratosp...@googlegroups.com, se...@apache.org
There is no progress in the web interface. The mapper and reducer are still running. Here is a screenshot of it:



Stephan Ewen

Jun 2, 2014, 6:02:36 PM
to stratosp...@googlegroups.com
Can you re-send the picture? I can only see text ;-)


Stephan Ewen

Jun 2, 2014, 6:05:43 PM
to stratosp...@googlegroups.com
Do you know what the cluster CPU utilization is? Is it idle or is it doing anything?

Janani

Jun 5, 2014, 9:49:43 AM
to stratosp...@googlegroups.com, se...@apache.org
Hi,

I have attached the CPU utilization of the master node in the cluster while the Stratosphere job is running.

After looking into the Hadoop installation on the cluster, we found that it had broken datanodes, so I installed a new version of Hadoop (2.2.0) and ran the Stratosphere job pointing to the new version. But it is still the same problem. The job never finishes :-( (I updated the code with a combine function and made the changes Fabian suggested.)

Regards,
Janani
Attachment: htop1.PNG