Fwd: Starfish with TPCH

Amit Sangroya

Mar 12, 2015, 5:43:01 AM
to hadoop-...@googlegroups.com
Hello,

Regarding Starfish Profiler.


I am looking into the job profile created by Starfish.

The job profile reports average timings for the map and reduce tasks.

My job has one map task and one reduce task. The reduce is configured to start only after the map completes.

So the map time plus the reduce time (including sub-phases) should equal the total job response time.

However, when I compare this sum with the job response time from the Hadoop history logs, the sum is always smaller.

Can you suggest how I can correlate the two timings?

As an example, I am attaching two files:

the job profile from Starfish, and

the Hadoop history log of the same job.

There was only one wave of map and reduce tasks, and the reduce started after the map finished.

Thanks in advance,


--

Best Regards,
Amit

---------- Forwarded message ----------
From: Amit Sangroya <sangro...@gmail.com>
Date: Wed, Mar 11, 2015 at 8:31 PM
Subject: Re: Starfish with TPCH
To: har...@cs.duke.edu
Cc: Shivnath Babu <shiv...@cs.duke.edu>, Herodotos Herodotou <herodotos...@cut.ac.cy>


Hello Herodotos,

Regarding Starfish Profiler.

I notice that the sum of the timing values (map + reduce) in the job profile is always less than the job response time from the Hadoop history logs. The job response time should match the total of the timing values in the job profile. Am I missing something?

Thanks in advance, 

--

Best Regards,
Amit

On Wed, Nov 19, 2014 at 5:13 PM, Amit Sangroya <sangro...@gmail.com> wrote:
Hello Everyone,

I am using Starfish to profile Hadoop jobs, and it works well. This is a wonderful tool.

However, I now need to profile my TPCH queries. I want to know the duration of the various sub-phases such as read, spill, merge, and sort.

I also want to know the amount of data processed in each phase.

I believe that Starfish is able to profile TPCH queries too.

This page has some information:

https://github.com/JerryLead/BenchmarkScripts/tree/master/tpch/pig

But I am not able to find the scripts exec_tpch.sh, profile_tpch.sh, and optimize_tpch.sh.

Where can I find them?

Can you guide me in detail on using Starfish to profile TPCH Hive queries?

Thanks a lot!!

--

Best Regards,
Amit


profile_job_201502241910_0001.xml
job_201502241910_0001_1424785313024_hadoop_word+count
job_201502241910_0001_conf.xml
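
For the timing comparison asked about above, one possible way to see where the difference comes from is to break the job response time down from the history log itself. The sketch below is only an illustration under assumptions: it assumes the Hadoop 1.x text history format (records such as `Job`, `MapAttempt`, and `ReduceAttempt` carrying `KEY="value"` pairs), and the `SAMPLE` text and the `summarize` helper are hypothetical, with made-up timestamps that are not taken from the attached files. It sums the successful attempt durations per task type and reports how much of the job response time lies outside the map and reduce attempts (for example job setup and cleanup tasks, or scheduling gaps between tasks).

```python
import re
from collections import defaultdict

# KEY="value" pairs, as assumed for the Hadoop 1.x job history text format.
PAIR = re.compile(r'(\w+)="([^"]*)"')

# Made-up sample for illustration only; all timestamps are in milliseconds.
SAMPLE = """\
Meta VERSION="1" .
Job JOBID="job_x_0001" SUBMIT_TIME="1000" .
MapAttempt TASK_TYPE="SETUP" TASK_ATTEMPT_ID="a_m_2_0" START_TIME="1500" .
MapAttempt TASK_TYPE="SETUP" TASK_ATTEMPT_ID="a_m_2_0" TASK_STATUS="SUCCESS" FINISH_TIME="3000" .
MapAttempt TASK_TYPE="MAP" TASK_ATTEMPT_ID="a_m_0_0" START_TIME="4000" .
MapAttempt TASK_TYPE="MAP" TASK_ATTEMPT_ID="a_m_0_0" TASK_STATUS="SUCCESS" FINISH_TIME="9000" .
ReduceAttempt TASK_TYPE="REDUCE" TASK_ATTEMPT_ID="a_r_0_0" START_TIME="10000" .
ReduceAttempt TASK_TYPE="REDUCE" TASK_ATTEMPT_ID="a_r_0_0" TASK_STATUS="SUCCESS" FINISH_TIME="16000" .
MapAttempt TASK_TYPE="CLEANUP" TASK_ATTEMPT_ID="a_m_1_0" START_TIME="17000" .
MapAttempt TASK_TYPE="CLEANUP" TASK_ATTEMPT_ID="a_m_1_0" TASK_STATUS="SUCCESS" FINISH_TIME="18500" .
Job JOBID="job_x_0001" FINISH_TIME="19000" JOB_STATUS="SUCCESS" .
"""

def summarize(history_text):
    """Compare job wall-clock time with per-type task attempt durations."""
    submit = finish = None
    started = {}                # attempt id -> (task type, start time in ms)
    by_type = defaultdict(int)  # task type -> total ms of successful attempts
    for line in history_text.splitlines():
        record = line.split(" ", 1)[0]
        attrs = dict(PAIR.findall(line))
        if record == "Job":
            if "SUBMIT_TIME" in attrs:
                submit = int(attrs["SUBMIT_TIME"])
            if "FINISH_TIME" in attrs:
                finish = int(attrs["FINISH_TIME"])
        elif record in ("MapAttempt", "ReduceAttempt"):
            attempt = attrs["TASK_ATTEMPT_ID"]
            if "START_TIME" in attrs:
                started[attempt] = (attrs["TASK_TYPE"], int(attrs["START_TIME"]))
            elif attrs.get("TASK_STATUS") == "SUCCESS":
                task_type, start = started[attempt]
                by_type[task_type] += int(attrs["FINISH_TIME"]) - start
    job_ms = finish - submit
    map_reduce_ms = by_type["MAP"] + by_type["REDUCE"]
    return {
        "job_ms": job_ms,                    # response time seen in the history log
        "by_type_ms": dict(by_type),         # time spent per task type
        "gap_ms": job_ms - map_reduce_ms,    # time outside map and reduce attempts
    }

print(summarize(SAMPLE))
```

If the real history log follows the same layout, passing its contents to `summarize` would show whether setup, cleanup, and idle time between tasks account for the part of the job response time that is missing from the map and reduce timings in the Starfish profile.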