'Incompatible shuffle request version' in Scoobi app running on CDH5

Graham Lea

Dec 6, 2013, 4:46:24 PM
to scoobi...@googlegroups.com
Hi guys,

I've hit another snag with CDH 5 and Scoobi 0.8.0-hadoop2.

Running my job, I get the following error coming out of a task:

Incompatible shuffle request version

I found the Hadoop source code in CDH5 that generates this:

hadoop-2.2.0-cdh5.0.0-beta-1/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java

    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt)
        throws Exception {
      HttpRequest request = (HttpRequest) evt.getMessage();
      if (request.getMethod() != GET) {
          sendError(ctx, METHOD_NOT_ALLOWED);
          return;
      }
      // Check whether the shuffle version is compatible
      if (!ShuffleHeader.DEFAULT_HTTP_HEADER_NAME.equals(
          request.getHeader(ShuffleHeader.HTTP_HEADER_NAME))
          || !ShuffleHeader.DEFAULT_HTTP_HEADER_VERSION.equals(
              request.getHeader(ShuffleHeader.HTTP_HEADER_VERSION))) {
        sendError(ctx, "Incompatible shuffle request version", BAD_REQUEST);
      }
      ...

The constants are defined here:

hadoop-2.2.0-cdh5.0.0-beta-1/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleHeader.java

@InterfaceAudience.Private
@InterfaceStability.Stable
public class ShuffleHeader implements Writable {

  /** Header info of the shuffle http request/response */
  public static final String HTTP_HEADER_NAME = "name";
  public static final String DEFAULT_HTTP_HEADER_NAME = "mapreduce";
  public static final String HTTP_HEADER_VERSION = "version";
  public static final String DEFAULT_HTTP_HEADER_VERSION = "1.0.0";
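
So the handler rejects any request that doesn't carry both a "name: mapreduce" and a "version: 1.0.0" HTTP header. A minimal Scala sketch that demonstrates this against a NodeManager's shuffle port (the host is a placeholder; 13562 is the default mapreduce.shuffle.port):

import java.net.{HttpURLConnection, URL}

// Probe the shuffle servlet with and without the headers the check expects.
// Without them, every request is answered with
// "Incompatible shuffle request version", regardless of the path.
def probeShufflePort(withHeaders: Boolean): Unit = {
  val conn = new URL("http://some-nodemanager:13562/mapOutput") // placeholder host
    .openConnection().asInstanceOf[HttpURLConnection]
  if (withHeaders) {
    conn.setRequestProperty("name", "mapreduce") // HTTP_HEADER_NAME / DEFAULT_HTTP_HEADER_NAME
    conn.setRequestProperty("version", "1.0.0")  // HTTP_HEADER_VERSION / DEFAULT_HTTP_HEADER_VERSION
  }
  println(s"${conn.getResponseCode} ${conn.getResponseMessage}")
  conn.disconnect()
}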

I checked my fat jar to ensure I didn't have a different copy of anything matching Shuffle* in there, so it should all be vanilla CDH5 classes running this.
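
For anyone wanting to repeat that check, a quick way to do it from the Scala REPL (the jar path is a placeholder):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

// List every entry in the fat jar whose name mentions Shuffle;
// anything outside org/apache/hadoop would be a red flag.
val jar = new JarFile("target/my-app-assembly.jar") // placeholder path
jar.entries().asScala
  .map(_.getName)
  .filter(_.contains("Shuffle"))
  .foreach(println)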

Has anyone seen this before?
Is it likely to be a Scoobi issue, a CDH5 issue, a Hadoop MR2 issue or a problem in my ScoobiApp?
My first suspicion is that it's to do with running Scoobi, which is written against the MR1 APIs, on MR2, but I don't know where I would look to prove/disprove that.

Any help or pointers in the right direction would be very welcome.

Thanks,

Graham.

Eric Torreborre

Jan 22, 2014, 9:26:47 PM
to scoobi...@googlegroups.com
Hi Graham,

Did you solve this issue?

I'm currently only working with the org.apache.hadoop.hadoop-*-2.2.0 jars and I haven't seen this issue in my tests.

Eric.

Eric Torreborre

Mar 20, 2014, 6:42:58 PM
to scoobi...@googlegroups.com
Hi Graham,

I think I hit the same issue yesterday. The situation was the following:

 - client machine with CDH5
 - cluster with EMR/hadoop 2

I was using scoobi 0.9.0-SNAPSHOT and got the same error. When I replaced it with 0.9.0-cdh5-SNAPSHOT, things worked fine.

Eric.

Patrick Grandjean

Jun 6, 2014, 10:55:30 AM
to scoobi...@googlegroups.com
Hi Eric,

I have been trying to use the 0.8.4-cdh5 version with hadoop 2.3.0. It fixed the "Incompatible shuffle request version" problem, but a new one arose: jobs fail with the following exception:

Application application_1401293847912_0026 failed 2 times due to AM Container for appattempt_1401293847912_0026_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.

I could not find 0.9.0-cdh5-SNAPSHOT, but I compiled both scoobi and scoobi-compatibility-cdh5 locally against the hadoop 2.3.0 libraries. The error is still the same. Would you have any idea why it is failing?

Thanks,
Patrick.

Anwar Rizal

Jun 10, 2014, 7:50:28 AM
to scoobi...@googlegroups.com
Hi all,

The problem that Patrick and I are facing is actually simple: we would like to work with CDH 5. I'm wondering if there is any documentation on how to work with CDH5, since it's quite hard to understand what to do.
The scoobi-compatibility-cdh5 artifact is there, but I'm not sure how to use it.

Any clue?

Best regards,
Anwar Rizal.


Patrick Grandjean

Jun 10, 2014, 9:35:14 AM
to scoobi...@googlegroups.com
Hi Anwar,

I compiled scoobi and scoobi-compatibility-cdh5 against the libraries available in the CDH repository. I used hadoop 2.3.0-cdh5.0.0 and avro 1.7.5-cdh5.0.0. It now seems to be working correctly.

KR,
Patrick.

Eric Torreborre

Jun 10, 2014, 7:58:14 PM
to scoobi...@googlegroups.com
Hi Anwar,

The new "scoobi-compatibility-cdh5" jar contains all the CDH5-specific dependencies as well as the necessary code changes to account for the API changes in CDH5.

So if you add scoobi + scoobi-compatibility-cdh5 as dependencies in your project, and also make sure you have a clean state (no stale libraries uploaded to your cluster), then things should be good.
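
For example, in sbt, something like this (a sketch only: the group id, versions and resolvers here are my assumptions, so check the Scoobi README for the exact coordinates):

resolvers ++= Seq(
  "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
  "sonatype snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= Seq(
  "com.nicta" %% "scoobi"                    % "0.8.4-cdh5",
  "com.nicta" %% "scoobi-compatibility-cdh5" % "0.8.4-cdh5"
)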

Eric.

Eric Torreborre

Jul 16, 2014, 8:25:11 PM
to scoobi...@googlegroups.com
More feedback about this.

I had the same issue yesterday, and it turned out to be fixed by sorting out some time-outs I had in my code.

So either:

 - this error is the result of Hadoop getting lost when overwhelmed, or
 - the code triggering the error was not executed as part of the "normal" operations.

Eric.

Eric Torreborre

Jul 16, 2014, 8:25:48 PM
to scoobi...@googlegroups.com
And by the way, it was with CDH5 on the client and hadoop 2.2.0 on the cluster.

Amit Jaiswal

Jul 18, 2014, 3:56:01 PM
to scoobi...@googlegroups.com
Hi,

I am facing the same issue of 'Incompatible shuffle request version' with scoobi-0.8.5 + hadoop-2.2.0.2.0.6.0-61 (part of HDP 2.2). The error is somewhat misleading too, because on the jobtracker the tasks failed with a different exception, and it's not clear whether the two errors are related. Is there any hadoop compatibility jar for HDP that fixes this issue?

Container [pid=7281,containerID=container_1403048840935_232029_01_001073] is running beyond virtual memory limits. Current usage: 345.6 MB of 2.8 GB physical memory used; 8.9 GB of 5.8 GB virtual memory used. Killing container. Dump of the process-tree for container_1403048840935_232029_01_001073 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7382 7281 7281 7281 (java) 742 14961 4782673920 44094 /usr/java/latest/bin/java ...

Thanks,
Amit

Eric Torreborre

Jul 18, 2014, 7:35:02 PM
to scoobi...@googlegroups.com
Your log message seems to indicate that this error only occurs when the container is running into trouble. It would be worth checking whether the error disappears if you run the same program, under the same conditions, with hadoop 2.3.0 on the cluster.

Amit Jaiswal

Jul 18, 2014, 7:46:37 PM
to scoobi...@googlegroups.com
The job is run on a shared cluster, which will be upgraded to hadoop 2.4 sometime in the future. In the meantime, is there anything else that can be tried?

-regards
Amit

Russell Aronson

Jul 18, 2014, 8:20:49 PM
to scoobi...@googlegroups.com
Hi Amit,

I think the shuffle request version error is misleading. I have only ever seen it when there is another error which is the actual cause. In this case, the other error you mentioned is the container running out of memory ("8.9 GB of 5.8 GB virtual memory used. Killing container"). You need a buffer between the max container size and the Java heap, as the JVM will go over the max heap and can cause the container to be killed. Here is an example of the settings we use:

-Dmapreduce.map.memory.mb=3000 -Dmapreduce.reduce.memory.mb=3000 -Dmapreduce.map.java.opts=-Xmx1536M -Dmapreduce.reduce.java.opts=-Xmx1536M
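
Incidentally, the numbers in your log are consistent with YARN's default virtual-memory ratio (yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1): 2.8 GB physical x 2.1 = 5.88 GB, i.e. the ~5.8 GB virtual ceiling the container was killed for exceeding. So raising that ratio on the cluster, or lowering the heap as above, are the two knobs to try.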


Hope that helps

cheers

Russell


Amit Jaiswal

Jul 30, 2014, 5:01:49 PM
to scoobi...@googlegroups.com, russell...@yahoo.co.uk
Hi Russell,

I tried the configuration but it didn't help. I also tried increasing the number of reducers (from the 300 determined by scoobi to 1500), but then the reducers get stuck at 33% instead of the container being killed. A couple of other folks on my team are also facing the same issue. Are there any other suggestions?

Thanks,
Amit

Kevin C

Sep 11, 2014, 6:27:36 PM
to scoobi...@googlegroups.com, russell...@yahoo.co.uk
Have you guys resolved this problem? I'm running EMR with Hadoop 2.2.0 and have tried AMIs 3.0.3 and 3.0.4. I'm getting this message:

[INFO] MapReduceJob - MapReduce job 'job_1410473527303_0001' submitted. Please see http://10.63.162.32:9046/proxy/application_1410473527303_0001/ for more info.
[INFO] Step - Task attempt 'attempt_1410473527303_0001_m_000000_0' failed! Trying again. Please see http://ip-10-63-162-32.ec2.internal:13562/tasklog?attemptid=attempt_1410473527303_0001_m_000000_0&all=true for task attempt logs
[INFO] Step - Task attempt 'attempt_1410473527303_0001_m_000000_1' failed! Trying again. Please see http://ip-10-63-162-32.ec2.internal:13562/tasklog?attemptid=attempt_1410473527303_0001_m_000000_1&all=true for task attempt logs
[INFO] Step - Task attempt 'attempt_1410473527303_0001_m_000000_2' failed! Trying again. Please see http://ip-10-63-162-32.ec2.internal:13562/tasklog?attemptid=attempt_1410473527303_0001_m_000000_2&all=true for task attempt logs
[ERROR] Step - Task 'task_1410473527303_0001_m_000000' failed! Please see http://ip-10-63-162-32.ec2.internal:13562/tasklog?attemptid=attempt_1410473527303_0001_m_000000_3&all=true for task attempt logs

Going to that URL, regardless of path and parameters, it says:
"Incompatible shuffle request version"

Kevin C

Sep 11, 2014, 7:45:31 PM
to scoobi...@googlegroups.com, russell...@yahoo.co.uk
I fixed a bug in my Scoobi job where it worked locally but failed remotely. Going into the Hadoop Job History, I saw that my script was causing a NullPointerException, which had nothing to do with the weird "Incompatible shuffle request version" message.
After my fix, EMR works perfectly.