mapper speed - disk speed limits calculated against preset 100MB, not block size


Srinivas Kommalapati

May 7, 2017, 3:00:09 PM
to dr-elephant-users
Dr.Elephant team,

I see the following line in HDFSContext.java (note the 'final' and the '100' in this line):

public static final long DISK_READ_SPEED = 100 * 1024 * 1024;


But when I look at MapperSpeedHeuristic.java, the disk speed limits are defined as fractions of the block size:

private double[] diskSpeedLimits = {1d/2, 1d/4, 1d/8, 1d/32};  // Fraction of HDFS block size



I feel that these disk speed limits are calculated against the preset 100 MB/s, not against the block size. Please provide the rationale for this.
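
For reference, here is a minimal illustration of how I read the two snippets together (the class below is something I put together for illustration, not code from the repository), assuming each fraction gets multiplied by DISK_READ_SPEED:

public class EffectiveLimits {
    // Preset from HDFSContext.java: 100 MB/s expressed in bytes per second
    public static final long DISK_READ_SPEED = 100 * 1024 * 1024;
    // Fractions from MapperSpeedHeuristic.java
    private static final double[] DISK_SPEED_LIMITS = {1d / 2, 1d / 4, 1d / 8, 1d / 32};

    public static void main(String[] args) {
        // Under this reading, the limits come out to 50, 25, 12.5 and ~3.1 MB/s,
        // independent of the configured HDFS block size.
        for (double fraction : DISK_SPEED_LIMITS) {
            double limitMBps = fraction * DISK_READ_SPEED / (1024 * 1024);
            System.out.printf("fraction %.4f -> limit %.1f MB/s%n", fraction, limitMBps);
        }
    }
}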



Srinivas Kommalapati

May 7, 2017, 4:55:26 PM
to dr-elephant-users
Also, why is "FILE_BYTES_READ" not considered while calculating the mapper speed?

Akshay Rai

May 8, 2017, 6:24:09 AM
to dr-elephant-users
Hi Srinivas,

Mapper Speed tries to express the effectiveness of the mapper code. The heuristic flags long-running jobs that load very little HDFS data (CPU-bound jobs); such jobs are most likely performing unnecessary CPU work.

The threshold values for the read-speed severity (diskSpeedLimits) should be read as fractions applied to the mapper read speed, not to the HDFS block size. The variable names and the comments in the code are a bit misleading.

Note that Mapper Read Speed = HDFS_BYTES_READ / Runtime of Mapper
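
To make the check concrete, here is a simplified, self-contained sketch of the idea. The class, field, and method names below are illustrative, not the exact Dr. Elephant classes, and the real heuristic also looks at other counters and short mapper runtimes:

class MapperSpeedSketch {
    enum Severity { NONE, LOW, MODERATE, SEVERE, CRITICAL }

    // Cluster preset: 100 MB/s expressed in bytes per second
    static final long DISK_READ_SPEED = 100L * 1024 * 1024;
    // Severity thresholds as fractions of the preset read speed
    static final double[] DISK_SPEED_LIMITS = {1d / 2, 1d / 4, 1d / 8, 1d / 32};

    // Mapper Read Speed = HDFS_BYTES_READ / runtime of the mapper
    static double readSpeedBytesPerSec(long hdfsBytesRead, long runtimeMs) {
        return hdfsBytesRead / (runtimeMs / 1000.0);
    }

    // Slower read speeds fall under smaller fractions and map to higher severity
    static Severity severityOf(double speedBytesPerSec) {
        if (speedBytesPerSec >= DISK_SPEED_LIMITS[0] * DISK_READ_SPEED) return Severity.NONE;
        if (speedBytesPerSec >= DISK_SPEED_LIMITS[1] * DISK_READ_SPEED) return Severity.LOW;
        if (speedBytesPerSec >= DISK_SPEED_LIMITS[2] * DISK_READ_SPEED) return Severity.MODERATE;
        if (speedBytesPerSec >= DISK_SPEED_LIMITS[3] * DISK_READ_SPEED) return Severity.SEVERE;
        return Severity.CRITICAL;
    }

    public static void main(String[] args) {
        long hdfsBytesRead = 256L * 1024 * 1024;  // example: mapper read 256 MB from HDFS
        long runtimeMs = 10L * 60 * 1000;         // example: mapper ran for 10 minutes
        double speed = readSpeedBytesPerSec(hdfsBytesRead, runtimeMs);
        System.out.printf("~%.2f MB/s -> %s%n", speed / (1024 * 1024), severityOf(speed));
    }
}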

100 MB/sec is a value we set for our cluster. If 100 MB/sec is not optimal or doesn't suit your needs (e.g., it flags every job as critical), we should tune this value or make it configurable.
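
One possible way to make it configurable, just as a sketch (the parameter name "disk_read_speed_in_mb" and the params map are hypothetical, not an existing Dr. Elephant setting):

import java.util.Map;

class DiskSpeedConfig {
    // Current hard-coded preset: 100 MB/s in bytes per second
    static final long DEFAULT_DISK_READ_SPEED = 100L * 1024 * 1024;

    // Read an optional override from a heuristic params map; fall back to the preset
    static long diskReadSpeed(Map<String, String> params) {
        String value = params.get("disk_read_speed_in_mb");  // hypothetical parameter name
        if (value == null) {
            return DEFAULT_DISK_READ_SPEED;
        }
        return Long.parseLong(value) * 1024 * 1024;  // interpret the override as MB/s
    }
}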

Thanks,
Akshay 