Job Queue size continues to grow

brian spallholtz

Oct 4, 2017, 6:00:11 PM
to dr-elephant-users
Spark version 2.1.0
Hadoop version 2.6.0
Dr E version 2.0.6 

I have tried increasing the number of threads and the max heap size.
I am running Dr E on a Dell C6630 with 12 cores, 125 GB of RAM, and bonded 1 GbE networking.
Eventually the queue grows to over 600 jobs.
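For reference, here is a sketch of where the analysis thread count is usually set. It assumes the stock app-conf/GeneralConf.xml layout and the property name drelephant.analysis.thread.count; check both against your 2.0.6 install. The value 12 is only an illustration (one thread per core), not a recommendation.

```xml
<!-- app-conf/GeneralConf.xml (assumed path; illustrative excerpt) -->
<configuration>
  <property>
    <!-- Number of worker threads that analyze fetched jobs (assumed key name) -->
    <name>drelephant.analysis.thread.count</name>
    <!-- Example value only -->
    <value>12</value>
  </property>
</configuration>
```

Raising this adds analysis workers, but it does not change how each job's data is fetched.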

I see no pressure on CPU, RAM, disk, or network on either the Spark History Server or the Dr E server.

Any help would be appreciated.



10-04-2017 19:24:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 12
10-04-2017 19:24:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 11
10-04-2017 19:25:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 11
10-04-2017 19:25:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 6
10-04-2017 19:26:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 7
10-04-2017 19:26:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 5
10-04-2017 19:27:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 4
10-04-2017 19:27:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 6
10-04-2017 19:28:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 10
10-04-2017 19:28:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 9
10-04-2017 19:29:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 10
10-04-2017 19:29:31 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 11
10-04-2017 19:30:01 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 9
10-04-2017 19:30:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 11
10-04-2017 19:31:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 12
10-04-2017 19:31:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 15
10-04-2017 19:32:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 15
10-04-2017 19:32:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 13
10-04-2017 19:33:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 10
10-04-2017 19:33:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 12
10-04-2017 19:34:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 13
10-04-2017 19:34:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 16
10-04-2017 19:35:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 17
10-04-2017 19:35:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 21
10-04-2017 19:36:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 18
10-04-2017 19:36:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 21
10-04-2017 19:37:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 25
10-04-2017 19:37:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 23
10-04-2017 19:38:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 26
10-04-2017 19:38:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 19:39:02 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 25
10-04-2017 19:39:32 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 27
10-04-2017 19:40:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 19:40:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 31
10-04-2017 19:41:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 25
10-04-2017 19:41:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 19:42:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 27
10-04-2017 19:42:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 26
10-04-2017 19:43:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 22
10-04-2017 19:43:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 24
10-04-2017 19:44:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 25
10-04-2017 19:44:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 19:45:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 22
10-04-2017 19:45:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 21
10-04-2017 19:46:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 25
10-04-2017 19:46:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 27
10-04-2017 19:47:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 27
10-04-2017 19:47:33 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 29
10-04-2017 19:48:03 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 33
10-04-2017 19:48:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 35
10-04-2017 19:49:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 31
10-04-2017 19:49:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 29
10-04-2017 19:50:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 31
10-04-2017 19:50:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 33
10-04-2017 19:51:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 34
10-04-2017 19:51:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 30
10-04-2017 19:52:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 23
10-04-2017 19:52:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 23
10-04-2017 19:53:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 26
10-04-2017 19:53:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 19:54:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 29
10-04-2017 19:54:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 32
10-04-2017 19:55:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 33
10-04-2017 19:55:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 34
10-04-2017 19:56:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 31
10-04-2017 19:56:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 32
10-04-2017 19:57:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 35
10-04-2017 19:57:34 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 37
10-04-2017 19:58:04 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 30
10-04-2017 19:58:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 30
10-04-2017 19:59:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 33
10-04-2017 19:59:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 34
10-04-2017 20:00:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 37
10-04-2017 20:00:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 37
10-04-2017 20:01:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 40
10-04-2017 20:01:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 39
10-04-2017 20:02:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 35
10-04-2017 20:02:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 36
10-04-2017 20:03:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 42
10-04-2017 20:03:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 43
10-04-2017 20:04:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 43
10-04-2017 20:04:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 46
10-04-2017 20:05:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 45
10-04-2017 20:05:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 34
10-04-2017 20:06:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 29
10-04-2017 20:06:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 35
10-04-2017 20:07:05 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 35
10-04-2017 20:07:35 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 36
10-04-2017 20:08:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 38
10-04-2017 20:08:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 39
10-04-2017 20:09:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 42
10-04-2017 20:09:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 37
10-04-2017 20:10:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 27
10-04-2017 20:10:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 23
10-04-2017 20:11:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 28
10-04-2017 20:11:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 34
10-04-2017 20:12:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 36
10-04-2017 20:12:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 39
10-04-2017 20:13:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 43
10-04-2017 20:13:36 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 50
10-04-2017 20:14:06 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 52
10-04-2017 20:14:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 56
10-04-2017 20:15:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 57
10-04-2017 20:15:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 55
10-04-2017 20:16:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 58
10-04-2017 20:16:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 59
10-04-2017 20:17:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 65
10-04-2017 20:17:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 69
10-04-2017 20:18:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 72
10-04-2017 20:18:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 81
10-04-2017 20:19:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 83
10-04-2017 20:19:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 83
10-04-2017 20:20:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 85
10-04-2017 20:20:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 83
10-04-2017 20:21:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 83
10-04-2017 20:21:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 86
10-04-2017 20:22:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 91
10-04-2017 20:22:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 91
10-04-2017 20:23:07 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 94
10-04-2017 20:23:37 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 95
10-04-2017 20:24:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 98
10-04-2017 20:24:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 96
10-04-2017 20:25:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 98
10-04-2017 20:25:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 99
10-04-2017 20:26:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 101
10-04-2017 20:26:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 102
10-04-2017 20:27:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 102
10-04-2017 20:27:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 101
10-04-2017 20:28:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 105
10-04-2017 20:28:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 99
10-04-2017 20:29:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 100
10-04-2017 20:29:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 100
10-04-2017 20:30:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 103
10-04-2017 20:30:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 106
10-04-2017 20:31:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 109
10-04-2017 20:31:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 110
10-04-2017 20:32:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 112
10-04-2017 20:32:38 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 112
10-04-2017 20:33:08 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 113
10-04-2017 20:33:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 115
10-04-2017 20:34:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 116
10-04-2017 20:34:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 117
10-04-2017 20:35:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 121
10-04-2017 20:35:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 126
10-04-2017 20:36:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 129
10-04-2017 20:36:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 135
10-04-2017 20:37:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 138
10-04-2017 20:37:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 139
10-04-2017 20:38:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 141
10-04-2017 20:38:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 142
10-04-2017 20:39:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 143
10-04-2017 20:39:39 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 145
10-04-2017 20:40:09 INFO  [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 146
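Taking the first and last lines of this excerpt, the average net growth works out as follows. This is a rough average computed only from the log above: the net change is the rate jobs enter the queue minus the rate they are picked up for analysis.

```latex
\begin{aligned}
\Delta t &= \text{20:40:09} - \text{19:24:01} = 76\ \text{min}\ 8\ \text{s} \approx 76.1\ \text{min} \\
\Delta q &= 146 - 12 = 134\ \text{jobs} \\
\frac{\Delta q}{\Delta t} &\approx \frac{134}{76.1} \approx 1.76\ \text{jobs/min} \approx 106\ \text{jobs/h}
\end{aligned}
```

Measured only from 20:10:36 (23 jobs), the climb is steeper: 123 jobs in about 29.5 minutes, or roughly 4.2 jobs/min.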

Akshay Rai

Oct 5, 2017, 7:01:32 AM
to dr-elephant-users
Hi Brian,

Use MapReduceFSFetcherHadoop2, which is much faster at processing jobs. You can configure it in FetcherConf.xml.
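A minimal sketch of that change, modeled on the MapReduce fetcher entry Brian posts later in this thread. Only the class name differs. As I understand it, MapReduceFSFetcherHadoop2 reads job history files from HDFS directly instead of querying the JobHistory Server REST API. Check the FetcherConf.xml shipped with your version for any extra parameters this fetcher accepts.

```xml
<!-- FetcherConf.xml: MapReduce fetcher switched to the file-system based class -->
<fetchers>
  <fetcher>
    <applicationtype>mapreduce</applicationtype>
    <!-- Note the "FS" in the class name -->
    <classname>com.linkedin.drelephant.mapreduce.fetchers.MapReduceFSFetcherHadoop2</classname>
    <params>
      <sampling_enabled>false</sampling_enabled>
    </params>
  </fetcher>
  <!-- Leave any other fetchers (for example, the Spark one) as they are -->
</fetchers>
```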

Regards,
Akshay

brian spallholtz

Oct 5, 2017, 10:28:24 AM
to dr-elephant-users
I am already using that:

```xml
<fetchers>
  <fetcher>
    <applicationtype>mapreduce</applicationtype>
    <classname>com.linkedin.drelephant.mapreduce.fetchers.MapReduceFetcherHadoop2</classname>
    <params>
      <sampling_enabled>false</sampling_enabled>
    </params>
  </fetcher>
</fetchers>
```

Akshay Rai

Oct 7, 2017, 8:29:59 AM
to dr-elephant-users
Hi Brian, 

The one you have configured (MapReduceFetcherHadoop2) is not the one I am talking about. I mean MapReduceFSFetcherHadoop2, with "FS" in the name.


Regards,
Akshay