alluxio.security.authorization.permission.umask=000
alluxio.worker.block.heartbeat.timeout.ms=300000
alluxio.worker.filesystem.heartbeat.interval.ms=60000
alluxio.worker.block.heartbeat.interval.ms=60000
Alluxio Summary
Master Address: aa02.prod.com/10.3.3.5:19998
Started: 11-30-2016 05:13:38:281
Uptime: 0 day(s), 2 hour(s), 19 minute(s), and 46 second(s)
Version: 1.3.0
Running Workers: 20
Cluster Usage Summary
Workers Capacity: 2000.00GB
Workers Free / Used: 2000.00GB / 0.00B
UnderFS Capacity: 78.28TB
UnderFS Free / Used: 69.53TB / 8.75TB
Live Workers
Node Name      Last Heartbeat  State       Workers Capacity  Space Used  Space Usage
10.3.3.10      38              In Service  100.00GB          0.00B       100%Free
10.3.3.11      30              In Service  100.00GB          0.00B       100%Free
10.3.3.12      38              In Service  100.00GB          0.00B       100%Free
10.3.3.13      38              In Service  100.00GB          0.00B       100%Free
10.3.3.19      39              In Service  100.00GB          0.00B       100%Free
10.3.3.20      39              In Service  100.00GB          0.00B       100%Free
10.3.3.21      39              In Service  100.00GB          0.00B       100%Free
10.3.3.23      38              In Service  100.00GB          0.00B       100%Free
10.3.3.73      38              In Service  100.00GB          0.00B       100%Free
10.3.3.74      39              In Service  100.00GB          0.00B       100%Free
10.3.3.75      38              In Service  100.00GB          0.00B       100%Free
10.3.3.76      39              In Service  100.00GB          0.00B       100%Free
10.3.3.77      39              In Service  100.00GB          0.00B       100%Free
10.3.3.83      38              In Service  100.00GB          0.00B       100%Free
10.3.3.84      38              In Service  100.00GB          0.00B       100%Free
10.3.3.85      39              In Service  100.00GB          0.00B       100%Free
10.3.3.86      39              In Service  100.00GB          0.00B       100%Free
10.3.3.87      39              In Service  100.00GB          0.00B       100%Free
10.3.3.9       36              In Service  100.00GB          0.00B       100%Free
ab09.prod.com  38              In Service  100.00GB          0.00B       100%Free
Lost Workers
Node Name Last Heartbeat Workers Capacity
[zk: localhost:2181(CONNECTED) 32] ls /leader
[zk: localhost:2181(CONNECTED) 33] get /leader/aa02.prod.com:19998
10.3.3.5
cZxid = 0x30000082e
ctime = Wed Nov 30 05:13:20 GMT 2016
mZxid = 0x30000082e
mtime = Wed Nov 30 05:13:20 GMT 2016
pZxid = 0x30000082e
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 0
[zk: localhost:2181(CONNECTED) 34] get /leader/ab02.prod.com:19998
10.3.3.15
cZxid = 0x300000826
ctime = Wed Nov 30 04:10:15 GMT 2016
mZxid = 0x300000826
mtime = Wed Nov 30 04:10:15 GMT 2016
pZxid = 0x300000826
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 10
numChildren = 0
[zk: localhost:2181(CONNECTED) 35] ls /election
[_c_10613ed9-58f1-4633-8ee6-9450b3a991e6-lock-0000000002]
[zk: localhost:2181(CONNECTED) 36] get /election/_c_10613ed9-58f1-4633-8ee6-9450b3a991e6-lock-0000000002
cZxid = 0x30000082b
ctime = Wed Nov 30 05:12:00 GMT 2016
mZxid = 0x30000082b
mtime = Wed Nov 30 05:12:00 GMT 2016
pZxid = 0x30000082b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15876ca48ee00b3
dataLength = 31
numChildren = 0
[zk: localhost:2181(CONNECTED) 37]
2016-11-30 07:38:57,060 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting2016-11-30 07:38:57,076 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2016-11-30 07:38:57,078 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:host.name=10.3.3.152016-11-30 07:38:57,078 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.version=1.7.0_1212016-11-30 07:38:57,078 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.vendor=Oracle Corporation2016-11-30 07:38:57,078 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.home=/usr/lib/jvm/java-1.7-openjdk/jre2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.class.path=/usr/local/alluxio-1.3.0/conf/::/usr/local/alluxio-1.3.0/assembly/target/alluxio-assemblies-1.3.0-jar-with-dependencies.jar2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.io.tmpdir=/tmp2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.compiler=<NA>2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.name=Linux2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.arch=amd642016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.version=4.6.3-coreos2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.name=root2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.home=/root2016-11-30 07:38:57,079 INFO zookeeper.ZooKeeper 
(Environment.java:logEnv) - Client environment:user.dir=/2016-11-30 07:38:57,082 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>) - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@10bc5c92016-11-30 07:38:57,099 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 192.168.192.20/192.168.192.20:2181. Will not attempt to authenticate using SASL (unknown error)2016-11-30 07:38:57,104 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 192.168.192.20/192.168.192.20:2181, initiating session2016-11-30 07:38:57,110 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 192.168.192.20/192.168.192.20:2181, sessionid = 0x25876ca490b00b7, negotiated timeout = 600002016-11-30 07:38:57,112 INFO zookeeper.ZooKeeper (ZooKeeper.java:close) - Session: 0x25876ca490b00b7 closed2016-11-30 07:38:57,112 INFO zookeeper.ClientCnxn (ClientCnxn.java:run) - EventThread shut down2016-11-30 07:38:57,112 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting2016-11-30 07:38:57,112 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>) - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@65c98bf32016-11-30 07:38:57,114 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 192.168.192.20/192.168.192.20:2181. 
Will not attempt to authenticate using SASL (unknown error)2016-11-30 07:38:57,114 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 192.168.192.20/192.168.192.20:2181, initiating session2016-11-30 07:38:57,116 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 192.168.192.20/192.168.192.20:2181, sessionid = 0x25876ca490b00b8, negotiated timeout = 600002016-11-30 07:38:57,119 INFO state.ConnectionStateManager (ConnectionStateManager.java:addStateChange) - State change: CONNECTED2016-11-30 07:38:57,120 WARN state.ConnectionStateManager (ConnectionStateManager.java:processEvents) - There are no ConnectionStateListeners registered.2016-11-30 07:38:57,123 INFO logger.type (AbstractMaster.java:stop) - BlockMaster: Stopping standby master.2016-11-30 07:38:57,123 INFO logger.type (AbstractMaster.java:stop) - FileSystemMaster: Stopping standby master.2016-11-30 07:38:57,127 INFO logger.type (AbstractMaster.java:start) - BlockMaster: Starting standby master.2016-11-30 07:38:57,129 INFO logger.type (JournalTailerThread.java:run) - BlockMaster: Journal tailer started.2016-11-30 07:38:57,129 INFO logger.type (JournalTailerThread.java:run) - BlockMaster: Waiting to load the checkpoint file.2016-11-30 07:38:57,136 INFO logger.type (AbstractMaster.java:start) - FileSystemMaster: Starting standby master.2016-11-30 07:38:57,137 INFO logger.type (JournalTailerThread.java:run) - FileSystemMaster: Journal tailer started.2016-11-30 07:38:57,137 INFO logger.type (JournalTailerThread.java:run) - FileSystemMaster: Waiting to load the checkpoint file.2016-11-30 07:38:57,140 INFO logger.type (LeaderSelectorClient.java:stateChanged) - The current leader is aa02.prod.com:199982016-11-30 07:38:57,169 INFO logger.type (JournalTailerThread.java:run) - FileSystemMaster: Start loading the checkpoint file.2016-11-30 07:38:57,169 INFO logger.type (JournalTailer.java:processJournalCheckpoint) - 
FileSystemMaster: Loading checkpoint file: /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data2016-11-30 07:38:57,169 INFO logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data2016-11-30 07:38:57,170 INFO logger.type (JournalTailerThread.java:run) - BlockMaster: Start loading the checkpoint file.2016-11-30 07:38:57,170 INFO logger.type (JournalTailer.java:processJournalCheckpoint) - BlockMaster: Loading checkpoint file: /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data2016-11-30 07:38:57,170 INFO logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data2016-11-30 07:38:57,202 INFO logger.type (JournalTailerThread.java:run) - BlockMaster: Checkpoint file has been loaded.2016-11-30 07:38:57,218 INFO logger.type (JournalTailerThread.java:run) - FileSystemMaster: Checkpoint file has been loaded.2016-11-30 07:40:10,005 INFO logger.type (LeaderSelectorClient.java:takeLeadership) - deleting zk path: /leader/ab02.prod.com:199982016-11-30 07:40:10,008 INFO logger.type (LeaderSelectorClient.java:takeLeadership) - creating zk path: /leader/ab02.prod.com:199982016-11-30 07:40:10,009 INFO logger.type (LeaderSelectorClient.java:takeLeadership) - ab02.prod.com:19998 is now the leader.2016-11-30 07:40:10,011 INFO logger.type (AbstractMaster.java:stop) - BlockMaster: Stopping standby master.2016-11-30 07:40:10,011 INFO logger.type (JournalTailerThread.java:shutdown) - BlockMaster: Journal tailer shutdown has been initiated.2016-11-30 07:40:15,767 INFO logger.type (JournalTailerThread.java:run) - BlockMaster: Journal tailer has been shutdown. 
No new logs for the quiet period.2016-11-30 07:40:15,767 INFO logger.type (AbstractMaster.java:stop) - FileSystemMaster: Stopping standby master.2016-11-30 07:40:15,767 INFO logger.type (JournalTailerThread.java:shutdown) - FileSystemMaster: Journal tailer shutdown has been initiated.2016-11-30 07:40:21,845 INFO logger.type (JournalTailerThread.java:run) - FileSystemMaster: Journal tailer has been shutdown. No new logs for the quiet period.2016-11-30 07:40:21,846 INFO logger.type (AbstractMaster.java:start) - BlockMaster: Starting leader master.2016-11-30 07:40:21,847 INFO logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.2016-11-30 07:40:21,871 INFO logger.type (AbstractMaster.java:start) - BlockMaster: finish processing remaining journal entries (standby -> master).2016-11-30 07:40:21,880 INFO logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data.tmp2016-11-30 07:40:21,880 INFO logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 4 Next journal sequence number: 52016-11-30 07:40:21,962 INFO logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data.tmp2016-11-30 07:40:22,217 INFO logger.type (JournalWriter.java:close) - Renamed checkpoint file /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data.tmp to /mnt/glusterfs/alluxio/journal/BlockMaster/checkpoint.data2016-11-30 07:40:22,218 INFO logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...2016-11-30 07:40:22,223 INFO logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.2016-11-30 07:40:22,260 INFO logger.type (MountTable.java:add) - Mounting /mnt/glusterfs at /2016-11-30 07:40:22,261 INFO logger.type (AbstractMaster.java:start) - FileSystemMaster: Starting leader master.2016-11-30 
07:40:22,261 INFO logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.2016-11-30 07:40:23,125 INFO logger.type (JournalWriter.java:completeCurrentLog) - Completed current log: /mnt/glusterfs/alluxio/journal/FileSystemMaster/log.out to completed log: /mnt/glusterfs/alluxio/journal/FileSystemMaster/completed/log.000000000000000000012016-11-30 07:40:23,126 INFO logger.type (AbstractMaster.java:start) - FileSystemMaster: finish processing remaining journal entries (standby -> master).2016-11-30 07:40:23,134 INFO logger.type (JournalReader.java:getNextInputStream) - Opening journal log file: /mnt/glusterfs/alluxio/journal/FileSystemMaster/completed/log.000000000000000000012016-11-30 07:40:23,134 INFO logger.type (JournalTailer.java:processNextJournalLogFiles) - FileSystemMaster: Processing a completed log file.2016-11-30 07:40:23,139 INFO logger.type (JournalTailer.java:processNextJournalLogFiles) - FileSystemMaster: Finished processing the log file.2016-11-30 07:40:23,159 INFO logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data.tmp2016-11-30 07:40:23,159 INFO logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 21 Next journal sequence number: 222016-11-30 07:40:23,189 INFO logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data.tmp2016-11-30 07:40:23,282 INFO logger.type (JournalWriter.java:close) - Renamed checkpoint file /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data.tmp to /mnt/glusterfs/alluxio/journal/FileSystemMaster/checkpoint.data2016-11-30 07:40:23,282 INFO logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...2016-11-30 07:40:23,282 INFO logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting completed log: 
/mnt/glusterfs/alluxio/journal/FileSystemMaster/completed/log.000000000000000000012016-11-30 07:40:23,347 INFO logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.2016-11-30 07:40:23,357 INFO logger.type (MetricsSystem.java:startSinksFromConfig) - Starting sinks with config: {}.2016-11-30 07:40:23,366 INFO util.log (Log.java:initialized) - Logging initialized @87036ms2016-11-30 07:40:23,487 INFO server.Server (Server.java:doStart) - jetty-9.2.z-SNAPSHOT2016-11-30 07:40:23,507 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.s.ServletContextHandler@73a1eeaa{/metrics/json,null,AVAILABLE}2016-11-30 07:40:29,353 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.w.WebAppContext@1246406b{/,file:/usr/local/alluxio-1.3.0/core/server/src/main/webapp/,AVAILABLE}{/usr/local/alluxio-1.3.0/core/server/src/main/webapp}2016-11-30 07:40:29,359 INFO server.ServerConnector (AbstractConnector.java:doStart) - Started ServerConnector@680bf396{HTTP/1.1}{0.0.0.0:19999}2016-11-30 07:40:29,359 INFO server.Server (Server.java:doStart) - Started @93029ms2016-11-30 07:40:29,360 INFO logger.type (UIWebServer.java:startWebServer) - Alluxio Master Web service started @ /0.0.0.0:199992016-11-30 07:40:29,360 INFO logger.type (AlluxioMaster.java:startServing) - Alluxio master version 1.3.0 started @ ab02.prod.com/10.3.3.15:19998 (gained leadership)2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 8 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 13 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 16 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 2 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not 
find worker id: 9 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 1 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 5 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 18 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 15 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 20 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 3 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 6 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 11 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 7 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 10 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 4 for heartbeat.2016-11-30 07:40:29,451 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 12 for heartbeat.2016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.87, rpcPort=29998, dataPort=29999, webPort=30000} id: 32016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.74, rpcPort=29998, dataPort=29999, webPort=30000} id: 62016-11-30 07:40:29,469 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.20, 
rpcPort=29998, dataPort=29999, webPort=30000} id: 142016-11-30 07:40:29,469 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.23, rpcPort=29998, dataPort=29999, webPort=30000} id: 162016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.10, rpcPort=29998, dataPort=29999, webPort=30000} id: 12016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.76, rpcPort=29998, dataPort=29999, webPort=30000} id: 42016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.12, rpcPort=29998, dataPort=29999, webPort=30000} id: 102016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.21, rpcPort=29998, dataPort=29999, webPort=30000} id: 82016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.86, rpcPort=29998, dataPort=29999, webPort=30000} id: 112016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.73, rpcPort=29998, dataPort=29999, webPort=30000} id: 22016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.75, rpcPort=29998, dataPort=29999, webPort=30000} id: 172016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=ab09.prod.com, rpcPort=29998, dataPort=29999, webPort=30000} id: 132016-11-30 07:40:29,468 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.85, rpcPort=29998, dataPort=29999, webPort=30000} id: 52016-11-30 07:40:29,469 INFO 
logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.83, rpcPort=29998, dataPort=29999, webPort=30000} id: 122016-11-30 07:40:29,469 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.84, rpcPort=29998, dataPort=29999, webPort=30000} id: 152016-11-30 07:40:29,469 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.19, rpcPort=29998, dataPort=29999, webPort=30000} id: 72016-11-30 07:40:29,469 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=10.3.3.77, rpcPort=29998, dataPort=29999, webPort=30000} id: 92016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=3, workerAddress=WorkerNetAddress{host=10.3.3.87, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=5, workerAddress=WorkerNetAddress{host=10.3.3.85, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=11, workerAddress=WorkerNetAddress{host=10.3.3.86, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=6, workerAddress=WorkerNetAddress{host=10.3.3.74, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - 
registerWorker(): MasterWorkerInfo{id=2, workerAddress=WorkerNetAddress{host=10.3.3.73, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=14, workerAddress=WorkerNetAddress{host=10.3.3.20, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=7, workerAddress=WorkerNetAddress{host=10.3.3.19, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=1, workerAddress=WorkerNetAddress{host=10.3.3.10, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=13, workerAddress=WorkerNetAddress{host=ab09.prod.com, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=10, workerAddress=WorkerNetAddress{host=10.3.3.12, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=16, workerAddress=WorkerNetAddress{host=10.3.3.23, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, 
blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=4, workerAddress=WorkerNetAddress{host=10.3.3.76, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=8, workerAddress=WorkerNetAddress{host=10.3.3.21, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=17, workerAddress=WorkerNetAddress{host=10.3.3.75, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=9, workerAddress=WorkerNetAddress{host=10.3.3.77, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=15, workerAddress=WorkerNetAddress{host=10.3.3.84, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:29,477 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=12, workerAddress=WorkerNetAddress{host=10.3.3.83, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491629476, blocks=[]}2016-11-30 07:40:39,654 WARN logger.type (BlockMaster.java:workerHeartbeat) - Could not find worker id: 19 for heartbeat.2016-11-30 07:40:39,657 INFO logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): 
WorkerNetAddress: WorkerNetAddress{host=10.3.3.9, rpcPort=29998, dataPort=29999, webPort=30000} id: 18
2016-11-30 07:40:39,660 INFO logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=18, workerAddress=WorkerNetAddress{host=10.3.3.9, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=107374182400, usedBytes=0, lastUpdatedTimeMs=1480491639659, blocks=[]}
2016-11-30 07:42:04,598 INFO logger.type (LeaderInquireClient.java:<init>) - create new zookeeper client. address: zookeeper:2181
2016-11-30 07:42:04,598 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting
2016-11-30 07:42:04,599 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>) - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@79111dfc
2016-11-30 07:42:04,601 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 192.168.192.20/192.168.192.20:2181. Will not attempt to authenticate using SASL (unknown error)
2016-11-30 07:42:04,601 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 192.168.192.20/192.168.192.20:2181, initiating session
2016-11-30 07:42:04,603 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 192.168.192.20/192.168.192.20:2181, sessionid = 0x15876ca48ee00bd, negotiated timeout = 60000
2016-11-30 07:42:04,604 INFO state.ConnectionStateManager (ConnectionStateManager.java:addStateChange) - State change: CONNECTED
2016-11-30 07:42:04,604 WARN state.ConnectionStateManager (ConnectionStateManager.java:processEvents) - There are no ConnectionStateListeners registered.
2016-11-30 07:42:04,605 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]
2016-11-30 07:42:04,606 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:19998
2016-11-30 07:42:05,797 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]
2016-11-30 07:42:05,798 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:19998
2016-11-30 07:42:07,778 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]
2016-11-30 07:42:07,779 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:19998
2016-11-30 07:42:10,636 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]
2016-11-30 07:42:10,638 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:19998
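For anyone comparing worker ids before and after the failover, here is a minimal sketch (not part of the original logs) that pulls two things out of a master log like the one above: the worker ids the new leader rejected with "Could not find worker id", and the ids it assigned in registerWorker(). The function name `summarize_master_log` is hypothetical, and the regular expressions assume only the two message formats visible in this log.

```python
import re

# Patterns based on the two BlockMaster messages seen in the log above (an assumption:
# other Alluxio versions may word these messages differently).
HEARTBEAT_MISS = re.compile(r"Could not find worker id: (\d+) for heartbeat")
REGISTERED = re.compile(
    r"registerWorker\(\): MasterWorkerInfo\{id=(\d+), "
    r"workerAddress=WorkerNetAddress\{host=([^,]+),"
)


def summarize_master_log(log_text):
    """Return (rejected heartbeat ids, {registered id: host}) found in log_text."""
    # Collapse whitespace so entries wrapped across lines still match.
    text = re.sub(r"\s+", " ", log_text)
    rejected = sorted({int(worker_id) for worker_id in HEARTBEAT_MISS.findall(text)})
    registered = {int(worker_id): host for worker_id, host in REGISTERED.findall(text)}
    return rejected, registered


if __name__ == "__main__":
    sample = (
        "2016-11-30 07:40:39,654 WARN logger.type (BlockMaster.java:workerHeartbeat) - "
        "Could not find worker id: 19 for heartbeat."
        "2016-11-30 07:40:39,660 INFO logger.type (BlockMaster.java:workerRegister) - "
        "registerWorker(): MasterWorkerInfo{id=18, workerAddress=WorkerNetAddress{"
        "host=10.3.3.9, rpcPort=29998, dataPort=29999, webPort=30000}, "
        "capacityBytes=107374182400, usedBytes=0}"
    )
    print(summarize_master_log(sample))
```

The patterns do not depend on line boundaries, so the sketch also works on a log where consecutive entries have been run together as in the paste above.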
2016-11-30 07:27:05,015 INFO util.log (Log.java:initialized) - Logging initialized @772ms2016-11-30 07:27:05,317 INFO logger.type (MetricsSystem.java:startSinksFromConfig) - Starting sinks with config: {}.2016-11-30 07:27:05,321 INFO server.Server (Server.java:doStart) - jetty-9.2.z-SNAPSHOT2016-11-30 07:27:05,352 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.s.ServletContextHandler@78d88d3e{/metrics/json,null,AVAILABLE}2016-11-30 07:27:09,642 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.w.WebAppContext@54d9a624{/,file:/usr/local/alluxio-1.3.0/core/server/src/main/webapp/,AVAILABLE}{/usr/local/alluxio-1.3.0/core/server/src/main/webapp}2016-11-30 07:27:09,649 INFO server.ServerConnector (AbstractConnector.java:doStart) - Started ServerConnector@7272be94{HTTP/1.1}{0.0.0.0:30000}2016-11-30 07:27:09,649 INFO server.Server (Server.java:doStart) - Started @5408ms2016-11-30 07:27:09,649 INFO logger.type (UIWebServer.java:startWebServer) - Alluxio worker web service started @ /0.0.0.0:300002016-11-30 07:27:09,652 INFO logger.type (LeaderInquireClient.java:<init>) - create new zookeeper client. 
address: zookeeper:21812016-11-30 07:27:09,708 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting2016-11-30 07:27:09,716 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2016-11-30 07:27:09,717 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:host.name=10.3.3.862016-11-30 07:27:09,717 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.version=1.7.0_1212016-11-30 07:27:09,717 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.vendor=Oracle Corporation2016-11-30 07:27:09,717 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.home=/usr/lib/jvm/java-1.7-openjdk/jre2016-11-30 07:27:09,717 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.class.path=/usr/local/alluxio-1.3.0/conf/::/usr/local/alluxio-1.3.0/assembly/target/alluxio-assemblies-1.3.0-jar-with-dependencies.jar2016-11-30 07:27:09,718 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib2016-11-30 07:27:09,718 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.io.tmpdir=/tmp2016-11-30 07:27:09,718 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.compiler=<NA>2016-11-30 07:27:09,719 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.name=Linux2016-11-30 07:27:09,719 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.arch=amd642016-11-30 07:27:09,719 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.version=4.6.3-coreos2016-11-30 07:27:09,719 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.name=root2016-11-30 07:27:09,719 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.home=/root2016-11-30 07:27:09,719 
INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.dir=/2016-11-30 07:27:09,720 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>) - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@318603db2016-11-30 07:27:09,736 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 192.168.192.20/192.168.192.20:2181. Will not attempt to authenticate using SASL (unknown error)2016-11-30 07:27:09,740 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 192.168.192.20/192.168.192.20:2181, initiating session2016-11-30 07:27:09,747 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 192.168.192.20/192.168.192.20:2181, sessionid = 0x35876ca7b4400cb, negotiated timeout = 600002016-11-30 07:27:09,750 INFO state.ConnectionStateManager (ConnectionStateManager.java:addStateChange) - State change: CONNECTED2016-11-30 07:27:09,750 WARN state.ConnectionStateManager (ConnectionStateManager.java:processEvents) - There are no ConnectionStateListeners registered.2016-11-30 07:27:10,768 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:27:10,769 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:27:10,771 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with BlockMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:27:10,794 INFO logger.type (AbstractClient.java:connect) - Client registered with BlockMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:27:10,837 INFO logger.type (DefaultAlluxioWorker.java:start) - Started Alluxio worker with id 112016-11-30 07:27:10,837 INFO logger.type (DefaultAlluxioWorker.java:start) - Alluxio worker version 1.3.0 
started @ /0.0.0.0:299982016-11-30 07:27:10,839 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:27:10,841 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:27:10,841 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:27:10,844 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:27:10,845 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:27:10,845 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:27:10,845 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:27:10,848 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:40:10,838 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.BlockMasterWorkerService$Client.recv_heartbeat(BlockMasterWorkerService.java:201) at alluxio.thrift.BlockMasterWorkerService$Client.heartbeat(BlockMasterWorkerService.java:185) at alluxio.worker.block.BlockMasterClient$3.call(BlockMasterClient.java:135) at alluxio.worker.block.BlockMasterClient$3.call(BlockMasterClient.java:132) at alluxio.AbstractClient.retryRPC(AbstractClient.java:317) at alluxio.worker.block.BlockMasterClient.heartbeat(BlockMasterClient.java:132) at alluxio.worker.block.BlockMasterSync.heartbeat(BlockMasterSync.java:148) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:10,838 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.FileSystemMasterWorkerService$Client.recv_heartbeat(FileSystemMasterWorkerService.java:162) at alluxio.thrift.FileSystemMasterWorkerService$Client.heartbeat(FileSystemMasterWorkerService.java:148) at alluxio.worker.file.FileSystemMasterClient$3.call(FileSystemMasterClient.java:120) at alluxio.worker.file.FileSystemMasterClient$3.call(FileSystemMasterClient.java:117) at alluxio.AbstractClient.retryRPC(AbstractClient.java:348) at alluxio.worker.file.FileSystemMasterClient.heartbeat(FileSystemMasterClient.java:117) at alluxio.worker.file.FileWorkerMasterSyncExecutor.heartbeat(FileWorkerMasterSyncExecutor.java:88) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:10,838 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at 
org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.FileSystemMasterWorkerService$Client.recv_getPinIdList(FileSystemMasterWorkerService.java:135) at alluxio.thrift.FileSystemMasterWorkerService$Client.getPinIdList(FileSystemMasterWorkerService.java:123) at alluxio.worker.file.FileSystemMasterClient$2.call(FileSystemMasterClient.java:101) at alluxio.worker.file.FileSystemMasterClient$2.call(FileSystemMasterClient.java:98) at alluxio.AbstractClient.retryRPC(AbstractClient.java:317) at alluxio.worker.file.FileSystemMasterClient.getPinList(FileSystemMasterClient.java:98) at alluxio.worker.block.PinListSync.heartbeat(PinListSync.java:55) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:10,842 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:10,844 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:10,844 INFO logger.type 
(AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with BlockMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:10,845 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:10,848 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:10,849 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:10,850 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:10,851 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:10,851 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:29,437 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:29,437 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:29,437 INFO logger.type (AbstractClient.java:connect) - Client registered with BlockMasterWorker master @ ab02.prod.com/10.3.3.15:19998
2016-11-30 07:33:13,689 INFO util.log (Log.java:initialized) - Logging initialized @939ms2016-11-30 07:33:14,001 INFO logger.type (MetricsSystem.java:startSinksFromConfig) - Starting sinks with config: {}.2016-11-30 07:33:14,007 INFO server.Server (Server.java:doStart) - jetty-9.2.z-SNAPSHOT2016-11-30 07:33:14,034 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.s.ServletContextHandler@51211f6d{/metrics/json,null,AVAILABLE}2016-11-30 07:33:18,382 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.w.WebAppContext@5dcac79c{/,file:/usr/local/alluxio-1.3.0/core/server/src/main/webapp/,AVAILABLE}{/usr/local/alluxio-1.3.0/core/server/src/main/webapp}2016-11-30 07:33:18,390 INFO server.ServerConnector (AbstractConnector.java:doStart) - Started ServerConnector@1e62e899{HTTP/1.1}{0.0.0.0:30000}2016-11-30 07:33:18,390 INFO server.Server (Server.java:doStart) - Started @5643ms2016-11-30 07:33:18,391 INFO logger.type (UIWebServer.java:startWebServer) - Alluxio worker web service started @ /0.0.0.0:300002016-11-30 07:33:18,393 INFO logger.type (LeaderInquireClient.java:<init>) - create new zookeeper client. 
address: zookeeper:21812016-11-30 07:33:18,454 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting2016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:host.name=10.3.3.112016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.version=1.7.0_1212016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.vendor=Oracle Corporation2016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.home=/usr/lib/jvm/java-1.7-openjdk/jre2016-11-30 07:33:18,464 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.class.path=/usr/local/alluxio-1.3.0/conf/::/usr/local/alluxio-1.3.0/assembly/target/alluxio-assemblies-1.3.0-jar-with-dependencies.jar2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.io.tmpdir=/tmp2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.compiler=<NA>2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.name=Linux2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.arch=amd642016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.version=4.6.3-coreos2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.name=root2016-11-30 07:33:18,466 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.home=/root2016-11-30 07:33:18,467 
INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.dir=/2016-11-30 07:33:18,468 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>) - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@377c0e2a2016-11-30 07:33:18,487 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 192.168.192.20/192.168.192.20:2181. Will not attempt to authenticate using SASL (unknown error)2016-11-30 07:33:18,491 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 192.168.192.20/192.168.192.20:2181, initiating session2016-11-30 07:33:18,500 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 192.168.192.20/192.168.192.20:2181, sessionid = 0x15876ca48ee00bc, negotiated timeout = 600002016-11-30 07:33:18,503 INFO state.ConnectionStateManager (ConnectionStateManager.java:addStateChange) - State change: CONNECTED2016-11-30 07:33:18,503 WARN state.ConnectionStateManager (ConnectionStateManager.java:processEvents) - There are no ConnectionStateListeners registered.2016-11-30 07:33:19,517 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:33:19,519 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:33:19,520 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with BlockMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:33:19,540 INFO logger.type (AbstractClient.java:connect) - Client registered with BlockMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:33:19,600 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:33:19,600 INFO logger.type 
(DefaultAlluxioWorker.java:start) - Started Alluxio worker with id 172016-11-30 07:33:19,600 INFO logger.type (DefaultAlluxioWorker.java:start) - Alluxio worker version 1.3.0 started @ /0.0.0.0:299982016-11-30 07:33:19,602 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:33:19,602 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:33:19,604 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:33:19,605 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:33:19,607 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: aa02.prod.com:199982016-11-30 07:33:19,607 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:33:19,609 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ aa02.prod.com/10.3.3.5:199982016-11-30 07:40:19,599 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.BlockMasterWorkerService$Client.recv_heartbeat(BlockMasterWorkerService.java:201) at alluxio.thrift.BlockMasterWorkerService$Client.heartbeat(BlockMasterWorkerService.java:185) at alluxio.worker.block.BlockMasterClient$3.call(BlockMasterClient.java:135) at alluxio.worker.block.BlockMasterClient$3.call(BlockMasterClient.java:132) at alluxio.AbstractClient.retryRPC(AbstractClient.java:317) at alluxio.worker.block.BlockMasterClient.heartbeat(BlockMasterClient.java:132) at alluxio.worker.block.BlockMasterSync.heartbeat(BlockMasterSync.java:148) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:19,601 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.FileSystemMasterWorkerService$Client.recv_getPinIdList(FileSystemMasterWorkerService.java:135) at alluxio.thrift.FileSystemMasterWorkerService$Client.getPinIdList(FileSystemMasterWorkerService.java:123) at alluxio.worker.file.FileSystemMasterClient$2.call(FileSystemMasterClient.java:101) at alluxio.worker.file.FileSystemMasterClient$2.call(FileSystemMasterClient.java:98) at alluxio.AbstractClient.retryRPC(AbstractClient.java:317) at alluxio.worker.file.FileSystemMasterClient.getPinList(FileSystemMasterClient.java:98) at alluxio.worker.block.PinListSync.heartbeat(PinListSync.java:55) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:19,600 ERROR logger.type (AbstractClient.java:retryRPC) - org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453) at 
org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at alluxio.thrift.FileSystemMasterWorkerService$Client.recv_heartbeat(FileSystemMasterWorkerService.java:162) at alluxio.thrift.FileSystemMasterWorkerService$Client.heartbeat(FileSystemMasterWorkerService.java:148) at alluxio.worker.file.FileSystemMasterClient$3.call(FileSystemMasterClient.java:120) at alluxio.worker.file.FileSystemMasterClient$3.call(FileSystemMasterClient.java:117) at alluxio.AbstractClient.retryRPC(AbstractClient.java:348) at alluxio.worker.file.FileSystemMasterClient.heartbeat(FileSystemMasterClient.java:117) at alluxio.worker.file.FileWorkerMasterSyncExecutor.heartbeat(FileWorkerMasterSyncExecutor.java:88) at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)2016-11-30 07:40:19,603 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:19,605 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:19,606 INFO 
logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with BlockMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:19,606 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:19,608 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:19,608 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:19,609 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [ab02.prod.com:19998, aa02.prod.com:19998]2016-11-30 07:40:19,611 INFO logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: ab02.prod.com:199982016-11-30 07:40:19,611 INFO logger.type (AbstractClient.java:connect) - Alluxio client (version 1.3.0) is trying to connect with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:32,307 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:32,307 INFO logger.type (AbstractClient.java:connect) - Client registered with BlockMasterWorker master @ ab02.prod.com/10.3.3.15:199982016-11-30 07:40:33,443 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:19998
Alluxio Summary
Master Address: ab02.prod.com/10.3.3.15:19998
Started: 11-30-2016 07:40:29:381
Uptime: 0 day(s), 0 hour(s), 3 minute(s), and 16 second(s)
Version: 1.3.0
Running Workers: 18
Cluster Usage Summary
Workers Capacity: 1800.00GB
Workers Free / Used: 1800.00GB / 0.00B
UnderFS Capacity: 78.28TB
UnderFS Free / Used: 69.52TB / 8.76TB
Live Workers
Node Name Last Heartbeat State Workers Capacity Space Used Space Usage
10.3.3.10 58 In Service 100.00GB 0.00B 100%Free
10.3.3.12 57 In Service 100.00GB 0.00B 100%Free
10.3.3.19 58 In Service 100.00GB 0.00B 100%Free
10.3.3.20 57 In Service 100.00GB 0.00B 100%Free
10.3.3.21 58 In Service 100.00GB 0.00B 100%Free
10.3.3.23 57 In Service 100.00GB 0.00B 100%Free
10.3.3.73 57 In Service 100.00GB 0.00B 100%Free
10.3.3.74 58 In Service 100.00GB 0.00B 100%Free
10.3.3.75 49 In Service 100.00GB 0.00B 100%Free
10.3.3.76 58 In Service 100.00GB 0.00B 100%Free
10.3.3.77 58 In Service 100.00GB 0.00B 100%Free
10.3.3.83 58 In Service 100.00GB 0.00B 100%Free
10.3.3.84 57 In Service 100.00GB 0.00B 100%Free
10.3.3.85 58 In Service 100.00GB 0.00B 100%Free
10.3.3.86 58 In Service 100.00GB 0.00B 100%Free
10.3.3.87 58 In Service 100.00GB 0.00B 100%Free
10.3.3.9 55 In Service 100.00GB 0.00B 100%Free
Lost Workers
Node Name Last Heartbeat Workers Capacity
[zk: localhost:2181(CONNECTED) 37] ls /leader
[zk: localhost:2181(CONNECTED) 38] get /leader/aa02.prod.com:19998
10.3.3.5
cZxid = 0x30000082e
ctime = Wed Nov 30 05:13:20 GMT 2016
mZxid = 0x30000082e
mtime = Wed Nov 30 05:13:20 GMT 2016
pZxid = 0x30000082e
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 0
[zk: localhost:2181(CONNECTED) 39] get /leader/ab02.prod.com:19998
10.3.3.15
cZxid = 0x300000860
ctime = Wed Nov 30 07:40:10 GMT 2016
mZxid = 0x300000860
mtime = Wed Nov 30 07:40:10 GMT 2016
pZxid = 0x300000860
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 10
numChildren = 0
[zk: localhost:2181(CONNECTED) 40] ls /election
[_c_7f58be2b-bf1c-4c0f-bf7c-221d8f95485b-lock-0000000003]
[zk: localhost:2181(CONNECTED) 41] get /election/_c_7f58be2b-bf1c-4c0f-bf7c-221d8f95485b-lock-0000000003
cZxid = 0x30000085d
ctime = Wed Nov 30 07:38:57 GMT 2016
mZxid = 0x30000085d
mtime = Wed Nov 30 07:38:57 GMT 2016
pZxid = 0x30000085d
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x25876ca490b00b8
dataLength = 31
numChildren = 0
[zk: localhost:2181(CONNECTED) 42]
2016-12-01 11:19:49,000 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:19998
2016-12-01 11:19:49,004 INFO logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ ab02.prod.com/10.3.3.15:19998
2016-12-01 11:19:49,018 WARN logger.type (SleepingTimer.java:tick) - Worker Pin List Sync last execution took 26462 ms. Longer than the interval 1000
2016-12-01 11:19:49,027 WARN logger.type (SleepingTimer.java:tick) - Worker FileSystemMaster Sync last execution took 27382 ms. Longer than the interval 1000
There is a bug in the worker ID management. Here is what happens.
BlockMaster maintains two lists:
1. mWorkers: the workers that have registered with the master (https://github.com/Alluxio/alluxio/blob/7c5bd6c101cd699c0da204c036878ad56c3615cf/core/server/src/main/java/alluxio/master/block/BlockMaster.java#L147 )
2. mLostWorkers: the workers that have lost communication with the master.
1. When a worker first starts up, it registers with the master and is put into the mWorkers list.
2. Each worker sends heartbeats to the master. If the master does not receive a heartbeat from a worker within a configurable timeout, it moves the worker from the mWorkers list to the mLostWorkers list.
3. When a lost worker resumes sending heartbeats to the master, the master moves it from mLostWorkers back to mWorkers.
The above logic is quite straightforward. Each worker is also assigned a worker ID. The ID is the index of the worker in the mWorkers list. The master sends the ID back to the worker, so the worker knows which worker ID it has been assigned. The worker includes the ID in every heartbeat it sends to the master, and the master uses the worker ID to keep track of which workers are alive and which are lost.
The mWorkers and mLostWorkers lists are *NOT* persisted. Hence, when a new master becomes leader, it does *NOT* know which worker has which ID. It starts with empty mWorkers and mLostWorkers lists.
Here is the sequence of events that leads to the problem.
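The bookkeeping described above can be sketched as a small in-memory model. This is only a toy illustration, not Alluxio's actual BlockMaster: the class name ToyBlockMaster, the WorkerInfo record, the method names, and the choice to hand out IDs from a counter starting at 1 are all assumptions made for the example. Timestamps are passed in explicitly so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of the two worker lists; names and structure are hypothetical. */
class ToyBlockMaster {
  /** Hypothetical record: a worker's address and the time of its last heartbeat. */
  static final class WorkerInfo {
    final String address;
    long lastHeartbeatMs;

    WorkerInfo(String address, long nowMs) {
      this.address = address;
      this.lastHeartbeatMs = nowMs;
    }
  }

  // Both maps live only in memory, so a newly elected master starts with them empty.
  final Map<Long, WorkerInfo> mWorkers = new HashMap<>();
  final Map<Long, WorkerInfo> mLostWorkers = new HashMap<>();
  private long mNextId = 1; // assumption: IDs are handed out sequentially from 1

  /** Registers a worker and returns the ID it must use in later heartbeats. */
  long register(String address, long nowMs) {
    long id = mNextId++;
    mWorkers.put(id, new WorkerInfo(address, nowMs));
    return id;
  }

  /** Returns false when the ID is unknown, meaning the worker must re-register. */
  boolean heartbeat(long workerId, long nowMs) {
    WorkerInfo lost = mLostWorkers.remove(workerId);
    if (lost != null) {
      mWorkers.put(workerId, lost); // a lost worker came back
    }
    WorkerInfo worker = mWorkers.get(workerId);
    if (worker == null) {
      return false;
    }
    worker.lastHeartbeatMs = nowMs;
    return true;
  }

  /** Moves every worker whose last heartbeat is older than timeoutMs to mLostWorkers. */
  void detectLostWorkers(long nowMs, long timeoutMs) {
    Iterator<Map.Entry<Long, WorkerInfo>> it = mWorkers.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, WorkerInfo> entry = it.next();
      if (nowMs - entry.getValue().lastHeartbeatMs > timeoutMs) {
        mLostWorkers.put(entry.getKey(), entry.getValue());
        it.remove();
      }
    }
  }
}
```

Note that a freshly constructed ToyBlockMaster answers false to any heartbeat, which corresponds to a new leader that has no record of the IDs assigned by its predecessor.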
1. Start two masters, master A and master B. Master A is the leader and master B is the standby.
2. Start 20 workers.
3. The 20 workers get the leader master's address from ZooKeeper.
4. The 20 workers register with the leader, master A.
5. Master A has the 20 workers in its mWorkers list, and each worker has a worker ID assigned by master A.
6. The 20 workers send heartbeats to master A periodically.
7. Kill master A.
8. Master B becomes leader.
9. The 20 workers get the new leader master's address from ZooKeeper.
10. The 20 workers send heartbeats to master B.
11. Master B's mWorkers list is empty when it first takes over leadership, so when it handles a worker's heartbeat it prints "Could not find worker id: {} for heartbeat." (https://github.com/Alluxio/alluxio/blob/7c5bd6c101cd699c0da204c036878ad56c3615cf/core/server/src/main/java/alluxio/master/block/BlockMaster.java#L674). After that, the master asks the worker to re-register, and the worker is assigned a new worker ID by the new master.
12. However, once some workers (say 17) have re-registered with the new master, master B's mWorkers list is no longer empty. When the last three workers heartbeat with master B, chances are that their old worker IDs (assigned by the prior master) fall between 1 and 17. Master B then finds those IDs in mWorkers and concludes that the workers have already registered, so it does *NOT* ask them to re-register. The last three workers are permanently lost: master B does *NOT* keep track of them and does *NOT* know they exist, even though they heartbeat with it continuously. Meanwhile, the three workers still believe they are registered with the master under their old worker IDs.
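The sequence above can be replayed with a deterministic toy simulation. Everything in it is an assumption made for illustration: the class names, the worker names "worker-1" to "worker-20", sequential IDs starting at 1, and a heartbeat order in which the workers holding old IDs 4 to 20 reach master B before the workers holding old IDs 1 to 3. It is not Alluxio code; it only mimics a lookup that accepts a heartbeat whenever the ID is present.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FailoverSimulation {
  /** Toy master: an in-memory id -> address map, IDs handed out from 1 (assumption). */
  static final class Master {
    final Map<Long, String> mWorkers = new HashMap<>();
    private long mNextId = 1;

    long register(String address) {
      long id = mNextId++;
      mWorkers.put(id, address);
      return id;
    }

    /** ID-only lookup: true means "this ID is registered", whoever sent it. */
    boolean heartbeat(long workerId) {
      return mWorkers.containsKey(workerId);
    }
  }

  /** Returns the workers that master B does not track after the failover. */
  static List<String> untrackedAfterFailover() {
    Master masterA = new Master();
    List<String> addresses = new ArrayList<>();
    Map<String, Long> idHeldByWorker = new HashMap<>();
    for (int i = 1; i <= 20; i++) {
      String address = "worker-" + i;
      addresses.add(address);
      idHeldByWorker.put(address, masterA.register(address)); // worker-i gets ID i
    }

    // Master A is killed; master B takes over with an empty worker list.
    Master masterB = new Master();

    // Assumed arrival order of the first heartbeats at master B.
    List<String> order = new ArrayList<>(addresses.subList(3, 20));
    order.addAll(addresses.subList(0, 3));

    for (String address : order) {
      long oldId = idHeldByWorker.get(address);
      if (!masterB.heartbeat(oldId)) {
        // Unknown ID: the worker is told to re-register and gets a new ID.
        idHeldByWorker.put(address, masterB.register(address));
      }
      // Known ID: the heartbeat is accepted and the worker is never re-registered.
    }

    List<String> untracked = new ArrayList<>();
    for (String address : addresses) {
      if (!masterB.mWorkers.containsValue(address)) {
        untracked.add(address);
      }
    }
    return untracked;
  }
}
```

Under these assumptions, master B ends up tracking 17 workers, and worker-1, worker-2 and worker-3 are silently dropped because their old IDs collide with IDs that master B has already handed out. If all heartbeats arrived in ascending ID order instead, no collision would occur in this model, which suggests the outcome depends on timing.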
The bug is in the line below. It checks whether the worker ID is present in mWorkers; if it is, the master assumes the worker has registered. It does not check whether the worker that sent the heartbeat is the same worker that registered with the master under that ID.
MasterWorkerInfo worker = mWorkers.getFirstByField(ID_INDEX, workerId); (https://github.com/Alluxio/alluxio/blob/7c5bd6c101cd699c0da204c036878ad56c3615cf/core/server/src/main/java/alluxio/master/block/BlockMaster.java#L672)
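One possible direction, sketched here purely as an illustration, is to have the heartbeat also carry the worker's address and to accept it only when that address matches the one registered under the ID. The class IdentityCheckingMaster and the two-argument heartbeat method are hypothetical and do not exist in Alluxio; the real heartbeat RPC may not carry an address at all, and other remedies (for example, IDs that cannot collide across masters) may be preferable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical master that verifies who owns an ID before accepting a heartbeat. */
class IdentityCheckingMaster {
  private final Map<Long, String> mWorkers = new HashMap<>();
  private long mNextId = 1; // assumption: sequential IDs starting at 1

  /** Registers a worker and returns its ID. */
  long register(String address) {
    long id = mNextId++;
    mWorkers.put(id, address);
    return id;
  }

  /**
   * Returns true only if the ID is registered AND belongs to the sender.
   * A false result means the worker should be asked to re-register.
   */
  boolean heartbeat(long workerId, String address) {
    String registered = mWorkers.get(workerId);
    return registered != null && registered.equals(address);
  }
}
```

With this check, a worker that presents a stale ID which happens to be in use by another worker is rejected and asked to re-register instead of being silently accepted.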