Yes, that’s also what I gathered from the source code comments. That’s why I asked explicitly “if we should increase hdfs.fetcher.buffer.size when using hdfs”.
Btw, I already tested with hdfs and it’s not working in our case: the fetch finishes “successfully” immediately after it starts, and the swap then fails because the fetch directory is never even created. The somewhat redacted logs look like this:
[...]
2017-01-30 11:54:12 PM pool-1-thread-1 VoldemortUtils INFO: Existing protocol = hdfs and port = -1
2017-01-30 11:54:12 PM pool-1-thread-1 VoldemortUtils INFO: New protocol = hdfs and port = 8020
2017-01-30 11:54:12 PM pool-1-thread-1 AbstractStoreClientFactory INFO: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence]
2017-01-30 11:54:12 PM pool-3-thread-1 AdminClient INFO: Node dc1-voldemort25:6666 [id 9] : AsyncOperationStatus(task id = 16, description = Fetch store 'readonlyusers7' v16, complete = false, status = 0 MB copied at 0 MB/sec - 0 % complete)
2017-01-30 11:54:12 PM pool-3-thread-1 AdminClient INFO: Node dc1-voldemort25:6666 [id 9] : AsyncOperationStatus(task id = 16, description = Fetch store 'readonlyusers7' v16, complete = true, status = Finished AsyncOperationStatus(task id = 16, description = Fetch store 'readonlyusers7' v16, complete = false, status = 0 MB copied at 0 MB/sec - 0 % complete))
2017-01-30 11:54:12 PM pool-3-thread-1 net:6666 INFO:
tcp://dc1-voldemort06:6666 : Fetch succeeded on Node dc1-voldemort25:6666 [id 9]
2017-01-30 11:54:12 PM pool-1-thread-1 net:6666 INFO:
tcp://dc1-voldemort06:6666 : Attempting swap for Node dc1-voldemort25:6666 [id 9], dir = Finished AsyncOperationStatus(task id = 16, description = Fetch store 'readonlyusers7' v16, complete = false, status = 0 MB copied at 0 MB/sec - 0 % complete)
2017-01-30 11:54:12 PM pool-1-thread-1 net:6666 ERROR:
tcp://dc1-voldemort06:6666 : Error on Node dc1-voldemort25:6666 [id 9] during swap :
voldemort.VoldemortException: Store directory 'Finished AsyncOperationStatus(task id = 16, description = Fetch store 'readonlyusers7' v16, complete = false, status = 0 MB copied at 0 MB/sec - 0 % complete)' is not a readable directory.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:116)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:103)
at voldemort.store.ErrorCodeMapper.getError(ErrorCodeMapper.java:84)
at voldemort.client.protocol.admin.AdminClient$HelperOperations.throwException(AdminClient.java:462)
at voldemort.client.protocol.admin.AdminClient$ReadOnlySpecificOperations.swapStore(AdminClient.java:4467)
at voldemort.store.readonly.swapper.AdminStoreSwapper.invokeSwap(AdminStoreSwapper.java:283)
at voldemort.store.readonly.swapper.AdminStoreSwapper.fetchAndSwapStoreData(AdminStoreSwapper.java:124)
at voldemort.store.readonly.mr.azkaban.VoldemortSwapJob.run(VoldemortSwapJob.java:159)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.runPushStore(VoldemortBuildAndPushJob.java:837)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob$StorePushTask.call(VoldemortBuildAndPushJob.java:556)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob$StorePushTask.call(VoldemortBuildAndPushJob.java:539)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-01-30 11:54:12 PM pool-1-thread-1 shell-job ERROR: Exception during push for cluster URL:
tcp://dc1-voldemort06:6666. Rethrowing exception.
2017-01-30 11:54:12 PM main shell-job ERROR: Got exceptions during Build and Push:
java.util.concurrent.ExecutionException: voldemort.VoldemortException: Exception during swaps on nodes Node dc1-voldemort06:6666 [id 0] in zone 0 partitionList:[0, 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240],Node dc1-voldemort07:6666 [id 1] in zone 0 partitionList:[1, 13, 25, 37, 49, 61, 73, 85, 97, 109, 121, 133, 145, 157, 169, 181, 193, 205, 217, 229, 241],Node dc1-voldemort08:6666 [id 2] in zone 0 partitionList:[2, 14, 26, 38, 50, 62, 74, 86, 98, 110, 122, 134, 146, 158, 170, 182, 194, 206, 218, 230, 242],Node dc1-voldemort09:6666 [id 3] in zone 0 partitionList:[3, 15, 27, 39, 51, 63, 75, 87, 99, 111, 123, 135, 147, 159, 171, 183, 195, 207, 219, 231, 243],Node dc1-voldemort10:6666 [id 4] in zone 0 partitionList:[4, 16, 28, 40, 52, 64, 76, 88, 100, 112, 124, 136, 148, 160, 172, 184, 196, 208, 220, 232, 244],Node dc1-voldemort21:6666 [id 5] in zone 0 partitionList:[5, 17, 29, 41, 53, 65, 77, 89, 101, 113, 125, 137, 149, 161, 173, 185, 197, 209, 221, 233, 245],Node dc1-voldemort22:6666 [id 6] in zone 0 partitionList:[6, 18, 30, 42, 54, 66, 78, 90, 102, 114, 126, 138, 150, 162, 174, 186, 198, 210, 222, 234, 246],Node dc1-voldemort23:6666 [id 7] in zone 0 partitionList:[7, 19, 31, 43, 55, 67, 79, 91, 103, 115, 127, 139, 151, 163, 175, 187, 199, 211, 223, 235, 247],Node dc1-voldemort24:6666 [id 8] in zone 0 partitionList:[8, 20, 32, 44, 56, 68, 80, 92, 104, 116, 128, 140, 152, 164, 176, 188, 200, 212, 224, 236, 248],Node dc1-voldemort25:6666 [id 9] in zone 0 partitionList:[9, 21, 33, 45, 57, 69, 81, 93, 105, 117, 129, 141, 153, 165, 177, 189, 201, 213, 225, 237, 249],Node dc1-voldemort26:6666 [id 10] in zone 0 partitionList:[10, 22, 34, 46, 58, 70, 82, 94, 106, 118, 130, 142, 154, 166, 178, 190, 202, 214, 226, 238, 250],Node dc1-voldemort27:6666 [id 11] in zone 0 partitionList:[11, 23, 35, 47, 59, 71, 83, 95, 107, 119, 131, 143, 155, 167, 179, 191, 203, 215, 227, 239, 251] failed
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:653)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: voldemort.VoldemortException: Exception during swaps on nodes Node dc1-voldemort06:6666 [id 0] in zone 0 partitionList:[0, 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240],Node dc1-voldemort07:6666 [id 1] in zone 0 partitionList:[1, 13, 25, 37, 49, 61, 73, 85, 97, 109, 121, 133, 145, 157, 169, 181, 193, 205, 217, 229, 241],Node dc1-voldemort08:6666 [id 2] in zone 0 partitionList:[2, 14, 26, 38, 50, 62, 74, 86, 98, 110, 122, 134, 146, 158, 170, 182, 194, 206, 218, 230, 242],Node dc1-voldemort09:6666 [id 3] in zone 0 partitionList:[3, 15, 27, 39, 51, 63, 75, 87, 99, 111, 123, 135, 147, 159, 171, 183, 195, 207, 219, 231, 243],Node dc1-voldemort10:6666 [id 4] in zone 0 partitionList:[4, 16, 28, 40, 52, 64, 76, 88, 100, 112, 124, 136, 148, 160, 172, 184, 196, 208, 220, 232, 244],Node dc1-voldemort21:6666 [id 5] in zone 0 partitionList:[5, 17, 29, 41, 53, 65, 77, 89, 101, 113, 125, 137, 149, 161, 173, 185, 197, 209, 221, 233, 245],Node dc1-voldemort22:6666 [id 6] in zone 0 partitionList:[6, 18, 30, 42, 54, 66, 78, 90, 102, 114, 126, 138, 150, 162, 174, 186, 198, 210, 222, 234, 246],Node dc1-voldemort23:6666 [id 7] in zone 0 partitionList:[7, 19, 31, 43, 55, 67, 79, 91, 103, 115, 127, 139, 151, 163, 175, 187, 199, 211, 223, 235, 247],Node dc1-voldemort24:6666 [id 8] in zone 0 partitionList:[8, 20, 32, 44, 56, 68, 80, 92, 104, 116, 128, 140, 152, 164, 176, 188, 200, 212, 224, 236, 248],Node dc1-voldemort25:6666 [id 9] in zone 0 partitionList:[9, 21, 33, 45, 57, 69, 81, 93, 105, 117, 129, 141, 153, 165, 177, 189, 201, 213, 225, 237, 249],Node dc1-voldemort26:6666 [id 10] in zone 0 partitionList:[10, 22, 34, 46, 58, 70, 82, 94, 106, 118, 130, 142, 154, 166, 178, 190, 202, 214, 226, 238, 250],Node dc1-voldemort27:6666 [id 11] in zone 0 partitionList:[11, 23, 35, 47, 59, 71, 83, 95, 107, 119, 131, 143, 155, 167, 179, 191, 203, 215, 227, 239, 251] failed
at voldemort.store.readonly.swapper.AdminStoreSwapper.invokeSwap(AdminStoreSwapper.java:318)
at voldemort.store.readonly.swapper.AdminStoreSwapper.fetchAndSwapStoreData(AdminStoreSwapper.java:124)
at voldemort.store.readonly.mr.azkaban.VoldemortSwapJob.run(VoldemortSwapJob.java:159)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.runPushStore(VoldemortBuildAndPushJob.java:837)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob$StorePushTask.call(VoldemortBuildAndPushJob.java:556)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob$StorePushTask.call(VoldemortBuildAndPushJob.java:539)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-01-30 11:54:12 PM main VoldemortBuildAndPushJobRunner ERROR: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:685)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: voldemort.VoldemortException: Got exceptions during Build and Push
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:681)
... 7 more
(I removed all node-specific logs except those for dc1-voldemort06 and dc1-voldemort25, because they are basically the same for all nodes.) This might just be due to the different versions of the CDH libs, but it still makes me wonder whether anyone is using hdfs for fetches at all. Maybe that code path is not really tested and no longer works.
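To illustrate the symptom: the "dir" value in the swap log line is the status message of the fetch task rather than a filesystem path, so any "is this a readable directory" check on it is bound to fail. Below is a minimal sketch of such a check. It is not Voldemort's actual code; the class name SwapDirCheck and the method isReadableDirectory are hypothetical and exist only for this illustration.

```java
import java.io.File;

// Hypothetical illustration only, not taken from the Voldemort code base.
public class SwapDirCheck {

    // Returns true only if the string names an existing, readable directory.
    public static boolean isReadableDirectory(String storeDir) {
        File dir = new File(storeDir);
        return dir.isDirectory() && dir.canRead();
    }

    public static void main(String[] args) {
        // The "dir" value copied from the swap log line: a status message, not a path.
        String dirFromLog = "Finished AsyncOperationStatus(task id = 16, description = "
                + "Fetch store 'readonlyusers7' v16, complete = false, "
                + "status = 0 MB copied at 0 MB/sec - 0 % complete)";

        // No directory with that name exists, so the check reports false.
        System.out.println(isReadableDirectory(dirFromLog));
    }
}
```

If the server-side check works along these lines, the swap error is only a consequence; the real question is why the fetch returns a status string instead of the path of the fetched version directory.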
Thanks,
David