Hi,

When I tried to run the Gerrit Analytics ETL job in Docker, I got an Elasticsearch connection issue:

[shiping.chen@localhost /]$ sudo docker run -ti --rm -e ES_HOST=127.0.0.1:9200 -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest
* Elastic Search Host: localhost:9200
* Gerrit URL: http://xdb-dev.alibaba.net:8080
* Analytics arguments: --since 2000-06-01 --aggregate email_hour -e gerrit/analytics
* Spark jar class: com.gerritforge.analytics.job.Main
* Spark jar path: /usr/local/spark/jars
* Waiting for Elasticsearch at http://localhost:9200 (1/30)
* Waiting for Elasticsearch at http://localhost:9200 (2/30)
[... same message repeated through (30/30) ...]
Operation timed out

Elasticsearch itself is running:

[shiping.chen@localhost /]$ curl -XGET 127.0.0.1:9200
{
  "name" : "Q6EjhhY",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "syys2GZBSuKb8_HTMpnMkw",
  "version" : {
    "number" : "6.4.2",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "04711c2",
    "build_date" : "2018-09-26T13:34:09.098244Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

I built the dockerized ETL with the "sbt docker" command in the git repo. Could anybody kindly shed some light on what's wrong with my setup?

Best regards,
Shiping
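One note on the failure above: inside a Linux container, 127.0.0.1/localhost refers to the container itself, not the Docker host, so the image's wait loop can never reach an Elasticsearch bound only to the host's loopback interface. Two commonly used workarounds are sketched below; the bridge address 172.17.0.1 is a typical default, not a value from this thread, and Elasticsearch must actually be listening on whatever address the container targets:

```shell
# Workaround 1 (Linux host): share the host's network namespace, so that
# 127.0.0.1 inside the container is the host's loopback.
sudo docker run -ti --rm --network=host \
  -e ES_HOST=127.0.0.1:9200 \
  -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" \
  -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" \
  gerritforge/spark-gerrit-analytics-etl:latest

# Workaround 2: point ES_HOST at the host's docker0 bridge address
# (often 172.17.0.1; check with `ip addr show docker0`). This only works
# if Elasticsearch binds to that interface, not just to 127.0.0.1.
sudo docker run -ti --rm \
  -e ES_HOST=172.17.0.1:9200 \
  -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" \
  -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" \
  gerritforge/spark-gerrit-analytics-etl:latest
```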
Hi Fabio,

Thanks for the hint. Since there is no good way to reach the host's localhost from Docker on Linux, I switched to Windows. With host.docker.internal I did not see the "Waiting for Elasticsearch at http://localhost:9200 (1/30)" message anymore. The execution went very far, but eventually it failed at:

PS C:\Users\shiping.chen\elasticsearch-6.4.1> docker run -ti --rm -e ES_HOST=host.docker.internal -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2018-08-03 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest
.........
2018-10-30 03:40:28 INFO ContextCleaner:54 - Cleaned accumulator 224
2018-10-30 03:40:29 INFO SparkContext:54 - Starting job: runJob at EsSparkSQL.scala:101
2018-10-30 03:40:29 INFO DAGScheduler:54 - Got job 6 (runJob at EsSparkSQL.scala:101) with 2 output partitions
2018-10-30 03:40:29 INFO DAGScheduler:54 - Final stage: ResultStage 26 (runJob at EsSparkSQL.scala:101)
2018-10-30 03:40:29 INFO DAGScheduler:54 - Parents of final stage: List()
2018-10-30 03:40:29 INFO DAGScheduler:54 - Missing parents: List()
2018-10-30 03:40:29 INFO DAGScheduler:54 - Submitting ResultStage 26 (MapPartitionsRDD[48] at rdd at EsSparkSQL.scala:101), which has no missing parents
2018-10-30 03:40:29 INFO MemoryStore:54 - Block broadcast_9 stored as values in memory (estimated size 34.8 KB, free 366.2 MB)
2018-10-30 03:40:29 INFO MemoryStore:54 - Block broadcast_9_piece0 stored as bytes in memory (estimated size 15.1 KB, free 366.2 MB)
2018-10-30 03:40:29 INFO BlockManagerInfo:54 - Added broadcast_9_piece0 in memory on 9775ea57fd23:38395 (size: 15.1 KB, free: 366.3 MB)
2018-10-30 03:40:29 INFO SparkContext:54 - Created broadcast 9 from broadcast at DAGScheduler.scala:1039
2018-10-30 03:40:29 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 26 (MapPartitionsRDD[48] at rdd at EsSparkSQL.scala:101) (first 15 tasks are for partitions Vector(0, 1))
2018-10-30 03:40:29 INFO TaskSchedulerImpl:54 - Adding task set 26.0 with 2 tasks
2018-10-30 03:40:29 INFO TaskSetManager:54 - Starting task 0.0 in stage 26.0 (TID 604, localhost, executor driver, partition 0, PROCESS_LOCAL, 7884 bytes)
2018-10-30 03:40:29 INFO TaskSetManager:54 - Starting task 1.0 in stage 26.0 (TID 605, localhost, executor driver, partition 1, PROCESS_LOCAL, 7884 bytes)
2018-10-30 03:40:29 INFO Executor:54 - Running task 0.0 in stage 26.0 (TID 604)
2018-10-30 03:40:29 INFO Executor:54 - Running task 1.0 in stage 26.0 (TID 605)
2018-10-30 03:40:29 INFO BlockManager:54 - Found block rdd_39_0 locally
2018-10-30 03:40:29 INFO BlockManager:54 - Found block rdd_39_1 locally
2018-10-30 03:40:29 INFO CodeGenerator:54 - Code generated in 39.2802 ms
2018-10-30 03:40:29 INFO HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:29 INFO HttpMethodDirector:445 - Retrying request
[... same I/O exception / retry pair repeated ...]
2018-10-30 03:40:29 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); selected next node [192.168.65.2:9200]
2018-10-30 03:40:30 INFO EsDataFrameWriter:594 - Writing to [gerrit/analytics]
2018-10-30 03:40:30 INFO EsDataFrameWriter:594 - Writing to [gerrit/analytics]
[... more I/O exception / retry pairs ...]
2018-10-30 03:40:30 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
[... more I/O exception / retry pairs ...]
2018-10-30 03:40:30 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
2018-10-30 03:40:30 ERROR Executor:91 - Exception in task 1.0 in stage 26.0 (TID 605)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
	at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:388)
	at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:484)
	at org.elasticsearch.hadoop.rest.RestClient.indexExists(RestClient.java:479)
	at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:490)
	at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:352)
	at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:612)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:600)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2018-10-30 03:40:30 ERROR Executor:91 - Exception in task 0.0 in stage 26.0 (TID 604)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	[... identical stack trace elided ...]
2018-10-30 03:40:30 WARN TaskSetManager:66 - Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	[... identical stack trace elided ...]
2018-10-30 03:40:30 ERROR TaskSetManager:70 - Task 1 in stage 26.0 failed 1 times; aborting job
2018-10-30 03:40:30 INFO TaskSetManager:54 - Lost task 0.0 in stage 26.0 (TID 604) on localhost, executor driver: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException (Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]] ) [duplicate 1]
2018-10-30 03:40:30 INFO TaskSchedulerImpl:54 - Removed TaskSet 26.0, whose tasks have all completed, from pool
2018-10-30 03:40:30 INFO TaskSchedulerImpl:54 - Cancelling stage 26
2018-10-30 03:40:30 INFO DAGScheduler:54 - ResultStage 26 (runJob at EsSparkSQL.scala:101) failed in 0.860 s due to Job aborted due to stage failure: Task 1 in stage 26.0 failed 1 times, most recent failure: Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	[... identical stack trace elided ...]
Driver stacktrace:
2018-10-30 03:40:30 INFO DAGScheduler:54 - Job 6 failed: runJob at EsSparkSQL.scala:101, took 0.874579 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 26.0 failed 1 times, most recent failure: Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	[... identical stack trace elided ...]
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:101)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:80)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:48)
	at com.gerritforge.analytics.job.Job$$anonfun$saveES$1.apply(Main.scala:210)
	at com.gerritforge.analytics.job.Job$$anonfun$saveES$1.apply(Main.scala:207)
	at scala.Option.foreach(Option.scala:257)
	at com.gerritforge.analytics.job.Job$class.saveES(Main.scala:207)
	at com.gerritforge.analytics.job.Main$.saveES(Main.scala:35)
	at com.gerritforge.analytics.job.Main$.delayedEndpoint$com$gerritforge$analytics$job$Main$1(Main.scala:115)
	at com.gerritforge.analytics.job.Main$delayedInit$body.apply(Main.scala:35)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.App$class.main(App.scala:76)
	at com.gerritforge.analytics.job.Main$.main(Main.scala:35)
	at com.gerritforge.analytics.job.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
	[... identical stack trace elided ...]
2018-10-30 03:40:30 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-10-30 03:40:30 INFO AbstractConnector:318 - Stopped Spark@745aef8d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-30 03:40:30 INFO SparkUI:54 - Stopped Spark web UI at http://9775ea57fd23:4040
2018-10-30 03:40:30 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-10-30 03:40:30 INFO MemoryStore:54 - MemoryStore cleared
2018-10-30 03:40:30 INFO BlockManager:54 - BlockManager stopped
2018-10-30 03:40:30 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-10-30 03:40:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-10-30 03:40:31 INFO SparkContext:54 - Successfully stopped SparkContext
2018-10-30 03:40:31 INFO ShutdownHookManager:54 - Shutdown hook called
2018-10-30 03:40:31 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-a348143a-875d-45ed-a502-9484e16859cb
2018-10-30 03:40:31 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-9c2e9a6a-5bcf-42ed-bca9-7cb55e932fd1

It still seems to be a network issue. Any idea?

Best regards,
Shiping
Hi Fabio,

I eventually took a workaround: I installed Elasticsearch and Kibana on one machine and ran the Docker job on another machine.

The feature is very nice! Thank you very much for the nice work!

Shiping
docker run -ti --rm -e ES_HOST=xxx.xxx.xxx.xx -e GERRIT_URL="gerrit url" -e ANALYTICS_ARGS="--since 2018-08-03 --extract-branches true --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest
spark-submit \
--conf spark.es.nodes="30.57.186.97" \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
--class com.gerritforge.analytics.gitcommits.job.Main /app/analytics-etl-gitcommits-assembly.jar \
--url="http://x-dev.alibaba.net:8080" --since 2018-08-03 --aggregate email_hour -e gerrit
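In the earlier Windows run, the ERROR lines show the connector abandoning the configured node and retrying 127.0.0.1, i.e. the address the Elasticsearch node publishes for client-side node discovery. When invoking spark-submit directly as above, the elasticsearch-hadoop setting es.nodes.wan.only=true disables that discovery, so the connector only ever talks to the configured address. A sketch only; the jar path, class, and URLs are copied from the command above and may differ in your setup:

```shell
spark-submit \
  --conf spark.es.nodes="30.57.186.97" \
  --conf spark.es.nodes.wan.only=true \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.gerritforge.analytics.gitcommits.job.Main /app/analytics-etl-gitcommits-assembly.jar \
  --url="http://x-dev.alibaba.net:8080" --since 2018-08-03 --aggregate email_hour -e gerrit
```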
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
...
Hi Fabio,
Thank you for the suggestions. I figured out that it simply takes time, about half an hour, but the job eventually finishes. I'm not sure why it has become so much slower than two months ago.
BTW, what's the correct procedure for updating the data in Elasticsearch? To get the latest analytics data from Gerrit by executing the Docker ETL while keeping the visualizations and dashboards untouched, do I need to "DELETE gerrit" before rerunning the ETL?
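On the re-run question: Kibana keeps visualizations and dashboards as saved objects in its own .kibana index, separate from the data index, so deleting only the gerrit index should leave them intact. A sketch; the host and index name are assumed from the setup described earlier in the thread:

```shell
# Drop only the analytics data; Kibana saved objects live in .kibana and survive.
curl -XDELETE 'http://localhost:9200/gerrit'
# Then re-run the dockerized ETL to repopulate gerrit/analytics.
```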
--
Hi Fabio,
We have 10 projects, with about 1000 commits in total. The whole execution takes more than an hour. My current analytics plugin is the original one that came with the Gerrit 2.14.10 installation package. To get the latest one, should I build from the stable-2.14 branch (https://gerrit.googlesource.com/plugins/analytics/)? Does that branch include your latest performance improvement changes?
I also found that data from some projects is missing, and the number of commits is sometimes incorrect.