Re: Issue on running Gerrit Analytics ETL job in docker


Fabio Ponciroli

Oct 29, 2018, 7:33:48 PM
to sher...@gmail.com, repo-d...@googlegroups.com, Luca Milanesio, syntonyze, galar...@gmail.com
Hi Shiping,
the ETL job runs inside a Docker container, so the address you are passing in ES_HOST (127.0.0.1) refers to localhost inside the Docker container itself. In your case Elasticsearch is running on your host machine, so you need to set ES_HOST to your host's IP address.

If you are using a Mac you can do it with docker.for.mac.localhost; otherwise just specify the IP of your host (I am not sure whether there is an equivalent for Windows/Unix).
I also suggest you remove the ETL image you currently have, to make sure you pull the latest one.

Try to run the following:

docker rmi gerritforge/spark-gerrit-analytics-etl:latest  # Remove docker image
docker run -ti --rm -e ES_HOST=docker.for.mac.localhost -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest  # Use ES_HOST=<your_host_ip> if you are not running on MacOS 
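
For Linux hosts there is no built-in alias in this Docker version, but containers on the default bridge can usually reach the host through the docker0 gateway. A rough sketch (the 172.17.0.1 gateway address is the common default but an assumption to verify, and Elasticsearch has to listen on that interface, not only on 127.0.0.1):

ip addr show docker0  # note the gateway IP, typically 172.17.0.1
docker run -ti --rm -e ES_HOST=172.17.0.1 -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest

Another option is docker run --network host, which makes the container share the host network namespace, so 127.0.0.1 inside the container then refers to the host itself.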

Let us know if it works.

Thanks,
Fabio


On Mon, 29 Oct 2018 at 19:59, shipingc <sher...@gmail.com> wrote:
Hi,

When I tried to run the Gerrit Analytics ETL job in Docker, I got an Elasticsearch connection issue:

[shiping.chen@localhost /]$ sudo docker run -ti --rm -e ES_HOST=127.0.0.1:9200 -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2000-06-01 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest
* Elastic Search Host: localhost:9200
* Analytics arguments: --since 2000-06-01 --aggregate email_hour -e gerrit/analytics
* Spark jar class: com.gerritforge.analytics.job.Main
* Spark jar path: /usr/local/spark/jars
* Waiting for Elasticsearch at http://localhost:9200 (1/30)
[... retries 2/30 through 29/30 elided ...]
* Waiting for Elasticsearch at http://localhost:9200 (30/30)
Operation timed out

Elasticsearch itself is running:

[shiping.chen@localhost /]$ curl -XGET 127.0.0.1:9200
{
  "name" : "Q6EjhhY",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "syys2GZBSuKb8_HTMpnMkw",
  "version" : {
    "number" : "6.4.2",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "04711c2",
    "build_date" : "2018-09-26T13:34:09.098244Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

I built the dockerized ETL using the "sbt docker" command in the git repo.

Could anybody kindly shed some light on what's wrong with my setup?

Best regards,
Shiping

Fabio Ponciroli

Oct 30, 2018, 3:56:53 AM
to shipingc, repo-d...@googlegroups.com, Luca Milanesio, Antonio Barone, Stefano Galarraga
Hi Shiping,
Can you share your Elasticsearch configuration? It would be useful for understanding and debugging the issue.
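
In the meantime, one thing worth checking is the network binding: by default Elasticsearch 6.x binds to localhost only, which is enough for a local curl but not for connections coming from a Docker container. A sketch of the relevant elasticsearch.yml lines (0.0.0.0 is just an example value, pick whatever interface suits your setup):

network.host: 0.0.0.0  # bind to all interfaces so the container can reach Elasticsearch
http.port: 9200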

Thanks,
Fabio

On Tue, 30 Oct 2018 at 04:47, shipingc <sher...@gmail.com> wrote:
Hi Fabio,

Thanks for the hint. Since there is no good way to reach the host's localhost from Docker on Linux, I switched to Windows. With host.docker.internal I no longer saw the "Waiting for Elasticsearch at http://localhost:9200 (1/30)" messages.
The execution got much further, but eventually it failed at:

PS C:\Users\shiping.chen\elasticsearch-6.4.1> docker run -ti --rm -e ES_HOST=host.docker.internal -e GERRIT_URL="http://xdb-dev.alibaba.net:8080" -e ANALYTICS_ARGS="--since 2018-08-03 --aggregate email_hour -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest

.........
2018-10-30 03:40:28 INFO  ContextCleaner:54 - Cleaned accumulator 224
2018-10-30 03:40:29 INFO  SparkContext:54 - Starting job: runJob at EsSparkSQL.scala:101
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Got job 6 (runJob at EsSparkSQL.scala:101) with 2 output partitions
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Final stage: ResultStage 26 (runJob at EsSparkSQL.scala:101)
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Missing parents: List()
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Submitting ResultStage 26 (MapPartitionsRDD[48] at rdd at EsSparkSQL.scala:101), which has no missing parents
2018-10-30 03:40:29 INFO  MemoryStore:54 - Block broadcast_9 stored as values in memory (estimated size 34.8 KB, free 366.2 MB)
2018-10-30 03:40:29 INFO  MemoryStore:54 - Block broadcast_9_piece0 stored as bytes in memory (estimated size 15.1 KB, free 366.2 MB)
2018-10-30 03:40:29 INFO  BlockManagerInfo:54 - Added broadcast_9_piece0 in memory on 9775ea57fd23:38395 (size: 15.1 KB, free: 366.3 MB)
2018-10-30 03:40:29 INFO  SparkContext:54 - Created broadcast 9 from broadcast at DAGScheduler.scala:1039
2018-10-30 03:40:29 INFO  DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 26 (MapPartitionsRDD[48] at rdd at EsSparkSQL.scala:101) (first 15 tasks are for partitions Vector(0, 1))
2018-10-30 03:40:29 INFO  TaskSchedulerImpl:54 - Adding task set 26.0 with 2 tasks
2018-10-30 03:40:29 INFO  TaskSetManager:54 - Starting task 0.0 in stage 26.0 (TID 604, localhost, executor driver, partition 0, PROCESS_LOCAL, 7884 bytes)
2018-10-30 03:40:29 INFO  TaskSetManager:54 - Starting task 1.0 in stage 26.0 (TID 605, localhost, executor driver, partition 1, PROCESS_LOCAL, 7884 bytes)
2018-10-30 03:40:29 INFO  Executor:54 - Running task 0.0 in stage 26.0 (TID 604)
2018-10-30 03:40:29 INFO  Executor:54 - Running task 1.0 in stage 26.0 (TID 605)
2018-10-30 03:40:29 INFO  BlockManager:54 - Found block rdd_39_0 locally
2018-10-30 03:40:29 INFO  BlockManager:54 - Found block rdd_39_1 locally
2018-10-30 03:40:29 INFO  CodeGenerator:54 - Code generated in 39.2802 ms
2018-10-30 03:40:29 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:29 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:29 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:29 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:29 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:29 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:29 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); selected next node [192.168.65.2:9200]
2018-10-30 03:40:30 INFO  EsDataFrameWriter:594 - Writing to [gerrit/analytics]
2018-10-30 03:40:30 INFO  EsDataFrameWriter:594 - Writing to [gerrit/analytics]
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
2018-10-30 03:40:30 INFO  HttpMethodDirector:439 - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
2018-10-30 03:40:30 INFO  HttpMethodDirector:445 - Retrying request
2018-10-30 03:40:30 ERROR NetworkClient:144 - Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
2018-10-30 03:40:30 ERROR Executor:91 - Exception in task 1.0 in stage 26.0 (TID 605)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
        at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:388)
        at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:484)
        at org.elasticsearch.hadoop.rest.RestClient.indexExists(RestClient.java:479)
        at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:490)
        at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:352)
        at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:612)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:600)
        at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
        at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
        at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-10-30 03:40:30 ERROR Executor:91 - Exception in task 0.0 in stage 26.0 (TID 604)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        [... same stack trace as above ...]
2018-10-30 03:40:30 WARN  TaskSetManager:66 - Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        [... same stack trace as above ...]

2018-10-30 03:40:30 ERROR TaskSetManager:70 - Task 1 in stage 26.0 failed 1 times; aborting job
2018-10-30 03:40:30 INFO  TaskSetManager:54 - Lost task 0.0 in stage 26.0 (TID 604) on localhost, executor driver: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException (Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]] ) [duplicate 1]
2018-10-30 03:40:30 INFO  TaskSchedulerImpl:54 - Removed TaskSet 26.0, whose tasks have all completed, from pool
2018-10-30 03:40:30 INFO  TaskSchedulerImpl:54 - Cancelling stage 26
2018-10-30 03:40:30 INFO  DAGScheduler:54 - ResultStage 26 (runJob at EsSparkSQL.scala:101) failed in 0.860 s due to Job aborted due to stage failure: Task 1 in stage 26.0 failed 1 times, most recent failure: Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        [... same stack trace as above ...]

Driver stacktrace:
2018-10-30 03:40:30 INFO  DAGScheduler:54 - Job 6 failed: runJob at EsSparkSQL.scala:101, took 0.874579 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 26.0 failed 1 times, most recent failure: Lost task 1.0 in stage 26.0 (TID 605, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        [... same stack trace as above ...]

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
        at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:101)
        at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:80)
        at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:48)
        at com.gerritforge.analytics.job.Job$$anonfun$saveES$1.apply(Main.scala:210)
        at com.gerritforge.analytics.job.Job$$anonfun$saveES$1.apply(Main.scala:207)
        at scala.Option.foreach(Option.scala:257)
        at com.gerritforge.analytics.job.Job$class.saveES(Main.scala:207)
        at com.gerritforge.analytics.job.Main$.saveES(Main.scala:35)
        at com.gerritforge.analytics.job.Main$.delayedEndpoint$com$gerritforge$analytics$job$Main$1(Main.scala:115)
        at com.gerritforge.analytics.job.Main$delayedInit$body.apply(Main.scala:35)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at com.gerritforge.analytics.job.Main$.main(Main.scala:35)
        at com.gerritforge.analytics.job.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
        [... same stack trace as above ...]
2018-10-30 03:40:30 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2018-10-30 03:40:30 INFO  AbstractConnector:318 - Stopped Spark@745aef8d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-30 03:40:30 INFO  SparkUI:54 - Stopped Spark web UI at http://9775ea57fd23:4040
2018-10-30 03:40:30 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-10-30 03:40:30 INFO  MemoryStore:54 - MemoryStore cleared
2018-10-30 03:40:30 INFO  BlockManager:54 - BlockManager stopped
2018-10-30 03:40:30 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-10-30 03:40:30 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-10-30 03:40:31 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-10-30 03:40:31 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-10-30 03:40:31 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-a348143a-875d-45ed-a502-9484e16859cb
2018-10-30 03:40:31 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-9c2e9a6a-5bcf-42ed-bca9-7cb55e932fd1


It still seems to be a network issue.

Any idea?

Best regards,
Shiping

Fabio Ponciroli

Nov 2, 2018, 2:36:00 PM
to shipingc, repo-d...@googlegroups.com
Hi Shiping,
Good to hear it is working! Let us know if you need any more help with it.

By the way, to simplify the setup of the whole infrastructure we have been working on this plugin: https://gerrit.googlesource.com/plugins/analytics-wizard/

Have a look and see if it is of any help.

Thanks,
Fabio

On Wed, 31 Oct 2018 at 03:14, shipingc <sher...@gmail.com> wrote:
Hi Fabio,

I eventually used a workaround: I installed Elasticsearch and Kibana on one machine and ran the Docker job on another.

The feature is very nice! Thank you very much for the nice work!

Shiping 

shipingc

Nov 5, 2018, 3:36:03 AM
to Repo and Gerrit Discussion
Hi Fabio,

I just use the default configuration. BTW, I use version 6.4.1 for both Elasticsearch and Kibana; 6.4.2 has some other startup issues.

Thanks,
Shiping 

elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

shipingc

Dec 20, 2018, 7:24:37 PM
to Repo and Gerrit Discussion
Hi Fabio,

It has been a while since I got this nice tool running successfully. Recently I tried to rerun the ETL in Docker to repopulate the latest data; however, the command hangs:

PS C:\Users\shiping.chen> docker run -ti --rm -e ES_HOST=30.57.186.97    -e GERRIT_URL="http://x-dev.alibaba.net:8080"  -e ANALYTICS_ARGS="--since 2018-08-03 --aggregate email_hour  -e gerrit" gerritforge/gerrit-analytics-etl-gitcommits:latest
Unable to find image 'gerritforge/gerrit-analytics-etl-gitcommits:latest' locally
latest: Pulling from gerritforge/gerrit-analytics-etl-gitcommits
4fe2ade4980c: Already exists
6fc58a8d4ae4: Already exists
ef87ded15917: Pull complete
28f8e02fea6a: Pull complete
6f3c2b9d6b74: Pull complete
8b3a5087354d: Pull complete
16fc39044a9d: Pull complete
f309e443c9d2: Pull complete
1b92c11b208f: Pull complete
Digest: sha256:bcf38217d1cd189af79fec022b3a8a5874f4825b453d7b28ec04154121073ac7
Status: Downloaded newer image for gerritforge/gerrit-analytics-etl-gitcommits:latest
* Elastic Search Host: 30.57.186.97:9200
* Analytics arguments: --since 2018-08-03 --aggregate email_hour  -e gerrit
* Spark jar class: com.gerritforge.analytics.gitcommits.job.Main
* Spark jar path: /app/analytics-etl-gitcommits-assembly.jar
Elasticsearch is up, now running spark job...
2018-12-21 00:12:05 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-21 00:12:06 INFO  Main$:103 - Starting analytics app with config GerritEndpointConfig(Some(http://x-dev.alibaba.net:8080),None,file:///tmp/analytics-54550104399800,Some(gerrit),Some(2018-08-03),None,Some(email_hour),None,None,None,None,None,None,None)
2018-12-21 00:12:06 INFO  SparkContext:54 - Running Spark version 2.3.2
2018-12-21 00:12:06 INFO  SparkContext:54 - Submitted application: Gerrit GitCommits Analytics ETL
2018-12-21 00:12:06 INFO  SecurityManager:54 - Changing view acls to: root
2018-12-21 00:12:06 INFO  SecurityManager:54 - Changing modify acls to: root
2018-12-21 00:12:06 INFO  SecurityManager:54 - Changing view acls groups to:
2018-12-21 00:12:06 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-12-21 00:12:06 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-12-21 00:12:06 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 36815.
2018-12-21 00:12:06 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-12-21 00:12:06 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-12-21 00:12:06 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-12-21 00:12:06 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-12-21 00:12:06 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-59e490ee-76f1-4b76-b8df-357857eb846c
2018-12-21 00:12:06 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-12-21 00:12:06 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-12-21 00:12:06 INFO  log:192 - Logging initialized @3119ms
2018-12-21 00:12:07 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2018-12-21 00:12:07 INFO  Server:419 - Started @3276ms
2018-12-21 00:12:07 INFO  AbstractConnector:278 - Started ServerConnector@5e663be5{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-21 00:12:07 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6c44052e{/jobs,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4fdf8f12{/jobs/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4a8b5227{/jobs/job,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6979efad{/jobs/job/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5a6d5a8f{/stages,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4a67318f{/stages/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@315ba14a{/stages/stage,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@54e81b21{/stages/stage/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38d5b107{/stages/pool,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6650813a{/stages/pool/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44ea608c{/storage,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50cf5a23{/storage/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@450794b4{/storage/rdd,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@273c947f{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@30457e14{/environment,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@632aa1a3{/executors,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@20765ed5{/executors/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3b582111{/executors/threadDump,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2899a8db{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e8823d2{/static,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@251ebf23{/,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29b732a2{/api,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1162410a{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@b09fac1{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-12-21 00:12:07 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://96c763388076:4040
2018-12-21 00:12:07 INFO  SparkContext:54 - Added JAR file:/app/analytics-etl-gitcommits-assembly.jar at spark://96c763388076:36815/jars/analytics-etl-gitcommits-assembly.jar with timestamp 1545351127637
2018-12-21 00:12:07 INFO  Executor:54 - Starting executor ID driver on host localhost
2018-12-21 00:12:07 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37905.
2018-12-21 00:12:07 INFO  NettyBlockTransferService:54 - Server created on 96c763388076:37905
2018-12-21 00:12:07 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-12-21 00:12:07 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 96c763388076, 37905, None)
2018-12-21 00:12:07 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 96c763388076:37905 with 366.3 MB RAM, BlockManagerId(driver, 96c763388076, 37905, None)
2018-12-21 00:12:07 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 96c763388076, 37905, None)
2018-12-21 00:12:07 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 96c763388076, 37905, None)
2018-12-21 00:12:08 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@444548a0{/metrics/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:08 INFO  GerritConnectivity:61 - Connecting to API http://x-dev.alibaba.net:8080/projects/
2018-12-21 00:12:08 INFO  Main$:142 - Loaded a list of 10 projects [GerritProject(All-Projects,All-Projects),GerritProject(test-project,test-project),GerritProject(X-DB,X-DB),GerritProject(persistent_cache,persistent_cache),GerritProject(histore,histore),GerritProject(AliSQL-8.0,AliSQL-8.0),GerritProject(All-Users,All-Users),GerritProject(X-Factory,X-Factory),GerritProject(X-DB5,X-DB5),GerritProject(newengine,newengine)]
2018-12-21 00:12:13 INFO  SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
2018-12-21 00:12:13 INFO  SharedState:54 - Warehouse path is 'file:/spark-warehouse'.
2018-12-21 00:12:13 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@30517a57{/SQL,null,AVAILABLE,@Spark}
2018-12-21 00:12:13 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3dde5f38{/SQL/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:13 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@721fc2e3{/SQL/execution,null,AVAILABLE,@Spark}
2018-12-21 00:12:13 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@63187d63{/SQL/execution/json,null,AVAILABLE,@Spark}
2018-12-21 00:12:13 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44864536{/static/sql,null,AVAILABLE,@Spark}
2018-12-21 00:12:14 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2018-12-21 00:12:16 INFO  HashAggregateExec:54 - spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
2018-12-21 00:12:17 INFO  CodeGenerator:54 - Code generated in 387.9738 ms
2018-12-21 00:12:17 INFO  HashAggregateExec:54 - spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
2018-12-21 00:12:17 INFO  CodeGenerator:54 - Code generated in 61.008 ms
2018-12-21 00:12:17 INFO  CodeGenerator:54 - Code generated in 37.2962 ms
2018-12-21 00:12:18 INFO  ContextCleaner:54 - Cleaned accumulator 0
2018-12-21 00:12:18 INFO  CodeGenerator:54 - Code generated in 115.547 ms
2018-12-21 00:12:18 INFO  CodeGenerator:54 - Code generated in 60.2091 ms
2018-12-21 00:12:18 INFO  SparkContext:54 - Starting job: head at Main.scala:191
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Registering RDD 15 (rdd at Main.scala:188)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Registering RDD 21 (keyBy at GerritEventsTransformations.scala:58)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Registering RDD 20 (keyBy at GerritEventsTransformations.scala:57)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Registering RDD 28 (groupBy at GerritEventsTransformations.scala:69)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Got job 0 (head at Main.scala:191) with 1 output partitions
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Final stage: ResultStage 4 (head at Main.scala:191)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 3)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Missing parents: List(ShuffleMapStage 3)
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[15] at rdd at Main.scala:188), which has no missing parents
2018-12-21 00:12:18 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 27.7 KB, free 366.3 MB)
2018-12-21 00:12:18 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 11.5 KB, free 366.3 MB)
2018-12-21 00:12:18 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 96c763388076:37905 (size: 11.5 KB, free: 366.3 MB)
2018-12-21 00:12:18 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-12-21 00:12:18 INFO  DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[15] at rdd at Main.scala:188) (first 15 tasks are for partitions Vector(0, 1))
2018-12-21 00:12:18 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-12-21 00:12:18 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8135 bytes)
2018-12-21 00:12:18 INFO  TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 8117 bytes)
2018-12-21 00:12:18 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-12-21 00:12:18 INFO  Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-12-21 00:12:18 INFO  Executor:54 - Fetching spark://96c763388076:36815/jars/analytics-etl-gitcommits-assembly.jar with timestamp 1545351127637
2018-12-21 00:12:19 INFO  TransportClientFactory:267 - Successfully created connection to 96c763388076/172.17.0.2:36815 after 56 ms (0 ms spent in bootstraps)
2018-12-21 00:12:19 INFO  Utils:54 - Fetching spark://96c763388076:36815/jars/analytics-etl-gitcommits-assembly.jar to /tmp/spark-fbfa42a9-9a8b-4b95-b1c9-b51e0a14268f/userFiles-c6853d5a-a1e2-4ced-b56e-d22e0ee40429/fetchFileTemp7646913260148081577.tmp
2018-12-21 00:12:19 INFO  Executor:54 - Adding file:/tmp/spark-fbfa42a9-9a8b-4b95-b1c9-b51e0a14268f/userFiles-c6853d5a-a1e2-4ced-b56e-d22e0ee40429/analytics-etl-gitcommits-assembly.jar to class loader
2018-12-21 00:12:19 INFO  CodeGenerator:54 - Code generated in 15.3798 ms
2018-12-21 00:12:19 INFO  CodeGenerator:54 - Code generated in 16.9873 ms
2018-12-21 00:12:19 INFO  CodeGenerator:54 - Code generated in 9.7323 ms
2018-12-21 00:12:19 INFO  CodeGenerator:54 - Code generated in 32.58 ms
2018-12-21 00:12:54 INFO  Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 2265 bytes result sent to driver
2018-12-21 00:12:54 INFO  TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 35756 ms on localhost (executor driver) (1/2)
// hanging here forever

I also tried the old ETL command; the result is the same:

 docker run -ti --rm -e ES_HOST=xxx.xxx.xxx.xx -e GERRIT_URL="gerrit url" -e ANALYTICS_ARGS="--since 2018-08-03 --extract-branches true --aggregate email_hour  -e gerrit/analytics" gerritforge/spark-gerrit-analytics-etl:latest



Any idea what's wrong?

Best regards,
Shiping

Fabio Ponciroli

Dec 22, 2018, 10:26:58 AM
to shipingc, Repo and Gerrit Discussion
Hi Shiping,
I'm not sure why it is hanging; I have never experienced that before. Is it happening every time?

I can't see anything strange in the logs you sent. What you could try is raising the Spark log level to see if you can capture more useful information. You can do it this way:

1) Enter the docker container: docker run -ti --rm --entrypoint /bin/bash gerritforge/gerrit-analytics-etl-gitcommits:latest 

2) Go to the /app directory: cd /app
3) Create a file called log4j.properties with the following content:

# Route every log event (level ALL) to a console appender on stderr
log4j.rootCategory=ALL, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

4) Submit the Spark job from inside Docker as follows:

spark-submit \
  --conf spark.es.nodes="30.57.186.97" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.gerritforge.analytics.gitcommits.job.Main /app/analytics-etl-gitcommits-assembly.jar \
  --url="http://x-dev.alibaba.net:8080" --since 2018-08-03 --aggregate email_hour -e gerrit


This should run your job with the most verbose log level. Let me know how it goes.
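
Since the ALL level is extremely verbose, you may also want to redirect the output to a file (for example by appending 2> spark-debug.log to the command, since the console appender above writes to stderr) so it is easier to search afterwards.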

Thanks,
Fabio



shipingc

Jan 3, 2019, 7:45:54 PM
to Repo and Gerrit Discussion
Hi Fabio,

Thank you for the suggestions. I figured out that it simply takes time, about half an hour, but the job eventually finishes. I'm not sure why it has become so much slower than two months ago.

BTW, what is the correct procedure for updating the data in Elasticsearch? That is, to get the latest analytics data from Gerrit by running the Docker ETL while keeping the visualizations and dashboards untouched, do I need to "DELETE gerrit" before rerunning the ETL?

Best,
Shiping
...

Fabio Ponciroli

Jan 4, 2019, 5:10:31 AM
to shipingc, Repo and Gerrit Discussion
Hi Shiping,
I'm glad you have managed to get it working. 

On Fri, 4 Jan 2019 at 01:45, shipingc <sher...@gmail.com> wrote:
Hi Fabio,

Thank you for the suggestions. I figured out that it simply takes time, about half an hour, but the job eventually finishes. I'm not sure why it has become so much slower than two months ago.

How many projects and commits are you processing?
Can you confirm you have the latest version of the analytics plugin? We made some performance improvements in the latest version.
 

BTW, what is the correct procedure for updating the data in Elasticsearch? That is, to get the latest analytics data from Gerrit by running the Docker ETL while keeping the visualizations and dashboards untouched, do I need to "DELETE gerrit" before rerunning the ETL?

The easiest way is to delete the Elasticsearch index every time and recreate it from scratch. If this operation takes too long, you can play with the since/until parameters and do incremental imports of the data.
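
As an example, a full refresh could look like this (just a sketch based on the commands earlier in this thread, with placeholders for your values). The Kibana index pattern, visualizations and dashboards are stored in the .kibana index, so deleting the data index should leave them untouched:

curl -XDELETE http://<your_es_host>:9200/gerrit  # drop the old analytics data
docker run -ti --rm -e ES_HOST=<your_es_host> -e GERRIT_URL="<your_gerrit_url>" -e ANALYTICS_ARGS="--since 2018-08-03 --aggregate email_hour -e gerrit" gerritforge/gerrit-analytics-etl-gitcommits:latest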

Hope this helps.

Thanks,
Fabio
 

shipingc

Jan 4, 2019, 5:26:57 PM
to Repo and Gerrit Discussion
Hi Fabio,

We have 10 projects, about 1000 commits in total. The whole execution takes more than an hour.

My current analytics plugin is the original one that came with the Gerrit 2.14.10 installation package. To get the latest one, should I build from the stable-2.14 branch (https://gerrit.googlesource.com/plugins/analytics/)? Does that branch include your latest performance improvement changes?

I also found that data from some projects is missing, and the number of commits is sometimes incorrect.

Best,
Shiping 

Fabio Ponciroli

Jan 7, 2019, 3:13:13 PM
to shipingc, Repo and Gerrit Discussion
Hi Shiping,

On Fri, 4 Jan 2019 at 23:27, shipingc <sher...@gmail.com> wrote:
Hi Fabio,

We have 10 projects, about 1000 commits in total. The whole execution takes more than an hour.

My current analytics plugin is the original one that came with the Gerrit 2.14.10 installation package. To get the latest one, should I build from the stable-2.14 branch (https://gerrit.googlesource.com/plugins/analytics/)? Does that branch include your latest performance improvement changes?


Unfortunately, the plugin in 2.14 is not up to date. We only maintain it for versions 2.15 and 2.16 :(

I don't know the specs of the machine you are running the ETL and plugin on but, to give you an example, processing the whole Gerrit project (~48K commits) on my laptop (Mac, 2.3 GHz Intel Core i5, 16 GB RAM) takes a few minutes.

 
I also found that data from some projects is missing, and the number of commits is sometimes incorrect.

Can you provide an example of this issue and a way of reproducing it?