EsHadoopIllegalArgumentException - Cannot detect ES version


shish...@gmail.com

Apr 27, 2020, 1:03:37 AM
to actionml-user
Hi 

I am running Harness from the harness-docker-compose repository.


I create an engine with the following configuration:

{
  "engineId": "test_ur",
  "engineFactory": "com.actionml.engines.ur.UREngine",
  "sparkConf": {
    "master": "local",
    "spark.driver.memory": "3g",
    "spark.executor.memory": "1g",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.es.index.auto.create": "true",
    "spark.es.nodes": "localhost",
    "es.nodes": "localhost",
    "spark.es.nodes.wan.only": "true",
    "es.nodes.wan.only": "true"
  },
  "algorithm": {
    "indicators": [
      { "name": "purchase" },
      { "name": "view" },
      { "name": "category-pref" }
    ],
    "num": 4
  }
}
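
I create the engine by POSTing this JSON to the Harness REST API, with something like the following (the default port 9090 and the /engines endpoint are assumed here):

curl -X POST http://localhost:9090/engines \
  -H "Content-Type: application/json" \
  -d @test_ur.json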


I add nearly 250 events to the database and then run a training job. I get the following error in the Harness logs:

04:45:19.004 ERROR NetworkClient     - Node [localhost:9200] failed (java.net.ConnectException: Connection refused (Connection refused)); no other nodes left - aborting...
04:45:19.011 ERROR URAlgorithm       - Spark computation failed for engine test_ur with params {{"engineId":"test_ur","engineFactory":"com.actionml.engines.ur.UREngine","sparkConf":{"master":"local","spark.driver.memory":"3g","spark.executor.memory":"1g","spark.serializer":"org.apache.spark.serializer.KryoSerializer","spark.kryo.registrator":"org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator","spark.kryo.referenceTracking":"false","spark.kryoserializer.buffer":"300m","spark.es.index.auto.create":"true","spark.es.nodes":"localhost","es.nodes":"localhost","spark.es.nodes.wan.only":"true","es.nodes.wan.only":"true"},"algorithm":{"indicators":[{"name":"purchase"},{"name":"view"},{"name":"category-pref"}],"num":4}}}
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'


Elasticsearch is working and available on port 9200; I have even connected to it with a Python client just to make sure it works.
The Spark config for ES matches what is suggested in some of the other answers:

"spark.es.nodes": "localhost",
"es.nodes":"localhost",
"spark.es.nodes.wan.only": "true",
"es.nodes.wan.only":"true"




The Spark job is able to create indices, although their sizes are only a few hundred bytes. Listing them (via the _cat/indices API) produces the following:

yellow open test_ur_1587957559524  x0FasDlKQoiPzcsH7Y1lbw 1 1 0 0  283b  283b
yellow open test_ur_1587962717198  4GwAdFlrQLOr9IMqQJg-pA 1 1 0 0  283b  283b
yellow open test_ur_1587954743979  HI4adyyyRQm25gVBIf1wRw 1 1 0 0  283b  283b
yellow open test_ur_1587945057452  gWGq94QlRBm948zar3eTEg 1 1 0 0  283b  283b

After this it produces the error shown above. How is it able to create these indices but not write any data to them?

Thanks.




Pat Ferrel

Apr 27, 2020, 11:58:54 AM
to shish...@gmail.com, actionml-user
Look in the Harness logs to see why Spark jobs fail. 99% of the time the cause is not enough memory, but the ERRORs in the logs can be any of a variety of things. Spark uses in-memory computation to speed up training, and when it runs out of memory it throws exceptions.

As I have said many times, docker-compose is limited in scaling and so is not recommended for production deployments. Ideally each service should run in an independent deployment so they can be scaled independently. We see most scaling needs in Spark, but performance can also be boosted by scaling Mongo and ES.
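
For example, memory is raised through the engine's sparkConf; the values below are illustrative, not a sizing recommendation:

"sparkConf": {
  "master": "local",
  "spark.driver.memory": "6g",
  "spark.executor.memory": "6g"
}

(With "master": "local" the driver does the work, so spark.driver.memory matters most.)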

shish...@gmail.com

Apr 27, 2020, 12:29:41 PM
to actionml-user
The Spark job does not fail.

{
  "jobId": "f28cd967-90cd-4e4d-b800-9ac1d6e3fec6",
  "status": {
    "name": "successful"
  },
  "comment": "Spark job",
  "createdAt": "2020-04-27T04:44:56.985Z",
  "completedAt": "2020-04-27T04:45:19.030Z"
}


In fact, I see the following popRank values in the logs too:

04:45:15.830 INFO  URAlgorithm       - RankRDDs[1]
popRank
(4,5.0)
(14,8.0)
(19,8.0)
(7,7.0)
(42,8.0)
(36,5.0)
(6,4.0)
(37,4.0)
(45,4.0)
(2,13.0)
(16,14.0)
(41,7.0)


It is the write to ES that fails for some reason. ES is working fine: I can access it from the browser and also from Python, which can write to it as well. Only Harness is unable to write to it, except for creating those byte-sized indices.

I have also added all the configuration needed to write to ES, but I still can't figure out what is wrong. This is what is in the Harness logs:

04:45:17.191 INFO  DAGScheduler      - Job 19 finished: collect at URModel.scala:81, took 1.103642 s
04:45:17.198 INFO  ElasticSearchClient$$anon$1 - Create new index: test_ur_1587962717198, List(popRank, category, purchase, id), Map(id -> (keyword,true), category-pref -> (keyword,true), category -> (keyword,true), purchase -> (keyword,true), popRank -> (float,false), view -> (keyword,true))
04:45:18.465 WARN  RestClient        - request [PUT http://elasticsearch:9200/test_ur_1587962717198?include_type_name=true] returned 1 warnings: [299 Elasticsearch-7.6.0-7f634e9f44834fbc12724506cc1da681b0c3b1e3 "[types removal] Using include_type_name in create index requests is deprecated. The parameter will be removed in the next major version."]
04:45:18.468 INFO  ElasticSearchClient$$anon$1 - Number of ES connections for saveToEs: 1
04:45:18.995 INFO  HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
04:45:18.996 INFO  HttpMethodDirector - Retrying request
04:45:18.997 INFO  HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
04:45:18.999 INFO  HttpMethodDirector - Retrying request
04:45:19.000 INFO  HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
04:45:19.002 INFO  HttpMethodDirector - Retrying request
04:45:19.004 ERROR NetworkClient     - Node [localhost:9200] failed (java.net.ConnectException: Connection refused (Connection refused)); no other nodes left - aborting...
04:45:19.011 ERROR URAlgorithm       - Spark computation failed for engine test_ur with params {{"engineId":"test_ur","engineFactory":"com.actionml.engines.ur.UREngine","sparkConf":{"master":"local","spark.driver.memory":"3g","spark.executor.memory":"1g","spark.serializer":"org.apache.spark.serializer.KryoSerializer","spark.kryo.registrator":"org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator","spark.kryo.referenceTracking":"false","spark.kryoserializer.buffer":"300m","spark.es.index.auto.create":"true","spark.es.nodes":"localhost","es.nodes":"localhost","spark.es.nodes.wan.only":"true","es.nodes.wan.only":"true"},"algorithm":{"indicators":[{"name":"purchase"},{"name":"view"},{"name":"category-pref"}],"num":4}}}
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
at org.elasticsearch.spark.rdd.EsSpark$.doSaveToEs(EsSpark.scala:104)
at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:79)
at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:76)
at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:56)
at com.actionml.core.search.elasticsearch.ElasticSearchClient.hotSwap(ElasticSearchSupport.scala:379)
at com.actionml.engines.ur.URModel.save(URModel.scala:83)
at com.actionml.engines.ur.URAlgorithm$$anonfun$train$1.apply(URAlgorithm.scala:296)
at com.actionml.engines.ur.URAlgorithm$$anonfun$train$1.apply(URAlgorithm.scala:255)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Success.map(Try.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:160)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:432)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:428)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:745)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
... 20 common frames omitted






Pat Ferrel

Apr 27, 2020, 2:08:00 PM
to shish...@gmail.com, actionml-user
The log is saying that the Spark job failed to launch because the ES version could not be detected. Are you using the latest image for harness:develop?

Pat Ferrel

Apr 27, 2020, 2:18:45 PM
to shish...@gmail.com, actionml-user
Try `docker-compose pull` to get the latest images

shish...@gmail.com

Apr 27, 2020, 2:41:31 PM
to actionml-user
Hi 

I tried `docker-compose pull`; it still gives the same issue with the latest image.

But then I tried running each service separately and was able to get training and serving to work. I guess there is some issue with the docker-compose image.
Also, I would suggest that if a job actually fails, it should not be marked `successful`; the job status should report it as `failed`.

I can create an issue on GitHub describing the problem.

Thanks.


Pat Ferrel

Apr 27, 2020, 2:49:05 PM
to shish...@gmail.com, actionml-user
Ok, please do create an issue.

It is very difficult to check for all the conditions that can result in a failed job. The job runs asynchronously, and until the job is created (which relies on Spark accepting it) it cannot be tagged as "failed".

Make sure the job you see as “successful” has the same job-id as the one that failed. We would expect that there is no job-id until the job is accepted by Spark.

If it does have the same job-id, then we have a problem in the status reporting that we should be able to fix, so add that information to the issue report.

Thanks

Pat Ferrel

Apr 27, 2020, 7:43:58 PM
to shish...@gmail.com, actionml-user
The error: 

ERROR NetworkClient     - Node [localhost:9200] failed (java.net.ConnectException: Connection refused (Connection refused)); no other nodes left - aborting...

indicates that the CONTAINER cannot connect to ES at its "localhost:9200" address. The container's "localhost" is not the same as the host's localhost; containers work somewhat like VMs. For a container-to-container connection use the container name, in this case "elasticsearch:9200", not "localhost:9200". These container names are maintained by the docker-compose network almost as if they were DNS names.
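
A quick way to see the difference (the Harness service name "harness", and curl being available inside the image, are assumptions here):

# From the host, the published port works:
curl http://localhost:9200

# Inside a container, "localhost" is the container itself, so this fails:
docker-compose exec harness curl http://localhost:9200

# Container to container, use the compose service name:
docker-compose exec harness curl http://elasticsearch:9200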

TL;DR: set "spark.es.nodes": "elasticsearch" when using docker-compose. You may want to brush up on how docker-compose maintains its pseudo-network.

es.nodes is not used anymore; only keys that start with "spark." are needed.
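
Applied to the config above, the relevant part of sparkConf becomes (other keys unchanged, the plain es.* keys dropped):

"sparkConf": {
  "spark.es.index.auto.create": "true",
  "spark.es.nodes": "elasticsearch",
  "spark.es.nodes.wan.only": "true"
}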

Also be aware that the docker-compose.yml references a tag for each image. The published image can change even though the tag stays the same; docker-compose pull tells the system to refresh all images that are out of date. Another docker/docker-compose detail you will want to understand is how containers are instantiated. We include the watchtower container to refresh images automatically in some cases.
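
A typical refresh sequence looks like this:

docker-compose pull     # fetch images whose published content changed under the same tag
docker-compose up -d    # recreate any containers whose image was updated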

Also be aware that, by default, our docker-compose project launches images from our develop branch, which does not contain released code. This is meant for people who need to run pre-release code.

If you run docker-compose in production, you should fork the project and pin versions to known image tags like release numbers. This is left for you to do as you wish.
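
For example, in a forked docker-compose.yml (the image name and tag below are illustrative, not real release coordinates):

services:
  harness:
    image: actionml/harness:0.5.1   # a pinned release tag instead of a floating develop tag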

shish...@gmail.com

Apr 27, 2020, 9:10:09 PM
to actionml-user
Thanks a lot. 

I changed it to "elasticsearch" and it works now. I had tried a few different combinations of IP addresses, but none of them worked. Since it was able to create those byte-sized indices, I assumed ES was accessible from the containers but for some reason could not be written to. (In hindsight, the logs show the difference: the index-creation PUT went to elasticsearch:9200 via Harness's own REST client, while saveToEs tried localhost:9200.)

Thanks for all the help.



Pat Ferrel

Apr 28, 2020, 12:20:15 PM
to shish...@gmail.com, actionml-user
Docker, and its more complex cousin Kubernetes, both deliver many benefits over old-style native host installations, but they come with a new set of concepts and tools.

Docker-compose makes installation super easy compared with native installs, but this ease can be deceptive; there is a cost.

For instance (concrete equivalents are sketched after this list):
 - you won't ssh to an IP address; you'll use docker or k8s to log in to a container or run a command in it
 - you won't tail -f logs; you'll use docker-compose or k8s to view them
 - you will need to explicitly attach a service to localhost with configuration, which forwards traffic back and forth between the host and containers
 - docker-compose runs on a single machine, so all containers share the host's resources
 - containers must communicate via network interfaces that involve routing even on a single machine; in fact k8s comes with a built-in DNS resolver
 - all config for a container is passed in via environment variables set when the container is launched; these are specified in different ways for docker-compose and k8s
 - in docker-compose the same storage can be attached to more than one container and will appear in different places in the filesystems of the host and each container. This mapping of filesystems can be hard to visualize until you are familiar with it, and because the mapping is controlled by configuration, it changes with the config.
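
Concrete equivalents for the first few points (the service name "harness" is assumed):

# instead of ssh-ing into a host:
docker-compose exec harness bash

# instead of tail -f on a host log file:
docker-compose logs -f harness

# attaching a service to the host's localhost, in docker-compose.yml
# (port numbers illustrative):
#   ports:
#     - "9090:9090"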

There are many other differences — too many to note here.

We encourage anyone using docker-compose to not underestimate these differences in operation. Do some reading if you are unfamiliar with container technology.



shish...@gmail.com

Apr 28, 2020, 2:42:30 PM
to actionml-user
Thanks a lot for this quick primer on containers. I will definitely read up more on the technology.


