Rubix with Spark on EMR Exception


dee...@gmail.com

Dec 29, 2018, 9:25:10 AM12/29/18
to RubiX
Hello.

I'm running RubiX with S3 on EMR using the default installer, and I'm seeing the following exception in the Spark driver logs:

18/12/28 13:29:50 INFO RetryingBookkeeperClient: Error while connecting :
org.apache.thrift.shaded.TApplicationException: getCacheStatus failed: unknown result
...
18/12/28 13:29:50 INFO CachingInputStream: Could not get cache status from server org.apache.thrift.shaded.TException
        at com.qubole.rubix.spi.RetryingBookkeeperClient.retryConnection(RetryingBookkeeperClient.java:95)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.getCacheStatus(RetryingBookkeeperClient.java:47)
        at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:304)
        at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:230)
        at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:184)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.avro.mapred.FsInput.read(FsInput.java:46)
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:55)
        at com.databricks.spark.avro.DefaultSource$$anonfun$5.apply(DefaultSource.scala:86)
        at com.databricks.spark.avro.DefaultSource$$anonfun$5.apply(DefaultSource.scala:83)
        at scala.Option.getOrElse(Option.scala:121)
        at com.databricks.spark.avro.DefaultSource.inferSchema(DefaultSource.scala:83)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
        at co.bigstream.benchmark.TPCSQ1$.main(TPCSQ1.scala:63)
        at co.bigstream.benchmark.TPCSQ1.main(TPCSQ1.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I can also see the following in /var/log/rubix/bks.log:

18/12/28 13:46:25,393 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Could not initialize cluster nodes=[ip-172-31-21-10.us-west-2.compute.internal, ip-172-31-23-67.us-west-2.compute.internal] nodeHostName=ip-172-31-31-168.us-west-2.compute.internal nodeHostAddress=172.31.31.168 currentNodeIndex=-1
18/12/28 13:46:25,393 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Node name is null for Cluster TypeHADOOP2_CLUSTER_MANAGER
18/12/28 13:46:25,394 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Could not initialize cluster nodes=[ip-172-31-21-10.us-west-2.compute.internal, ip-172-31-23-67.us-west-2.compute.internal] nodeHostName=ip-172-31-31-168.us-west-2.compute.internal nodeHostAddress=172.31.31.168 currentNodeIndex=-1
18/12/28 13:46:25,394 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Node name is null for Cluster TypeHADOOP2_CLUSTER_MANAGER
18/12/28 13:46:25,394 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Could not initialize cluster nodes=[ip-172-31-21-10.us-west-2.compute.internal, ip-172-31-23-67.us-west-2.compute.internal] nodeHostName=ip-172-31-31-168.us-west-2.compute.internal nodeHostAddress=172.31.31.168 currentNodeIndex=-1
18/12/28 13:46:25,394 ERROR pool-6-thread-5 bookkeeper.BookKeeper: Node name is null for Cluster TypeHADOOP2_CLUSTER_MANAGER

Please help.

ad...@qubole.com

Jan 4, 2019, 1:46:41 AM1/4/19
to RubiX
Hi,

We are aware of this issue. The error is related to nodeName and currentNodeIndex not being set. The root cause is that the list of nodes the cluster manager provides doesn't include the master node. The driver, which runs on the master node, tries to get the cache status of a file from its local BookKeeper, and that call throws the exception. The exception will not cause any job failure, and the executors should still be able to read data from the RubiX cache properly.
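To illustrate the failure mode (this is a simplified sketch, not RubiX's actual code): if the current node's index is resolved by looking up its hostname in the worker list from the cluster manager, the master's hostname is simply not there, so the lookup yields -1 and no node name can be assigned:

```java
import java.util.Arrays;
import java.util.List;

public class NodeIndexDemo {
    // Hypothetical stand-in for resolving the current node's position in the
    // cluster node list; returns -1 when the host is absent from the list.
    static int currentNodeIndex(List<String> clusterNodes, String currentHost) {
        return clusterNodes.indexOf(currentHost);
    }

    public static void main(String[] args) {
        // Worker nodes as reported by the cluster manager (from the bks.log above).
        List<String> nodes = Arrays.asList(
            "ip-172-31-21-10.us-west-2.compute.internal",
            "ip-172-31-23-67.us-west-2.compute.internal");

        // The EMR master node, where the Spark driver runs, is not in that list.
        String master = "ip-172-31-31-168.us-west-2.compute.internal";

        System.out.println(currentNodeIndex(nodes, master));        // prints -1
        System.out.println(currentNodeIndex(nodes, nodes.get(0)));  // prints 0
    }
}
```

This matches the log output: currentNodeIndex=-1 for the master, while lookups on the worker nodes themselves succeed.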

You can file an issue on GitHub and we will treat it as a priority. Please let us know if your main job is failing because of this exception.

Regards,
Abhishek