Hi Guys,
I'm fairly new to Tachyon and Spark. I am trying to write partitioned data into Parquet files using Spark on Tachyon (Spark 1.6.0 / Tachyon 0.8.2 on Hadoop 2.7.1). The following happens as soon as all the writes are done: the Spark job tries to delete the _temporary folder and then starts hitting read timeouts. This causes the job to fail, even though I verified that all records were written without issues. I tried increasing the Spark executor memory, but I still hit the same error consistently. I am using CACHE_THROUGH writes, so the data goes through to HDFS. Any help in getting this resolved is really appreciated.
>> code snippet
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("parquet.enable.summary-metadata", "false")
hadoopConf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
hadoopConf.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
hadoopConf.set("parquet.metadata.read.parallelism", "15")
hadoopConf.set("spark.sql.parquet.output.committer.class","org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
.......
rtrk_viewership.write.partitionBy("market_code", "program_start_date").mode(SaveMode.Append).parquet("tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership")
<<
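In case the way I set the committer matters: my understanding (and I may well be wrong here) is that in Spark 1.6 `spark.sql.parquet.output.committer.class` is a Spark SQL setting rather than a Hadoop one, so a condensed version of the setup would look roughly like the sketch below. This is untested; `sc`, `sqlContext`, and `rtrk_viewership` are as in my job, and the committer class name is the one I was already using.

```scala
// Parquet/committer-related settings on the Hadoop configuration
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("parquet.enable.summary-metadata", "false")
hadoopConf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
hadoopConf.set("fs.tachyon.impl", "tachyon.hadoop.TFS")

// Committer class set through the Spark SQL conf instead of hadoopConf --
// not sure whether this makes any difference for the _temporary cleanup
sqlContext.setConf("spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

rtrk_viewership.write
  .partitionBy("market_code", "program_start_date")
  .mode(SaveMode.Append)
  .parquet("tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership")
```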
Exception details are as follows:
-46ef-9ab2-acbd3b27ee29.gz.parquet): HDFS Path: hdfs://ip-10-1-83-211.ec2.internal:9000/QH/staging_rtrk_viewership/market_code=533/program_start_date=2015-12-05/part-r-00674-eb0e73e9-e64f-46ef-9ab2-acbd3b27ee29.gz.parquet TPath: tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership/market_code=533/program_start_date=2015-12-05/part-r-00674-eb0e73e9-e64f-46ef-9ab2-acbd3b27ee29.gz.parquet
16/02/22 09:32:07 INFO : File does not exist: tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership/market_code=533/program_start_date=2015-12-05/part-r-00674-eb0e73e9-e64f-46ef-9ab2-acbd3b27ee29.gz.parquet
16/02/22 09:32:07 INFO : rename(tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership/_temporary/0/task_201602220835_0000_m_000674/market_code=533/program_start_date=2015-12-05/part-r-00674-eb0e73e9-e64f-46ef-9ab2-acbd3b27ee29.gz.parquet, tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership/market_code=533/program_start_date=2015-12-05/part-r-00674-eb0e73e9-e64f-46ef-9ab2-acbd3b27ee29.gz.parquet)
16/02/22 09:32:07 INFO : delete(tachyon://ip-10-1-83-211.ec2.internal:19998/QH/staging_rtrk_viewership/_temporary, true)
16/02/22 09:32:37 ERROR : java.net.SocketTimeoutException: Read timed out
tachyon.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at tachyon.org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at tachyon.org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at tachyon.org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at tachyon.thrift.FileSystemMasterService$Client.recv_deleteFile(FileSystemMasterService.java:265)
at tachyon.thrift.FileSystemMasterService$Client.deleteFile(FileSystemMasterService.java:251)
at tachyon.client.FileSystemMasterClient.deleteFile(FileSystemMasterClient.java:289)
at tachyon.client.TachyonFS.delete(TachyonFS.java:377)
at tachyon.client.AbstractTachyonFS.delete(AbstractTachyonFS.java:109)
at tachyon.client.TachyonFS.delete(TachyonFS.java:66)
at tachyon.hadoop.AbstractTFS.delete(AbstractTFS.java:199)
at tachyon.hadoop.TFS.delete(TFS.java:27)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:381)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:314)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329)
at RentrakQHLoadStaging$.main(RentrakQHLoadStaging.scala:213)
at RentrakQHLoadStaging.main(RentrakQHLoadStaging.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 50 more
16/02/22 09:32:37 INFO : Tachyon client (version 0.8.2) is trying to connect with FileSystemMaster master @ ip-10-1-83-211.ec2.internal/10.1.83.211:19998
16/02/22 09:32:37 INFO : Client registered with FileSystemMaster master @ ip-10-1-83-211.ec2.internal/10.1.83.211:19998
16/02/22 09:33:07 ERROR : java.net.SocketTimeoutException: Read timed out
tachyon.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Thanks
Azad