(1)
2018-08-07 20:49:56 INFO http-nio-8420-exec-5:SparkFileSchemaParserService:206 - Created temporary file file:/tmp/kylo-spark-parser874516531883217874.dat success? true
2018-08-07 20:49:56 INFO http-nio-8420-exec-5:SparkFileSchemaParserService:137 - Script sqlContext.read.json("file:///tmp/kylo-spark-parser874516531883217874.dat").limit(10).toDF()
2018-08-07 20:49:56 ERROR http-nio-8420-exec-5:SparkFileSchemaParserService:101 - Error parsing file JSON: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 79.0 (TID 59, localhost, executor driver): java.io.FileNotFoundException: /tmp/blockmgr-2207eb29-af1c-4516-8069-c8c02a7c30f2/32/temp_shuffle_1eb6b692-8499-4f2f-802f-57747f445764 (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
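The FileNotFoundException above points at a shuffle scratch file under /tmp vanishing mid-job; this is commonly caused by automatic /tmp cleanup (e.g. tmpwatch or systemd-tmpfiles) deleting Spark's block-manager directory while the job is still running. One hedged mitigation, assuming spark-defaults.conf is used and the path below is illustrative, is to point spark.local.dir at a directory not subject to such cleanup:

```
# spark-defaults.conf -- sketch only; the path is an assumption,
# pick any volume that is not swept by automatic /tmp cleanup
spark.local.dir  /var/spark/scratch
```

spark.local.dir is a standard Spark configuration key; the directory must exist and be writable by the user running the driver/executors.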
(2)
2018-08-07 20:50:37 INFO http-nio-8420-exec-8:SparkFileSchemaParserService:206 - Created temporary file file:/tmp/kylo-spark-parser4221910046203569500.dat success? true
2018-08-07 20:50:37 INFO http-nio-8420-exec-8:SparkFileSchemaParserService:137 - Script sqlContext.read.json("file:///tmp/kylo-spark-parser4221910046203569500.dat").limit(10).toDF()
2018-08-07 20:50:37 ERROR http-nio-8420-exec-8:SparkFileSchemaParserService:101 - Error parsing file JSON: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
referenced columns only include the internal corrupt record column
(named _corrupt_record by default). For example:
spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()
and spark.read.schema(schema).json(file).select("_corrupt_record").show().
Instead, you can cache or save the parsed results and then send the same query.
For example, val df = spark.read.schema(schema).json(file).cache() and then
df.filter($"_corrupt_record".isNotNull).count().;
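The AnalysisException's own message describes the workaround: since Spark 2.3, queries that reference only the internal _corrupt_record column are rejected unless the parsed result is cached or saved first. A minimal sketch in the same Scala style as the logged script, assuming a SparkSession named `spark` and an illustrative input path (not the actual temp file from the log):

```scala
// Sketch of the workaround the error message suggests; `spark` and the
// input path are assumptions for illustration.
import org.apache.spark.sql.functions.col

val df = spark.read
  .json("file:///tmp/input.json") // parse the raw JSON once
  .cache()                        // materialize rows before querying _corrupt_record

// With the parsed result cached, corrupt-record-only queries are allowed:
df.filter(col("_corrupt_record").isNotNull).count()
df.select("_corrupt_record").show()
```

Caching works because subsequent queries read the already-parsed rows instead of re-planning a scan of the raw file that would project only the corrupt-record column.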