Is it possible to set the underlying jobID? I am writing a custom Tap and Scheme to interface with BigQuery, and the underlying InputFormat requires that a jobID be set. The Cask framework ran into the exact same issue.
It looks like they have direct access to the underlying Job instance and were therefore able to set the jobID from there.
I have tried setting the jobID via the JobConf in my Tap and Scheme, and I have also tried setting it via the JobConf in a custom FlowStepStrategy. Neither approach worked; I still receive the following error:
Caused by: java.lang.NullPointerException: getSplits requires a jobID
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:102)
at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper.getSplits(DeprecatedInputFormatWrapper.java:137)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:134)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:332)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:324)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:108)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:207)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:150)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any help is appreciated.