Setting the JobID (required for custom Tap)

jeffo

Aug 18, 2017, 6:19:56 AM

to cascading-user

Is it possible to set the underlying jobID? I am writing a custom Tap and Scheme, and the underlying InputFormat requires that a jobID be set. I am interfacing with BigQuery. The Cask framework faced the exact same issue: https://issues.cask.co/browse/CDAP-10402. It looks like they have direct access to the underlying Job instance and were therefore able to set the jobID from there.

I have tried setting the jobID via the JobConf in my Tap and Scheme, and also via the JobConf in a custom FlowStepStrategy. Neither approach succeeded; I still receive the following error:

Caused by: java.lang.NullPointerException: getSplits requires a jobID
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
    at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:102)
    at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper.getSplits(DeprecatedInputFormatWrapper.java:137)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:134)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:332)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:324)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
    at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:108)
    at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:207)
    at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:150)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any help is appreciated.

Thanks,

Jeff

Chris K Wensel

Aug 21, 2017, 12:32:35 PM

to cascadi...@googlegroups.com

Technically the job id is set.

So I guess the question is: under what property in the JobConf is that code looking for the job id?

Also keep in mind that Cascading Hadoop2 uses the original/stable mapreduce APIs, so there might be some impedance there.

ckw


—

Chris K Wensel

415-203-5022

ch...@wensel.net

https://www.linkedin.com/in/cwensel

jeffo

Aug 22, 2017, 4:50:33 AM

to cascading-user

Thanks, Chris. I believe the issue stemmed from using a `mapreduce`-to-`mapred` InputFormat bridge (from Twitter's elephantbird), which was creating and using a JobContext with a null JobID. I am working around this with my own delegate InputFormat, which preserves the already-set job ID.
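
For anyone who hits the same error, the difference between the bridge and the delegate can be modeled roughly as below. This is only a sketch of the idea, written with hypothetical stand-in classes (`JobContextModel`, `StrictInputFormatModel`, `NullIdBridgeModel`, `PreservingDelegateModel`) and a made-up job ID string. It does not use the Hadoop, elephantbird, or BigQuery connector APIs, and it is not the delegate actually used here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a job context; NOT the Hadoop JobContext class.
final class JobContextModel {
    private final String jobId; // may be null, as with the bridge described above

    JobContextModel(String jobId) {
        this.jobId = jobId;
    }

    String getJobId() {
        return jobId;
    }
}

// Stand-in for an input format that refuses to compute splits without a job ID.
final class StrictInputFormatModel {
    List<String> getSplits(JobContextModel context) {
        Objects.requireNonNull(context.getJobId(), "getSplits requires a jobID");
        return Collections.singletonList(context.getJobId() + "-split-0");
    }
}

// Stand-in for a bridge that builds a fresh context and drops the job ID.
final class NullIdBridgeModel {
    private final StrictInputFormatModel wrapped = new StrictInputFormatModel();

    List<String> getSplits(String alreadySetJobId) {
        // The ID handed to the bridge is ignored, so the wrapped format sees null.
        return wrapped.getSplits(new JobContextModel(null));
    }
}

// Stand-in for the workaround: a delegate that forwards the ID it was given.
final class PreservingDelegateModel {
    private final StrictInputFormatModel wrapped = new StrictInputFormatModel();

    List<String> getSplits(String alreadySetJobId) {
        return wrapped.getSplits(new JobContextModel(alreadySetJobId));
    }
}

class JobIdDelegateSketch {
    public static void main(String[] args) {
        String jobId = "job_201708220450_0001"; // made-up ID for illustration

        // The delegate passes the ID through, so split calculation succeeds.
        System.out.println(new PreservingDelegateModel().getSplits(jobId));

        // The bridge loses the ID, reproducing the NullPointerException.
        try {
            new NullIdBridgeModel().getSplits(jobId);
        } catch (NullPointerException e) {
            System.out.println("bridge failed: " + e.getMessage());
        }
    }
}
```

In a real job the delegate would wrap the actual InputFormat and construct a proper context carrying the job ID. How that ID is obtained at split time depends on the Hadoop and Cascading versions in use and is not covered by this sketch.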
