Depending on 2.0.0-wip builds of cascading-hive and cascading:cascading-hadoop2-mr1:jar:3.0.0-wip-+


michael...@imagini.net

May 18, 2015, 8:26:23 AM
to cascadi...@googlegroups.com
Hi,

I'm trying to have my (Maven) project depend on a recent build of cascading-hive, but I always get:

[ERROR] Failed to execute goal on project ...: Could not resolve dependencies for project ...: Could not find artifact cascading:cascading-hadoop2-mr1:jar:3.0.0-wip-+ in ...

I can't see this "3.0.0-wip-+" version in e.g. the conjars repository. The Gradle build of cascading-hive works locally, so presumably Gradle is able to find it somehow? But I don't really know Gradle, and I can't see any unusual repositories configured there.

Where do I get that version of cascading-hadoop2-mr1? Is there another Maven repository I need to add to my POM, or something else?

Many thanks,
Michael

Andre Kelpe

May 18, 2015, 8:37:36 AM
to cascadi...@googlegroups.com
This looks like a Gradle problem on our CI server. I will investigate and create a new build of wip-2.0. Can you, for the time being, use the latest stable release of cascading-hive, namely 1.1? http://conjars.org/cascading/cascading-hive/versions/1.1.0
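
For reference, pulling the stable release from conjars into a Maven build should look roughly like this (the repository URL is the standard conjars one, and the coordinates are as shown on the conjars page; double-check the version string there):

<!-- pom.xml sketch: add the conjars repository and the cascading-hive dependency -->
<repositories>
  <repository>
    <id>conjars</id>
    <url>http://conjars.org/repo</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-hive</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>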

- André




michael...@imagini.net

May 18, 2015, 8:40:36 AM
to cascadi...@googlegroups.com
I'm more concerned about having it fixed in git; I'm happy to build locally. (I actually wanted to make a change to cascading-hive, as I was having trouble reading a partitioned table. I don't know whether that's already been fixed on wip-2.0, but it was a problem when I built my code against 1.1.0.)

Thanks for looking at it,
Michael

Andre Kelpe

May 18, 2015, 8:47:27 AM
to cascadi...@googlegroups.com
There is nothing to be fixed in git; Gradle should resolve the version to the latest build before creating the POM file (the 3.0.0-wip-+ works like a version range in Maven). That seems not to be happening, and I need to investigate why it stopped working.
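
To illustrate (a sketch, not the exact build file): the dependency is declared in Gradle with a dynamic version, and Gradle resolves the trailing '+' against the configured repositories at build time:

// build.gradle sketch: '3.0.0-wip-+' means "the highest 3.0.0-wip build
// available in the configured repositories at build time".
dependencies {
    compile group: 'cascading', name: 'cascading-hadoop2-mr1', version: '3.0.0-wip-+'
}

When the POM is generated for publishing, the resolved concrete version should be written into it; the POM you hit apparently contains the literal '3.0.0-wip-+', which Maven cannot resolve.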

If you aren't using Cascading 3.0, you should stay on 1.1. The wip-2.0 branch is for Cascading 3.0 compatibility.

Can you explain what kind of issue with partitioned tables you are seeing?

- André



michael...@imagini.net

May 18, 2015, 9:06:06 AM
to cascadi...@googlegroups.com
I haven't isolated this yet, and it could easily be my code (which was using the other cascading-hive library before; I'm trying to port it to this one), but:
I have a table partitioned by three keys.
I'm trying to use a source like this for Scalding:

case class EventsSource() extends Source {
  val FIELDS = [...]
  val tableDescriptor = new HiveTableDescriptor("my_table",
    FIELDS.map(_.name).toArray,
    FIELDS.map(_.hiveType).toArray,
    Array("topic", "d", "hour"))

  def createTap(readOrWrite: AccessMode)(implicit mode: Mode): Tap[_, _, _] =
    CastHfsTap(new HiveTap(tableDescriptor, tableDescriptor.toScheme))
}

And when I run it, I get:

Exception in thread "main" cascading.flow.FlowException: unhandled exception
        at cascading.flow.BaseFlow.complete(BaseFlow.java:918)
        at com.twitter.scalding.Job.run(Job.scala:265)
        at com.twitter.scalding.Tool.start$1(Tool.scala:104)
        at com.twitter.scalding.Tool.run(Tool.scala:120)
        at com.twitter.scalding.Tool.run(Tool.scala:68)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at net.imagini.rd_core.utils.scalding.JobRunner$.runJob(JobRunner.scala:13)
        at net.imagini.rextract.extract.AttemptDataExtractor.extractEventsData(AttemptDataExtractor.scala:20)
        at net.imagini.rextract.RExtractMain.run(RExtractMain.scala:80)
        at net.imagini.rextract.RExtractMain$.main(RExtractMain.scala:31)
        at net.imagini.rextract.RExtractMain.main(RExtractMain.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Not a file: hdfs://[...]/my-table/topic=foo
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
        at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
        at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:134)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
        at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:107)
        at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:196)
        at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

topic is the first partition key, but hdfs://.../my-table/topic=foo isn't a file; it's a directory of directories like .../my-table/topic=foo/d=2015-05-18, each further split by the third key (.../my-table/topic=foo/d=2015-05-18/hour=14), so the actual RCFiles are e.g. .../my-table/topic=foo/d=2015-05-18/hour=9/000002_0.
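
In case it's relevant: I believe Hadoop's old-API FileInputFormat only lists the top-level input directory by default, which would be consistent with the "Not a file" error above. One thing I can try (untested, and the property names are my assumption from Hadoop 2.x; I haven't verified this against HiveTap) is forcing recursive input listing from the Scalding job:

import com.twitter.scalding._

// Hypothetical job wrapper: ask Hadoop to recurse into the partition
// subdirectories when computing input splits.
class EventsJob(args: Args) extends Job(args) {
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      "mapred.input.dir.recursive" -> "true",  // old (mapred) API key
      "mapreduce.input.fileinputformat.input.dir.recursive" -> "true")  // Hadoop 2 key

  // ... the pipe reading from EventsSource() would go here ...
}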

I can try to minimize the full code if that helps.

Andre Kelpe

May 18, 2015, 9:13:36 AM
to cascadi...@googlegroups.com