Depending on 2.0.0-wip builds of cascading-hive and cascading:cascading-hadoop2-mr1:jar:3.0.0-wip-+

michael...@imagini.net

May 18, 2015, 8:26:23 AM

to cascadi...@googlegroups.com

Hi,

I'm trying to have my (maven) project depend on a recent build of cascading-hive but I always get:

[ERROR] Failed to execute goal on project ...: Could not resolve dependencies for project ...: Could not find artifact cascading:cascading-hadoop2-mr1:jar:3.0.0-wip-+ in ...

I can't see this "3.0.0-wip-+" version in e.g. the conjars repository. The gradle build of cascading-hive works locally so presumably gradle is able to find it somehow? But I don't really know gradle and can't see any unusual repositories configured there.

Where do I get that version of cascading-hadoop2-mr1? Is there another maven repository I need to add to my pom, or something else?

Many thanks,

Michael

Andre Kelpe

May 18, 2015, 8:37:36 AM

to cascadi...@googlegroups.com

This looks like a gradle problem on our CI server. I will investigate and create a new build of wip-2.0. Can you for the time being use the latest stable release of cascading-hive, namely 1.1? http://conjars.org/cascading/cascading-hive/versions/1.1.0

- André

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/fe0b5401-cd6a-4299-a3b4-e4d80082e202%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

michael...@imagini.net

May 18, 2015, 8:40:36 AM

to cascadi...@googlegroups.com

I'm more concerned about having it fixed in git, since I'm happy to build locally. I actually wanted to make a change to cascading-hive, as I was having trouble reading a partitioned table. I don't know whether that's already been fixed on wip-2.0, but it was a problem when I built my code against 1.1.0.

Thanks for looking at it,

Michael

Andre Kelpe

May 18, 2015, 8:47:27 AM

to cascadi...@googlegroups.com

There is nothing to be fixed in git. Gradle should resolve the version to the latest available one before creating the pom file (the 3.0.0-wip-+ works like a version range in Maven). That does not seem to be happening, and I need to investigate why it stopped working.
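As a rough illustration of the version-range behaviour described above, the Scala sketch below models how a dynamic version such as 3.0.0-wip-+ could be resolved against a list of published builds: the trailing + matches any version sharing the prefix, and the highest build wins. The object name, the `resolve` helper and the build numbers are hypothetical, and the ordering is deliberately simplified to a numeric suffix; Gradle's actual resolver applies more elaborate rules.

```scala
object DynamicVersion {
  // Simplified model (an assumption, not Gradle's real algorithm):
  // a pattern ending in "+" matches every version that starts with the
  // text before the "+", and the highest numeric build suffix is chosen.
  def resolve(pattern: String, available: Seq[String]): Option[String] = {
    require(pattern.endsWith("+"), "pattern must end with '+'")
    val prefix = pattern.dropRight(1)
    val candidates = for {
      v <- available if v.startsWith(prefix)
      suffix = v.drop(prefix.length)
      if suffix.nonEmpty && suffix.forall(_.isDigit)
    } yield (suffix.toInt, v)
    // No match means there is nothing concrete to write into the pom.
    if (candidates.isEmpty) None else Some(candidates.maxBy(_._1)._2)
  }
}
```

If this step is skipped, the literal string 3.0.0-wip-+ ends up in the generated pom, and Maven looks for an artifact with exactly that version, which matches the error in the first message.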

If you aren't using Cascading 3.0, you should stay on 1.1. The wip-2.0 branch is for Cascading 3.0 compatibility.

Can you explain what kind of issue with partitioned tables you are seeing?

- André

michael...@imagini.net

May 18, 2015, 9:06:06 AM

to cascadi...@googlegroups.com

I haven't isolated this yet, and it could easily be my code (which was using the other cascading-hive before and which I'm now trying to port to this one), but:

I have a table partitioned by three keys.

I'm trying to use a source like this for scalding:

case class EventsSource() extends Source {
  val FIELDS = [...]

  val tableDescriptor = new HiveTableDescriptor("my_table",
    FIELDS.map(_.name).toArray,
    FIELDS.map(_.hiveType).toArray,
    Array("topic", "d", "hour"))

  def createTap(readOrWrite: AccessMode)(implicit mode: Mode): Tap[_, _, _] =
    CastHfsTap(new HiveTap(tableDescriptor, tableDescriptor.toScheme))
}

And when I'm running it I'm getting:

Exception in thread "main" cascading.flow.FlowException: unhandled exception
    at cascading.flow.BaseFlow.complete(BaseFlow.java:918)
    at com.twitter.scalding.Job.run(Job.scala:265)
    at com.twitter.scalding.Tool.start$1(Tool.scala:104)
    at com.twitter.scalding.Tool.run(Tool.scala:120)
    at com.twitter.scalding.Tool.run(Tool.scala:68)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at net.imagini.rd_core.utils.scalding.JobRunner$.runJob(JobRunner.scala:13)
    at net.imagini.rextract.extract.AttemptDataExtractor.extractEventsData(AttemptDataExtractor.scala:20)
    at net.imagini.rextract.RExtractMain.run(RExtractMain.scala:80)
    at net.imagini.rextract.RExtractMain$.main(RExtractMain.scala:31)
    at net.imagini.rextract.RExtractMain.main(RExtractMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Not a file: hdfs://[...]/my-table/topic=foo
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:134)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
    at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:107)
    at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:196)
    at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

topic is the first key, but hdfs://.../my-table/topic=foo isn't a file; it's a directory of directories like .../my-table/topic=foo/d=2015-05-18, each of which is further split by the third key, e.g. .../my-table/topic=foo/d=2015-05-18/hour=14. So the actual rc files are e.g. .../my-table/topic=foo/d=2015-05-18/hour=9/000002_0.
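To make the layout concrete, here is a minimal Scala sketch of how the three partition keys map to nested directories. The `PartitionLayout` object, the `partitionPath` helper and the sample values are illustrative only and are not part of cascading-hive or Hive.

```scala
object PartitionLayout {
  // Hypothetical helper: joins key=value pairs into a Hive-style
  // partition path underneath the table's root directory.
  def partitionPath(tableRoot: String, partition: Seq[(String, String)]): String =
    (tableRoot +: partition.map { case (k, v) => s"$k=$v" }).mkString("/")

  def main(args: Array[String]): Unit = {
    val keys = Seq("topic" -> "foo", "d" -> "2015-05-18", "hour" -> "14")
    // Every prefix of the key list is a directory; only the deepest
    // level (all three keys) contains the actual data files.
    (1 to keys.length).foreach { n =>
      println(partitionPath("my-table", keys.take(n)))
    }
  }
}
```

A tap that lists only the table root therefore meets the topic=foo directory where it expects a file, which is consistent with the "Not a file" error in the trace.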

I can try to minimize the full code if that helps.

Andre Kelpe

May 18, 2015, 9:13:36 AM

to cascadi...@googlegroups.com

You have to use HivePartitionTap to read/write partitioned Hive tables. HiveTap itself does not do that:

https://github.com/Cascading/cascading-hive/blob/1.1/src/main/java/cascading/tap/hive/HivePartitionTap.java

- André
