Cascading 3.0/Tez: How can a custom Scheme control the ClassLoader that Tez tasks use?

Luis Casillas

Jun 15, 2015, 6:26:49 PM

to cascadi...@googlegroups.com

I'm getting this error in the first task my application runs when I try it in Cascading 3.0/Tez (Tez 0.6.1 under Hadoop 2.6, downloaded from the Cascading S3 bucket). The application works with the Hadoop2 planner in 3.0. Critically, it uses a patched version of the AvroScheme class from the cascading.avro package, which is packaged into the application's fat jar.

Vertex failed, vertexName=F94065FA50A2402B85BADC54084D324C, vertexId=vertex_1434403130327_0002_1_02, diagnostics=[Vertex vertex_1434403130327_0002_1_02 [F94065FA50A2402B85BADC54084D324C] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: E6E0F344EDC34730A44DA6FD2D212B43 initializer failed, vertex=vertex_1434403130327_0002_1_02 [F94065FA50A2402B85BADC54084D324C], org.apache.tez.dag.api.TezUncheckedException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:426)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2106)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:689)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:424)
... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2098)
... 15 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 16 more
]

I have tried three things so far:

1. Modify the AvroScheme's sourceConfInit and sinkConfInit methods to invoke the Configuration.setClassLoader() method on the Tez configuration object that's handed to them. That doesn't work; I don't think the serialization/deserialization of the configuration respects that.
2. Put the AvroInputFormat class into the local Tez libs directory and the tar.gz archive on HDFS. This gets past this error (and has allowed me to troubleshoot others) but I can't stomach this as a true solution. It will make it quite cumbersome to write and use custom schemes under the Tez planner.
3. Ignore what the Cascading documentation tells you to do, and set tez.use.cluster.hadoop-libs to false. This of course breaks Cascading itself (and the docs tell you to set it to true, duh).
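
One way to picture why the first attempt can fail: a class loader is live JVM state, not a key/value setting, so it has nothing to travel in if only the properties are shipped to the cluster. A minimal sketch using only the JDK follows. MiniConf is a hypothetical stand-in, not Hadoop's Configuration class, and the assumption that only key/value pairs are serialized is exactly that, an assumption for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Properties;

// Hypothetical stand-in for a Hadoop-style configuration: string properties
// plus a class loader that exists only inside the current JVM.
final class MiniConf {
    final Properties props = new Properties();
    ClassLoader classLoader = MiniConf.class.getClassLoader();

    // Only the key/value pairs are written; the class loader has no serialized form.
    byte[] serialize() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the configuration "on the other side"; it starts with the default loader.
    static MiniConf deserialize(byte[] bytes) {
        try {
            MiniConf conf = new MiniConf();
            conf.props.load(new ByteArrayInputStream(bytes));
            return conf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniConf client = new MiniConf();
        // Illustrative key/value pair; any string property behaves the same way.
        client.props.setProperty("mapred.input.format.class", "org.apache.avro.mapred.AvroInputFormat");
        ClassLoader custom = new URLClassLoader(new URL[0], MiniConf.class.getClassLoader());
        client.classLoader = custom;

        MiniConf remote = deserialize(client.serialize());
        System.out.println("property survived: " + remote.props.getProperty("mapred.input.format.class"));
        System.out.println("custom loader survived: " + (remote.classLoader == custom));
    }
}
```

In this sketch the property comes back intact while the custom loader does not, which matches the symptom described in item 1.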

Any insights or recommendations?

Thanks!

Luis.

Chris K Wensel

Jun 15, 2015, 7:29:06 PM

to cascadi...@googlegroups.com

did you try putting the avro jar on hdfs, and update

tez.lib.uris=${fs.default.name}/apps/tez-0.6.1-minimal-hadoop26.tar.gz

to include the path?

tez.lib.uris=${fs.default.name}/apps/tez-0.6.1-minimal-hadoop26.tar.gz,${fs.default.name}/apps/avro/theavro.jar
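
A small sketch of assembling that value in code so the entries cannot be mistyped or padded with stray whitespace. TezLibUris and both of its helpers are hypothetical names, and the namenode address is made up. The sketch says nothing about whether Tez actually honors a multi-entry list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for building and inspecting a tez.lib.uris value.
final class TezLibUris {
    // Joins entries with commas and no whitespace; rejects blank or comma-bearing entries.
    static String join(List<String> uris) {
        for (String uri : uris) {
            if (uri.trim().isEmpty() || uri.contains(",")) {
                throw new IllegalArgumentException("bad tez.lib.uris entry: '" + uri + "'");
            }
        }
        return String.join(",", uris);
    }

    // Splits a value back into entries and substitutes the ${fs.default.name} placeholder,
    // so each resolved path can be checked by eye (or against HDFS) before submitting.
    static List<String> resolve(String value, String fsDefaultName) {
        List<String> resolved = new ArrayList<>();
        for (String part : value.split(",")) {
            resolved.add(part.trim().replace("${fs.default.name}", fsDefaultName));
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();
        properties.put("tez.lib.uris", join(Arrays.asList(
                "${fs.default.name}/apps/tez-0.6.1-minimal-hadoop26.tar.gz",
                "${fs.default.name}/apps/avro/theavro.jar")));

        // "hdfs://namenode.example:8020" is a made-up value for illustration only.
        for (String uri : resolve((String) properties.get("tez.lib.uris"), "hdfs://namenode.example:8020")) {
            System.out.println(uri);
        }
    }
}
```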

Considering YARN, and that the "stuff your dep libs in the lib folder of your job zip/jar" hack from Hadoop is no longer first class, you might need to package an uber jar that unpacks the avro stuff.

if either works, i’ll update the docs.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/a2e84675-e632-4128-a1ef-c30d765fbb0b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

—

Chris K Wensel

ch...@wensel.net

Luis Casillas

Jun 15, 2015, 8:51:19 PM

to cascadi...@googlegroups.com

I just tried your first suggestion, and uploaded the jar into HDFS and added it to tez.lib.uris. But that makes the application fail even sooner, with two attempts that report the same thing:

Log Type: stderr
Log Upload Time: 16-Jun-2015 00:28:50
Log Length: 77
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster

However, the suggestion is similar in principle to item #2 in my list of things I tried, which did make that error go away. To repeat that item in more detail: I created a tez-0.6.1+avro.tar.gz file by adding the avro-mapreduce.jar to the original tez-0.6.1-minimal-hadoop26.tar.gz, and changed my tez.lib.uris to point to the augmented tarball. So it is at first glance surprising that the two-entry list doesn't work.

Anyway, am I understanding you right, and the "fat jar" method that works for the Hadoop2/MapReduce planner is just not supported under the Tez planner?

Thanks!

Luis

Chris K Wensel

Jun 15, 2015, 10:46:39 PM

to cascadi...@googlegroups.com

are you sure there wasn’t a typo in tez.lib.uris?

in theory, you should be able to specify a list of them. the implication of the error is that the tez archive wasn’t fetched either.

this could be a bug in tez.

what’s not working: your job jar isn’t being unpacked on the app master, so the jar files in its lib folder aren’t included in the classpath. there might be some extra yarn voodoo to make this work, even though we already do this so the jars are found by the tasks. again, that points to another bug in tez, or a tez property we missed setting.

ckw

Cyrille Chépélov

Jun 16, 2015, 3:14:41 AM

to cascadi...@googlegroups.com

On 16/06/2015 02:51, Luis Casillas wrote:

> Anyway, am I understanding you right, and the "fat jar" method that
> works for the Hadoop2/MapReduce planner is just not supported under
> the Tez planner?
>

Successfully using the "fat jar" method here (after a little bit of
collision management with "sbt assembly").

-- Cyrille

Luis Casillas

Jun 16, 2015, 2:10:34 PM

to cascadi...@googlegroups.com

Copy-paste from the code that constructs the properties passed to the FlowConnector:

        Map<Object, Object> properties = new HashMap<>();
        AppProps.setApplicationJarClass(properties, Main.class);
        AppProps.setApplicationName(properties, "Data Platform ETL");

        properties = FlowRuntimeProps.flowRuntimeProps()
                // level of parallelization during the gather stage.  FIXME: don't hardcode
                .setGatherPartitions(4)
                .buildProperties( properties );

        properties.put("io.serializations", "cascading.kryo.KryoSerialization");
        properties.put(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, "true");
        
        properties.put("tez.lib.uris", "${fs.default.name}/apps/tez-0.6.1-minimal-hadoop26.tar.gz,${fs.default.name}/apps/loan-applications-etl-0.2-SNAPSHOT/lib/avro-mapred-1.7.7-hadoop2.jar");
        properties.put("tez.use.cluster.hadoop-libs", "true");
        properties.put("yarn.timeline-service.hostname", "master.local"); // FIXME: don't hardcode
        properties.put("io.compression.codecs", "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec");
        properties.put("mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter");

        /* NOT WORKING
        HfsProps.setUseCombinedInput(properties, true);
        HfsProps.setUseCombinedInputSafeMode(properties, true);
        HfsProps.setCombinedInputMaxSize(properties, 134_217_728L);
        */
        properties.put("mapred.min.split.size", "33554432");

If I delete the second item in the comma-separated list for tez.lib.uris then I'm back to the original failure mode.

Luis Casillas

Jun 16, 2015, 2:16:42 PM

to cascadi...@googlegroups.com

What schemes and taps have you used? It sounds then like the ClassLoader that is aware of the fat jar is not the same as the one that the Tez tasks use to load the InputFormat.

Andre Kelpe

Jun 16, 2015, 2:24:31 PM

to cascadi...@googlegroups.com

I think you are confusing fat-jar and hadoop jar. A fat-jar is normally one where all deps are unpacked and repackaged into one big jar. You can do that with the maven-shade plugin: http://maven.apache.org/plugins/maven-shade-plugin/ or the gradle shadow-jar plugin: https://github.com/johnrengelman/shadow/

IIRC that is what Cyrille produces in his sbt build.
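
The two layouts can be told apart by listing the jar. A rough sketch with the JDK's jar API follows. JarLayout and its helpers are hypothetical names, and the demo jars contain empty placeholder entries only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper that distinguishes a "hadoop jar" (nested lib/*.jar) from a fat jar.
final class JarLayout {
    // Counts nested jars under lib/; a non-zero count suggests the hadoop-style layout.
    static long nestedLibJars(Path jar) {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return Collections.list(jf.entries()).stream()
                    .map(JarEntry::getName)
                    .filter(n -> n.startsWith("lib/") && n.endsWith(".jar"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the class file is a direct entry of the jar, as it would be in a fat jar.
    static boolean containsClass(Path jar, String className) {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getEntry(entry) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes a temporary jar whose entries are empty placeholders.
    static Path writeJar(String... entryNames) {
        try {
            Path jar = Files.createTempFile("layout", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out)) {
                for (String name : entryNames) {
                    jos.putNextEntry(new JarEntry(name));
                    jos.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.avro.mapred.AvroInputFormat";
        Path hadoopStyle = writeJar("com/example/Main.class", "lib/avro-mapred-1.7.7-hadoop2.jar");
        Path fat = writeJar("com/example/Main.class", "org/apache/avro/mapred/AvroInputFormat.class");
        System.out.println("hadoop-style: nested jars=" + nestedLibJars(hadoopStyle)
                + ", class unpacked=" + containsClass(hadoopStyle, cls));
        System.out.println("fat jar: nested jars=" + nestedLibJars(fat)
                + ", class unpacked=" + containsClass(fat, cls));
    }
}
```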

- André

--

André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Luis Casillas

Jun 16, 2015, 3:13:42 PM

to cascadi...@googlegroups.com

D'oh! Yes I am. Thanks for catching this one!

I just tried the true fat jar, built with the Gradle plugin you suggested. I still get the same exception and stack trace as my original email (java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found). The way I read this is that the ClassLoader that Tez is using to look up the InputFormat does not consult my application's jar at all.
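
A quick way to test that reading is to probe specific class loaders for the class. A minimal sketch follows. ClassProbe is a hypothetical name, and which loader Tez actually consults at that point is not established in this thread, so the two loaders checked here are only the obvious candidates.

```java
// Hypothetical diagnostic: reports whether a class is visible to a given class loader.
final class ClassProbe {
    // Returns true if the loader (or one of its parents) can find the class.
    // The class is loaded but not initialized, so no static initializers run.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "org.apache.avro.mapred.AvroInputFormat";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = ClassProbe.class.getClassLoader();
        System.out.println("context loader sees " + target + ": " + isVisible(target, context));
        System.out.println("defining loader sees " + target + ": " + isVisible(target, own));
    }
}
```

Calling isVisible from inside sourceConfInit or a task, and logging the result, would show whether the jar's classes are reachable from that point.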

Chris K Wensel

Jun 16, 2015, 3:30:16 PM

to cascadi...@googlegroups.com

looks like a bug in Tez, a list of uris is in the example (besides, the property name is plural)

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.4.0/bk_installing_manually_book/content/rpm-chap-tez-2.html

if you can open an issue on the tez issue tracker that would be great.

that said, we are looking for the additional voodoo to trick the Tez app master to look in the lib folder. this is either a limitation of YARN or Tez, unsure yet.

since bin/hadoop|yarn unpacks the job jar and includes the lib folder client side, one would think this would work similarly cluster side. it does not; we jump through quite a few hoops to emulate this for the task's cluster-side classpath.
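
The client-side behavior described above (unpack the job jar, put the lib folder jars on the classpath) can be sketched with the JDK alone. LibFolderClasspath is a hypothetical name; this is not the code that bin/hadoop, YARN, Tez, or Cascading runs, only an illustration of the idea.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical emulation of "unpack the job jar and add lib/*.jar to the classpath".
final class LibFolderClasspath {
    // Extracts every lib/*.jar entry into a fresh temp directory and returns the
    // classpath: the job jar itself first, then each extracted jar in archive order.
    static List<URL> classpathFor(Path jobJar) {
        List<URL> urls = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJar.toFile())) {
            Path workDir = Files.createTempDirectory("unpacked-libs");
            urls.add(jobJar.toUri().toURL());
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Flatten nested paths so an entry name cannot escape workDir.
                Path target = workDir.resolve(name.substring("lib/".length()).replace('/', '_'));
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                urls.add(target.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    // Demo helper: a job jar with one class entry and two nested library jars (all empty).
    static Path sampleJobJar() {
        try {
            Path jar = Files.createTempFile("job", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out)) {
                for (String name : new String[] {"com/example/Main.class", "lib/a.jar", "lib/b.jar"}) {
                    jos.putNextEntry(new JarEntry(name));
                    jos.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A launcher would hand these URLs to a URLClassLoader or a java -cp argument.
        for (URL url : classpathFor(sampleJobJar())) {
            System.out.println(url);
        }
    }
}
```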

as a note, there is no CombinedInputFormat in Tez; this is a MapReduce feature, and is ignored by tez. that said, the Tez app master should be aggregating small files, see the tez docs on how to enable this if it’s not working.
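
For reference, Tez's split grouping is tuned through properties such as tez.grouping.min-size and tez.grouping.max-size. A sketch of setting them on the same properties map follows. TezGroupingProps is a hypothetical helper, the byte values simply reuse the sizes from the snippet earlier in this thread, and whether these settings reach the Tez app master through Cascading is an assumption to check against the Tez docs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that adds Tez split-grouping bounds to a properties map.
final class TezGroupingProps {
    static Map<Object, Object> withGrouping(Map<Object, Object> properties, long minBytes, long maxBytes) {
        if (minBytes <= 0 || maxBytes < minBytes) {
            throw new IllegalArgumentException("need 0 < minBytes <= maxBytes");
        }
        properties.put("tez.grouping.min-size", Long.toString(minBytes));
        properties.put("tez.grouping.max-size", Long.toString(maxBytes));
        return properties;
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();
        // 32 MB and 128 MB, the same sizes used for the MapReduce settings in the earlier snippet.
        withGrouping(properties, 33_554_432L, 134_217_728L);
        System.out.println(properties);
    }
}
```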

ckw

Chris K Wensel

Jun 16, 2015, 4:00:26 PM

to cascadi...@googlegroups.com

fwiw, looks like we aren’t packing the ‘local resources’ into the tez client initialization, but only for the task (via the dag).

this is impossible to unit test, so we are trying to reproduce locally to see if we need to declare the libraries for both the client call and dag, or just the client. or if it’s irrelevant.

will issue a wip fix to test by tomorrow sometime.

ckw

Luis Casillas

Jun 16, 2015, 5:00:50 PM

to cascadi...@googlegroups.com

Thanks. I'll give the wip a shot when it's out.

Andre Kelpe

Jun 18, 2015, 2:02:14 PM

to cascadi...@googlegroups.com

Hi,

we still haven't published a wip since we are still working with the Tez community to make it work correctly for non-fat jars. We will let you know once we have something that's working in all cases.

- André

Luis Casillas

Jun 18, 2015, 2:05:40 PM

to cascadi...@googlegroups.com

No worries, haste makes waste after all...

Chris K Wensel

Jun 19, 2015, 6:17:58 PM

to cascadi...@googlegroups.com

I think we have a fix, it’s in the wip build pipeline, should show up about 5 hours from now.

in short it was a small change, just not intuitive. we had to do the same thing we do for Tez with YARN, but just in a different way using the same stuff…

ckw

Luis Casillas

Jun 19, 2015, 6:19:50 PM

to cascadi...@googlegroups.com

Thanks. I’ll see if I can give it a shot on Monday.

Andre Kelpe

Jun 19, 2015, 6:49:27 PM

to cascadi...@googlegroups.com

btw it should work with fat-jars and hadoop style jars. If one of them does not work for you, let us know.

- André

Chris K Wensel

Jun 19, 2015, 7:01:20 PM

to cascadi...@googlegroups.com

not to belabor the point, but neither YARN nor Tez supports the "stuff a lib folder in your jar" approach we have become accustomed to over the past 10 years, unless you are the bin/yarn or bin/hadoop bash script, which do it client side.

we had to rebuild that support directly in Cascading3/Tez - sucks we missed this piece for the YARN AM config.

Luis Casillas

Jun 19, 2015, 10:13:47 PM

to cascadi...@googlegroups.com

I don’t have any particular attachment to that one packaging model. Our application uses it simply because that’s what the Cascading 2.x documentation told us to do. The official examples have Gradle setups that package the apps this way, and we just copied those. It took hardly any time or effort.

It’s also worth stressing the incredible convenience of it, especially when coupled with EMR. So far all we’ve needed to do to run our apps is to package everything in the one jar and tell EMR to run it. We don’t even need to log into the cluster to do that; submitting the jar through the UI or the AWS SDK works perfectly. We haven’t needed to script the installation of anything into the EMR environment at all, since the lib-directory-in-jar packaging so far has been sufficient to bundle everything we need.

But as far as I’m concerned I’d be just as happy with any alternative that's just as easy, well documented and convenient, so deprecate away if that’s the way the wind blows.

I do recognize however that, unless Amazon starts putting recent versions of Tez into their default EMR images, we’ll be out of this utopia in the near future. (The only Tez-on-EMR automation I’ve seen so far is the experimental one linked from this thread in the AWS Forums, which seems like an excellent starting point, but will take some work to get it going.)

Andre Kelpe

Jun 20, 2015, 2:31:30 PM

to cascadi...@googlegroups.com

On Fri, Jun 19, 2015 at 7:13 PM, Luis Casillas <luis.c...@progressfin.com> wrote:

> I don’t have any particular attachment to that one packaging model. Our application uses it simply because that’s what the Cascading 2.x documentation told us to do. The official examples have Gradle setups that package the apps this way, and we just copied those. It took hardly any time or effort.

I believe hadoop should make the jar-with-lib its official standard and call it har (hadoop archive), analogous to war files for web containers. That would work in so many cases that you wonder why it isn't already the case. Sure, they can then add more fancy APIs to do more interesting things, but the basic format should be simple to use. Anyone interested in filing a JIRA upstream?

> It’s also worth stressing the incredible convenience of it, especially when coupled with EMR. So far all we’ve needed to do to run our apps is to package everything in the one jar and tell EMR to run it. We don’t even need to log into the cluster to do that; submitting the jar through the UI or the AWS SDK works perfectly. We haven’t needed to script the installation of anything into the EMR environment at all, since the lib-directory-in-jar packaging so far has been sufficient to bundle everything we need.

Glad to hear it is all working smoothly for you.

> But as far as I’m concerned I’d be just as happy with any alternative that's just as easy, well documented and convenient, so deprecate away if that’s the way the wind blows.

I'd say our mantra is "we do not break your stuff", so we would like to keep things as easy as they are.

> I do recognize however that, unless Amazon starts putting recent versions of Tez into their default EMR images, we’ll be out of this utopia in the near future. (The only Tez-on-EMR automation I’ve seen so far is the experimental one linked from this thread in the AWS Forums, which seems like an excellent starting point, but will take some work to get it going.)

Did you see this? https://github.com/Cascading/cascading/tree/3.0/cascading-hadoop2-tez#running-on-amazon-emr

Maybe it is a good idea to wrap all that in a bootstrap action. Feel free to give it a shot and share it with us.

- André

Chris K Wensel

Jun 22, 2015, 12:36:45 PM

to cascadi...@googlegroups.com

the ‘stuff all your extra bits into the lib folder’ is a Hadoop thing, not a Cascading thing.

fat/uber jars weren’t that common 10 years ago, and they are problematic. continuing to support the lib folder model is important I think.

do let us know if wip-126 works for you so we can publish 3.0.1.

ckw

Luis Casillas

Jun 22, 2015, 2:00:18 PM

to cascadi...@googlegroups.com

That’s my intent, but first I need to get the app to run at all…

I do know that a bootstrap action isn’t enough, you need both a bootstrap action and a step-initiated script (for the actions that require HDFS to be up).

On Saturday, June 20, 2015 at 11:31 AM, Andre Kelpe <ake...@concurrentinc.com> wrote:

> Did you see this? https://github.com/Cascading/cascading/tree/3.0/cascading-hadoop2-tez#running-on-amazon-emr
>
> Maybe it is a good idea to wrap all that in a bootstrap action. Feel free to give it a shot and share it with us.

Luis Casillas

Jun 22, 2015, 3:38:43 PM

to cascadi...@googlegroups.com

I just tested it and wip-126 does get me past that error. Thanks! Still not out of the woods, but the exceptions are now in my code.

Something that’s not an issue in Cascading 3.0 itself but seems well worth noting is the patches that I've needed to make (so far) to the cascading.avro project to get it to run under Tez. I suspect that a lot of third-party Scheme implementations are going to need patches similar to the following to work under Tez:

Warning: I haven’t validated that these are all of the patches needed. I know without the second one I get errors like this:

Vertex failed, vertexName=CF4F45EB699F41559ABDDE93CD813E6B, vertexId=vertex_1434997756408_0006_1_03, diagnostics=[Task failed, taskId=task_1434997756408_0006_1_03_000002, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.ClassCastException: org.apache.tez.dag.api.TezConfiguration cannot be cast to org.apache.hadoop.mapred.JobConf
at cascading.avro.AvroScheme.sinkConfInit(AvroScheme.java:53)
at cascading.tap.Tap.sinkConfInit(Tap.java:206)
at cascading.tap.hadoop.Hfs.sinkConfInit(Hfs.java:414)
at cascading.tap.hadoop.Hfs.sinkConfInit(Hfs.java:108)
at cascading.tap.hadoop.io.TapOutputCollector.initialize(TapOutputCollector.java:96)
at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:91)
at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:79)
at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:74)
at cascading.tap.hadoop.io.HadoopTupleEntrySchemeCollector.makeCollector(HadoopTupleEntrySchemeCollector.java:57)
at cascading.tap.hadoop.io.HadoopTupleEntrySchemeCollector.<init>(HadoopTupleEntrySchemeCollector.java:49)
at cascading.tap.hadoop.Hfs.openForWrite(Hfs.java:447)
at cascading.tap.hadoop.Hfs.openForWrite(Hfs.java:108)
at cascading.tap.MultiSinkTap$MultiSinkCollector.<init>(MultiSinkTap.java:82)
at cascading.tap.MultiSinkTap.openForWrite(MultiSinkTap.java:162)
at cascading.flow.stream.element.SinkStage.prepare(SinkStage.java:68)
at cascading.flow.tez.stream.element.TezSinkStage.prepare(TezSinkStage.java:63)
at cascading.flow.stream.graph.StreamGraph.prepare(StreamGraph.java:181)
at cascading.flow.tez.FlowProcessor.run(FlowProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:326)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Chris K Wensel

Jun 24, 2015, 6:34:32 PM

to cascadi...@googlegroups.com

Hey all

per wip-126, is it safe to assume our fix for the lib folder on the tez app master is working as expected?

if so, I want to publish a 3.0.1 tomorrow.

Luis Casillas

Jun 24, 2015, 6:41:01 PM

to cascadi...@googlegroups.com

Yes, on my end it definitely fixes what it was intended to fix.

Chris K Wensel

Jun 25, 2015, 1:00:19 PM

to cascadi...@googlegroups.com

ok, sent 3.0.1 to our CI servers, will be out today in about 5 or 6 hours.

after it publishes, i’ll issue a wip-3.1.

good news: with type information specified, against a basic mix of workloads on Tez, we see a ~15% improvement (not a rigorous scientific result). the default workload ran for 2.5 hours and dropped by roughly 20 minutes.

the improvement applies to MapReduce as well in parts, but I haven’t measured the diffs yet. but there are more things we can do I think for some incremental improvements.

ckw

Ken Krugler

Aug 21, 2015, 6:48:47 PM

to cascadi...@googlegroups.com

Hi Luis,

Other than setting the "mapred.mapper.new-api" to false, was there anything else that you've had to change to cascading.avro?

Thinking of cutting a 3.0-compatible version.

Thanks,

-- Ken

--------------------------

Ken Krugler

+1 530-210-6378

http://www.scaleunlimited.com

custom big data solutions & training

Hadoop, Cascading, Cassandra & Solr

--------------------------

Luis Casillas

Aug 21, 2015, 6:57:38 PM

to Ken Krugler, cascadi...@googlegroups.com

Yes. The Scheme classes need to be decoupled from the JobConf class and coupled to org.apache.hadoop.conf.Configuration, which is a common supertype to both JobConf and TezConfiguration. This commit illustrates it:

https://github.com/ldcasillas-progreso/cascading.avro/commit/5fbf8aeec33419d06b8651d61a2a5d21a84c8ae2
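
The shape of that change, sketched with stand-in classes so it runs without Hadoop or Tez on the classpath. Configuration, JobConf and TezConfiguration below are hypothetical stubs that only mirror the inheritance described above, and the property key is made up. The linked commit is the authoritative version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stubs mirroring the inheritance: both concrete types share one supertype.
class Configuration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

class JobConf extends Configuration {}

class TezConfiguration extends Configuration {}

final class SchemeConfDemo {
    // Pre-patch shape: assumes every configuration it receives is a JobConf.
    static void sinkConfInitCoupled(Object conf) {
        JobConf jobConf = (JobConf) conf; // ClassCastException when handed a TezConfiguration
        jobConf.set("example.output.schema", "{}");
    }

    // Patched shape: depends only on the common supertype, so either planner's config works.
    static void sinkConfInit(Configuration conf) {
        conf.set("example.output.schema", "{}");
    }

    public static void main(String[] args) {
        sinkConfInit(new JobConf());
        sinkConfInit(new TezConfiguration());
        try {
            sinkConfInitCoupled(new TezConfiguration());
        } catch (ClassCastException e) {
            System.out.println("coupled version fails under the Tez-style config: " + e.getMessage());
        }
    }
}
```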

Ken Krugler

Aug 21, 2015, 7:02:54 PM

to cascadi...@googlegroups.com

Got it, thanks!

-- Ken
