Error while creating SequenceFile as Sink


Vinoth Kumar Kannan

Jul 15, 2013, 11:25:21 AM
to cascadi...@googlegroups.com
Hello All, 

I am new to Cascading and the concept of SequenceFile.

I have two flows:

The first flow reads data from a DB and should write into a SequenceFile as sink.

The second flow should read from the same SequenceFile as source tap and do further processing.

But I am getting a class cast exception while doing this:

Fields fields = new Fields( "_id", "name", "createTime", "updateTime", "age", "country", "status" );

Tap seqfle = new Lfs( new SequenceFile( fields ), "test" );

FlowDef testFlowDef = new FlowDef()
        .addSource( somepipe, someSourceTap )
        .addTailSink( tmpPipe, seqfle );

return testFlowDef;


What am I missing here? Any help is greatly appreciated.

Following is my failure trace:

cascading.flow.planner.PlannerException: could not build flow from assembly: [java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf]
       at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:515)
       at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:84)
       at cascading.flow.FlowConnector.connect(FlowConnector.java:454)
       at de.carbook.hadoop.user.DetailsFlowGeneratorTest.test(DetailsFlowGeneratorTest.java:112)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
       at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
       at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
       at cascading.test.PlatformRunner.runChild(PlatformRunner.java:173)
       at cascading.test.PlatformRunner.runChild(PlatformRunner.java:44)
       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
       at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
       at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
       at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)

Ken Krugler

Jul 15, 2013, 7:49:05 PM
to cascadi...@googlegroups.com
Hi Vinoth,

I believe you're mixing in Cascading local mode types with Hadoop types.

If you're using a SequenceFile, then it has to be a Hadoop-based Flow.

So check your imports, and make sure none of them contain ".local." as part of the package path.

-- Ken

PS - You should be using Hfs instead of Lfs.
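
For example, the Hadoop-mode version of your snippet would look something like this (a rough, untested sketch; the pipe and tap names are the ones from your post):

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector; // Hadoop planner, not the local one
import cascading.scheme.hadoop.SequenceFile;      // scheme.hadoop, not scheme.local
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;                  // Hfs instead of Lfs
import cascading.tuple.Fields;

Fields fields = new Fields( "_id", "name", "createTime", "updateTime", "age", "country", "status" );

Tap seqfle = new Hfs( new SequenceFile( fields ), "test" );

FlowDef testFlowDef = FlowDef.flowDef()
    .addSource( somepipe, someSourceTap )
    .addTailSink( tmpPipe, seqfle );

// the Hadoop planner hands the taps a JobConf rather than plain Properties,
// which is what the cast error in your stack trace was complaining about
Flow flow = new HadoopFlowConnector( new Properties() ).connect( testFlowDef );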


--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





Vinoth Kumar Kannan

Jul 19, 2013, 10:39:21 AM
to cascadi...@googlegroups.com
Hi Ken,

Thanks for your reply.

You are right, I had mixed up the local and Hadoop modes. Once I changed to Hadoop mode, it runs fine on my cluster.

However, is there an equivalent way to test a SequenceFile using a Cascading flow in local mode?

Regards
Vinoth

Chris K Wensel

Jul 19, 2013, 12:32:10 PM
to cascadi...@googlegroups.com
nope, since the SequenceFile scheme is coupled to Hadoop, and the major value of Cascading Local mode is that there are zero Hadoop dependencies.

That said, there isn't any practical reason to have SequenceFile support for testing within Local mode, since your business logic can be tested independently of any integration/file formats. And if you need to test out things that could fail when writing to a SequenceFile (tuple serializers, comparators, etc.), you will need to be running in Hadoop mode regardless.

ckw

Ted Dunning

Jul 19, 2013, 3:14:55 PM
to cascadi...@googlegroups.com
Chris, 

In Mahout, we have found a lot of value in having access to Hadoop data without having to deal with Hadoop execution.

Is there a way to use Sequence files as input for local mode via an optional dependency?

Chris K Wensel

Jul 19, 2013, 3:23:19 PM
to cascadi...@googlegroups.com
fair enough, but we have not created a local mode scheme that can deal with Writable types stuffed into a Hadoop SequenceFile.

fwiw, the default SequenceFile in Cascading uses Hadoop APIs but is 'proprietary' to Cascading (it stores Cascading Tuples). WritableSequenceFile can read any Hadoop sequence file and will stuff/yank Tuples for use by Cascading. it requires some assembly, and the expectation is that objects are primitives or Hadoop Writables.
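
e.g. a rough, untested sketch of sourcing an existing <LongWritable, Text> sequence file (the field names and path here are made up):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import cascading.scheme.hadoop.WritableSequenceFile;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// the fields the key and value will be unpacked into
Fields kv = new Fields( "offset", "line" );

// the key/value types must match the Writables actually stored in the file
Tap source = new Hfs( new WritableSequenceFile( kv, LongWritable.class, Text.class ), "some/path" );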

Maybe the Cascading Avro guys have Schemes for both platforms, assuming Mahout lets you choose output formats simply.

Ted Dunning

Jul 19, 2013, 3:57:50 PM
to cascadi...@googlegroups.com

Hmm... I think I wrote poorly here.

I wasn't advocating that Mahout needs better integration with Cascading. I was merely relating our positive experience with splitting the concept of Hadoop file formats away from Hadoop execution. The primary way we did this was by defining a sequence file input stream. I often use this simply to spit out the first few records of a large file with no more than a few lines of sequential code reading from conventional I/O. It helps that I normally run on a MapR cluster, which allows me to read files via NFS, but the principle is more general than my own situation.

Lots of small tasks run faster sequentially than as map-reduce. Having these read existing formats is a boon, because you can pick your execution mode on a case-by-case basis.

If you take as an assumption that the person has code that can read SequenceFiles into Tuples, it makes a lot of sense to be able to run that program in local mode.
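
For illustration, a minimal sketch of that kind of sequential read using the plain Hadoop SequenceFile.Reader API (the path here is hypothetical):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileHead {
  public static void main( String[] args ) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path( "file:///tmp/data/part-00000" ); // hypothetical path
    FileSystem fs = path.getFileSystem( conf );
    SequenceFile.Reader reader = new SequenceFile.Reader( fs, path, conf );
    try {
      // key/value holders are instantiated from the classes recorded in the file header
      Writable key = (Writable) ReflectionUtils.newInstance( reader.getKeyClass(), conf );
      Writable value = (Writable) ReflectionUtils.newInstance( reader.getValueClass(), conf );
      // print the first few records, then stop
      int shown = 0;
      while( reader.next( key, value ) && shown++ < 10 )
        System.out.println( key + "\t" + value );
    }
    finally {
      reader.close();
    }
  }
}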


Chris K Wensel

Jul 19, 2013, 4:43:10 PM
to cascadi...@googlegroups.com
it's worth noting Cascading Local mode is not the same as Hadoop standalone mode.

If you wish to access SequenceFiles on your local disk from Cascading, just run in Hadoop mode (with the HadoopFlowConnector) but use Hfs with a file:// URL to your local file, or don't have a cluster configuration enabled (HADOOP_CONF_DIR, default fs file://, etc.).
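
e.g. a rough sketch (untested; the paths and field names are made up):

import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.SequenceFile;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// Hadoop mode, but both taps point at local disk via file:// URLs
Fields fields = new Fields( "_id", "name" );
Tap source = new Hfs( new SequenceFile( fields ), "file:///tmp/test-seq" );
Tap sink = new Hfs( new TextDelimited( fields, "\t" ), "file:///tmp/test-out" );

// a simple copy flow is enough to exercise the sequence file round trip
Pipe pipe = new Pipe( "copy" );

FlowDef flowDef = FlowDef.flowDef()
    .addSource( pipe, source )
    .addTailSink( pipe, sink );

new HadoopFlowConnector( new Properties() ).connect( flowDef ).complete();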

ckw

Christopher Severs

Jul 19, 2013, 5:35:29 PM
to cascadi...@googlegroups.com


On Friday, July 19, 2013 12:23:19 PM UTC-7, Chris K Wensel wrote:


Maybe the Cascading Avro guys have Schemes for both platforms, assuming Mahout lets you choose output formats simply.



The newest version of cascading.avro, due out in conjars any day now, has local mode support.