How to configure a Pipe.


JPatrick Davenport

Apr 11, 2013, 9:47:29 AM
to cascadi...@googlegroups.com
Hello,
I've got a pipe, a Unique, that I'm trying to set the number of reducers on. My code is workingSet.getConfigDef().setProperty("mapred.reduce.tasks", "4"); and my understanding is that this should tell Cascading to use 4 reducers. When I run it, I only get one. What am I missing?

Also, is it possible to configure a pipe's output block size? Right now our block size is 128 MB. I'd like to raise it to 256 MB at a few points in the flow.

Thanks,
JPD

Chris K Wensel

Apr 11, 2013, 1:45:31 PM
to cascadi...@googlegroups.com
First, you should be calling #getStepConfigDef. #getConfigDef is only for the current Pipe and underlying Operation.

Second, it doesn't work on SubAssemblies, and Unique is a SubAssembly. I thought we were throwing an exception if you tried to use it, but it doesn't look like it. So I've opened an issue to fix that.

In the future we will support SubAssemblies. The config def stuff has been experimental up to this point, and SubAssemblies are hard; they may require a slight change to make it work right.

ckw



JPatrick Davenport

Apr 11, 2013, 1:52:25 PM
to cascadi...@googlegroups.com
Thanks, Chris, for getting back to me.

I stumbled on the distinction between getStepConfigDef and getConfigDef. Here's some new code I've tried. I don't think it's working as I'd expect, but I could just be wrong.

Pipe source = new Pipe("source");
// 1073741824 bytes = 1 GB
source.getStepConfigDef().setProperty(Mode.DEFAULT, "dfs.block.size", "1073741824");

My understanding is that there should be a change in the job config for this step that sets dfs.block.size to 1 GB. But when I look at the job on the cluster, dfs.block.size is still set to 128 MB. Am I just wrong here?

Thanks,
JPD
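
For reference, the byte values behind the sizes mentioned in this thread can be worked out as below. This is only a minimal sketch: the class and method names (BlockSizes, megabytes, gigabytes) are made up for illustration, and the one thing it relies on is that dfs.block.size is expressed in bytes, as the 1073741824 value above implies.

```java
// Hypothetical helper (not part of Cascading or Hadoop): converts
// human-readable sizes to the byte strings a property such as
// dfs.block.size expects.
class BlockSizes {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long BYTES_PER_GB = 1024L * BYTES_PER_MB;

    // Returns the given number of megabytes as a decimal byte count.
    static String megabytes(long mb) {
        return Long.toString(mb * BYTES_PER_MB);
    }

    // Returns the given number of gigabytes as a decimal byte count.
    static String gigabytes(long gb) {
        return Long.toString(gb * BYTES_PER_GB);
    }

    public static void main(String[] args) {
        System.out.println("128 MB = " + megabytes(128)); // 134217728
        System.out.println("256 MB = " + megabytes(256)); // 268435456
        System.out.println("1 GB   = " + gigabytes(1));   // 1073741824
    }
}
```

So a 256 MB block size would be passed as the string "268435456" rather than "256".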

JPatrick Davenport

Apr 11, 2013, 2:13:06 PM
to cascadi...@googlegroups.com
Just to add a piece of diagnostic information, I'm using 2.0.8.

Chris K Wensel

Apr 11, 2013, 2:21:23 PM
to cascadi...@googlegroups.com
The current supported release of Cascading is 2.1.5.

I don't remember what issues regarding this were resolved between those versions.

That said, for this to work reliably, config def needs to work on a SubAssembly. It's on the todo list.

ckw

JPatrick Davenport

May 1, 2013, 8:03:02 AM
to cascadi...@googlegroups.com
For future generations: if you can determine which step the activity is taking place in, you can use a FlowStepStrategy to set the reducers manually. It's not as cool or easy as the pipe-level getStepConfigDef approach, but it does work. I find it helpful to use named pipes and then iterate over the groups of each flow step to figure out where I am.
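
A rough sketch of that idea follows. The runnable part is only the lookup from pipe names to reducer counts; ReducerPlan, its methods, and the pipe name "workingSet" are hypothetical names chosen for illustration. The Cascading wiring is shown in comments and is an assumption about the 2.x Hadoop API (FlowStepStrategy, FlowStep.getGroups(), JobConf.setNumReduceTasks), so check it against the version you run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup meant to be called from inside a FlowStepStrategy:
// given the names of the groups in a step, decide how many reducers
// that step should get.
class ReducerPlan {
    private final Map<String, Integer> reducersByPipeName = new LinkedHashMap<>();

    // Registers the reducer count wanted for the step containing this pipe.
    ReducerPlan with(String pipeName, int reducers) {
        reducersByPipeName.put(pipeName, reducers);
        return this;
    }

    // Returns the reducer count for the first matching group name,
    // or -1 when the step contains none of the configured pipes.
    int reducersFor(List<String> groupNamesInStep) {
        for (String name : groupNamesInStep) {
            Integer reducers = reducersByPipeName.get(name);
            if (reducers != null) {
                return reducers;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ReducerPlan plan = new ReducerPlan().with("workingSet", 4);
        System.out.println(plan.reducersFor(Arrays.asList("source", "workingSet")));
        System.out.println(plan.reducersFor(Arrays.asList("other")));
    }
}

// Assumed wiring with Cascading 2.x on Hadoop (not compiled here):
//
//   flow.setFlowStepStrategy(new FlowStepStrategy<JobConf>() {
//     public void apply(Flow<JobConf> flow,
//                       List<FlowStep<JobConf>> predecessorSteps,
//                       FlowStep<JobConf> flowStep) {
//       List<String> names = new ArrayList<String>();
//       for (Group group : flowStep.getGroups())
//         names.add(group.getName());
//       int reducers = plan.reducersFor(names);
//       if (reducers > 0)
//         flowStep.getConfig().setNumReduceTasks(reducers);
//     }
//   });
```

Keeping the name-to-count mapping outside the strategy makes it easy to check which steps will be affected before submitting the flow.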