Cascading uses the mapred API, so you should set mapred.reduce.tasks to the value you want in your properties.
André
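A minimal sketch of the setting mentioned above, using only java.util.Properties. The class and method names are hypothetical, and the value 8 is just an example. Handing the resulting properties to your flow connector is assumed to work the same way as for the rest of your job configuration.

```java
import java.util.Properties;

// Hypothetical helper: builds the properties object that carries the reducer count.
final class ReducerConfig {

    // Returns properties with mapred.reduce.tasks set to the requested value.
    static Properties withReducers(int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        Properties properties = new Properties();
        properties.setProperty("mapred.reduce.tasks", Integer.toString(numReducers));
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = withReducers(8);
        // These properties would then be passed to the flow connector used by the job.
        System.out.println(properties.getProperty("mapred.reduce.tasks"));
    }
}
```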
--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/dce26f64-8ec1-4acc-bf69-0b659675127b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
From: Dilip K
Sent: July 31, 2015 7:16:30am PDT
To: cascading-user
Subject: Re: run cascading job with multiple reducers
Hi Ken,
I am doing a Cartesian join on 2000 tuples (with 150 fields each) to find the similarity between each user and all other users, so I will end up with 2000 * 2000 tuples. Please let me know what you are referring to as one key value here.
From: Dilip K
Sent: July 31, 2015 12:42:43pm PDT
2015-08-01 00:57:45,280 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : cascading.tuple.TupleException: unable to set into: [UNKNOWN], using selector: [UNKNOWN]
at cascading.tuple.Tuple.set(Tuple.java:797)
at cascading.flow.hadoop.HadoopGroupByClosure$1$1.makeResult(HadoopGroupByClosure.java:93)
at cascading.flow.hadoop.HadoopGroupByClosure$1.next(HadoopGroupByClosure.java:120)
at cascading.flow.hadoop.HadoopGroupByClosure$1.next(HadoopGroupByClosure.java:76)
at cascading.pipe.joiner.InnerJoin$JoinIterator.initLastValues(InnerJoin.java:152)
at cascading.pipe.joiner.InnerJoin$JoinIterator.next(InnerJoin.java:184)
at cascading.pipe.joiner.InnerJoin$JoinIterator.next(InnerJoin.java:68)
at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:97)
at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:32)
at cascading.flow.stream.duct.OpenDuct.receive(OpenDuct.java:45)
at cascading.flow.stream.duct.OpenDuct.receive(OpenDuct.java:28)
at cascading.flow.hadoop.stream.HadoopGroupGate.accept(HadoopGroupGate.java:141)
at cascading.flow.hadoop.FlowReducer.reduce(FlowReducer.java:146)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: cascading.tuple.TupleException: given tuple not same size as position array: 0, tuple: ['00011324', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1',
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0'] at cascading.tuple.Tuple.set(Tuple.java:759)
From: Dilip K
Sent: July 31, 2015 11:14:28pm PDT
To: cascading-user
An alternative approach would be to save the upstream result (what you get from the readCsvAssembly pipe), and then do a GroupBy on Fields.ALL with just that pipe. That will give you one group per unique record (spread out over all of the reducers). Then, in your custom Buffer function (a modified version of EdgeDistanceBufferJoin), read in the upstream result (in memory if possible, or reopen it for each group) and calculate the distance.
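The idea above can be simulated in plain Java without Cascading, as a rough sketch only. Each simulated reducer owns a subset of the records but reads the full data set, so together they still produce every pair. All names here are hypothetical, the Hamming distance is an assumed stand-in for whatever EdgeDistanceBufferJoin computes, and all vectors are assumed to have the same length.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation; not Cascading code.
final class CrossDistanceSketch {

    // Assumed distance: number of positions where two equal-length vectors differ.
    static int hamming(int[] a, int[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                distance++;
            }
        }
        return distance;
    }

    // One simulated reducer: owns records i where i % numReducers == reducerId,
    // but compares each owned record against the full in-memory data set.
    static List<int[]> reduce(int reducerId, int numReducers, int[][] all) {
        List<int[]> out = new ArrayList<>();
        for (int i = reducerId; i < all.length; i += numReducers) {
            for (int j = 0; j < all.length; j++) {
                out.add(new int[] {i, j, hamming(all[i], all[j])});
            }
        }
        return out;
    }

    // Runs every simulated reducer and concatenates their {i, j, distance} rows.
    static List<int[]> run(int numReducers, int[][] all) {
        List<int[]> out = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            out.addAll(reduce(r, numReducers, all));
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] users = new int[10][4];
        for (int i = 0; i < users.length; i++) {
            users[i][i % 4] = 1; // toy feature vectors
        }
        System.out.println(run(3, users).size() + " distances computed");
    }
}
```

The trade-off is that every reducer needs access to the whole upstream result, which is only practical while that result fits in memory or is cheap to reopen.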
From: Dilip K
Sent: August 3, 2015 1:19:53am PDT
From: Dilip K
Sent: August 5, 2015 11:43:39am PDT
From: Dilip K
Sent: August 5, 2015 12:34:34pm PDT
To: cascading-user
Subject: Re: run cascading job with multiple reducers
Sorry for my ignorance. I was thinking that CoGroup is the only way to join two tuple streams and perform different kinds of joins.
Let me be clear. For instance, if I have 10 users, I want to find the distance from each user to every other user. The end result should be 10 * 10 = 100 distances.
My understanding of the approach you suggested earlier is this: say the 10 users are divided into 3 groups (if grouped on a certain field) or 10 groups (if grouped on all fields). A Buffer on a GroupBy (not a CoGroup) will give me access only to the tuples in that group, so I can't find the distances to users in other groups.
From: Dilip K
Sent: August 5, 2015 5:02:11pm CDT
2015-08-11 04:11:02,287 INFO [main] cascading.flow.hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['employee-record']]"]["/user/guest/feature-extraction-output.csv"]
2015-08-11 04:11:02,288 INFO [main] cascading.flow.hadoop.FlowMapper: sinking to: CoGroup(DistributedCrossProduct-lhs*DistributedCrossProduct-rhs)[by: DistributedCrossProduct-lhs:[{1}:-1] DistributedCrossProduct-rhs:[{1}:-1]]
2015-08-11 04:11:02,289 INFO [main] cascading.flow.hadoop.FlowMapper: flow node id: BE87D61BA76D4CE589EE04824FFD2031, mem on start (mb), free: 118, total: 1191, max: 1191
2015-08-11 04:11:02,409 ERROR [main] cascading.flow.stream.element.TrapHandler: caught Throwable, no trap available, rethrowing
cascading.tuple.TupleException: failed to set a value beyond the end of the tuple elements array, size: 567 , index: -1
at cascading.tuple.Tuple.internalSet(Tuple.java:638)
at cascading.tuple.Tuple.set(Tuple.java:535)
at cascading.tuple.Tuple.nulledCopy(Tuple.java:731)
at cascading.tuple.Tuples.nulledCopy(Tuples.java:261)
at cascading.flow.stream.element.GroupingSpliceGate$5.makeResult(GroupingSpliceGate.java:156)
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:97)
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:45)
at cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at com.test.DistributedCrossProduct$RemoveFieldnames.operate(DistributedCrossProduct.java:136)
Thanks
Dilip
From: Dilip K
Sent: August 11, 2015 10:20:46am PDT
To: cascading-user
Subject: Re: run cascading job with multiple reducers
That was so quick. Thanks for keeping me posted on this.
I tried using the snippet DistributedCrossProduct.java. It works as expected with LocalFlowConnector, but not with either the MR2 or Tez connectors.
readCsvAssembly = new GroupBy(readCsvAssembly, new Fields(1)); // Fields.ALL
From: Dilip K
Sent: August 11, 2015 5:11:42pm PDT
This would group the tuples on the second field, thereby spinning up a number of reducers equal to the number of groups.
I am using it to test the basic flow with multiple reducers and to make sure configuration is not an issue. Even without the cross join, the job itself runs with a single reduce task. I remember the same code running with multiple reducers on Tez.
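For context, in Hadoop MapReduce the number of reduce tasks is a job setting rather than something derived from the number of groups; groups are then spread over however many reduce tasks exist. The sketch below mimics the formula of Hadoop's default HashPartitioner in plain Java. The class and method names are hypothetical, and Cascading may hash grouping tuples differently, so treat this as an illustration of the principle only.

```java
// Plain-Java illustration; not Hadoop or Cascading code.
final class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many reduce tasks actually receive at least one of the given keys.
    static int distinctPartitions(String[] keys, int numReduceTasks) {
        boolean[] used = new boolean[numReduceTasks];
        int distinct = 0;
        for (String key : keys) {
            int p = partition(key, numReduceTasks);
            if (!used[p]) {
                used[p] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        String[] groups = {"a", "b", "c", "d", "e"};
        // With one reduce task configured, all five groups land on the same task.
        System.out.println(distinctPartitions(groups, 1));
        System.out.println(distinctPartitions(groups, 3));
    }
}
```

So five groups with a single configured reduce task still run on one reducer, which matches the behaviour described above when the reducer count is left at its default.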
I did basic testing on the new enhancement made for the Cartesian join. Everything is working great.
One minor issue I see is that InsertRandom(numReducers), as used in DistributedCrossProduct, generates arbitrary values. As a result, some reducers get no tuples and some get only a few, and those reducers finish far ahead of the others. Distributing tuples equally across all reducers would make more sense.
Thanks
Dilip
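To illustrate the skew described above, here is a small plain-Java comparison of random key assignment against a round-robin counter. It is only a sketch: the class and method names are hypothetical, it does not use InsertRandom itself, and in a real job a round-robin counter would be local to each mapper, with the final key-to-reducer mapping still depending on the partitioner.

```java
import java.util.Random;

// Plain-Java comparison; not Cascading code.
final class ReducerKeyAssignment {

    // Tuples per bucket when each tuple gets a random key in [0, numReducers).
    static int[] randomCounts(int numTuples, int numReducers, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numReducers];
        for (int i = 0; i < numTuples; i++) {
            counts[random.nextInt(numReducers)]++;
        }
        return counts;
    }

    // Tuples per bucket when keys are handed out in rotation: 0, 1, ..., n-1, 0, 1, ...
    static int[] roundRobinCounts(int numTuples, int numReducers) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < numTuples; i++) {
            counts[i % numReducers]++;
        }
        return counts;
    }

    // Difference between the fullest and the emptiest bucket.
    static int spread(int[] counts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    public static void main(String[] args) {
        System.out.println("random spread:      " + spread(randomCounts(2000, 7, 42L)));
        System.out.println("round-robin spread: " + spread(roundRobinCounts(2000, 7)));
    }
}
```

With rotation, bucket sizes differ by at most one tuple, whereas random assignment gives no such bound, and the imbalance is most visible when there are few tuples per reducer.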
From: Dilip K
Sent: August 12, 2015 2:35:27pm PDT