--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/CAAuNgTYTXp5U4MAccZ2qR9JHpWuGA5ySMkqzDCOO_wk6tTtYkQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
fwiw, on Tez this is not an issue, as a broadcast edge is always used.

for MapReduce, my recommendation would be to update the rules in MapReduceHadoopRuleRegistry to use an alternate IntermediateTapElementFactory implementation that returns a DistCacheTap (see line 41).

when that rule fires, and others like it, they create a standard temp tap from the registered factory. just create a new factory, and update the rules to use it.
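A minimal sketch of the "swap the factory, not the rule" idea, written against toy types rather than the real planner. Every name here (ToyTap, ToyTempTap, ToyDistCacheTap, ToyTapFactory, FactorySwapSketch) is hypothetical and only stands in for the Cascading classes mentioned above. The actual IntermediateTapElementFactory and DistCacheTap signatures are not reproduced and may differ.

```java
// Toy stand-ins only; these are NOT the Cascading classes.
interface ToyTap {
    String describe();
}

// Plays the role of the standard temp tap a rule inserts before a HashJoin.
final class ToyTempTap implements ToyTap {
    private final String path;

    ToyTempTap(String path) {
        this.path = path;
    }

    public String describe() {
        return "Temp(" + path + ")";
    }
}

// Decorator playing the role of a tap served from the distributed cache.
final class ToyDistCacheTap implements ToyTap {
    private final ToyTap original;

    ToyDistCacheTap(ToyTap original) {
        this.original = original;
    }

    public String describe() {
        return "DistCache(" + original.describe() + ")";
    }
}

// Plays the role of the factory a rule asks for its intermediate tap.
interface ToyTapFactory {
    ToyTap create(String path);
}

final class FactorySwapSketch {
    // The factory the rules use today: a plain temp tap.
    static final ToyTapFactory STANDARD = ToyTempTap::new;

    // The alternate factory: same temp tap, wrapped in the decorator.
    static final ToyTapFactory DIST_CACHE = path -> new ToyDistCacheTap(STANDARD.create(path));

    // A rule does not care which tap it gets; it only calls the factory it was given.
    static ToyTap fireRule(ToyTapFactory factory, String path) {
        return factory.create(path);
    }

    public static void main(String[] args) {
        System.out.println(fireRule(STANDARD, "/tmp/join-rhs").describe());
        System.out.println(fireRule(DIST_CACHE, "/tmp/join-rhs").describe());
    }
}
```

The point of the shape is that the rule body stays untouched. Only the factory handed to it changes, which is why updating the existing rules is a small edit.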
On Aug 2, 2016, at 9:55 AM, 'Ruban Monu' via cascading-user <cascading-user@googlegroups.com> wrote:
Is there a way (in Cascading 3) to walk the flow plan and apply DistCacheTap to any source tap that is the rhs of a HashJoin?

The solution in our current fork finds "accumulated" sources in HadoopFlowStep and enables distributed cache for those sources:
https://github.com/twitter/cascading/commit/8271526443c9ef832415df5d9673fde3e4391620

This is re https://github.com/twitter/scalding/issues/1103. In Scalding, I don't think we can know if a source is going to be used for HashJoin at the time we call createTap on it.

From what I can tell, there is no way to look at the pipes and wrap their source taps in DistCacheTap after the pipes have been created? I see support for decorating all temporary taps or checkpoint taps in FlowConnectorProps, but nothing that can be applied to sources that are accumulated. (It's probably not possible to do that without first checkpointing the source and then wrapping the resulting tap?)

I'm hoping there's a simple solution for this that I've completely missed.

Thanks!
-Ruban
I would not add NEW rules (that won't work), but update the EXISTING rules to use a new factory.

the rules that add a temp Tap before the HashJoin should be obvious in the naming. that is, just change the rule to grab a different factory.

(and I would NOT write it in Scala, since you can't send it as a pull request and contribute the work back, assuming globally using DistCacheTap works at scale, which I hope you prove does work)

ckw
Any rule matching for a PathScopeExpression.BLOCKING edge is the accumulated path. for example, see BalanceHashJoinBlockingHashJoinExpression.

That said, in hindsight, it might be interesting to write a rule that comes last in the BalanceAssembly phase that looks for

    TempTap -- blocking --> HashJoin (and isn't a self-join)

and wraps the TempTap with a DistCacheTap. This may require a new Insertion type for Replace to keep it generic.

this lets all the other rules fire, then we just use meta-data on the edges to optimize things a bit.
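To make the proposed late pass concrete, here is a rough sketch over a toy plan graph. It does not use the Cascading rule or expression-graph API. Kind, Node, Edge and WrapBlockingTempTaps are invented for illustration, and the self-join test (the same tap also feeding the same join over another edge) is an assumption about what "isn't a self-join" would mean in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Toy plan graph; none of these types are Cascading classes.
enum Kind { SOURCE_TAP, TEMP_TAP, DIST_CACHE_TAP, HASH_JOIN }

final class Node {
    final String name;
    final Kind kind;

    Node(String name, Kind kind) {
        this.name = name;
        this.kind = kind;
    }
}

final class Edge {
    Node from;              // mutable so the pass can swap in a wrapped tap
    final Node to;
    final boolean blocking; // true for the accumulated (rhs) side of a join

    Edge(Node from, Node to, boolean blocking) {
        this.from = from;
        this.to = to;
        this.blocking = blocking;
    }
}

final class WrapBlockingTempTaps {
    // Replaces every TempTap -- blocking --> HashJoin source with a wrapped tap,
    // skipping self-joins. Returns the number of taps replaced.
    static int apply(List<Edge> edges) {
        int replaced = 0;
        for (Edge edge : edges) {
            boolean matches = edge.blocking
                && edge.from.kind == Kind.TEMP_TAP
                && edge.to.kind == Kind.HASH_JOIN;
            if (!matches || isSelfJoin(edges, edge)) {
                continue;
            }
            edge.from = new Node(edge.from.name, Kind.DIST_CACHE_TAP);
            replaced++;
        }
        return replaced;
    }

    // Assumed definition: the same tap instance reaches the same join over another edge.
    private static boolean isSelfJoin(List<Edge> edges, Edge candidate) {
        for (Edge other : edges) {
            if (other != candidate && other.from == candidate.from && other.to == candidate.to) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node join = new Node("join", Kind.HASH_JOIN);
        List<Edge> plan = new ArrayList<>();
        plan.add(new Edge(new Node("lhs", Kind.SOURCE_TAP), join, false));
        plan.add(new Edge(new Node("rhs", Kind.TEMP_TAP), join, true));
        System.out.println("wrapped " + apply(plan) + " tap(s)");
    }
}
```

Because the pass only reads the edge metadata and the node kinds, it can run after every other balancing rule has fired without needing to know which rule inserted the temp tap.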
It tries to use a general rule to match any Hfs -> blocking -> HashJoin change. The RuleExpression likely needs some more tweaking. A review would be great. Thanks!
On Fri, Aug 5, 2016 at 11:09 AM, Chris K Wensel <ch...@wensel.net> wrote:
Any rule matching for a PathScopeExpression.BLOCKING edge is the accumulated path. for example, see BalanceHashJoinBlockingHashJoinExpression.

That said, in hindsight, it might be interesting to write a rule that comes last in the BalanceAssembly phase that looks for

    TempTap -- blocking --> HashJoin (and isn't a self-join)

and wraps the TempTap with a DistCacheTap. This may require a new Insertion type for Replace to keep it generic.

this lets all the other rules fire, then we just use meta-data on the edges to optimize things a bit.
On Aug 19, 2016, at 4:23pm, 'Piyush Narang' via cascading-user <cascadi...@googlegroups.com> wrote:
Thought I'd circle back with some updates. I've put up a PR with my changes: https://github.com/cwensel/cascading/pull/55. Would be great if someone could take a look.
I was able to test this out on a few hashJoin jobs internally and was able to verify that the dist cache was being applied on the rhs. I do end up seeing a 10-20% reduction in HDFS bytes read / read ops on one of my jobs that had around 1300 map tasks performing the hashJoin (size of rhs in HDFS was around 100MB). Runtime of the two jobs was comparable.
cascading.tuple.TupleException: unable to read from input identifier: viewfs://hadoop-dw2-nn.smf1.twitter.com/tables/statuses/2016/07/27/19/statuses-20160727190000-20160727200000.lzo
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:152)
    at cascading.flow.stream.element.SourceStage.map(SourceStage.java:84)
    at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:139)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:180)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:175)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1748500278-10.52.50.140-1377803467793:blk_2305124715_1101319970837 file=/foo/bar/topN/part-00000
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:897)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:803)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:849)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
    at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:1005)
    at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:166)
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:139)
    ... 10 more