--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/0aeadee7-03dc-44b4-bea8-041f15586541%40googlegroups.com.
val outlier = pipe1
  .insert('sid, 2)
  .rename(('id, 'sid, 'mobile) -> ('id_o, 'sid_o, 'mobile_o))
  .project(('id_o, 'sid_o, 'mobile_o, 'revenue))

pip.joinWithTiny(('id, 'sid, 'mobile) -> ('id_o, 'sid_o, 'mobile_o), outlier)
  .map(('revenue1, 'revenue), 'revenue1) { r: (Double, Double) =>
    val (rev1, rev2) = r
    rev1.toDouble - rev2.toDouble
  }

What kind of join are you doing? Some code would be helpful…

-- Ken

On Jul 28, 2019, at 8:50 AM, Jing Lu <aji...@gmail.com> wrote:
Hi,

My Cascading pipeline takes about 2 hours to finish. However, after I join in another pipe (about 50 MB), the pipeline becomes extremely slow (more than 20 hours). How can I debug this? Is it because of the file format of my data set?

Thanks,