Hi Oscar,
I just stumbled across issue #3 myself, and it took me a while to figure out what was going on and why it behaves like that. After reading this thread and the Field-rules GitHub page, I now understand that this behavior is due to the flow-planning issue. Now I'm by no means a Scala/Scalding and/or Cascading expert, but can't this be solved easily?
For example:
Suppose we have a pipe with fields 'A, 'B, 'C and 'D. Now if I want to construct a field 'E with a value that is constructed from 'B, 'C, and 'D, I can do the following:
pipe.map(('B, 'C, 'D) -> 'E) { operation('B, 'C, 'D) }
This results in a pipe with fields 'A, 'B, 'C, 'D and 'E. Perfect!
Now if I want to replace 'B with a value that is constructed from 'B, 'C, and 'D, I can do the following:
pipe.map(('B, 'C, 'D) -> 'B) { operation('B, 'C, 'D) }
The result of this is a pipe with only fields 'A and 'B. In my opinion this is rather unexpected, especially since map drops or propagates fields differently depending on how the output fields relate to the input fields. I would expect the pipe to still have all of the original fields, just as in the 'E case.
So if I need 'C later on in the process, I now use the following to solve the problem:
pipe.map(('B, 'C, 'D) -> ('B, 'C)) { (operation('B, 'C, 'D), 'C) }
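To make the three cases above concrete, here is a minimal sketch that models the field rules in plain Scala. This is NOT the real Scalding/Cascading API: a "row" is just an ordered field-to-value map, strings like "B" stand in for the 'B field symbols, and the hypothetical operation is replaced by a simple sum.

```scala
import scala.collection.immutable.ListMap

object FieldRules {
  type Row = ListMap[String, Int]

  // Mimics map(from -> to): when `to` is a subset of `from`, Cascading
  // applies the SWAP output selector (drop the argument fields, append
  // the results); otherwise ALL (keep every field, append the results).
  def mapFields(row: Row, from: Seq[String], to: Seq[String])
               (fn: Seq[Int] => Seq[Int]): Row = {
    val results = to.zip(fn(from.map(row))).toMap
    if (to.forall(from.contains))
      row.filterNot { case (k, _) => from.contains(k) } ++ results // SWAP
    else
      row ++ results // ALL
  }
}

val row = ListMap("A" -> 1, "B" -> 2, "C" -> 3, "D" -> 4)

// ('B,'C,'D) -> 'E : result keeps A, B, C, D and appends E
val appended = FieldRules.mapFields(row, Seq("B", "C", "D"), Seq("E"))(vs => Seq(vs.sum))

// ('B,'C,'D) -> 'B : subset, so B, C, D are swapped out; only A and B remain
val swapped = FieldRules.mapFields(row, Seq("B", "C", "D"), Seq("B"))(vs => Seq(vs.sum))

// workaround ('B,'C,'D) -> ('B,'C): forward C by hand; note D is still lost
val fixed = FieldRules.mapFields(row, Seq("B", "C", "D"), Seq("B", "C"))(vs => Seq(vs.sum, vs(1)))
```

Under this model, the surprising drop in the second case falls out of the single subset check, which is why the workaround of listing 'C on the output side (and forwarding it in the function) keeps it alive.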
Why can't the map function be built to always forward the unmapped argument fields in the subset case? Are there technical and/or performance reasons for this?
Kind regards,
Dirk