Ufuk Celebi
Jun 4, 2014, 6:34:11 PM
to stratosp...@googlegroups.com
None of the following is pressing; it is just something that crossed my mind.
I think looping is a very simple programming concept, in the sense that it is easy to grasp. When I look at iterative Stratosphere jobs, though, I don't get that feeling: I find the way we currently have to specify bulk iterations unnatural.
As a simple case in point, consider the following two programs:
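```java
// Start a bulk iteration with (at most) 10 supersteps.
IterativeDataSet<Integer> data = env.fromElements(1, 2, 3).iterate(10);

// Step function: increment every element by one.
DataSet<Integer> increment = data.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return value + 1;
    }
});

// Feed the step result back as the next superstep's input, then print the final result.
data.closeWith(increment).print();
```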
```java
DataSet<Integer> data = env.fromElements(1, 2, 3);

// Plain Java loop: each pass appends another map operator to the plan.
for (int i = 0; i < 10; i++) {
    data = data.map(new MapFunction<Integer, Integer>() {
        @Override
        public Integer map(Integer value) throws Exception {
            return value + 1;
        }
    });
}

data.print();
```
Both programs produce the same output (11, 12 and 13), but I think the second one is easier to understand. For example, I had to look at the documentation to refresh my memory of how to obtain an IterativeDataSet, whereas the second program only requires knowing the basic transformations.
My questions now are the following:
1. Shouldn't it be possible to automatically translate the second program into the first one when generating the plan?
2. More generally, do you think it would be worthwhile to specify loops as in the second example?
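To make question 1 a bit more concrete, here is a toy model in plain Java. It is not Stratosphere's optimizer or plan representation; `Node`, `repeatedChainLength`, `evalChain`, `evalIteration` and `buildUnrolledPlan` are all hypothetical names. It builds the chain of ten maps that the for-loop program produces and checks whether every map is an instance of the same function class, in which case the chain could in principle be replaced by a single bulk iteration with that many steps. Comparing classes is only a heuristic: each loop pass creates a new instance of the same anonymous class, so a real translation would also have to verify that the instances carry identical state (a map that captured a copy of the loop counter would share the class but behave differently in every step).

```java
import java.util.function.UnaryOperator;

public class LoopDetectionSketch {

    // Hypothetical stand-in for a plan node: either the source (fn == null)
    // or a unary map applied to an upstream node. Not a Stratosphere class.
    static final class Node {
        final Node input;
        final UnaryOperator<Integer> fn;

        Node(Node input, UnaryOperator<Integer> fn) {
            this.input = input;
            this.fn = fn;
        }
    }

    // Walks from the sink back to the source. Returns the number of maps if they
    // are all instances of the same class, otherwise -1 (also -1 if there are none).
    static int repeatedChainLength(Node sink) {
        Class<?> fnClass = null;
        int length = 0;
        for (Node n = sink; n.fn != null; n = n.input) {
            if (fnClass == null) {
                fnClass = n.fn.getClass();
            } else if (n.fn.getClass() != fnClass) {
                return -1;
            }
            length++;
        }
        return length == 0 ? -1 : length;
    }

    // Evaluates the unrolled chain for one value: source first, sink last.
    static int evalChain(Node node, int value) {
        return node.fn == null ? value : node.fn.apply(evalChain(node.input, value));
    }

    // Evaluates the collapsed form: one representative step applied `times` times.
    static int evalIteration(UnaryOperator<Integer> step, int times, int value) {
        int current = value;
        for (int i = 0; i < times; i++) {
            current = step.apply(current);
        }
        return current;
    }

    // Mimics the plan built by the for-loop program: `iterations` chained maps,
    // each a fresh instance of the same anonymous class.
    static Node buildUnrolledPlan(int iterations) {
        Node data = new Node(null, null); // the source
        for (int i = 0; i < iterations; i++) {
            data = new Node(data, new UnaryOperator<Integer>() {
                @Override
                public Integer apply(Integer value) {
                    return value + 1;
                }
            });
        }
        return data;
    }

    public static void main(String[] args) {
        Node sink = buildUnrolledPlan(10);
        int times = repeatedChainLength(sink);
        System.out.println("collapsible into " + times + " iterations");
        for (int v : new int[] {1, 2, 3}) {
            // Both forms agree: 1 -> 11, 2 -> 12, 3 -> 13.
            System.out.println(evalChain(sink, v) + " == " + evalIteration(sink.fn, times, v));
        }
    }
}
```

Note that the toy only recognizes a linear chain of one repeated operator; a loop body with several operators or with side inputs would need a more general pattern match over the plan graph.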
===
As a side note: I get the feeling that "iteration" is a very academic term. We don't talk about for or while iterations, but about for and while loops. Of course, a name is just a name, but it might make sense to rename things accordingly if others feel the same way. At least for bulk iterations, this is how I think about them: the default case as a for loop, and the case with a termination criterion as a while loop.
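The for/while analogy can be spelled out in plain Java, independent of Stratosphere's runtime. In this sketch a `List<Integer>` stands in for a data set, the hypothetical `incrementAll` plays the role of the map step, and a `Predicate` stands in for the termination criterion. It is only a loose analogy: as far as I understand the current API, the termination criterion is a second data set passed to `closeWith`, and the iteration stops once that data set is empty (or the maximum number of iterations is reached), rather than a predicate being evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class LoopSemanticsSketch {

    // "for loop" flavor of a bulk iteration: apply the step a fixed number of times.
    static <T> T forLoop(T initial, UnaryOperator<T> step, int iterations) {
        T current = initial;
        for (int i = 0; i < iterations; i++) {
            current = step.apply(current);
        }
        return current;
    }

    // "while loop" flavor: keep stepping until the (hypothetical) termination
    // predicate holds, but never run more than maxIterations steps.
    static <T> T whileLoop(T initial, UnaryOperator<T> step, Predicate<T> done, int maxIterations) {
        T current = initial;
        int i = 0;
        while (i < maxIterations && !done.test(current)) {
            current = step.apply(current);
            i++;
        }
        return current;
    }

    // Hypothetical step function; a java.util.List stands in for a DataSet.
    static List<Integer> incrementAll(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int value : in) {
            out.add(value + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        // Fixed number of steps, like the examples above.
        System.out.println(forLoop(data, LoopSemanticsSketch::incrementAll, 10));
        // Stops early once any element reaches 5, well before the limit of 10.
        System.out.println(whileLoop(data, LoopSemanticsSketch::incrementAll,
                xs -> xs.stream().anyMatch(x -> x >= 5), 10));
    }
}
```

Running `main` prints `[11, 12, 13]` for the for variant and `[3, 4, 5]` for the while variant.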