Automatic translation of loops to iterations


Ufuk Celebi

Jun 4, 2014, 6:34:11 PM
to stratosp...@googlegroups.com
None of the following is pressing, just something that crossed my mind.

I think looping is a very simple programming concept -- in the sense that it is easy to grasp. If I look at iterative Stratosphere jobs, I don't feel that way. I find the way we currently have to specify bulk iterations unnatural.

As a simple case in point, consider the following two programs:

```java
IterativeDataSet<Integer> data = env.fromElements(1, 2, 3).iterate(10);

DataSet<Integer> increment = data.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return value + 1;
    }
});

data.closeWith(increment).print();
```

```java
DataSet<Integer> data = env.fromElements(1, 2, 3);

for (int i = 0; i < 10; i++) {
    data = data.map(new MapFunction<Integer, Integer>() {
        @Override
        public Integer map(Integer value) throws Exception {
            return value + 1;
        }
    });
}

data.print();
```

Both programs have the same output, but I think the second one is easier to understand. For example, I had to look at the documentation to remind myself how to obtain the IterativeDataSet, whereas the second program only requires knowing the basic transformations.

My questions now are the following:
1. Shouldn't it be possible to automatically translate the second program to the first one when generating the plan?
2. And generally, do you think it would be worthwhile to specify loops as in the second example?

===

As a side note: I get the feeling that "iteration" is a very academic term. We don't talk about for or while iterations, but about for and while loops. Of course, names are just names, but it might make sense to rename things accordingly if others feel the same way. At least for bulk iterations, this is how I think about them: the default case is a for loop, and the case with a termination criterion is a while loop.
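To make the for/while analogy concrete, here is a minimal sketch in plain Java. It is a toy model, not the Stratosphere API: lists stand in for DataSets, and the names `BulkIterationSketch`, `forLoop`, `whileLoop`, and `keepGoing` are made up for illustration. In particular, the predicate is only a stand-in for a termination criterion and does not mirror how the real API expresses one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Toy model only: plain Java lists stand in for DataSets, and none of these
// names belong to the Stratosphere API.
class BulkIterationSketch {

    // Bulk iteration without a termination criterion: a plain for loop.
    static List<Integer> forLoop(List<Integer> data, int maxIterations,
                                 UnaryOperator<List<Integer>> step) {
        List<Integer> current = data;
        for (int i = 0; i < maxIterations; i++) {
            current = step.apply(current);
        }
        return current;
    }

    // Bulk iteration with a termination criterion: a while loop that still
    // respects the iteration cap. The predicate is a hypothetical stand-in.
    static List<Integer> whileLoop(List<Integer> data, int maxIterations,
                                   UnaryOperator<List<Integer>> step,
                                   Predicate<List<Integer>> keepGoing) {
        List<Integer> current = data;
        int i = 0;
        while (i < maxIterations && keepGoing.test(current)) {
            current = step.apply(current);
            i++;
        }
        return current;
    }

    // The step function from the example: add one to every element.
    static List<Integer> increment(List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            result.add(v + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        // Ten increments, as in the two programs above.
        System.out.println(forLoop(input, 10, BulkIterationSketch::increment));
        // Stop early once any element reaches 5.
        System.out.println(whileLoop(input, 10, BulkIterationSketch::increment,
                values -> values.stream().allMatch(v -> v < 5)));
    }
}
```

With the input 1, 2, 3, the for-loop variant yields 11, 12, 13, and the while-loop variant stops after two steps at 3, 4, 5.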

Stephan Ewen

Jun 5, 2014, 9:13:29 AM
to stratosp...@googlegroups.com
The loops look nice, and actually the loop code you wrote should work.

For more complex programs, it will not be terribly efficient until the incremental roll-out is there. That's why I currently advocate the "closed loop" variant.

Automatic detection of loops is hard. I am not sure it is important enough to focus on right now, especially since the incremental roll-out will make much of it unnecessary.
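One visible difference between the two forms is how the plan grows. The sketch below is a toy illustration in plain Java, not Stratosphere's plan or optimizer classes: `PlanSizeSketch`, `unrolledPlan`, and `closedLoopPlan` are hypothetical names, and a plan is simply a list of operator labels. It only shows that an unrolled for loop adds one map operator per pass while the closed-loop form keeps a single iteration operator; it is not meant to capture the whole efficiency argument.

```java
import java.util.ArrayList;
import java.util.List;

// Toy plan model: hypothetical names, not Stratosphere's real plan types.
class PlanSizeSketch {

    // The for-loop program: every pass through the loop adds another map operator.
    static List<String> unrolledPlan(int iterations) {
        List<String> plan = new ArrayList<>();
        plan.add("source");
        for (int i = 0; i < iterations; i++) {
            plan.add("map #" + (i + 1));
        }
        plan.add("sink");
        return plan;
    }

    // The closed-loop program: one iteration operator wrapping a single map step.
    static List<String> closedLoopPlan(int iterations) {
        List<String> plan = new ArrayList<>();
        plan.add("source");
        plan.add("bulk iteration (max " + iterations + ") { map }");
        plan.add("sink");
        return plan;
    }

    public static void main(String[] args) {
        System.out.println("unrolled:    " + unrolledPlan(10).size() + " operators");
        System.out.println("closed loop: " + closedLoopPlan(10).size() + " operators");
    }
}
```

In this model, ten iterations give 12 operators for the unrolled program and 3 for the closed loop, and the closed-loop count stays at 3 however many iterations are requested.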

Ufuk Celebi

Jun 5, 2014, 9:16:15 AM
to stratosp...@googlegroups.com

On 05 Jun 2014, at 15:13, Stephan Ewen <se...@apache.org> wrote:

> The loops look nice, and actually the loop code you wrote should work.

Yeah, it does.

> For more complex programs, it will not be terribly efficient until the incremental roll-out is there. That's why I currently advocate the "closed loop" variant.
>
> Automatic detection of loops is hard. I am not sure it is important enough to focus on right now, especially since the incremental roll-out will make much of it unnecessary.

Yes, exactly. I didn't suggest that it is important right now. I just wanted to get it out there. ;-)