On 04 Jun 2014, at 22:49, Robert Metzger <rmet...@apache.org> wrote:
> 2) Yes, the UDFs only provide iterators. I think there is no fundamental limitation behind it. I guess the authors could not imagine a use case where this is really required. In addition to that, it saves you from writing an additional line of code, where you create a local iterator. It is probably not very difficult to resolve the limitation. I'm just not sure if it is really required. (In general, I'm not 100% sure about this answer)
There is no technical reason.
We had this discussion some time ago. I am strongly in favour of returning an Iterable (or IterableIterator) instead of an Iterator. There are also (closed) issues for this related to Spargel:
https://github.com/stratosphere/stratosphere/issues/425
https://github.com/stratosphere/stratosphere/pull/433
A reason against Iterable is that users could request multiple iterators via `iterator()`, which wouldn't work with our runtime. But we could make sure that it's only allowed to call this method a single time and throw an Exception if it's called more often.
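A rough sketch of what I mean by the single-call guard. `SingleUseIterable` is a made-up name for illustration and not something that exists in our code base; I am also assuming an IllegalStateException is the right exception to throw:

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical wrapper: exposes an Iterator as an Iterable that can be traversed only once. */
final class SingleUseIterable<T> implements Iterable<T> {
    private final Iterator<T> source;
    private boolean consumed = false;

    SingleUseIterable(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        // The runtime can only stream the data once, so reject a second request.
        if (consumed) {
            throw new IllegalStateException("iterator() may only be called once");
        }
        consumed = true;
        return source;
    }
}

class SingleUseIterableDemo {
    /** What a UDF body could look like with an Iterable: a plain for-each loop. */
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Iterable<Integer> values =
                new SingleUseIterable<>(Arrays.asList(1, 2, 3).iterator());
        System.out.println(sum(values)); // prints 6
        try {
            values.iterator(); // second request
        } catch (IllegalStateException e) {
            System.out.println("second call rejected: " + e.getMessage());
        }
    }
}
```

The for-each loop calls `iterator()` exactly once, so the common case works unchanged and only a second traversal fails.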
The new API would have been the perfect time to introduce Iterable. :-( We hesitated before, because we didn't want people to have to change their programs. Still, I think we should go for it: it makes the UDFs less verbose, and people can still get the iterator via `iterator()` if they need it. Then again, it might be just a matter of taste. ;-)
> 3) Yes. In order to maintain a list of all incoming objects inside a UDF, you have to copy them because we are re-using the objects internally.
There is also a plan to provide the option to turn re-using objects off (which will impact performance). I can't find the issue, but someone else ran into a similar problem a short while ago. Is there a separate issue to allow this?
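To illustrate why the copy is needed, here is a minimal, self-contained sketch. `MutableRecord` and `ReusingIterator` are made-up stand-ins, not classes from our runtime; the iterator only mimics the behaviour of handing out the same instance on every `next()`:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical mutable record type. */
class MutableRecord {
    int value;

    MutableRecord(int value) {
        this.value = value;
    }

    MutableRecord copy() {
        return new MutableRecord(value);
    }
}

/** Hypothetical iterator that overwrites and returns one shared instance. */
class ReusingIterator implements Iterator<MutableRecord> {
    private final int[] data;
    private final MutableRecord reused = new MutableRecord(0);
    private int pos = 0;

    ReusingIterator(int... data) {
        this.data = data;
    }

    @Override
    public boolean hasNext() {
        return pos < data.length;
    }

    @Override
    public MutableRecord next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        reused.value = data[pos++]; // same object, new contents
        return reused;
    }
}

class ObjectReuseDemo {
    /** Keeps references only: every list entry points at the one shared object. */
    static List<Integer> collectWithoutCopy(Iterator<MutableRecord> it) {
        List<MutableRecord> kept = new ArrayList<>();
        while (it.hasNext()) {
            kept.add(it.next());
        }
        return values(kept);
    }

    /** Copies each record before storing it, so later overwrites do not matter. */
    static List<Integer> collectWithCopy(Iterator<MutableRecord> it) {
        List<MutableRecord> kept = new ArrayList<>();
        while (it.hasNext()) {
            kept.add(it.next().copy());
        }
        return values(kept);
    }

    private static List<Integer> values(List<MutableRecord> records) {
        List<Integer> out = new ArrayList<>();
        for (MutableRecord r : records) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy(new ReusingIterator(1, 2, 3))); // [3, 3, 3]
        System.out.println(collectWithCopy(new ReusingIterator(1, 2, 3)));    // [1, 2, 3]
    }
}
```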
> 4) Well, that's a good question. I personally ignore the warnings ;)
> We cannot really do anything about it, since we want to have Serializable classes and you cannot inherit the serial version id. So it's basically a limitation of the programming language / runtime we are using. If you are not going to "maintain" the serialVersionUID (change it with each change), it's actually pointless to have it at all.
I see the technical reasons for it (http://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it), but also find it "inconvenient" (for lack of a better word) to either suppress the warning or provide a serialVersionUID. As a user, I don't want to think about serialization.
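For reference, the two options a user currently has look roughly like this. The class names are invented for the example, and the `roundTrip` helper is only there to show that both variants serialize fine within one JVM:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Option 1: declare the id explicitly (and maintain it on incompatible changes). */
class WithExplicitId implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    WithExplicitId(String name) {
        this.name = name;
    }
}

/** Option 2: silence the compiler/IDE warning; the JVM then computes an id from the class. */
@SuppressWarnings("serial")
class WithSuppressedWarning implements Serializable {
    final String name;

    WithSuppressedWarning(String name) {
        this.name = name;
    }
}

class SerialDemo {
    /** Serializes and deserializes an object in memory. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new WithExplicitId("a")).name);
        System.out.println(roundTrip(new WithSuppressedWarning("b")).name);
    }
}
```

Either way the user has to write something serialization-related on every UDF class, which is exactly the part I find inconvenient.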