Interaction between window() and limit() in jOOL Seq


nwmi...@gmail.com

Jul 30, 2019, 5:28:48 PM
to jOOQ User Group
I'm building a Seq that contains limit() and window() terms and I'm seeing some surprising behavior: 
  • If the limit() comes after the window(), it appears that the iterator that the Seq is based on gets completely drained, even though the stream terminates as expected once the number of items specified in limit() has been processed.
  • If the limit() comes before the window(), the iterator is not drained. (The stream also terminates as expected.)

I've attached a simple program that exhibits the behavior.
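A minimal, jOOL-free way to measure this kind of draining is to wrap the source in a counting iterator and compare where limit() sits relative to a stage that materializes its input. CountingIterator and eagerStage() below are hypothetical stand-ins: eagerStage() models a materializing stage in general and is not jOOL's actual window() code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class DrainCheck {

    /** Hypothetical stand-in for a DB cursor: yields 0, 1, 2, ... and counts next() calls. */
    static final class CountingIterator implements Iterator<Integer> {
        private final int size;
        int pulled = 0;

        CountingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return pulled < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    /** Wraps an iterator in a lazy, sequential, ordered Stream. */
    static Stream<Integer> streamOf(Iterator<Integer> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Hypothetical stage that materializes its whole input before passing anything on. */
    static Stream<Integer> eagerStage(Stream<Integer> s) {
        return s.collect(Collectors.toList()).stream();
    }

    /** limit() after the eager stage: the source is read to the end first. */
    static int pulledWithLimitAfter(int sourceSize, int limit) {
        CountingIterator source = new CountingIterator(sourceSize);
        eagerStage(streamOf(source)).limit(limit).collect(Collectors.toList());
        return source.pulled;
    }

    /** limit() before the eager stage: only `limit` elements are read. */
    static int pulledWithLimitBefore(int sourceSize, int limit) {
        CountingIterator source = new CountingIterator(sourceSize);
        eagerStage(streamOf(source).limit(limit)).collect(Collectors.toList());
        return source.pulled;
    }

    public static void main(String[] args) {
        System.out.println(pulledWithLimitAfter(1_000, 10));  // 1000
        System.out.println(pulledWithLimitBefore(1_000, 10)); // 10
    }
}
```

The same counter can be wrapped around the real cursor-backed iterator to confirm how many rows a given pipeline actually fetches.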


In my real scenario, the iterator that the Seq is based on is an iterator on top of a DB cursor, and the limit() comes after the window(). I really need the code not to try to drain this iterator, i.e., not to read a ton of rows from the DB.


Is this a bug or just the way things work?


Thanks.


Nat


SeqDemo.java

Knut Wannheden

Jul 31, 2019, 2:13:14 AM
to jooq...@googlegroups.com
Hi Nat,

Thanks for your message.

At the moment this is how window() works, but your use case sounds reasonable. Would you mind reporting an issue for this on GitHub (https://github.com/jOOQ/jOOL) so we can discuss and track this feature request there?

Since you mention that you are pulling the data from a database, I am, however, also wondering why you don't use window functions directly in your DB query. That would surely be a lot more efficient, no?

Knut

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jooq-user/9114e574-7d05-4109-95f8-c9700b38fc32%40googlegroups.com.

nwmi...@gmail.com

Jul 31, 2019, 10:02:42 AM
to jOOQ User Group
Hi Knut,

I just opened this bug in the jOOL project. (Feel free to continue this discussion in comments to that bug instead of in this thread.)

WRT your suggestion about using window functions in my DB query, unfortunately the DB I'm using (MongoDB) doesn't seem to offer such functions.

Can you (1) think of any workaround for this behavior, or (2) give me a sense of when jOOL might be changed to "fix" this behavior, or (3) give me some hints about what the cause and possible solution in jOOL might be so I could possibly attempt to make and contribute a patch myself?

Thanks.

Nat

Knut Wannheden

Aug 5, 2019, 9:53:23 AM
to jooq...@googlegroups.com
Hi Nat,

On Wed, Jul 31, 2019 at 4:02 PM <nwmi...@gmail.com> wrote:
I just opened this bug in the jOOL project. (Feel free to continue this discussion in comments to that bug instead of in this thread.)
Great. Thanks!

WRT your suggestion about using window functions in my DB query, unfortunately the DB I'm using (MongoDB) doesn't seem to offer such functions.

Can you (1) think of any workaround for this behavior, or (2) give me a sense of when jOOL might be changed to "fix" this behavior, or (3) give me some hints about what the cause and possible solution in jOOL might be so I could possibly attempt to make and contribute a patch myself?

(1) A workaround I can think of would be to put limit() before window(), but I suppose that is not the answer you were looking for. It would also require your limit() call to account for the window size (e.g. when calling window(-1, 1) and a desired limit of 10 you would first have to call limit(11)).

(2) We will first have to discuss this problem with Lukas, when he returns from vacation next week.

(3) The problem is that Seq#window(long, long) will end up calling Seq#toList() very early, which in turn collects all elements into a List (using Stream#collect() and Collectors#toList(), see Seq#toList(Stream) for details).
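Regarding (3), one direction a patch could take, sketched here outside jOOL and only for an unpartitioned, unordered window with a fixed frame like window(-1, 1), is to keep a small look-ahead buffer instead of collecting the whole input. Frame, window1() and CountingIterator are hypothetical names, not jOOL API. window() overloads that partition or order the input generally cannot work this way, since grouping or sorting needs to see the whole input.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyWindow {

    /** Hypothetical frame for a (-1, 1) window: previous, current and next element. */
    static final class Frame<T> {
        final Optional<T> lag;
        final T value;
        final Optional<T> lead;

        Frame(Optional<T> lag, T value, Optional<T> lead) {
            this.lag = lag;
            this.value = value;
            this.lead = lead;
        }
    }

    /**
     * Lazily emits one Frame per source element, reading at most one element past
     * the current one. Assumes non-null elements (Optional.of rejects null).
     */
    static <T> Stream<Frame<T>> window1(Iterator<T> source) {
        Iterator<Frame<T>> frames = new Iterator<Frame<T>>() {
            private boolean started = false;
            private boolean hasCur = false;
            private T cur;
            private Optional<T> prev = Optional.empty();

            @Override
            public boolean hasNext() {
                if (!started) { // read the first element on demand, not up front
                    started = true;
                    hasCur = source.hasNext();
                    if (hasCur) {
                        cur = source.next();
                    }
                }
                return hasCur;
            }

            @Override
            public Frame<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull exactly one element ahead to fill the lead slot.
                Optional<T> lead = source.hasNext() ? Optional.of(source.next()) : Optional.empty();
                Frame<T> frame = new Frame<>(prev, cur, lead);
                prev = Optional.of(cur);
                hasCur = lead.isPresent();
                cur = lead.orElse(null);
                return frame;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(frames, Spliterator.ORDERED), false);
    }

    /** Hypothetical counting source yielding 0, 1, 2, ... */
    static final class CountingIterator implements Iterator<Integer> {
        private final int size;
        int pulled = 0;

        CountingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return pulled < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    /** How many source elements a window-then-limit pipeline reads. */
    static int pulledFor(int sourceSize, int limit) {
        CountingIterator source = new CountingIterator(sourceSize);
        window1(source).limit(limit).collect(Collectors.toList());
        return source.pulled;
    }

    public static void main(String[] args) {
        System.out.println(pulledFor(1_000, 10)); // 11: ten frames plus one lead element
    }
}
```

With a bounded buffer, limit() placed after the window stage stops reading the source after limit + upper elements, which is the behavior Nat is asking for.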

Hope this helps,
Knut
 


Nathaniel Mishkin

Aug 15, 2019, 12:52:23 PM
to jooq...@googlegroups.com
Hi Knut,

On Mon, Aug 5, 2019 at 10:22 AM Knut Wannheden <knut.wa...@gmail.com> wrote:
(1) A workaround I can think of would be to put limit() before window(), but I suppose that is not the answer you were looking for. It would also require your limit() call to account for the window size (e.g. when calling window(-1, 1) and a desired limit of 10 you would first have to call limit(11)).

Unfortunately, the logic of what I'm trying to do requires the limit() to be after the window().  

(2) We will first have to discuss this problem with Lukas, when he returns from vacation next week.

Any update on this? Did Lukas have a good vacation? :-)

FYI this issue is happening in some existing code where I in fact have the limit() before the window(), but I recently realized that that order is incorrect. I have a pretty complicated pipeline and what I need to do is limit the ultimate number of elements produced by the stream, not the number of elements that happen to be "passing by" an earlier stage of the stream. I'd really hate to have to rewrite this all in iterative form.

Thanks.

Nat

Knut Wannheden

Aug 16, 2019, 12:43:49 AM
to jooq...@googlegroups.com
Hi Nat,

On Thu, Aug 15, 2019 at 6:52 PM Nathaniel Mishkin <mis...@aya.yale.edu> wrote:

On Mon, Aug 5, 2019 at 10:22 AM Knut Wannheden <knut.wa...@gmail.com> wrote:
(1) A workaround I can think of would be to put limit() before window(), but I suppose that is not the answer you were looking for. It would also require your limit() call to account for the window size (e.g. when calling window(-1, 1) and a desired limit of 10 you would first have to call limit(11)).

Unfortunately, the logic of what I'm trying to do requires the limit() to be after the window().  

What I meant was that you could call limit() before and after calling window(). The first call to limit() would have to account for the window size. So instead of calling seq.window(-1, 1).limit(10) you would call seq.limit(11).window(-1, 1).limit(10). Hope this makes more sense now. 
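To see why the first limit() needs the extra element, here is a small model in plain java.util.stream (no jOOL). eagerWindow() is a hypothetical stand-in that materializes its input, as the toList() call described for Seq#window(long, long) does, and emits [lag, value, lead] triples for a (-1, 1) frame, with null for an absent neighbour; CountingIterator is a hypothetical fake cursor. The prefix limit is the desired count plus the window's upper bound (10 + 1 = 11 here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class DoubleLimit {

    /** Hypothetical counting source yielding 0, 1, 2, ... */
    static final class CountingIterator implements Iterator<Integer> {
        private final int size;
        int pulled = 0;

        CountingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return pulled < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    static Stream<Integer> streamOf(Iterator<Integer> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Eager stand-in for window(-1, 1): reads everything, then emits [lag, value, lead]. */
    static Stream<List<Integer>> eagerWindow(Stream<Integer> s) {
        List<Integer> all = s.collect(Collectors.toList());
        List<List<Integer>> frames = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            Integer lag = i > 0 ? all.get(i - 1) : null;
            Integer lead = i + 1 < all.size() ? all.get(i + 1) : null;
            frames.add(Arrays.asList(lag, all.get(i), lead));
        }
        return frames.stream();
    }

    /** Models seq.window(-1, 1).limit(desired). */
    static List<List<Integer>> windowThenLimit(Iterator<Integer> source, int desired) {
        return eagerWindow(streamOf(source)).limit(desired).collect(Collectors.toList());
    }

    /** Models seq.limit(desired + 1).window(-1, 1).limit(desired). */
    static List<List<Integer>> limitWindowLimit(Iterator<Integer> source, int desired) {
        int upper = 1; // the window's upper bound: how far each frame looks ahead
        return eagerWindow(streamOf(source).limit(desired + upper))
                .limit(desired)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CountingIterator a = new CountingIterator(1_000);
        CountingIterator b = new CountingIterator(1_000);
        List<List<Integer>> plain = windowThenLimit(a, 10);
        List<List<Integer>> guarded = limitWindowLimit(b, 10);
        System.out.println(plain.equals(guarded));        // true
        System.out.println(a.pulled + " vs " + b.pulled); // 1000 vs 11
    }
}
```

Without the "+ upper", the tenth frame would lose its lead element and the two results would differ.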

(2) We will first have to discuss this problem with Lukas, when he returns from vacation next week.

Any update on this? Did Lukas have a good vacation? :-)

No update yet. But he is indeed back from his vacation, so we should get back to you soon regarding this.
 
FYI this issue is happening in some existing code where I in fact have the limit() before the window(), but I recently realized that that order is incorrect. I have a pretty complicated pipeline and what I need to do is limit the ultimate number of elements produced by the stream, not the number of elements that happen to be "passing by" an earlier stage of the stream. I'd really hate to have to rewrite this all in iterative form.

Makes sense.

Knut

Nathaniel Mishkin

Aug 16, 2019, 9:09:46 AM
to jooq...@googlegroups.com
On Fri, Aug 16, 2019 at 12:43 AM Knut Wannheden <knut.wa...@gmail.com> wrote:
On Thu, Aug 15, 2019 at 6:52 PM Nathaniel Mishkin <mis...@aya.yale.edu> wrote:
Unfortunately, the logic of what I'm trying to do requires the limit() to be after the window().  

What I meant was that you could call limit() before and after calling window(). The first call to limit() would have to account for the window size. So instead of calling seq.window(-1, 1).limit(10) you would call seq.limit(11).window(-1, 1).limit(10). Hope this makes more sense now. 

Ah, yes, thanks. I might be able to make this work but the problem is that the pipeline I want has a downstream filter(). I.e., I want something like:

seq.window(-1, 1).map(...).filter(...).limit(...)

If I make this be:

seq.limit(...).window(-1, 1).map(...).filter(...).limit(...)

it's possible the result of the second limit() will be a sequence that's shorter than the sequence produced by the original pipeline (the one without two limit()s). In the worst case, the sequence might be of zero length which, if I don't add some compensating code, could be misunderstood to mean that the originating source of data for the stream (a DB cursor wrapped in an iterator, in my case) has been fully drained. Maybe I could put the code in a loop and use a .hasNext() on the iterator as the loop condition. (Maybe. I love the whole stream paradigm and think it has a lot of value but it's sort of the case that if you think about things either too much or too little you can get really confused :-)
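This concern can be reproduced in a plain-Java model, again with hypothetical eagerWindow() and CountingIterator stand-ins (not jOOL code) and a made-up filter that keeps frames whose value is a multiple of 5. Frames are [lag, value, lead] triples with null for an absent neighbour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class FilterShortfall {

    /** Hypothetical counting source yielding 0, 1, 2, ... */
    static final class CountingIterator implements Iterator<Integer> {
        private final int size;
        int pulled = 0;

        CountingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return pulled < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    static Stream<Integer> streamOf(Iterator<Integer> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Eager stand-in for window(-1, 1): reads everything, then emits [lag, value, lead]. */
    static Stream<List<Integer>> eagerWindow(Stream<Integer> s) {
        List<Integer> all = s.collect(Collectors.toList());
        List<List<Integer>> frames = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            Integer lag = i > 0 ? all.get(i - 1) : null;
            Integer lead = i + 1 < all.size() ? all.get(i + 1) : null;
            frames.add(Arrays.asList(lag, all.get(i), lead));
        }
        return frames.stream();
    }

    /** Models seq.window(-1, 1).filter(...).limit(desired). */
    static List<List<Integer>> unguarded(Iterator<Integer> source, int desired) {
        return eagerWindow(streamOf(source))
                .filter(f -> f.get(1) % 5 == 0) // hypothetical filter: keep multiples of 5
                .limit(desired)
                .collect(Collectors.toList());
    }

    /** Models seq.limit(desired + 1).window(-1, 1).filter(...).limit(desired). */
    static List<List<Integer>> guarded(Iterator<Integer> source, int desired) {
        return eagerWindow(streamOf(source).limit(desired + 1))
                .filter(f -> f.get(1) % 5 == 0)
                .limit(desired)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CountingIterator a = new CountingIterator(1_000);
        CountingIterator b = new CountingIterator(1_000);
        System.out.println(unguarded(a, 10).size()); // 10
        System.out.println(guarded(b, 10));          // [[null, 0, 1], [4, 5, 6], [9, 10, null]]
        System.out.println(b.hasNext());             // true: the cursor is not exhausted
    }
}
```

In this model the guarded pipeline returns three frames instead of ten while the source still has rows, and the frame for 10 also shows no lead because 11 was cut off by the first limit(). A refill loop driven by the iterator's hasNext() would therefore have to carry that boundary element over into the next round, not just repeat the pipeline.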

Any update on this? Did Lukas have a good vacation? :-)

No update yet. But he is indeed back from his vacation, so we should get back to you soon regarding this.

Thanks.

Nat
