Rapid paging and MaxNumberOfRequestsPerSession


Matt Johnson

Dec 12, 2012, 11:00:42 AM
to ravendb
Lots of talk lately about paging, and some new contrib extensions for
paging through all results, got me thinking about how to deal with the
"safe by default" limits. Wouldn't I easily hit the session request
limit if there are more than 30 pages of results?

Perhaps I should use the TotalResults stats from the first page and do
some math to increase the value of
session.Advanced.MaxNumberOfRequestsPerSession?

Am I on the right track?
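In code, what I have in mind is roughly this (just a sketch, with a hypothetical Order document type):

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Linq;

public class Order { public string Id { get; set; } }  // hypothetical document type

public static class RequestLimitSketch
{
    public static void PageThroughEverything(IDocumentSession session, int pageSize)
    {
        // Fetch the first page and capture the query statistics.
        RavenQueryStatistics stats;
        var firstPage = session.Query<Order>()
                               .Statistics(out stats)
                               .Take(pageSize)
                               .ToList();

        // TotalResults tells us how many pages there are in all.
        var totalPages = (stats.TotalResults + pageSize - 1) / pageSize;

        // Widen the session's request budget to cover the remaining pages.
        session.Advanced.MaxNumberOfRequestsPerSession =
            session.Advanced.NumberOfRequests + totalPages;

        for (var page = 1; page < totalPages; page++)
        {
            var results = session.Query<Order>()
                                 .Skip(page * pageSize)
                                 .Take(pageSize)
                                 .ToList();
            // ... process results ...
        }
    }
}
```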

Matt Johnson

Dec 12, 2012, 11:20:15 AM
to ravendb
I answered my own question with a unit test. (duh, when in doubt -
test!)

The answer is YES, and I will update the paging extensions
appropriately and commit the test as well.

Matt Warren

Dec 12, 2012, 11:23:37 AM
to ravendb
Just out of interest, in what scenario are you pulling in multiple pages within one session?

Is this a desktop app? Because with a web site I would've thought that, with one session per request, you would only get 1 page of results per request to the web server - or am I misunderstanding?

Matt Johnson

Dec 12, 2012, 11:40:13 AM
to ravendb
Ok, fixed and unit test here:
https://github.com/ravendb/contrib/blob/master/Raven.Client.Contrib.Tests/ExtensionTests/PagingExtensionTests.cs

While normally you would pull back one page at a time in a web app or
even on a desktop app, there are still a few scenarios where you want
all pages. For example, reporting or exporting data. Another would
be as we discussed in Chris's thread about consistency of paging
results, where you need a pseudo-snapshot of the full list of document
keys.

You may have noticed that the extensions yield their results. So each page
is not actually fetched from the server until you enumerate into that
part of the paging results. If you only reach page 2 and exit your
loop, then pages 3+ are never fetched. Of course, if you .ToList(),
then you lose that benefit, but that's up to the consumer.
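The yielding shape, stripped down (a sketch only, not the actual contrib code):

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Linq;

public static class LazyPagingSketch
{
    // Each page is fetched only when the caller enumerates into it,
    // so breaking out of a foreach early skips the remaining requests.
    public static IEnumerable<T> PageLazily<T>(this IRavenQueryable<T> queryable, int pageSize)
    {
        var page = 0;
        while (true)
        {
            var results = queryable.Skip(page * pageSize)
                                   .Take(pageSize)
                                   .ToList();
            if (results.Count == 0)
                yield break;              // no more pages

            foreach (var item in results)
                yield return item;        // deferred until enumerated

            if (results.Count < pageSize)
                yield break;              // last, partial page
            page++;
        }
    }
}
```

Of course, a .ToList() by the consumer forces every page up front and loses the laziness entirely.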

Matt Warren

Dec 12, 2012, 11:42:42 AM
to ravendb
Fair enough, that makes sense

Matt Johnson

Dec 12, 2012, 11:58:26 AM
to ravendb
Another scenario might be when updating or deleting related changes,
as I describe in my answer to this SO question:
http://stackoverflow.com/q/13822479



Oren Eini (Ayende Rahien)

Dec 12, 2012, 12:37:52 PM
to rav...@googlegroups.com
PLEASE DO NOT DO THAT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I mean, seriously do not. Those limits are there for a reason. You are actively creating a system in which shooting your own foot is very easy.

Oren Eini (Ayende Rahien)

Dec 12, 2012, 12:38:33 PM
to rav...@googlegroups.com
inline

On Wed, Dec 12, 2012 at 6:40 PM, Matt Johnson <mj1...@hotmail.com> wrote:
> Ok, fixed and unit test here:
> https://github.com/ravendb/contrib/blob/master/Raven.Client.Contrib.Tests/ExtensionTests/PagingExtensionTests.cs
>
> While normally you would pull back one page at a time in a web app or
> even on a desktop app, there are still a few scenarios where you want
> all pages.  For example, reporting or exporting data.  Another would
> be as we discussed in Chris's thread about consistency of paging
> results, where you need a pseudo-snapshot of the full list of document
> keys.

You don't get that with your scenario.

Matt Johnson

Dec 12, 2012, 1:47:47 PM
to ravendb
I'm open to suggestions. "Do not do that" applies to which part -
paging in a single session *at all* or, specifically passing off an
IEnumerable of all results to a reporting engine?

Getting paging right with skipped results, session limits, etc. is
code I (and many others) don't want to write over and over. IMHO,
getting it solid in an extension method is a good way to avoid
shooting your foot.

Yielding the results should have the effect I described. I'll put
some tests together to demonstrate. If you already see the flaw,
please save me some time and describe it.

Thanks.


Oren Eini (Ayende Rahien)

Dec 12, 2012, 2:09:38 PM
to rav...@googlegroups.com
Trying to load all the data in this way, and implicitly disabling the too many requests warning.

Oren Eini (Ayende Rahien)

Dec 12, 2012, 2:09:55 PM
to rav...@googlegroups.com
Yielding isn't good enough, people do .ToList(), and you are done.

Matt Johnson

Dec 12, 2012, 3:18:34 PM
to ravendb
How about something very explicit then? Also I think the default page
size here should be 128.

public static IEnumerable<T> GetAllResultsWithPaging<T>(
    this IRavenQueryable<T> queryable,
    int pageSize = 128,
    bool expandSessionRequestLimitAsNeededWhichMightBeDangerous = false)
{
    ...

    if (expandSessionRequestLimitAsNeededWhichMightBeDangerous)
    {
        // increase the limit as we go, like I currently have implemented
    }
    else if ((totalResults / pageSize) + currentRequests > maxRequests)
    {
        throw new InvalidOperationException(
            "Can't get all pages because you'll exceed the request limit.");
    }

    ...
}

Not sure how I feel about your opinion on yield. It is a core
language feature, after all. Is it just that you feel people are in
a bad habit of using .ToList() when it's unnecessary? Perhaps with
the change, it's now more "safe by default", and if you are smart
enough to enumerate properly then you get the added bonus of
streaming pages as you consume them.
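A hypothetical call site then has to spell out the danger, which is the point:

```csharp
// Hypothetical usage (Order is a made-up document type). The scary
// parameter name has to appear at every call site, so it can't slip
// through a code review unnoticed.
var everything = session.Query<Order>()
    .GetAllResultsWithPaging(
        pageSize: 128,
        expandSessionRequestLimitAsNeededWhichMightBeDangerous: true)
    .ToList();
```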

Matt Johnson

Dec 12, 2012, 5:14:56 PM
to ravendb
I applied that code into the contrib extensions if you want to look.
I think this is much more "safe by default" now.

Jace Bennett

Dec 12, 2012, 10:21:49 PM
to rav...@googlegroups.com
I believe a more robust implementation would be one which throws away a session that has reached its limits and continues paging on a new session. Sessions are cheap to create and are meant to be short-lived. They also accumulate tracked data as you page, and these are some of the core reasons the request limit is there. If the request limit is to be manipulated, it should be done explicitly by application code.
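A rough sketch of what I mean (hypothetical Order type; not code from the contrib library):

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public static class SessionPerChunkSketch
{
    // Open a fresh, short-lived session per page so no single session
    // accumulates requests or tracked entities. Note the trade-off:
    // there is no consistency guarantee across the separate sessions.
    public static IEnumerable<Order> PageAcrossSessions(IDocumentStore store, int pageSize)
    {
        var page = 0;
        while (true)
        {
            List<Order> results;
            using (var session = store.OpenSession())
            {
                results = session.Query<Order>()
                                 .Skip(page * pageSize)
                                 .Take(pageSize)
                                 .ToList();
            }
            if (results.Count == 0)
                yield break;
            foreach (var item in results)
                yield return item;
            page++;
        }
    }
}
```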

Jace

Matt Johnson

Dec 12, 2012, 10:53:34 PM
to ravendb
I disagree. That would mean that the caller couldn't do anything else
with the session. I can't throw it out - there might be changes to
commit. And the ForEachWithPaging extensions would be useless because
it would end up putting changes into different batches. You might as
well not have a session.

Jace Bennett

Dec 12, 2012, 11:15:20 PM
to rav...@googlegroups.com
I wasn't necessarily suggesting that the library should be the one to throw the session away. The larger point is that "safe by default" means that we make you think about doing dangerous things, like paging through an unbounded result set. The request limit exception should make the app developer reconsider his plan of action. And a change as dangerous as raising the request limit to an arbitrary value should show up in the app's code reviews. I feel an optional "true" parameter is too easy to overlook.

As to throwing the session away, I've run out of memory without doing so in my ETLs. And so (and for the reasons you enumerate), I believe the app developer should be involved in that process as well. 

"Rapid Paging" has a legitimate case sometimes, and maybe the library should provide some support for it, but it shouldn't subvert Raven's architectural pressures (which are for my money Good Things) in doing so.

Sorry if I failed to properly articulate before.

Jace

Matt Johnson

Dec 13, 2012, 12:16:40 AM
to rav...@googlegroups.com
No problem. I appreciate the feedback. I understand what you're getting at, but ultimately I'm after exactly that - getting an unbounded result set. It's not for everyday stuff, but sometimes it is needed. I am open to alternatives if you think there's a pattern that will work better. Thanks.