set MaxPageSize and Take defaults by convention

Wallace Turner

19 Oct 2014, 8:45:59 pm
to rav...@googlegroups.com
I want to set MaxPageSize and Take to large values by default on the DocumentStore.

Before I hear cries of 'you're doing it wrong', consider the usage. I have 2 apps: a web app and a Windows service. The Windows service produces a daily summary, for which it routinely hits the 1024 limit. There is NO point having paging of any kind in the Windows service; it runs daily and it needs to pull > 1024 results to do what it does.

Setting MaxPageSize in the Raven.exe.config is not ideal because:
a) I still need to remember to put Take(xyz) everywhere
b) I have a web server that queries as well, and I want to keep the 1024 default for that.

The most pragmatic get-things-done approach is to do the above. Yes I can use `_session.Advanced.Stream` on the windows service for queries and regular linq on the web app but we have to remember to do that.

I have looked at some previous threads, but they seemingly went to a dead end [1]

Thus is it out of the question to add

store.Conventions.MaxPageSize   and

Oren Eini (Ayende Rahien)

20 Oct 2014, 12:25:55 am
to ravendb
No, use IDocumentQueryListener, you can manipulate the queries that way

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 



Wallace Turner

20 Oct 2014, 12:54:20 am
to rav...@googlegroups.com

   
    public class TakeUnboundedDocumentQueryListener : IDocumentQueryListener
    {
        public void BeforeQueryExecuted(IDocumentQueryCustomization queryCustomization)
        {
            queryCustomization.BeforeQueryExecution(q => q.PageSize = int.MaxValue);
        }
    }

I am using this to override the default Take size; however, this kind of query is still bounded to 1024 results:

var list = session.Query<Entity>().ToList();

How do you set the MaxPageSize? I've had a dig around the forum, the docs and the autocomplete.

Oren Eini (Ayende Rahien)

20 Oct 2014, 1:16:12 am
to ravendb
Raven/MaxPageSize on the server.
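
For reference, a minimal sketch of where that setting lives, assuming a standalone server configured through the appSettings section of Raven.exe.config (the file mentioned at the top of the thread). The value 4096 is only an illustrative number, not a recommendation. The setting applies to every client of that server, the web app included.

```xml
<configuration>
  <appSettings>
    <!-- Server-side cap on results returned per request; 4096 is an illustrative value -->
    <add key="Raven/MaxPageSize" value="4096"/>
  </appSettings>
</configuration>
```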


Kijana Woodard

20 Oct 2014, 12:34:06 pm
to rav...@googlegroups.com
"Yes I can use `_session.Advanced.Stream` on the windows service for queries and regular linq on the web app but we have to remember to do that."

FWIW, I find the Stream API to be a quite pleasant reminder that I'm doing something unbounded. IIRC, it's doing paging for you under the covers. If you increase the max page size and do a ToList(), you're pulling everything into memory at once. Also, with Stream you get a snapshot, whereas the results can change underneath paged results.

If you're interested in exploring the solution space, is there a known upper limit for the windows service if there is no point in having paging? Would counts / aggregation / map/reduce apply? Would patch / scripted patch be meaningful?

Wallace Turner

20 Oct 2014, 6:44:27 pm
to rav...@googlegroups.com
Hi Kijana, thanks for your thoughts.

>Would counts / aggregation / map/reduce apply? Would patch / scripted patch be meaningful?
I wish :) It is not just a summary but also a raw dump of the data (e.g. a list of all users and entities on the system, for downstream reconciliation purposes).

Kijana Woodard

20 Oct 2014, 6:48:52 pm
to rav...@googlegroups.com
The thing I would worry about with upping the max docs returned is that someone forgets about _that_ and suddenly is pulling 32768 10MB documents at once.

Wallace Turner

20 Oct 2014, 7:20:51 pm
to rav...@googlegroups.com
I've read a similar discussion while researching...
https://groups.google.com/forum/#!topic/ravendb/Eo96fqoyVvc
I do tend to agree with rfuller here. There is a constant discussion about 'what if you return an unbounded set' - to be frank the 'safety' has caused more hidden bugs (with things failing after 1024 results) than problems it has solved.

I would argue the programmer is as or more likely to forget about the bounded-by-default result than the unbounded set.

My guess is 10 out of 10 business managers (non techies) would rather it fail because you were too successful and couldn't deal with 1,000,000 records vs the alternative where it silently 'fails' at 1,024 records.



Federico Lois

20 Oct 2014, 7:40:59 pm
to rav...@googlegroups.com
We use streams to do that with async calls. Anything else will in the end complicate your code and your life. Been there, done that. Hell, even that won't guarantee your code will be easier.

One path that we realized could work, though too late for us, was streaming by etag and storing the intermediate work in a different database (essentially manual map/reduce, Hadoop style).


Kijana Woodard

20 Oct 2014, 7:45:59 pm
to rav...@googlegroups.com
My guess is 10 out of 10 business managers (non techies) are going to call you at 3am on Saturday either way. ;-D

Be that as it may, I generally lean toward choosing options that are "in the developer's face" as opposed to set by configuration. It helps when they are looking for help on the web, etc.

I gave a similar answer to a topic centered around Id conventions the other day. I'd rather use the document properties to construct the Id [or have a plain hilo] than use conventions. It's more "discoverable".

My tendency is also to stay idiomatic to whatever tech is in use. If JavaScript, braces on the same line as the conditional, and all that. In this case, Stream is the more transparent approach.

At any rate, you set up the query just as you normally would without the Take and ToList and then iterate the docs. What's not to like?
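
A minimal sketch of that pattern, assuming the RavenDB .NET client of that era, where `session.Advanced.Stream` accepts a LINQ query and returns an enumerator of results. The `Entity` type, the `DailySummary` class and the index name `Entities/ByName` are hypothetical names used only for illustration. Depending on the client version, a streamed query may need to target an index that already exists, so treat the dynamic-query case as unverified.

```csharp
using System;
using Raven.Client;

// Hypothetical document type, standing in for whatever the service reports on.
public class Entity
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class DailySummary
{
    // Walks every result of the query without Take() or ToList()
    // and returns how many documents were seen.
    public static int DumpAll(IDocumentStore store)
    {
        var count = 0;
        using (var session = store.OpenSession())
        {
            // Build the query as usual; "Entities/ByName" is a hypothetical index name.
            var query = session.Query<Entity>("Entities/ByName");

            // Stream hands back an enumerator; each item wraps one document.
            using (var results = session.Advanced.Stream(query))
            {
                while (results.MoveNext())
                {
                    Entity entity = results.Current.Document;
                    Console.WriteLine("{0}\t{1}", entity.Id, entity.Name);
                    count++;
                }
            }
        }
        return count;
    }
}
```

Because documents are handled one at a time, memory use depends on what the loop keeps, not on the size of the result set.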

Chris Marisic

21 Oct 2014, 11:08:25 am
to rav...@googlegroups.com


On Monday, October 20, 2014 7:20:51 PM UTC-4, Wallace Turner wrote:
> I've read a similar discussion while researching...
> https://groups.google.com/forum/#!topic/ravendb/Eo96fqoyVvc
> I do tend to agree with rfuller here. There is a constant discussion about 'what if you return an unbounded set' - to be frank the 'safety' has caused more hidden bugs (with things failing after 1024 results) than problems it has solved.
>
> I would argue the programmer is as or more likely to forget about the bounded-by-default result than the unbounded set.

Good. The unbounded result set can crash entire servers (or server farms). Failure to remember bounded sets results in a phone call "hey I can't find the thing I created", not "OMG THE ENTIRE WEBSITE IS DOWN" or "WHY DOES IT TAKE 2 MINUTES TO LOAD THE HOMEPAGE".

> My guess is 10 out of 10 business managers (non techies) would rather it fail because you were too successful and couldn't deal with 1,000,000 records vs the alternative where it silently 'fails' at 1,024 records.

I highly doubt this. They would only agree if you explained it poorly. If you asked the question "would you rather have some inventory missing or burn down the warehouse?", they're not going to pick burning down the warehouse.

Wallace Turner

21 Oct 2014, 7:34:22 pm
to rav...@googlegroups.com
> They would only agree if you explained it poorly

This just isn't true. This isn't an inventory app where missing stuff isn't critical. I *need* the correct results coming back from a query: not 128, not 1024, but the actual number. It's critical. If it returns the wrong thing I want it to break, period.

Wallace Turner

21 Oct 2014, 7:36:43 pm
to rav...@googlegroups.com
>Good. The unbounded result set can crash entire servers (or server farms)

How did you deal with this problem in other frameworks, e.g. NHibernate, where you can accidentally do something like 'select * from TableWithLotsOfStuff'?



Wallace Turner

21 Oct 2014, 7:45:49 pm
to rav...@googlegroups.com
Chris, I clarified that this is a console application running end-of-day routines, NOT a web app, and I clarified that I prefer the bounded set for the web app. Restrictions are good, but one size does not fit all.



Oren Eini (Ayende Rahien)

22 Oct 2014, 2:39:24 am
to ravendb
Great, use the streaming API, that is what it is for.


Chris Marisic

22 Oct 2014, 1:26:10 pm
to rav...@googlegroups.com
I abandoned them for RavenDB.

Chris Marisic

22 Oct 2014, 1:30:09 pm
to rav...@googlegroups.com
All the more reason to properly use paging or streaming.

What happens if your console application runs out of memory?

I highly doubt you have 1 giant transaction spanning 1000s of documents. If it runs out of memory, do you have half-finished data? Is your worker able to resume faulted runs? What happens to your business if a run is halfway finished?

These are entirely the reasons RavenDB is safe by default. It forces you to account for these things. If you are dead set on ToList() all the things, you can certainly build a class that will page or stream through an entire collection/query and materialize it as a gigantic list.
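
A rough sketch of such a class, kept independent of RavenDB so the paging logic is visible on its own. `PagedLoader`, `LoadAll` and the `fetchPage` delegate are hypothetical names introduced here, not part of any client API. The caller supplies a function that returns one page for a given skip and take.

```csharp
using System;
using System.Collections.Generic;

public static class PagedLoader
{
    // fetchPage(skip, take) is a caller-supplied delegate that returns one page of results.
    public static List<T> LoadAll<T>(Func<int, int, IList<T>> fetchPage, int pageSize = 1024)
    {
        if (fetchPage == null) throw new ArgumentNullException("fetchPage");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        var all = new List<T>();
        while (true)
        {
            // Skip everything already collected and ask for the next page.
            IList<T> page = fetchPage(all.Count, pageSize);
            all.AddRange(page);

            // A short (or empty) page means the last page has been read.
            if (page.Count < pageSize)
                return all;
        }
    }
}
```

With a session in hand, a call might look like `PagedLoader.LoadAll<Entity>((skip, take) => session.Query<Entity>().Skip(skip).Take(take).ToList())`, where `Entity` is a placeholder type. Each page is a separate request, so the session's request-count limit (30 by default, as far as I recall) applies. As noted earlier in the thread, results can change between pages, which streaming avoids.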

Wallace Turner

18 Apr 2018, 12:01:02 am
to RavenDB - 2nd generation document database
This code should be modified to set PageSize only if PageSize is not already set (using PageSizeSet):

    public class TakeUnboundedDocumentQueryListener : IDocumentQueryListener
    {
        public void BeforeQueryExecuted(IDocumentQueryCustomization queryCustomization)
        {
            queryCustomization.BeforeQueryExecution(q =>
            {
                if(!q.PageSizeSet)
                    q.PageSize = int.MaxValue;
            });
        }
    }
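
For completeness, a sketch of attaching the listener to only the Windows service's store, so the web app keeps its bounded default. It assumes the `TakeUnboundedDocumentQueryListener` class above and the client's `RegisterListener` method on the document store. The `ServiceStoreFactory` name, URL and database name are placeholders. Per the exchange earlier in the thread, the server's Raven/MaxPageSize still caps what comes back, so the listener alone does not lift the 1024 limit.

```csharp
using Raven.Client;
using Raven.Client.Document;

public static class ServiceStoreFactory
{
    // Url and DefaultDatabase are placeholder values for illustration.
    public static IDocumentStore Create()
    {
        var store = new DocumentStore
        {
            Url = "http://localhost:8080",
            DefaultDatabase = "Reports"
        };
        store.Initialize();

        // Only stores that register the listener get the larger default page size.
        store.RegisterListener(new TakeUnboundedDocumentQueryListener());
        return store;
    }
}
```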