Querying large document collections

David MZ

Nov 18, 2012, 1:14:59 AM

to rav...@googlegroups.com

My scenario is as follows:

I have a large collection of about 100K documents, and I need to use small chunks of this data, say 1K documents each, to create a new document. This means that one run of my algorithm translates to around 100 calls to the database.

I feel like there may be a better way, performance-wise.

Is loading the whole collection up front, once, better than making 100 calls?

Any advice?

Oren Eini (Ayende Rahien)

Nov 18, 2012, 3:40:26 AM

to rav...@googlegroups.com

What is it that you are trying to do?

In general, it is better to work in batches, yes.

David MZ

Nov 18, 2012, 3:51:28 AM

to rav...@googlegroups.com

I am querying for 1000 documents and creating a new document based on the information in those 1000 docs, and I need to go over 100K documents in a single operation.

Is there a heuristic for the recommended batch size?

Oren Eini (Ayende Rahien)

Nov 18, 2012, 3:52:55 AM

to rav...@googlegroups.com

Why 1000 docs? One doc per thousand docs?

How do you group things?

David MZ

Nov 18, 2012, 4:04:10 AM

to rav...@googlegroups.com

I have business logic that requires 1000 docs to perform an operation, without grouping.

So I loop over all the docs in the database in batches of 1000, using Skip and Take.
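
For readers following along, here is a minimal sketch of that kind of Skip/Take loop. It is written in Python against an in-memory list standing in for the database, so it is not RavenDB client code; the names `iter_batches`, `fetch_page`, `summarize`, and `DOCS` are all hypothetical, and the real business logic is not shown in this thread.

```python
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, int]


def iter_batches(fetch_page: Callable[[int, int], List[Doc]],
                 batch_size: int = 1000) -> Iterator[List[Doc]]:
    """Yield successive pages until the store returns an empty page."""
    skip = 0
    while True:
        page = fetch_page(skip, batch_size)  # one round trip per page
        if not page:
            break
        yield page
        skip += len(page)


# Hypothetical stand-in for the 100K-document collection.
DOCS: List[Doc] = [{"id": i, "value": i % 7} for i in range(100_000)]
stats = {"calls": 0}  # counts simulated database round trips


def fetch_page(skip: int, take: int) -> List[Doc]:
    """Simulated Skip/Take query: return at most `take` docs after `skip`."""
    stats["calls"] += 1
    return DOCS[skip:skip + take]


def summarize(batch: List[Doc]) -> Doc:
    """Placeholder for the business logic: one new document per batch."""
    return {
        "first_id": batch[0]["id"],
        "count": len(batch),
        "total": sum(d["value"] for d in batch),
    }


summaries = [summarize(batch) for batch in iter_batches(fetch_page)]
print(len(summaries), "new documents from", stats["calls"], "calls")
```

With 100K documents and a batch size of 1000, this makes 100 calls that return data plus one final call that returns an empty page. The number of round trips is set by `batch_size`, which is the trade-off the question is about.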

Oren Eini (Ayende Rahien)

Nov 18, 2012, 4:51:46 AM

to rav...@googlegroups.com

What business logic is that?

David MZ

Nov 18, 2012, 5:35:02 AM

to rav...@googlegroups.com

It's complex; I can't paste it here.

Chris Marisic

Nov 19, 2012, 12:43:01 PM

to rav...@googlegroups.com

Then talk about it in pseudocode or a faux domain, like a blog or e-store context. Without more details there's nothing anyone can say.
