Querying large document collections


David MZ

Nov 18, 2012, 1:14:59 AM
to rav...@googlegroups.com
My scenario is as follows:

I have a large collection of documents (~100K) and I need to use small chunks of this data, say 1K documents each, to create a new document. This means that one run of my algorithm translates to around 100 calls to the database.
I feel like there may be a better way, performance-wise.

Is loading the whole collection once, up front, better than doing 100 calls?

Any advice? 

Oren Eini (Ayende Rahien)

Nov 18, 2012, 3:40:26 AM
to rav...@googlegroups.com
What is it that you are trying to do? 
In general, it is better to work in batches, yes.

David MZ

Nov 18, 2012, 3:51:28 AM
to rav...@googlegroups.com
I am querying for 1000 documents and creating a new document based on the information in those 1000 docs, and I need to go over 100K documents in a single operation.
Is there a heuristic for the recommended batch size?

Oren Eini (Ayende Rahien)

Nov 18, 2012, 3:52:55 AM
to rav...@googlegroups.com
Why 1000 docs? One doc per thousand docs?
How do you group things?

David MZ

Nov 18, 2012, 4:04:10 AM
to rav...@googlegroups.com
I have business logic that requires 1000 docs to perform an operation, without grouping.

So I loop over all the docs in the database in batches of 1000, using Skip and Take.
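
For illustration, the Skip/Take loop described here could be sketched roughly as follows. This is a Python sketch against an in-memory list, not RavenDB client code: `fetch_page`, `iter_batches`, and `build_summary` are hypothetical names, the `value` field is made up, and the real business logic is unknown. It also assumes the sort order stays stable between calls, since otherwise paging can skip or repeat documents.

```python
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, int]


def fetch_page(collection: List[Doc], skip: int, take: int) -> List[Doc]:
    """Hypothetical stand-in for one database call that uses Skip and Take."""
    return collection[skip:skip + take]


def iter_batches(
    fetch: Callable[[int, int], List[Doc]], batch_size: int
) -> Iterator[List[Doc]]:
    """Yield consecutive pages until the source returns an empty page."""
    skip = 0
    while True:
        page = fetch(skip, batch_size)
        if not page:
            return
        yield page
        skip += len(page)


def build_summary(batch: List[Doc]) -> Doc:
    """Placeholder for the business logic: one new document per batch."""
    return {"count": len(batch), "total": sum(d["value"] for d in batch)}


# 100K fake documents, processed 1000 at a time (assumed sizes from the thread).
collection = [{"id": i, "value": i % 7} for i in range(100_000)]
summaries = [
    build_summary(batch)
    for batch in iter_batches(lambda s, t: fetch_page(collection, s, t), 1000)
]
print(len(summaries))
```

Each iteration corresponds to one round trip, so the batch size trades the number of calls against the memory held per call.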

Oren Eini (Ayende Rahien)

Nov 18, 2012, 4:51:46 AM
to rav...@googlegroups.com
What business logic is that?

David MZ

Nov 18, 2012, 5:35:02 AM
to rav...@googlegroups.com
It's complex; I can't paste it here.

Chris Marisic

Nov 19, 2012, 12:43:01 PM
to rav...@googlegroups.com
Then talk about it in pseudocode or in a faux domain, like a blog or e-store context. Without more details there's nothing anyone can say.