How does Raven/MaxNumberOfItemsToIndexInSingleBatch impact indexing speed?


micro

Jan 5, 2014, 1:04:20 AM
to rav...@googlegroups.com
Oren,

I have been really frustrated by _very_ slow replication and indexing of an existing DB since I upgraded to 2.5 with default settings a few days ago. I posted a couple of topics related to it, such as the version store out-of-memory issue, but I did not get the right answer.

Here is a more interesting, though not so good, finding. Please take a look.

For a 7M DB with 100 indexes: in 1.0, it took about 6-8 hours in total to replicate and index with default settings (Raven/Esent/MaxVerPages=1024).
In 2.5, however, it took _24_ hours with one change, Raven/Esent/MaxVerPages=2048, made because the first try hit an out-of-memory error.

BTW, the servers I am using are very powerful: 4-8 CPUs, 24-64 GB of memory, and high-performance SAN storage (response time 1-9 ms).

I tried playing with some of the configuration and am really confused by Raven/MaxNumberOfItemsToIndexInSingleBatch. I tested two replications:
Raven/MaxNumberOfItemsToIndexInSingleBatch at the default: took 24+ hours
Raven/MaxNumberOfItemsToIndexInSingleBatch=8192: took 8 hours

During that time, memory usage was about 2-6 GB and CPU usage averaged 15-20%.

I think RavenDB 2.5 handles memory usage much worse than 1.0 in the massive data insert case (for example, replication).
Contrary to what the documentation says, a bigger Raven/MaxNumberOfItemsToIndexInSingleBatch does not always give better indexing speed.

I have one explanation: a bigger Raven/MaxNumberOfItemsToIndexInSingleBatch is good when there are only a few indexes, but given the index update cost, a smaller Raven/MaxNumberOfItemsToIndexInSingleBatch could be better when there are a few dozen indexes. It is a matter of balancing the in-memory indexing work against the cost of updating multiple indexes.

Any thoughts?

Thanks

James

Oren Eini (Ayende Rahien)

Jan 5, 2014, 5:59:02 AM
to ravendb
James,
IIRC, you are using index replication as well, right? That is pretty much a guarantee for slower indexing.
We have the SQL Replication bundle to resolve that.

Next, a LOT of the cost of indexing is actually I/O, and batching allows us to amortize this cost across many documents.
The default batch size is 128K. That means that it should take roughly 60 runs to index everything, and roughly 5-6 hours to do the entire thing.
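
As a rough check of the batch arithmetic above, here is a minimal Python sketch. It assumes "7M" means about 7,000,000 documents and that "128K" means 128 * 1024 items; neither is spelled out in the thread, and the function name is made up for illustration.

```python
import math

def batches_needed(total_docs: int, batch_size: int) -> int:
    """Number of indexing batches needed to cover every document once."""
    return math.ceil(total_docs / batch_size)

# Assumptions (not confirmed in the thread): 7M = 7,000,000 documents,
# and the default of "128K" = 128 * 1024 items per batch.
TOTAL_DOCS = 7_000_000
DEFAULT_BATCH = 128 * 1024
TUNED_BATCH = 8192  # the value James tested

print(batches_needed(TOTAL_DOCS, DEFAULT_BATCH))  # 54, in line with "roughly 60 runs"
print(batches_needed(TOTAL_DOCS, TUNED_BATCH))    # 855
```

Under these assumptions the smaller setting means about 16 times as many batches, so if the 8192 run really finished faster, the per-batch cost must have dropped by more than that factor.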

Note that the whole reason we operate in those batches is to make sure that we don't have to load the documents multiple times for multiple indexes, so it shouldn't matter that much how many indexes you have there.
However, we will only process N indexes at a time (to avoid putting too much pressure on the machine). On an 8-core system, we'll process up to 8 indexes at a time, so that might contribute to the time.
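
To see how the concurrency limit could add up with 100 indexes, here is a small Python sketch. It is a deliberate simplification: it assumes the indexes are processed in strict rounds of at most N and that every index takes about the same time, which the thread does not claim. The function name is hypothetical.

```python
import math

def index_rounds(index_count: int, max_parallel: int) -> int:
    """Rounds needed per batch if at most max_parallel indexes run at once."""
    return math.ceil(index_count / max_parallel)

# 100 indexes (from James's description) on an 8-core machine.
print(index_rounds(100, 8))  # 13
# The same indexes on the 4-core end of the hardware range he mentions.
print(index_rounds(100, 4))  # 25
```

Under this simplified model each batch is walked in about 13 rounds on 8 cores, so per-batch time would still grow with the number of indexes even though the documents are loaded only once.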









James Tan

Jan 5, 2014, 8:44:15 AM
to rav...@googlegroups.com

No index replication. The default batch size causes slower indexing. I think this is worth looking into.

Thanks

James


Oren Eini (Ayende Rahien)

Jan 5, 2014, 8:54:40 AM
to ravendb

Can you share the db?

Chris Marisic

Jan 6, 2014, 9:56:37 AM
to rav...@googlegroups.com
I can provide anecdotal evidence for this with regard to RavenDB 1.0. One batch of (1024 * 120) would take longer than 120 batches of 1024, at least when using cloud-based drives. Also, the batches of 60,000 or 120,000 would sometimes fault and require IIS to be reset, and all that work would be thrown away.