Bouncing Indexes with Replication and Sharding in v2142


Troy

Nov 15, 2012, 5:10:19 PM
to rav...@googlegroups.com
I have a weird situation ... I am beginning to test Sharding and Replication for when we are ready to go live with a new application which should be shortly after 2.0 is released.

I have 1 server with 4 databases ... 2 databases are set up as shards, and the other 2 are replicated instances of each of the 2 shards. Replication is set to go both ways, with Changes Only.

I persisted the indexes to all 4 databases.

I have about 8 documents total spread out over 5 collections.

I have 17 indexes total... About every 3 - 5 seconds.. it goes from 17 indexes stale... to 0 Indexes stale ... to 1 index stale (EntityByName) ... then 17 indexes stale... and the whole process repeats itself.. basically every 10 - 15 seconds it goes through that cycle.

I am wondering whether I have something configured incorrectly. Adding the indexes also took a long time, whereas with a normal DocumentStore the indexes persist within a second or 2...

Would I be better off creating different instances of RavenDB on different ports to test this configuration?

Troy

Nov 15, 2012, 5:55:03 PM
to rav...@googlegroups.com
As an FYI ... v2143 has not fixed this issue.

Oren Eini (Ayende Rahien)

Nov 15, 2012, 5:59:29 PM
to rav...@googlegroups.com
That is interesting, can you check what the http traffic to this looks like?

Troy

Nov 15, 2012, 6:23:32 PM
to rav...@googlegroups.com
I put Fiddler on the server... and there is no traffic.

If I bring up Studio and look at a non-replicated database, there is not really any traffic ... When I bring up one of the sharded databases in Studio, I constantly see stats?noCache=xxx ... I assume that since things are stale, it is constantly checking whether the stale index count has changed for the statistics.

Is there a different way you wanted me to inspect the http traffic?

Troy

Nov 15, 2012, 9:02:13 PM
to rav...@googlegroups.com
More info:

The document called: "Raven/Replication/Sources/http://web01:8182/databases/Members.USA.R01" is CONSTANTLY being changed by the server... the etags are incrementing... and that causes the indexes to constantly go in and out of stale.
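
One way to confirm that a document is being rewritten while no client is writing is to sample its etag a few times and count how often it changes. The sketch below is only an illustration: `fetch_etag` is a hypothetical callable supplied by the caller (however you choose to read the document's current etag), not a RavenDB API.

```python
import time
from typing import Callable, List


def count_etag_changes(fetch_etag: Callable[[], str],
                       samples: int,
                       interval_s: float = 0.0) -> int:
    """Sample an etag `samples` times and count how often it differs
    from the previous sample. `fetch_etag` is a hypothetical helper."""
    seen: List[str] = []
    for i in range(samples):
        seen.append(fetch_etag())
        # Sleep between samples, but not after the last one.
        if interval_s and i < samples - 1:
            time.sleep(interval_s)
    return sum(1 for prev, cur in zip(seen, seen[1:]) if prev != cur)
```

A result above zero on an otherwise idle database suggests that something on the server side keeps rewriting the document.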

Troy

Nov 15, 2012, 10:25:18 PM
to rav...@googlegroups.com
More Info: If I disable the 2nd Shard... the chatter goes away and the indexes return to normal.

Oren Eini (Ayende Rahien)

Nov 16, 2012, 2:24:49 AM
to rav...@googlegroups.com
Hm, I thought that I fixed that in 2143, is that what you are running?

Troy

Nov 16, 2012, 8:55:59 AM
to rav...@googlegroups.com
Yes, 2143 both server and client.

Oren Eini (Ayende Rahien)

Nov 16, 2012, 8:59:11 AM
to rav...@googlegroups.com
Fixed in 2145, out soon

Vlad K

Nov 16, 2012, 12:49:21 PM
to rav...@googlegroups.com
Once I added data/indexes to db I got the same issue as mentioned here. Happy it's fixed in 45.

This brings us to a question: are there actually any tests being run for the replication bundle? This issue should've been obvious to catch.

Thanks.

Oren Eini (Ayende Rahien)

Nov 16, 2012, 1:31:49 PM
to rav...@googlegroups.com
Vlad,
Actually, it won't be obvious to catch. Things _work_, it is just that we have extra stuff happening.
We run an extensive series of tests / workout for a lot of the things that we do, but we don't run them for every build.
Only on stable / RC builds.

Vlad K

Nov 16, 2012, 2:14:24 PM
to rav...@googlegroups.com
Fair enough, I just figured you'd have a test to make sure that system documents don't take part in document replication.

Oren Eini (Ayende Rahien)

Nov 16, 2012, 3:03:59 PM
to rav...@googlegroups.com
They don't.
The problem was something else: we updated the _etag_, and due to another issue, we needed to update what the last etag is even if we don't replicate.
Unfortunately, this resulted in what is effectively a distributed infinite loop, but only when you have master/master relations.
That is actually quite hard to notice, because the end result is that everything still works; it is just that the system never stops working.
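
The loop described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not RavenDB's implementation: `Node`, `sync`, `run`, and `persist_marker` are hypothetical names, each node has a single etag counter bumped by every local write, and recording "the last etag seen from the peer" is modelled as a local write when `persist_marker` is true.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node; not RavenDB's real data model."""
    name: str
    etag: int = 0        # bumped by every local write, including system documents
    last_seen: int = 0   # highest peer etag this node has recorded


def sync(dst: Node, src: Node, persist_marker: bool) -> int:
    """One replication pass from src to dst; returns writes performed on dst."""
    if src.etag <= dst.last_seen:
        return 0                 # nothing new on the source
    dst.last_seen = src.etag     # remember how far we have read
    if persist_marker:
        dst.etag += 1            # saving the marker is itself a write
        return 1
    return 0                     # marker kept in memory only (toy assumption)


def run(rounds: int, persist_marker: bool) -> int:
    """Run master/master replication for `rounds` and count all writes."""
    a, b = Node("A"), Node("B")
    a.etag = 1                   # a single initial write on A
    writes = 0
    for _ in range(rounds):
        writes += sync(b, a, persist_marker)
        writes += sync(a, b, persist_marker)
    return writes


print(run(5, persist_marker=True))    # 10: two writes per round, never settles
print(run(5, persist_marker=False))   # 0: the nodes go quiet
```

With one-way replication the same model settles after a single pass, which matches the remark that the loop only appears with master/master relations. Skipping the write is just one way to break the cycle in this toy; the thread does not say how the shipped fix works.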

Vlad Kosarev

Nov 16, 2012, 3:12:47 PM
to rav...@googlegroups.com
Ok, can't wait for 2145 :) I think it will be a magic release that will fix all our problems...


Paul Hinett

Nov 16, 2012, 3:25:43 PM
to rav...@googlegroups.com
I think it's up for download now...

Vlad Kosarev

Nov 16, 2012, 4:10:37 PM
to rav...@googlegroups.com
It is and replication on a working db seems to be working fine now.
Now to test memory utilization.

Troy

Nov 16, 2012, 6:06:00 PM
to rav...@googlegroups.com
Hmm. I am still seeing the issue with 2145.

Oren Eini (Ayende Rahien)

Nov 17, 2012, 1:09:51 AM
to rav...@googlegroups.com
Yes, it was finally fixed in 46
Tricky sob

Troy

Nov 17, 2012, 1:39:21 PM
to rav...@googlegroups.com
Thanks for the update! Will 46 be out today at some point? If not, any ETA?

Oren Eini (Ayende Rahien)

Nov 17, 2012, 10:37:12 PM
to rav...@googlegroups.com
Just pushed it.

Troy

Nov 18, 2012, 12:42:15 AM
to rav...@googlegroups.com
Thanks, do you ever sleep? :-)

I see 2148 in NuGet, but the build server is still at 2145 ... what is the best place to get the latest?

Oren Eini (Ayende Rahien)

Nov 18, 2012, 1:19:40 AM
to rav...@googlegroups.com
There is a problem with publishing builds; we will sort it out later today.

Troy

Nov 19, 2012, 1:02:29 AM
to rav...@googlegroups.com
Thanks! 2149 looks to have resolved this issue.