Continuous staleness on database with few document modifications

104 views
Skip to first unread message

jag7070

unread,
Nov 4, 2015, 7:22:03 PM11/4/15
to RavenDB - 2nd generation document database
We have several customers that each run a system consisting of 12 databases. Most of these run quite fine but recently two of our "largest" customers are getting a lot of "waiting for non-stale results"-timeouts.

We have done some investigations at the customer with the most problems and found that all the indexes in one of the databases stay stale when the system is just somewhat in use. It
is enough to change just one or a few documents every few minutes to keep the database continuously stale.

This is not acceptable for us or our customers and we need some help in figuring out why our system is behaving so poorly performance-wise.

We did some measurements through RavenStudio on some of the largest indexes in the problematic database during some light usage. We recorded the etag of last modified document and the etags of the three indexes with the largest DocsCount number (SchoolViewIndex, LessonStudentIndex, LessonIndex):

       etag of the
       last modified  SchoolViewIndex   LessonStudentIndex   LessonIndex
Time   document       etag              etag                 etag
13:19  11EB0          11A1C             11E64                11E64
13:23  11F06          11E64             11EB0                11EB0
13:24  120A6          11F5C             120A2                120A2
13:24  12374          120A6             12374                12374
13:26  12413          120A6             12374                12374
13:28  128C8          1247A             126A8                126A8
13:30  129F2          126A8             129DF                129DF
13:31  12ED3          12A2A             12A24                12E7A
13:32  12EEE          12A2A             12E7A                12E7A
13:33  12F09          12A2A             12E7A                12E7A
13:36  12F27          12A2A             12E7A                12E7A
13:37  12F5E          12E7A             12F3D                12F3D

During this 18-minute period it was typical to see between one and three LarerStilling documents being modified by users which also triggered a new etag on multiple unmodified Lesson
documents (could be up to several hundreds) because they through their Lesson.Resources[] property referenced them by id.

The strange thing is that the system never achieved non-staleness during this 18-minute period when there were only a few document changes (more than 10, but less than 100) and the total number of etags to spin through was around 5000 (12E7A-11A1C=145Ehex=5214dec), which gives an "etag indexing rate" of just 5 etags per second (5214/18 minutes = 289 per minute = 4.8 per second).

Where can the problem lie?

Also, one of the observations we made was that when resetting an index it triggers a high CPU rate, large disk read and disk write activity in the very beginning, but stabilizes fairly quickly to a much lower CPU and disk read/write activity. Is this normal?


Best regards,
Per Eivind Jenssen,
Developer, Oppad as



=========== System info ==========
 Virtual server:
  Windows server 2008 R2 Standard, Service Pack 1
  CPU Intel Xeon L5...@2.27GHz 2.26GHz
  8,00 GB RAM, 64-bit Operating System
  
 RavenDB service
  version 2.5.2946
  license "Commercial - Subscription"

 Database with problem
  Total 252.154 documents
  Size 24.1GB
  28 indexes: 13 are selfmade indexes (some are "from-from"s, some use LoadDocument, some also use Reduce) and 15 "simpler" autoindexes.

=========== The three "largest" indexes ==========

SchoolViewIndex:
            AddMap<Lesson>(items => from lesson in items
                                    from resource in lesson.Resources
                                    let schoolPeriodAgg = LoadDocument<SkoleAr>(lesson.SchoolPeriodId)
                                    let grp = LoadDocument<Gruppe>(lesson.Group.GroupId)
                                    let resourceName = resource.ResourceName ?? LoadDocument<INamedAggregate>(resource.ResourceId).Name
                                    let studentName = resource.ResourceType == "Student" ? resource.ResourceName ?? LoadDocument<INamedAggregate>(resource.ResourceId).Name : ""
                                    let potentialStudent = LoadDocument<Elev>(resource.ResourceId)
                                    let studentDeceased = resource.ResourceType == "Student" && ((potentialStudent != null) && potentialStudent.IsDeceased)
                                    where !studentDeceased
                                    select new
                                    {
                                        Acl = new[] {lesson.Id, lesson.Group.GroupId, resource.ResourceId, schoolPeriodAgg.SkoleId}.Concat(grp.ContactTeacherPrincipalIdsAcl),
                                        ContactTeacherPrincipalIdsAcl = new[] { resource.ResourceId }.Concat(grp.ContactTeacherPrincipalIdsAcl),
                                        LessonId = lesson.Id,
                                        LessonId_Reversed = lesson.Id.Reverse(),

                                        StudentId = resource.ResourceType == "Student" ? resource.ResourceId : "",
                                        StudentFullName = studentName,
                                        StudentQuery = resource.ResourceType == "Student" ? new []
                                        {
                                            studentName,
                                            grp.Name
                                        } : new []{""},

                                        GroupId = lesson.Group.GroupId,
                                        GroupId_Reversed = lesson.Group.GroupId.Reverse(),

                                        GroupName = grp.Name,
                                        GroupSubjectCode = grp.FagkodeGruppeNavn,
                                        GradeGroup = grp.KunKarakterer,

                                        lesson.SchoolPeriodId,
                                        SchoolPeriodId_Reversed = lesson.SchoolPeriodId.Reverse(),

                                        resource.ResourceId,
                                        ResourceId_Reversed = resource.ResourceId.Reverse(),
                                        resource.ResourceType,

                                        ResourceName = resourceName,

                                        lesson.StartInterval,
                                        lesson.EndInterval,
                                    });

LessonStudentIndex:
            AddMap<Lesson>(items => from lesson in items
                                    let schoolPeriodAgg = LoadDocument<SkoleAr>(lesson.SchoolPeriodId)
                                    from student in lesson.Students 
                                    select new
                                               {
                                                   Acl = new[] { lesson.Id, lesson.Group.GroupId, student.ResourceId, schoolPeriodAgg != null ? schoolPeriodAgg.SkoleId : "#L#S0#"}
                                                   .Union( lesson.Teachers.Select(t=>t.TeacherId)),

                                                   LessonId = lesson.Id,
                                                   LessonId_Stripped = lesson.Id,

                                                   StudentId = student.ResourceId,
                                                   StudentId_Stripped = student.ResourceId,

                                                   SchoolPeriodId = lesson.SchoolPeriodId,
                                                   SchoolPeriodId_Stripped = lesson.SchoolPeriodId,

                                                   LessonStatus = student.LessonStatus,
                                                   AbsenceType = student.AbsenceType,
                                                   LessonRemark = student.Remark,

                                                   GroupId = lesson.Group.GroupId,
                                                   GroupId_Stripped = lesson.Group.GroupId,

                                                   GroupName = lesson.Group.GroupName,
                                               });

LessonIndex:
            AddMap<Lesson>(items => from lesson in items
                                    let schoolPeriodAgg = LoadDocument<SkoleAr>(lesson.SchoolPeriodId)
                                    select new
                                    {
                                        Acl = new[] { lesson.Id, lesson.Group.GroupId, schoolPeriodAgg != null ? schoolPeriodAgg.SkoleId : "SP0-SI0" },

                                        lesson.StartInterval,
                                        StartInterval_Date = lesson.StartInterval.Date,
                                        lesson.EndInterval,

                                        HasRegistrations = lesson.Students.Any(s => s.LessonStatus != StudentLessonStatus.NotRegistered) || lesson.Teachers.Any(s => s.LessonStatus != 

TeacherLessonStatus.NotRegistered),

                                        LessonId = lesson.Id,
                                        LessonId_Stripped = lesson.Id,

                                        StudentIds = lesson.Students.Select(s => s.ResourceId),
                                        StudentIds_Stripped = lesson.Students.Select(s => s.ResourceId),

                                        TeacherIds = lesson.Teachers.Select(t => t.ResourceId),
                                        TeacherIds_Stripped = lesson.Teachers.Select(t => t.ResourceId),

                                        SchoolPeriodId = lesson.SchoolPeriodId,
                                        SchoolPeriodId_Stripped = lesson.SchoolPeriodId,

                                        GroupId = lesson.Group.GroupId,
                                        GroupId_Stripped = lesson.Group.GroupId,

                                        GroupName = lesson.Group.GroupName,
                                    });

Chris Marisic

unread,
Nov 4, 2015, 7:44:53 PM11/4/15
to RavenDB - 2nd generation document database
you're doing fan out indexes with multiple load documents, this will have large performance impacts.

your model looks highly relational and incorrectly designed for a non-relational database. you likely need to denormalize copy some or all of these relationships into your documents. if the data changes referentially frequently, your system is just not designed properly for a non-relational data store. 

A textbook data design for use in a relational database is entirely incorrect to use in a non-relational database.

Oren Eini (Ayende Rahien)

unread,
Nov 5, 2015, 1:01:38 AM11/5/15
to ravendb
The easiest way to figure out why this is happening is to import the data to 3.0 and look at the indexing tab.
This produce output like this:

Inline image 1


Note that this would give us information even if you don't move to 3.0, because it would indicate the relative costs in indexing.

You could also enable debug logging and look at the logs for the timing, but that is lot more work.

In terms of what is actually going on, I think that you have a lot of indexes that are expensive. 
You have a lot of load document and fanout, both of which have perf issues if abused.

You also have 28 indexes on a standard license, so we'll process only 6 at a time. That would increase the staleness period, especially if you have a lot of slow indexes.

If you have suggestions enabled, or if you have spatial, that can also increase the indexing times.

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jag7070

unread,
Nov 5, 2015, 6:56:53 AM11/5/15
to RavenDB - 2nd generation document database
We took your advice and started importing a dump file into v3.0 (latest stable - 3800) but encountered an EsentVersionStoreOutOfMemory that stopped the import.
We lowered the import batch size number from default 1024 down to 512 and managed to import the full database.
While importing, the indexing process gave warning about one or more indexes exceeding the fanout limit of 15 - so we stopped the ravendb service and added the following two settings to config:
<add key="Raven/MaxSimpleIndexOutputsPerDocument" value="150"/>
<add key="Raven/MaxMapReduceIndexOutputsPerDocument" value="500"/>

Then we reset the indexes to start at scratch.

The server started reindexing, but used more and more memory and the pagefaultdelta count of Raven.Server.exe started showing continuous numbers in the range of 3.000 to 60.000 and RavenStudio got unresponsive. We also got indexing errors but were not able to see them because of unresponsive RavenStudio - could they be EsentVersionStoreOutOfMemory errors? It all ended up in the entire computer slowing down and we had to stop the RavenDB-service.

Do you have any suggestions for the next step we should do?

Oren Eini (Ayende Rahien)

unread,
Nov 5, 2015, 6:58:44 AM11/5/15
to ravendb
What is the size of your documents? 

Oren Eini (Ayende Rahien)

unread,
Nov 5, 2015, 6:58:51 AM11/5/15
to ravendb
Also, try to introduce an index at a time to narrow things down.

jag7070

unread,
Nov 5, 2015, 8:26:46 AM11/5/15
to RavenDB - 2nd generation document database
Typical document sizes
----------------------
 Lesson: 18kB, 216.000 documents
 SkoleAr: 4.0kB, 80 documents  (referenced by Lesson.SchoolPeriodId property - one reference)
 Gruppe: 2.6kB, 4.700 documents  (referenced by Lesson.Resources[] - one reference)
 LarerStilling: 2.3kB, 2.500 documents  (referenced by Lesson.Resources[] - typically between 1-3 LarerStillings)
 Elev: 1.3kB, 9.000 documents  (referenced by Lesson.Resources[] - typically between 10-25 Elevs)

Also, I will try to start up the service by disabling all indexes and do one-by-one...

Chris Marisic

unread,
Nov 5, 2015, 1:30:51 PM11/5/15
to RavenDB - 2nd generation document database
Those document counts and sizes confirm my statements that this is a relational model, but that is an absolutely tiny database.

jag7070

unread,
Nov 6, 2015, 8:28:12 PM11/6/15
to RavenDB - 2nd generation document database
By disabling all indexes and enabling them one by one we were able to get the database up and running in a non-stale state.

We reset the three slowest indexes and found that LessonIndex, LessonStudentIndex and SchoolViewIndex used respectively 6min, 16min and 49 minutes to go from a reset to non-staleness.

Here is an example of one of the batches from SchoolViewIndex:


It seems that it is the Lucene_AddDocument and the UpdateDocumentReferences that is taking the most time in the batches of SchoolViewIndex.
We are surprised that our tiny database with its indexes end up with around 15GB on disk - does this sound correct or does it maybe point to something not being quite right?

Per Eivind Jenssen,
Oppad as


torsdag 5. november 2015 07.01.38 UTC+1 skrev Oren Eini følgende:

Oren Eini (Ayende Rahien)

unread,
Nov 7, 2015, 1:22:46 AM11/7/15
to ravendb
Okay, that helps.

Note that we'll follow up on this as a support call outside the list.
I'll just mention that 2K input items and 64K output items are a likely place to start investigating what is going on. 

Thor A. Johansen

unread,
Nov 7, 2015, 8:34:31 AM11/7/15
to RavenDB - 2nd generation document database
The ratio 64k/2k is by design (arguably a bad design which we consider changing). There is a fanout; for each Lesson we iterate the resources in the lesson which number 15-32 resources per lesson. That accounts for the observed ratio.

But even with this design we believe there must be something else at play here. Note that the indexing times observed at the customer site was in the multiple hour range (also see the OP), and not in the minutes as observed now in the 3.0 instance.

Btw: we have the database dump available for you but we need your dropbox account to share it with you.

--
Thor Arne Johansen
Oppad AS

Thor A. Johansen

unread,
Nov 22, 2015, 7:48:22 AM11/22/15
to RavenDB - 2nd generation document database
Update: The issue was diagnosed through HRs excellent support response (and even resulted in a postmortem blog article im Ayendes blog).

The conclusion was that we somehow had enabled the suggestions feature and that this along with large indexes and high refcount conspired to give stale indexes for tens of minutes.

To resolve this we have killed the suggesions feature by deleting the suggestions folder and started changes our indexes to avoid high fanout and high refcounts.

This is all fine except that (until just recently) we never understood how the suggestions got enabled in the first place (we have no code and no manual procedures to enable this)

We discovered that the 2.5 studio automatically enables suggestions if you do queries in the studio. Researching how to disable this we discovered a thread in the forum suggesting that the "feature" was disabled (or made opt-in) in 2.5.

Should autoenabling suggestions from the studio be disabled in 2.5 2946 or is this a regression?

Oren Eini (Ayende Rahien)

unread,
Nov 22, 2015, 7:56:59 AM11/22/15
to ravendb
There is a configuration option to disable this.
We can't disable this by default because users may rely on it.
It is disabled by default in 3.0

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


jag7070

unread,
Nov 26, 2015, 3:59:40 AM11/26/15
to RavenDB - 2nd generation document database
We tried finding this option for v2.5 but did not really see it. The closest one we found were "Raven/CreateAutoIndexesForAdHocQueriesIfNeeded".

Which one is it that disables the server from creating index-suggestions when Studio index queries return an empty result set?

Best regards,
Per Eivind Jenssen,
developer Oppad as

Oren Eini (Ayende Rahien)

unread,
Nov 26, 2015, 4:42:45 AM11/26/15
to ravendb
Raven/PreventAutomaticSuggestionCreation
Reply all
Reply to author
Forward
0 new messages