Index entries causing slow updates for associated documents

Jordan Brown

Sep 18, 2017, 2:57:02 PM
to RavenDB - 2nd generation document database
We've been having performance issues when updating documents in one of our Raven application databases. It's a single instance, single database setup on Raven Server/Client build #35215. Specifically, we store activity logs when users interact with calendars and venue events in the system. Storing activity logs in Raven may be a problem in and of itself, but our database is quite small, and we're surprised to see performance issues arise this early in development.

Here is our index definition:

using System.Linq;
using NodaTime;
using Raven.Client.Indexes;

public class ActivityLogIndex : AbstractIndexCreationTask<ActivityLog, ActivityLogIndex.ReduceResult>
{
    public ActivityLogIndex()
    {
        Map = activityLogs => from activityLog in activityLogs
                              // Each LoadDocument call makes this log's index entry depend on the loaded
                              // document: when that document changes, the log has to be reindexed.
                              let venueEvent = LoadDocument<VenueEvent>( activityLog.VenueEvent.Id )
                              let user = LoadDocument<User>( activityLog.ActivityByUser.Id )
                              let calendar = LoadDocument<Calendar>( activityLog.Calendar.Id )
                              select new
                              {
                                  IsExternalEvent = calendar.IsExternal,
                                  VenueEventId = activityLog.VenueEvent.Id,
                                  activityLog.ActivityOn,
                                  ActivityByUserId = activityLog.ActivityByUser.Id,

                                  EventDateOldStart = activityLog.EventDateOld.StartDate,
                                  EventDateOldEnd = activityLog.EventDateOld.EndDate,
                                  EventDateOldStatus = activityLog.EventDateOld.Status,

                                  EventDateNewStart = activityLog.EventDateNew.StartDate,
                                  EventDateNewEnd = activityLog.EventDateNew.EndDate,
                                  EventDateNewStatus = activityLog.EventDateNew.Status,

                                  EventName = activityLog.VenueEventName,
                                  ActivityByUserName = user.Name,

                                  // Fall back to the venue event's dates when the log has no new event date.
                                  ResolvedEventStart = activityLog.EventDateNew != null ? activityLog.EventDateNew.StartDate : venueEvent.When.StartDate,
                                  ResolvedEventEnd = activityLog.EventDateNew != null ? activityLog.EventDateNew.EndDate : venueEvent.When.EndDate,
                                  venueEvent.IsConfirmed,

                                  activityLog.Action
                              };
    }

    public class ReduceResult
    {
        public string CalendarId { get; set; }
        public bool IsExternalEvent { get; set; }
        public string VenueId { get; set; }
        public string VenueEventId { get; set; }
        public Instant ActivityOn { get; set; }
        public string ActivityByUserId { get; set; }
        public LocalDate? EventDateOldStart { get; set; }
        public LocalDate? EventDateOldEnd { get; set; }
        public LocalDate? EventDateNewStart { get; set; }
        public LocalDate? EventDateNewEnd { get; set; }
        public EventDateStatus EventDateOldStatus { get; set; }
        public EventDateStatus EventDateNewStatus { get; set; }
        public LocalDate ResolvedEventStart { get; set; }
        public LocalDate ResolvedEventEnd { get; set; }
        public string EventName { get; set; }
        public string ActivityByUserName { get; set; }
        public bool IsConfirmed { get; set; }
        public ActivityLogAction Action { get; set; }
    }
}

Our database currently contains 532,000 activity logs, 102,000 venue events, 700 users, and 300 calendars. Each document is typically 0.5 to 2 KB in size. The performance issue occurs when updating a user who has a large number of associated activity logs.

For example, on an i7 / 16 GB developer machine, the following SaveChangesAsync call takes ~2 seconds when updating a user associated with 40,000 activity logs:

var user = await _documentSession.LoadAsync<User>( userId );
user.IsActive = true;
await _documentSession.SaveChangesAsync();

If I delete the 40,000 activity logs associated with the user, the same update takes less than 5ms.
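A back-of-the-envelope estimate from these two measurements puts the overhead at roughly 50 µs per associated activity log. This assumes the extra time grows linearly with the number of logs, which two data points cannot confirm:

```latex
t_{\text{per log}} \approx \frac{t_{40\text{k logs}} - t_{0\text{ logs}}}{N}
  \approx \frac{2\,\text{s} - 0.005\,\text{s}}{40{,}000}
  \approx 5 \times 10^{-5}\,\text{s} = 50\,\mu\text{s}
```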

We fixed this issue by removing the "let user = LoadDocument" call from the ActivityLogIndex, but we're still curious as to why this is an issue at all. Shouldn't Raven's indexing process be 100% asynchronous here and not impact such a small document update?

Also, as a side note, we've had intermittent issues when deploying the ActivityLogIndex to our servers. Though indexing usually takes 5-10 minutes, it sometimes takes much longer than that and causes Raven to throw hundreds of EsentVersionStoreOutOfMemory exceptions. Setting the Raven/Esent/MaxVerPages config option seems to help, but again, we're surprised this is an issue at all.

Thanks in advance for any insight you can provide us.

Tal Weiss

Sep 18, 2017, 3:12:16 PM
The reason is that when a document referenced through LoadDocument is modified, we need to "touch" all of the documents that reference it. Touching increments the etag of each referencing document, which signals that it needs to be reindexed. The indexing is asynchronous, but the touching is not, so updating that user means modifying the etags of 40k documents, which explains the performance.
Try to avoid this kind of many-to-one modeling; Ayende's blog contains a few modeling tips for such cases.
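To make the fan-out concrete, here is a heavily simplified C# model of the touching step. It is not RavenDB's implementation: ReferenceTouchModel, AddReference and Put are hypothetical names, and the model only tracks etags and reverse references. It shows why one small write to a user document becomes one etag write per activity log whose index entry called LoadDocument<User> on it.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model of reference tracking for LoadDocument in an index.
// NOT RavenDB's actual code; it only illustrates the synchronous "touch" fan-out.
public class ReferenceTouchModel
{
    private long _lastEtag;

    // Current etag of every stored document, keyed by document id.
    public Dictionary<string, long> Etags { get; } = new Dictionary<string, long>();

    // Reverse references: referenced (loaded) id -> ids of documents whose
    // index entries loaded it via LoadDocument.
    private readonly Dictionary<string, HashSet<string>> _referencedBy =
        new Dictionary<string, HashSet<string>>();

    // Records that indexing 'referencingId' loaded 'referencedId'.
    public void AddReference(string referencingId, string referencedId)
    {
        if (!_referencedBy.TryGetValue(referencedId, out var set))
        {
            set = new HashSet<string>();
            _referencedBy[referencedId] = set;
        }
        set.Add(referencingId);
    }

    // Stores a document, then touches every document that references it,
    // all within this one call. Returns how many documents were touched.
    public int Put(string id)
    {
        Etags[id] = ++_lastEtag;

        int touched = 0;
        if (_referencedBy.TryGetValue(id, out var referencing))
        {
            foreach (var referencingId in referencing)
            {
                // A new etag marks the referencing document for (asynchronous) reindexing.
                Etags[referencingId] = ++_lastEtag;
                touched++;
            }
        }
        return touched;
    }
}
```

In this model, after registering 40,000 activity logs that each loaded users/1, a single Put("users/1") performs 40,000 extra etag writes before it returns, even though reindexing itself happens later. With no recorded reference, as after removing the LoadDocument<User> call, the same Put touches nothing.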
