Back to basics: issuing Load between two sessions quickly will return stale data

76 views
Skip to first unread message

Kamran Ayub

unread,
Jun 2, 2015, 4:56:18 AM6/2/15
to rav...@googlegroups.com
I was just observing an issue in production while I had decent load on the server (for my little app, anyway).

I use Web API primarily for making changes to documents in Raven and depending on the circumstances, these API requests could load the same document multiple times pretty quickly.

I was noticing that I was making a change to a document (my user) and I would make a change, than request to undo it (e.g. Add/Remove). If I did it fast enough, the second request would fail because the server didn't see the new changes to my user.

So my first thought was, something is wrong--Load should always be returning the instance. But then I looked at the docs and for some reason, I had misread (a long time ago) what Load actually guaranteed (the same instance in a single session). Somehow I knew and understood an index query would return stale results but had not thought the same about Load across sessions. Since I'm using IDocumentSession InRequestScope, I'm using a new session per request. So two API requests close in time will not guarantee the same instance is loaded.

So how do people reconcile this with business logic that might be doing validation on the entity that is loaded. Is everyone just using UniqueConstraints on anything they must guarantee to be unique (email, phone, username, collection items, etc.)?

This is more of a best practice/pattern discussion because I know the answer to my question is: if you care about uniqueness, it must be a separate document. But are some people not doing that and how do you deal with this issue?

Kamran Ayub

unread,
Jun 2, 2015, 4:57:58 AM6/2/15
to rav...@googlegroups.com
I forgot to say, the case that had surprised me the most was a case where it took 30 seconds for my request to succeed in discovering my new change. That is how I started investigating this. I imagine this had to do with the load on the server. Most test cases I did were within a second, but 30 seconds? That's a long time!

Michael Yarichuk

unread,
Jun 2, 2015, 5:08:28 AM6/2/15
to rav...@googlegroups.com
What do you mean by "discovering new change"? Querying or using Load()?

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Best regards,

 

Michael Yarichuk

RavenDB Core Team

Tel: 972-4-6227811

Fax:972-153-4-6227811

Email : michael....@hibernatingrhinos.com

 

RavenDB paving the way to "Data Made Simple" http://ravendb.net/  

Kamran Ayub

unread,
Jun 2, 2015, 5:18:24 AM6/2/15
to rav...@googlegroups.com
Using Load<T>(id). If I issue two requests simultaneously (<1ms), the second request will fail because it Loaded an old instance.

You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/u-C5UQmhwnU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Kamran Ayub

unread,
Jun 2, 2015, 5:29:31 AM6/2/15
to rav...@googlegroups.com
Apologies, in that specific case it happened over 30 seconds (I have a log of the trace). In local development on my machine, I can reproduce the issue if I issue two API requests side-by-side (since my machine is faster and is quicker than my cloud host).

For example, in production, this is what I saw. The first PUT is "Add something to user" and the subsequent failed requests are "Remove that thing I just added". After ~30 seconds, it finally succeeded because my code checks to see if that item exists in the user document before deleting it:

6/2/2015, 2:16:45 AM
AddSomethingToUser(1);
Saving 1 changes to https://...
PUT users/1
 => SUCCESS

RemoveSomethingFromUser(1, item);
Loading document [Users/1] from https://...
=> "Item does not exist on your user!"

// Same results for these manual requests (same trace entry)
6/2/2015, 2:16:46 AM
6/2/2015, 2:16:49 AM
6/2/2015, 2:16:51 AM
6/2/2015, 2:16:54 AM
6/2/2015, 2:16:56 AM
6/2/2015, 2:16:58 AM
6/2/2015, 2:17:00 AM
6/2/2015, 2:17:01 AM
6/2/2015, 2:17:02 AM
6/2/2015, 2:17:03 AM
6/2/2015, 2:17:04 AM
6/2/2015, 2:17:05 AM
6/2/2015, 2:17:12 AM
6/2/2015, 2:17:13 AM
6/2/2015, 2:17:14 AM

6/2/2015, 2:17:14 AM
RemoveSomethingFromUser(1, item);
Saving 1 changes to https://...
PUT users/1
 => SUCCESS

Pseudo code:

AddSomethingToUser(userId, item) =>
  Load(userId).Collection.Add(item);

RemoveSomethingFromUser(userId, item) => 
  var user = Load(userId);

  IF (user.Collection.Any(item))
    user.Collection.Remove(item)
    SaveChanges()
  ELSE
    exception("Item does not exist!")

Since the two methods are called across two sessions (1 API request = 1 session), it makes sense why it doesn't work. I was just surprised it took 30 seconds to work.

On Tuesday, June 2, 2015 at 10:56:18 AM UTC+2, Kamran Ayub wrote:

Kamran Ayub

unread,
Jun 2, 2015, 5:31:36 AM6/2/15
to rav...@googlegroups.com
Oh my God I wish Google groups allowed editing. That trace is not 30 seconds; it's only a second, I just realized--whatever, I did observe a case of this that was ~30 seconds, but that trace approximates the issue.


On Tuesday, June 2, 2015 at 10:56:18 AM UTC+2, Kamran Ayub wrote:

Oren Eini (Ayende Rahien)

unread,
Jun 2, 2015, 6:39:50 AM6/2/15
to ravendb
Are you using Aggresive Caching? Or multiple nodes load balancing?
Can you show the Fiddler trace?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--

Kamran Ayub

unread,
Jun 2, 2015, 7:03:41 AM6/2/15
to rav...@googlegroups.com
Thanks Oren. I am using AggressiveCaching but not on those requests, at least not intentionally. This crossed my mind too--was the delay due to Raven sending a change notification and invalidating the cache? However, I'm not explicitly caching those requests.

What I am doing is aggressively caching for specific blocks of code, but I'm wondering if I'm inadvertantly caching other things.

For example, here's what some code looks like (all called in one session):

class MyAppService {

    // injected InRequestScope
ctor(IDocumentSession session) {
   this.session = session;
}

public IEnumerable<Company> GetAllCompanies() {
  // companies only change nightly, aggressively cache so we
  // don't have to request so much
  using (this.session.Advanced.DocumentStore.AggressivelyCache()) {
    // business rule: always <1000 companies, so load them all
    return this.session.Advanced.LoadStartingWith("Company/", pageSize:1000);
  }
}

public SomeViewModel PopulateViewModel(userId) {
var user = session.Load<User>(userId);

var companies = GetAllCompanies();

var vm = new SomeViewModel();
vm.Username = user.Username;
// blah blah map properties

// return companies for use in view/API
vm.Companies = companies;

return vm;
}
}

class SomeController : ApiController {
ctor(MyAppService) {
  this.service = myAppService;
}

public HttpResponseMessage Get() {
  var vm = this.service.PopulateViewModel("users/1");

  return Request.CreateResponse(HttpStatusCode.OK, vm);
}

}

This is simplified, but essentially illustrates how I'm calling AggressivelyCache. Is the GetAllCompanies inadvertently being applied to the entire session in the request? I think the aggressive caching only applies to getting all the companies--but I haven't had a chance to test that assumption (it was working so I assumed this is the case but now I'm not sure). I can probably isolate that in a test case, I just haven't yet.

Please correct me if my assumption is wrong.

Are you asking for a Fiddler locally where I can reproduce the issue? It's very quick locally but I can get it to fail. I can try to capture the Fiddler requests to Raven/my API. I am testing by issuing two AJAX requests simultaneously in the browser using the console, all of which I can capture in Fiddler (haven't tried capturing Raven yet).

By the way--I think this is all by design, the docs say the same instance is guaranteed in a single session and my issue is issuing TWO requests, creating two sessions, and seeing the second one not have the instance that is saved in the first session. Am I wrong on that account?

As for the delay in SaveChanges being commited to Raven and read from my code, when I observed the longer case, I had Raven Studio open and I am about 90% sure I refreshed the doc on Raven and my change was in there, but my code was still failing--so that's why I thought it might be due to caching. I say 90% because I did not stop to capture the timestamp of everything and it was 3am last night so my mind was foggy, this was early on in my observations :(

I'll try to get a real capture going with all the information and timestamps and traces.

Oren Eini (Ayende Rahien)

unread,
Jun 2, 2015, 7:41:50 AM6/2/15
to ravendb
RavenDB guarantees that a Load will _always_ see the latest committed value.
However, what you are doing: my issue is issuing TWO requests, creating two sessions, and seeing the second one not have the instance that is saved in the first session. Am I wrong on that account?

If the 2nd session loaded the instance _before_ you saved it, that will cause you to see the older version, yes.
And if you keep calling Load, that is just hitting the session cache, not the server.

Capturing the App -> Raven interaction should tell us what is going on, yes.

Kamran Ayub

unread,
Jun 2, 2015, 7:55:54 AM6/2/15
to ravendb
Great, I will try to capture what the heck is going on. So in my production observation (and log above) with the long delay between successful requests, my API requests were definitely far enough apart (and issued multiple times) that it shouldn't have failed to get a new instance? 

Does Raven log every request to trace even if it aggressively caches (AFAIR, aggressive request shouldn't result in an HTTP request, right?)? i.e. could the "Loading users/1" trace entries in production still be issued if there was caching going on? Or does each of those entries imply the server requested an entity from Raven and Raven returned it?

By the way, I didn't sleep good--that trace I posted early is 30s long, ugh (2:16:45 to 2:17:14 ~ 30 seconds). I can't read today.



You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/u-C5UQmhwnU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,
Jun 2, 2015, 7:57:49 AM6/2/15
to ravendb
Yes, there are logging that you can use, but it is probably better to use the profiling API for that. See: http://ravendb.net/docs/article-page/2.0/csharp/appendixes/profiling-mvc-application

Kamran Ayub

unread,
Jun 2, 2015, 8:40:46 AM6/2/15
to rav...@googlegroups.com
You're right so far. My local test cannot reproduce the production issue. If I issue the AJAX simultaneously, you are correct in saying the second request executes before the first saves the document--thus there's no issue there.

In production, I was observing the issue with much longer times and that was after the first response returned.

I'll keep digging and seeing if I can get any more info.

Kijana Woodard

unread,
Jun 2, 2015, 10:22:17 AM6/2/15
to rav...@googlegroups.com
At any rate, set UseOptimisicConcurrency = true and the "stale update" won't succeed.
Fwiw, I turn this on by default.

Kamran Ayub

unread,
Jun 2, 2015, 12:05:31 PM6/2/15
to rav...@googlegroups.com
Thanks Kijana. I don't think I'm having an issue there--my second request already fails anyway. The issue is my code is not getting the latest instance so my validation fails altogether (since it doesn't see the new change I made). If I didn't have validation in the first place, then I might run into a concurrency issue.

Is there a good article on globally dealing with UseOptimisticConcurrency, how do you handle it in your app? Off-topic, so a link is fine.

Chris Marisic

unread,
Jun 2, 2015, 12:12:27 PM6/2/15
to rav...@googlegroups.com
Maybe a simpler solution is just throttle the undo operation on the client side. User clicks undo just wait 500ms or whatever before calling the server. Or maybe check the time between when the undo action was offered vs when clicked and ensure a minimum amount of time has elapsed. 

Kijana Woodard

unread,
Jun 2, 2015, 12:26:01 PM6/2/15
to rav...@googlegroups.com
if 3.0
   store.Conventions.DefaultUseOptimisticConcurrency

else 
   set "session.Advanced.UseOptimisticConcurrency = true" during session construction using IoC of choice

Kijana Woodard

unread,
Jun 2, 2015, 12:34:21 PM6/2/15
to rav...@googlegroups.com
Kamran, if I understand your scenario, for these cases I tend to let the user make multiple changes before explicitly submitting to server. A boring old Save button. 

If you still want to keep things "slick", perhaps some kind of debounce logic such that you don't submit until "changes have stopped". Further, don't send another request until the first returns. Pipe-lining requests will hurt since you're mutating data on separate threads.



--

Kamran Ayub

unread,
Jun 2, 2015, 12:59:12 PM6/2/15
to rav...@googlegroups.com
Hmm, I just witnessed the issue again in production, this time there was a good 2-3 seconds before issuing the second request (I actually went to a new page and Web API had returned old data). I'll try to find the log entries.

I'll see what else I can find out, this seems wrong. I made a change regarding IoC configuration due to an issue with Singleton not being respected by Web API Ninject resolver: http://stackoverflow.com/questions/11356864/ninject-insingletonscope-with-web-api-rc

I noticed this issue after I made this change--but that is only a correlation, there were also many people hitting the site while I was testing. Right now there was not many people, so I am not sure if it's due to amount of requests. If I could just get it to reproduce reliably, I could test properly.

You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/u-C5UQmhwnU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Chris Marisic

unread,
Jun 2, 2015, 1:51:59 PM6/2/15
to rav...@googlegroups.com
With your comments about Ninject i would say it's 70% likely you have a defect in your state management you're doing through your IOC.

Kijana Woodard

unread,
Jun 2, 2015, 2:47:24 PM6/2/15
to rav...@googlegroups.com
Agree. If I could get back all the time Ninject has cost me.......

Kijana Woodard

unread,
Jun 2, 2015, 2:49:47 PM6/2/15
to rav...@googlegroups.com
I'll add that if you can repro this, it is a *major* bug in raven client.
It would affect everyone using the product.

Every time I have thought I found a similar bug, it turned out to be some quirk [bad code] in my infrastructure.

Kamran Ayub

unread,
Jun 2, 2015, 4:51:55 PM6/2/15
to rav...@googlegroups.com
OK, I have some more data. This is gonna be long, sorry, this thing is driving me insane. I am not sure what it means but it's very interesting.

I wrote a little script that will execute on my MVC action to time how long it takes to read back a saved instance in two sessions side-by-side. 

Here it is:

public ActionResult Timing()
{
    DateTime start;
    DateTime save;
    using (var session = _store.OpenSession())
    {
        save = DateTime.UtcNow;
        var user = session.Load<User>("Users/1");

        user.LastActivityDate = save;

        start = DateTime.UtcNow;
        session.SaveChanges();                
    }

    bool updated = false;
    int iterations = 0;
    DateTime end = DateTime.MinValue;
    TimeSpan tts = TimeSpan.Zero;
    DateTime lmd = DateTime.MinValue;
    TimeSpan diff = TimeSpan.Zero;

    while (!updated)
    {
        end = DateTime.UtcNow;
        tts = end - start;
        using (var session = _store.OpenSession())
        {
            var user = session.Load<User>("Users/1");

            if (user.LastActivityDate == save)
            {
                lmd = session.Advanced.GetMetadataFor(user)["Last-Modified"].Value<DateTime>();
                diff = lmd - end;
                updated = true;
            }
        }

        iterations++;
    }

    var builder = new StringBuilder();

    builder.AppendFormat("Started: {0}", start);
    builder.AppendLine();
    builder.AppendFormat("Ended: {0}", end);
    builder.AppendLine();
    builder.AppendFormat("Time-to-Save: {0}", tts);
    builder.AppendLine();
    builder.AppendFormat("Iterations: {0}", iterations);
    builder.AppendLine();
    builder.AppendFormat("LMD: {0}", lmd);
    builder.AppendLine();
    builder.AppendFormat("Difference: {0}", diff);

    return Content(builder.ToString());
}

I only deployed the script to prod because:

a) that's where I've been seeing the issue and
b) I wanted to get the time differential between my Azure server and RavenHQ server (Amazon).

Time Difference between RavenHQ and Azure: +00:00:27

Good, now I can normalize all my findings, just keep this in mind as you read through--I've adjusted all the Raven-specific timestamps (Last-Modified) according to this number.

The Bad Case

[Trace] is the timestamp from Azure App Insights.
[Chrome] is the timestamp from the Response "Date" header sent by Azure
Raven Last-Modified dates are normalized by subtracting 27 seconds

So here is a sequence of events I gathered from Chrome and Application Insights (logging traces) when I observed the best showcase of the issue:

... I make a change (press a button, issue AJAX) to my user document!

[Trace][18:58:31] (Raven Trace -- PUT [Users/1])

... I try to undo (press another button, issues AJAX), it fails! Maybe I'll catch it in the act now.
... I go to Raven Studio and sure enough, the doc has been modified

Last-Modified for Users/1: [18:58:??] (Raven Studio does not show seconds)

... OK, doc is updated, so now my undo requests should work!
... I go back to my other tab and try to perform an undo request

... they all fail with trace entries like:

[Trace][18:59:03] Loading document [Users/1] from https://...
[Chrome][18:59:03] AJAX Response (FAIL)
[Trace][18:59:35] Loading document [Users/1] from https://...
[Chrome][18:59:35] AJAX Response (FAIL)

... finally, it works!

[Trace][18:59:43] (Raven Trace -- PUT [Users/1])
[Chrome][18:59:45] AJAX Response (SUCCESS)

... so I checked the LM date timestamp (from C#, not Studio)

Last-Modified for Users/1: [18:59:43]

Total time it took for my code to load the newly changed document? 00:01:12!

Holy crap, that definitely isn't right! The document has CLEARLY been updated properly in Raven, so why are my subsequent requests failing?

This is when I decided to write my timing code. I wrote it only to get the time differential but I found something more interesting.

Here's a normal output (reference script above) (LMD is Raven's Last-Modified date on the document):

GET /test-timings
Started: 6/2/2015 7:52:53 PM
Ended: 6/2/2015 7:52:53 PM
Time-to-Save: 00:00:00.0468759
Iterations: 1
LMD: 6/2/2015 7:53:21 PM
Difference: 00:00:27.0657413

GET /test-timings
Started: 6/2/2015 7:59:40 PM
Ended: 6/2/2015 7:59:40 PM
Time-to-Save: 00:00:00.5459673
Iterations: 1
LMD: 6/2/2015 8:00:08 PM
Difference: 00:00:27.0820819

Those are two "normal" sample requests. OK, so we can see on average it's only about 500ms to save a change to a document and the time difference between RavenHQ and Azure is 27 seconds--that's how I could normalize the event sequence above.

However, I decided to see what would happen if I browsed the site in one tab and refresh the timing page. Suddenly it was taking a long time. So long in fact, I refreshed the other tab again to see if the site was frozen and I think it stopped the loop (or it was a coincidence, but I did it again the same way one other time):

Started: 6/2/2015 7:52:55 PM
Ended: 6/2/2015 7:55:55 PM
Time-to-Save: 00:02:59.8295933
Iterations: 9541
LMD: 6/2/2015 7:52:55 PM
Difference: -00:02:32.6229524

Holy crap. What's interesting about this is the Last Modified date on the document! It matches the observation from that sequence above; the save happens immediately... so why the hell does it takes 9500 iterations to load an instance that has been changed?!

I checked my Application Insight logs after this:

So it was definitely writing to trace (all 9500 times). Unfortunately, I cannot see the responses--if they were 304 Not Modified or were returning a stale document. 

RE: Ninject

My app uses Ninject, but this IS NOT through the Web API resolver that I changed--this is regular Ninject with MVC controller (_store is a singleton). I've been using this for the entire time I've had Raven so I know it's disposing sessions InRequestScope.

I do not think it's a Ninject issue--the code clearly creates a new session and disposes within a single iteration. Something else is going on!

At this point my conclusions are:

1. Caching. Something is being cached. Somewhere. If only I could see those HTTP requests being sent to Raven.
2. Magic. Gnomes have invaded my codebase and are wrecking havoc.
3. Something else I can't see

So, I'll probably continue my investigation tomorrow and try to get the timing issue to happen again (it was easy to do with my code above) and capture the requests between Azure and Raven, maybe using code in that script to hook into Raven's profiling.

On Tuesday, June 2, 2015 at 10:56:18 AM UTC+2, Kamran Ayub wrote:

Chris Marisic

unread,
Jun 2, 2015, 5:32:29 PM6/2/15
to rav...@googlegroups.com
I don't have the slightest idea the only suggestion I would have is take your code and put it in a new solution with no ninject and the barest minimum things and then test on entirely new raven databases.

Oren Eini (Ayende Rahien)

unread,
Jun 2, 2015, 5:37:16 PM6/2/15
to ravendb

Or get fiddler traces

--

Kamran Ayub

unread,
Jun 2, 2015, 6:16:45 PM6/2/15
to rav...@googlegroups.com
Yah, I'm trying to create an isolated test case... we'll see if I find anything.


On Tuesday, June 2, 2015 at 10:56:18 AM UTC+2, Kamran Ayub wrote:

Oren Eini (Ayende Rahien)

unread,
Jun 3, 2015, 1:42:27 AM6/3/15
to ravendb
The profiling API support in RavenDB would be helpful here.
Also, I'm betting that the underlying issue is proxy / http invasive stuff in your stack.

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--

Oren Eini (Ayende Rahien)

unread,
Jun 3, 2015, 1:43:21 AM6/3/15
to ravendb
Or, of course, two people running the code at the same time.
Since you are always updating the same document, if something else also updated it, overwriting your last activity date, that would explain it.

Oren Eini (Ayende Rahien)

unread,
Jun 3, 2015, 1:43:53 AM6/3/15
to ravendb
Note that the issue is that you are using your own "concurrency" mechanism, which isn't really how it works. 
Try using etags, they are guaranteed to work and will give more details.

Kamran Ayub

unread,
Jun 3, 2015, 5:19:46 AM6/3/15
to ravendb
Thanks Oren. For this case, only one person is updating the document (me), not anyone else--all my API calls are for the authenticated user (currently, no ability to update anyone else). Users are pretty much the only docs that get updated right now.

I'll try using Etags during the timing test, you're right, it would be better.

PS. With my limited testing, I could not reproduce the issue with a blank MVC5 site using the RavenController pattern and simulating traffic, nor could I reproduce it in a Raven test case where I tried using threads to emulate requests. It could be differences between my PC and the cloud environment (namely, they're slower, there's more latency).

For now I'm going to revert my Ninject changes and see if I can still reproduce it.

You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/u-C5UQmhwnU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,
Jun 3, 2015, 5:57:01 AM6/3/15
to ravendb
Note that multiple requests (for example, double clicking the button generating two ajax requests can do that.

Kamran Ayub

unread,
Jun 3, 2015, 5:58:34 AM6/3/15
to rav...@googlegroups.com
That's true, I understand that (I actually use a plugin that prevents that)--but in that timing test above, it's all within a single request.
Reply all
Reply to author
Forward
0 new messages