Strange behavior at RavenDB startup and stale data

TJIMH


Aug 22, 2014, 2:39:51 PM

to rav...@googlegroups.com

I am trying to track down an occasional bug that results in a duplicate document being written to the document store.

Essentially, the app receives data from an external system via an API. The data has a datetime associated with it.

When the data arrives, we check whether data is already stored for this datetime. If so, we update it; otherwise, we add a new document.

These notifications can arrive very close together for the same day.

The code below works fine 99% of the time.

Occasionally we get two records for the same datetime, which led me to believe the index was stale even though we wait for non-stale results. That doesn't seem to be the issue, though: whenever we add a new record we log whether the query results were stale, and they are not.

The only common factor seems to be that RavenDB is starting up when the error occurs, because at that time I see this in the log file: Raven.Database.DocumentDatabase,Debug,Start loading the following database: HJ_App

So if two notifications with the same datetime arrive close together while RavenDB is starting up, it seems the first document is written for that datetime, but when the second notification is processed, the query below does not find the first record, so a second record with the same datetime is added.

Is this a possible scenario at RavenDB startup time?

while (moredata)
{
    // Look for a record already stored for this datetime, waiting for the
    // index to catch up with this client's last write.
    var storedRecord = await asyncsession.Query<Data>()
        .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
        .Where(x => x.Common.When == datetime)
        .SingleOrDefaultAsync();

    if (storedRecord == null)
    {
        // add new record
    }
    else
    {
        // update old record
    }
}
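A note on the code above: one interleaving that would produce exactly this symptom, even when every query returns fully non-stale results, is a check-then-act race between two handlers. The sketch below is a minimal C# 9+ top-level program that models it with an in-memory `Dictionary` standing in for the document store. That stand-in and the `ExistsFor`/`AddNew` helpers are hypothetical, for illustration only; no RavenDB API is involved, so there is no index and nothing can be stale:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the document store: auto-generated id -> the
// datetime the document was stored for. There is no index here at all, so
// every "query" sees fully up-to-date data.
var docs = new Dictionary<string, DateTime>();
var nextId = 1;

// Hypothetical helpers mirroring the query and the "add new record" branch.
bool ExistsFor(DateTime when) => docs.Values.Any(d => d == when);
void AddNew(DateTime when) => docs[$"data/{nextId++}"] = when;

var stamp = new DateTime(2014, 8, 22, 14, 39, 0);

// Two notifications for the same datetime, interleaved: both handlers run
// their existence check before either handler has saved its document.
bool firstFound = ExistsFor(stamp);
bool secondFound = ExistsFor(stamp);
if (!firstFound) AddNew(stamp);
if (!secondFound) AddNew(stamp);

Console.WriteLine(docs.Values.Count(d => d == stamp)); // 2
```

Both existence checks run before either save, so both take the "add new" branch. Waiting for non-stale results cannot close this gap, because the other handler's document has not been written yet when the check runs.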

Kijana Woodard


Aug 22, 2014, 2:46:38 PM

to rav...@googlegroups.com

For external imports, I've used Bulk Insert with success. I compute the document ID from something known to the third party:

"products/{affiliate}/{productId}"

That way, each import adds or updates the same document, and no duplicates are created.

I don't rely on queries here.
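As a minimal sketch of the deterministic-ID idea, assuming an in-memory `Dictionary` as a stand-in for the document store (the `ProductId` and `Store` helpers are hypothetical, not RavenDB APIs): because the ID is derived from the third party's own keys, a second write for the same product lands on the same document. In RavenDB itself this corresponds to storing the entity under an explicit ID rather than letting the server assign one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the document store: document id -> document body.
var store = new Dictionary<string, string>();

// Hypothetical helper: build the document id from keys the third party
// already guarantees unique, following "products/{affiliate}/{productId}".
string ProductId(string affiliate, string productId) => $"products/{affiliate}/{productId}";

// Hypothetical helper: writing under an explicit id behaves as add-or-update,
// so a repeated import for the same product replaces the existing document.
void Store(string affiliate, string productId, string body) =>
    store[ProductId(affiliate, productId)] = body;

Store("acme", "42", "price=10");
Store("acme", "42", "price=12"); // same external key -> same document
Store("acme", "43", "price=7");

Console.WriteLine(store.Count);               // 2
Console.WriteLine(store["products/acme/42"]); // price=12
```

The single-threaded dictionary only illustrates the ID scheme; the protection against concurrent duplicates comes from the store treating a given ID as exactly one document.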

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Federico Lois


Aug 22, 2014, 2:49:38 PM

to rav...@googlegroups.com

I have a similar use case, and relying on indexes and waiting for non-stale results is a very risky business:

- It won't scale as the amount of data increases.

- If an index gets reset, you may not get results for a long time.

- You can have race conditions (which sounds like what is happening here).

- Say you are waiting, but B, C and D are also waiting (one of them probably a client that timed out and will eventually retry).

- Now all of them can write (the index is no longer stale).
- Race condition, unless you handle IDs aggressively (in which case, why wait for indexing in the first place?).

I would suggest avoiding queries altogether when storing new data, and using ID checks instead.
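A sketch of the "ID check instead of query" approach, applied to the datetime case from the original question. It assumes a hypothetical ID scheme of one document per datetime (`data/` plus the timestamp) and uses a `ConcurrentDictionary` as a stand-in for the document store. A plain lookup followed by a write is still check-then-act, so every write is made conditional: `TryAdd` only inserts if the ID is absent, and `TryUpdate` only replaces the value that was just read. In RavenDB the analogous guard would be optimistic concurrency on the session; treat that mapping as an assumption to verify against the client documentation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical stand-in for the document store: document id -> payload.
var store = new ConcurrentDictionary<string, string>();

// Hypothetical id scheme: exactly one document per external datetime.
string IdFor(DateTime when) =>
    "data/" + when.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture);

// Add-or-update keyed by id, with every write made conditional so that two
// handlers that both miss on the lookup cannot both create the document.
void Save(DateTime when, string payload)
{
    string id = IdFor(when);
    while (true)
    {
        if (store.TryGetValue(id, out var existing))
        {
            // Update path: replace only the value we just read.
            if (store.TryUpdate(id, payload, existing)) return;
        }
        else if (store.TryAdd(id, payload))
        {
            // Insert path: succeeds only if the id was still absent.
            return;
        }
        // Another handler wrote in between; re-read and try again.
    }
}

var notificationTime = new DateTime(2014, 8, 22, 14, 0, 0);
Parallel.For(0, 100, i => Save(notificationTime, $"payload {i}"));

Console.WriteLine(store.Count); // 1
```

A hundred concurrent saves for the same datetime leave exactly one document; a handler that loses the insert race is sent down the update path on its next loop iteration instead of creating a duplicate.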