Patches not applied to all documents


Kenneth Truyers

Oct 21, 2014, 7:25:14 AM
to rav...@googlegroups.com
Hi,

Sometimes when we execute a patch on a collection, it only updates a part of the collection.

This is an example of such a patch:

this.Active = true;


When we run this, sometimes it only updates 80% of the documents.
What could cause this, and how can we troubleshoot it?

Oren Eini (Ayende Rahien)

Oct 21, 2014, 9:50:48 AM
to ravendb
Is the database idle? Is the index stale?
Look at the logs, it should tell you what it is doing.

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 



Kenneth Truyers

Oct 21, 2014, 9:52:42 AM
to rav...@googlegroups.com
Hi Oren,

It's possible that some indexes may have become stale while it was applying the patch. 
Could you tell me which logs to look at and what to look for?

Oren Eini (Ayende Rahien)

Oct 21, 2014, 9:53:55 AM
to ravendb
The debug logs. By default, we take a snapshot of the index at the start of the operation, and if it is stale, we might miss docs.
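
A minimal sketch of how a stale snapshot can miss documents. This is an in-memory simulation, not RavenDB code: the `store` map, the `indexedIds` array and the `performers/...` keys are all assumptions made for illustration.

```javascript
// Illustrative simulation only: a Map stands in for the document store and an
// array of ids stands in for the index. None of this is RavenDB code.
const store = new Map([
  ["performers/1", { Name: "A" }],
  ["performers/2", { Name: "B" }],
]);

// The index last caught up when only two documents existed.
const indexedIds = Array.from(store.keys());

// A third document is written, but the index has not processed it yet (stale).
store.set("performers/3", { Name: "C" });

// The patch takes its snapshot from the index, not from the store itself.
const snapshot = indexedIds.slice();
for (const id of snapshot) {
  store.get(id).Active = true;
}

// Documents the stale snapshot never saw remain unpatched.
const missed = Array.from(store.keys()).filter(
  (id) => store.get(id).Active !== true
);
console.log(missed); // [ 'performers/3' ]
```

The point of the sketch is only that the set of documents to patch is fixed when the snapshot is taken, so anything the index had not yet seen at that moment is skipped.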

Tom Allard

Oct 21, 2014, 1:12:12 PM
to rav...@googlegroups.com
I often run into this as well. I usually have to run a patch multiple times for it to reach all the documents.

(This is from the studio and I have not yet investigated much what is going wrong, but there really should be a progress indicator or something like that in the studio when patching, so I can rule out my own impatience.)

-Tom

Kijana Woodard

Oct 21, 2014, 1:49:57 PM
to rav...@googlegroups.com
I will add that, in 2.5, the patching process just ends with no visual indication. I usually end up doing some queries or running the Test again to see if "something has happened". 

Hmmmm. *runs off to try studio Patching 3.0*

Chris Marisic

Oct 21, 2014, 1:54:28 PM
to rav...@googlegroups.com
When I use patches I always include some type of filter. This allows me to easily use the patch window to show the results that will be affected by the patch. I run the patch, hit the button, and see if any new results trickled in and if so, run the patch again.

I do my best to avoid patching the entire database/collection.
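
The filter-and-rerun routine described above can be sketched as a loop. This is a simulation under stated assumptions: `patchUntilDone`, `flakyRun` and the sample documents are hypothetical names invented here, and `flakyRun` merely imitates a patch run that reaches only part of the matching documents.

```javascript
// Hypothetical helper: re-run a patch until the filter finds nothing left,
// or until maxRuns is reached.
function patchUntilDone(docs, needsPatch, runPatch, maxRuns) {
  let runs = 0;
  while (runs < maxRuns) {
    const pending = docs.filter(needsPatch); // the "filter" step
    if (pending.length === 0) break;         // nothing left to patch
    runPatch(pending);                       // the "run the patch" step
    runs++;
  }
  return { runs, remaining: docs.filter(needsPatch).length };
}

// Simulated partial run: reaches at most four documents per pass.
const flakyRun = (pending) =>
  pending.slice(0, 4).forEach((doc) => { doc.Active = true; });

const docs = Array.from({ length: 10 }, (_, i) => ({ Id: `performers/${i + 1}` }));
const outcome = patchUntilDone(docs, (doc) => doc.Active !== true, flakyRun, 5);
console.log(outcome);
```

Because the filter selects only documents that still lack the change, re-running is safe: documents that were already patched no longer match.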

Kijana Woodard

Oct 21, 2014, 1:55:48 PM
to rav...@googlegroups.com
The 3.0 HTML5 studio patch screen looks "odd" in Firefox, but a nice green toast appears while the operation is in progress and when it completes.

Inline image 1

Oren Eini (Ayende Rahien)

Oct 22, 2014, 2:45:37 AM
to ravendb
Yes, we already have an issue open to fix this.

Shayne van Asperen

Oct 23, 2014, 8:52:11 AM
to rav...@googlegroups.com
And when will we be able to have that fix? This has been an issue for a LONG time.

Oren Eini (Ayende Rahien)

Oct 23, 2014, 8:58:20 AM
to ravendb

I'm pretty sure that the patch UI fix is in a public build.

Shayne van Asperen

Oct 23, 2014, 9:03:36 AM
to rav...@googlegroups.com
It sounds like you are talking about the UI for 3.0. Is there a patch UI progress indicator for 2.5?

Oren Eini (Ayende Rahien)

Oct 23, 2014, 9:04:42 AM
to ravendb

No

Shayne van Asperen

Oct 23, 2014, 10:01:53 AM
to rav...@googlegroups.com
So basically what you're saying is that patching in 2.5 has never worked properly and will never work properly and that the only chance of having it work properly is to migrate to 3.0.

Thanks, that's not very helpful.

Chris Marisic

Oct 23, 2014, 10:30:26 AM
to rav...@googlegroups.com
Patching works properly, it just doesn't have a great visual indicator in 2.5's studio.

Shayne van Asperen

Oct 23, 2014, 10:55:01 AM
to rav...@googlegroups.com
No. It definitely doesn't work properly. When patching a collection (not an index, a whole collection) while no writes are happening to the database and no indexes are stale, 9 times out of 10 it fails to patch all documents. It stops somewhere and leaves some documents patched and others not patched. This is outrageous and totally unacceptable, especially since we have paid for a licence to use RavenDB. This product is not fit for purpose!

Kenneth Truyers

Oct 23, 2014, 11:41:06 AM
to rav...@googlegroups.com
Oren, Chris,

I tested this again. I was executing a patch on a collection, so I'm not sure why a stale index would have anything to do with it. Also, there were no writes in between, so there would be no reason for an index to go stale.
What I did notice however, is that indexes do go stale because of the patch. There are a few things that I noticed:
 - All indexes go stale, not only the ones where the patch is applied. It doesn't make sense that documents from a different collection would be altered (but I believe this is already discussed in a different thread). 
 - In this case, I'm adding a property, which by its very nature cannot be a part of an index that existed before. Shouldn't it at least detect that this patch won't affect any indexes?
 - The patch does not run as a transaction. The result is that while it's applying a patch, it keeps indexing, so every time it patches a document, whatever has been indexed up to then needs to be indexed again. In my opinion, a patch should be applied as a transaction, which would also eliminate the possibility of partial patches.

I enabled the debug log, but after running the patch the file was 6GB in size (I suspect because of the continuous re-indexing), so I obviously cannot attach it here. Could you tell me what I'm looking for? Here's an excerpt that gets repeated over and over in the log (with different data, obviously):

2014-10-23 17:21:13.9836;Raven.Storage.Esent.StorageActions.DocumentStorageActions;Debug;Inserted a new document with key 'performers/868596', update: True, ;
2014-10-23 17:21:13.9836;Raven.Database.DocumentDatabase;Debug;Checking references for performers/868596;
2014-10-23 17:21:13.9836;Raven.Database.DocumentDatabase;Debug;Put document performers/868596 with etag 01000000-0000-000D-0000-0000000448E4;
2014-10-23 17:21:13.9836;Raven.Database.Indexing.Index.Indexing;Debug;"Index Performers/ByDataSource for document performers/822758 resulted in:
{}
";
2014-10-23 17:21:13.9836;Raven.Database.Indexing.Index.Indexing;Debug;"Index Performers/ByDataSource for document performers/822758 resulted in (performers/822758): {
  ""Id"": ""performers/822758"",
  ""Sources"": [
    ""PA""
  ],
  ""SourcesCount"": 1,
  ""Staging"": false,
  ""Name"": ""Irish Music Session"",
  ""MatchName"": ""irish music session"",
  ""MbId"": null,
  ""EventCount"": 0
}";

There really is an issue here. 
First of all, I can definitely see patches not being applied completely. To add insult to injury, there's no confirmation whether it was successful and there's no easy way of determining whether it really did patch all documents (apart from creating a new index to check for non-patched documents, which brings me back to my other issue about stale indexes).
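
One way to cut a log like the excerpt above down to size is to keep only the entries that are not at Debug level. The sketch below assumes the `timestamp;logger;level;message` layout seen in the excerpt, and it assumes that failures would be logged at some other level; the thread does not confirm either. `findNonDebugEntries` is a hypothetical helper, and the last sample line is made up for the demo rather than real RavenDB output.

```javascript
// Assumed format, taken from the excerpt: "timestamp;logger;level;message".
const ENTRY_START = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+;/;

// Hypothetical helper: return every log entry whose level is not "Debug".
function findNonDebugEntries(lines) {
  const hits = [];
  for (const line of lines) {
    // Lines without a leading timestamp continue a multi-line message.
    if (!ENTRY_START.test(line)) continue;
    const [timestamp, logger, level] = line.split(";");
    if (level !== "Debug") hits.push({ timestamp, logger, level, line });
  }
  return hits;
}

const sampleLines = [
  "2014-10-23 17:21:13.9836;Raven.Database.DocumentDatabase;Debug;Checking references for performers/868596;",
  '2014-10-23 17:21:13.9836;Raven.Database.Indexing.Index.Indexing;Debug;"Index Performers/ByDataSource for document performers/822758 resulted in:',
  "{}",
  '";',
  // Made-up line for the demo; not real RavenDB output.
  "2014-10-23 17:21:14.0000;Example.Logger;Error;something went wrong;",
];

const hits = findNonDebugEntries(sampleLines);
console.log(hits.map((hit) => `${hit.level} from ${hit.logger}`));
```

For a multi-gigabyte file the lines would need to be read as a stream (for example with Node's `readline` module) rather than loaded into an array; the filtering logic stays the same.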

Kijana Woodard

Oct 23, 2014, 11:49:27 AM
to rav...@googlegroups.com
IIRC, a "collection" is nothing more than a synthetic construct built from the Raven/DocumentsByEntityName index.

In 2.5, a change to a doc "visits" all indexes to see if it applies. Thus all indexes go stale. I believe that has been addressed in 3.0.

I would say that the patch shouldn't be a "transaction" as that applies all or nothing. For a large collection, that may not be possible. Instead, the patch should be over a stable snapshot. Not sure how it's implemented, but a Stream would be the way I would _think_ to approach this problem.

None of what I just said addresses the core problem.

A workaround would be to use the Stream API in code.


Kenneth Truyers

Oct 23, 2014, 12:06:12 PM
to rav...@googlegroups.com
I understand the indexing-situation has a lot of improvements in 3.0.  

However, I cannot see how a patch should not be transactional. If I issue a command that says "make all performers active", leaving the database with 10% of the performers active is not an acceptable outcome. I can understand that it's not immediate, it might need to do two or three runs, but eventually all performers should be active. If there's an error mid-process, I'd like to decide what to do (roll back, try again, ...) and, failing that, at least I'd like to be notified. Even having a number of affected documents would be an indicator to see whether everything was patched (although when doing conditional patching it's not enough to determine the outcome either).

I thought of a patch as a shorthand method for an update of a lot of objects at once (which is transactional in Raven), but from what I experience it seems to be more of a hit or miss feature.

I'm still wondering whether this is because of a particular bug in 2.5, something we're doing wrong or some other cause. At the moment however, we're following guidelines from the documentation and the forums and these are the results we're seeing. So I hope someone can clarify what I'm seeing. I think I have given all info, but if you need anything else, I'm happy to provide it.

Chris Marisic

Oct 23, 2014, 12:49:52 PM
to rav...@googlegroups.com
How can you have a transaction that spans 1M documents, or 10M, or a billion?

Kijana Woodard

Oct 23, 2014, 2:22:52 PM
to rav...@googlegroups.com
I agree with your intended result. It's just that "transaction" means something that I don't think you intend.

If a "transaction", then either all the documents would be patched or, if any trouble, none of them would be patched.

I _think_ what you mean is "all documents that are in the collection at the moment you issue the command will eventually have the patch applied". That would be the effect of using the Stream API.

Kenneth Truyers

Oct 23, 2014, 4:54:04 PM
to rav...@googlegroups.com
OK, maybe the terminology is off.

I can understand that the documents that are added while the patch is executing are not affected. (The fact that patching takes very long just makes this worse, for comparison: doing a SQL update on 1M records takes about 1.5 seconds)

Either way, my question still stands:
If it's possible to apply a patch, why doesn't this patch all the documents? If this is not to be expected, how can I troubleshoot it? (The log file doesn't really help as it is several GB in size and mostly littered with indexing statements, so I don't know what to look for.)

Maybe it's not your intention, but I kind of feel like the answers I'm getting here are telling me that there's nothing wrong, while clearly there is. Meanwhile we're playing ping-pong with arguments and counter arguments about terminology and other things that don't really get to the core of the issue. I appreciate the time you're investing to look into this, but I don't feel we're getting any closer to a resolution.

Kijana Woodard

Oct 23, 2014, 5:20:24 PM
to rav...@googlegroups.com
That's _not_ my intention. I just wanted to clarify the terminology as to your expectations. The reason for that is to properly shape the solution space for the core team. I believe we are on the same page there ["I can understand that the documents that are added while the patch is executing are not affected"].

As I said:
"None of what I just said addresses the core problem."

It appears you have identified a real bug. 
I haven't tried to repro myself.

In order to help you get unstuck, I suggested a workaround:
"A workaround would be to use the Stream API in code."

That is _not_ to say "you're doing it wrong", but rather, "the thing you're using doesn't appear to be working properly, here's something else to try to get the result you're looking for while waiting for a resolution on this issue".

Hope that clarifies.

Oren Eini (Ayende Rahien)

Oct 26, 2014, 9:16:57 AM
to ravendb
No, it is executing very nicely.
The only issue is that we aren't showing indication that it is over in the ui.

Oren Eini (Ayende Rahien)

Oct 26, 2014, 9:18:55 AM
to ravendb
inline


On Thu, Oct 23, 2014 at 5:41 PM, Kenneth Truyers <ken...@kenneth-truyers.net> wrote:
Oren, Chris,

I tested this again. I was executing a patch on a collection, so I'm not sure why a stale index would have anything to do with it. Also, there were no writes in between, so there would be no reason for an index to go stale.
What I did notice however, is that indexes do go stale because of the patch. There are a few things that I noticed:
 - All indexes go stale, not only the ones where the patch is applied. It doesn't make sense that documents from a different collection would be altered (but I believe this is already discussed in a different thread). 

Yes, by design. Any change in any doc turns all indexes stale.
 
 - In this case, I'm adding a property, which by its very nature cannot be a part of an index that existed before. Shouldn't it at least detect that this patch won't affect any indexes?

Doesn't matter. You modified a document, the etag changed, therefore the index is stale.
 
 - The patch does not run as a transaction. The result is that while it's applying a patch, it keeps indexing, so every time it patches a document, whatever has been indexed up to then needs to be indexed again. In my opinion, a patch should be applied as a transaction, which would also eliminate the possibility of partial patches.

Yes, what happens if you run a patch over 10000000 items? It would blow up the internal transaction buffers. That is why we run them in batches.

Oren Eini (Ayende Rahien)

Oct 26, 2014, 9:20:31 AM
to ravendb
Yes, it is on a stable snapshot.
We get the list of all document ids matching a query and then run on them.

Oren Eini (Ayende Rahien)

Oct 26, 2014, 9:22:23 AM
to ravendb
Patch is a way to apply a script to a set of documents. It is not applied as a single transaction.
If there is an error at any point, the whole process is stopped. Changes that have already been applied will not be reverted, however.
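
A minimal sketch of that behaviour, as an in-memory simulation rather than RavenDB's implementation. `applyPatchInBatches` is a hypothetical helper, the `Tags` field is invented for the demo, and treating each batch as all-or-nothing is an assumption; the thread only states that patches run in batches, stop on error, and do not revert earlier work.

```javascript
// Illustrative simulation only: plain objects stand in for documents, and
// applyPatchInBatches is a hypothetical helper, not a RavenDB API.
function applyPatchInBatches(docs, patch, batchSize) {
  let patched = 0;
  for (let start = 0; start < docs.length; start += batchSize) {
    const batch = docs.slice(start, start + batchSize);
    try {
      // Patch deep copies so a failing batch leaves its documents untouched.
      const updated = batch.map((doc) => {
        const copy = JSON.parse(JSON.stringify(doc));
        patch.call(copy); // the script sees the document as `this`
        return copy;
      });
      // "Commit" the batch. Earlier batches are never rolled back.
      updated.forEach((copy, i) => { docs[start + i] = copy; });
      patched += updated.length;
    } catch (err) {
      // Stop the whole run at the first error.
      return { completed: false, patched, error: err.message };
    }
  }
  return { completed: true, patched, error: null };
}

const performers = [
  { Id: "performers/1", Tags: [] },
  { Id: "performers/2", Tags: [] },
  { Id: "performers/3", Tags: null }, // an errant document
  { Id: "performers/4", Tags: [] },
];

const result = applyPatchInBatches(
  performers,
  function () { this.Active = true; this.Tags.push("patched"); },
  2
);
console.log(result.completed, result.patched);
```

In this run the first batch is kept, the second batch fails on the errant document, and the run stops, which leaves the collection partially patched.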

Kijana Woodard

Oct 26, 2014, 9:50:02 AM
to rav...@googlegroups.com
Interesting.

So the OP is seeing that not all docs are patched when starting with a non-stale index and not doing any additional writes.

How could that occur?

OP: perhaps you can repro by adding an index with the property you're patching. After the patch, wait for that index to go non-stale, and see if any documents are not patched. IIRC, you mentioned considering this approach.


Kijana Woodard

Oct 26, 2014, 9:52:58 AM
to rav...@googlegroups.com
And the error would be in the logs, correct?

An error on one doc would explain "some docs didn't get patched".

The OP mentioned 6GB of logs.

OP: did you delete the logs just before the patch?

Hmmm. Now I'm curious if an error during patching gets well surfaced in 3.0 studio.

Oren Eini (Ayende Rahien)

Oct 26, 2014, 10:11:53 AM
to ravendb

Error at some point from the patch
Should be able to tell from the log

Bruno Lopes

Oct 27, 2014, 6:50:58 AM
to rav...@googlegroups.com

As someone who's been thinking about using patches to migrate up documents (for when I need to add a field that shouldn't be empty by default to possibly large collections), and would like to have feedback as to whether the patch ran ok, this makes me a bit uneasy. 

Is there any way to figure out if the patch went ok without looking at the logs?
Does the client api return when the patch is done or does it finish in the background?
If it finishes in the background, perhaps we could get a "token" that represents that particular task, so we can periodically check if it's done. 
Or perhaps something like the changes API, where we could subscribe to an event, although for my case I think that might be a bit unwieldy, as I'll be waiting for a particular set of events ("patch ok, N docs patched", "patch not ok, error at X on doc Y. Z docs patched") before proceeding.

Oren Eini (Ayende Rahien)

Oct 27, 2014, 10:32:47 AM
to ravendb

Chris Marisic

Oct 27, 2014, 12:27:48 PM
to rav...@googlegroups.com
Is database.QueryDocumentIds consumable? What does this do when the query is the entire database aka how does it not suck up a billion strings into memory and pwn the server?

Oren Eini (Ayende Rahien)

Oct 27, 2014, 12:59:28 PM
to ravendb
It streams them into memory based on an index snapshot.

Chris Marisic

Oct 27, 2014, 1:25:05 PM
to rav...@googlegroups.com
With the index snapshot being persisted on disk?

Oren Eini (Ayende Rahien)

Oct 27, 2014, 4:04:57 PM
to ravendb

No, it uses the Lucene index snapshot.

Kenneth Truyers

Oct 28, 2014, 8:40:10 AM
to rav...@googlegroups.com
I haven't heard a single response to resolve this issue.

What I'm trying to do is not rocket science, nor is it anything uncommon: I'm trying to add a field with a default value to a collection. I come from SQL, where this amounts to adding a column to a table and is nearly instant.
Given that Raven is not exactly like SQL, I can get that it doesn't run in a transaction but on a snapshot of the index. 

The problem is that patching takes a long time to execute and it fails in the middle (several times, on different collections, with different kinds of patches).

To troubleshoot the failure, the only possible action I heard here is to "check the log files".
But since Raven is not very smart about patches and indexing (it should stop indexing before the patch and resume immediately after IMO), the indexing process just goes insane and starts indexing continuously which turns the log file into a 6GB file. (the file was empty before running the patch)

If you don't tell me what to look for in the log file, how can I possibly find the reason for the error? You can't expect anyone to scour 6GB of text. On top of that, why does adding a field to a set of documents fail at all? You should be interested in that, since it is obviously a bug regardless of how we did it and what our db structure is. (I never had an error adding a column to a table in SQL.)

So, please, stop beating around the bush saying that everything is working fine, because it most certainly is not. If you care about Raven, you should care about a bug and take the opportunity to get the information you need from me to be able to resolve it. Instead the reply is  "No, it is executing very nicely. The only issue is that we aren't showing indication that it is over in the ui." and "Error at some point from the patch Should be able to tell from the log".  The most useful answers here have been from Raven users, not Raven owners.

Chris Marisic

Oct 28, 2014, 8:58:10 AM
to rav...@googlegroups.com
You can disable indexing yourself if that's what you want to do http://ravendb.net/docs/2.5/studio/tasks

The only reason a patch should stop in the middle is an unhandled JavaScript error. Since the documents are JSON, it's possible you have an errant document for some reason and your expectations are not met, such as diving into a nested property that is null and getting an undefined error.

Any time I write patches, I use extremely defensive programming, with if checks on nearly every single line, to prevent an unhandled exception.
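
A small sketch of that defensive style. The patch bodies are written as plain functions and invoked with `call` so that `this` is the document, which imitates how a patch script addresses the document. The `Name` and `MatchName` fields are borrowed from the log excerpt earlier in the thread; the lower-casing rule is an assumption made for illustration.

```javascript
// Naive version: throws a TypeError when Name is null or missing.
function naivePatch() {
  this.MatchName = this.Name.toLowerCase();
}

// Defensive version: checks the shape of the document before touching it.
function defensivePatch() {
  if (typeof this.Name === "string") {
    this.MatchName = this.Name.toLowerCase();
  }
}

const good = { Name: "Irish Music Session" };
const errant = { Name: null }; // an unexpected document

defensivePatch.call(good);
defensivePatch.call(errant); // skipped silently instead of throwing

let naiveThrew = false;
try {
  naivePatch.call({ Name: null });
} catch (err) {
  naiveThrew = true; // this is the kind of error that would halt a patch run
}

console.log(good.MatchName, errant.MatchName, naiveThrew);
```

The trade-off is that a guarded script skips errant documents quietly, so a follow-up query for documents that still lack the change is needed to find them.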

Kenneth Truyers

Oct 28, 2014, 9:02:02 AM
to rav...@googlegroups.com
I understand a failure could happen when you have a complicated script, but this is my script:

this.Active = true;

How could this fail?

About the indexing: shouldn't Raven be smart enough to do that by itself? Since you're applying a patch, it knows documents are going to change, so the indexes will be invalid. There's no point in indexing when you know more documents will change.

Oren Eini (Ayende Rahien)

Oct 28, 2014, 9:59:31 AM
to ravendb
Kenneth,
Your problem is that you are using the UI to do so, and we didn't plan the UI to run such long-running operations, or to report on them very well.
That is why we recommend that such things be called from your code, where you can get the actual operation status as it runs.

And I would be interested in knowing why it failed, but as I have no information at all to go on here, I can't tell you.

Chris Marisic

Oct 28, 2014, 10:29:17 AM
to rav...@googlegroups.com
If it's that trivial you should be able to easily create an isolated reproduction using http://ravendb.net/docs/samples/raven-tests/createraventests

Kenneth Truyers

Oct 30, 2014, 10:19:57 AM
to rav...@googlegroups.com
Thanks, that is an honest and direct answer.

However, if you offer a feature, it's supposed to work on any type of dataset (I don't see any other DB that says you can't update a collection/table when there's too much data). 
The studio is there for trivial tasks so that you don't actually have to write code for these things. If it's not able to do that, it's broken in my opinion.

Bugs happen, I know, but the problem is that I can't possibly e-mail you a 6GB log file, nor did you tell me what to look for.  

Grisha Kotler

Oct 30, 2014, 12:13:37 PM
to rav...@googlegroups.com
Do you have a way to repro this? I would take a look at it.

Hibernating Rhinos Ltd

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647
