Revisions in 4.0 - immutability, revision numbers, explicit/implicit versioning


Bruno Lopes

Jan 16, 2018, 1:04:24 PM
to RavenDB - 2nd generation document database
So, I'm now at the point where I'm really looking at how to move the parts of our app that use revisions.

I've got a couple of questions.

- Are revisions truly immutable?

From the ravendb book, at https://github.com/ravendb/book/blob/v4.0/Ch04/Ch04.md#document-revisions, it's mentioned that "Because revisions are immutable, it isn’t possible to run migration on them, and you need to take that into account. When working with revisions, you might want to consider working with the raw document, rather than turning it into an instance of an object in your model."

This is a really harsh constraint if what I want is to keep versioning that's not as strict as "regulatory". 
We prefer migrating data to keeping code around that handles every old entity version for all eternity.

If it's not that strict, then I think the sentence in the book should be rewritten to something like "consider revisions immutable, and we advise working with the raw document. If you prefer to migrate old revisions, do X, but remember that it is a destructive operation and might lose data which should be immutable"

In 3.5 we had a helper in migrations which disabled the read-only flag on revisions, changed them, and re-set the flag. It's only used in migrations.
We might also need to disable revisions during patches, since we might use patches to migrate data.

(Small aside: we might look into just using RQL for migrations with the UPDATE clause, since that brings immutability by definition to migration scripts, but that's unrelated to revisions)

Some other notes:

- there's no longer a "Raven-Document-Revision". 

We used this to reference "reference data" like "this blogpost uses revision X of the blog post form", and to have a simple ordinal version number to show the end user (this is revision X of the document). I'm going to use blogposts here just as a stand-in for our entities.

For reference data, it means that we have fields like "FormId:BlogPostForms/42/revision/2" and ids like "user/28/flag/BlogPost/27/revision/2". 
We use it to always load the form which the post was edited in, and to be able to do Load instead of Query when checking some user actions (like, has this user reported this revision of the blogpost).

If we want to use the whole vector, we might need to always encode it before adding to the id, and decode when loading the revision, since the vector can include / (which is the default separator for ids):
- A:306-rwmS7Sk9ukaK8J/OOc2tiw
- A:367-OCGd+6C94Eu2Ngj0VZnJFg
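
A minimal sketch of the encode/decode idea, in Python purely for illustration. The function names and the id layout are hypothetical (modelled on the "user/28/flag/BlogPost/27/revision/2" ids above), and nothing here calls a RavenDB API. It uses URL-safe base64 so the encoded vector contains none of `/`, `+` or `:`.

```python
import base64


def encode_change_vector(change_vector: str) -> str:
    """Return a token for the change vector that contains no '/', '+' or ':'."""
    raw = change_vector.encode("utf-8")
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    # The '=' padding is stripped and restored on decode.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_change_vector(token: str) -> str:
    """Reverse encode_change_vector, restoring the stripped padding first."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


def revision_flag_id(user_id: str, doc_id: str, change_vector: str) -> str:
    """Hypothetical id layout: <user>/flag/<doc>/revision/<encoded vector>."""
    return f"{user_id}/flag/{doc_id}/revision/{encode_change_vector(change_vector)}"
```

Because the token is free of the id separator, it can sit anywhere in the id, not only at the end. The cost is that the ids are no longer readable by eye.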

The other option would be to take care to only append the vector at the end of the id (meaning, no BlogPost/24/A:367-OCGd+6C94Eu2Ngj0VZnJFg/flags).

To get the same functionality for references, we'd use the change vector instead of the revision number? Could we just use the node name + numeric part instead of the whole vector with (what I read as) the database id?
And if we want to have a simple "revision number" it's on us to implement?

- There's no way to explicitly version docs.

Some changes are not really important enough to create new versions, so for some entities we just versioned "major changes". 
This is gone for good?

Oren Eini (Ayende Rahien)

Jan 16, 2018, 1:50:11 PM
to ravendb
inline

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


On Tue, Jan 16, 2018 at 8:04 PM, Bruno Lopes <bruno...@gmail.com> wrote:
So, I'm now at the point where I'm really looking at how to move the parts of our app that use revisions.

I've got a couple of questions.

- Are revisions truly immutable?

From the ravendb book, at https://github.com/ravendb/book/blob/v4.0/Ch04/Ch04.md#document-revisions, it's mentioned that "Because revisions are immutable, it isn’t possible to run migration on them, and you need to take that into account. When working with revisions, you might want to consider working with the raw document, rather than turning it into an instance of an object in your model."

This is a really harsh constraint if what I want is to keep versioning that's not as strict as "regulatory". 
We prefer migrating data to keeping code around that handles every old entity version for all eternity.

There is no way to run a migration on revisions. What you can do is an export / import, which gives you the option to run a script that does the migration.
 

If it's not that strict, then I think the sentence on the book needs to be rewritten to something like "consider revisions as immutable, and we advise to work with the raw document. If you prefer to migrate old revisions, do X, but remember that's a destructive operation and might lose data which should be immutable"

In 3.5 we had a helper in migrations which disabled the read-only flag on revisions, changed them, and re-set the flag. It's only used in migrations.
We might also need to disable revisions during patches, since we might use patches to migrate data.


As this is not done via a bundle, that wouldn't work.
The problem is that if you change the revision, you'll change its change vector, and thus lose the revision itself.
 
(Small aside: we might look into just using RQL for migrations with the UPDATE clause, since that brings immutability by definition to migration scripts, but that's unrelated to revisions)

Some other notes:

- there's no longer a "Raven-Document-Revision". 

We used this to reference "reference data" like "this blogpost uses revision X of the blog post form", and to have a simple ordinal version number to show the end user (this is revision X of the document). I'm going to use blogposts here just as a stand-in for our entities.

You can always load a revision by its change vector. That uniquely identifies it in the cluster.
 

For reference data, it means that we have fields like "FormId:BlogPostForms/42/revision/2" and ids like "user/28/flag/BlogPost/27/revision/2". 
We use it to always load the form which the post was edited in, and to be able to do Load instead of Query when checking some user actions (like, has this user reported this revision of the blogpost).

If we want to use the whole vector, we might need to always encode it before adding to the id, and decode when loading the revision, since the vector can include / (which is the default separator for ids):
- A:306-rwmS7Sk9ukaK8J/OOc2tiw
- A:367-OCGd+6C94Eu2Ngj0VZnJFg


You would need the whole vector, but note that you don't actually need the id there. Just the change vector is enough.
 
The other option would be to take care to only append the vector at the end of the id (meaning, no BlogPost/24/A:367-OCGd+6C94Eu2Ngj0VZnJFg/flags).

To get the same functionality for references, we'd use the change vector instead of the revision number? Could we just use the node name + numeric part instead of the whole vector with (what I read as) the database id?

No, in certain topologies, you may have multiple nodes with the same tag (cross cluster communication, mostly)
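
To illustrate why the shortened form is not enough, here is a small Python sketch. It assumes a single change vector entry has the shape `TAG:NUMBER-DATABASEID`, as in the examples quoted above; the parser, the helper names and the second vector are made up for the illustration and are not part of any RavenDB client.

```python
from typing import NamedTuple


class ChangeVectorEntry(NamedTuple):
    node_tag: str
    etag: int
    database_id: str


def parse_entry(entry: str) -> ChangeVectorEntry:
    """Split one 'TAG:NUMBER-DATABASEID' entry into its parts (assumed layout)."""
    node_tag, rest = entry.split(":", 1)
    # Split on the first '-' only; everything after it is the database id.
    etag, _, database_id = rest.partition("-")
    return ChangeVectorEntry(node_tag, int(etag), database_id)


def short_form(entry: str) -> str:
    """Node tag + numeric part only, i.e. the shortened reference being asked about."""
    parsed = parse_entry(entry)
    return f"{parsed.node_tag}:{parsed.etag}"


# Two hypothetical revisions written in different databases whose nodes share tag "A".
first = "A:306-rwmS7Sk9ukaK8J/OOc2tiw"
second = "A:306-OCGd+6C94Eu2Ngj0VZnJFg"

assert first != second                           # the full vectors differ
assert short_form(first) == short_form(second)   # the shortened forms collide
```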
 
And if we want to have a simple "revision number" it's on us to implement?

Yes
 
- There's no way to explicitly version docs.

Some changes are not really important enough to create new versions, so for some entities we just versioned "major changes". 

Yes
 
This is gone for good?

We are still open to (post RTM) changes, and we should probably open some issues to discuss these features.
 

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bruno Lopes

Jan 17, 2018, 7:20:13 AM
to ravendb
First of all, I just want to say thanks for the support and putting up with all these emails and requests.
I appreciate you guys taking the time to read and answer both the questions and the occasional rant :)

(inline)

On Tue, Jan 16, 2018 at 6:49 PM, Oren Eini (Ayende Rahien) <aye...@ayende.com> wrote:
On Tue, Jan 16, 2018 at 8:04 PM, Bruno Lopes <bruno...@gmail.com> wrote:
In 3.5 we had a helper in migrations which disabled the read-only flag on revisions, changed them, and re-set the flag. It's only used in migrations.
We might also need to disable revisions during patches, since we might use patches to migrate data.


As this is not done via a bundle, that wouldn't work.
The problem is that if you change the revision, you'll change its change vector, and thus lose the revision itself.

Eeekk. Okay, that makes sense given the implementation, but I'm really not comfortable with the implications for our use case.
 
 
For reference data, it means that we have fields like "FormId:BlogPostForms/42/revision/2" and ids like "user/28/flag/BlogPost/27/revision/2". 
We use it to always load the form which the post was edited in, and to be able to do Load instead of Query when checking some user actions (like, has this user reported this revision of the blogpost).

If we want to use the whole vector, we might need to always encode it before adding to the id, and decode when loading the revision, since the vector can include / (which is the default separator for ids):
- A:306-rwmS7Sk9ukaK8J/OOc2tiw
- A:367-OCGd+6C94Eu2Ngj0VZnJFg


You would need the whole vector, but note that you don't actually need the id there. Just the change vector is enough.

Okay, so the vector works as an "alternate id" for a document, right?

 
 
 
- There's no way to explicitly version docs.

Some changes are not really important enough to create new versions, so for some entities we just versioned "major changes". 

Yes
 
This is gone for good?

We are still open to (post RTM) changes, and we should probably open some issues to discuss these features.

Okay. I think we'll have to implement our version of revisions, mostly due to the hard immutability rules around them.

- I want to be able to migrate old revisions (so docs should be soft "immutable")
- I want a "revision number" for a given document
- I'd like to be able to refer to a revision by an id that's easily composable
- I'd like to be able to explicitly version documents

Spitballing, we might be able to do this client-side with a store listener:

- On store, if a doc has changed:
  - bump revision
  - copy it into another document with an id composed with the revision
  - mark the other document as read-only (from what I can gather there's no equivalent to 3.5's Read-Only metadata flag?)
  - change the other document's collection to something like BlogPost@Revision

This would probably have to be implemented on the 3.5 version, so that we migrate everything to "our-revision-system", and then move it to 4.0.
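
The on-store steps listed above could look roughly like the following Python sketch. It is a toy in-memory stand-in and not a RavenDB listener: the `RevisioningStore` class, the `read-only` metadata key and the `/revision/N` id suffix are all assumptions made for the illustration. The `migrate_revision` method shows the "soft immutable" part, where migrations are allowed to bypass the flag.

```python
import copy


class ReadOnlyError(Exception):
    """Raised when regular code tries to overwrite a stored revision copy."""


class RevisioningStore:
    """Toy in-memory document store that keeps its own revision copies."""

    def __init__(self):
        self.docs = {}  # id -> {"data": dict, "metadata": dict}

    def store(self, doc_id, data, collection):
        current = self.docs.get(doc_id)
        if current is not None and current["metadata"].get("read-only"):
            raise ReadOnlyError(doc_id)
        if current is not None and current["data"] == data:
            return current["metadata"]["revision"]  # unchanged: no new revision

        # Bump the revision number (1 for a brand new document).
        revision = 1 if current is None else current["metadata"]["revision"] + 1
        self.docs[doc_id] = {
            "data": copy.deepcopy(data),
            "metadata": {"collection": collection, "revision": revision},
        }
        # Copy into a second document whose id is composed with the revision,
        # flagged read-only and moved to a separate "<collection>@Revision" collection.
        self.docs[f"{doc_id}/revision/{revision}"] = {
            "data": copy.deepcopy(data),
            "metadata": {
                "collection": f"{collection}@Revision",
                "revision": revision,
                "read-only": True,
            },
        }
        return revision

    def migrate_revision(self, revision_id, transform):
        """Deliberately bypass the read-only flag; intended for migrations only."""
        doc = self.docs[revision_id]
        doc["data"] = transform(copy.deepcopy(doc["data"]))
```

With this shape a specific revision can be fetched by a plain id lookup such as `store.docs["BlogPost/27/revision/2"]`, which is the Load-instead-of-Query property mentioned earlier in the thread.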



Oren Eini (Ayende Rahien)

Jan 17, 2018, 7:41:19 AM
to ravendb
inline

On Wed, Jan 17, 2018 at 2:19 PM, Bruno Lopes <bruno...@gmail.com> wrote:
First of all, I just want to say thanks for the support and putting up with all these emails and requests.
I appreciate you guys taking the time to read and answer both the questions and the occasional rant :)

I think that talking about this gives both us and the people reading this a lot of insight into how the product works and how it is actually being used.

You would need the whole vector, but note that you don't actually need the id there. Just the change vector is enough.
Okay, so the vector works as an "alternate id" for a document, right?

Yes


We are still open to (post RTM) changes, and we should probably open some issues to discuss these features.
Okay. I think we'll have to implement our version of revisions, mostly due to the hard immutability rules around them.
- I want to be able to migrate old revisions (so docs should be soft "immutable")
- I want a "revision number" for a given document

Note that this is really hard to do in a cluster. What happens if you have concurrent updates on different nodes?
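
A tiny Python sketch of the problem, using made-up document contents: two nodes start from the same replicated state and each bumps a locally stored revision counter without coordinating with the other.

```python
import copy

# Both nodes start from the same replicated state: revision 3 of the document.
replicated = {"id": "BlogPost/27", "revision": 3, "title": "Hello"}


def update_on_node(doc, new_title):
    """Apply an edit and bump the counter locally, with no cross-node coordination."""
    updated = copy.deepcopy(doc)
    updated["title"] = new_title
    updated["revision"] += 1
    return updated


on_node_a = update_on_node(replicated, "Hello from A")
on_node_b = update_on_node(replicated, "Hello from B")

# Same revision number, different content: "revision 4" is now ambiguous.
assert on_node_a["revision"] == on_node_b["revision"] == 4
assert on_node_a["title"] != on_node_b["title"]
```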
 



Bruno Lopes

Jan 17, 2018, 7:48:43 AM
to ravendb
We'd get conflicts? Would this be any different from updating a document on two different nodes?

Not having a revision number will be very odd from an end user perspective.
I think we'd be willing to sacrifice availability and/or a bit of performance on create/update to get this number.

Oren Eini (Ayende Rahien)

Jan 17, 2018, 7:57:53 AM
to ravendb
Yes, you might get a conflict, just like regular documents. But how would you solve a revisions conflict in a consistent manner?


Oren Eini (Ayende Rahien)

Jan 17, 2018, 7:58:15 AM
to ravendb
If you are willing to sacrifice availability, then use the identity feature, which will ensure no conflicts on the revision number.
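
The trade-off can be sketched in Python as follows. This is not the RavenDB identity API; `RevisionNumberAuthority` is a hypothetical stand-in for any single point that every writer must ask for the next number. Going through one authority is what rules out duplicates, and it is also why writers cannot proceed when that authority is unreachable.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class RevisionNumberAuthority:
    """Hypothetical single authority handing out revision numbers per document id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = defaultdict(int)

    def next_revision(self, doc_id: str) -> int:
        # Every writer goes through this one lock, so no number is issued twice.
        with self._lock:
            self._last[doc_id] += 1
            return self._last[doc_id]


authority = RevisionNumberAuthority()
with ThreadPoolExecutor(max_workers=8) as pool:
    numbers = list(
        pool.map(lambda _: authority.next_revision("BlogPost/27"), range(100))
    )

# 100 concurrent requests yield exactly the numbers 1..100, each once.
assert sorted(numbers) == list(range(1, 101))
```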


Bruno Lopes

Jan 19, 2018, 3:59:44 AM
to ravendb
My expectation is that conflicts would be rare enough to be resolved manually in our case, if we ever move to a clustered deployment.
I'll look into identity, thanks.
