ReadRepair for conflicting VectorClocks


Nidhin

Jan 27, 2015, 5:48:50 PM
to project-...@googlegroups.com
Hi,

I am trying to understand how the ReadRepair works.

I assume ReadRepair uses ReadRepairer to generate the list of NodeValues that need to be updated or repaired.

Can someone please let me know how the following scenario would get resolved?
Suppose I have two nodes with the following NodeValues:
[NodeValue(id=1, key=1, versioned= [1, version(1:2) ts:1422398249270]), NodeValue(id=2, key=1, versioned= [1, version(1:1, 2:1) ts:1422398252594])]

After ReadRepairer runs, the output NodeValues would be as follows:
[NodeValue(id=1, key=1, versioned= [1, version(1:1, 2:1) ts:1422398252594]), NodeValue(id=2, key=1, versioned= [1, version(1:2) ts:1422398249270])]


Now, ReadRepair would send two PUT requests:
  1. To Node_1 with the version and value details of Node_2
  2. To Node_2 with the version and value details of Node_1
Is this correct?
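
To see why neither version wins in this scenario, it helps to compare the two clocks entry by entry. The sketch below is a minimal Python model, not Voldemort code: the `compare` function and the dict representation (node id mapped to counter) are assumptions made for illustration.

```python
def compare(a, b):
    """Compare two vector clocks given as {node_id: counter} dicts.

    Returns "before", "after", "equal" or "concurrent".
    A missing entry is treated as a counter of 0.
    """
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither clock descends from the other
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


# Clocks from the scenario: version(1:2) on node 1, version(1:1, 2:1) on node 2.
node1_clock = {1: 2}
node2_clock = {1: 1, 2: 1}
print(compare(node1_clock, node2_clock))  # concurrent
```

Node 1's clock is ahead on entry 1 and node 2's clock is ahead on entry 2, so the two versions are concurrent and neither obsoletes the other.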

Thanks & Regards,
-Nidhin



Chinmay Soman

Jan 28, 2015, 1:01:40 PM
to project-...@googlegroups.com
Hey Nidhin,

That sounds right. Essentially, the ReadRepairer generates the set of all conflicting versions (if any), then writes only the missing versioned values to the corresponding nodes.
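
The behavior described here can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the actual ReadRepairer implementation: `compare`, `read_repair`, and the `(node_id, clock, value)` tuples are hypothetical names and shapes chosen for illustration.

```python
def compare(a, b):
    """Compare vector clocks given as {node_id: counter} dicts."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def read_repair(node_values):
    """Return the (node_id, clock, value) writes needed to repair the nodes.

    node_values is a list of (node_id, clock, value) tuples from one read.
    """
    all_clocks = [clock for _, clock, _ in node_values]

    # Step 1: keep every version that no other version obsoletes.
    latest = []
    for _, clock, value in node_values:
        obsolete = any(compare(clock, other) == "before" for other in all_clocks)
        if not obsolete and (clock, value) not in latest:
            latest.append((clock, value))

    # Step 2: send each node only the surviving versions it does not hold.
    repairs = []
    for node_id in sorted({nid for nid, _, _ in node_values}):
        held = [clock for nid, clock, _ in node_values if nid == node_id]
        for clock, value in latest:
            if clock not in held:
                repairs.append((node_id, clock, value))
    return repairs


# The scenario from the question: two concurrent versions on two nodes.
observed = [(1, {1: 2}, "1"), (2, {1: 1, 2: 1}, "1")]
for node_id, clock, value in read_repair(observed):
    print("PUT to node", node_id, "clock", clock, "value", value)
```

In this model each node receives the version it lacks, so after the repair both nodes hold both concurrent versions rather than having swapped them. A node holding only a stale version would instead receive just the newer one.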

FYI: this code was horribly broken before, so make sure you're on the latest stable version. To check, search your git log for this commit:

be4dbc00921474334371266ff18856dc9ec84bee

Hope this helps!

Nidhin

Jan 28, 2015, 1:23:11 PM
to project-...@googlegroups.com
Hi Chinmay,

Thanks for the update.

But in the scenario I mentioned, are we not corrupting the data instead of repairing it?
Because of these two conflicting versions, Node1 is updated with Node2's value and
Node2 is updated with Node1's value. The data is only getting swapped.

Regards,
-Nidhin

Chinmay Soman

Jan 28, 2015, 1:57:15 PM
to project-...@googlegroups.com
No, you're not corrupting the data. All the ReadRepairer does is make sure all the nodes see the same set of Versioned values for a given key. It is up to the client to resolve the inconsistency (if any).

You could argue that the ReadRepairer itself could resolve the inconsistency, but this becomes complicated, especially when there's a custom inconsistency resolver in play.

But maybe I'm missing something. Can you elaborate on what else you expect the system to do?


Arunachalam

Jan 28, 2015, 2:36:02 PM
to project-...@googlegroups.com
Nidhin,
To add to Chinmay's point: when there is a version conflict, you need a resolver. The resolver takes in more than one version of the data and returns a new version. The application can implement its own resolver; Voldemort provides a few resolvers by default (timestamp based). Once the version is resolved, the client writes the resolved value back to the nodes.
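
A resolver of the kind described here might look roughly like the sketch below: drop versions that another version obsoletes, then break any remaining tie by timestamp. This is a Python illustration under stated assumptions, not Voldemort's resolver code: `compare`, `resolve`, and the `(clock, timestamp_ms, value)` tuples are hypothetical.

```python
def compare(a, b):
    """Compare vector clocks given as {node_id: counter} dicts."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def resolve(versions):
    """Pick one winner from a list of (clock, timestamp_ms, value) tuples."""
    # Step 1 (vector clock): discard versions obsoleted by another version.
    survivors = [
        v for v in versions
        if not any(compare(v[0], other[0]) == "before" for other in versions)
    ]
    # Step 2 (timestamp): among concurrent survivors, the latest write wins.
    return max(survivors, key=lambda v: v[1])


# The two concurrent versions from the original question.
siblings = [
    ({1: 2}, 1422398249270, "1"),
    ({1: 1, 2: 1}, 1422398252594, "1"),
]
winner_clock, winner_ts, winner_value = resolve(siblings)
print(winner_clock, winner_ts, winner_value)
```

For the two versions in the question the clocks are concurrent, so the timestamp decides and the version with `ts:1422398252594` wins. An application with different semantics (for example, merging shopping-cart contents) would replace step 2 with its own logic.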

Thanks,
Arun.


Nidhin

Jan 28, 2015, 5:06:15 PM
to project-...@googlegroups.com
I got the point.

But here ReadRepair uses logic similar to VectorClockInconsistencyResolver to repair the nodes. The repair is not complete if different nodes hold conflicting versions: even after the repair, some nodes would still differ in their data.

Wouldn't it be better if ReadRepair used a combination of the logic from VectorClockInconsistencyResolver and TimeBasedInconsistencyResolver to repair node values?
I think that is the default behavior of DefaultStoreClient if we don't specify a secondary resolver.
This would ensure that after a ReadRepair all the nodes have the same data.

Please let me know your inputs.

Thanks,
-Nidhin

Arunachalam

Jan 28, 2015, 8:23:49 PM
to project-...@googlegroups.com
I don't completely follow your question, but Voldemort is an eventually consistent system. If we still end up with conflicting versions, the next read will try to fix them when it sees them, and so on.

Most of the system-provided resolvers are for experimental purposes. If you want a specific resolver, you can implement one and use it to create the store client.

Thanks,
Arun.

Nidhin

Jan 29, 2015, 6:17:38 PM
to project-...@googlegroups.com
Thanks Arun and Chinmay for your inputs,

I understand now.

Since the database supports multiple entries for a key, conflicting versions are stored in the database for a given key during a ReadRepair.
During the GET call, the InconsistencyResolver resolves this conflict on the client side.
For any future PUT operation, these multiple versions would get resolved, since the client does a GET before each PUT request to learn the version details.
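
One way to picture why a later PUT can clear the siblings: if the new write carries a clock that descends from every version seen on the preceding GET, it obsoletes all of them. The sketch below assumes the new clock is the entry-wise maximum of the seen clocks plus one increment for the writing node. That construction is an assumption for illustration; this thread does not confirm how Voldemort's client actually builds the clock, and `compare` and `merged_successor` are hypothetical names.

```python
def compare(a, b):
    """Compare vector clocks given as {node_id: counter} dicts."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merged_successor(seen_clocks, writer_node):
    """Build a clock that descends from every clock in seen_clocks.

    Assumption: take the per-node maximum, then increment the writer's entry.
    """
    merged = {}
    for clock in seen_clocks:
        for node, counter in clock.items():
            merged[node] = max(merged.get(node, 0), counter)
    merged[writer_node] = merged.get(writer_node, 0) + 1
    return merged


# Both siblings returned by the GET that precedes the PUT.
seen = [{1: 2}, {1: 1, 2: 1}]
new_clock = merged_successor(seen, writer_node=1)
print(new_clock)
print([compare(new_clock, old) for old in seen])
```

Under this assumption the new clock compares as "after" both siblings, so a node that stores the new version can discard the two conflicting ones.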

Regards,
-Nidhin