Migrating a store onto a new cluster

Tejus Parikh

Mar 12, 2010, 9:13:51 AM
to project-voldemort
There doesn't appear to be an automatic way to handle this, so we
wrote a small Java utility. Our strategy was the following (I can
post real code if desired):

foreach node in nodes
  keys = adminClient.allKeys(node, partitions)

  foreach chunk in keys (20 keys per chunk)
    documents = client.getAll(chunk)
    foreach (key, doc) in documents
      newStoreClient.put(key, doc.getValue())
    end
  end
end

This process is multithreaded, with each thread reading from one node.
It all works and our data did migrate, but our logs were full of
ObsoleteVersionExceptions.

I have two questions. First, is this the best way to migrate data from
one cluster to another? Second, what is causing the
ObsoleteVersionException on the write into the new store?

I've looked through the wiki, list and code and I can't figure out the
sequence that would cause these errors.

We're running Voldemort 0.80 backed by BDB, with Sun's 1.6 JVM on
Linux machines.

Thanks.

ijuma

Mar 13, 2010, 9:36:37 AM
to project-voldemort
On Mar 12, 2:13 pm, Tejus Parikh <tejus.par...@gmail.com> wrote:
> There doesn't appear to be an automatic way to handle this, so we
> wrote a little java utility. Our strategy was the following ( I can
> post real code if desired):
>
> foreach node in nodes
>   keys = adminClient.allKeys(node, partitions)
>
>   foreach (20 key chunk)
>         documents = client.getAll(keys)
>         foreach doc in documents
>           newStoreClient.put(key, doc.getValue())
>         end
>   end
> end
>
> This process is multithreaded, with thread reading from one node.  It
> all works and our data did migrate, but our logs were full of
> ObsoleteVersionExceptions.

One possibility is that you stored the same key/value concurrently
because you are fetching from multiple nodes concurrently. The best
strategy depends on what you want to optimise for. For example, your
current approach means that some values will be fetched multiple
times. A way to avoid that is to get the keys from all the nodes and
to aggregate them in a set, but the naive way may be problematic if
all the keys for a given store are bigger than the memory of the
machine performing the migration.
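
As a rough sketch of the set-based approach described above: the snippet below merges per-node key lists into one set so that each key is fetched and written once, then splits the result into fixed-size chunks for batched reads. It uses plain Java collections only. The class and method names (`KeyDedupSketch`, `uniqueKeys`, `chunks`) are made up for illustration and are not part of Voldemort, and it assumes all keys fit in memory, which is exactly the limitation noted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class KeyDedupSketch {
    /** Merge the key lists read from each node into one duplicate-free set. */
    static Set<String> uniqueKeys(List<List<String>> keysPerNode) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> nodeKeys : keysPerNode) {
            unique.addAll(nodeKeys);
        }
        return unique;
    }

    /** Split the keys into chunks of at most chunkSize, e.g. for batched reads. */
    static List<List<String>> chunks(Set<String> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        List<String> current = new ArrayList<>(chunkSize);
        for (String key : keys) {
            current.add(key);
            if (current.size() == chunkSize) {
                out.add(current);
                current = new ArrayList<>(chunkSize);
            }
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two nodes that both hold replicas of "b" and "c".
        List<List<String>> perNode = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("b", "c", "d"));
        Set<String> unique = uniqueKeys(perNode);
        System.out.println(unique.size() + " unique keys, "
                + chunks(unique, 20).size() + " chunk(s)");
    }
}
```

If the key set is too large for one machine, the same idea still applies per partition or per key range rather than over the whole store.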

Best,
Ismael

Tejus Parikh

Mar 16, 2010, 3:34:20 PM
to project-voldemort
Thanks for the response.

I'm not entirely sure I understand nodes and partitioning then. If I'm
only pulling back the keys from the partitions configured on that
node, is there still the possibility of pulling duplicate keys?

Rob Adams

Mar 17, 2010, 4:02:16 PM
to project-...@googlegroups.com
Also make sure you're not passing a Versioned value to the new store. Pull out the "real" underlying value and send it to the new store stripped of the version information. Otherwise the new store is going to get very confused by the version info from the old store.
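
A minimal sketch of the "strip the version" step, assuming the batched read hands back a map from key to a value-plus-version wrapper. `VersionedValue` here is a hypothetical stand-in written for this example, not the Voldemort class, and `stripVersions` is an illustrative helper name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StripVersionsSketch {
    /** Hypothetical stand-in for a value paired with version metadata. */
    static final class VersionedValue<V> {
        private final V value;
        private final long version;

        VersionedValue(V value, long version) {
            this.value = value;
            this.version = version;
        }

        V getValue() { return value; }
        long getVersion() { return version; }
    }

    /** Keep only the underlying values; the old store's versions are dropped. */
    static <K, V> Map<K, V> stripVersions(Map<K, VersionedValue<V>> fetched) {
        Map<K, V> plain = new LinkedHashMap<>();
        for (Map.Entry<K, VersionedValue<V>> entry : fetched.entrySet()) {
            plain.put(entry.getKey(), entry.getValue().getValue());
        }
        return plain;
    }

    public static void main(String[] args) {
        Map<String, VersionedValue<String>> fetched = new LinkedHashMap<>();
        fetched.put("user:1", new VersionedValue<>("alice", 7L));
        fetched.put("user:2", new VersionedValue<>("bob", 3L));
        // Only the plain values would be handed to the new store's put().
        System.out.println(stripVersions(fetched));
    }
}
```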

Jay Kreps

Mar 17, 2010, 8:46:04 PM
to project-...@googlegroups.com
I think the problem is that there are multiple versions stored on
different machines. When you write the first time it works (after
fetching from the first replica), when you write the second time
(after fetching from the second replica) it has the same version and
so it is rejected as being obsolete (i.e. another write with the same
version instead of a higher version). This should be fine since it was
written the first time. A more efficient migration tool would check
the router and make sure that a put() only occurs when fetching from
the first node in the replication list. An inefficient implementation
would probably just ignore the problem.
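
To make the sequence above concrete, here is a toy model, not Voldemort code. It assumes a target store that rejects any write whose version is not strictly higher than the stored one, and uses a plain integer where Voldemort uses its own version objects. `ToyStore`, `StaleWriteException` and both `migrate...` methods are hypothetical names for this sketch. It compares writing once per replica with writing only when reading from the first node in the key's replication list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReplicaMigrationSketch {
    /** Hypothetical stand-in for the error a store raises on a stale write. */
    static final class StaleWriteException extends RuntimeException {
        StaleWriteException(String message) { super(message); }
    }

    /** Toy target store: accepts a write only if its version is strictly newer. */
    static final class ToyStore {
        private final Map<String, Integer> versions = new HashMap<>();
        private final Map<String, String> values = new HashMap<>();

        void put(String key, String value, int version) {
            Integer current = versions.get(key);
            if (current != null && version <= current) {
                throw new StaleWriteException("stale write for key " + key);
            }
            versions.put(key, version);
            values.put(key, value);
        }

        String get(String key) { return values.get(key); }
    }

    /** Write the key once per replica holding it; returns how many writes were rejected. */
    static int migrateFromEveryReplica(ToyStore target, String key, String value,
                                       int version, List<Integer> replicationList) {
        int rejected = 0;
        for (int node : replicationList) {
            try {
                target.put(key, value, version);
            } catch (StaleWriteException e) {
                rejected++; // same version as the copy already written
            }
        }
        return rejected;
    }

    /** Write the key only when reading from the first node in its replication list. */
    static int migrateFromFirstReplicaOnly(ToyStore target, String key, String value,
                                           int version, List<Integer> replicationList) {
        int rejected = 0;
        int first = replicationList.get(0);
        for (int node : replicationList) {
            if (node != first) {
                continue; // another node is responsible for copying this key
            }
            try {
                target.put(key, value, version);
            } catch (StaleWriteException e) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<Integer> replicas = Arrays.asList(0, 1); // key is stored on nodes 0 and 1
        int naive = migrateFromEveryReplica(new ToyStore(), "k", "v", 1, replicas);
        int routed = migrateFromFirstReplicaOnly(new ToyStore(), "k", "v", 1, replicas);
        System.out.println("rejected writes: naive=" + naive + ", first-replica-only=" + routed);
    }
}
```

In both cases the value ends up in the target store; the first-replica check only avoids the redundant fetches and the rejected writes.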

-Jay

Tejus Parikh

Mar 23, 2010, 9:56:14 AM
to project-...@googlegroups.com
The explanation about multiple versions on different machines makes sense. Since this was a one-time deal, we just ignored it. In case we need to do it again, I'll keep this in mind.

Thanks.
--
Tejus

Jay Kreps

Mar 25, 2010, 2:06:05 AM
to project-...@googlegroups.com
Hi Tejus,

If you get something working and reusable would you be willing to
contribute it back?

-Jay
