offline sync


Aaron Boxer

Feb 9, 2011, 7:27:52 AM
to redi...@googlegroups.com
I had an idea: git is designed for fast, offline synchronization. Now
that Redis is getting diskstore, why not add a git layer on top of
diskstore? Every write to Redis would trigger a git commit, so you
would be trading some performance for a very powerful sync solution.
And, with a lot more work, this could also cover the b-tree and
in-memory configurations, gaining back some of that performance.

Salvatore Sanfilippo

Feb 9, 2011, 7:31:10 AM
to redi...@googlegroups.com
Just to add a bit of context about all this: we simply don't have
enough developer bandwidth to solve this problem now. Cluster and
diskstore are our focus currently... and I think that most users are
more concerned with those than with offline sync :)

So it's a cool idea, but not a good fit for Redis, at least in the
short (one year) timeframe.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotle

Demis Bellot

Feb 9, 2011, 7:40:52 AM
to redi...@googlegroups.com
IMHO, what will allow other developers to implement their own custom logic on top of Redis is the existence of a trigger mechanism. I tried to explore this feature a while ago at:

Obviously I don't care what the mechanism looks like, only that it supports the use cases that need it.

Salvatore, I understand that your bandwidth is very limited atm, but the ability to have a trigger would allow client / Redis community developers to provide enhanced / higher-level functionality.

D

Salvatore Sanfilippo

Feb 9, 2011, 7:46:20 AM
to redi...@googlegroups.com
On Wed, Feb 9, 2011 at 1:40 PM, Demis Bellot <demis....@gmail.com> wrote:
> Salvatore, I understand that your bandwidth is very limited atm, but the
> ability to have a trigger would allow client / Redis community developers
> to provide enhanced / higher-level functionality.

I don't like the idea of triggers, for a number of reasons... it's too
stateful for a database IMHO.
But what I do like is the idea of publishing changes in the key space
via our built-in Pub/Sub, and that's pretty trivial to accomplish.

We can add this to Redis unstable for sure. The idea is that you can
subscribe to keys for changes via the usual SUBSCRIBE, or even
PSUBSCRIBE. All the changes are PUBLISH-ed as the exact command and
arguments that modified the key.

Additional channels should provide information about key expiration.
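
In redis-py terms, a consumer of such a feed might look like the
sketch below. The channel pattern is hypothetical (nothing like this
exists in Redis yet), so treat it as an illustration of the idea, not
an API:

import redis

r = redis.Redis()
p = r.pubsub()
p.psubscribe('key.*')   # hypothetical keyspace-change channel pattern

for message in p.listen():
    if message['type'] == 'pmessage':
        # The payload would be the exact modifying command and its
        # arguments, per the description above.
        print(message['channel'], message['data'])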

Cheers,
Salvatore

Aaron Boxer

Feb 9, 2011, 7:51:08 AM
to redi...@googlegroups.com
I was thinking of working on this myself, being a git fanatic. :)

I certainly don't want to delay the advent of Cluster.

Demis Bellot

Feb 9, 2011, 7:52:27 AM
to redi...@googlegroups.com
Hi Salvatore,

Would this 'subscription' remain after the client's connection is closed?

IMHO it needs to, in order to ensure no messages are lost.

D



Salvatore Sanfilippo

Feb 9, 2011, 8:37:45 AM
to redi...@googlegroups.com
On Wed, Feb 9, 2011 at 1:52 PM, Demis Bellot <demis....@gmail.com> wrote:
> Hi Salvatore,
> Would this 'subscription' remain after the client's connection is closed?
> IMHO it needs to, in order to ensure no messages are lost.

Pub/Sub is fire and forget; such a facility should use Pub/Sub as it
is, so it would not be persistent, and messages can indeed be lost.
I think that if you need a reliable way to do this, it's better to use
the AOF as a stream of data, which is guaranteed to have everything
inside, but the AOF is currently of limited use since the file would
grow continuously.
This may be solved once we have an AOF that splits itself into parts.

I think the way to go about it is a bit different: use MULTI/EXEC to
perform the operation and also queue that operation into a list, which
is then replayed once there is a connection with the master database
again. There is no need for this to transparently intercept calls from
applications that are not aware of what is happening under the hood:
to make it work well you need help from the application anyway,
otherwise merging in a meaningful way is more or less impossible.
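
A rough sketch of the replay side in Python with redis-py, assuming
each offline write was queued (inside the same MULTI/EXEC) as a small
JSON record into a local list; the queue name, host names and record
layout are all illustrative:

import json
import redis

local = redis.Redis(host='localhost')           # the offline node
master = redis.Redis(host='master.example.com')  # hypothetical master

while True:
    raw = local.rpop('dataset.changes')  # LPUSH to queue, RPOP to replay: FIFO
    if raw is None:
        break                            # queue drained
    change = json.loads(raw)
    master.execute_command(change['op'], change['key'], change['newval'])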

Demis Bellot

Feb 9, 2011, 9:09:52 AM
to redi...@googlegroups.com
Hi Salvatore,

One of the benefits of Pub/Sub is the loose coupling, which is the reason why it's a popular Enterprise Integration Pattern, i.e. you can publish a message to a key/topic without knowing who/which client is listening to the messages.

If the requirement is to pass the burden of maintaining state back to the client (publishing the message), then this defeats the purpose. If there's a chance of message loss then it can't be considered reliable and will not support the use cases that require this. Tailing the AOF is not going to be a popular solution since it's harder to achieve, requires file system access, and does not have a central server handling concurrency and making the stream available.

Not wanting to affect the spirit of Redis to achieve this, but having a copy of the messages automatically added to a destination LIST/SET is IMHO an elegant solution to the problem.

D





Salvatore Sanfilippo

Feb 9, 2011, 9:19:14 AM
to redi...@googlegroups.com
On Wed, Feb 9, 2011 at 3:09 PM, Demis Bellot <demis....@gmail.com> wrote:
> Hi Salvatore,
> One of the benefits of Pub/Sub is the loose coupling, which is the reason
> why it's a popular Enterprise Integration Pattern. i.e. you can publish a
> message to a key/topic without knowing who/which client is listening to the
> messages.
> If the requirement is to pass the burden of maintaining state back to the
> client (publishing the message), then this defeats the purpose. If there's a
> chance of message loss then it can't be considered reliable and will not
> support the use cases that require this.

I think it depends on what you want to achieve. Pub/Sub in Redis is
designed to be fire and forget, however you may want to attach a
listener to key state changes in order to perform operations that are
not a problem even if you don't get all the messages.

Btw it's not clear whether this is really convenient or not, nor how
to do it, which is a reason why I did not go forward with this so far.

Redis can't work for all the use cases. I think that we can't force
Pub/Sub to be what it is not, nor can we force Redis to be what it is
not by trying to implement offline sync as an external tool. This is
the kind of feature that requires a lot of work to get right and that
IMHO can't work very well as an external component.
Especially since it is not a priority, and with the Redis data model
in general, having different nodes getting queries for the same key,
and then merging the two versions, is harder than usual, as we have
complex data types.

So at this stage, why throw in complexity, and possibly open the
window to bad design, by rushing to implement something that is mostly
out of scope to start with?

> Not wanting to affect the spirit of Redis to achieve this, but having a copy
> of the messages automatically added to a destination LIST/SET is IMHO an
> elegant solution to the problem.

What's wrong with MULTI/EXEC in order to LPUSH the change? If you want
to build such a complex system you can for sure have a wrapper between
Redis and the client.

So when you do Redis.set("foo","bar"), the lib actually does:

MULTI
SET foo bar
LPUSH dataset.changes <json for: op:set key:foo newval:bar>
EXEC
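
In redis-py terms such a wrapper might be sketched as below; the queue
name and JSON envelope are just the ones from the example above, not a
fixed convention:

import json
import redis

r = redis.Redis()

def tracked_set(key, value, queue='dataset.changes'):
    # A transactional pipeline issues MULTI ... EXEC, so the write and
    # its change record are committed together or not at all.
    pipe = r.pipeline(transaction=True)
    pipe.set(key, value)
    pipe.lpush(queue, json.dumps({'op': 'set', 'key': key, 'newval': value}))
    pipe.execute()

tracked_set('foo', 'bar')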

Adding an automatism for this makes Redis more complex without any
real gain, I think.

Cheers,
Salvatore

Demis Bellot

Feb 9, 2011, 9:37:17 AM
to redi...@googlegroups.com
> Redis can't work for all the use cases. I think that we can't force
> Pub/Sub to be what it is not, nor can we force Redis to be what it is
> not by trying to implement offline sync as an external tool.

I understand this, although there is no point trying to build anything on the brittle solution of having the client application provide this functionality of replaying the message back into Redis; this really should be in the broker/redis-server, if at all. Having the ability to register 'permanent subscriptions' really is a benefit to all clients that need this functionality.

Imagine an existing system, already developed and deployed in production, that for whatever reason can't be changed (a characteristic of many production systems). Suppose a new requirement is that for every new user you have to send an email, register the entry in an external analytics/reporting database, add it to the users search index, etc.

If you had the following command available:

PSUBSCRIBESET    urn:customer:*    unique-modified-customer-keys-set-name

Where, even after the client closes its connection, the target set/list gets a copy of all the keys that have been modified by other clients. This then becomes fairly trivial to consume, even if the production system was developed using a different client library, programming platform, etc.
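
A consumer of that set might then be as simple as the sketch below (assuming the hypothetical command above filled it with key names, and that the customer values are plain strings):

import redis

r = redis.Redis()

# SPOP is atomic, so several workers could drain the same set safely.
while True:
    key = r.spop('unique-modified-customer-keys-set-name')
    if key is None:
        break
    value = r.get(key)  # re-read the current value of the modified key
    # ... send the email, update the analytics DB, index for search, etc.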

Anyway, to summarize: if this is not a good fit for Redis then it's simply not fit for Redis. It's not a blocker for usage; it just means Redis won't be able to support these use cases.

D

Josiah Carlson

Feb 9, 2011, 1:55:40 PM
to redi...@googlegroups.com
In a word: no.

In more words: Git is about keeping a history of changes to files forever. The only way you lose history is by deleting an unmerged branch, or by forcing a replacement of history (the latter of which is generally considered a bad idea). And even then, you have to garbage collect before you get your disk space back.

Git was certainly designed to be fast, relative to how quickly a distributed group of humans can write and maintain code. It is not fast enough to handle the 10,000-times-faster rate that a remote service like Redis generally requires.

If someone wants to sync diskstore, a quick zfs/btrfs snapshot + rsync + snapshot removal is the right thing to do.
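
As a rough illustration of that recipe (the paths, the choice of btrfs, and the remote destination are all assumptions):

import subprocess

# Freeze the diskstore directory, copy the frozen snapshot, drop it.
# btrfs syntax shown; zfs would be analogous. Assumes /data/redis is a
# btrfs subvolume.
subprocess.check_call(['btrfs', 'subvolume', 'snapshot', '-r',
                       '/data/redis', '/data/redis-snap'])
subprocess.check_call(['rsync', '-a', '/data/redis-snap/',
                       'backup-host:/srv/redis-copy/'])
subprocess.check_call(['btrfs', 'subvolume', 'delete', '/data/redis-snap'])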

Regards,
 - Josiah

Aaron Boxer

Feb 9, 2011, 2:11:06 PM
to redi...@googlegroups.com
Thanks, Josiah. Appreciate the feedback.

From what I understand, because git uses compression, the size of an
entire git repo, even with a large history, can still be manageable.

I think for offline sync, you can forgo a lot of speed in exchange for
the new features, primarily the ability to resolve multi-master
conflicts.

Here is a quote from Planet CouchDB, referring to the Membase merger:

"At CouchOne we've been focusing on very different problems: mobile,
sync and offline use cases. We make it easy to build applications that
travel with you, allowing you access to your important data no matter
the network conditions. Slow and unreliable connectivity means many
businesses can't rely on the cloud for mission critical apps, all
their data is gone when their network is down. But with Couch powered
apps on your phone, tablet, putting data directly on the machines at
the edge of the network, you have your apps and data with you at all
times and safely backed up to the cloud."


So, IMHO, it's worth looking into.

And sorry for getting so OT on this Redis group!

Cheers,
Aaron

Aaron Boxer

Feb 9, 2011, 2:13:20 PM
to redi...@googlegroups.com
Here's the article link:

http://planet.couchdb.org/


Josiah Carlson

Feb 9, 2011, 3:11:11 PM
to redi...@googlegroups.com
On Wed, Feb 9, 2011 at 11:11 AM, Aaron Boxer <box...@gmail.com> wrote:
> Thanks, Josiah. Appreciate the feedback.
>
> From what I understand, because git uses compression, the size of an
> entire git repo, even with a large history, can still be manageable.

Compression is on a per-file basis. Every data file with unique content will have its own hash, and thus its own file on disk. For small keys (probably a lot of keys in Redis), compression buys you little. Redis already compresses large keys, so re-compressing them with zlib (as git does) doesn't help anything.

> I think for offline sync, you can forgo a lot of speed in exchange for
> the new features, primarily the ability to resolve multi-master
> conflicts.

... multi-master conflicts, like any k-way merge, are very difficult problems, and not uncommonly have no 100% correct algorithmically-defined resolution method.
 
> Here is a quote from Planet CouchDB, referring to the Membase merger:
>
> "At CouchOne we've been focusing on very different problems: mobile,
> sync and offline use cases. We make it easy to build applications that
> travel with you, allowing you access to your important data no matter
> the network conditions. Slow and unreliable connectivity means many
> businesses can't rely on the cloud for mission critical apps, all
> their data is gone when their network is down. But with Couch powered
> apps on your phone, tablet, putting data directly on the machines at
> the edge of the network, you have your apps and data with you at all
> times and safely backed up to the cloud."

I understand the desire and purpose of offline data access. I also understand many of the issues involved with syncing such data. I'm not saying in this thread that Redis shouldn't do offline/online resyncing.

What I'm saying is that your proposed solution of using git as a method of resolving the k-way merge problem is not a good one, for technical reasons based on the way git was designed and built. Git was primarily meant as a source control system. Its ability to handle binary files is very useful, but not its primary use case, and its handling of k-way merges on binary files is effectively nonexistent.

> So, IMHO, it's worth looking into.

Offline sync: yes.
Using Git for offline Redis sync: no.

But don't take my word for it: you can (in)validate my claims by writing a 20-line script to inject data into files, commit those files, etc., and report on how fast git works and how much disk your path (with the hidden .git directory) takes up.
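
Something along these lines would do it (a throwaway sketch; the repo
path and write count are arbitrary):

import os
import subprocess
import time

repo = '/tmp/git-sync-test'
os.makedirs(repo)
subprocess.check_call(['git', 'init', '-q'], cwd=repo)

start = time.time()
for i in range(1000):
    # One small "key" per file, one commit per write, as a git-backed
    # Redis would have to do.
    with open(os.path.join(repo, 'key-%d' % (i % 100)), 'w') as f:
        f.write('value-%d' % i)
    subprocess.check_call(['git', 'add', '-A'], cwd=repo)
    subprocess.check_call(['git', 'commit', '-q', '-m', 'write %d' % i],
                          cwd=repo)

print('1000 commits in %.1f seconds' % (time.time() - start))
subprocess.check_call(['du', '-sh', '.git'], cwd=repo)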

Regards,
 - Josiah

Aaron Boxer

Feb 9, 2011, 3:37:41 PM
to redi...@googlegroups.com
Thanks, Josiah. I'm sure you're right; I seem to be suffering from
"when all you have is a hammer" syndrome :)

So, how does the zfs snapshot + rsync approach handle a key changing
in both the offline node and the server node once the offline node
comes back online?

Josiah Carlson

Feb 9, 2011, 6:31:27 PM
to redi...@googlegroups.com
It handles the one-way sync problem, in that at least large keys can't change content while you are examining them. It's useful for offline read-only access (when the offline client connects, you sync as I described), but the k-way sync problem is still hard/unsolved.

 - Josiah

Aaron Boxer

Feb 10, 2011, 10:25:03 AM
to redi...@googlegroups.com
Thanks.