replication between/from MySQL and Redis


Lin Ma

Apr 16, 2015, 5:37:03 PM
to redi...@googlegroups.com
Hello Redis masters,

Are there any existing code reference to replicate MySQL to/from Redis?

regards,
Lin

Josiah Carlson

Apr 16, 2015, 7:06:30 PM
to redi...@googlegroups.com
There was one poster that was replicating MySQL -> Redis via the binary replication stream, but I am sure that is proprietary.

What are your goals in doing this?

 - Josiah




Lin Ma

Apr 16, 2015, 7:27:04 PM
to redi...@googlegroups.com
Thanks Josiah,

I want to use Redis as serving-side storage and MySQL as the single source of truth, with data propagated from MySQL to Redis. For example, a message to be pushed to mobile clients would be written to MySQL first and then replicated to the Redis databases on all serving boxes.

regards,
Lin

Josiah Carlson

Apr 16, 2015, 7:45:11 PM
to redi...@googlegroups.com
What I have been doing for about 5 years is to use post-commit hooks in my application-level ORM, which handles the DB -> Redis writes. This works best when you are okay with 1-5 second delays between writing to your DB and when it hits Redis (I always design to allow for multi-second cache update delays). If you require lower latency than 1-5 seconds, then you have a few other options:
1. consume the replication stream (from MySQL) and transform it into something that Redis understands
2. a) acquire lock, b) write to MySQL, c) write to Redis, d) release lock (a minimal sketch of this is below)
3. switch to Postgres and use one of the Redis foreign data wrappers to write directly to Redis from Postgres
4. switch to Postgres and use one of the embedded languages + a generic Redis client to write directly to Redis from Postgres
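
A minimal sketch of option 2 in Python (illustrative only: the table, key names, and connection settings are made up; assumes redis-py and PyMySQL):

# Sketch of option 2: serialize the dual write behind a Redis lock.
# Table, key names, and connection settings are made up; assumes redis-py and PyMySQL.
import json
import pymysql
import redis

r = redis.Redis()
db = pymysql.connect(host='localhost', user='app', password='secret', db='app')

def save_message(msg_id, payload):
    # a) acquire lock (redis-py's built-in lock, auto-expires after 10 seconds)
    with r.lock('lock:message:%d' % msg_id, timeout=10):
        # b) write to MySQL, the source of truth
        with db.cursor() as cur:
            cur.execute("REPLACE INTO messages (id, payload) VALUES (%s, %s)",
                        (msg_id, json.dumps(payload)))
        db.commit()
        # c) write to Redis, the serving copy
        r.set('message:%d' % msg_id, json.dumps(payload))
    # d) the lock is released when the `with` block exits

Note the lock only serializes writers for the same key; the two writes are still not atomic if the process dies between them.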

 - Josiah

Lin Ma

Apr 16, 2015, 8:24:28 PM
to redi...@googlegroups.com
Hi Josiah,

Longer than 5 seconds is fine. Two more questions,

1. What do you mean "post-commit hooks"?
2. How do you handle replication from Redis to MySQL?

regards,
Lin

Josiah Carlson

Apr 16, 2015, 9:03:11 PM
to redi...@googlegroups.com
If you're okay with high delays between a write to MySQL and its arrival in Redis, then you've got a lot of options.

1. What do you mean "post-commit hooks"?

A post-commit hook (or similarly a post-insert, post-update, post-delete, or all pre- variants of the same) lets you attach a callback function to perform any before/after commit/insert/update/delete operations that are necessary for your application to function. In a commercial product I'm building right now, I use a post-commit callback to create a task that causes data to be copied from my "source of truth" Postgres to Redis. I also use several other pre- and post-commit callbacks to clear out cached summary data, compute data deltas (for changelog-like interfaces), and sometimes even induce pre-caching of summary and analytics data after model changes. I use Python + SQLAlchemy to make this happen (event.listens_for(..., 'before_insert') as an example). I would hope/expect other languages/libraries to have similar features somewhere, though perhaps not all languages and libraries.
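
As a rough illustration of that hook pattern (the Message model and key format here are made up; assumes SQLAlchemy 1.4+ and redis-py):

# Sketch of the hook pattern: stash rows at insert time, push them to Redis after commit.
# The Message model and key format are made up; assumes SQLAlchemy 1.4+ and redis-py.
import json
import redis
from sqlalchemy import Column, Integer, Text, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()
r = redis.Redis()

class Message(Base):              # hypothetical model standing in for the real tables
    __tablename__ = 'messages'
    id = Column(Integer, primary_key=True)
    body = Column(Text)

@event.listens_for(Message, 'after_insert')
def queue_for_cache(mapper, connection, target):
    # Record what changed, but don't touch Redis until the transaction commits.
    session = Session.object_session(target)
    session.info.setdefault('pending_cache', []).append(
        ('message:%d' % target.id, {'body': target.body}))

@event.listens_for(Session, 'after_commit')
def push_to_redis(session):
    # Runs only after a successful COMMIT, so Redis never sees rolled-back data.
    for key, value in session.info.pop('pending_cache', []):
        r.set(key, json.dumps(value))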

Note that commit hooks in the client language are not perfect, but can solve these problems pretty well. A trigger on the relational database side of things would be better, if your relational database can handle it (this is one of the big reasons why I advocate for Postgres). Some other languages refer to these "hooks" as callbacks, signals, messages, notifications, etc. Incidentally, git uses the "hook" terminology for its pre/post *code* commit shell scripts, which are very similar in concept, if radically different in implementation.

2. How do you handle replication from Redis to MySQL?

Whenever possible, I write to my relational database first, then copy data to Redis as necessary. So far I've only seen a few situations where it didn't make sense to write to the relational database first - primarily during peak load, where Redis is basically just being used as a buffer for data on its way into the DB. The proper solution there is not Redis, it's actually better hardware, so even in those cases my advice is still, "write to your relational database first".


 - Josiah

Lin Ma

Apr 16, 2015, 9:35:50 PM
to redi...@googlegroups.com
Thanks Josiah for your advice,

Your design looks solid and elegant, and it's good to learn from. I like the idea of always writing to MySQL first.

regards,
Lin

Dvir Volk

Apr 17, 2015, 3:07:41 AM
to redi...@googlegroups.com
I've been doing MySQL to redis proprietary replication for a few years now, using an ORM that can handle both databases, and a listener on the MySQL replication channel.

While it works, I'm moving away from it now; I think a read-through approach is more solid and requires fewer moving parts.

Dvir Volk

Apr 17, 2015, 3:08:22 AM
to redi...@googlegroups.com
Adding to that - you should still use MySQL replication to invalidate caches, but actually migrating everything to redis in advance has proven to be cumbersome.

Lin Ma

Apr 17, 2015, 3:45:12 AM
to redi...@googlegroups.com
Thanks Dvir,

What do you mean by a read-through approach? And what is ORM short for?

regards,
Lin

Dvir Volk

Apr 17, 2015, 6:20:20 AM
to redi...@googlegroups.com
What I mean is that instead of replicating objects from mysql to redis directly, your app's logic should be:
read from redis -> if found, good. if not found - read from mysql and write to redis -> return data to user
that way you just need to delete the redis records following mysql's log, and there isn't a full copy of the sql data in redis all the time.
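
A minimal read-through helper along these lines (key/table names and connection settings are illustrative; assumes redis-py and PyMySQL):

# Minimal read-through helper: Redis first, fall back to MySQL and backfill the cache.
# Key/table names and connection settings are illustrative; assumes redis-py and PyMySQL.
import json
import pymysql
import redis

r = redis.Redis()
db = pymysql.connect(host='localhost', user='app', password='secret', db='app')

def get_article(article_id):
    key = 'article:%d' % article_id
    cached = r.get(key)
    if cached is not None:                                # found in redis: good
        return json.loads(cached)
    with db.cursor(pymysql.cursors.DictCursor) as cur:    # not found: read from mysql
        cur.execute("SELECT id, title, body FROM articles WHERE id = %s", (article_id,))
        row = cur.fetchone()
    if row is not None:
        r.set(key, json.dumps(row))                       # write to redis for next time
    return row                                            # return data to user

# Invalidation driven by MySQL's log is then just: r.delete('article:%d' % article_id)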

ORM stands for Object Relational Mapping, or an abstraction layer that abstracts the actual database and maps the records directly to objects.
I've built such a layer that can read and write objects from both MySQL and redis, so "replication" is just read from mysql -> write to redis, and can be fully automatic. 

Lin Ma

Apr 17, 2015, 5:25:16 PM
to redi...@googlegroups.com
Thanks for all the advice Dvir!

regards,
Lin

Abhishek Sharma

Apr 17, 2015, 8:40:52 PM
to redi...@googlegroups.com
Dvir - Is your performance similar to what Josiah Carlson has mentioned, i.e. 1-5 seconds of delay before a SQL update hits Redis? Or is following the MySQL log giving you better performance?
I am just very curious whether there is a faster way while using MySQL.

Dvir Volk

Apr 18, 2015, 4:15:37 PM
to redi...@googlegroups.com
IIRC it's much faster than that, as long as you keep up with the rate of change on the write side.



Josiah Carlson

Apr 19, 2015, 1:44:08 AM
to redi...@googlegroups.com
I expect 99% of requests will complete in 30-60ms, but bad things happen to good databases. Networks get saturated. VMs have noisy neighbor problems sometimes. Hardware can partially fail. And sometimes you can just get a bunch of load out of a promotion, something went viral, ...

 - Josiah

Lin Ma

Apr 19, 2015, 2:48:24 AM
to redi...@googlegroups.com
Hi Josiah,

What do you mean "noisy neighbor problems"?

regards,
Lin

Dvir Volk

Apr 19, 2015, 11:56:43 AM
to redi...@googlegroups.com
In my case BTW this is used across data-centers.
There is no cross-DC redis replication, just MySQL replication, and the redis cache invalidation happens in each DC.
Also, we have 2 MySQL slaves in the remote DC chained to each other, and we want to expire redis keys only once a transaction has happened on the last MySQL slave; otherwise reloading the cache might read an out-of-date value.

The replication chain is:
[MySQL master (DC1)] ==WAN==> 
[MySQL slave #1 (DC2)] ==LAN==>
[MySQL slave #2 (DC2)]==LAN==>
["Pseudo slave" cache invalidator] ---> redis

This can amount to a second or two if the network is busy, plus every transaction is travelling halfway across the world :)

Lin Ma

Apr 19, 2015, 7:14:47 PM
to redi...@googlegroups.com
Hi Dvir,

What do WAN and LAN after the MySQL master/slave entries mean? And I'm wondering why LAN only follows the MySQL slaves? Thanks.

regards,
Lin

Dvir Volk

Apr 20, 2015, 3:03:29 AM
to redi...@googlegroups.com

WAN means replication between datacenters in different locations. LAN is replication inside the same datacenter.

Lin Ma

Apr 20, 2015, 4:20:24 AM
to redi...@googlegroups.com
Thanks Dvir,

For "[MySQL slave #1 (DC2)] ==LAN==>", it means replication inside the same Data Center, from MySQL slave #1, to where? Thanks.

regards,
Lin

Dvir Volk

Apr 20, 2015, 4:44:01 AM
to redi...@googlegroups.com
To a special service that acts as a MySQL slave but is in fact the invalidation service. It follows the replication log and invalidates the keys as they are updated on the slave (or it can take other actions, like notifying apps that use internal caching rather than redis).

It's written using this python library:

Basically it has a configuration of callbacks per table, and in a sense turns the MySQL replication log into a PubSub message bus :)
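
As an illustration, a sketch of such an invalidator, assuming the python-mysql-replication package (that library choice is an assumption, since the link isn't captured above; the table-to-key mappings are made up):

# Sketch of a "pseudo slave" invalidator: follow the binlog, delete the matching cache keys.
# Assumes the python-mysql-replication package; schema, table, and key names are made up.
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)

r = redis.Redis()

# One callback per table: turn a changed row into the cache key to drop.
KEY_BUILDERS = {
    'articles': lambda row: 'article:%d' % row['id'],
    'users':    lambda row: 'user:%d' % row['id'],
}

stream = BinLogStreamReader(
    connection_settings={'host': 'mysql-slave-2', 'port': 3306,
                         'user': 'repl', 'passwd': 'secret'},
    server_id=100,                 # must be unique among the master's "slaves"
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,
    resume_stream=True)

for binlog_event in stream:
    build_key = KEY_BUILDERS.get(binlog_event.table)
    if build_key is None:
        continue
    for row in binlog_event.rows:
        # Update events carry before/after row images; insert/delete events carry "values".
        values = row.get('after_values') or row.get('values')
        r.delete(build_key(values))

Pointed at the last slave in the chain, this plays the role of the "pseudo slave" in the replication chain described earlier.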

Abhishek Sharma

Apr 20, 2015, 3:14:40 PM
to redi...@googlegroups.com, dv...@everything.me
@Dvir and @Josiah - Thanks for the useful insights into your post-commit hooks and invalidation mechanisms.

Josiah Carlson

Apr 20, 2015, 5:49:31 PM
to redi...@googlegroups.com
When hosting your software in a virtualized environment (VMware, Xen, etc., on AWS, Rackspace, and so on), other users heavily using resources on the same physical machine (and even some other shared resources) can measurably reduce the performance you experience.

 - Josiah

Lin Ma

Apr 21, 2015, 3:08:14 AM
to redi...@googlegroups.com
Thanks Josiah,

Doesn't the virtualization hardware itself provide multi-tenant isolation, so tenants don't interfere with each other? How about OpenStack?

regards,
Lin

Lin Ma

Apr 21, 2015, 3:10:07 AM
to redi...@googlegroups.com
Thanks Dvir,

For the invalidation service, I'd think the logic must be quite complicated? It needs logic to decide which records have changed, and whom to notify about the invalidation?

regards,
Lin

Dvir Volk

Apr 21, 2015, 4:24:40 AM
to redi...@googlegroups.com
Not really.
The replication log gives the schema, table, primary id and all changed fields in the transaction. So you just need to deduce the cache key for that.
If you're caching complicated result sets it gets much more complicated, but as long as it's row-level caching (e.g. articles, user profiles, etc.) - it's dead simple.

Josiah Carlson

Apr 21, 2015, 12:09:14 PM
to redi...@googlegroups.com
Some isolation is possible, but the only guaranteed 100% isolation is separate hardware, which defeats the purpose of virtualization. I don't know enough about OpenStack to say with certainty, but I'm not aware of anything that makes OpenStack more able to isolate resources in virtualization compared to other solutions.

As a general rule, anything that is shared may be difficult to isolate. As an example, when you allow virtualization down to the individual CPU level, you can run into memory bandwidth and CPU cache isolation concerns for some workloads.

 - Josiah
