What are the different approaches used for database synchronization and replication?

14 views

Skip to first unread message

perv...@gmail.com

unread,

Jan 2, 2009, 7:34:27 PM1/2/09

to pervasync

In database replication schemes, physical transactions on each node
are recorded and played back on all the other nodes. This technique
would only work if each node has a replica of the full-set data.

There is also a stability issue with physical transaction based
replications when the number of nodes goes up. Transactions on
different replicas could conflict with each other. To handle this,
normally cross system locking or complicated conflict resolution
schemes are needed. In fact, they are used in eager replication and
lazy replication respectively.

Eager replication synchronously updates all replicas as part of one
atomic transaction. This is also called synchronous replication or
pessimistic replication as it incurs global locking and waiting.

In contrast to eager replication, lazy replication allows updates of
any replicas without locking others. The local transactions then
propagate to other replicas in background. This scheme is also called
asynchronous replication or optimistic replication since it is based
on the assumption that conflicts will occur rarely. Here each host
must apply transactions generated by the rest of the hosts. Conflicts
must be detected and resolved. Since the transactions are final at
their origin nodes, usually manual or complicated ad hoc conflict
resolutions are required at the destination nodes.

Researchers have shown that the traditional transactional replication
has unstable behaviors as the workload scales up: a ten-fold increase
in node and traffic gives a thousand fold increase in deadlocks or
reconciliations/conflict resolutions. A system that performs well on a
few nodes may become unstable as the system scales up to even a dozen
of nodes. To alleviate this problem, master-slave schemes are created
where writes are only allowed on the master server and the slave
servers are made read-only.

The traditional database replication schemes are clearly not suited
for database synchronization, which involves hundreds or even
thousands of nodes in one system. The traditional replication systems
are symmetric and each node contains a full-set data, while a
synchronization system is asymmetric and the client nodes contain sub-
sets of the data on central server. The asymmetry, together with the
instability introduced with large number of nodes calls for a
different approach than propagating physical transactions to all
nodes, as used in replication.

The key to a successful synchronization scheme is server-centric.
Replication schemes, except for the special purpose master-slave
scheme, treat all the nodes equally, symmetrically. Transactions on
each node are propagated to all other nodes and conflict detection and
resolution are done on the destination nodes. In contrast, database
synchronization employs an asymmetric client-server paradigm. In the
server centric synchronization system, the server database is the
single source of truth. Client nodes only talk to the server, not each
other. Conflict detection and resolution only happen on one node, the
server. This ensures the system stability.

Physical transactions applied on the central database are final. In
contrast, device database serves as a cache of a sub-set of central
database data. Physical transactions applied on the device database
are tentative until they are checked in to the central database. At
check in time, the changes committed by all the local physical
transactions form a single logical transaction, which is applied
directly to only the central database, not directly to other devices.
The checking in of logical transactions in the sync system is just
like the committing of user physical transactions in a traditional
client-server DB system.

The checkin of client transactions is pretty straightforward except
that you cannot simply record the SQL statements on client and replay
them on server. Remember the client has a sub-set of server data and
the same SQL statement could do different things on server than on
client. Instead, you keep track of changes to records/rows and apply
the changes to server DB.

On the other hand, it is trickier to refresh client DB with server
transactions. Again, you cannot simply record the SQL statements and
should instead track the changes to records/rows. In addition, you
need to handle logical inserts and deletes. For example, let’s say you
have a tasks table on server containing all the tasks. You want each
user/client to sync only the tasks assigned to him or her. In case you
re-assign the task from user A to user B, user A should get a logical
delete and user B should get a logical insert. It is called a
“logical” insert/delete since the task wasn’t physically inserted of
deleted on server.

Fortunately a good synchronization system should be able to handle all
the intricacies like change tracking, logical inserts/deletes,
conflict detection and resolution.

Pervasync Software
mailto:in...@pervasync.com
http://www.pervasync.com

Reply all

Reply to author

Forward

0 new messages