How to start a replication process without needing to back up the database first

Hieu Nguyen

Oct 25, 2022, 7:19:58 AM

Hello all,

A little context for my use case:

I need to stream YottaDB/GT.M's journal entries to Apache Kafka. I am currently using Go to implement an external replication filter. However, it runs much slower than replication without a filter.

Even if the filter does nothing but copy STDIN to STDOUT, its throughput does not reach 10% of what replication achieves without a filter.

As this would massively impact the business, I would like to know whether it is possible to start a replicating instance without any data, so that it acts only as a way to stream journal entries from the source instance in real time, rather than as another standby server.

Or, more ideally, can replication with an external filter be tuned so that its speed reaches at least half of the speed without a filter?

Thanks and Best Regards,
Hieu Nguyen

K.S. Bhaskar

Oct 26, 2022, 12:07:54 PM

Hieu –

There is some overhead to using a filter, since the binary data of the replication stream is converted to text, and the text is then converted back. Also, Go may not be the fastest language for writing a filter.

1. Is the filter running on the (presumably fast) source machine, or the receiving machine, and if the latter, is it as fast as the source machine?

2. If you use cat as a filter, how fast is it?

Regards
– Bhaskar

Hieu Nguyen

Oct 26, 2022, 2:51:15 PM

Hello Bhaskar -

I run the filter on the receiving end; both servers have the same specs.

If I run the filter using cat ( `-filter=/usr/bin/cat` ), replication runs about 20% slower than without a filter.

I have tried increasing the receiver’s buffer pool, and the replication speed increased significantly. It is now within an acceptable range for our requirements. I will try to improve this further by using replication helper processes.

Can you recommend a number of helper processes based on the server’s specs (e.g., 4 CPU cores = 8 writers)?

Also, if Go is not the best language for this, what would you recommend for implementing a filter?

As the source server in production runs on AIX, I cannot use C/C++ without writing a custom Kafka library. That is part of the reason I wanted to set up replication without having to back up and restore the source database in the first place: so that the receiver can run on another platform.

Thanks and Best Regards,
- Hieu Nguyen

K.S. Bhaskar

Oct 26, 2022, 4:24:13 PM

Hieu –

There is no algorithm for tuning either the number of helper processes or the balance between read helpers and write helpers. It has to be determined empirically. In general, read helpers are more important than write helpers.

I would guess that the languages for writing the fastest filters are probably C/C++, Rust, Lua and M.

As long as you have the required access & permission, you can certainly run the filter on the Source Server on AIX. The receiving side can be Linux, and running the filter on the source side will reduce the network traffic.

Regards
– Bhaskar

K.S. Bhaskar

Oct 27, 2022, 11:19:11 AM

Hieu –

Please do close the loop when you are done, and tell us what your solution is. Thank you.

Regards
– Bhaskar

Hieu Nguyen

Oct 27, 2022, 11:29:11 AM

Hello Bhaskar -

I'm planning to implement the filter in C++ to see whether there are any performance improvements.

Instead of publishing directly to Apache Kafka topics, I will use Kafka Proxy to publish through its REST API.
Publishing messages will be delayed, but our main goal is to keep the impact on replication speed to a minimum.
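The trade-off described here, accepting publishing delay to protect replication speed, can be sketched as a filter loop that hands records to a background goroutine, which batches them for the REST call. The Go sketch below is illustrative only and is written in Go simply to keep one language for the examples in this thread. `batchLines` and `buildPayload` are hypothetical names, the batch size and channel capacity are arbitrary, and the JSON body shape (`{"records":[{"value":...}]}`) is an assumption modeled on the Confluent REST Proxy v2 produce request; the proxy actually used may expect something different. The HTTP call itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// record and payload model the assumed JSON request body.
type record struct {
	Value string `json:"value"`
}

type payload struct {
	Records []record `json:"records"`
}

// buildPayload encodes one batch of records as a single JSON request body.
func buildPayload(lines []string) ([]byte, error) {
	p := payload{Records: make([]record, 0, len(lines))}
	for _, l := range lines {
		p.Records = append(p.Records, record{Value: l})
	}
	return json.Marshal(p)
}

// batchLines drains in, groups lines into batches of at most size
// (size must be positive), and passes each batch to publish.
// It returns once in has been closed and fully drained.
func batchLines(in <-chan string, size int, publish func([]string)) {
	batch := make([]string, 0, size)
	for l := range in {
		batch = append(batch, l)
		if len(batch) == size {
			publish(batch)
			batch = make([]string, 0, size)
		}
	}
	if len(batch) > 0 {
		publish(batch) // final partial batch
	}
}

func main() {
	in := make(chan string, 1024) // buffer absorbs bursts from the filter loop
	done := make(chan struct{})
	go func() {
		defer close(done)
		batchLines(in, 2, func(b []string) {
			body, err := buildPayload(b)
			if err != nil {
				fmt.Fprintln(os.Stderr, "encode:", err)
				return
			}
			// A real publisher would POST body to the proxy here.
			// Stderr is used because a filter's stdout carries the stream.
			fmt.Fprintln(os.Stderr, string(body))
		})
	}()
	// Stand-in for the filter loop, which would send each record it echoes.
	for _, l := range []string{"rec1", "rec2", "rec3"} {
		in <- l
	}
	close(in)
	<-done
}
```

With this shape, the filter loop blocks only if the channel fills up, so a slow proxy delays publishing rather than replication until the buffer is exhausted; what to do at that point (block, drop, or spill to disk) is a design decision the sketch does not make.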

Please close the loop for me. As this is my first time using Google Groups, I'm not too familiar with its functionality. My apologies.

K.S. Bhaskar

Oct 27, 2022, 4:15:15 PM

Thanks for the update, Hieu. There is no formal closing the loop on a discussion thread. My apologies for using an American colloquialism. I was just requesting you to give an update when you complete the project so that people can learn from your experience.

Regards
– Bhaskar