
RE: Is anyone interested in compressing the binlog? (Internet mail)


chouryzhou(周威)
Aug 3, 2015, 11:05:19 PM
>> I like the idea.
>> * Which compression algorithm do you intend to use?
>> * What sizes of binlog events does your workload create so that
>> compression will be effective?
>>
>> For workloads I care about, there are many small events (a few hundred
>> bytes), so I would like to compress many events at a time, as per-event
>> compression might not be sufficient. But that requires writing the binlog
>> twice today - first uncompressed on commit, then again after commit to
>> compress a batch of events.

> I guess the problem with it is not only writing twice, but that it is
> not going to work with any kind of synchronous replication and in
> general will introduce unpredictable slave lag...

> The proposed solution may not be so efficient with small events, but is
> so nicely transparent...


As you can see in my first email, the binlog is decompressed in the IO thread and then written to the relay log.
In most cases, the delay comes from executing the relay log, not from receiving it.

> On Mon, Aug 3, 2015 at 1:14 AM, chouryzhou(周威) <chour...@tencent.com>
> wrote:
>
>> Hi All,
>>
>>
>> I'm a DBA at Tencent Inc. We make extensive use of MySQL as the
>> database in our business, so we are very grateful for your contribution.
>>
>>
>> But in our businesses (mainly games), a large amount of binlog is
>> usually generated in a short time. For example, a game called "QQ
>> Dancer" has one million players online at the same time; we use more
>> than 200 machines as its database servers, and together they generate
>> about 1.6 TB of binlog per hour. That is with MIXED mode - it would be
>> even larger in ROW mode - and we have more than 10 games like it.
>> Such a large amount of binlog not only takes up disk space, it also
>> consumes a lot of network bandwidth and makes long-distance backups
>> very difficult. We searched for a solution and only found these:
>> https://bugs.mysql.com/bug.php?id=48396
>> https://bugs.mysql.com/bug.php?id=46435
>> I don't know why this hasn't been implemented for so long. Given this,
>> we came up with the idea of compressing the binlog as it is generated,
>> and we have already implemented it.
>>
>> The solution is as follows:
>> We added event types for the compressed editions of the events:
>> QUERY_COMPRESSED_EVENT,
>> WRITE_ROWS_COMPRESSED_EVENT,
>> UPDATE_ROWS_COMPRESSED_EVENT,
>> DELETE_ROWS_COMPRESSED_EVENT.
>> These events inherit from the uncompressed editions. One of their
>> constructors and the write function have been overridden for
>> decompression and compression; everything else is exactly the same.
>>
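A minimal, self-contained sketch of the subclassing idea described above. The class names and the use of zlib here are illustrative assumptions; the actual patch extends MySQL's Query_log_event and Rows_log_event classes, which are not reproduced here.

    // Illustrative only: the compressed event differs from its parent
    // solely in how the body is serialized (and, symmetrically, parsed).
    #include <zlib.h>
    #include <string>
    #include <vector>

    struct Query_event {                       // stand-in for Query_log_event
      std::string query;
      virtual ~Query_event() = default;
      virtual std::vector<unsigned char> write_body() const {
        return {query.begin(), query.end()};   // raw query text
      }
    };

    struct Query_compressed_event : Query_event {  // stand-in for the new type
      std::vector<unsigned char> write_body() const override {
        uLongf len = compressBound(query.size());
        std::vector<unsigned char> out(len);
        compress(out.data(), &len,
                 reinterpret_cast<const Bytef*>(query.data()), query.size());
        out.resize(len);                       // reader side would uncompress()
        return out;
      }
    };
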
>> On the slave, the IO thread decompresses and converts them when it
>> receives the events from the master, so the SQL and worker threads
>> can stay unchanged.
>>
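Roughly what that slave-side conversion could look like. Everything below - the type codes, the Raw_event holder, and the helper names - is a hypothetical illustration rather than code from the patch; it only shows the idea that the IO thread rewrites compressed events as ordinary ones before they reach the relay log.

    #include <zlib.h>
    #include <cstddef>
    #include <vector>

    // Hypothetical wire-level event as received by the IO thread.
    struct Raw_event {
      int type;                            // event type code
      std::vector<unsigned char> payload;  // possibly compressed body
      std::size_t original_len;            // uncompressed size, recorded by the master
    };

    enum { QUERY_EVENT = 2, QUERY_COMPRESSED_EVENT = 164 };  // illustrative values

    // Inflate a compressed payload back to the original bytes.
    static std::vector<unsigned char> inflate_payload(const Raw_event& ev) {
      std::vector<unsigned char> out(ev.original_len);
      uLongf out_len = out.size();
      uncompress(out.data(), &out_len, ev.payload.data(), ev.payload.size());
      out.resize(out_len);
      return out;
    }

    // The IO thread converts compressed events into their plain counterparts
    // before queuing them into the relay log, so SQL/worker threads are untouched.
    Raw_event convert_for_relay_log(Raw_event ev) {
      if (ev.type == QUERY_COMPRESSED_EVENT) {
        ev.payload = inflate_payload(ev);
        ev.type = QUERY_EVENT;
      }
      return ev;
    }
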
>> We also added two options for this feature: "log_bin_compress" and
>> "log_bin_compress_min_len". The former is a switch for whether the
>> binlog should be compressed, and the latter is the minimum length of
>> a SQL statement (in statement mode) or record (in row mode) that will
>> be compressed. It can all be described by the following pseudocode:
>>
>> if binlog_format == statement {
>>   if log_bin_compress == true and query_len >= log_bin_compress_min_len
>>     create a Query_compressed_log_event;
>>   else
>>     create a Query_log_event;
>> }
>> if binlog_format == row {
>>   if log_bin_compress == true and record_len >= log_bin_compress_min_len
>>     create a Write_rows_compressed_log_event (when INSERT);
>>   else
>>     create a Write_rows_log_event (when INSERT);
>> }
>>
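For reference, turning the feature on with a patched server would presumably be a my.cnf fragment along these lines. The variable names come from the description above; the threshold is only an example value, since the patch's default is not stated here.

    [mysqld]
    log-bin                  = mysql-bin
    # compress qualifying binlog events as they are written
    log_bin_compress         = ON
    # only compress statements/records of at least 256 bytes (example value)
    log_bin_compress_min_len = 256
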
>>
>> The complete change for MySQL 5.6.25 can be found at:
>>
>> https://github.com/choury/mysql-server/commit/b0337044942a92dc6f5e4059032a532a60d04862
>> and two minor fixes:
>>
>> https://github.com/choury/mysql-server/commit/c791c62aaf042a47274e42847c6a95d4a9723640
>>
>> https://github.com/choury/mysql-server/commit/40436ac0b639cb5528b547b6cc702e4ad32fb337
>>
>> We have tested it on some of our games for months, and the result is
>> clear: the amount of binlog is reduced by 42% ~ 70%. We would be very
>> glad if you could accept our patch.
>>
>> If you have any other questions, please don't hesitate to reply to me!
>> ________________________________________
>> Thanks
>> chouryzhou(周威)
>>



--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

chouryzhou(周威)
Aug 5, 2015, 12:09:05 AM
Replication between master and slave is based on events, not on binlog files. That means we can't (or it would be very
difficult to) read binlog events from a specific position if we compress the whole binlog file.
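
To make that point concrete: in the v4 binlog format every event begins with a 19-byte common header that carries its own length and the offset of the next event. As long as only the event payload is compressed and this header stays plain, a reader can still seek to a given position and walk the file event by event; gzipping the whole file would destroy those offsets. A sketch of the header layout:

    #include <cstdint>

    // v4 binlog common event header (19 bytes on disk, little-endian),
    // left uncompressed by per-event compression.
    struct Binlog_event_header {
      std::uint32_t timestamp;      // when the event was created
      std::uint8_t  type_code;      // e.g. QUERY_COMPRESSED_EVENT
      std::uint32_t server_id;      // originating server
      std::uint32_t event_length;   // total size of this event, header + body
      std::uint32_t next_position;  // byte offset of the next event in the file
      std::uint16_t flags;
    };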


-----Original Message-----
From: Johan De Meersman [mailto:vegi...@tuxera.be]
Sent: August 4, 2015 18:31
To: MARK CALLAGHAN <mdca...@gmail.com>
Cc: chouryzhou(周威) <chour...@tencent.com>; inte...@lists.mysql.com; repli...@lists.mysql.com; mya...@acmug.com <mya...@47zu.com>; felixliang(梁飞龙) <felix...@tencent.com>; willhan(韩全安) <wil...@tencent.com>; vinchen(陈福荣) <vin...@tencent.com>; robincui(崔玉明) <robi...@tencent.com>
Subject: Re: Is anyone interested in compressing the binlog? (Internet mail)

Very interesting development, indeed, and something that will definitely be useful to a lot of people.

----- Original Message -----
> From: "MARK CALLAGHAN" <mdca...@gmail.com>
> Subject: Re: Is anyone interested in compressing the binlog?
> * Which compression algorithm do you intend to use?

I kind of hope the answer there is "standard gzip" because that pretty much works everywhere :-)

> * What sizes of binlog events does your workload create so that compression
> will be effective?
>
> For workloads I care about, there are many small events (a few hundred
> bytes), so I would like to compress many events at a time, as per-event
> compression might not be sufficient. But that requires writing the binlog
> twice today - first uncompressed on commit, then again after commit to
> compress a batch of events.


I'm wondering how hard it would be to just insert stream compression into the file writes and decompression in the reads. I believe that would have several benefits:

* No extra event types would be required, as it would be invisible to the upstream bits
* Replication clients would not need to be aware of it, as long as they connect to the server to stream the logs
* Better compression on small events
* Log type (compressed/uncompressed) could easily be determined by the magic bits.
* A DBA could use standard gzip to compress a few large logfiles even if the server doesn't compress by default - they'd just get recognized when opened

I may be missing something important, though, as it seems strange that something that simple would not have been implemented years ago - or that Tencent would not have chosen a simpler architecture over the addition of several new event types.
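
As a rough illustration of the stream-compression idea - not something from the Tencent patch - zlib's gzFile layer could wrap the binlog writes, and a reader could pick the right code path from the file's first bytes: gzip's 0x1f 0x8b magic versus the binlog's 0xfe 'b' 'i' 'n' magic. The function names below are made up for the example.

    #include <zlib.h>
    #include <cstdio>

    // Write side: anything funnelled through gzwrite() is deflated on the fly.
    void append_to_compressed_log(const char* path, const void* buf, unsigned len) {
      gzFile f = gzopen(path, "ab");   // append to a gzip-compressed log file
      if (f) {
        gzwrite(f, buf, len);
        gzclose(f);
      }
    }

    // Read side: decide from the magic bytes whether the file is gzip-compressed.
    bool is_gzip_compressed(const char* path) {
      unsigned char magic[2] = {0, 0};
      if (std::FILE* f = std::fopen(path, "rb")) {
        std::fread(magic, 1, sizeof magic, f);
        std::fclose(f);
      }
      return magic[0] == 0x1f && magic[1] == 0x8b;   // gzip magic number
    }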


Looking forward to seeing your changes in the official source, Chouryzhou!


--
Unhappiness is discouraged and will be corrected with kitten pictures.