MyRocks VS Cassandra


Aravind S

Nov 7, 2017, 4:41:50 AM
to MyRocks - RocksDB storage engine for MySQL
Both MyRocks and Cassandra use the same LSM architecture, so why do they differ in data size?
I did a small analysis on this.
Table structure:

[table definition attached as a screenshot; per later messages, it includes three TEXT columns holding JSON values]

I created the above table in both MySQL (with MyRocks as the storage engine) and Cassandra, and populated 5 lakh (500,000) rows in each.
The data sizes are as follows:
MyRocks: 879 MB
Cassandra: 86 MB

How can Cassandra store the same data in only 86 MB? Am I missing something?

My my.cnf:

[mysqld]
rocksdb
skip-innodb
default-storage-engine=rocksdb
character-set-server=utf8
default-tmp-storage-engine=MyISAM
collation-server=utf8_bin
rocksdb_block_cache_size=4G
rocksdb_max_total_wal_size=2G
transaction-isolation=READ-COMMITTED
log-bin
binlog-format=ROW
rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=100m;max_bytes_for_level_base=1G;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=4;compression_per_level=kZlibCompression;bottommost_compression=kZSTD;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio

I just want to know the reason behind this. Thanks in advance.

MARK CALLAGHAN

Nov 7, 2017, 9:55:50 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
I think you are better off with kZSTD than kZlib for compression_per_level.

Can you paste the output of SHOW TABLE STATUS for the database that has the table?
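
For example (hypothetical database and table names -- substitute your own):

mysql> SHOW TABLE STATUS FROM test LIKE 'mytable'\G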

My guess is that the table is small enough to fit in uncompressed levels of the LSM tree. I thought that compression_per_level=kZlibCompression would enable compression for all levels.

Can you share the following from $datadir/.rocksdb/LOG

2017/11/07-06:14:35.700965 7f575373c1c0 Compression algorithms supported:
2017/11/07-06:14:35.700966 7f575373c1c0         Snappy supported: 1
2017/11/07-06:14:35.700967 7f575373c1c0         Zlib supported: 1
2017/11/07-06:14:35.700968 7f575373c1c0         Bzip supported: 1
2017/11/07-06:14:35.700969 7f575373c1c0         LZ4 supported: 1
2017/11/07-06:14:35.700974 7f575373c1c0         ZSTD supported: 1

2017/11/07-06:14:35.701130 7f575373c1c0        Options.compression[0]: NoCompression
2017/11/07-06:14:35.701131 7f575373c1c0        Options.compression[1]: NoCompression
2017/11/07-06:14:35.701132 7f575373c1c0        Options.compression[2]: NoCompression
2017/11/07-06:14:35.701133 7f575373c1c0        Options.compression[3]: NoCompression
2017/11/07-06:14:35.701134 7f575373c1c0        Options.compression[4]: NoCompression
2017/11/07-06:14:35.701135 7f575373c1c0        Options.compression[5]: NoCompression
2017/11/07-06:14:35.701136 7f575373c1c0                  Options.bottommost_compression: NoCompression

Another guess was the difference between leveled compaction in MyRocks and the compaction in Cassandra, which I think is similar to what RocksDB calls universal. But that would explain Cassandra using more space than MyRocks, and here you show MyRocks using 10X more.
https://github.com/facebook/rocksdb/wiki/Universal-Compaction
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 8, 2017, 2:28:21 AM
to MyRocks - RocksDB storage engine for MySQL
Can you paste the output of SHOW TABLE STATUS for the database that has the table?

[output attached as a screenshot]

Can you share the following from $datadir/.rocksdb/LOG

[LOG excerpt attached as a screenshot]

My guess is that the table is small enough to fit in uncompressed levels of the LSM tree

To cross-check this, I took a dump of the SST file for the table.

[sst_dump output attached as a screenshot]

In the dump of the SST file it says 'SST file compression algo : Zlib', so I guess the SST file is already compressed. Isn't it? Yet the data size is 800 MB. Any idea on this?
 

MARK CALLAGHAN

Nov 8, 2017, 1:46:06 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
I think the problem is that you set bottommost_compression to kZSTD, which isn't supported by your build, so most of the LSM tree isn't compressed. Set it to kZlibCompression. Meanwhile, we have a few bugs to fix in RocksDB:

https://github.com/facebook/rocksdb/issues/3147
https://github.com/facebook/rocksdb/issues/3146
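
In the rocksdb_default_cf_options shown earlier, that means replacing bottommost_compression=kZSTD with bottommost_compression=kZlibCompression.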

--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 8, 2017, 1:49:22 PM
to MyRocks - RocksDB storage engine for MySQL
Thanks for the reply. But the sst dump says 'compression algo : Zlib', so it is using Zlib compression, right? I attached a screenshot of it in my previous reply.

MARK CALLAGHAN

Nov 8, 2017, 1:56:14 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
My guess is that it used Zlib for the upper part of the LSM tree -- maybe just L0 or maybe from L0 to the next to last level. But I have no experience with that dump tool. 
Is it easy to try my workaround to see if I found the problem?

--
You received this message because you are subscribed to the Google Groups "MyRocks - RocksDB storage engine for MySQL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to myrocks-dev+unsubscribe@googlegroups.com.
To post to this group, send email to myroc...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 8, 2017, 2:01:28 PM
to MyRocks - RocksDB storage engine for MySQL
Yeah, okay. I'll set bottommost_compression to kZlibCompression and try it. Once done, I will update here.



Aravind S

Nov 12, 2017, 8:31:18 AM
to MyRocks - RocksDB storage engine for MySQL
As suggested, I set the bottommost compression to kZlibCompression, but the result is still the same:
5 lakh rows take 878 MB in MyRocks.

Is this the expected behaviour?
And how does Cassandra take only 85 MB? That's what I am very eager to know. Or am I missing something?

For reference, my my.cnf:

[mysqld]
rocksdb
skip-innodb
default-storage-engine=rocksdb
character-set-server=utf8
default-tmp-storage-engine=MyISAM
collation-server=utf8_bin
rocksdb_block_cache_size=4G
rocksdb_max_total_wal_size=2G
transaction-isolation=READ-COMMITTED
log-bin
binlog-format=ROW
rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=100m;max_bytes_for_level_base=1G;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=4;compression_per_level=kZlibCompression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES


Aravind S

Nov 12, 2017, 12:13:43 PM
to MyRocks - RocksDB storage engine for MySQL
SHOW ENGINE ROCKSDB STATUS output:

*************************** 1. row ***************************
  Type: STATISTICS
  Name: rocksdb
Status: rocksdb.block.cache.miss COUNT : 1527493
rocksdb.block.cache.hit COUNT : 799502
rocksdb.block.cache.add COUNT : 29358
rocksdb.block.cache.add.failures COUNT : 0
rocksdb.block.cache.index.miss COUNT : 47
rocksdb.block.cache.index.hit COUNT : 231493
rocksdb.block.cache.index.add COUNT : 47
rocksdb.block.cache.index.bytes.insert COUNT : 129383074
rocksdb.block.cache.index.bytes.evict COUNT : 99394824
rocksdb.block.cache.filter.miss COUNT : 46
rocksdb.block.cache.filter.hit COUNT : 0
rocksdb.block.cache.filter.add COUNT : 26
rocksdb.block.cache.filter.bytes.insert COUNT : 1441410
rocksdb.block.cache.filter.bytes.evict COUNT : 1400893
rocksdb.block.cache.data.miss COUNT : 1527400
rocksdb.block.cache.data.hit COUNT : 568009
rocksdb.block.cache.data.add COUNT : 29285
rocksdb.block.cache.data.bytes.insert COUNT : 113815183
rocksdb.block.cache.bytes.read COUNT : 775968159710
rocksdb.block.cache.bytes.write COUNT : 244639667
rocksdb.bloom.filter.useful COUNT : 2198495
rocksdb.persistent.cache.hit COUNT : 0
rocksdb.persistent.cache.miss COUNT : 0
rocksdb.sim.block.cache.hit COUNT : 0
rocksdb.sim.block.cache.miss COUNT : 0
rocksdb.memtable.hit COUNT : 14
rocksdb.memtable.miss COUNT : 652018
rocksdb.l0.hit COUNT : 8
rocksdb.l1.hit COUNT : 0
rocksdb.l2andup.hit COUNT : 0
rocksdb.compaction.key.drop.obsolete COUNT : 518404
rocksdb.compaction.key.drop.range_del COUNT : 0
rocksdb.compaction.key.drop.user COUNT : 518388
rocksdb.compaction.range_del.drop.obsolete COUNT : 0
rocksdb.compaction.optimized.del.drop.obsolete COUNT : 0
rocksdb.number.keys.written COUNT : 1304043
rocksdb.number.keys.read COUNT : 652032
rocksdb.number.keys.updated COUNT : 0
rocksdb.bytes.written COUNT : 5157007130
rocksdb.bytes.read COUNT : 297
rocksdb.number.db.seek COUNT : 15
rocksdb.number.db.next COUNT : 3
rocksdb.number.db.prev COUNT : 0
rocksdb.number.db.seek.found COUNT : 9
rocksdb.number.db.next.found COUNT : 3
rocksdb.number.db.prev.found COUNT : 0
rocksdb.db.iter.bytes.read COUNT : 8220
rocksdb.no.file.closes COUNT : 0
rocksdb.no.file.opens COUNT : 47
rocksdb.no.file.errors COUNT : 0
rocksdb.l0.slowdown.micros COUNT : 0
rocksdb.memtable.compaction.micros COUNT : 0
rocksdb.l0.num.files.stall.micros COUNT : 0
rocksdb.stall.micros COUNT : 109390
rocksdb.db.mutex.wait.micros COUNT : 0
rocksdb.rate.limit.delay.millis COUNT : 0
rocksdb.num.iterators COUNT : 0
rocksdb.number.multiget.get COUNT : 0
rocksdb.number.multiget.keys.read COUNT : 0
rocksdb.number.multiget.bytes.read COUNT : 0
rocksdb.number.deletes.filtered COUNT : 0
rocksdb.number.merge.failures COUNT : 0
rocksdb.bloom.filter.prefix.checked COUNT : 0
rocksdb.bloom.filter.prefix.useful COUNT : 0
rocksdb.number.reseeks.iteration COUNT : 0
rocksdb.getupdatessince.calls COUNT : 0
rocksdb.block.cachecompressed.miss COUNT : 0
rocksdb.block.cachecompressed.hit COUNT : 0
rocksdb.block.cachecompressed.add COUNT : 0
rocksdb.block.cachecompressed.add.failures COUNT : 0
rocksdb.wal.synced COUNT : 4474
rocksdb.wal.bytes COUNT : 2608563150
rocksdb.write.self COUNT : 1206362
rocksdb.write.other COUNT : 97671
rocksdb.write.timeout COUNT : 0
rocksdb.write.wal COUNT : 2608066
rocksdb.compact.read.bytes COUNT : 2628390449
rocksdb.compact.write.bytes COUNT : 1756804386
rocksdb.flush.write.bytes COUNT : 1150297855
rocksdb.number.direct.load.table.properties COUNT : 0
rocksdb.number.superversion_acquires COUNT : 16834
rocksdb.number.superversion_releases COUNT : 25
rocksdb.number.superversion_cleanups COUNT : 6
rocksdb.number.block.compressed COUNT : 0
rocksdb.number.block.decompressed COUNT : 0
rocksdb.number.block.not_compressed COUNT : 0
rocksdb.merge.operation.time.nanos COUNT : 0
rocksdb.filter.operation.time.nanos COUNT : 118961553
rocksdb.row.cache.hit COUNT : 0
rocksdb.row.cache.miss COUNT : 0
rocksdb.read.amp.estimate.useful.bytes COUNT : 0
rocksdb.read.amp.total.read.bytes COUNT : 0
rocksdb.number.rate_limiter.drains COUNT : 0
rocksdb.db.get.micros statistics Percentiles :=> 50 : 15.550929 95 : 53.136468 99 : 2372.797784 100 : 1368824.000000
rocksdb.db.write.micros statistics Percentiles :=> 50 : 7.993323 95 : 477.924689 99 : 3694.352066 100 : 167713.000000
rocksdb.compaction.times.micros statistics Percentiles :=> 50 : 18000000.000000 95 : 103700000.000000 99 : 107035257.000000 100 : 107035257.000000
rocksdb.subcompaction.setup.times.micros statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.table.sync.micros statistics Percentiles :=> 50 : 1132000.000000 95 : 2533333.333333 99 : 2735775.000000 100 : 2735775.000000
rocksdb.compaction.outfile.sync.micros statistics Percentiles :=> 50 : 1287500.000000 95 : 2399656.000000 99 : 2399656.000000 100 : 2399656.000000
rocksdb.wal.file.sync.micros statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.manifest.file.sync.micros statistics Percentiles :=> 50 : 14000.000000 95 : 97750.000000 99 : 222800.000000 100 : 226009.000000
rocksdb.table.open.io.micros statistics Percentiles :=> 50 : 5706.250000 95 : 13521.666667 99 : 43669.000000 100 : 43669.000000
rocksdb.db.multiget.micros statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.read.block.compaction.micros statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.read.block.get.micros statistics Percentiles :=> 50 : 45.474513 95 : 65769.601542 99 : 527681.818182 100 : 1368786.000000
rocksdb.write.raw.block.micros statistics Percentiles :=> 50 : 1.859005 95 : 2.972887 99 : 3.913956 100 : 703810.000000
rocksdb.l0.slowdown.count statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.memtable.compaction.count statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.num.files.stall.count statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.hard.rate.limit.delay.count statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.soft.rate.limit.delay.count statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.numfiles.in.singlecompaction statistics Percentiles :=> 50 : 4.000000 95 : 5.500000 99 : 5.900000 100 : 6.000000
rocksdb.db.seek.micros statistics Percentiles :=> 50 : 17.625000 95 : 16000.000000 99 : 19002.000000 100 : 19002.000000
rocksdb.db.write.stall statistics Percentiles :=> 50 : 0.505831 95 : 0.961079 99 : 932.300000 100 : 5071.000000
rocksdb.sst.read.micros statistics Percentiles :=> 50 : 0.568225 95 : 1.934121 99 : 22.397233 100 : 2074301.000000
rocksdb.num.subcompactions.scheduled statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.bytes.per.read statistics Percentiles :=> 50 : 0.500017 95 : 0.950032 99 : 0.990033 100 : 90.000000
rocksdb.bytes.per.write statistics Percentiles :=> 50 : 3661.651161 95 : 4347.174711 99 : 9761.374403 100 : 132918.000000
rocksdb.bytes.per.multiget statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.bytes.compressed statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.bytes.decompressed statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.compression.times.nanos statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.decompression.times.nanos statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.read.num.merge_operands statistics Percentiles :=> 50 : 0.000000 95 : 0.000000 99 : 0.000000 100 : 0.000000
rocksdb.commit_latency statistics Percentiles :=> 50 : 10.57 95 : 21.19 99 : 115.45 100 : 97924.00
rocksdb.is_write_stopped COUNT : 0
rocksdb.actual_delayed_write_rate COUNT : 0

*************************** 2. row ***************************
  Type: DBSTATS
  Name: rocksdb
Status: 
** DB Stats **
Uptime(secs): 13879.3 total, 29.4 interval
Cumulative writes: 1304K writes, 1304K keys, 1206K commit groups, 1.1 writes per commit group, ingest: 4.80 GB, 0.35 MB/s
Cumulative WAL: 1304K writes, 0 syncs, 1304033.00 writes per sync, written: 2.43 GB, 0.18 MB/s
Cumulative stall: 00:00:0.109 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

*************************** 3. row ***************************
  Type: CF_COMPACTION
  Name: __system__
Status: 
** Compaction Stats [__system__] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0      0.0         0         2    0.054       0      0
  L6      1/0    1.20 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.2      0.1      0.0         0         1    0.052      39     32
 Sum      1/0    1.20 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.5      0.0      0.0         0         3    0.054      39     32
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 13879.3 total, 29.4 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.2 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [__system__] **
** Level 0 read latency histogram (micros):
Count: 24 Average: 1.2917  StdDev: 1.34
Min: 0  Median: 0.7059  Max: 5
Percentiles: P50: 0.71 P75: 1.25 P99: 5.00 P99.9: 5.00 P99.99: 5.00
------------------------------------------------------
[       0,       1 ]       17  70.833%  70.833% ##############
(       1,       2 ]        4  16.667%  87.500% ###
(       3,       4 ]        2   8.333%  95.833% ##
(       4,       6 ]        1   4.167% 100.000% #

** Level 6 read latency histogram (micros):
Count: 9 Average: 1.7778  StdDev: 1.93
Min: 0  Median: 0.7500  Max: 7
Percentiles: P50: 0.75 P75: 1.38 P99: 7.00 P99.9: 7.00 P99.99: 7.00
------------------------------------------------------
[       0,       1 ]        6  66.667%  66.667% #############
(       1,       2 ]        2  22.222%  88.889% ####
(       6,      10 ]        1  11.111% 100.000% ##


*************************** 4. row ***************************
  Type: CF_COMPACTION
  Name: default
Status: 
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   54.43 MB   0.2      0.0     0.0      0.0       1.1      1.1       0.0   1.0      0.0     10.9       101        20    5.051       0      0
  L6      8/0   781.83 MB   0.0      2.5     1.8      0.7       1.6      1.0       0.0   0.9      9.5      6.2       268         6   44.722   1520K   518K
 Sum      9/0   836.25 MB   0.0      2.5     1.8      0.7       2.7      2.1       0.0   2.5      6.9      7.5       369        26   14.206   1520K   518K
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 13879.3 total, 29.4 interval
Flush(GB): cumulative 1.071, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 2.71 GB write, 0.20 MB/s write, 2.49 GB read, 0.18 MB/s read, 369.3 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 1 level0_slowdown, 1 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **
** Level 0 read latency histogram (micros):
Count: 1121501 Average: 126.6423  StdDev: 2602.46
Min: 0  Median: 0.5590  Max: 1573221
Percentiles: P50: 0.56 P75: 0.84 P99: 7.20 P99.9: 33634.04 P99.99: 223479.92
------------------------------------------------------
[       0,       1 ]  1003069  89.440%  89.440% ##################
(       1,       2 ]    75673   6.747%  96.187% #
(       2,       3 ]    18530   1.652%  97.840% 
(       3,       4 ]     8891   0.793%  98.632% 
(       4,       6 ]     3865   0.345%  98.977% 
(       6,      10 ]      857   0.076%  99.053% 
(      10,      15 ]      904   0.081%  99.134% 
(      15,      22 ]     2644   0.236%  99.370% 
(      22,      34 ]     2887   0.257%  99.627% 
(      34,      51 ]     1070   0.095%  99.723% 
(      51,      76 ]       80   0.007%  99.730% 
(      76,     110 ]       42   0.004%  99.733% 
(     110,     170 ]       93   0.008%  99.742% 
(     170,     250 ]       74   0.007%  99.748% 
(     250,     380 ]      152   0.014%  99.762% 
(     380,     580 ]      147   0.013%  99.775% 
(     580,     870 ]      117   0.010%  99.785% 
(     870,    1300 ]       75   0.007%  99.792% 
(    1300,    1900 ]       43   0.004%  99.796% 
(    1900,    2900 ]       34   0.003%  99.799% 
(    2900,    4400 ]       47   0.004%  99.803% 
(    4400,    6600 ]       50   0.004%  99.808% 
(    6600,    9900 ]      102   0.009%  99.817% 
(    9900,   14000 ]      144   0.013%  99.830% 
(   14000,   22000 ]      333   0.030%  99.859% 
(   22000,   33000 ]      438   0.039%  99.898% 
(   33000,   50000 ]      496   0.044%  99.943% 
(   50000,   75000 ]      298   0.027%  99.969% 
(   75000,  110000 ]      111   0.010%  99.979% 
(  110000,  170000 ]       56   0.005%  99.984% 
(  170000,  250000 ]      100   0.009%  99.993% 
(  250000,  380000 ]       36   0.003%  99.996% 
(  380000,  570000 ]       25   0.002%  99.998% 
(  570000,  860000 ]        9   0.001%  99.999% 
(  860000, 1200000 ]        4   0.000% 100.000% 
( 1200000, 1900000 ]        8   0.001% 100.000% 

** Level 6 read latency histogram (micros):
Count: 406062 Average: 1306.8832  StdDev: 4692.56
Min: 0  Median: 0.5952  Max: 2074301
Percentiles: P50: 0.60 P75: 0.89 P99: 14177.04 P99.9: 375706.83 P99.99: 1152703.73
------------------------------------------------------
[       0,       1 ]   341106  84.003%  84.003% #################
(       1,       2 ]    38905   9.581%  93.584% ##
(       2,       3 ]    10642   2.621%  96.205% #
(       3,       4 ]     4187   1.031%  97.236% 
(       4,       6 ]     1304   0.321%  97.558% 
(       6,      10 ]      472   0.116%  97.674% 
(      10,      15 ]      277   0.068%  97.742% 
(      15,      22 ]      805   0.198%  97.940% 
(      22,      34 ]     1912   0.471%  98.411% 
(      34,      51 ]      745   0.183%  98.595% 
(      51,      76 ]       31   0.008%  98.602% 
(      76,     110 ]       23   0.006%  98.608% 
(     110,     170 ]       32   0.008%  98.616% 
(     170,     250 ]       22   0.005%  98.621% 
(     250,     380 ]       39   0.010%  98.631% 
(     380,     580 ]       40   0.010%  98.641% 
(     580,     870 ]       42   0.010%  98.651% 
(     870,    1300 ]       55   0.014%  98.664% 
(    1300,    1900 ]       59   0.015%  98.679% 
(    1900,    2900 ]       91   0.022%  98.701% 
(    2900,    4400 ]      149   0.037%  98.738% 
(    4400,    6600 ]      305   0.075%  98.813% 
(    6600,    9900 ]      305   0.075%  98.888% 
(    9900,   14000 ]      438   0.108%  98.996% 
(   14000,   22000 ]      695   0.171%  99.167% 
(   22000,   33000 ]      874   0.215%  99.383% 
(   33000,   50000 ]      873   0.215%  99.598% 
(   50000,   75000 ]      556   0.137%  99.735% 
(   75000,  110000 ]      255   0.063%  99.797% 
(  110000,  170000 ]      114   0.028%  99.825% 
(  170000,  250000 ]      184   0.045%  99.871% 
(  250000,  380000 ]      123   0.030%  99.901% 
(  380000,  570000 ]      133   0.033%  99.934% 
(  570000,  860000 ]      138   0.034%  99.968% 
(  860000, 1200000 ]      105   0.026%  99.994% 
( 1200000, 1900000 ]       23   0.006%  99.999% 
( 1900000, 2900000 ]        1   0.000% 100.000% 


*************************** 5. row ***************************
  Type: MEMORY_STATS
  Name: rocksdb
Status: 
MemTable Total: 550131096
MemTable Unflushed: 1856
Table Readers Total: 0
Cache Total: 143762477
Default Cache Capacity: 0
*************************** 6. row ***************************
  Type: BG_THREADS
  Name: 140722298103552
Status: 
thread_type: High Pri
cf_name: 
operation_type: 
operation_stage: 
elapsed_time_ms: 
state_type: 
*************************** 7. row ***************************
  Type: BG_THREADS
  Name: 140722306496256
Status: 
thread_type: Low Pri
cf_name: 
operation_type: 
operation_stage: 
elapsed_time_ms: 
state_type: 
7 rows in set (0.00 sec)



MARK CALLAGHAN

Nov 12, 2017, 1:04:22 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
Thank you. I see that all data is in the L0 and L1. I tried my workload with your my.cnf options and then with the options changed to disable compression. Using your options for my workload I get compression. My guess is that you are not getting compression for some reason that I cannot explain.

My results for the insert benchmark (see http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html) are here. The data is loaded in PK order, but secondary index changes are in random order:
https://gist.github.com/mdcallag/89fed1332ec23edb5f583a6ed16753f8 - table has PK but no secondary indexes
https://gist.github.com/mdcallag/f4be7cdbb17e1ebff128e93642b01425 - table has PK index and 3 secondary indexes

Notice that with compression:
* the database is smaller based on the size in the Sum row
* Wr(MB/s) is the speed at which compaction is done per level. That is much slower in the compressed case for both L0 and L1, because zlib makes compaction slower. You can see the same thing in the Avg(sec) column which is the average number of seconds for a compaction step.




--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 12, 2017, 1:10:06 PM
to MyRocks - RocksDB storage engine for MySQL
I see that all data is in the L0 and L1
But in the rocksdb status output it says my data is in L0 and L6? And my SST files are compressed too -- I checked using the sst_dump tool.
My doubt is: is this the maximum compression MyRocks can give?
I mean, 5 lakh rows taking 800 MB?

MARK CALLAGHAN

Nov 12, 2017, 1:41:36 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL

The levels are listed as L0 and L6 rather than L0 and L1 because that is how dynamic leveled compaction names levels beyond L0. It confused me for a while too. Despite the level naming confusion it is a nice feature - http://rocksdb.org/blog/2015/07/23/dynamic-level.html

With respect to SST files being compressed, I don't know enough to evaluate that claim. I assume you are getting no compression. 5 lakh rows and an 800 MB database is about 1700 bytes/row. Is your input data ~1700 bytes/row?
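
(For the arithmetic: 800 MB is about 838,860,800 bytes, and 838,860,800 / 500,000 rows is roughly 1,680 bytes per row.)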

--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 12, 2017, 2:17:40 PM
to MyRocks - RocksDB storage engine for MySQL
Since you asked about my input, I changed my input and tested the size.

First, in the above three text columns I populated JSON values containing the characters " : \ , } { [ ] that would be present in a simple JSON.

Then I changed all those JSON values to runs of a's, b's and c's. For example, if the length of a JSON value was 3000, I replaced it with 3000 b's.

Now I have populated 50 lakh (5,000,000) rows, and the table size is just 53 MB.

Sample JSON: {"_id":"5a089ae80265e56b5b37b793","index":0,"guid":"d41e116d-9f59-408e-883a-0725c45657b3","isActive":false  }

This is a small JSON. Besides ones like the above, I also populated a huge JSON in one of the text columns.

Does this make any difference?

MARK CALLAGHAN

Nov 12, 2017, 2:35:51 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
Regardless of the data that you used -- old style or current style -- it would help to understand how well gzip compresses it when it is in a text file.

I don't know much about Cassandra, but I wonder if it uses a much larger compression block size. By that I mean the amount of uncompressed data that is compressed at a time. With MyRocks the block size is 4kb by default, which might be too small. Assuming this is correct and Cassandra uses 64kb by default, can you repeat the test with 64kb for MyRocks by setting rocksdb_block_size=64k in my.cnf?
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/compressSubprop.html
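
A rough way to check the compressibility (a sketch -- rows.txt is a hypothetical text file with one row per line):

# whole-file compression ratio: the best case any block size can approach
gzip -c rows.txt | wc -c
# approximate 4 KB-block ratio: compress each 4 KB chunk independently and sum
split -b 4096 rows.txt /tmp/chunk. && gzip -c /tmp/chunk.* | wc -c

If the per-chunk total is much larger than the whole-file total, a bigger rocksdb_block_size should help.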




--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 13, 2017, 6:49:49 AM
to MyRocks - RocksDB storage engine for MySQL
Thank you so much, Mark. Setting rocksdb_block_size=64k reduced the data size to 72 MB. So increasing the block size reduces space amplification. Is there any link that explains this more clearly?

MARK CALLAGHAN

Nov 13, 2017, 9:03:27 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
I am not a compression guru, but AFAIK the compression window in RocksDB is the size of rocksdb_block_size. Another constraint is that rows don't span blocks. If the block size is 4k and the row size is 5k, then there will be a 5k block with that row. So with a small block size the compressor might not see enough data to get a great compression rate. With a small block size and large rows, there might be one row per block and the compressor then compresses one row at a time. Up to a point, larger compression windows lead to a better compression rate.
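
To make that concrete with rough numbers from earlier in this thread (~1700 bytes/row): a 4kb block holds at most 2 such rows, so zlib compresses about 3.4kb at a time, while a 64kb block holds about 38 rows, giving the compressor far more cross-row redundancy to work with.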





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 21, 2017, 1:59:55 AM
to MyRocks - RocksDB storage engine for MySQL
Thank you so much for your response, Mark.
It seems Cassandra and MyRocks are very similar in terms of performance.

Read:

Compression type | No. of queries per second | Max time taken | Avg time taken
ZLib             | 500                       | 310 ms         | 102 ms
ZLib             | 1000                      | 390 ms         | 100 ms
LZ4              | 500                       | 160 ms         | 77 ms
LZ4              | 1000                      | 260 ms         | 89 ms



Write:

No. of insert queries (sequential) | MyRocks     | Cassandra   | Cassandra with batch insert (500 at a time)
1000                               | 458 ms      | 427 ms      | 69 ms
10000                              | 2.9 seconds | 2.4 seconds | 478 ms
100000                             | 29 seconds  | 18 seconds  | 4.3 seconds

But in terms of writes I find Cassandra is faster. Is there anything you can suggest to tune MyRocks?

And I am also very curious to know how joins are implemented on top of an LSM architecture. Is there any detailed explanation regarding that?

MARK CALLAGHAN

Nov 21, 2017, 12:19:32 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
I publish perf reports at  http://smalldatum.blogspot.com/

For your write perf results, did you get the server into a steady state? I try to collect my results from write-heavy tests by running the test for 60+ minutes.

I expect MyRocks to have better 99th percentile response times compared to Cassandra because there will not be GC stalls. There is a project to add RocksDB as a backend for Cassandra and RocksDB+Cassandra has better p99 latencies than pure Cassandra. A few details are at https://issues.apache.org/jira/browse/CASSANDRA-13476

For my.cnf options see https://github.com/facebook/mysql-5.6/wiki/my.cnf-tuning

Content linked from https://github.com/facebook/mysql-5.6/wiki has some info on doing a SQL engine with an LSM. 





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 22, 2017, 10:19:01 AM
to MyRocks - RocksDB storage engine for MySQL
As you suggested, I decided to perform inserts for a long time, but tried to insert in parallel. Since I can get around 600 parallel connections, I simulated a workload with 400 parallel inserts (200 to tableA and 200 to tableB) and 200 parallel selects.
Everything seems to be fine -- no slowness in queries, since all the queries were using the index. But the CPU usage was way too high: it reached around 1015%.
My machine has 40 cores: 20 physical and 20 virtual (hyperthreading enabled). And I had set block_size to 64kb; on reducing block_size to 4kb, the CPU usage dropped to 400-500%.
Is this normal?

Aravind S

Nov 22, 2017, 10:39:28 AM
to MyRocks - RocksDB storage engine for MySQL
FYI: I have disabled all the custom configurations and set everything to default. Now I see ~900% CPU every 10 or 15 seconds, after which the CPU percentage drops to ~250%.

MARK CALLAGHAN

Nov 22, 2017, 10:39:29 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
Hundreds of concurrent connections can mean hundreds of active threads in the worst case. On a server with 40 HW threads you can easily saturate the CPU. The MySQL connections can be idle when:
1) client is far enough away on the network
2) insert statement processing stalls on a disk read while validating a unique constraint for primary key or unique secondary key
3) inserts are too fast and RocksDB stalls writes 
4) commit blocks for fsync to RocksDB WAL or binlog -- whether that can happen depends on my.cnf options

But again, from the above, there might be few chances for a connection to be idle, so the CPU can be saturated. Are you using a similar number of connections with Cassandra? To load fast you shouldn't need hundreds of connections. There are bulk load options too, but let's ignore that for now.

How did you get MyRocks: by building it yourself from MariaDB, Percona, or FB MySQL, or by downloading a binary from MariaDB or Percona? If you compiled it yourself:
1) you should confirm that -DNDEBUG is a compiler command line option, or MyRocks can use 10% to 20% more CPU than it should.
2) you should confirm that fast CRC is used, or some CPU will be wasted:
https://github.com/facebook/rocksdb/issues/2488





--
Mark Callaghan
mdca...@gmail.com

MARK CALLAGHAN

Nov 22, 2017, 10:48:10 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
Default my.cnf options are not great for write-heavy workloads with MyRocks. I know because I tested that in the past.

This describes a good starting point:
https://github.com/facebook/mysql-5.6/wiki/my.cnf-tuning

Most important options to tune:
* rocksdb_max_background_jobs - set to between 1/4 and 1/2 of HW threads. This is the number of threads RocksDB uses for compaction and memtable flush
* rocksdb_block_cache_size - I usually set this to between 1/5 and 1/3 of system RAM

The next thing to tune is compression settings (a config sketch follows this list). I usually do:
* no compression on L0, L1, L2 because compression here would use a lot of CPU but not save much space
* zstandard, snappy or lz4 on the max level (see "bottommost_compression"). Don't use zlib here, as zstd is much better. 
* lz4 or snappy on L3+
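
Putting that together, a sketch of the cf options fragment (the colon-separated per-level list is RocksDB options syntax; the level count and codecs here are assumptions -- verify against your build):

rocksdb_default_cf_options=compression_per_level=kNoCompression:kNoCompression:kNoCompression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression;bottommost_compression=kZSTD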





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 22, 2017, 10:52:44 AM
to MyRocks - RocksDB storage engine for MySQL
I am using Percona's MySQL 5.7 build, which has MyRocks. And I do not want to do any bulk loading; I just want to stress test MyRocks and see its behaviour in high-load environments. Currently I am using MySQL InnoDB under a similar workload and it hardly takes 150% CPU.



Aravind S

Nov 22, 2017, 10:53:28 AM
to MyRocks - RocksDB storage engine for MySQL
I'll try setting no compression in the upper levels and let you know. Once again, thank you so much for your response, Mark.



MARK CALLAGHAN

Nov 22, 2017, 11:13:33 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
MyRocks will use more CPU than InnoDB. I will wave my hands and claim 20% more for a similar workload, but that will vary. 

My blog has many posts on the performance of the insert benchmark for InnoDB and MyRocks. That workload is insert-only at the start. The table has a PK and 3 secondary indexes -- http://smalldatum.blogspot.com/. My perf reports there include the CPU overhead.

Something to watch for: secondary index maintenance is read-free for MyRocks -- it doesn't need to read secondary index leaf pages to maintain them, while InnoDB must do that maintenance. If your indexes don't fit in memory then this can slow InnoDB and cause threads to stall, meaning InnoDB might not saturate the CPU in some setups while MyRocks would. But the benefit is that MyRocks would be inserting much faster.




--
Mark Callaghan
mdca...@gmail.com

MARK CALLAGHAN

Nov 22, 2017, 11:14:46 AM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
Also, last time I checked Percona didn't use -DNDEBUG when compiling MyRocks, so it will use more CPU than it should. I discussed this with them. 

I am not sure whether their build does the right thing yet for fast CRC support.
--
Mark Callaghan
mdca...@gmail.com

George O. Lorch III

Nov 22, 2017, 2:28:35 PM
to myroc...@googlegroups.com

The upcoming 5.7.20-18 release of Percona Server has the NDEBUG and FastCRC32 issues addressed.



-- 
George O. Lorch III
Senior Software Engineer, Percona
US/Arizona (GMT -7)
skype: george.ormond.lorch.iii

MARK CALLAGHAN

Nov 22, 2017, 2:54:36 PM
to George O. Lorch III, MyRocks - RocksDB storage engine for MySQL
George - thanks for the good news. I look forward to using it.





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 24, 2017, 5:12:58 AM
to MyRocks - RocksDB storage engine for MySQL
The reason for InnoDB not using much CPU is that innodb_thread_concurrency was set to 8, which restricts InnoDB to at most 8 concurrently executing threads. When I set it to 0, I see similar CPU usage from InnoDB as from MyRocks. Is there any variable in MyRocks like innodb_thread_concurrency to control concurrent threads?

MARK CALLAGHAN

Nov 27, 2017, 12:19:42 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
I tend to forget about innodb_thread_concurrency. We haven't been using it much. It works better for autocommit statements, but not so great for multi-statement transactions because it can delay a thread that is already holding locks, which blocks even more statements when there is contention for rows. 





--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 27, 2017, 12:27:30 PM
to MyRocks - RocksDB storage engine for MySQL
Currently we are using innodb_thread_concurrency = 8. In our case there won't be much row lock contention. And as per my understanding there is no such option in MyRocks as innodb_thread_concurrency. Correct me if I am wrong.



MARK CALLAGHAN

Nov 27, 2017, 12:51:40 PM
to Aravind S, MyRocks - RocksDB storage engine for MySQL
MyRocks doesn't have the feature and there isn't internal demand for it. Someone else would have to contribute it for it to be added.




--
Mark Callaghan
mdca...@gmail.com

Aravind S

Nov 27, 2017, 12:58:20 PM
to MyRocks - RocksDB storage engine for MySQL
Ok Mark. Thank you so much once again.

