Very poor performance after starting to use WiredTiger


leo.a...@gmail.com

25 Nov 2015, 15:28:27
to mongodb-user
Hello,

I have a sharded cluster with 3 replica sets (shardrs1, rs2, shardrs3) and I'm having performance issues after changing the storage engine to WiredTiger.

Some data from the primary member of the replica set for the primary shard (it's an m2.4xlarge instance):

1) Before changes:

EBS Data:
Disk: 150GB with 4500 PIOPS, File System: ext4

 
New Relic Servers Data:

Disk I/O - Utilization: ~1%, with 30-min peaks of ~4%
Disk I/O - Rate: ~0.8 MB/s, with 30-min peaks of ~15 MB/s
Disk I/O - Operations per second: ~55, with 30-min peaks of ~300

Network Usage - Bandwidth: 146 MB/s
Network Usage - Packets per second: 4.49

Process - RAM: 19 GB
Process - CPU Usage: 8.6%

MMS Data:

Opcounter:
GetMore 683
Update 861
Query 895
Command 589

Configuration File:
dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
port = 27018
replSet = rs2
keyFile = /etc/secret
rest = true 

2) After changes:

EBS Data:

Disk: 150GB with 3000 PIOPS, File System: xfs

New Relic Servers Data (avg over 5 min):
Disk I/O - Utilization: 6.5%
Disk I/O - Rate: 21.8 MB/s
Disk I/O - Operations per second: 518

Network Usage - Bandwidth: 42.5 MB/s
Network Usage - Packets per second: 1.52

Process - RAM: 27 GB
Process - CPU Usage: 28.8%

Configuration File:
systemLog:
   verbosity: 0
   quiet: false
   traceAllExceptions: true
   path: "/var/log/mongodb/mongodb.log"
   logAppend: true
   logRotate: "rename"
   destination: "file"
   component:
      accessControl:
         verbosity: 5
net:
   port: ----
   wireObjectCheck: true
security:
   keyFile: ----
   authorization: "enabled"
storage:
   dbPath: "/var/lib/mongodb-wt/"
   repairPath: /var/lib/mongodb-wt/repair/
   engine: "wiredTiger"
   wiredTiger:
      engineConfig:
         journalCompressor: "none"
      collectionConfig:
         blockCompressor: "none"
replication:
   replSetName:  ----
sharding:
   clusterRole: "shardsvr"


MMS Data:

Opcounter:
GetMore 223.21
Update 195
Query 247
Command 246
 

(I replaced a few values with "----" for security reasons.)

My application slowed down the moment the changes were finished and this replica member was re-elected as primary due to its priority.

As the EBS volume was new, my first assumption was that it needed a "pre-warm", but the EBS docs now say that "New EBS volumes receive their maximum performance the moment that they are available and do not require initialization (formerly known as pre-warming)".

My other replica sets, which run on smaller instances (m1.xlarge), are running with even higher disk usage:

ubuntu@ip-10-238-134-53:~$ iostat 10
Linux 3.13.0-36-generic (ip-10-238-134-53) 11/25/2015 _x86_64_ (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.19    0.01    3.18    7.69    0.13   84.81

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.23         3.46         1.03   31611457    9431676
xvdb              0.00         0.00         0.00       1573          4
xvdf              0.59         0.01         5.11     105421   46760928
xvdg            289.99      4236.77      1670.93 38753180425 15283732632
xvdh             17.80       553.89       170.50 5066314541 1559582544

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.41    0.00   29.63   55.27    0.10    1.59

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.10         0.80         0.00          8          0
xvdb              0.00         0.00         0.00          0          0
xvdf              1.00         0.00         8.80          0         88
xvdg            254.10      3336.80      1645.60      33368      16456
xvdh           1836.40     55466.80     18480.00     554668     184800

WiredTiger is running on /dev/xvdh.


Can you help me? I really don't know why this is happening...

Thanks in advance

Rhys Campbell

26 Nov 2015, 10:21:44
to mongodb-user

leo.a...@gmail.com

27 Nov 2015, 11:53:11
to mongodb-user
Hello,

No, the disk was formatted with XFS from the start:

2) After changes:
EBS Data:
Disk: 150GB with 3000 PIOPS, File System: xfs

Just checking:

$ sudo file -sL /dev/xvdh
/dev/xvdh: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)

So I think this is not the problem. Any other ideas?

Asya Kamsky

27 Nov 2015, 18:46:42
to mongodb-user
I'm not exactly clear on what the "after" operation stats are but I'm really surprised by this:

   wiredTiger:
      engineConfig:
         journalCompressor: "none"
      collectionConfig:
         blockCompressor: "none"

Why did you turn off compression?  That requires a lot more I/O to go to disk and, more importantly, means far less of your data files will fit into the file system cache if your entire dataset does not fit in RAM.
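
(For reference, snappy is the WiredTiger default for both of these settings; a minimal sketch of the same config block with compression re-enabled, mirroring the format quoted above, would be:)

   wiredTiger:
      engineConfig:
         journalCompressor: "snappy"   # the default; "zlib" and "none" are the alternatives
      collectionConfig:
         blockCompressor: "snappy"     # the default; "zlib" and "none" are the alternatives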

Asya


--
Asya Kamsky
Lead Product Manager
MongoDB
Download MongoDB - mongodb.org/downloads
Free MongoDB Monitoring - cloud.mongodb.com
Free Online Education - university.mongodb.com
Get Involved - mongodb.org/community
We're Hiring! - https://www.mongodb.com/careers

leo.a...@gmail.com

30 Nov 2015, 13:55:50
to mongodb-user
Hello Asya,

The "after" stats are the stats collected after the change to WiredTiger.

As MMAPv1 doesn't use compression and CPU usage grew after activating WiredTiger, I deactivated compression so as not to increase CPU usage even further.
In the MMS dashboards, I don't see many page faults, so I believe my whole data set fits in RAM.
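
(One way to sanity-check this on the WiredTiger node is to compare the WiredTiger cache usage against its configured maximum; a diagnostic sketch in the mongo shell, using the standard serverStatus output:)

// Compare current WiredTiger cache usage with the configured maximum.
var cache = db.serverStatus().wiredTiger.cache;
print("maximum bytes configured:     " + cache["maximum bytes configured"]);
print("bytes currently in the cache: " + cache["bytes currently in the cache"]);
print("pages read into cache:        " + cache["pages read into cache"]);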

Even so, I'll run a few tests to check this possibility and come back here with the results.

Thank you



leo.a...@gmail.com

1 Dec 2015, 11:39:15
to mongodb-user
Hi,

So I switched my production environment to WiredTiger again, with no success.

Actually, I did see lower disk usage right after changing the storage engine, but after a few hours the read throughput became huge, the CPU I/O wait percentage skyrocketed, and I switched back to MMAPv1.

I also see that the "write tickets" metric drops to zero right after changing the storage engine on the primary shard. All the other shards have 100+ tickets available.
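
(For reference, those ticket counts come from serverStatus; a quick sketch of how to read them in the mongo shell on the affected primary:)

// WiredTiger concurrent-transaction tickets; "available" near 0 for "write"
// means every write slot is in use.
var t = db.serverStatus().wiredTiger.concurrentTransactions;
printjson(t.write);   // e.g. { out: ..., available: ..., totalTickets: 128 }
printjson(t.read);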

Digging into the logs, I see that it's throwing "write conflict" errors like these:

2015-11-30T23:05:43.948+0000 I WRITE    [conn1181] update mydb.users query: { _id: ObjectId('517551f17ae3cf912500xxxx') } update: { $inc: { account.field: -yy } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:2 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 7458ms
2015-11-30T23:05:43.948+0000 I COMMAND  [conn1181] command mydb.$cmd command: update { update: "users", updates: [ { q: { _id: ObjectId('517551f17ae3cf912500xxxx') }, u: { $inc: { account.field: -yy } }, multi: false, upsert: false } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "rs2", shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:155 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 7458ms
2015-11-30T23:05:43.953+0000 W -        [conn1281] DBException thrown :: caused by :: 112 WriteConflict
2015-11-30T23:05:43.957+0000 I -        [conn1142] 
2015-11-30T23:05:43.957+0000 W -        [conn1312] DBException thrown :: caused by :: 112 WriteConflict
2015-11-30T23:05:43.962+0000 W -        [conn1358] DBException thrown :: caused by :: 112 WriteConflict
2015-11-30T23:05:43.968+0000 I -        [conn1313] 
2015-11-30T23:05:43.972+0000 W -        [conn1301] DBException thrown :: caused by :: 112 WriteConflict
2015-11-30T23:05:43.978+0000 W -        [conn1058] DBException thrown :: caused by :: 112 WriteConflict
2015-11-30T23:05:43.984+0000 W -        [conn1484] DBException thrown :: caused by :: 112 WriteConflict

In this "users" collection, a few users are much more popular and, therefore, their documents are accessed far more often than others. But if MMAPv1 only has collection-level locking, why is document-level locking slower?
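
(To illustrate the pattern, here is a hypothetical sketch of the kind of update many connections issue concurrently against the same popular document; the _id and field name are placeholders, not real values from the cluster. Under WiredTiger, an update that loses the race gets an internal WriteConflict and is retried, while under MMAPv1 the writers simply queue behind the collection lock.)

// Many application connections update the same hot document at once.
var hotId = ObjectId();   // placeholder for a popular user's _id
db.users.update(
    { _id: hotId },
    { $inc: { "account.field": -1 } }   // mirrors the redacted $inc in the log above
);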

Thank you :D


Asya Kamsky

1 Dec 2015, 16:41:10
to mongod...@googlegroups.com
What is the exact version here?  The WriteConflict exception is an internal thing that should *not* be thrown back to the application.  You may be encountering a bug that we would want to get fixed ASAP.  If you are not on the latest 3.0.7, please upgrade before any further testing!
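
(For anyone following along, a quick shell sketch to confirm the server version and the active storage engine on each mongod:)

// In the mongo shell connected to each mongod:
db.version()                            // e.g. "3.0.7"
db.serverStatus().storageEngine.name    // "wiredTiger" or "mmapv1"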

leo.a...@gmail.com

2 Dec 2015, 11:27:55
to mongodb-user
Hi,

My version is 3.0.7.

How should I proceed?

Thanks in advance
