We have run into an issue with XFS’s FITRIM ioctl implementation (see: https://github.com/torvalds/linux/blob/master/fs/xfs/xfs_discard.c#L155), which is used by the fstrim command (see: https://github.com/karelzak/util-linux/blob/master/sys-utils/fstrim.c#L87). When run against local SSDs, it severely impacts IO in general and MongoDB specifically.
Essentially, XFS iterates over every allocation group and issues TRIMs for all free extents every time this ioctl is called. On top of that, Linux’s interface to the TRIM command is synchronous and does not support a vectorized list of ranges (see: https://github.com/torvalds/linux/blob/3fc9d690936fb2e20e180710965ba2cc3a0881f8/block/blk-lib.c#L112). Together, these lead to a large number of extraneous TRIM commands (each of which has been observed to be slow, see: http://oss.sgi.com/archives/xfs/2011-12/msg00311.html) being issued to the disk for ranges that both the filesystem and the disk already know to be free. In practice, we have seen IO disruptions of up to 2 minutes. I realize that the duration of these disruptions may be controller dependent. Unfortunately, when running on a platform like AWS, one does not have the luxury of choosing specific hardware.
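For reference, the FITRIM request itself is small: a single struct fstrim_range (start, len, minlen) passed to an ioctl on the mount point. Below is a minimal Python sketch of that call, not the fstrim source. The helper names pack_fstrim_range and fitrim are made up for illustration, and the FITRIM constant is the value of _IOWR('X', 121, struct fstrim_range) on x86-64 and arm64; other architectures may encode it differently. Calling fitrim() requires root and really does trim the filesystem.

```python
import fcntl
import os
import struct

# Assumption: ioctl number for _IOWR('X', 121, struct fstrim_range) on
# x86-64 / arm64. Other architectures may use a different encoding.
FITRIM = 0xC0185879
U64_MAX = 2**64 - 1


def pack_fstrim_range(start=0, length=U64_MAX, minlen=0):
    """Pack struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }."""
    return struct.pack("=QQQ", start, length, minlen)


def fitrim(mountpoint, start=0, length=U64_MAX, minlen=0):
    """Issue one FITRIM ioctl and return the byte count the kernel reports.

    Hypothetical helper for illustration. Requires root.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        buf = bytearray(pack_fstrim_range(start, length, minlen))
        # The kernel writes the number of trimmed bytes back into `len`.
        fcntl.ioctl(fd, FITRIM, buf)
        return struct.unpack("=QQQ", bytes(buf))[1]
    finally:
        os.close(fd)
```

With the defaults (start=0, len=2^64-1) one call covers the whole filesystem, which is the case where XFS walks every allocation group.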
EXT4, on the other hand, tracks blocks that have been deleted since the previous FITRIM ioctl and targets subsequent TRIMs to the appropriate block ranges (see: http://blog.taz.net.au/2012/01/07/fstrim-and-xfs/). In real-world tests this significantly reduces the impact of fstrim, to the point that it is unnoticeable to the database and application. We are currently switching back to EXT4 as a result.
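To make the difference concrete, here is a toy model of the two behaviours described above. It is only a sketch of the description in this post, not of either filesystem's real code: ToyFilesystem and its methods are invented names, extents are plain integers, and each extent costs exactly one discard.

```python
class ToyFilesystem:
    """Toy model: counts discards issued per FITRIM call.

    track_trimmed=False: every call discards all free extents
                         (the XFS behaviour described above).
    track_trimmed=True:  only extents freed since the last call are
                         discarded (the EXT4 behaviour described above).
    """

    def __init__(self, free_extents, track_trimmed):
        self.free = set(free_extents)
        self.untrimmed = set(free_extents)  # freed but not yet discarded
        self.track_trimmed = track_trimmed

    def free_extent(self, extent):
        """Record that an extent was freed (e.g. a file was deleted)."""
        self.free.add(extent)
        self.untrimmed.add(extent)

    def fitrim(self):
        """Return how many discards this call would issue."""
        targets = self.untrimmed if self.track_trimmed else self.free
        issued = len(targets)
        self.untrimmed = set()
        return issued


if __name__ == "__main__":
    untracked = ToyFilesystem(range(1000), track_trimmed=False)
    tracked = ToyFilesystem(range(1000), track_trimmed=True)
    print(untracked.fitrim(), tracked.fitrim())  # 1000 1000
    for extent in range(1000, 1005):
        untracked.free_extent(extent)
        tracked.free_extent(extent)
    print(untracked.fitrim(), tracked.fitrim())  # 1005 5
```

In this model the first trim costs the same either way; the gap only opens on later calls, which matches what we see with a periodic fstrim.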
Alternatively, we could mount the filesystem with the discard option (as AWS suggests here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html). However, our confidence that this would perform better is not high, given XFS developer comments on the subject (see: http://oss.sgi.com/archives/xfs/2014-08/msg00465.html):
It was introduced into XFS as a checkbox feature. We resisted as
long as we could, but too many people were shouting at us that we
needed realtime discard because ext4 and btrfs had it. Of course,
all those people shouting for it realised that we were right in that
it sucked the moment they tried to use it and found that performance
was woeful. Not to mention that SSD trim implementations were so bad
that they caused random data corruption by trimming the wrong
regions, drives would simply hang randomly and in a couple of cases
too many trims too fast would brick them...
So, yeah, it was implement because lots of people demanded it, not
because it was a good idea.
I am aware that MongoDB strongly recommends using XFS (see: https://docs.mongodb.com/manual/administration/production-notes/#kernel-and-file-systems) and that this is because EXT4 journaling could impact WiredTiger checkpointing under heavy write load (https://groups.google.com/forum/#!msg/mongodb-user/diGdooN_2Sw/4H7t5JTDcpAJ). Can anybody elaborate on this? Is this the only concern that drove the strong recommendation to go with XFS? And, in MongoDB’s opinion, is it still valid given the performance issues with TRIM on Linux when running XFS on SSDs? We are currently running the MMAPv1 storage engine on MongoDB 2.6 and, as mentioned above, we have reverted to EXT4 without apparent consequence. Any more info would really help us weigh the pros and cons while we work toward WiredTiger.
Also, any more general recommendations for mitigating the disruption incurred by running fstrim would be more than welcome.
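As one illustration of the kind of mitigation in question: fstrim accepts --offset, --length and --minimum, so a single full-filesystem trim could be split into smaller ranges with a pause between them. The sketch below only shows the mechanics. The helper names, chunk size and pause are arbitrary assumptions, and whether this actually shortens the stalls on XFS is untested.

```python
import subprocess
import time


def chunk_ranges(total_bytes, chunk_bytes):
    """Yield (offset, length) pairs covering [0, total_bytes)."""
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


def fstrim_command(mountpoint, offset, length, minimum=0):
    """Build an fstrim command line for one byte range."""
    return [
        "fstrim",
        "--offset", str(offset),
        "--length", str(length),
        "--minimum", str(minimum),
        mountpoint,
    ]


def trim_in_chunks(mountpoint, total_bytes, chunk_bytes,
                   pause_seconds=1.0, minimum=0):
    """Hypothetical helper: trim one range at a time, pausing in between.

    Requires root. total_bytes is the filesystem size in bytes.
    """
    for offset, length in chunk_ranges(total_bytes, chunk_bytes):
        cmd = fstrim_command(mountpoint, offset, length, minimum)
        subprocess.run(cmd, check=True)
        time.sleep(pause_seconds)
```

Raising --minimum would also skip small free extents, at the cost of leaving them untrimmed.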
Hi Greg,
I am aware that MongoDB strongly recommends using XFS (see: https://docs.mongodb.com/manual/administration/production-notes/#kernel-and-file-systems) and that this is because EXT4 journaling could impact WiredTiger checkpointing under heavy write load (https://groups.google.com/forum/#!msg/mongodb-user/diGdooN_2Sw/4H7t5JTDcpAJ). Can anybody elaborate on this?
The recommendation to use XFS relates to our investigation on SERVER-18314 and similar performance issues reported with EXT4 (periodic stalls during WiredTiger checkpoints). However, as noted on SERVER-26131, if you have tested your workload and server configuration with WiredTiger on EXT4 and see better results than with XFS, you may choose to deploy differently.
We are currently running the MMAPv1 storage engine on MongoDB 2.6 and, as mentioned above, we have reverted to EXT4 without apparent consequence.
For MMAPv1 we currently recommend either EXT4 or XFS. As per the MongoDB production notes, XFS generally performs better with MongoDB (including MMAPv1).
The production notes include recommendations based on aggregate user experience, but factors such as workload and server resources may result in a different outcome for your deployment.
The considerations and stalls you’ve seen as a result of filesystem TRIM support are definitely interesting, but we haven’t had any other reports to correlate this with yet. Can you share some more information on your AWS instance types and storage configuration?
Many thanks,
Hi Sicabol,
The MongoDB Production Notes state that generally XFS is the preferred filesystem if you are using the WiredTiger storage engine.
As for your other partition, which is not hosting the WiredTiger files, you may use whichever filesystem you like.
Regards,
John Murphy
Hi John,
Perfect, thanks!
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/Mj0x6m-02Ms/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/8152c9cc-8d07-40c7-b1b6-c88b12ef05af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.