Slowdown with Spectre and Meltdown patches?


Bernd Melchers

Jan 8, 2018, 10:58:58 AM
to fhgfs-user
Hi all,
the BeeGFS management and metadata daemons make heavy use of system calls.
This raises concerns about BeeGFS performance under the kernel patches
for the recently disclosed security bugs in current CPUs.
Does anyone have numbers? With or without RDMA and InfiniBand usage?
I observe that sys time increases considerably for "du" commands
on patched clients - but our system is in production, so performance
numbers are not easy to reproduce (beegfs-6.17 and Scientific Linux 6.9,
kernel 2.6.32-696.16.1.el6.x86_64 compared with 2.6.32-696.18.7.el6.x86_64).
And I am hesitant to patch the server side for fear of a performance
regression.

Kind regards,
Bernd Melchers

--

Christian Goll

Jan 9, 2018, 7:40:02 AM
to fhgfs...@googlegroups.com
Hello Bernd,
Both bugs would allow local users to access memory which they are not
allowed to access. But as a matter of good practice, users should not be
able to log in to the metadata and storage servers anyway, because running
jobs on those servers would slow down the whole file system.
So personally I would not install the patches on the storage servers,
as long as it can be ensured that no untrusted user can access them.

kind regards,
Christian

Bernd Melchers

Jan 9, 2018, 9:00:29 AM
to fhgfs...@googlegroups.com
> Both bugs would allow local users to access memory which they are not
> allowed to access. But as a matter of good practice, users should not be
> able to log in to the metadata and storage servers anyway, because running
> jobs on those servers would slow down the whole file system.
> So personally I would not install the patches on the storage servers,
> as long as it can be ensured that no untrusted user can access them.

You are right, but all future kernels will contain these patches.
The page table isolation patch can be disabled with the "nopti"
or "pti=off" kernel parameter (the Spectre mitigations with "nospec" on
SLES kernels). We should know whether this is needed.
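
For anyone who wants to check what a given kernel actually does, a look at
the boot command line and - on Red Hat style kernels - at the debugfs knobs
is enough. The debugfs paths below are an assumption for other distributions:

  $ cat /proc/cmdline                        # look for nopti / pti=off / nospec
  # grep . /sys/kernel/debug/x86/*_enabled   # pti_enabled, ibrs_enabled, ibpb_enabled; 1 = on, 0 = off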

Bernd

--
Archiv- und Backup-Service | fab-s...@zedat.fu-berlin.de
Freie Universität Berlin | Tel. +49-30-838-55905

Sven Breuner

Jan 9, 2018, 9:52:58 AM
to fhgfs...@googlegroups.com, Bernd Melchers, Christi...@web.de

Hi Bernd and Christian,

I agree with Christian here that PTI does not need to be enabled on typical servers, where normal users are not allowed to run applications (at least until someone discovers an SSH security problem ;-) ).

> Does anyone have numbers? With or without RDMA and InfiniBand usage?

I don't have numbers, but unfortunately the "bad news" is that RDMA userspace applications also use syscalls to access "/dev/infiniband", so this will also impact e.g. MPI applications on the compute nodes. However, the good news is that this does not apply to the beegfs-client accessing the InfiniBand (or other network) interface, as this happens inside the kernel without syscalls. But as you already noticed, accessing the beegfs-client (or any other file system) with a stat() operation for "du" or similar is of course a syscall that now has significantly more overhead.
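
A minimal way to see this effect on an existing system (the directory below
is just a placeholder) is to compare the "sys" portion of a du run with the
mitigations on and off, or to count the syscalls behind it:

  $ time du -sh /mnt/beegfs/some_dir
  $ strace -c -f du -s /mnt/beegfs/some_dir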

I would also like to hear more details or examples with numbers, if someone happens to have them. On the other hand, as already said, there doesn't seem to be much of a choice on the compute node side, unfortunately.

Best regards

Sven


Bernd Melchers wrote on 09.01.2018 15:00:
>> Both bugs would allow local users to access memory which they are not
>> allowed to access. But as a matter of good practice, users should not be
>> able to log in to the metadata and storage servers anyway, because running
>> jobs on those servers would slow down the whole file system.
>> So personally I would not install the patches on the storage servers,
>> as long as it can be ensured that no untrusted user can access them.
> You are right, but all future kernels will contain these patches.
> The page table isolation patch can be disabled with the "nopti"
> or "pti=off" kernel parameter (the Spectre mitigations with "nospec" on
> SLES kernels). We should know whether this is needed.
>
> Bernd


--
Sven Breuner
CEO
ThinkParQ GmbH
p: +49 631 277576302  m: +49 159 04007047
a: Trippstadter Str. 110, 67663 Kaiserslautern, Germany
w: www.thinkparq.com  e: sven.b...@thinkparq.com
  
 
CEO (Geschäftsführer): Sven Breuner / COB (Beiratsvorsitzender): Dr. Franz-Josef Pfreundt / Registered (Registergericht): Amtsgericht Kaiserslautern HRB 31565 / VAT ID (USt. ID): DE 292001792

Sven Breuner

Jan 10, 2018, 9:21:59 AM
to fhgfs...@googlegroups.com, Bernd Melchers, Christi...@web.de
Hi,

our partner Megware sent me the following link and I thought some people might
find it interesting to learn how to disable the security mechanisms that are
unnecessary on such servers:
https://access.redhat.com/articles/3311301

(Knowing that Christian is with SUSE, I guess there is a similar document from
SUSE as well, or maybe you could just confirm that the same info applies to
SUSE-based systems?)

Best regards
Sven Breuner
ThinkParQ

Christian Goll

Jan 11, 2018, 8:05:12 AM
to fhgfs...@googlegroups.com, Christi...@web.de
On Tue, 2018-01-09 at 15:52 +0100, Sven Breuner wrote:
> Hi Bernd and Christian,
> I agree with Christian here regarding the fact that PTI does not need
> to be enabled on typical servers, where normal users are not allowed
> to run applications (at least until someone discovers a ssh security
> problem ;-) ).
> > Does anyone have numbers? With or without rdma and infiniband
> > usage?
> I don't have numbers, but unfortunately the "bad news" is that RDMA
> userspace applications also use syscalls to access "/dev/infiniband",
> so this will also impact e.g. MPI applications on the compute nodes.
> However, the good news is that this does not apply to the beegfs-
> client accessing the InfiniBand (or other network) interface, as this
> happens inside the kernel without syscalls. But as you already
> noticed, accessing the beegfs-client (or any other file system) with
> a stat() operation for "du" or similar is of course a syscall that
> now has significantly more overhead.
> I would also like to hear more details or examples with numbers, if
> someone happens to have them. On the other hand, like already said,
> it doesn't seem like there is much of a choice on the compute node
> side, unfortunately.
> Best regards
> Sven
>
The official information for SUSE systems is collected at
https://www.suse.com/de-de/support/kb/doc/?id=7022512

kind regards,
Christian

Guido Laubender

Jan 15, 2018, 9:44:07 AM
to fhgfs...@googlegroups.com

Hi all,

On Tue, 9 Jan 2018, Sven Breuner wrote:

> I would also like to hear more details or examples with numbers, if someone
> happens to have them.

Here are some numbers to get a feeling for the impact of the recent
Spectre and Meltdown patches.

The numbers were measured on a storage test system with 5 storage servers and
8 clients, connected by FDR InfiniBand and running CentOS 7.4, with the
patches enabled versus disabled on the clients (disabled meaning pti_enabled=0,
ibpb_enabled=0, ibrs_enabled=0 in /sys/kernel/debug/x86/). The first value of
each pair below is with the patches enabled, the second with them disabled.

On the servers the patches were disabled for all runs.
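
For reference, toggling the mitigations at runtime on these RHEL/CentOS
kernels is just a matter of writing to the debugfs files as root - a sketch,
assuming debugfs is mounted in its default location:

  # echo 0 > /sys/kernel/debug/x86/pti_enabled
  # echo 0 > /sys/kernel/debug/x86/ibrs_enabled
  # echo 0 > /sys/kernel/debug/x86/ibpb_enabled

Writing 1 re-enables them; the Red Hat article linked earlier in the thread
describes the same switches.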

Single Stream (dd)

1 storage target:

write 1.0 GB/s vs. 1.6 GB/s
read 1.5 GB/s vs. 1.5 GB/s

5 storage targets:

write 1.4 GB/s vs. 4.0 GB/s
read 3.6 GB/s vs. 5.0 GB/s


IOR Multi-Stream:

write 7051.17 MiB/s vs. 6995.34 MiB/s
read 6068.97 MiB/s vs. 6052.33 MiB/s


IOR Shared File:

write 6701.14 MiB/s vs. 6848.43 MiB/s
read 6184.53 MiB/s vs. 6483.18 MiB/s


mdtest: no impact


GCC parallel build run:

walltime 29m13.775s vs. 25m10.220s


Application Turbomole (parallel ricc2):

Small size (7.7 GiB of scratch files written):

total wall-time : 3 minutes and 31 seconds vs. 3 minutes and 29 seconds

Medium size (79 GiB of scratch files written):

total wall-time : 14 minutes and 46 seconds vs. 14 minutes and 27 seconds


So it looks like there is a performance drop in the synthetic benchmarks,
but, at least in my case, the impact on the run time of an I/O-intensive
application is surprisingly small.

I can provide more details or numbers for the case when the patches are
also enabled on the servers, if someone is interested.

(or even numbers for other applications, if you provide them to me ;-) )

Guido


Oliver Freyermuth

Jan 15, 2018, 10:46:34 AM
to fhgfs...@googlegroups.com, Guido Laubender
On 15.01.2018 at 15:44, Guido Laubender wrote:
> So it looks like there is a performance drop in the synthetic benchmarks, but, at least in my case, the impact on the run time of an I/O-intensive application is surprisingly small.
>
> I can provide more details or numbers for the case when the patches are also enabled on the servers, if someone is interested.
>
> (or even numbers for other applications, if you provide them to me ;-) )
Hi,

thanks a lot for these numbers!
We are still in the setup phase for our BeeGFS system and are currently trying to find optimal striping settings for our mostly ROOT-based applications.
Spectre and Meltdown have landed right in the middle of our testing phase.

It would be of great interest to see the impact of the patches on the server side; I would expect it to be significantly heavier there,
since syscalls are used extensively.
This would also be of crucial interest for BeeOND usage, I guess.

Thanks,
Oliver

>
> Guido
>
>

Guido Laubender

Jan 15, 2018, 12:17:03 PM
to 'Oliver Freyermuth' via beegfs-user


On Mon, 15 Jan 2018, 'Oliver Freyermuth' via beegfs-user wrote:

> On 15.01.2018 at 15:44, Guido Laubender wrote:
>> So it looks like there is a performance drop in the synthetic benchmarks, but, at least in my case, the impact on the run time of an I/O-intensive application is surprisingly small.
>>
>> I can provide more details or numbers for the case when the patches are also enabled on the servers, if someone is interested.

> [...]
>
> It would be of great interest to see the impact of the patches on the server-side, I would expect these to be significantly more heavy, since syscalls are used extensively.

Here they are (the order is <patches enabled on clients and servers> vs.
<patches enabled only on clients> vs. <no patches enabled on clients or
servers>):

First of all, the impact on the BeeGFS internal storage benchmark is
huge for the write case:

write:

Avg throughput (per server): 1113772 KiB/s vs. 1863688 KiB/s

read:

Avg throughput (per server): 1385964 KiB/s vs. 1440912 KiB/s


Single Stream (dd):

5 storage targets:

write 1.2 GB/s vs. 1.4 GB/s vs. 4.0 GB/s
read 3.1 GB/s vs. 3.6 GB/s vs. 5.0 GB/s


IOR Multi-Stream:

write 6995.34 MiB/s vs. 7051.17 MiB/s vs. 6995.34 MiB/s
read 5993.13 MiB/s vs. 6068.97 MiB/s vs. 6052.33 MiB/s


IOR Shared File:

write 3543.73 MiB/s vs. 6701.14 MiB/s vs. 6848.43 MiB/s
read 6312.25 MiB/s vs. 6184.53 MiB/s vs. 6483.18 MiB/s


mdtest:

File creation: 82029.645 vs. 128759.852
File stat: 408782.896 vs. 635151.214


GCC parallel build run:

walltime 30m32.402s vs. 29m13.775s vs. 25m10.220s


Application Turbomole (parallel ricc2):

Small size (7.7 GiB of scratch files written):

total wall-time : 3 minutes and 34 seconds vs. 3 minutes and 31 seconds vs. 3 minutes and 29 seconds

Medium size (79 GiB of scratch files written):

total wall-time : 14 minutes and 46 seconds vs. 14 minutes and 46 seconds vs. 14 minutes and 27 seconds


I would like to add that these numbers should be handled with caution, as
I only changed the runtime parameters to enable or disable the recent
Meltdown and Spectre patches in the different measurements.

It might be that with different tuning settings the pti_enabled kernels
would perform better.

Cheers,
Guido

Nathan R.M. Crawford

Jan 15, 2018, 1:33:54 PM
to fhgfs...@googlegroups.com
Hi Guido,

  Can you give some specifics on the single-stream test with dd? I'm most interested in the effect of block size (I expect the speed to increase as bs increases).

  Also, the ricc2 times will make certain research groups at UCI quite happy!

Thanks,
Nate





--
Dr. Nathan Crawford              nathan....@uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II         Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA

Christian Goll

Jan 16, 2018, 2:57:57 AM
to fhgfs...@googlegroups.com
Hello Guido,
great work.
Can you also share the RAM size of the servers and the file sizes for
IOR and the dd tests?

kind regards,
Christian

Steffen Grunewald

Jan 16, 2018, 3:05:59 AM
to fhgfs...@googlegroups.com
Thanks for the update.
There are some details though that make me scratch my head:

On Mon, 2018-01-15 at 18:16:55 +0100, Guido Laubender wrote:
>
>
> On Mon, 15 Jan 2018, 'Oliver Freyermuth' via beegfs-user wrote:
>
> > On 15.01.2018 at 15:44, Guido Laubender wrote:
> > > So it looks like there is a performance drop in the synthetic benchmarks, but, at least in my case, the impact on the run time of an I/O-intensive application is surprisingly small.
> > >
> > > I can provide more details or numbers for the case when the patches are also enabled on the servers, if someone is interested.
>
> > [...]
> >
> > It would be of great interest to see the impact of the patches on the server-side, I would expect these to be significantly more heavy, since syscalls are used extensively.
>
> Here, they are (order is <patches on clients and servers enabled> vs.
> <patches only on clients enabled> vs. <no patches enabled on clients and
> servers>):
>
> First of all, the impact on the BeeGFS internal storage benchmark is
> huge for the write case:
>
> write:
>
> Avg throughput (per server): 1113772 KiB/s vs. 1863688 KiB/s

Minus 40 percent. That's really bad.
This is (servers and clients patched) vs (clients only patched), I presume?

> read:
>
> Avg throughput (per server): 1385964 KiB/s vs. 1440912 KiB/s

Read-ahead kicking in, compensating for higher syscall latency?

> Single Stream (dd):
>
> 5 storage targets:
>
> write 1.2 GB/s vs. 1.4 GB/s vs. 4.0 GB/s
> read 3.1 GB/s vs. 3.6 GB/s vs. 5.0 GB/s

Looks quite bad, and doesn't really match the single-server figures?

> IOR Multi-Stream:
>
> write 6995.34 MiB/s vs. 7051.17 MiB/s vs. 6995.34 MiB/s

I'm surprised to see identical figures in the first and third column here,
without any notable slowdown.

> read 5993.13 MiB/s vs. 6068.97 MiB/s vs. 6052.33 MiB/s

What about the precision of those measurements? A patched client is not
supposed to be faster than an unpatched one, is it?

> IOR Shared File:
>
> write 3543.73 MiB/s vs. 6701.14 MiB/s vs. 6848.43 MiB/s
> read 6312.25 MiB/s vs. 6184.53 MiB/s vs. 6483.18 MiB/s

This looks like a patched server is faster than an unpatched one.
Again, what are the error bars?

> mdtest:
>
> File creation: 82029.645 vs. 128759.852
> File stat: 408782.896 vs. 635151.214

Minus 40 percent, as before. As metadata ops seem to dominate the
overall slowdown, one is tempted to advise keeping the MDS unpatched...

> GCC parallel build run:
>
> walltime 30m32.402s vs. 29m13.775s vs. 25m10.220s

21 percent longer, or about 18 percent less throughput. I'd have expected
a bigger difference. (The additional effect of patching the servers is small, btw.)

> Application Turbomole (parallel ricc2):
>
> Small size (7.7 GiB of scratch files written):

"small" in terms of the app, not the files involved?

> total wall-time : 3 minutes and 34 seconds vs. 3 minutes and 31 seconds vs. 3 minutes and 29 seconds
>
> Medium size (79 GiB of scratch files written):
>
> total wall-time : 14 minutes and 46 seconds vs. 14 minutes and 46 seconds vs. 14 minutes and 27 seconds

In other words, the difference is invisible...

> I would like to add that these numbers should be handled with caution, as I
> only changed the runtime parameters to enable or disable the recent Meltdown
> and Spectre patches in the different measurements.
>
> It might be that with different tuning settings the pti_enabled kernels
> would perform better.
>
> Cheers,
> Guido

Cheers,
Steffen

Guido Laubender

Jan 16, 2018, 11:44:25 AM
to fhgfs...@googlegroups.com

Hi Steffen,

On Tue, 16 Jan 2018, Steffen Grunewald wrote:

> On Mon, 2018-01-15 at 18:16:55 +0100, Guido Laubender wrote:
>>
>> First of all, the impact on the BeeGFS internal storage benchmark is
>> huge for the write case:
>>
>> write:
>>
>> Avg throughput (per server): 1113772 KiB/s vs. 1863688 KiB/s
>
> Minus 40 percent. That's really bad.
> This is (servers and clients patched) vs (clients only patched), I presume?

It is servers patched versus unpatched. The clients are not involved in
this benchmark.
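
For completeness: the internal benchmark is driven from beegfs-ctl and runs
directly on the storage targets, so only the servers are exercised. A sketch
of such a run - block size, file size and thread count here are illustrative,
not necessarily the values used for the numbers above:

  # beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=4
  # beegfs-ctl --storagebench --alltargets --status
  # beegfs-ctl --storagebench --alltargets --read --blocksize=512K --size=20G --threads=4
  # beegfs-ctl --storagebench --alltargets --cleanup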


>> Single Stream (dd):
>>
>> 5 storage targets:
>>
>> write 1.2 GB/s vs. 1.4 GB/s vs. 4.0 GB/s
>> read 3.1 GB/s vs. 3.6 GB/s vs. 5.0 GB/s
>
> Looks quite bad, and doesn't really match the single-server figures?

Yes, this really hurts.


>> IOR Multi-Stream:
>>
>> write 6995.34 MiB/s vs. 7051.17 MiB/s vs. 6995.34 MiB/s
>
> I'm surprised to see identical figures in the first and third column here,
> without any notable slowdown.
>
>> read 5993.13 MiB/s vs. 6068.97 MiB/s vs. 6052.33 MiB/s
>
> What about the precision of those measurements? A patched client is not
> supposed to be faster than an unpatched one, is it?
>
>> IOR Shared File:
>>
>> write 3543.73 MiB/s vs. 6701.14 MiB/s vs. 6848.43 MiB/s
>> read 6312.25 MiB/s vs. 6184.53 MiB/s vs. 6483.18 MiB/s
>
> This looks like a patched server is faster than an unpatched one.
> Again, what are the error bars?

It looks like there is only an impact on the write performance for the
shared file case. The other differences are negligibly small and within
the error bars.

Cheers,
Guido

Guido Laubender

Jan 16, 2018, 11:50:38 AM
to fhgfs...@googlegroups.com

Hi Christian,

On Tue, 16 Jan 2018, Christian Goll wrote:

> Can you also share the RAM size of the servers and the file sizes for
> IOR and the dd tests?

RAM size is 64 GB.

dd file sizes are 3 times the RAM size of the involved servers, so 192 GiB
for the single target case (1 server) and 960 GiB for the five target case
(5 servers).

IOR aggregate file sizes are 975 GiB for the FPP case and 1000 GiB for the
shared-file case.
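
In case someone wants to run something similar, here is a minimal IOR sketch.
The transfer size, block size, process count and paths are illustrative
assumptions, not the exact parameters used here; the -F switch is what
distinguishes the file-per-process (FPP) case from the shared-file case:

  $ mpirun -np 64 ior -w -r -t 1m -b 16g -F -o /mnt/beegfs/ior_fpp     # FPP
  $ mpirun -np 64 ior -w -r -t 1m -b 16g -o /mnt/beegfs/ior_shared     # shared file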

Cheers,
Guido

Guido Laubender

Jan 16, 2018, 11:54:09 AM
to fhgfs...@googlegroups.com

Hi Nathan,

On Mon, 15 Jan 2018, Nathan R.M. Crawford wrote:

> Can you give some specifics on the single-stream test with dd? I'm most
> interested in the effect of block size (I expect the speed to increase as bs
> increases).

dd single-target tests were done with bs=1024k and a BeeGFS chunksize of
1M.

dd five-target run was done with bs=5120k and a chunksize of 1M.
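
A sketch of the corresponding dd invocations - the file names, counts and the
final fsync are assumptions, not the exact commands used; the counts match the
192 GiB and 960 GiB file sizes mentioned in my previous mail:

  $ dd if=/dev/zero of=/mnt/beegfs/ddfile1 bs=1024k count=196608 conv=fsync   # single-target write
  $ dd if=/mnt/beegfs/ddfile1 of=/dev/null bs=1024k                           # single-target read
  $ dd if=/dev/zero of=/mnt/beegfs/ddfile5 bs=5120k count=196608 conv=fsync   # five-target write (960 GiB)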

Cheers,
Guido