Resource difference between m4.4xlarge and r4.4xlarge instances on EC2


bastian.s...@instana.com

Jan 17, 2018, 10:03:43 AM
to Nomad
Hi everyone,

I realized that on an m4.4xlarge instance, Nomad detects 36800 MHz of available CPU resources, while on an r4.4xlarge it detects 48000 MHz. Both instance types have the same type and number of CPUs, so I wonder where this substantial difference is coming from?

Cheers,
Bastian

Michael Schurter

Jan 17, 2018, 5:06:11 PM
to bastian.s...@instana.com, Nomad
Hi Bastian,

That is strange! We use the ... library for reading CPU speed and it reads /sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq or falls back to cpu MHz from /proc/cpuinfo if the sysfs endpoint isn't available.

Can you paste the output of /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq from a machine of each instance?

/proc/cpuinfo's reported speed varies depending on the CPU's current power level, so it's not a reliable source. (see https://github.com/hashicorp/nomad/issues/1392 for that resolved issue)
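
The lookup order described above can be sketched roughly as follows. This is only an illustration of the sysfs-first, /proc/cpuinfo-second fallback, not the code of the library mentioned in this message (its name is elided here); the function names `cpu_mhz` and `parse_proc_cpuinfo_mhz` are hypothetical, and the sketch assumes cpuinfo_max_freq is reported in kHz, as the kernel's cpufreq sysfs interface does.

```python
import re
from pathlib import Path
from typing import Optional

# Path taken from the message above; {n} is the CPU index.
SYSFS_MAX_FREQ = "/sys/devices/system/cpu/cpu{n}/cpufreq/cpuinfo_max_freq"


def parse_proc_cpuinfo_mhz(text: str) -> Optional[float]:
    """Return the first 'cpu MHz' value in /proc/cpuinfo-style text, if any."""
    match = re.search(r"^cpu MHz\s*:\s*([0-9.]+)", text, re.MULTILINE)
    return float(match.group(1)) if match else None


def cpu_mhz(
    cpu: int = 0,
    sysfs_template: str = SYSFS_MAX_FREQ,
    proc_cpuinfo: str = "/proc/cpuinfo",
) -> Optional[float]:
    """Hypothetical helper: prefer the sysfs maximum, else fall back to procfs."""
    try:
        # sysfs reports kHz, so convert to MHz.
        khz = int(Path(sysfs_template.format(n=cpu)).read_text().strip())
        return khz / 1000.0
    except (OSError, ValueError):
        pass  # sysfs entry missing or unreadable; try /proc/cpuinfo instead
    try:
        # This value is the *current* speed and may change with power state.
        return parse_proc_cpuinfo_mhz(Path(proc_cpuinfo).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(cpu_mhz())
```

On a machine where the sysfs file is missing, a helper like this would return whatever "cpu MHz" happens to be at the moment it is read, which is why the fallback is less reliable.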

As a workaround you can manually set your cpu_total_compute on each client, but I realize that's not ideal long term: https://www.nomadproject.io/docs/agent/configuration/client.html#cpu_total_compute

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Nomad" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nomad-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nomad-tool/3443ecc6-9e64-4f4a-9425-b3f2e10e5bb7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

bastian.s...@instana.com

Jan 25, 2018, 11:23:59 AM
to Nomad
Hi Michael

Thanks for the answer. It turns out that on the m4 instance the CPU speed is not exposed via sysfs and, as you said, the CPU speed in /proc/cpuinfo shows different values even though the model is the same. Below is the info for both instances.
I think I will just adjust my resource values for now, as I do not really want to hardcode or override values per node.

r4.4xlarge:

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
3000000

cat /proc/cpuinfo | grep "cpu MHz" | head -n 1
cpu MHz : 2699.894

cat /proc/cpuinfo | grep "model name" | head -n 1
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz


m4.4xlarge:

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
cat: /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq: No such file or directory

cat /proc/cpuinfo | grep "cpu MHz" | head -n 1
cpu MHz : 2300.066

cat /proc/cpuinfo | grep "model name" | head -n 1
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
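
These readings appear to account for the two totals in the original question. The arithmetic below is a sketch under two assumptions that are not stated in the thread: that Nomad's total is the per-core speed in whole MHz multiplied by the core count, and that both instance types expose 16 vCPUs. The helper `total_compute_mhz` is hypothetical.

```python
import math

VCPUS = 16  # assumption: both m4.4xlarge and r4.4xlarge expose 16 vCPUs


def total_compute_mhz(per_core_mhz: float, cores: int = VCPUS) -> int:
    """Assumed model: whole-MHz per-core speed multiplied by core count."""
    return math.floor(per_core_mhz) * cores


# r4.4xlarge: sysfs cpuinfo_max_freq of 3000000 kHz, i.e. 3000 MHz per core
r4 = total_compute_mhz(3000000 / 1000)

# m4.4xlarge: no sysfs entry, so the /proc/cpuinfo "cpu MHz" value is used
m4 = total_compute_mhz(2300.066)

print(r4, m4)  # prints "48000 36800"
```

Under these assumptions the r4 total comes from the sysfs maximum and the m4 total from the /proc/cpuinfo fallback, which matches the 48000 MHz and 36800 MHz reported by Nomad.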





Michael Schurter

Jan 31, 2018, 8:13:37 PM
to bastian.s...@instana.com, Nomad
That is very strange. Are both instances using Amazon Linux? Can you provide the output of "uname -a"? I can't imagine why sysfs would differ between those systems.


bastian.s...@instana.com

Feb 1, 2018, 4:57:41 AM
to Nomad
In both cases it is a custom AMI, but based on the Ubuntu 16 AMI.

Here is the output for both instances:

r4.4xlarge:

Linux worker-28 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

m4.4xlarge:

Linux worker-32 4.4.0-1044-aws #53-Ubuntu SMP Mon Dec 11 13:49:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Michael Schurter

Feb 1, 2018, 1:15:10 PM
to bastian.s...@instana.com, Nomad
Interesting, the m4 has the linux-aws variant kernel. It's Ubuntu's "high performance kernel for EC2."

Would you mind filing a bug with them to see if it's an upstream problem that can be fixed? https://bugs.launchpad.net/ubuntu/+source/linux-aws

I can't really think of a legitimate reason for them omitting /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq, especially in a kernel that's only supposed to have optimizations for EC2.
