Eiger / HPE server

Daniel Bertran

Oct 30, 2023, 5:16:37 PM
to stressapptest-discuss
Hello,

We are trying to run stressapptest on a 960-CPU system (HPE server), but we don't see the output of the test once it completes.

What flag do we need to show the output when the test is finished? 

Here is an example:
localhost:~# stressapptest -s 5
2023/10/30-21:07:49(UTC) Log: Commandline - stressapptest -s 5
2023/10/30-21:07:49(UTC) Stats: SAT revision 1.0.9_autoconf, 64 bit binary
2023/10/30-21:07:49(UTC) Log: reproducible @ reproducible on Tue Jul 28 21:30:57 UTC 2020 from open source release
2023/10/30-21:07:49(UTC) Log: 1 nodes, 960 cpus.
2023/10/30-21:07:49(UTC) Log: Defaulting to 960 copy threads
2023/10/30-21:07:49(UTC) Log: Total 16259838 MB. Free 1791019 MB. Hugepages 0 MB. Targeting 15446654 MB (94%)
2023/10/30-21:07:49(UTC) Log: Prefer plain malloc memory allocation.
2023/10/30-21:07:49(UTC) Log: Using mmap() allocation at 0x712834200000.
2023/10/30-21:07:49(UTC) Stats: Starting SAT, 15446654M, 5 seconds

Usually the results are shown at the end of the test, but not in this case.



Here is some info about the system:

localhost:~# uname -a
Linux localhost 5.10.0-yocto-standard-icebreaker #1 SMP Tue Apr 5 23:00:00 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
localhost:~# cat /etc/os-release
ID=poky
NAME="Poky (Yocto Project Reference Distro)"
VERSION="4.0.11 (kirkstone)"
VERSION_ID=4.0.11
PRETTY_NAME="Poky (Yocto Project Reference Distro) 4.0.11 (kirkstone)"
DISTRO_CODENAME="kirkstone"



Nick Sanders

Oct 30, 2023, 5:32:42 PM
to stressappt...@googlegroups.com
Can you check /var/log/messages? It's likely stressapptest OOMed or crashed. The system log might have some clue. 

Have you tried running with a smaller amount of memory and cores? 

stressapptest -s 5 -M 1000 -m 10
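
For reference, a rough check for OOM-killer activity after a failed run might look like the following (assuming dmesg and a syslog at /var/log/messages are available on this Yocto image; the exact kernel messages vary by version):

# Kernel ring buffer: look for OOM-killer activity
dmesg | grep -iE 'out of memory|oom-killer|killed process'
# Persistent syslog, if the image keeps one
grep -i oom /var/log/messages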


Daniel Bertran

Oct 30, 2023, 6:03:27 PM
to stressappt...@googlegroups.com
Thanks. That works correctly:

localhost:~# stressapptest -s 5 -M 1000 -m 10
2023/10/30-22:01:53(UTC) Log: Commandline - stressapptest -s 5 -M 1000 -m 10
2023/10/30-22:01:53(UTC) Stats: SAT revision 1.0.9_autoconf, 64 bit binary
2023/10/30-22:01:53(UTC) Log: reproducible @ reproducible on Tue Jul 28 21:30:57 UTC 2020 from open source release
2023/10/30-22:01:53(UTC) Log: 1 nodes, 960 cpus.
2023/10/30-22:01:53(UTC) Log: Prefer plain malloc memory allocation.
2023/10/30-22:01:53(UTC) Log: Using mmap() allocation at 0x7fd68b067000.
2023/10/30-22:01:53(UTC) Stats: Starting SAT, 1000M, 5 seconds
2023/10/30-22:01:53(UTC) Log: region number 19518 exceeds region count 1
2023/10/30-22:01:53(UTC) Log: Region mask: 0x1
2023/10/30-22:01:58(UTC) Stats: Found 0 hardware incidents
2023/10/30-22:01:58(UTC) Stats: Completed: 143062.00M in 5.00s 28603.85MB/s, with 0 hardware incidents, 0 errors
2023/10/30-22:01:58(UTC) Stats: Memory Copy: 143062.00M at 28607.57MB/s
2023/10/30-22:01:58(UTC) Stats: File Copy: 0.00M at 0.00MB/s
2023/10/30-22:01:58(UTC) Stats: Net Copy: 0.00M at 0.00MB/s
2023/10/30-22:01:58(UTC) Stats: Data Check: 0.00M at 0.00MB/s
2023/10/30-22:01:58(UTC) Stats: Invert Data: 0.00M at 0.00MB/s
2023/10/30-22:01:58(UTC) Stats: Disk: 0.00M at 0.00MB/s
2023/10/30-22:01:58(UTC)
2023/10/30-22:01:58(UTC) Status: PASS - please verify no corrected errors
2023/10/30-22:01:58(UTC)
localhost:~#

--
Thanks,
Daniel

Nick Sanders

Oct 30, 2023, 6:13:43 PM
to stressappt...@googlegroups.com
You may want to try scaling the numbers higher until you see a failure, to see whether it's simply an OOM or whether there is a stressapptest bug on very large systems.

The command to test the full system is:
stressapptest -s 5 -M 15446654 -m 960

The problem is likely here: most of your system's memory appears to be in use by something else, which prevents it from being tested:

2023/10/30-21:07:49(UTC) Log: Total 16,259,838 MB. Free 1,791,019 MB. Hugepages 0 MB. Targeting 15446654 MB (94%)
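
For illustration only (these -M values are hypothetical; the idea is to step the tested size up toward, but not past, the ~1791019 MB the log reports as free, since the default target of 15446654 MB is far beyond it):

# Check how much memory is actually available before sizing the test
grep MemAvailable /proc/meminfo
# Step -M up until it fails, keeping all 960 threads
stressapptest -s 5 -M 100000 -m 960
stressapptest -s 5 -M 500000 -m 960
stressapptest -s 5 -M 1500000 -m 960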



Daniel Bertran

Oct 31, 2023, 6:12:34 PM
to stressapptest-discuss
Thank you!

The 1-hour command "stressapptest -s 3600 -M 15446654 -m 960" worked correctly, although it took 17 minutes before it started reporting the remaining test time (which is normally output every 10 seconds).



Nick Sanders

Oct 31, 2023, 7:03:04 PM
to stressappt...@googlegroups.com

On Tue, Oct 31, 2023 at 3:12 PM 'Daniel Bertran' via stressapptest-discuss <stressappt...@googlegroups.com> wrote:
The 1-hour command "stressapptest -s 3600 -M 15446654 -m 960" worked correctly, although it took 17 minutes before it started reporting the remaining test time (which is normally output every 10 seconds).

Ah, there is a design issue in stressapptest: the initialization of memory uses a fixed number of cores (8). Normally this isn't an issue, but on very large systems it's really slow.

You can change it to 960 here to be faster:
