Large machine test ideas

Ivan Voras

Aug 26, 2011, 1:36:37 PM
I'll have an 8x8x2 (128 logical CPUs) machine to test for an afternoon
next week, and I'm wondering whether any of you have something you want
tested. The opportunities are limited: it would have to be a
self-contained test (no network, drives, etc.) and fairly short.

Of course, I'll do some of my own tests just to get a feel for the machine.

I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
right?

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"

Garrett Cooper

Aug 26, 2011, 1:44:36 PM
On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:

...

> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
> right?

A 9.0-BETA1 snapshot, yes.
-Garrett

Ivan Voras

Aug 29, 2011, 10:46:29 AM
On 26/08/2011 19:44, Garrett Cooper wrote:
> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>
> ...
>
>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>> right?
>
> A 9.0-BETA1 snapshot, yes.

Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
froze on boot after showing "SRAT: No CPU found for memory domain 4".

(All this after the traditional "do-nothing" pause of 10 or so minutes
before displaying the copyright banner.)

Ivan Voras

Aug 29, 2011, 11:11:21 AM
On 29/08/2011 16:46, Ivan Voras wrote:
> On 26/08/2011 19:44, Garrett Cooper wrote:
>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>
>> ...
>>
>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>> right?
>>
>> A 9.0-BETA1 snapshot, yes.
>
> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>
> (All this after the traditional "do-nothing" pause of 10 or so minutes
> before displaying the copyright banner.)

No luck: it's frozen. Linux and Windows Server work fine.

Andriy Gapon

Aug 29, 2011, 11:15:44 AM
on 29/08/2011 17:46 Ivan Voras said the following:

> On 26/08/2011 19:44, Garrett Cooper wrote:
>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>
>> ...
>>
>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>> right?
>>
>> A 9.0-BETA1 snapshot, yes.
>
> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>
> (All this after the traditional "do-nothing" pause of 10 or so minutes
> before displaying the copyright banner.)

Not sure if the hw.memtest.tests tunable has made it into 9.0-BETA1.
Setting it to zero should result in skipping the checks.
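
Assuming the boot-time check works by writing known bit patterns to every word of RAM and reading them back, its run time grows with the amount of installed memory, and a zero setting simply bypasses it. The C sketch below illustrates only that idea; memtest_region(), the pattern set and the enabled flag are hypothetical stand-ins, not FreeBSD's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative fill patterns; an assumption, not taken from the kernel. */
static const uint32_t memtest_patterns[] = {
    0xaaaaaaaaU, 0x55555555U, 0xffffffffU, 0x00000000U
};

#define NPATTERNS (sizeof(memtest_patterns) / sizeof(memtest_patterns[0]))

/*
 * Hypothetical helper: write each pattern to every word of the region and
 * read it back, counting mismatches. When "enabled" is 0 (the analogue of
 * setting the tunable to zero) the region is not touched at all.
 * The test is destructive: afterwards the region is zero-filled.
 */
size_t memtest_region(volatile uint32_t *words, size_t nwords, int enabled)
{
    size_t failures = 0;

    if (!enabled)
        return 0;

    for (size_t p = 0; p < NPATTERNS; p++) {
        for (size_t i = 0; i < nwords; i++) {
            words[i] = memtest_patterns[p];          /* write */
            if (words[i] != memtest_patterns[p])     /* read back */
                failures++;
        }
    }
    return failures;
}
```

Each enabled run costs four writes and four reads per word, so on a machine with a lot of RAM it can take a noticeable time; whether this caused the pause reported here is not established in the thread.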

You may also try to capture and share a verbose dmesg, if possible.

--
Andriy Gapon

Andriy Gapon

Aug 29, 2011, 11:20:48 AM
on 29/08/2011 18:18 Ivan Voras said the following:

> On 29 August 2011 17:15, Andriy Gapon <a...@freebsd.org> wrote:
>> on 29/08/2011 17:46 Ivan Voras said the following:
>>> On 26/08/2011 19:44, Garrett Cooper wrote:
>>>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>>>
>>>> ...
>>>>
>>>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>>>> right?
>>>>
>>>> A 9.0-BETA1 snapshot, yes.
>>>
>>> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
>>> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>>>
>>> (All this after the traditional "do-nothing" pause of 10 or so minutes
>>> before displaying the copyright banner.)
>>
>> Not sure if the hw.memtest.tests tunable has made it into 9.0-BETA1.
>> Setting it to zero should result in skipping the checks.
>
> If it did, to what should I set it?

See one line above your question :-)

>> You may also try to capture and share a verbose dmesg, if possible.
>
> I'll take some photos of the screen.

No serial console? :(

Ivan Voras

Aug 29, 2011, 11:24:32 AM
On 29 August 2011 17:20, Andriy Gapon <a...@freebsd.org> wrote:
> on 29/08/2011 18:18 Ivan Voras said the following:

>>> Not sure if the hw.memtest.tests tunable has made it into 9.0-BETA1.
>>> Setting it to zero should result in skipping the checks.
>>
>> If it did, to what should I set it?
>
> See one line above your question :-)

Sorry :) I blame the influence of fan noise on my head :)

>>> You may also try to capture and share a verbose dmesg, if possible.
>>
>> I'll take some photos of the screen.
>
> No serial console? :(

There is one on the server side, but not on the client... I didn't
bring my USB-to-serial cable.

Ivan Voras

Aug 29, 2011, 11:18:47 AM
On 29 August 2011 17:15, Andriy Gapon <a...@freebsd.org> wrote:
> on 29/08/2011 17:46 Ivan Voras said the following:
>> On 26/08/2011 19:44, Garrett Cooper wrote:
>>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>>
>>> ...
>>>
>>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>>> right?
>>>
>>> A 9.0-BETA1 snapshot, yes.
>>
>> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
>> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>>
>> (All this after the traditional "do-nothing" pause of 10 or so minutes
>> before displaying the copyright banner.)
>
> Not sure if the hw.memtest.tests tunable has made it into 9.0-BETA1.
> Setting it to zero should result in skipping the checks.

If it did, to what should I set it?

> You may also try to capture and share a verbose dmesg, if possible.

I'll take some photos of the screen.

m...@freebsd.org

Aug 29, 2011, 12:33:29 PM
On Mon, Aug 29, 2011 at 7:46 AM, Ivan Voras <ivo...@freebsd.org> wrote:
> On 26/08/2011 19:44, Garrett Cooper wrote:
>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>
>> ...
>>
>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>> right?
>>
>> A 9.0-BETA1 snapshot, yes.
>
> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
> froze on boot after showing "SRAT: No CPU found for memory domain 4".

This message implies the memory affinity information coming from ACPI
is either nonsensical, or you have an unexpected physical setup where
there really are CPUs with no memory in the local sockets.
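
The consistency check that produces such a message can be sketched as follows, assuming a simplified, hypothetical representation of the parsed SRAT entries (struct cpu_affinity, struct mem_affinity and srat_domains_consistent() are illustrative, not the kernel's types or functions). Every memory range's proximity domain must contain at least one CPU; otherwise the function reports the orphaned domain in the same words as the boot message and returns false. As John Baldwin notes later in the thread, the real SRAT code ignores the table in that case rather than hanging.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified forms of parsed SRAT affinity entries. */
struct cpu_affinity {
    unsigned apic_id;           /* which CPU */
    unsigned domain;            /* its proximity domain */
};

struct mem_affinity {
    unsigned long long base;    /* physical start address */
    unsigned long long length;  /* size in bytes */
    unsigned domain;            /* proximity domain of this range */
};

/*
 * Returns true if every memory range's domain has at least one CPU.
 * On the first domain without a CPU, prints a message like the one seen
 * at boot and returns false; the caller is assumed to then discard the
 * whole table rather than stop booting.
 */
bool srat_domains_consistent(const struct cpu_affinity *cpus, size_t ncpus,
    const struct mem_affinity *mem, size_t nmem)
{
    for (size_t m = 0; m < nmem; m++) {
        bool found = false;

        for (size_t c = 0; c < ncpus && !found; c++)
            found = (cpus[c].domain == mem[m].domain);
        if (!found) {
            printf("SRAT: No CPU found for memory domain %u\n",
                mem[m].domain);
            return false;
        }
    }
    return true;
}
```

With CPUs only in domains 0 and 1, a memory range tagged with domain 4 fails the check and prints "SRAT: No CPU found for memory domain 4".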

You should be able to boot with something like hint.srat.0="disabled"
at the boot loader prompt.

Thanks,
matthew

Ivan Voras

Aug 29, 2011, 1:28:37 PM
On 29 August 2011 18:33, <m...@freebsd.org> wrote:
> On Mon, Aug 29, 2011 at 7:46 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>> On 26/08/2011 19:44, Garrett Cooper wrote:
>>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>>
>>> ...
>>>
>>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>>> right?
>>>
>>> A 9.0-BETA1 snapshot, yes.
>>
>> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
>> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>
> This message implies the memory affinity information coming from ACPI
> is either nonsensical, or you have an unexpected physical setup where
> there really are CPUs with no memory in the local sockets.
>
> You should be able to boot with something like hint.srat.0="disabled"
> at the boot loader prompt.

Unfortunately, neither the memtest nor the SRAT-disabling tunables
worked (I also tried disabling srat.4).

My time with the machine is over, so I can't do more testing.

John Baldwin

Aug 29, 2011, 2:15:32 PM
On Monday, August 29, 2011 1:28:37 pm Ivan Voras wrote:
> On 29 August 2011 18:33, <m...@freebsd.org> wrote:
> > On Mon, Aug 29, 2011 at 7:46 AM, Ivan Voras <ivo...@freebsd.org> wrote:
> >> On 26/08/2011 19:44, Garrett Cooper wrote:
> >>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras <ivo...@freebsd.org> wrote:
> >>>
> >>> ...
> >>>
> >>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
> >>>> right?
> >>>
> >>> A 9.0-BETA1 snapshot, yes.
> >>
> >> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
> >> froze on boot after showing "SRAT: No CPU found for memory domain 4".
> >
> > This message implies the memory affinity information coming from ACPI
> > is either nonsensical, or you have an unexpected physical setup where
> > there really are CPUs with no memory in the local sockets.
> >
> > You should be able to boot with something like hint.srat.0="disabled"
> > at the boot loader prompt.
>
> Unfortunately, neither the memtest nor the SRAT-disabling tunables
> worked (I also tried disabling srat.4).
>
> My time with the machine is over, so I can't do more testing.

The hint to set would be 'hint.srat.0.disabled=1'.

However, the SRAT code just ignores the table when it encounters an issue like
this; it doesn't hang. Something else later in the boot must have hung.

--
John Baldwin

Ivan Voras

Aug 30, 2011, 8:11:56 PM
On 29.8.2011. 20:15, John Baldwin wrote:

> However, the SRAT code just ignores the table when it encounters an issue like
> this; it doesn't hang. Something else later in the boot must have hung.

Anyway... that machine can in its maximal configuration be populated
with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a
vote from me to bump the shiny new cpuset infrastructure maximum CPU
count to 256 before 9.0.

http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm
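
The arithmetic behind these numbers is sockets × cores per socket × threads per core: 8 × 8 × 2 = 128 for the test machine and 8 × 10 × 2 = 160 for a fully populated one. A minimal C sketch; logical_cpus() and next_pow2() are hypothetical helpers, and rounding up to a power of two is only an assumption about why 256, rather than 160, is the proposed limit.

```c
/* Logical CPUs = sockets x cores per socket x hardware threads per core. */
unsigned logical_cpus(unsigned sockets, unsigned cores, unsigned threads)
{
    return sockets * cores * threads;
}

/* Smallest power of two that is >= n (intended for n > 0). */
unsigned next_pow2(unsigned n)
{
    unsigned p = 1;

    while (p < n)
        p <<= 1;
    return p;
}
```

For example, next_pow2(logical_cpus(8, 10, 2)) yields 256, while the 128-CPU test machine already sits exactly on a power of two.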

Sean Bruno

Aug 31, 2011, 3:18:42 PM
On Tue, 2011-08-30 at 17:11 -0700, Ivan Voras wrote:
> On 29.8.2011. 20:15, John Baldwin wrote:
>
> > However, the SRAT code just ignores the table when it encounters an issue like
> > this; it doesn't hang. Something else later in the boot must have hung.
>
> Anyway... that machine can in its maximal configuration be populated
> with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a
> vote from me to bump the shiny new cpuset infrastructure maximum CPU
> count to 256 before 9.0.
>
> http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm

Doesn't that (MAXCPU) seriously impact VM usage, lock contention
etc.?

I mean, if we have 2 CPUs in a machine, but MAXCPU is set to 256, there
is a bunch of "lost" memory and higher levels of lock contention?
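
A rough way to size the "lost" memory, assuming two kinds of MAXCPU-dependent structures: a CPU bitmask with one bit per possible CPU (similar in spirit to cpuset_t) and a static per-CPU array padded to a 64-byte cache line per slot. The 64-byte slot and all helper names are hypothetical, and the sketch says nothing about lock contention.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one per-CPU slot, padded to a 64-byte cache line. */
#define PERCPU_SLOT_BYTES 64

/* Bytes for a bitmask with one bit per possible CPU,
 * rounded up to whole 64-bit words (hypothetical helper). */
size_t cpumask_bytes(unsigned maxcpu)
{
    return ((size_t)maxcpu + 63) / 64 * sizeof(uint64_t);
}

/* Bytes reserved by one static array with a slot per possible CPU. */
size_t static_percpu_bytes(unsigned maxcpu)
{
    return (size_t)maxcpu * PERCPU_SLOT_BYTES;
}

/* Bytes of such an array left unused on a machine with ncpus CPUs. */
size_t unused_percpu_bytes(unsigned maxcpu, unsigned ncpus)
{
    if (ncpus >= maxcpu)
        return 0;
    return (size_t)(maxcpu - ncpus) * PERCPU_SLOT_BYTES;
}
```

Under these assumptions, going from MAXCPU=64 to 256 grows each bitmask from 8 to 32 bytes, and each padded per-CPU array reserves 16384 bytes, of which 16256 sit unused on a 2-CPU machine; the total therefore depends on how many such arrays exist.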

I thought that Attilio was taking a stab at enhancing this, but at the
current time anything more than a value of 64 for MAXCPU is kind of a
"caveat emptor" area of FreeBSD.

Sean

P.S. I say 64 because Yahoo has been running 64 CPUs with local patches
for a while, so I know that it works fairly well.

Attilio Rao

Sep 1, 2011, 10:11:55 AM
2011/8/31 Sean Bruno <sea...@yahoo-inc.com>:

> On Tue, 2011-08-30 at 17:11 -0700, Ivan Voras wrote:
>> On 29.8.2011. 20:15, John Baldwin wrote:
>>
>> > However, the SRAT code just ignores the table when it encounters an issue like
>> > this; it doesn't hang. Something else later in the boot must have hung.
>>
>> Anyway... that machine can in its maximal configuration be populated
>> with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a
>> vote from me to bump the shiny new cpuset infrastructure maximum CPU
>> count to 256 before 9.0.
>>
>> http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm
>
> Doesn't that (MAXCPU) seriously impact VM usage, lock contention
> etc.?
>
> I mean, if we have 2 CPUs in a machine, but MAXCPU is set to 256, there
> is a bunch of "lost" memory and higher levels of lock contention?
>
> I thought that Attilio was taking a stab at enhancing this, but at the
> current time anything more than a value of 64 for MAXCPU is kind of a
> "caveat emptor" area of FreeBSD.

With the newest -CURRENT you can redefine MAXCPU in your kernel config,
so you don't need to bump the default value.
I think 64 is good enough as the default value.

Removing the MAXCPU dependency from the KBI is an important project
someone should adopt and bring to conclusion.
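
The generic C pattern for a compile-time default that a build may override is sketched below; how the FreeBSD kernel config option is actually wired to the macro is not shown here and should not be inferred from it. struct percpu_counters and its accessors are hypothetical. The point is that the size and field offsets of such a structure change with MAXCPU, so a module built with one value and a kernel built with another disagree about its layout, which is why MAXCPU ends up in the KBI.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time default that a build can override, e.g. cc -DMAXCPU=256.
 * (Generic C pattern; not FreeBSD's config machinery.) */
#ifndef MAXCPU
#define MAXCPU 64
#endif

/* Hypothetical structure whose size and layout depend on MAXCPU. */
struct percpu_counters {
    uint64_t count[MAXCPU];
    uint64_t generation;    /* its offset also moves when MAXCPU changes */
};

size_t percpu_counters_size(void)
{
    return sizeof(struct percpu_counters);
}

size_t percpu_generation_offset(void)
{
    return offsetof(struct percpu_counters, generation);
}
```

Built as-is, generation sits at offset 512; built with -DMAXCPU=256 it moves to offset 2048, so code compiled against one layout would read the wrong field under the other.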

Thanks,
Attilio


--
Peace can only be achieved by understanding - A. Einstein

Ivan Voras

Sep 1, 2011, 10:58:31 AM
On 1 September 2011 16:11, Attilio Rao <att...@freebsd.org> wrote:

>> I mean, if we have 2 CPUs in a machine, but MAXCPU is set to 256, there
>> is a bunch of "lost" memory and higher levels of lock contention?
>>
>> I thought that Attilio was taking a stab at enhancing this, but at the
>> current time anything more than a value of 64 for MAXCPU is kind of a
>> "caveat emptor" area of FreeBSD.
>
> With the newest -CURRENT you can redefine MAXCPU in your kernel config,
> so you don't need to bump the default value.
> I think 64 is good enough as the default value.
>
> Removing the MAXCPU dependency from the KBI is an important project
> someone should adopt and bring to conclusion.

That's certainly one half of it, and thanks for the work, but the real
question in this thread is what Sean asked: what are the negative
side effects of simply bumping MAXCPU to 256 by default? AFAIK, there
are not that many structures statically sized by MAXCPU, and most
use the runtime-detected smp_cpus?
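
The runtime-sized alternative can be sketched as allocating one record per detected CPU instead of declaring a MAXCPU-sized array. struct cpu_stats and both functions are hypothetical, and ncpus stands in for a runtime count such as smp_cpus. The sketch assumes CPU IDs run densely from 0 to ncpus - 1; if IDs can be sparse, the allocation would have to be sized by the highest CPU ID plus one instead.

```c
#include <stdlib.h>

/* Hypothetical per-CPU statistics record. */
struct cpu_stats {
    unsigned long long interrupts;
    unsigned long long context_switches;
};

/* One zeroed record per CPU found at runtime, instead of
 * "static struct cpu_stats stats[MAXCPU];". Returns NULL on failure. */
struct cpu_stats *cpu_stats_alloc(unsigned ncpus)
{
    return calloc(ncpus, sizeof(struct cpu_stats));
}

/* Sum one counter across all allocated records. */
unsigned long long cpu_stats_total_interrupts(const struct cpu_stats *s,
    unsigned ncpus)
{
    unsigned long long total = 0;

    for (unsigned i = 0; i < ncpus; i++)
        total += s[i].interrupts;
    return total;
}
```

On a 2-CPU machine this allocates two records regardless of the compile-time limit; the trade-off is an extra indirection and an allocation that must happen after the CPUs have been counted.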

Attilio Rao

Sep 1, 2011, 1:22:47 PM
2011/9/1 Ivan Voras <ivo...@freebsd.org>:

> On 1 September 2011 16:11, Attilio Rao <att...@freebsd.org> wrote:
>
>>> I mean, if we have 2 CPUs in a machine, but MAXCPU is set to 256, there
>>> is a bunch of "lost" memory and higher levels of lock contention?
>>>
>>> I thought that Attilio was taking a stab at enhancing this, but at the
>>> current time anything more than a value of 64 for MAXCPU is kind of a
>>> "caveat emptor" area of FreeBSD.
>>
>> With the newest -CURRENT you can redefine MAXCPU in your kernel config,
>> so you don't need to bump the default value.
>> I think 64 is good enough as the default value.
>>
>> Removing the MAXCPU dependency from the KBI is an important project
>> someone should adopt and bring to conclusion.
>
> That's certainly one half of it, and thanks for the work, but the real
> question in this thread is what Sean asked: what are the negative
> side effects of simply bumping MAXCPU to 256 by default? AFAIK, there
> are not that many structures statically sized by MAXCPU, and most
> use the runtime-detected smp_cpus?
>

Well, there are quite a few statically allocated ones, but as I said,
making the kernel MAXCPU-agnostic (or sort of agnostic) is a goal and
a good project.

Thanks,
Attilio

