
Ryzen issues on FreeBSD ?


Mike Tancsa

Jan 17, 2018, 8:38:47 AM
With the Intel issues exposed by Meltdown, we were looking at possibly
deploying some Ryzen-based servers for FreeBSD. We got a pair of
ASUS PRIME X370-PRO boards, and

CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.34-MHz
K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1

Everything is at its default in the BIOS, no overclocking etc.

However, we are seeing random lockups on both boxes. It doesn't seem to
correspond with load/activity. And it's a hard lockup: the keyboard is not
responsive and I can't break into the serial debugger, so it doesn't seem to be
an issue with something in the kernel going into deadlock.

It sort of feels like a hardware issue, but it seems odd that both boxes
are showing the same issue with random lockups like that. It could be
twice in a day or once every 3 days.

Anyone have any insights? Any suggestions about better
motherboards out there? We are waiting for Supermicro's EPYC
availability, but nothing yet. It would be nice if we could find a
board with at least some hardware watchdog on it.


---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
_______________________________________________
freebsd...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Pete French

Jan 17, 2018, 8:43:43 AM
I am in much the same situation as you (want to deploy EPYC, waiting for
the SM stuff to become available). I currently have here a set of parts to
make a test Ryzen box, so you are ahead of me on that, though. I should
have that going this week, I hope.

Are you running the latest STABLE? There were some patches for Ryzen
which went in, I believe, and which might affect the stability. Specifically,
the changes to stop it locking up when executing code in the top page?

I'll get back to you when I have done some more testing...

-pete.

On 17/01/2018 13:38, Mike Tancsa wrote:
> With the Intel issues exposed in meltdown, we were looking at possibly
> deploying some Ryzen based servers for FreeBSD. We got a pair of
> ASUS PRIME X370-PRO and
>
> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.34-MHz
> K8-class CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
>
> Everything is at its default in the BIOS, no overclocking etc.
>
> However, we are seeing random lockups on both boxes. It doesnt seem to
> correspond with load/activity. And its a hard lockup. Keyboard not
> responsive and I cant break to serial debugger, so it doesnt seem to be
> an issue with something in the kernel going into deadlock.
>
> It sort of feels like a hardware issue, but it seems odd that both boxes
> are showing the same issue with random lockups like that. It could be
> twice in a day or once every 3 days.
>
> Anyone have any insights ? Anyone have any suggestions about better
> motherboards out there ? We are waiting for Supermicro's Epyc
> availability, but nothing yet. It would be nice if we could find a
> board with at least some hardware watchdog on it.
>
>
> ---Mike
>

Nimrod Levy

Jan 17, 2018, 8:47:47 AM
I've been seeing similar issues on Ryzen and asked some questions here:
https://lists.freebsd.org/pipermail/freebsd-stable/2017-December/088121.html

My previous queries didn't go anywhere.

--
Nimrod

On Wed, Jan 17, 2018 at 8:38 AM Mike Tancsa <mi...@sentex.net> wrote:

> With the Intel issues exposed in meltdown, we were looking at possibly
> deploying some Ryzen based servers for FreeBSD. We got a pair of
> ASUS PRIME X370-PRO and
>
> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.34-MHz
> K8-class CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
>
> Everything is at its default in the BIOS, no overclocking etc.
>
> However, we are seeing random lockups on both boxes. It doesnt seem to
> correspond with load/activity. And its a hard lockup. Keyboard not
> responsive and I cant break to serial debugger, so it doesnt seem to be
> an issue with something in the kernel going into deadlock.
>
> It sort of feels like a hardware issue, but it seems odd that both boxes
> are showing the same issue with random lockups like that. It could be
> twice in a day or once every 3 days.
>
> Anyone have any insights ? Anyone have any suggestions about better
> motherboards out there ? We are waiting for Supermicro's Epyc
> availability, but nothing yet. It would be nice if we could find a
> board with at least some hardware watchdog on it.
>
>
> ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mi...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/



Mike Tancsa

Jan 17, 2018, 9:32:37 AM
On 1/17/2018 8:46 AM, Nimrod Levy wrote:
> I've been seeing similar issues on Ryzen and asked some questions,
> here https://lists.freebsd.org/pipermail/freebsd-stable/2017-December/088121.html
>
> My previous queries didn't go anywhere.  
>

That's not very promising :( Googling around shows lots of similar
reports on both FreeBSD and Linux, but it's a lot of "I tweaked this BIOS
setting and so far so good" and nothing definitive or conclusive. Having
to mess about with hardware settings for days on end hoping to fix
random lockups is ... not good.

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400

Mike Tancsa

Jan 17, 2018, 10:05:03 AM
On 1/17/2018 8:43 AM, Pete French wrote:
>
> Are you running the latest STABLE ? There were some patches for Ryzen
> which went in I belive, and might affect te stability. Specificly the
> chnages to stop it locking up when executing code in the top page ?

Hi,
I was testing with RELENG_11 as of 2 days ago. The fix seems to be there

# sysctl -A hw.lower_amd64_sharedpage
hw.lower_amd64_sharedpage: 1

Would love to find a class of motherboard whose pitch is "You don't need
to dork around with any BIOS settings. It just works. Oh, and we have a
hardware watchdog too"... IPMI would be stellar.
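
In the meantime, a minimal sketch of the watchdog side of that wish,
assuming the board's southbridge watchdog is one FreeBSD already has a
driver for (amdsbwd(4) is my guess for AM4 boards, untested here):

# see whether a watchdog driver attaches at all
kldload amdsbwd && dmesg | grep -i watchdog
# have watchdogd(8) pat it, so a hard lockup resets the box
sysrc watchdogd_enable="YES"
service watchdogd start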

Don Lewis

Jan 17, 2018, 3:40:09 PM
On 17 Jan, Mike Tancsa wrote:
> On 1/17/2018 8:43 AM, Pete French wrote:
>>
>> Are you running the latest STABLE ? There were some patches for Ryzen
>> which went in I belive, and might affect te stability. Specificly the
>> chnages to stop it locking up when executing code in the top page ?
>
> Hi,
> I was testing with RELENG_11 as of 2 days ago. The fix seems to be there
>
> # sysctl -A hw.lower_amd64_sharedpage
> hw.lower_amd64_sharedpage: 1
>
> Would love to find a class of motherboard that pushes its "You dont need
> to dork around with any BIOS settings. It just works. Oh, and we have a
> hardware watchdog too".... ipmi would be stellar.

The shared page change fixed the random lockup and silent reboot problem
for me. I've got a 1700X eight core CPU and a Gigabyte X370 Gaming 5. I
did have to RMA my CPU (it was an early one) because it had the problem
with random segfaults that seemed to be triggered by process migration
between CPU cores. I still haven't switched over to using it for
package builds because I see more random fallout than on my older
package builder. I'm not blaming the hardware for that at this point
because I see a lot of the same issues on my older machine, but less
frequently.

One thing to watch (though it should be less critical with a six core
CPU) is VRM cooling. I removed the stupid plastic shroud over the VRM
sink on my motherboard so that it gets some more airflow.

Mike Tancsa

Jan 17, 2018, 4:01:20 PM
Thanks! I will confirm the cooling. I tried just now looking at the CPU
fan control in the BIOS and upped it to "turbo" from the default. Does
amdtemp.ko work with your chipset? Nothing on mine unfortunately, so I
can't tell from the OS if it's running hot.

Is there a way to see if your CPU is old and has that bug? I haven't
seen any segfaults on the few dozen buildworlds I have done. So far it's
always been a total lockup and not a crash with RELENG_11.

x86info v1.31pre
Found 12 identical CPUs
Extended Family: 8 Extended Model: 0 Family: 15 Model: 1 Stepping: 1
CPU Model (x86info's best guess): AMD Zen Series Processor (ZP-B1)
Processor name string (BIOS programmed): AMD Ryzen 5 1600 Six-Core
Processor

Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration
extension
SVM: revision 1, 32768 ASIDs, np, lbrVirt, SVMLock, NRIPSave,
TscRateMsr, VmcbClean, FlushByAsid, DecodeAssists, PauseFilter,
PauseFilterThreshold
Address Size: 48 bits virtual, 48 bits physical
The physical package has 12 of 16 possible cores implemented.
running at an estimated 3.20GHz




---Mike



--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Nimrod Levy

Jan 17, 2018, 4:46:15 PM
I'm running 11-STABLE from 12/9. amdtemp works for me. It also has the
sysctl indicating that it has the shared page fix. I'm pretty sure I've
seen the lockups since then. I'll update to the latest STABLE and see
what happens.

One weird thing about my experience is that if I keep something running
continuously like the distributed.net client on 6 of 12 possible threads,
it keeps the system up for MUCH longer than without. This is a home server
and very lightly loaded (one could argue insanely overpowered for the use
case).

I'm glad to see that there has been some attention on this. I was a little
disappointed by the earlier thread.

I'm happy to help troubleshoot, but I'm not sure what information I can
gather from a hard locked system that doesn't even show anything on the
console.

--
Nimrod
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mi...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/



Mark Millard via freebsd-stable

Jan 17, 2018, 4:46:29 PM
Mike Tancsa mike at sentex.net wrote on:
Wed Jan 17 14:31:50 UTC 2018 :

> On 1/17/2018 8:46 AM, Nimrod Levy wrote:
> > I've been seeing similar issues on Ryzen and asked some questions,
> > here https://lists.freebsd.org/pipermail/freebsd-stable/2017-December/088121.html
> >
> > My previous queries didn't go anywhere.
> >
>
>
>
> Thats not very promising :( Googling around, shows lots of similar
> reports both on FreeBSD and Linux, but its a lot of "I tweaked this BIOS
> setting and so far so good" but nothing definitive / conclusive. Having
> to mess about with hardware settings for days on end hoping to fix
> random lockups is .... not good.

See Bugzilla 219399 and 221029 :

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

I'm not sure how much stable/11 and the like have been
tracking things that were done in head (12) during this.
My use has only been via versions of head.

My 1800X use was basically after head was updated to deal
with what 219399 eventually was isolated to. (221029 is
from splitting off problems that were not originally known
to be separate.)

While I had problems for 1800X that are what the 221029
bugzilla above is about, I've not had such with a 1950X
in the same sorts of contexts as I had been using the
1800X. But this was under Hyper-V for both processor
variants (with matching boards).

I've only tried the 1950X with a native FreeBSD boot once
(a fair time ago). It showed a lockup problem fairly
quickly (power switch/plug time). I've never seen such
(or anything analogous) under Hyper-V with extensive use.

It does not look like I'll be investigating native FreeBSD
on the 1950X anytime soon. (I no longer have access to the
1800X.)

===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)

Don Lewis

Jan 17, 2018, 6:11:15 PM
My original CPU had a date code of 1708SUT (8th week of 2017 I think),
and the replacement has a date code of 1733SUS. There's a humungous
discussion thread here <https://community.amd.com/thread/215773> where
date codes are discussed. As I recall, the first replacement parts
shipped had date codes somewhere in the mid 20s, but I think AMD was
still hand-screening parts at that point. My replacement came in a
sealed box, so it wasn't hand-screened, and AMD probably was able to
screen for this problem in their production test.

Don Lewis

Jan 17, 2018, 6:39:09 PM
On 17 Jan, Nimrod Levy wrote:
> I'm running 11-STABLE from 12/9. amdtemp works for me. It also has the
> systl indicating that it it has the shared page fix. I'm pretty sure I've
> seen the lockups since then. I'll update to the latest STABLE and see
> what happens.
>
> One weird thing about my experience is that if I keep something running
> continuously like the distributed.net client on 6 of 12 possible threads,
> it keeps the system up for MUCH longer than without. This is a home server
> and very lightly loaded (one could argue insanely overpowered for the use
> case).

This sounds like the problem with the deep Cx states that has been
reported by numerous Linux users. I think some motherboard brands are
more likely to have the problem. See:
http://forum.asrock.com/forum_posts.asp?TID=5963&title=taichi-x370-with-ubuntu-idle-lock-ups-idle-freeze
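
If it is that, it might also be worth capping the C-states from the OS
side in addition to the BIOS. A rough sketch using the stock ACPI Cx
knobs (nothing Ryzen-specific, so treat it as an experiment):

# what the CPU advertises and what is currently allowed
sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_lowest
# cap all cores at C1 for this boot
sysctl hw.acpi.cpu.cx_lowest=C1
# persist it across reboots via /etc/rc.conf
sysrc performance_cx_lowest="C1"
sysrc economy_cx_lowest="C1"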

Nimrod Levy

Jan 17, 2018, 9:38:08 PM
That looks promising. I just found that setting in the BIOS and disabled it.
I'll see how it runs.

Thanks


On Wed, Jan 17, 2018, 18:38 Don Lewis <truc...@freebsd.org> wrote:

> On 17 Jan, Nimrod Levy wrote:
> > I'm running 11-STABLE from 12/9. amdtemp works for me. It also has the
> > systl indicating that it it has the shared page fix. I'm pretty sure I've
> > seen the lockups since then. I'll update to the latest STABLE and see
> > what happens.
> >
> > One weird thing about my experience is that if I keep something running
> > continuously like the distributed.net client on 6 of 12 possible
> threads,
> > it keeps the system up for MUCH longer than without. This is a home
> server
> > and very lightly loaded (one could argue insanely overpowered for the use
> > case).
>
> This sounds like the problem with the deep Cx states that has been
> reported by numerous Linux users. I think some motherboard brands are
> more likely to have the problem. See:
>
> http://forum.asrock.com/forum_posts.asp?TID=5963&title=taichi-x370-with-ubuntu-idle-lock-ups-idle-freeze
>
> --

--
Nimrod

Nimrod Levy

Jan 19, 2018, 3:09:33 PM
Looks like disabling the C-states in the BIOS didn't change anything.

Mike Tancsa

Jan 19, 2018, 3:13:52 PM
Drag :( I have mine disabled as well, plus I lowered the RAM frequency to 2100
from 2400. For me the hangs are infrequent. It's only been a day and a
half, so I'm not sure if it's gone or I have just been "lucky"... Either way,
this platform feels way too fragile to deploy on anything :(

---Mike

On 1/19/2018 3:08 PM, Nimrod Levy wrote:
> Looks like disabling the C- states in the bios didn't change anything. 
>
> On Wed, Jan 17, 2018 at 9:22 PM Nimrod Levy <nim...@gmail.com
> <mailto:nim...@gmail.com>> wrote:
>
> That looks promising. I just found that seeing in the bios and
> disabled it. I'll see how it runs.
>
> Thanks
>
>
> On Wed, Jan 17, 2018, 18:38 Don Lewis <truc...@freebsd.org
> <mailto:truc...@freebsd.org>> wrote:
>
> On 17 Jan, Nimrod Levy wrote:
> > I'm running 11-STABLE from 12/9.  amdtemp works for me.  It
> also has the
> > systl indicating that it it has the shared page fix. I'm
> pretty sure I've
> > seen the lockups since then.  I'll update to the latest STABLE
> and see
> > what  happens.
> >
> > One weird thing about my experience is that if I keep
> something running

> > continuously like the distributed.net


> client on 6 of 12 possible threads,
> > it keeps the system up for MUCH longer than without.  This is
> a home server
> > and very lightly loaded (one could argue insanely overpowered
> for the use
> > case).
>
> This sounds like the problem with the deep Cx states that has been
> reported by numerous Linux users.  I think some motherboard
> brands are
> more likely to have the problem.  See:
> http://forum.asrock.com/forum_posts.asp?TID=5963&title=taichi-x370-with-ubuntu-idle-lock-ups-idle-freeze
>
> --
>
> --
> Nimrod
>
>
>
> --
>
> --
> Nimrod
>

--
-------------------
Mike Tancsa, tel +1 519 651 3400

Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Lucas Holt

Jan 19, 2018, 3:23:05 PM
I have an Asus Prime X370-pro and a Ryzen 7 1700 that I bought in late
April. Make sure you have the latest BIOS for these boards or else it
will randomly freak out.

While I haven't used it much with FreeBSD, I can confirm that I had a
lot of stability issues solved with a December BIOS update on
MidnightBSD. I backported the shared page fix and amdtemp. (It's
basically FreeBSD 9.1.)

I couldn't even get it to boot until the August BIOS update. I've had
my box stay up at least a week, and it's my primary development box so
I'm mostly doing src/ports builds all the time on it.

If you have the latest BIOS, check the memory timings too. It's rather
picky with some memory modules.

Luke

Mike Tancsa

Jan 19, 2018, 3:30:37 PM
On 1/19/2018 3:22 PM, Lucas Holt wrote:
> I have an Asus Prime X370-pro and a Ryzen 7 1700 that I bought in late

Thanks! That's the board I have, but no luck with amdtemp. Did you have
to change the source code for it to work?

dmidecode shows

Manufacturer: ASUSTeK COMPUTER INC.
Product Name: PRIME X370-PRO

Vendor: American Megatrends Inc.
Version: 3402
Release Date: 12/11/2017
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
Characteristics:

memory is

Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2133 MT/s
Manufacturer: Unknown
Serial Number: 192BE196
Asset Tag: Not Specified
Part Number: CT16G4DFD824A.C16FHD
Rank: 2
Configured Clock Speed: 1067 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V

When I try and load the kld, I get nothing :(

0(ms-v1)# kldload amdtemp
0(ms-v1)# dmesg | tail -2
ums0: at uhub0, port 3, addr 1 (disconnected)
ums0: detached
0(ms-v1)#

> April.  Make sure you have the latest BIOS for these boards or else it
> will randomly freak out.
>
> While i haven't used it much with FreeBSD, I can confirm that I had a
> lot of stability issues solved with a December BIOS update on
> MidnightBSD. I back ported the shared page fix and amdtemp.  (it's
> basically FreeBSD 9.1)
>
> I couldn't even get it to boot until the August BIOS update.  I've had
> my box stay up at least a week, and it's my primary development box so
> I'm mostly doing src/ports builds all the time on it.
>
> If you have the latest BIOS, check the memory timings too.  It's rather
> picky with some memory modules.
>
> Luke
>
>

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Ryan Root

Jan 19, 2018, 3:34:23 PM
Have you double-checked the qualified vendors list (QVL) for your
motherboard? Sometimes memory chips not on the list will work, as it's
probably only a list of the ones they've tested, but it might be the problem
in this situation. If that was already brought up by someone else, sorry
for butting in.

This looks like the QVL list for your MB ->
http://download.gigabyte.us/FileList/Memory/mb_memory_ga-ax370-Gaming5.pdf

Peter Moody

Jan 19, 2018, 3:34:32 PM
On Fri, Jan 19, 2018 at 12:13 PM, Mike Tancsa <mi...@sentex.net> wrote:
> Drag :( I have mine disabled as well as lowering the RAM freq to 2100
> from 2400. For me the hangs are infrequent. Its only been a day and a
> half, so not sure if its gone or I have been "lucky"... Either ways,
> this platform feels way too fragile to deploy on anything :(
>
> ---Mike
>
> On 1/19/2018 3:08 PM, Nimrod Levy wrote:
>> Looks like disabling the C- states in the bios didn't change anything.

It's too early for me to be 100% certain, but disabling SMT in the
BIOS has thus far resulted in a more stable system.

I have a Ryzen 5 1600X and an ASRock AB350M and I've tried just about
everything in all of these threads: disabling C-states (no effect),
setting the sysctl (doesn't exist on my 11.1-RELEASE), tweaking
voltage and cooling settings, RMA'ing the board, the CPU and the
memory. Nothing helped.

Last night I tried disabling SMT and, so far, so good.
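
For anyone who wants to test the SMT theory without a BIOS trip, FreeBSD
can be told to ignore the second thread per core at boot. A small sketch
(machdep.hyperthreading_allowed is the tunable, if I remember right, so
double-check it on your version):

# how many cores/threads the scheduler currently sees
sysctl kern.smp.cores kern.smp.threads_per_core
# in /boot/loader.conf, to schedule only one thread per core:
machdep.hyperthreading_allowed="0"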

Lucas Holt

Jan 19, 2018, 3:38:49 PM
We have the same bios version.

I have corsair RAM

Handle 0x003B, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0032
Error Information Handle: 0x003A
Total Width: 64 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM_A2
Bank Locator: BANK 1
Type: <OUT OF SPEC>
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2666 MHz
Manufacturer: Unknown
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: CMK32GX4M2A2666C16
Rank: 2
Configured Clock Speed: 1333 MHz
Minimum voltage: 1.200 V
Maximum voltage: 1.200 V
Configured voltage: 1.200 V


I just double checked and amdtemp isn't working correctly. I was
probably thinking of my other system which has an FX 8350.

Luke

Mike Tancsa

Jan 19, 2018, 3:40:54 PM
On 1/19/2018 3:32 PM, Peter Moody wrote:
>
> I have a ryzen5 1600X and an ASRock AB350M and I've tried just about
> everything in all of these threads; disabling C state (no effect),
> setting the sysctl (doesn't exist on my 11.1 RELEASE), tweaking
> voltage and cooling settings, rma'ing the board the cpu and the
> memory. nothing helped.
>
> last night I tried disabling SMT and, so far so good.


Is there anything that can be done to trigger the lockup more reliably?
I haven't found any patterns. I have had lockups when the system is 100%
idle and lockups when it is lightly loaded. I have yet to see any segfaults
or sig 11s while doing buildworld (make -j12 or even make -j16).
---Mike

Mike Tancsa

Jan 19, 2018, 3:49:05 PM
On 1/19/2018 3:23 PM, Ryan Root wrote:
> This looks like the QVL list for your MB ->
> http://download.gigabyte.us/FileList/Memory/mb_memory_ga-ax370-Gaming5.pdf

It's an ASUS MB, but the memory I have is in the above PDF list.

I don't see CT16G4DFD824A, but I do see other Crucial products with
slower clock speeds. Right now I do have it set to 2133, whereas it was
2400 before.

---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Peter Moody

Jan 19, 2018, 3:49:35 PM
On Fri, Jan 19, 2018 at 12:39 PM, Mike Tancsa <mi...@sentex.net> wrote:
> On 1/19/2018 3:32 PM, Peter Moody wrote:
>>
>> I have a ryzen5 1600X and an ASRock AB350M and I've tried just about
>> everything in all of these threads; disabling C state (no effect),
>> setting the sysctl (doesn't exist on my 11.1 RELEASE), tweaking
>> voltage and cooling settings, rma'ing the board the cpu and the
>> memory. nothing helped.
>>
>> last night I tried disabling SMT and, so far so good.
>
>
> Is there anything that can be done to trigger the lockup more reliably ?
> I havent found any patterns. I have had lockups with the system is 100%
> idle and lockups when lightly loaded. I have yet to see any segfaults
> or sig 11s while doing buildworld (make -j12 or make -j16 even)

"reliably" trigger the lockup, no.

The general pattern was high load followed by low load would lock up;
so a couple 'make -j 32 buildworld' would almost always result in a
lock up a few hours after the last one completes.

weirdly enough though, with SMT enabled, building net/samba47 would
always hang (like compilation segfaults). with SMT disabled, no such
problems.
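
Based on that pattern, a crude reproducer sketch (just alternating a
parallel build burst with an idle window, nothing clever, numbers made up):

#!/bin/sh
# hammer the box, then let it idle, repeat until it wedges
while true; do
    ( cd /usr/src && make -j12 buildworld ) > /tmp/bw.log 2>&1
    date >> /tmp/alive.log    # last timestamp before a hang
    sleep 3600                # idle window where the lockups seem to hit
done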

Nimrod Levy

Jan 19, 2018, 4:03:35 PM
I can try lowering my memory clock and see what happens. I'm a little
skeptical because I have been able to run memtest with no errors for some
time. I'm glad to give anything a try...
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mi...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/


Pete French

Jan 19, 2018, 4:14:36 PM
Out of interest, is there anyone out there running Ryzen who *hasn't*
seen lockups? I'd be curious if there are a lot of lurkers thinking "mine
works fine".

To be honest, this thread has put me off building my machine; the pile of
boxes with motherboard, case, CPU and RAM is still sitting next to my
desk at work!

-pete.

Mike Tancsa

Jan 19, 2018, 4:18:18 PM
On 1/19/2018 3:48 PM, Peter Moody wrote:
>
> weirdly enough though, with SMT enabled, building net/samba47 would
> always hang (like compilation segfaults). with SMT disabled, no such
> problems.

Wow, that's so strange! I just tried, and I see the same thing as well.

[ 442/3804] Generating lib/ldb-samba/ldif_handlers_proto.h
[ 443/3804] Generating source4/lib/registry/tools/common.h
runner /usr/local/bin/perl
"/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
--srcdir=.. --builddir=. --public=/dev/null
--private="default/lib/ldb-samba/ldif_handlers_proto.h"
../lib/ldb-samba/ldif_handlers.c ../lib/ldb-samba/ldb_matching_rules.c
[ 444/3804] Generating source4/lib/registry/tests/proto.h
runner /usr/local/bin/perl
"/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
--srcdir=.. --builddir=. --public=/dev/null
--private="default/source4/lib/registry/tools/common.h"
../source4/lib/registry/tools/common.c


And it just hangs there. No segfaults, but it just hangs.

A ctrl-T just shows

load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
make: Working in: /usr/ports/net/samba47


---Mike



--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

Jan 19, 2018, 4:27:57 PM
On 1/19/2018 4:14 PM, Pete French wrote:
> To be honest this thread has put me off building my machine, the pile of
> boxes with motherboard, case, cpu and ram is still sitting next to me
> desk at work!

It is quite discouraging, isn't it :( From my POV, however, I really want
a "plan B" to Spectre/Meltdown that at least slows down attackers. On
a few servers, I do have ways to detect intrusions after the fact
(tripwire etc.), but I want to prevent this attack from happening in the
first place as much as possible. Supposedly Spectre exploits on AMD are
MUCH harder (or so says AMD, at least), so I am hoping I can at least have
this as an option. We even ordered a Tyan EPYC-based board to see what
it's like too.

---Mike



--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Peter Moody

Jan 19, 2018, 4:28:43 PM
On Fri, Jan 19, 2018 at 1:17 PM, Mike Tancsa <mi...@sentex.net> wrote:
> On 1/19/2018 3:48 PM, Peter Moody wrote:
>>
>> weirdly enough though, with SMT enabled, building net/samba47 would
>> always hang (like compilation segfaults). with SMT disabled, no such
>> problems.
>
> wow, thats so strange! I just tried,and the same thing and see it as well.
>
> [ 442/3804] Generating lib/ldb-samba/ldif_handlers_proto.h
> [ 443/3804] Generating source4/lib/registry/tools/common.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/lib/ldb-samba/ldif_handlers_proto.h"
> ../lib/ldb-samba/ldif_handlers.c ../lib/ldb-samba/ldb_matching_rules.c
> [ 444/3804] Generating source4/lib/registry/tests/proto.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/source4/lib/registry/tools/common.h"
> ../source4/lib/registry/tools/common.c
>
>
> And it just hangs there. No segfaults, but it just hangs.

Yeah, whoops, sorry. I had compilation segfaults on my mind because of the
linked AMD forum posts about gcc segfaults.

Anyway, try disabling SMT and see if that helps?

This seems to make parallel stuff perceptibly slower (like a make -j12
buildworld at the same time as a make -j12 in net/samba47), but at this
point I'm just happy with the relative stability.

Mike Tancsa

Jan 19, 2018, 5:07:04 PM
On 1/19/2018 4:27 PM, Peter Moody wrote:
>> And it just hangs there. No segfaults, but it just hangs.
>
> yeah whoops, sorry. I had compilation segfaults on my mind b/c of the
> linked amd forums posts about gcc segfaults.
>
> anyway, try disabling SMT and see if that helps .. ?
>

Strange, still hangs, different location :(

[ 386/3804] HEIMDAL_GSSAPI_ASN1_PRIV_H:
bin/default/source4/heimdal/lib/gssapi/gssapi_asn1-priv.hx ->
bin/default/source4/heimdal/lib/gssapi/gssapi_asn1-priv.h
runner cp default/source4/heimdal/lib/gssapi/gssapi_asn1-priv.hx
default/source4/heimdal/lib/gssapi/gssapi_asn1-priv.h




load: 19.29 cmd: make 79210 [wait] 301.89r 0.04u 0.00s 0% 1560k
make: Working in: /usr/ports/net/samba47
make[1]: Working in: /usr/ports/net/samba47

Don Lewis

Jan 19, 2018, 5:29:56 PM
On 19 Jan, Pete French wrote:
> Out of interest, is there anyone out there running Ryzen who *hasnt*
> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
> works fine"

No hangs or silent reboots here with either my original CPU or warranty
replacement once the shared page fix was in place.

Don Lewis

Jan 19, 2018, 5:33:09 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 3:32 PM, Peter Moody wrote:
>>
>> I have a ryzen5 1600X and an ASRock AB350M and I've tried just about
>> everything in all of these threads; disabling C state (no effect),
>> setting the sysctl (doesn't exist on my 11.1 RELEASE), tweaking
>> voltage and cooling settings, rma'ing the board the cpu and the
>> memory. nothing helped.
>>
>> last night I tried disabling SMT and, so far so good.
>
>
> Is there anything that can be done to trigger the lockup more reliably ?
> I havent found any patterns. I have had lockups with the system is 100%
> idle and lockups when lightly loaded. I have yet to see any segfaults
> or sig 11s while doing buildworld (make -j12 or make -j16 even)

I've never seen the idle lockup problem here. Prior to the shared page
fix, I could almost always trigger a system hang or silent reboot by
doing a parallel build of openjdk8.

Don Lewis

Jan 19, 2018, 5:46:10 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 3:48 PM, Peter Moody wrote:
>>
>> weirdly enough though, with SMT enabled, building net/samba47 would
>> always hang (like compilation segfaults). with SMT disabled, no such
>> problems.
>
> wow, thats so strange! I just tried,and the same thing and see it as well.
>
> [ 442/3804] Generating lib/ldb-samba/ldif_handlers_proto.h
> [ 443/3804] Generating source4/lib/registry/tools/common.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/lib/ldb-samba/ldif_handlers_proto.h"
> ../lib/ldb-samba/ldif_handlers.c ../lib/ldb-samba/ldb_matching_rules.c
> [ 444/3804] Generating source4/lib/registry/tests/proto.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/source4/lib/registry/tools/common.h"
> ../source4/lib/registry/tools/common.c
>
>
> And it just hangs there. No segfaults, but it just hangs.
>
> A ctrl+t shows just shows
>
> load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
> make: Working in: /usr/ports/net/samba47

I sometimes see build runaways when using poudriere to build my
standard set of packages on my Ryzen machine. I don't think this is a
Ryzen-specific issue, since I also see the same on my older AMD FX-8320E
machine, but much less frequently there. It looks like a lost-wakeup
issue, but I haven't had a chance to dig into it yet.

=>> Killing runaway build after 7200 seconds with no output
=>> Cleaning up wrkdir
===> Cleaning for doxygen-1.8.13_1,2
=>> Warning: Leftover processes:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
nobody 55576 0.0 0.0 10556 1528 0 I+J 00:32 0:00.04 /usr/bin/make -C /usr/ports/devel/doxygen build
nobody 55625 0.0 0.0 11660 1952 0 I+J 00:32 0:00.00 - /bin/sh -e -c (cd /wrkdirs/usr/ports/devel/dox
ygen/work/.build; if ! /usr/bin/env XDG_DATA_HOME=/wrkdirs/usr/ports/devel/doxygen/work XDG_CONFIG_HOME=/wr
kdirs/usr/ports/devel/doxygen/work HOME=/wrkdirs/usr/ports/devel/doxygen/work TMPDIR="/tmp" PATH=/wrkdirs/u
sr/ports/devel/doxygen/work/.bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/nonexistent/b
in NO_PIE=yes MK_DEBUG_FILES=no MK_KERNEL_SYMBOLS=no SHELL=/bin/sh NO_LINT=YES PREFIX=/usr/local LOCALBASE=
/usr/local LIBDIR="/usr/lib" CC="cc" CFLAGS="-O2 -pipe -DLIBICONV_PLUG -fstack-protector -fno-strict-alia
sing" CPP="cpp" CPPFLAGS="-DLIBICONV_PLUG" LDFLAGS=" -fstack-protector" LIBS="" CXX="c++" CXXFLAGS="-O2 -
pipe -DLIBICONV_PLUG -fstack-protector -fno-strict-aliasing -DLIBICONV_PLUG" MANPREFIX="/usr/local" BSD_IN
STALL_PROGRAM="install -s -m 555" BSD_INSTALL_LIB="install -s -m 0644" BSD_INSTALL_SCRIPT="install -m 5
55" BSD_INSTALL_DATA="install -m 0644" BSD_INSTALL_MAN="install -m 444" /usr/bin/make -f Makefile all
docs; then if [ -n "" ] ; then echo "===> Compilation failed unexpectedly."; (echo "") | /usr/bin/fmt 75
79 ; fi; false; fi)
nobody 55636 0.0 0.0 9988 1108 0 I+J 00:32 0:00.01 `-- /usr/bin/make -f Makefile all docs
nobody 6734 0.0 0.0 10140 1216 0 I+J 00:42 0:00.00 `-- /usr/bin/make -f CMakeFiles/Makefile2 docs
nobody 6764 0.0 0.0 10140 1216 0 I+J 00:42 0:00.01 `-- /usr/bin/make -f CMakeFiles/Makefile2 do
c/CMakeFiles/docs.dir/all
nobody 7107 0.0 0.0 10512 1536 0 I+J 00:42 0:00.03 `-- /usr/bin/make -f examples/CMakeFiles/e
xamples.dir/build.make examples/CMakeFiles/examples.dir/build
nobody 12111 0.0 0.0 61468 27060 0 I+J 00:43 0:00.16 `-- ../bin/doxygen diagrams.cfg
Killed
build of devel/doxygen | doxygen-1.8.13_1,2 ended at Sat Dec 30 18:44:47 PST 2017
build time: 02:14:51
!!! build failure encountered !!!
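
Next time one of these runaways shows up I intend to poke at the stuck
process with procstat(1) before killing it; a sketch of the sort of thing
I mean (the pid below is whatever ctrl-T reports as wedged):

# where is it sleeping in the kernel?
procstat -kk <pid>
# what userland threads does it have, and what are they waiting on?
procstat -t <pid>
ps -O wchan -p <pid>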

Don Lewis

Jan 19, 2018, 6:10:28 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 3:48 PM, Peter Moody wrote:
>>
>> weirdly enough though, with SMT enabled, building net/samba47 would
>> always hang (like compilation segfaults). with SMT disabled, no such
>> problems.
>
> wow, thats so strange! I just tried,and the same thing and see it as well.
>
> [ 442/3804] Generating lib/ldb-samba/ldif_handlers_proto.h
> [ 443/3804] Generating source4/lib/registry/tools/common.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/lib/ldb-samba/ldif_handlers_proto.h"
> ../lib/ldb-samba/ldif_handlers.c ../lib/ldb-samba/ldb_matching_rules.c
> [ 444/3804] Generating source4/lib/registry/tests/proto.h
> runner /usr/local/bin/perl
> "/usr/ports/net/samba47/work/samba-4.7.4/source4/script/mkproto.pl"
> --srcdir=.. --builddir=. --public=/dev/null
> --private="default/source4/lib/registry/tools/common.h"
> ../source4/lib/registry/tools/common.c
>
>
> And it just hangs there. No segfaults, but it just hangs.
>
> A ctrl+t shows just shows
>
> load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
> make: Working in: /usr/ports/net/samba47

I just tried building samba47 here. Top shows python spending a lot of
time in that state and steadily growing in size, but forward progress
does happen. I got a successful build:
[00:07:54] [01] [00:06:31] Finished net/samba47 | samba47-4.7.4_1: Success

I'm currently running:
FreeBSD 12.0-CURRENT #0 r327261M: Wed Dec 27 22:44:16 PST 2017

Don Lewis

Jan 19, 2018, 6:16:40 PM
On 19 Jan, Mike Tancsa wrote:
What FreeBSD version are you running? It looks like the amdtemp changes
for Ryzen are only in 12.0-CURRENT. It looks like r323185 and r323195
need to be merged to stable/11.
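
In case anyone wants them locally before an official MFC, a rough sketch
of pulling those two revisions into a stable/11 checkout (the usual svn
merge dance; module paths assumed, not checked):

cd /usr/src                          # assumed to be a stable/11 svn checkout
svn merge -c 323185,323195 ^/head .
# rebuild and install the modules
cd sys/modules/amdsmn && make && make install
cd ../amdtemp && make && make install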

Mike Tancsa

Jan 19, 2018, 9:26:42 PM
On 1/19/2018 5:45 PM, Don Lewis wrote:
>>
>> And it just hangs there. No segfaults, but it just hangs.
>>
>> A ctrl+t shows just shows
>>
>> load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
>> make: Working in: /usr/ports/net/samba47
>
> I sometimes seen build runaways when using poudriere to build my
> standard set of packages on my Ryzen machine. I don't think this is a
> Ryzen-specific issue since I also see the same on older AMD FX-8320E
> machine, but much less frequently there. It looks like a lost wakeup
> issue, but I haven't had a chance to dig into it yet.

Odd, does this happen on Intel machines too ?

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

Jan 19, 2018, 9:42:04 PM
On 1/19/2018 6:16 PM, Don Lewis wrote:
>>
>> 0(ms-v1)# kldload amdtemp
>> 0(ms-v1)# dmesg | tail -2
>> ums0: at uhub0, port 3, addr 1 (disconnected)
>> ums0: detached
>> 0(ms-v1)#
>
> What FreeBSD version are you running? It looks like the amdtemp changes
> for Ryzen are only in 12.0-CURRENT. It looks like r323185 and r323195
> need to be merged to stable/11.

RELENG_11. It seems amdsmn is needed as well.

---Mike

>
>


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

Jan 19, 2018, 9:42:25 PM
On 1/19/2018 6:09 PM, Don Lewis wrote:
>
> I just tried building samba47 here. Top shows python spending a lot of
> time in that state and steadily growing in size, but forward progress
> does happen. I got a successful build:
> [00:07:54] [01] [00:06:31] Finished net/samba47 | samba47-4.7.4_1: Success
>
> I'm currently running:
> FreeBSD 12.0-CURRENT #0 r327261M: Wed Dec 27 22:44:16 PST 2017

RELENG_11:
FreeBSD ms-v1.sentex.ca 11.1-STABLE FreeBSD 11.1-STABLE #0 r328163: Fri
Jan 19 09:57:36 EST 2018

Still no luck building.

It's been stuck here for 20 minutes:

[ 214/3804] Compiling source4/heimdal/lib/roken/estrdup.c
runner cc -pipe -g -ggdb -gdwarf-2 -gstrict-dwarf -DLIBICONV_PLUG
-fno-color-diagnostics -D_FUNCTION_DEF -g -fstack-protector
-DLDAP_DEPRECATED -fno-strict-aliasing -fno-omit-frame-pointer
-DSOCKET_WRAPPER_DISABLE=1 -D_SAMBA_HOSTCC_ -fPIC -D_REENTRANT
-D_POSIX_PTHREAD_SEMANTICS -DSTATIC_ROKEN_HOSTCC_MODULES=NULL
-DSTATIC_ROKEN_HOSTCC_MODULES_PROTO=extern void
__ROKEN_HOSTCC_dummy_module_proto(void) -MD
-Idefault/source4/heimdal_build -I../source4/heimdal_build
-Idefault/source4/heimdal/lib/roken -I../source4/heimdal/lib/roken
-Idefault/include/public -I../include/public -Idefault/source4
-I../source4 -Idefault/lib -I../lib -Idefault/source4/lib
-I../source4/lib -Idefault/source4/include -I../source4/include
-Idefault/include -I../include -Idefault/lib/replace -I../lib/replace
-Idefault -I.. -I/usr/local/include -I/usr/local/include -DLIBICONV_PLUG
-D_SAMBA_BUILD_=4 -DHAVE_CONFIG_H=1 -D_GNU_SOURCE=1
-D_XOPEN_SOURCE_EXTENDED=1 ../source4/heimdal/lib/roken/estrdup.c -c -o
default/source4/heimdal/lib/roken/estrdup_3.o


load: 0.11 cmd: python2.7 72089 [usem] 996.84r 10.84u 0.47s 0% 99788k
make: Working in: /usr/ports/net/samba47
make[1]: Working in: /usr/ports/net/samba47


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

Jan 19, 2018, 9:52:32 PM
On 1/19/2018 5:29 PM, Don Lewis wrote:
> On 19 Jan, Pete French wrote:
>> Out of interest, is there anyone out there running Ryzen who *hasnt*
>> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
>> works fine"
>
> No hangs or silent reboots here with either my original CPU or warranty
> replacement once the shared page fix was in place.


Hmmm, I wonder if I have a pair of the old CPUs (came from 2 different
suppliers however).

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Don Lewis

Jan 19, 2018, 10:06:42 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 5:45 PM, Don Lewis wrote:
>>>
>>> And it just hangs there. No segfaults, but it just hangs.
>>>
>>> A ctrl+t shows just shows
>>>
>>> load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
>>> make: Working in: /usr/ports/net/samba47
>>
>> I sometimes seen build runaways when using poudriere to build my
>> standard set of packages on my Ryzen machine. I don't think this is a
>> Ryzen-specific issue since I also see the same on older AMD FX-8320E
>> machine, but much less frequently there. It looks like a lost wakeup
>> issue, but I haven't had a chance to dig into it yet.
>
> Odd, does this happen on Intel machines too ?

Unknown. The last one of those I had was a Pentium III ...

Don Lewis

Jan 19, 2018, 10:12:44 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 6:16 PM, Don Lewis wrote:
>>>
>>> 0(ms-v1)# kldload amdtemp
>>> 0(ms-v1)# dmesg | tail -2
>>> ums0: at uhub0, port 3, addr 1 (disconnected)
>>> ums0: detached
>>> 0(ms-v1)#
>>
>> What FreeBSD version are you running? It looks like the amdtemp changes
>> for Ryzen are only in 12.0-CURRENT. It looks like r323185 and r323195
>> need to be merged to stable/11.
>
> releng11. It seems amdsmn is needed as well

That sounds right.

Don Lewis

Jan 19, 2018, 10:29:23 PM
On 19 Jan, Mike Tancsa wrote:
> On 1/19/2018 5:29 PM, Don Lewis wrote:
>> On 19 Jan, Pete French wrote:
>>> Out of interest, is there anyone out there running Ryzen who *hasnt*
>>> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
>>> works fine"
>>
>> No hangs or silent reboots here with either my original CPU or warranty
>> replacement once the shared page fix was in place.
>
>
> Hmmm, I wonder if I have a pair of the old CPUs (came from 2 different
> suppliers however).

The only real problem with the old CPUs is the random segfault problem
and some other random strangeness, like the lang/ghc build almost always
failing.

Mike Tancsa

Jan 19, 2018, 10:45:16 PM
On 1/19/2018 9:18 PM, Don Lewis wrote:
> On 19 Jan, Mike Tancsa wrote:
>> On 1/19/2018 5:45 PM, Don Lewis wrote:
>>>>
>>>> And it just hangs there. No segfaults, but it just hangs.
>>>>
>>>> A ctrl+t shows just shows
>>>>
>>>> load: 0.16 cmd: python2.7 65754 [usem] 589.51r 10.52u 1.63s 0% 122360k
>>>> make: Working in: /usr/ports/net/samba47
>>>
>>> I sometimes seen build runaways when using poudriere to build my
>>> standard set of packages on my Ryzen machine. I don't think this is a
>>> Ryzen-specific issue since I also see the same on older AMD FX-8320E
>>> machine, but much less frequently there. It looks like a lost wakeup
>>> issue, but I haven't had a chance to dig into it yet.
>>
>> Odd, does this happen on Intel machines too ?
>
> Unknown. The last one of those I had was a Pentium III ...

It builds without issue on a couple of Intel boxes I tried. The AMD one
is still stuck after 30 minutes at the same spot :(







--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mark Millard via freebsd-stable

Jan 21, 2018, 5:31:59 AM
Don Lewis truckman at FreeBSD.org wrote on
Sat Jan 20 02:35:40 UTC 2018 :

> The only real problem with the old CPUs is the random segfault problem
> and some other random strangeness, like the lang/ghc build almost always
> failing.


At one time you had written
( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
comment #103 on 2017-Oct-09):

QUOTE
The ghc build failure seems to be gone after upgrading the a
more recent 12.0-CURRENT. I will try to bisect for the fix
when I have a chance.
END QUOTE

Did that not pan out? Did you conclude it was
hardware-context specific?


===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

Nimrod Levy

Jan 21, 2018, 8:51:35 AM
Almost 2 days of uptime with a lower memory clock. Still holding my breath,
but this seems promising.

Peter Moody

Jan 21, 2018, 2:06:36 PM
Hm, so I've got nearly 3 days of uptime with SMT disabled.
Unfortunately this means that my otherwise "12" cores are actually only
"6". I'm also getting occasional segfaults compiling Go programs.

Should I just RMA this beast again?

Willem Jan Withagen

Jan 21, 2018, 3:03:18 PM
On 19/01/2018 23:29, Don Lewis wrote:
> On 19 Jan, Pete French wrote:
>> Out of interest, is there anyone out there running Ryzen who *hasnt*
>> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
>> works fine"
>
> No hangs or silent reboots here with either my original CPU or warranty
> replacement once the shared page fix was in place.

Perhaps too weird a reference:

I have supplied a customer with a Ryzen 5 and a B350 motherboard,
but he runs Windows 10, and I haven't heard him complain about anything
like this. I'll ask him specifically.

--WjW

Don Lewis

Jan 21, 2018, 3:18:24 PM
On 20 Jan, Mark Millard wrote:
> Don Lewis truckman at FreeBSD.org wrote on
> Sat Jan 20 02:35:40 UTC 2018 :
>
>> The only real problem with the old CPUs is the random segfault problem
>> and some other random strangeness, like the lang/ghc build almost always
>> failing.
>
>
> At one time you had written
> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
> comment #103 on 2017-Oct-09):
>
> QUOTE
> The ghc build failure seems to be gone after upgrading the a
> more recent 12.0-CURRENT. I will try to bisect for the fix
> when I have a chance.
> END QUOTE
>
> Did that not pan out? Did you conclude it was
> hardware-context specific?

I was never able to reproduce the problem. It seems like it failed on
the first ports build run after I replaced the CPU. When I upgraded the
OS and ports, the build succeeded. I tried going back to much earlier
OS and ports versions, but I could never get the ghc build to fail
again. I'm baffled by this ...

Don Lewis

Jan 21, 2018, 3:24:57 PM
On 21 Jan, Willem Jan Withagen wrote:
> On 19/01/2018 23:29, Don Lewis wrote:
>> On 19 Jan, Pete French wrote:
>>> Out of interest, is there anyone out there running Ryzen who *hasnt*
>>> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
>>> works fine"
>>
>> No hangs or silent reboots here with either my original CPU or warranty
>> replacement once the shared page fix was in place.
>
> Perhaps a too weird reference:
>
> I have supplied a customer with a Ryzen5 and a 350MB motherboard.
> But he runs Windows 10, but I haven't heard him complain about anything
> like this.
> But I'll ask him specific.

Only the BSDs were affected by the shared page issue. I think Linux
already had a guard page. I don't think Windows was affected by the
idle C-state issue. I suspect it is caused by software not doing the
right thing during C-state transitions, but the publicly available
documentation from AMD is pretty lacking. The random segfault issue is
primarily triggered by heavy parallel software build loads and how many
Windows users do that?

Willem Jan Withagen

Jan 21, 2018, 3:51:18 PM
On 21/01/2018 21:24, Don Lewis wrote:
> On 21 Jan, Willem Jan Withagen wrote:
>> On 19/01/2018 23:29, Don Lewis wrote:
>>> On 19 Jan, Pete French wrote:
>>>> Out of interest, is there anyone out there running Ryzen who *hasnt*
>>>> seen lockups ? I'd be curious if there a lot of lurkers thinking "mine
>>>> works fine"
>>>
>>> No hangs or silent reboots here with either my original CPU or warranty
>>> replacement once the shared page fix was in place.
>>
>> Perhaps a too weird reference:
>>
>> I have supplied a customer with a Ryzen5 and a 350MB motherboard.
>> But he runs Windows 10, but I haven't heard him complain about anything
>> like this.
>> But I'll ask him specific.
>
> Only the BSDs were affected by the shared page issue. I think Linux
> already had a guard page. I don't think Windows was affected by the
> idle C-state issue. I suspect it is caused by software not doing the
> right thing during C-state transitions, but the publicly available
> documentation from AMD is pretty lacking. The random segfault issue is
> primarily triggered by heavy parallel software build loads and how many
> Windows users do that?

This is an Adobe workstation where several users log in remotely and do
work, so I would assume that the system is seriously (ab)used.

And, as expected, I'm not aware of any of the detailed things that
Windows does while powering down into less active states.
--WjW

Don Lewis

Jan 21, 2018, 7:53:12 PM
On 21 Jan, Peter Moody wrote:
> hm, so i've got nearly 3 days of uptime with smt disabled.
> unfortunately this means that my otherwise '12' cores is actually only
> '6'. I'm also getting occasional segfaults compiling go programs.

Both my original and replacement CPUs croak on Go, so I don't think an
RMA is likely to help with that. Go is a heavy user of threads and my
suspicion is that there is some sort of issue with the locking that it
uses. I'm guessing a memory barrier issue of some sort ...

Don Lewis

Jan 21, 2018, 8:01:03 PM
It might depend on the scheduler details. On Linux and with the FreeBSD ULE
scheduler, runnable threads migrate between CPUs to balance the load
across all cores. When I did some experiments to disable that, the rate
of build failures greatly decreased. AMD has been very vague about the
cause of the problem (a "performance marginality") and resorted to
replacing CPUs with this problem without suggesting any sort of software
workaround.
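
If migration really is the trigger, one cheap experiment that needs no
scheduler patch is pinning the build to a fixed set of cores with
cpuset(1), e.g. (sketch only, core numbering assumed):

# run the whole build jailed to cores 0-5, one job per core,
# so ULE has far fewer chances to migrate the worker threads
cpuset -l 0-5 make -j6 buildworld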

Mark Millard via freebsd-stable

Jan 21, 2018, 8:03:21 PM
On 2018-Jan-21, at 12:17 PM, Don Lewis <truckman at FreeBSD.org> wrote:

> On 20 Jan, Mark Millard wrote:
>> Don Lewis truckman at FreeBSD.org wrote on
>> Sat Jan 20 02:35:40 UTC 2018 :
>>
>>> The only real problem with the old CPUs is the random segfault problem
>>> and some other random strangeness, like the lang/ghc build almost always
>>> failing.
>>
>>
>> At one time you had written
>> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
>> comment #103 on 2017-Oct-09):
>>
>> QUOTE
>> The ghc build failure seems to be gone after upgrading the a
>> more recent 12.0-CURRENT. I will try to bisect for the fix
>> when I have a chance.
>> END QUOTE
>>
>> Did that not pan out? Did you conclude it was
>> hardware-context specific?
>
> I was never able to reproduce the problem. It seems like it failed on
> the first ports build run after I replaced the CPU. When I upgraded the
> OS and ports, the build succeeded. I tried going back to much earlier
> OS and ports versions, but I could never get the ghc build to fail
> again. I'm baffled by this ...

Sounds like the overall information is then:

Old CPU: frequent problem building ghc (nearly always
fails as far as I know)

New CPU: rare problem building ghc
(possibly never for some software version combinations?)

(On a Ryzen Threadripper 1950X I've not seen a failure. For the
above I'm including what I observed under Hyper-V for the 1800X
and 1950X as contributing evidence: The 1800X was an early one
and fit the "Old CPU" case above. AMD has stated that
threadrippers never had the problems that other, early Ryzen
CPUs did for heavy compiling use. So far, for me, that seems
true.)

So, it sounds like building ghc is still a good test. Back when
I had access to the 1800X Ryzen system ghc was the most reliable
failure-to-build of what I tried. It still may be useful for
that sort of test activity to classify Ryzen CPUs for the one
type of issue.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

Pete French

Jan 22, 2018, 6:48:44 AM


On 21/01/2018 19:05, Peter Moody wrote:
> hm, so i've got nearly 3 days of uptime with smt disabled.
> unfortunately this means that my otherwise '12' cores is actually only
> '6'. I'm also getting occasional segfaults compiling go programs.

Isn't Go known to have issues on BSD anyway, though? I have seen
complaints of random crashes running Go under BSD systems, and
presumably the Go compiler itself is written in Go, so those issues
might surface when compiling.

Don Lewis

Jan 22, 2018, 1:26:05 PM
On 22 Jan, Pete French wrote:
>
>
> On 21/01/2018 19:05, Peter Moody wrote:
>> hm, so i've got nearly 3 days of uptime with smt disabled.
>> unfortunately this means that my otherwise '12' cores is actually only
>> '6'. I'm also getting occasional segfaults compiling go programs.
>
> Isn't go known to have issues on BSD anyway though ? I have seen
> complaints of random crashes running go under BSD systems - and
> preseumably the go compiler itself is written in go, so those issues
> might surface when compiling.

Not that I'm aware of. I'm not a heavy go user on FreeBSD, but I don't
recall any unexpected go crashes and I haven't seen problems building
go on my older AMD machines.

Mike Tancsa

Jan 22, 2018, 1:32:56 PM
On 1/22/2018 1:25 PM, Don Lewis wrote:
> On 22 Jan, Pete French wrote:
>>
>>
>> On 21/01/2018 19:05, Peter Moody wrote:
>>> hm, so i've got nearly 3 days of uptime with smt disabled.
>>> unfortunately this means that my otherwise '12' cores is actually only
>>> '6'. I'm also getting occasional segfaults compiling go programs.
>>
>> Isn't go known to have issues on BSD anyway though ? I have seen
>> complaints of random crashes running go under BSD systems - and
>> preseumably the go compiler itself is written in go, so those issues
>> might surface when compiling.
>
> Not that I'm aware of. I'm not a heavy go user on FreeBSD, but I don't
> recall any unexpected go crashes and I haven't seen problems building
> go on my older AMD machines.

We use Go quite a bit in one customer app and it's quite stable. But
that's a FreeBSD RELENG_10 box on an Intel chip.

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

Jan 22, 2018, 1:51:21 PM
On 1/22/2018 1:41 PM, Peter Moody wrote:
> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
> last night following the familiar high load -> idle. this was with SMT
> re-enabled. no crashdump, so it was the hard crash that I've been
> getting.

hw.lower_amd64_sharedpage=1 is the default on AMD boxes, no? I didn't
need to set mine to 1.

>
> shrug, I'm at a loss here.

I am trying an RMA with AMD.

---Mike

Mike Tancsa

unread,
Jan 22, 2018, 3:57:59 PM1/22/18
to
On 1/21/2018 3:24 PM, Don Lewis wrote:
>>
>> I have supplied a customer with a Ryzen5 and a 350MB motherboard.
>> But he runs Windows 10, but I haven't heard him complain about anything
>> like this.
>> But I'll ask him specific.
>
> Only the BSDs were affected by the shared page issue. I think Linux
> already had a guard page. I don't think Windows was affected by the
> idle C-state issue. I suspect it is caused by software not doing the
> right thing during C-state transitions, but the publicly available
> documentation from AMD is pretty lacking. The random segfault issue is
> primarily triggered by heavy parallel software build loads and how many
> Windows users do that?


Are all the AMD accommodations that DragonFly made also in FreeBSD ?

http://lists.dragonflybsd.org/pipermail/commits/2017-August/626190.html


---Mike






--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Don Lewis

unread,
Jan 22, 2018, 4:08:08 PM1/22/18
to
On 22 Jan, Mike Tancsa wrote:
> On 1/21/2018 3:24 PM, Don Lewis wrote:
>>>
>>> I have supplied a customer with a Ryzen5 and a 350MB motherboard.
>>> But he runs Windows 10, but I haven't heard him complain about anything
>>> like this.
>>> But I'll ask him specific.
>>
>> Only the BSDs were affected by the shared page issue. I think Linux
>> already had a guard page. I don't think Windows was affected by the
>> idle C-state issue. I suspect it is caused by software not doing the
>> right thing during C-state transitions, but the publicly available
>> documentation from AMD is pretty lacking. The random segfault issue is
>> primarily triggered by heavy parallel software build loads and how many
>> Windows users do that?
>
>
> Are all the AMD accomodations that DragonFly did in FreeBSD ?
>
> http://lists.dragonflybsd.org/pipermail/commits/2017-August/626190.html

We only lowered the top of user space by 4KB, which should be
sufficient, and we unmapped the boundary page. The signal trampoline
was already on a separate page from the stack.

Pete French

unread,
Jan 23, 2018, 5:17:55 AM1/23/18
to
On 22/01/2018 18:25, Don Lewis wrote:
> On 22 Jan, Pete French wrote:
>>
>>
>> On 21/01/2018 19:05, Peter Moody wrote:
>>> hm, so i've got nearly 3 days of uptime with smt disabled.
>>> unfortunately this means that my otherwise '12' cores is actually only
>>> '6'. I'm also getting occasional segfaults compiling go programs.
>>
>> Isn't go known to have issues on BSD anyway though ? I have seen
>> complaints of random crashes running go under BSD systems - and
>> preseumably the go compiler itself is written in go, so those issues
>> might surface when compiling.
>
> Not that I'm aware of. I'm not a heavy go user on FreeBSD, but I don't
> recall any unexpected go crashes and I haven't seen problems building
> go on my older AMD machines.


From the go 1.9 release notes:

"Known Issues
There are some instabilities on FreeBSD that are known but not
understood. These can lead to program crashes in rare cases. See issue
15658. Any help in solving this FreeBSD-specific issue would be
appreciated."

( link is to https://github.com/golang/go/issues/15658 )

Having said that, we use it internally and have not seen any issues with
it ourselves. It's just that I am wary of the release notes and that issue report.

Mike Tancsa

unread,
Jan 23, 2018, 12:16:15 PM1/23/18
to
On 1/22/2018 5:13 PM, Don Lewis wrote:
> On 22 Jan, Mike Tancsa wrote:
>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>> last night following the familiar high load -> idle. this was with SMT
>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>> getting.
>>
>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes no ? I didnt
>> need to set mine to 1
>>
>>>
>>> shrug, I'm at a loss here.
>>
>> I am trying an RMA with AMD.
>
> Something else that you might want to try is 12.0-CURRENT. There might
> be some changes in HEAD that need to be merged back to 11.1-STABLE.


Temperature reporting works as expected now. However, I hit a (similar?) hang building Samba47.

ctrl+T shows


load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47

Going to try the RMA route and see if the replacement CPU avoids this
problem.


# uname -a
FreeBSD amdtestr12.sentex.ca 12.0-CURRENT FreeBSD 12.0-CURRENT #1
r328282: Tue Jan 23 11:34:18 EST 2018
mdta...@amdtestr12.sentex.ca:/usr/obj/usr/src/amd64.amd64/sys/server amd64



dev.amdtemp.0.core0.sensor0: 52.6C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
dev.cpu.11.temperature: 52.6C
dev.cpu.10.temperature: 52.6C
dev.cpu.9.temperature: 52.6C
dev.cpu.8.temperature: 52.6C
dev.cpu.7.temperature: 52.6C
dev.cpu.6.temperature: 52.6C
dev.cpu.5.temperature: 52.6C
dev.cpu.4.temperature: 52.6C
dev.cpu.3.temperature: 52.6C
dev.cpu.2.temperature: 52.6C
dev.cpu.1.temperature: 52.6C
dev.cpu.0.temperature: 52.6C

Mike Tancsa

unread,
Jan 23, 2018, 2:09:14 PM1/23/18
to
On 1/22/2018 5:13 PM, Don Lewis wrote:
>>
>> I am trying an RMA with AMD.
>
> Something else that you might want to try is 12.0-CURRENT. There might
> be some changes in HEAD that need to be merged back to 11.1-STABLE.

It looks like this thread got a mention on Phoronix :) In the comments
section (comment #9) a post makes reference to

http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen

I guess Linux is still working through similar lockups too :(

Andriy Gapon

unread,
Jan 23, 2018, 2:22:59 PM1/23/18
to
On 23/01/2018 19:15, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>> On 22 Jan, Mike Tancsa wrote:
>>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>>> last night following the familiar high load -> idle. this was with SMT
>>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>>> getting.
>>>
>>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes no ? I didnt
>>> need to set mine to 1
>>>
>>>>
>>>> shrug, I'm at a loss here.
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT. There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
>
>
> Temp works as expected now. However, a (similar?) hang building Samba47.
>
> ctrl+T shows

If that works, then maybe you can get procstat -kk -a or a crash dump.
Maybe this is not a hardware problem at all (or maybe it is).

> load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47



--
Andriy Gapon

Mike Tancsa

unread,
Jan 23, 2018, 2:26:48 PM1/23/18
to
On 1/23/2018 2:22 PM, Andriy Gapon wrote:
>>
>> ctrl+T shows
>
> If that works, then maybe you can get procstat -kk -a or a crash dump.
> Maybe this is not a hardware problem at all (or maybe it is).

Unfortunately all 3 CPUs are packed up now and on their way to AMD for
RMA. As soon as I get some replacements, I will get back to this. I am
thinking of looking at a ThreadRipper board in the mean time as the Epyc
ones are on 2-4week back order from my suppliers :(


>
>> load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
>> load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
>> load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
>> make: Working in: /usr/ports/net/samba47
>
>
>


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Mike Tancsa

unread,
Jan 23, 2018, 3:14:25 PM1/23/18
to
On 1/23/2018 2:08 PM, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT. There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
>
> It looks like this thread got mention on phorix :) In the comments
> section (comment #9) a post makes reference to
>
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
>
> I guess Linux is still working through similar lockups too :(

Ubuntu has a patch / workaround for these random lockups which
symptomatically sound very similar to what some of us have been experiencing

https://bugs.launchpad.net/linux/+bug/1690085/comments/69

Pete French

unread,
Jan 23, 2018, 3:34:57 PM1/23/18
to
On 23/01/2018 19:08, Mike Tancsa wrote:
> It looks like this thread got mention on phorix :) In the comments
> section (comment #9) a post makes reference to
>
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
>
> I guess Linux is still working through similar lockups too :(


Interesting - do we have anything like RCU implemented in the kernel
which might be worth looking at ? From a quick glance it looks like it's
just a software technique, so I can't see which bits of the CPU it's
tickling that might cause issues though.

Nimrod Levy

unread,
Jan 24, 2018, 6:59:45 AM1/24/18
to
Changing the memory speed from 2400 to 2133 in the BIOS has given me 4.5
days of uptime so far.
The memory is supposed to be 2400.
I'm still confused, but I'll accept the result
--

--
Nimrod

Mike Pumford

unread,
Jan 24, 2018, 7:08:07 AM1/24/18
to
On 24/01/2018 11:55, Nimrod Levy wrote:
> Changing the memory speed from 2400 to 2133 in the BIOS has given me 4.5
> days of uptime so far.
> The memory is supposed to be 2400.
> I'm still confused, but I'll accept the result
>
I've run into this on modern Intel systems as well. The RAM is sold as
2400 but that's actually an overclock profile. If I actually enabled it
(despite both board and RAM being qualified for that) the system ends up
locking up or crashing as soon as you stress it. Go back to the standard
DDR profile advertised by the RAM and it is totally stable.

Mike
--
Mike Pumford | Senior Software Engineer

T: +44 (0) 1225 710635

BSQUARE - The business of IoT

www.bsquare.com <http://www.bsquare.com/>

Nimrod Levy

unread,
Jan 24, 2018, 7:15:26 AM1/24/18
to
The RAM was detected by the MB as 2400; I hadn't changed anything until
I set it to the slower speed.

On Wed, Jan 24, 2018 at 7:06 AM Mike Pumford <mich...@bsquare.com> wrote:

> On 24/01/2018 11:55, Nimrod Levy wrote:
> > Changing the memory speed from 2400 to 2133 in the BIOS has given me 4.5
> > days of uptime so far.
> > The memory is supposed to be 2400.
> > I'm still confused, but I'll accept the result
> >
> I've run into this on modern Intel systems as well. The RAM is sold as
> 2400 but thats actually an overclock profile. If I actually enabled it
> (despite both board and RAM being qualified for that) the system ends up
> locking up or crashing as soon as you stress it. Go back to the standard
> DDR profile advertised by the RAM and it is totally stable.
>
> Mike
> --
> Mike Pumford | Senior Software Engineer
>
> T: +44 (0) 1225 710635
>
> BSQUARE - The business of IoT
>
> www.bsquare.com <http://www.bsquare.com/>
--

--
Nimrod

Mike Pumford

unread,
Jan 24, 2018, 7:29:38 AM1/24/18
to
On 24/01/2018 12:11, Nimrod Levy wrote:
> The RAM was detected by the MB as 2400.  I didn't change it until I set
> it to the slower speed.
>
I guess the Intel motherboards I have are more conservative, then. They
default to the standard RAM profile (slower than what it is sold as) and
you have to explicitly enable the faster profiles (which do also come
from data read from RAM). So it seems like your BIOS vendor is picking
the faster profile as a default.

I've got a couple of intel systems like this (one windows and one BSD)
and neither ran stable with the faster RAM profiles.

From what I read at the time 2133 is the official upper limit of the
DDR4 standard. Any speed faster than that is an overclock profile.

Mike

--
Mike Pumford | Senior Software Engineer

BSQUARE - The business of IoT

Nimrod Levy

unread,
Jan 24, 2018, 8:26:33 AM1/24/18
to
If my google-fu is any good today, DDR3 maxes out at 2133. DDR4 seems to
go up to 3200[1]. The motherboard claims to support all speeds.

This RAM is supposed to be DDR4-2400, but if it keeps things happy, I'll
run it at 2133.

[1] https://www.kingston.com/us/memory/ddr4

On Wed, Jan 24, 2018 at 7:28 AM Mike Pumford <mich...@bsquare.com> wrote:

> On 24/01/2018 12:11, Nimrod Levy wrote:
> > The RAM was detected by the MB as 2400. I didn't change it until I set
> > it to the slower speed.
> >
> I guess the Intel motherboards I have are more conservative then. They
> default to the standard RAM profile (slower than what it is sold as) and
> you have to explicitly enable the faster profiles (which do also come
> from data read from RAM). So it seems like your BIOS vendor is picking
> the faster profile as a default.
>
> I've got a couple of intel systems like this (one windows and one BSD)
> and neither ran stable with the faster RAM profiles.
>
> From what I read at the time 2133 is the official upper limit of the
> DDR4 standard. Any speed faster than that is an overclock profile.
>
> Mike
>
> --
> Mike Pumford | Senior Software Engineer
>
> T: +44 (0) 1225 710635
>
> BSQUARE - The business of IoT
>
> www.bsquare.com <http://www.bsquare.com/>
--

--
Nimrod

Mike Pumford

unread,
Jan 24, 2018, 8:32:34 AM1/24/18
to
On 24/01/2018 13:21, Nimrod Levy wrote:
> If my google-fu is any good today, DDR3 maxes out at 2133.  DDR4 seems
> to go up to 3200[1]. The motherboard claims to support all speeds.
>
> This RAM is supposed to be DDR4-2400, but if it keeps things happy, I'll
> run it at 2133.
>
> [1] https://www.kingston.com/us/memory/ddr4
>
Seems you are right. It's possible at least one of my systems is a DDR3
one then, as the reading I did at the time said that 2133 was the max.

I'll recheck my systems and double check what they actually are. ;) I do
know that I tried the faster than default memory profiles on both and it
introduced instability that went away when I went back to the standard
profile. I've done BIOS updates since then so it's worth me checking that
again anyway.

Mike
--
Mike Pumford | Senior Software Engineer

BSQUARE - The business of IoT

Mark Millard via freebsd-stable

unread,
Jan 24, 2018, 9:26:46 AM1/24/18
to
Mike Pumford michaelp at bsquare.com wrote on
Wed Jan 24 12:03:04 UTC 2018 :

> I've run into this on modern Intel systems as well. The RAM is sold as
> 2400 but thats actually an overclock profile. If I actually enabled it
> (despite both board and RAM being qualified for that) the system ends up
> locking up or crashing as soon as you stress it. Go back to the standard
> DDR profile advertised by the RAM and it is totally stable.

The reported failures happen during idle time, as I understand it.
Things work when the CPUs are kept busy, from what I've read in the
various notes. The hang-ups are during idle times.

"the system ends up locking up or crashing as soon as you stress it"
does not sound like a matching context.

That a slower RAM speed might help idle behave correctly is interesting,
given that Zen/Ryzen derives the clock of its internal interconnect
fabric from the RAM speed.

I'll note that, if one goes through the referenced Linux exchanges about
this, Ryzen Threadripper's examples are also reported to have the problem.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

Mike Tancsa

unread,
Jan 24, 2018, 9:47:08 AM1/24/18
to
I think this is perhaps a good time to summarize, as a few issues seem to be going on:

a) Fragile BIOS settings. There seem to be a number of issues around
RAM speeds and disabled C-STATES that impact stability. Specifically,
lowering the default frequency from 2400 to 2133 seems to help some
users with crashes / lockups under heavy loads.

b) CPUs manufactured prior to week 25 (some say week 33?) have a
hardware defect that manifests itself as segfaults in heavy compiles. I
was able to confirm this on 1 of the CPUs I had using a Linux setup. It
seems that to confirm this, you need to physically look at the CPU for
the manufacturing date :( Not sure how to trigger it on FreeBSD reliably,
but there is a github project I used to verify on Linux
(https://github.com/suaefar/ryzen-test) -- see the sketch after this list.

c) The idle lockup bug. This *seems* to be confirmed on Linux as well
http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
and
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085

d) Compile failures of some ports. For myself and one other user,
compiling net/samba47 reliably hangs in roughly the same place. It's not
clear if this is related to any of the above bugs or not.
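
As a rough FreeBSD-side approximation of what ryzen-test does on Linux
(this is only a sketch built from standard tools, not that script; the
use of buildworld as the load and the paths are my own choices):

#!/bin/sh
# hammer the CPU with parallel compiles and stop as soon as a pass fails,
# then look for the tell-tale compiler segfaults in that pass's log
NCPU=$(sysctl -n hw.ncpu)
pass=0
while true; do
        pass=$((pass + 1))
        echo "=== pass $pass: $(date) ==="
        cd /usr/src || exit 1
        if ! make -s -j "$NCPU" buildworld > /tmp/buildworld.$pass.log 2>&1; then
                echo "pass $pass failed:"
                grep -i 'segmentation fault' /tmp/buildworld.$pass.log
                exit 1
        fi
done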

Right now I have RMA'd my 3 CPUs back to AMD. Hopefully, I will get
replacements in a week and can get back to testing c) and d).

---Mike
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Nimrod Levy

unread,
Jan 24, 2018, 10:12:02 AM1/24/18
to
I'm not sure this is characterized quite right. I've been seeing c) where
the system freezes completely and seems to be related to idle/ low load,
but the workaround that seems to be working around the issue is part of a)
where I lowered the memory speed.


On Wed, Jan 24, 2018 at 9:42 AM Mike Tancsa <mi...@sentex.net> wrote:

> I think perhaps a good time to summarize as a few issues seem to be going
> on
>
> a) fragile BIOS settings. There seems to be a number of issues around
> RAM speeds and disabled C-STATES that impact stability. Specifically,
> lowering the default frequency from 2400 to 2133 seems to help some
> users with crashes / lockups under heavy loads.
>
> b) CPUs manufactured prior to week 25 (some say week 33?) have a
> hardware defect that manifests itself as segfaults in heavy compiles. I
> was able to confirm this on 1 of the CPUs I had using a Linux setup. It
> seems to confirm this, you need to physically look at the CPU for the
> manufacturing date :( Not sure how to trigger it on FreeBSD reliably,
> but there is a github project I used to verify on Linux
> (https://github.com/suaefar/ryzen-test)
>
> c) The idle lockup bug. This *seems* to be confirmed on Linux as well
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
> and
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085
>
> d) Compile failures of some ports. For myself and one other user,
> compiling net/samba47 reliably hangs in roughly the same place. Its not
> clear if this is related to any of the above bugs or not.
>
> Right now I have RMA'd my 3 CPUs back to AMD. Hopefully, I will get
> replacements in a week and can get back to testing c) and d).
>
> ---Mike
>
> --

--
Nimrod

Mike Tancsa

unread,
Jan 24, 2018, 10:23:08 AM1/24/18
to
On 1/24/2018 10:07 AM, Nimrod Levy wrote:
> I'm not sure this is characterized quite right.  I've been seeing c)
> where the system freezes completely and seems to be related to idle/ low
> load, but the workaround that seems to be working around the issue is
> part of a) where I lowered the memory speed.

It did seem to reduce the frequency of the lockups, but I still had at
least one with the RAM speed reduced. I want to rule out b) from all
this with replacement CPUs so will revisit in a week.

---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Alban Hertroys

unread,
Jan 24, 2018, 11:33:31 AM1/24/18
to
On 24 January 2018 at 15:42, Mike Tancsa <mi...@sentex.net> wrote:
> b) CPUs manufactured prior to week 25 (some say week 33?) have a
> hardware defect that manifests itself as segfaults in heavy compiles. I
> was able to confirm this on 1 of the CPUs I had using a Linux setup. It
> seems to confirm this, you need to physically look at the CPU for the
> manufacturing date :( Not sure how to trigger it on FreeBSD reliably,
> but there is a github project I used to verify on Linux
> (https://github.com/suaefar/ryzen-test)

According to post #39 in the referenced Linux thread
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085): "It
was officially fixed for all ryzen manufactured after week 30."

I currently have a Ryzen 1600X from week 17/30, which suggests it will
have the bug. Unfortunately, I don't have it built into a system yet
as I'm waiting for DDR4 prices to become reasonable again before
ordering any, so I can't test it.

I'm not sure how solid this info is; should I RMA it without even
having tested that it has a problem?

Alban Hertroys
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Peter Moody

unread,
Jan 24, 2018, 12:39:59 PM1/24/18
to
> I currently have a Ryzen 1600X from week 17/30, which suggests it will
> have the bug. Unfortunately, I don't have it built into a system yet
> as I'm waiting for DDR4 prices to become reasonable again before
> ordering any, so I can't test it.

I've seen the blanking screen lockup on a 1600X from 1739SUS and
1740SUS. Only with SMT enabled though. On the 1739 chip I'm also
seeing cc segfaults on make -j 18 buildworld (again, just with SMT
enabled).

This is all so frustrating.

Yuli Khodorkovskiy

unread,
Jan 24, 2018, 12:48:30 PM1/24/18
to
I am seeing the same behavior as you. Disabling SMT resolves the freezing
issues at idle. I am using a Ryzen 1700.

Nils Beyer

unread,
Jan 25, 2018, 8:53:27 AM1/25/18
to
On 01/17/18 14:38, Mike Tancsa wrote:
> However, we are seeing random lockups on both boxes. [...]

go into your BIOS:

- load default settings
- disable SMT
- disable Cool&Quiet
- disable global C-state control
- disable anything with C-states

Only with SMT _and_ Cool&Quiet _and_ all the C-state options disabled together does my system keep running
(uptime is now 49 days). All other fiddling around with RAM, PSU and timing settings was fruitless.
Before, my system locked up after a run time of six to ten days.
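
For what it's worth, a quick way to see from the FreeBSD side which ACPI
C-states the kernel itself is using (just the stock acpi_cpu sysctls;
this won't show BIOS-internal knobs like global C-state control):

# list the C-states the kernel knows about and how often each is entered
sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_usage
# restrict the kernel to C1, independently of whatever the BIOS is set to
sysctl dev.cpu.0.cx_lowest=C1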

And, yes, I have a replacement CPU and the shared page fix active - these didn't help.

This does not help with your compilation problems though...



BR,
Nils

Mike Tancsa

unread,
Jan 25, 2018, 9:04:33 AM1/25/18
to
On 1/25/2018 8:48 AM, Nils Beyer wrote:
>
> And, yes, I have a replacement CPU and the shared page fix active -
> these didn't help.
>
> This does not help with your compilation problems though...
>

Thanks. So you see the same or similar compile issues too ?

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Nils Beyer

unread,
Jan 25, 2018, 9:07:25 AM1/25/18
to
On 01/25/18 15:00, Mike Tancsa wrote:
>> This does not help with your compilation problems though...
>
> Thanks. So you see the same or similar compile issues too ?

slightly:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

Mike Tancsa

unread,
Jan 27, 2018, 11:56:39 AM1/27/18
to
On 1/27/2018 3:23 AM, Don Lewis wrote:
>
> I just ran into this for this first time with samba46. I kicked of a
> ports build this evening before leaving for several hours. When I
> returned, samba46 had failed with a build runaway. I just tried again
> and I see python stuck in the usem state. This is what I see with
> procstat -k:

Hmmm, is this indicative of a processor bug or a FreeBSD bug, or is it
indeterminate at this point ?

>
> PID TID COMM TDNAME KSTACK
> 90692 100801 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 100824 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 100857 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 100956 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 100995 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101483 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101538 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101549 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101570 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101572 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101583 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101588 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101593 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101610 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 101629 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall fast_syscall_common
> 90692 101666 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
> 90692 102114 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
>
> and procstat -t:
>
> PID TID COMM TDNAME CPU PRI STATE WCHAN
> 90692 100801 python2.7 - -1 124 sleep usem
> 90692 100824 python2.7 - -1 124 sleep usem
> 90692 100857 python2.7 - -1 124 sleep usem
> 90692 100956 python2.7 - -1 125 sleep usem
> 90692 100995 python2.7 - -1 124 sleep usem
> 90692 101483 python2.7 - -1 124 sleep usem
> 90692 101538 python2.7 - -1 125 sleep usem
> 90692 101549 python2.7 - -1 124 sleep usem
> 90692 101570 python2.7 - -1 124 sleep usem
> 90692 101572 python2.7 - -1 124 sleep usem
> 90692 101583 python2.7 - -1 125 sleep usem
> 90692 101588 python2.7 - -1 124 sleep usem
> 90692 101593 python2.7 - -1 123 sleep usem
> 90692 101610 python2.7 - -1 124 sleep usem
> 90692 101629 python2.7 - -1 125 sleep umtxn
> 90692 101666 python2.7 - -1 124 sleep usem
> 90692 102114 python2.7 - -1 152 sleep usem
>
> The machine isn't totally idle. The last pid value in top increases by
> about 40 every two seconds. Looks like it might be poudriere polling
> something ...

Mark Millard via freebsd-stable

unread,
Jan 27, 2018, 12:45:38 PM1/27/18
to
Don Lewis truckman at FreeBSD.org wrote on
Sat Jan 27 08:23:27 UTC 2018 :

> PID TID COMM TDNAME CPU PRI STATE WCHAN
> 90692 100801 python2.7 - -1 124 sleep usem
> 90692 100824 python2.7 - -1 124 sleep usem
. . .


# grep -r '"usem"' /usr/src/sys/
/usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c: "usem", { true, true, true }, true, DBG_USTORM_ID,
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);

/usr/src/sys/kern/kern_umtx.c has :

#if defined(COMPAT_FREEBSD9) || defined(COMPAT_FREEBSD10)
static int
do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout)
{
. . .
error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .
#endif
. . .
static int
do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
{
. . .
error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .


The comparison/contrast for:

> 90692 101629 python2.7 - -1 125 sleep umtxn



# grep -r '"umtxn"' /usr/src/sys/
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "umtxn", timeout == NULL ?

/usr/src/sys/kern/kern_umtx.c has:

static int
do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
struct _umtx_time *timeout, int mode)
{
. . .
/*
* We set the contested bit, sleep. Otherwise the lock changed
* and we need to retry or we lost a race to the thread
* unlocking the umtx.
*/
umtxq_lock(&uq->uq_key);
umtxq_unbusy(&uq->uq_key);
if (old == owner)
error = umtxq_sleep(uq, "umtxn", timeout == NULL ?
NULL : &timo);
umtxq_remove(uq);
umtxq_unlock(&uq->uq_key);
umtx_key_release(&uq->uq_key);
. . .

Both contexts are umtxq_sleep usage:

/*
* Put thread into sleep state, before sleeping, check if
* thread was removed from umtx queue.
*/
static inline int
umtxq_sleep(struct umtx_q *uq, const char *wmesg, struct abs_timeout *abstime)
. . .


Note: I'm guessing that /usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c
is not involved.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

Peter Moody

unread,
Jan 27, 2018, 9:25:51 PM1/27/18
to
Whelp, I replaced the r5 1600x with an r7 1700 (au 1734) and I'm now
getting minutes of uptime before I hard crash. With smt, without, with c
states, without, with opcache, without. No difference.

I'm going to try a completely different motherboard next. I think Amazon is
starting to dislike me.

On Jan 21, 2018 11:05 AM, "Peter Moody" <fre...@hda3.com> wrote:

> hm, so i've got nearly 3 days of uptime with smt disabled.
> unfortunately this means that my otherwise '12' cores is actually only
> '6'. I'm also getting occasional segfaults compiling go programs.
>
> should I just RMA this beast again?
>
> On Sun, Jan 21, 2018 at 5:25 AM, Nimrod Levy <nim...@gmail.com> wrote:
> > almost 2 days uptime with a lower memory clock. still holding my breath,
> > but this seems promising.

Nimrod Levy

unread,
Jan 27, 2018, 11:20:32 PM1/27/18
to
I'm about ready to have a party. My Ryzen 5 1600 has been up for over 8
days so far after changing the memory to a slower speed. System load
hovers around .3

Peter Moody

unread,
Jan 28, 2018, 12:29:29 AM1/28/18
to
On Jan 27, 2018 6:20 PM, "Nimrod Levy" <nim...@gmail.com> wrote:

I'm about ready to have a party. My Ryzen 5 1600 has been up for over 8
days so far after changing the memory to a slower speed. System load
hovers around .3


I couldn't find an easy way to down-speed my memory in the bios :(

Pete French

unread,
Jan 28, 2018, 11:14:56 AM1/28/18
to
>> I'm about ready to have a party. My Ryzen 5 1600 has been up for over 8
>> days so far after changing the memory to a slower speed. System load
>> hovers around .3
>
> I couldn't find an easy way to down-speed my memory in the bios :(

Out of interest, what motherboards are people using ? I still haven't
built my test system, despite being the OP in the thread, but I have
an MSI B350 Tomahawk as the test board.

Sadly, I think the length and content of this thread has answered one
of my original questions though, which was whether or not it is safe to
go ahead and order Epyc boards for the data centre. I am not prepared
to gamble on that right now when people on the desktop have had so many
issues.

-pete.

Nimrod Levy

unread,
Jan 28, 2018, 11:17:11 AM1/28/18
to
I have the Asus prime B350-plus

https://www.asus.com/us/Motherboards/PRIME-B350-PLUS/


On Sun, Jan 28, 2018 at 10:50 AM Pete French <petef...@ingresso.co.uk>
wrote:

> >> I'm about ready to have a party. My Ryzen 5 1600 has been up for over 8
> >> days so far after changing the memory to a slower speed. System load
> >> hovers around .3
> >
> > I couldn't find an easy way to down-speed my memory in the bios :(
>
> Out of interest, what motherboards are people using ? I still havent
> built my test system, desite being the OP in the thread, but I have
> an MSI B350 Tomahawk as the test board.
>
> Sadly, I think the length and contnet of this thread has answered one
> of my original questions though, what was whether or not it is safe to
> go ahead and order Epyc boards for the data centre. I am not prepared
> to gamble on that right now when people on the desktop have had so many
> issue.
>
> -pete.
>
> --

--
Nimrod

Don Lewis

unread,
Jan 28, 2018, 3:33:04 PM1/28/18
to
On 28 Jan, Pete French wrote:
>>> I'm about ready to have a party. My Ryzen 5 1600 has been up for over 8
>>> days so far after changing the memory to a slower speed. System load
>>> hovers around .3
>>
>> I couldn't find an easy way to down-speed my memory in the bios :(
>
> Out of interest, what motherboards are people using ? I still havent
> built my test system, desite being the OP in the thread, but I have
> an MSI B350 Tomahawk as the test board.

Gigabyte AX370 Gaming 5.

I'd be wary of the B350 boards with the higher-TDP eight-core Ryzen CPUs,
since the cheaper boards tend to have less robust VRM designs.

Personally I won't put together a system without ECC RAM, both for
overall reliability and because the error reporting will
immediately flag (or eliminate) RAM issues when the system is unstable.
That pretty much confined my motherboard choices to the higher end X370
motherboards. I think only ASRock makes a B350 motherboard with ECC
support. There's no reason that ECC support couldn't be universal other
than product differentiation so that the motherboard manufacturers can
collect more $$$ from anyone who cares about this feature.

Peter Moody

unread,
Jan 28, 2018, 4:05:44 PM1/28/18
to
> Gigabyte AX370 Gaming 5.

All of my issues so far have been on a pair of ASRock AB350s. But I've
got an MSI X370 coming early next week.

Pete French

unread,
Jan 28, 2018, 4:30:53 PM1/28/18
to


On 28/01/2018 20:28, Don Lewis wrote:
> I'd be wary of the B350 boards with the higher TDP eight core Ryzen CPUs
> since the VRMs on the cheaper boards tend to have less robust VRM
> designs.

Gah! Yes, I forgot that. I originally spec'd the board for a smaller Ryzen,
then thought "what the hell" and got the 1700 without going back and
checking that kind of stuff. Hmm, I shall swap it for a different one if I
can. Thanks for pointing that out.

-pete. [kicking himself...]

Don Lewis

unread,
Jan 28, 2018, 7:16:09 PM1/28/18
to
On 28 Jan, Pete French wrote:
>
>
> On 28/01/2018 20:28, Don Lewis wrote:
>> I'd be wary of the B350 boards with the higher TDP eight core Ryzen CPUs
>> since the VRMs on the cheaper boards tend to have less robust VRM
>> designs.
>
> Gah! Yes, I forgot that.originally sec'd the board for a smaller Ryzen,
> then though "what the hell" and got the 1700 without going back and
> checking that kind of stuff. Hmm, shall swap for a different one if I
> can. Thanks for poining that out.

I started off with a Gigabyte AB350 Gaming for my 1700X back when there
was enough ambiguity about ECC support to give me hope that it would
work. Everything seemed to work other than ECC and the problems caused
by my buggy CPU and the shared page issue, but the VRM temps in the BIOS
were really high (and I had no way to monitor that under load). When I
upgraded to get working ECC, I also looked at reviews about VRM quality.

Don Lewis

unread,
Jan 28, 2018, 7:29:38 PM1/28/18
to
On 27 Jan, Peter Moody wrote:
> Whelp, I replaced the r5 1600x with an r7 1700 (au 1734) and I'm now
> getting minutes of uptime before I hard crash. With smt, without, with c
> states, without, with opcache, without. No difference.

Check the temperatures. Maybe the heat sink isn't making good contact
after the CPU replacement.

Mike Tancsa

unread,
Jan 30, 2018, 2:56:01 PM1/30/18
to
On 1/28/2018 7:41 PM, Don Lewis wrote:
>
> My suspicion is a FreeBSD bug, probably a locking / race issue. I know
> that we've had to make some tweeks to our code for AMD CPUs, like this:


OK, I got back the CPUs from AMD (fast turn around!)

And sadly, I am still able to hang the compile in about the same place.
However, if I set

hw.lower_amd64_sharedpage=0

it seems to hang in a different way. CTRL+t shows

load: 0.43 cmd: python2.7 15736 [umtxn] 165.00r 14.46u 6.65s 0% 233600k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47


# procstat -t 15736
PID TID COMM TDNAME CPU PRI STATE WCHAN
15736 100855 python2.7 - -1 152 sleep usem
15736 100956 python2.7 - -1 124 sleep umtxn
15736 100957 python2.7 - -1 126 sleep umtxn
15736 100958 python2.7 - -1 124 sleep umtxn
15736 100959 python2.7 - -1 127 sleep umtxn
15736 100960 python2.7 - -1 126 sleep umtxn
15736 100961 python2.7 - -1 126 sleep umtxn
15736 100962 python2.7 - -1 126 sleep umtxn
15736 100963 python2.7 - -1 126 sleep umtxn
15736 100964 python2.7 - -1 127 sleep umtxn
15736 100965 python2.7 - -1 126 sleep umtxn
15736 100966 python2.7 - -1 126 sleep umtxn
15736 100967 python2.7 - -1 126 sleep umtxn

# procstat -kk 15736
PID TID COMM TDNAME KSTACK

15736 100855 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100956 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100957 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100958 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100959 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100960 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100961 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100962 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100963 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100964 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100965 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100966 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100967 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc

If I kill the make, reboot, and just type make again, it completes after
the reboot. If, after the reboot, I do an rm -R work first, it will hang
again. With the default of
hw.lower_amd64_sharedpage: 1
post reboot,

CTRL+T shows
load: 2.73 cmd: python2.7 15703 [usem] 40.92r 12.34u 3.45s 0% 233640k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47



root@amdtestr12:/home/mdtancsa # procstat -kk 15703
PID TID COMM TDNAME KSTACK

15703 100824 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100956 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100957 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100958 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100959 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100960 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100961 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100962 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100963 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100964 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100965 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100966 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100967 python2.7 - mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
root@amdtestr12:/home/mdtancsa # procstat -t 15703
PID TID COMM TDNAME CPU PRI STATE WCHAN
15703 100824 python2.7 - -1 152 sleep usem
15703 100956 python2.7 - -1 125 sleep usem
15703 100957 python2.7 - -1 127 sleep usem
15703 100958 python2.7 - -1 125 sleep usem
15703 100959 python2.7 - -1 125 sleep usem
15703 100960 python2.7 - -1 126 sleep usem
15703 100961 python2.7 - -1 126 sleep usem
15703 100962 python2.7 - -1 126 sleep usem
15703 100963 python2.7 - -1 126 sleep usem
15703 100964 python2.7 - -1 126 sleep usem
15703 100965 python2.7 - -1 126 sleep umtxn
15703 100966 python2.7 - -1 126 sleep usem
15703 100967 python2.7 - -1 125 sleep usem
root@amdtestr12:/home/mdtancsa #


---Mike


>
> ------------------------------------------------------------------------
> r321608 | kib | 2017-07-27 01:37:07 -0700 (Thu, 27 Jul 2017) | 9 lines
>
> Use MFENCE to serialize RDTSC on non-Intel CPUs.
>
> Kernel already used the stronger barrier instruction for AMDs, correct
> the userspace fast gettimeofday() implementation as well.
>
>
>
> I did go back and look at the build runaways that I've occasionally seen
> on my AMD FX-8320E package builder. I haven't seen the python issue
> there, but have seen gmake get stuck in a sleeping state with a bunch of
> zombie offspring.
>
>


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada

Mike Tancsa

unread,
Jan 30, 2018, 4:40:38 PM1/30/18
to
On 1/30/2018 2:51 PM, Mike Tancsa wrote:
>
> And sadly, I am still able to hang the compile in about the same place.
> However, if I set


OK, here is a sort of workaround. If I have the box a little more busy,
I can avoid whatever deadlock is going on. In another console I have
cat /dev/urandom | sha256
running while the build runs

... and I can compile net/samba47 from scratch without the compile
hanging. This problem also happens on HEAD from today. Should I start
a new thread on freebsd-current ? Or just file a bug report ?
The compile worked 4/4
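
For anyone who wants to reproduce the workaround, a minimal sketch
(the background load is just the command above; anything that keeps a
core busy should do, and the port path is just the one I'm testing):

#!/bin/sh
# keep one core busy in the background for the duration of the port build
cat /dev/urandom | sha256 > /dev/null &
LOADPID=$!
cd /usr/ports/net/samba47 && make clean && make
kill "$LOADPID"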

---Mike

Nimrod Levy

unread,
Jan 30, 2018, 5:28:44 PM1/30/18
to
That's really strange. I never saw those kinds of deadlocks, but I did
notice that if I kept the cpu busy using distributed.net I could keep the
full system lockups away for at least a week if not longer.

Not to keep harping on it, but what worked for me was lowering the memory
speed. I'm at 11 days of uptime so far without anything running the cpu.
Before the change it would lock up anywhere from an hour to a day.
--

--
Nimrod

Don Lewis

unread,
Jan 30, 2018, 7:41:37 PM1/30/18
to
On 30 Jan, Mike Tancsa wrote:
> On 1/30/2018 2:51 PM, Mike Tancsa wrote:
>>
>> And sadly, I am still able to hang the compile in about the same place.
>> However, if I set
>
>
> OK, here is a sort of work around. If I have the box a little more busy,
> I can avoid whatever deadlock is going on. In another console I have
> cat /dev/urandom | sha256
> running while the build runs

Interesting ...

> ... and I can compile net/samba47 from scratch without the compile
> hanging. This problem also happens on HEAD from today. Should I start
> a new thread on freebsd-current ? Or just file a bug report ?
> The compile worked 4/4

I'd file a PR to capture all the information in one place and drop a
pointer on freebsd-current.

Mike Tancsa

unread,
Jan 30, 2018, 7:52:35 PM1/30/18
to
On 1/30/2018 5:23 PM, Nimrod Levy wrote:
> That's really strange. I never saw those kinds of deadlocks, but I did
> notice that if I kept the cpu busy using distributed.net
> <http://distributed.net> I could keep the full system lockups away for
> at least a week if not longer.
>
> Not to keep harping on it, but what worked for me was lowering the
> memory speed. I'm at 11 days of uptime so far without anything running
> the cpu. Before the change it would lock up anywhere from an hour to a day.
>
Spoke too soon. After a dozen loops, the process has hung again. Note,
this is not the box locking up, just the compile. I do have memory at a
lower speed too -- 2133 instead of the default 2400.


I also just tried upgrading to the latest HEAD with a generic kernel and
got the same / similar lockups, although procstat -kk gives some odd results:


root@amdtestr12:/home/mdtancsa # procstat -kk 6067
PID TID COMM TDNAME KSTACK

6067 100865 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100900 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100901 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100902 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100903 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100904 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100905 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100906 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100907 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100908 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100909 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100910 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
6067 100911 python2.7 - ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
root@amdtestr12:/home/mdtancsa # procstat -t 6067
PID TID COMM TDNAME CPU PRI STATE WCHAN
6067 100865 python2.7 - -1 152 sleep usem
6067 100900 python2.7 - -1 152 sleep umtxn
6067 100901 python2.7 - -1 152 sleep umtxn
6067 100902 python2.7 - -1 152 sleep umtxn
6067 100903 python2.7 - -1 152 sleep umtxn
6067 100904 python2.7 - -1 152 sleep umtxn
6067 100905 python2.7 - -1 152 sleep umtxn
6067 100906 python2.7 - -1 152 sleep umtxn
6067 100907 python2.7 - -1 152 sleep umtxn
6067 100908 python2.7 - -1 152 sleep umtxn
6067 100909 python2.7 - -1 152 sleep umtxn
6067 100910 python2.7 - -1 152 sleep umtxn
6067 100911 python2.7 - -1 152 sleep umtxn
root@amdtestr12:/home/mdtancsa #


--
-------------------
Mike Tancsa, tel +1 519 651 3400

Eugene Grosbein

unread,
Jan 31, 2018, 8:38:09 AM1/31/18
to
31.01.2018 4:36, Mike Tancsa wrote:

> On 1/30/2018 2:51 PM, Mike Tancsa wrote:
>>
>> And sadly, I am still able to hang the compile in about the same place.
>> However, if I set
>
>
> OK, here is a sort of work around. If I have the box a little more busy,
> I can avoid whatever deadlock is going on. In another console I have
> cat /dev/urandom | sha256
> running while the build runs
>
> ... and I can compile net/samba47 from scratch without the compile
> hanging. This problem also happens on HEAD from today. Should I start
> a new thread on freebsd-current ? Or just file a bug report ?
> The compile worked 4/4

That's really strange. Could you try to do "sysctl kern.eventtimer.periodic=1"
and re-do the test without extra load?
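
(If you do, the related eventtimer state is worth including in the
report; these are just the standard sysctls, nothing Ryzen-specific:)

# show periodic vs one-shot mode, the timer in use, and the available timers
sysctl kern.eventtimer.periodic kern.eventtimer.timer kern.eventtimer.choice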

Mike Tancsa

unread,
Jan 31, 2018, 8:42:50 AM1/31/18
to
On 1/31/2018 8:33 AM, Eugene Grosbein wrote:
> 31.01.2018 4:36, Mike Tancsa wrote:
>> On 1/30/2018 2:51 PM, Mike Tancsa wrote:
>>>
>>> And sadly, I am still able to hang the compile in about the same place.
>>> However, if I set
>>
>>
>> OK, here is a sort of work around. If I have the box a little more busy,
>> I can avoid whatever deadlock is going on. In another console I have
>> cat /dev/urandom | sha256
>> running while the build runs
>>
>> ... and I can compile net/samba47 from scratch without the compile
>> hanging. This problem also happens on HEAD from today. Should I start
>> a new thread on freebsd-current ? Or just file a bug report ?
>> The compile worked 4/4
>
> That's really strange. Could you try to do "sysctl kern.eventtimer.periodic=1"
> and re-do the test without extra load?

Thanks for the suggestion! I actually upgraded the box to HEAD last
night and will try there since the problem is there too. I just created
a bug report and started a thread in freebsd-current and will follow up
there with your test.

---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada

Mike Tancsa

unread,
Feb 1, 2018, 1:53:53 PM2/1/18
to
On 2/1/2018 1:40 PM, Ed Maste wrote:
>> root@amdtestr12:/home/mdtancsa # procstat -kk 6067
>> PID TID COMM TDNAME KSTACK
>>
>> 6067 100865 python2.7 - ??+0 ??+0 ??+0 ??+0
>> ??+0 ??+0 ??+0 ??+0 ??+0 ??+0
>
> I think this part is due to the broken loader change in r328536.
> Kernel symbol loading is broken, and this in particular isn't related
> to Ryzen issues.

Just for the archives, after a buildworld to a newer rev of the source
tree, all was good :)

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400 x203