Random reboots


Manojav Sridhar

Oct 19, 2022, 11:18:42 AM
to EON ZFS Storage on behalf of vajonam
Dre,


I added some more RAM to my box, but am now seeing reboots when doing large ZFS transfers from one pool to another.

This happens whether I move data over NFS or locally on the box. Memtest didn't find an error after running for about 30 minutes.

I have a dump specified. How can I view what is in the dump and find out what is causing the random reboots?

I cannot see the console: by the time I get to the box it has finished rebooting, and /mnt/eon0 is uncleanly mounted, so it doesn't restart correctly until I fsck that disk.

Thanks
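
For the dump question, a minimal sketch, assuming an illumos/Solaris-style setup where savecore has already extracted unix.N and vmcore.N into a crash directory (commonly /var/crash/<hostname>; running dumpadm with no arguments prints the configured one). The helper name latest_dump_index and the CRASH_DIR variable are made up for illustration. If only a compressed vmdump.N exists, it would first need expanding with savecore -f.

```shell
#!/bin/sh
# Hypothetical helper: print the highest N for which vmcore.N exists in a
# directory. Returns non-zero and prints nothing if no dump is found.
latest_dump_index() {
    dir=$1
    best=""
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue
        n=${f##*.}
        case $n in
            ''|*[!0-9]*) continue ;;   # skip anything without a numeric suffix
        esac
        if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# CRASH_DIR is an assumed override; the default path is the usual convention.
crash_dir=${CRASH_DIR:-/var/crash/$(uname -n)}

if command -v mdb >/dev/null 2>&1 && n=$(latest_dump_index "$crash_dir"); then
    # ::status prints the panic string, ::msgbuf the last kernel messages,
    # and $C the stack of the panicking thread.
    printf '::status\n::msgbuf\n$C\n' | mdb "$crash_dir/unix.$n" "$crash_dir/vmcore.$n"
else
    echo "mdb not available or no vmcore.N found in $crash_dir" >&2
fi
```

The panic string from ::status is usually the first thing worth posting back to the list. If the directory stays empty after a reboot, the box may be hanging rather than panicking, in which case no dump is written at all.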




vajonam

Oct 19, 2022, 1:17:15 PM
to EON ZFS Storage
Some more info: it looks like the box is not rebooting; for some reason the network interface, e1000g0, is stopping.

This is a new-ish motherboard that I have been using for a few weeks as well.

System Configuration: Supermicro X9DRL-3F/iF
BIOS Configuration: American Megatrends Inc. 3.3 07/12/2018
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz CPU 1
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz CPU 2

Any ideas on parameters to tune for e1000g0?

vajonam

Oct 19, 2022, 1:21:27 PM
to EON ZFS Storage

vajonam

Oct 19, 2022, 1:36:13 PM
to EON ZFS Storage
Tried adding it to /kernel/drv/e1000g.conf, then ran updimg and rebooted. Let's see if it takes.

Andre Lue

Oct 19, 2022, 2:13:49 PM
to EON ZFS Storage on behalf of vajonam
It may be a mismatched e1000g driver. Are there any issues listed via Google for OmniOS or SmartOS for the NIC?



Manojav Sridhar

Oct 19, 2022, 3:03:23 PM
to EON ZFS Storage on behalf of dre2kse
I haven't changed the image, just changed the board under it to the one I posted earlier. No changes to the driver. I have disabled HW for

image.png

when I do an iperf test, so maybe it's helped. I tried a few large transfers as well; so far so good. The memory passed all the memtest tests, by the way.


vajonam

Oct 20, 2022, 11:51:42 AM
to EON ZFS Storage
Dre,

I have changed the memory but am still seeing random lockups, where the console is locked, the box won't respond to ping, and the shares are obviously offline. I will keep monitoring. This happened just after a restart as well, but it usually happens when doing I/O operations between pools, or when transferring data into and out of pools using NFS.

any pointers will help.

thanks 

Andre Lue

Oct 20, 2022, 12:27:04 PM
to EON ZFS Storage on behalf of vajonam
I think using the previous HW image from the previous board with the new HW may be the root cause, or it may just be a driver/HW incompatibility issue. Try the following, but before you start make sure you have a good backup of the previous HW image from before the HW change/update (mainly for the IP and zpool host ID):
1) Boot the new HW from the generic install image and run setup (don't import pools).
2) Save a copy of the image from step 1 and run this way for a while without a pool to see if it locks up. If it locks up, do not proceed; something else is the issue.
3) If it does not lock up, boot the previous HW image and run update, but point it at the new HW image from step 2.
4) Boot from the image in step 3. The zpool should auto-import and the IP should be the same as before step 1.
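
The "good backup before you start" part can be sketched in plain shell. This is only an illustration, not an EON tool: backup_image is a hypothetical helper, and the paths in the example are assumptions to be replaced with wherever the boot image actually lives on the USB key.

```shell
#!/bin/sh
# Hypothetical helper: copy an image file into a backup directory under a
# timestamped name and verify the copy byte-for-byte before trusting it.
backup_image() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such image: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" || return 1
    # cmp -s is silent and exits non-zero if the two files differ.
    cmp -s "$src" "$dest" || { echo "copy differs from source: $dest" >&2; return 1; }
    echo "$dest"
}

# Example (both paths are assumptions, adjust before use):
# backup_image /mnt/boot/x86.eon /path/to/backups
```

Keeping the copy somewhere other than the USB key itself means a failed update on the key cannot take the fallback image with it.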

vajonam

Oct 20, 2022, 1:42:05 PM
to EON ZFS Storage

Does setup.sh change the drivers installed based on the HW? My assumption was that the SW image was always the same, had all the drivers in it, and would just work when swapping out the HW underneath it. What you are suggesting implies the SW is customized based on the devices on the HW.

Andre Lue

Oct 20, 2022, 2:28:37 PM
to EON ZFS Storage on behalf of vajonam
The generic image generates the device tree on first boot. After updimg, the HW-specific device entries are saved.

vajonam

Oct 24, 2022, 11:05:51 AM
to EON ZFS Storage
dre2kse,

I have narrowed down the errors to some memory I recently got. Let me give you some more details, which might help you provide some suggestions.

1. Replaced the old Supermicro motherboard with a newer Supermicro X9 motherboard.
2. Works fine, with good stability. No software changes, same USB key. The old board is an X7 mobo; the X9 mobo has a different CPU and RAM (24 GB RAM).
3. Added 64 GB of extra RAM, as the new X9 board supports more RAM. This caused a crash after running for a few days, with a total of 24 GB + 64 GB RAM.
4. Went back to running just the 64 GB of RAM and can re-create the lockup by doing a dd over NFS, so I figured this was RAM related. However, memtest86 doesn't show any issues with the RAM.

So, in order to try your method, I have some questions:

1) Boot the new HW from the generic install image and run setup (don't import pools)
How can I build a USB key? Can I boot the ISO in VirtualBox and use install.sh to build a USB key? Does this already set up the device tree?

2) Save a copy of the image from step 1 and run this way/test for a while w/o pool to see if it locks up. If it locks up, do not proceed, something else is the issue.
By this, do you mean save the running x86.eon file?

3) If it does not lock up, boot previous HW image and run update but point it to update the new HW image from step 2.
updimage /mnt/boot/new_x86.eon?

4) Boot from image in step 3 and zpool should auto import IP should be the same before step 1.
Copy new_x86.eon to /mnt/boot/x86.eon and run from there?

thanks

Andre Lue

Oct 24, 2022, 11:48:18 AM
to EON ZFS Storage on behalf of vajonam
Hi vajonam,

Please see responses below. Is this all ECC memory? Maybe the 64 GB and 24 GB modules you have are not compatible with each other. Did you try running with just the 24 GB to see if the lockup occurs?


On Mon, Oct 24, 2022 at 3:05 PM vajonam via EON ZFS Storage <eonst...@googlegroups.com> wrote:
1) Boot the new HW from the generic install image and run setup (don't import pools)
How can I build a USB key? Can I boot the ISO in VirtualBox and use install.sh to build a USB key? Does this already set up the device tree?
   The generic/original image is preserved on the USB key and can be selected/booted from the grub menu at boot time.

2) Save a copy of the image from step 1 and run this way/test for a while w/o pool to see if it locks up. If it locks up, do not proceed, something else is the issue.
By this, do you mean save the running x86.eon file?
   Yes, updimg.sh does this, but you have to know which image is currently booted and which you wish to fall back to, etc.

vajonam

Oct 24, 2022, 2:07:44 PM
to EON ZFS Storage
I have given up on running all the memory at once. The modules are indeed different kinds and I think that isn't feasible.

The lockups don't happen with the 24 GB (6x4 GB sticks). They happen consistently with the 64 GB (2x32 GB). With the latest memtest86+ I was able to create the same lockups.

Thanks for your other answers! I also have a question: does install.sh need to be run on the actual hardware? I am using VirtualBox to run install.sh against a USB key, then using that key in the physical hardware, then running setup.sh on the actual hardware, followed by updimg.sh. That is what I did the last time. Is that just plain wrong?

Andre Lue

Oct 24, 2022, 2:12:34 PM
to EON ZFS Storage on behalf of vajonam
Yes, install should be done on the actual HW

vajonam

Oct 24, 2022, 2:19:03 PM
to EON ZFS Storage
Since I no longer have access to a CD writer, can I use balenaEtcher to write the ISO to a USB stick for boot purposes?

vajonam

Oct 24, 2022, 2:47:54 PM
to EON ZFS Storage
I am not able to write the ISO to a USB disk. What is your method for this? I tried UNetbootin and balenaEtcher. I am running Linux and I know I should know how to do this, but for whatever reason it is not booting from the stick.
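
One possible reason a stick written from an ISO does not boot on BIOS firmware is that the image is a plain optical image with no MBR, so a raw copy leaves the first sector without a boot signature. Whether that applies to the EON ISO is an assumption. A minimal sketch of the check, using only dd, od and tr; has_mbr_signature and the file name eon.iso are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file carries the 0x55 0xAA boot
# signature at byte offsets 510-511 of its first sector.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \t\n')
    [ "$sig" = "55aa" ]
}

# Usage sketch ("eon.iso" is a placeholder file name):
# if has_mbr_signature eon.iso; then
#     echo "MBR boot signature present; a raw copy to the stick may boot"
# else
#     echo "no MBR boot signature; a raw copy is unlikely to boot from USB"
# fi
```

The signature is necessary rather than sufficient, so a positive result does not guarantee the stick will boot. A negative result would be consistent with booting the ISO in a VM and running install.sh against the attached USB key instead.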

vajonam

Oct 24, 2022, 4:20:38 PM
to EON ZFS Storage

This is my startup. I understand the alignment messages; I have to re-create a few pools. But there is still some other stuff in there: ucodeadm (not sure how to disable this), and the message about a milestone depending on multiple instances of physical. I'm not sure how I can get to a clean startup.

Andre Lue

Oct 24, 2022, 4:31:15 PM
to EON ZFS Storage on behalf of Donovan Kaardal
You can run install.sh from a virtual boot if you are able to write to a USB device attached to the VM.

vajonam

Oct 24, 2022, 5:06:23 PM
to EON ZFS Storage
That is how I have been doing it, but you mentioned earlier that install.sh needs to be run on the real hardware? 

Andre Lue

Oct 24, 2022, 5:20:41 PM
to EON ZFS Storage on behalf of Donovan Kaardal
install.sh installs the generic image. Then use the USB key to boot on the real HW, where you run setup and updimg.sh.

vajonam

Oct 24, 2022, 6:07:25 PM
to EON ZFS Storage

Yup, that is what I am doing, but since I installed new HW, you recommend I go back to generic and then run setup and updimg again. I get it now.

vajonam

Oct 25, 2022, 10:47:00 AM
to EON ZFS Storage
dre,

I'm seeing lockups regardless of memory when doing pool-to-pool mv/cp operations (over NFS or even locally), which makes me think this is related to something else. I have disabled hyper-v and hyper-threading since I am not really using those; this is purely a ZFS box. What are your thoughts on kernel issues with this hardware?

System Configuration: Supermicro X9DRL-3F/iF
BIOS Configuration: American Megatrends Inc. 3.3 07/12/2018
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz CPU 1
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz CPU 2


Andre Lue

Oct 25, 2022, 12:56:03 PM
to EON ZFS Storage on behalf of vajonam
Do the system logs or console capture show anything before/after a reboot? Are there any warnings/errors about HW on boot?

If I had to speculate: either a disk, zpool or controller-driver issue. I wouldn't rule out memory yet. I'd say try to isolate each pool, one at a time, and find out whether it's one or both pools.
I'd say connect the zpools one at a time and do a resilver. If that completes, try copying/moving between zfs filesystems. Did you track which zfs/zpool the data was moving from and to when the lockup occurred?
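
When connecting pools one at a time, a quick health check between steps can be scripted. A minimal sketch, assuming the stock zpool command is on the PATH; unhealthy_pools is a hypothetical helper and "tank" is a placeholder pool name:

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>health" lines (the shape produced by
# `zpool list -H -o name,health`) on stdin and print pools that are not ONLINE.
unhealthy_pools() {
    awk '$2 != "ONLINE" { print $1 }'
}

# On the storage box itself (skipped where zpool is not installed):
if command -v zpool >/dev/null 2>&1; then
    zpool list -H -o name,health | unhealthy_pools
fi

# One pool at a time ("tank" is a placeholder name):
#   zpool import tank
#   zpool scrub tank
#   zpool status tank    # wait for the scan to finish and check for errors
```

An empty result only says every imported pool reports ONLINE. It does not replace reading the error counters in zpool status after the scan completes.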

vajonam

Oct 25, 2022, 2:51:58 PM
to EON ZFS Storage
The system logs and console don't show anything. At some point I thought I saw the console print out some lines before the system rebooted automatically. The lockup happens with any pool; I can re-create it at will with the new memory. I need to set the file size at least as big as the memory.

dd if=/dev/zero of=/zfs/anypool/testfile bs=16k count=5128k status=progress
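
For sizing that test file, the arithmetic can be sketched as follows. With bs=16k and count=5128k, dd writes 16 KiB x (5128 x 1024) blocks, which is about 80 GiB. The helper name dd_count_for_gib is made up for illustration, and the script only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: number of blocks dd must write to produce a file of
# size_gib GiB when using a block size of bs_kib KiB.
dd_count_for_gib() {
    size_gib=$1
    bs_kib=$2
    echo $(( size_gib * 1024 * 1024 / bs_kib ))
}

# Example: an 80 GiB file in 16 KiB blocks, i.e. larger than 64 GiB of RAM.
count=$(dd_count_for_gib 80 16)
echo "dd if=/dev/zero of=/zfs/anypool/testfile bs=16k count=$count status=progress"
```

If compression is enabled on the target dataset, a stream of zeros may compress to almost nothing, so the test could put less load on the disks than the file size suggests.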

My /var/adm/messages is 0 bytes; I think something is overwriting it. I tried the new SW and saw the same issue, so I rolled it back, as I had a backup and had ssh keys and other stuff set up and didn't want to bother migrating. I was going to do that once I confirmed stability.

I am considering installing Ubuntu Server 22.04 with ZFS, now that it's on Linux and I know much more about ZFS and also more about Linux/Ubuntu admin. It will be more manageable for me going forward, with more recent driver patches etc. I will get an SSD and install it on there.

thank you for all your support and help so far on my journey of learning, using and trusting zfs! 

vajonam

Oct 28, 2022, 12:56:49 AM
to EON ZFS Storage
dre,

just to close the loop here.

Installed Ubuntu 22.04 with ZFS. It has been solid, with no reboots or hangs, across all the tests that were failing on EON/Solaris. I suspect there are some Xeon bugs in the Solaris version I have that cause it to get into weird states. Love how far Linux and ZFS have come! I still miss some of the Solaris admin stuff :-) like format -e, and I got used to HDD WWNs! Want it back now!


Anyhow, the pools imported fine into Linux. There were some issues with zpool import using the cache file, but I first have to do upgrades etc. I will be testing this for a few days. Thanks for your help so far in helping me learn ZFS.

If you build a modern OmniOS-based image I will come back to this, given how stable it was for over 8 years, or if the Ubuntu thing doesn't work out.

Andre Lue

Nov 3, 2022, 5:55:18 PM
to EON ZFS Storage on behalf of Donovan Kaardal
The next intended phase/direction was to roll a parallel minimal Linux live ISO version with something like casper for persistence. Maybe you can start down that path?

Manojav Sridhar

Nov 6, 2022, 8:12:28 AM
to EON ZFS Storage on behalf of dre2kse
Yup. Love the idea, as long as we have rolling updates to keep current with Linux and OpenZFS.

If we base it on the Ubuntu kernel, which has ZFS compiled in, that makes things easier, since we can use that upstream kernel.





Manojav Sridhar

Nov 14, 2022, 1:50:12 PM
to EON ZFS Storage on behalf of dre2kse
This does exactly what you are after with Alpine Linux


Haven't tried it yet, maybe next upgrade cycle. 
