S11.4 x86 system panic when large file copied.

YTC#1

Dec 10, 2021, 11:33:42 AM

I have an "image" iso, that was copied OK to my server, but now when I
try to either scp it to another server or copy it locally my server
crashes/panics.

S11.4
Supermicro H/W (X11SSSZ-F), I5 7400
48Gb

File size is 16Gb
*always* panics at 9.1Gb copied

Not a prod box.
runs a number of zones (including a KZ)

Just wondering if a known issue, or one that needs chasing.

I'll try and test with vbox, when I have created another large image
file :-)

And will also test my T4

--
Bruce Porter
"The internet is a huge and diverse community but mainly friendly"
http://ytc1.blogspot.co.uk/
There *is* an alternative! http://www.openoffice.org/

YTC#1

Dec 10, 2021, 12:27:50 PM

On 10/12/2021 16:33, YTC#1 wrote:
> I have an "image" iso, that was copied OK to my server, but now when I
> try to either scp it to another server or copy it locally my server
> crashes/panics.
>
> S11.4
> Supermicro H/W (X11SSSZ-F), I5 7400
> 48Gb
>
>
> File size is 16Gb
> *always* panics at 9.1Gb copied
>
> Not a prod box.
> runs a number of zones (including a KZ)
>
> Just wondering if a known issue, or one that needs chasing.
>
> I'll try and test with vbox, when I have created another large image
> file :-)
>
> And will also test my T4
>

I can confirm there are no issues copying a large (20Gb) file with
S11.4 in VBox.

Doug McIntyre

Dec 10, 2021, 1:03:33 PM

YTC#1 <b...@ytc1-spambin.co.uk> writes:
>I have an "image" iso, that was copied OK to my server, but now when I
>try to either scp it to another server or copy it locally my server
>crashes/panics.

>S11.4
>Supermicro H/W (X11SSSZ-F), I5 7400
>48Gb

>File size is 16Gb
>*always* panics at 9.1Gb copied

>Not a prod box.
>runs a number of zones (including a KZ)

>Just wondering if a known issue, or one that needs chasing.

Most likely it is bad hardware. I've certainly dealt with many files
(including ISOs) larger than 16GB in size on Solaris boxes, as well as
a bazillion other people.

If you are running ZFS, what does 'zpool status' show? I'm guessing
you'd see errors here. On a healthy pool you should see a bunch of zeros
in the READ, WRITE and CKSUM columns.

--
Doug McIntyre
do...@themcintyres.us

YTC#1

Dec 11, 2021, 4:50:49 AM

That was my 1st port of call.
---8<
  pool: rpool
    id: 7278334453663277700
 state: ONLINE
  scan: scrub repaired 0 in 3h58m with 0 errors on Tue Nov 23 13:32:11 2021
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0

---8<
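For anyone wanting to check that kind of output without eyeballing it, here is a minimal Python sketch that flags any row whose READ/WRITE/CKSUM counters are not all zero. The function name `devices_with_errors` and the sample text are purely illustrative; it assumes the usual "NAME STATE READ WRITE CKSUM" column layout and only looks at whitespace-separated fields, so indentation does not matter.

```python
import re

# Counters are normally plain integers; large values may be abbreviated (e.g. "1.2K").
COUNTER = re.compile(r"\d+(\.\d+)?[KMGT]?")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for rows whose counters are not all zero."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # device rows follow the column header
            continue
        if not in_config or len(fields) < 5:
            continue
        name, _state, *counters = fields[:5]
        # Skip trailing lines such as "errors: ..." that are not device rows.
        if not all(COUNTER.fullmatch(c) for c in counters):
            continue
        if any(c != "0" for c in counters):
            flagged.append((name, *counters))
    return flagged


# Sample taken from the output posted above (whitespace collapsed).
sample = """\
pool: rpool
id: 7278334453663277700
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
c4t2d0 ONLINE 0 0 0
"""

print(devices_with_errors(sample))
```

An empty list here only says that ZFS has not counted any errors so far; it does not prove the disks are healthy.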

I've just started another scrub, as I have realised the file was copied
to the system after Sunday.

FMA is not being particularly helpful on this one :-)
---8<
Dec 08 16:42:42 5755d888-8fae-4575-8f0f-8290b018c178 SUNOS-8000-KL Major

Suspect 1 of 1 :
Problem class : defect.sunos.kernel.panic
Certainty : 100%
Affects : sw:///:path=/var/crash/data/5755d888-8fae-4575-8f0f-8290b018c178
Status : faulted but still in service

Resource
FMRI : "sw:///:path=/var/crash/data/5755d888-8fae-4575-8f0f-8290b018c178"
Status : faulty

Description : The system has rebooted after a kernel panic. The following are potential bugs.
stack[0] - 27096339
---8<

YTC#1

Dec 11, 2021, 1:27:44 PM

Well, that broke it. Good style.

Fails to boot, beyond devices, hangs at
pci@0,0/pci15d9,888@14/storage,c/esi@0,1 (ses 0) unknown
(nah, I don't know what the esi is either :-) )

It appears to have seen all (4) disks.

Looks like I need to go into debug mode tomorrow, probably try single
disk (no mirror) boots (after I bring up an inspect via PXE)

But of course my PXE boot is a zone on the server :-) (I'll have to use
my spare on my Mac :-) ).

Looks like I will have to test my backup/DR procedures then .....

YTC#1

Dec 13, 2021, 4:14:22 AM

After letting it "rest" and having a mull over it, I concluded it is
possibly a SATA controller issue. I have 2 controllers in the server
(built in and a PCIE card).

I disconnected all drives, except a single rpool, connected to the on
board SATA.
System booted.

I added a single data pool to the on board SATA.
System booted.

I added all disks to on board SATA only.
System booted.

Next test (tomorrow) will be to copy the large file again.

If it is the 2nd SATA controller, that will annoy me, as I would have
expected to just lose 1/2 my disks if that failed, not the entire system.
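For a copy test like this, it can help to know the exact offset the read reached before things went wrong. Below is a minimal Python sketch, not what was actually run here: it reads the file sequentially and fsyncs a progress log every so often, so that the last recorded offset survives a panic. The name `read_with_progress`, the 1 MiB chunk size and the reporting interval are all arbitrary choices, and the log file is assumed to live on a different disk or pool from the suspect one.

```python
import os
import sys

CHUNK = 1 << 20  # 1 MiB per read (arbitrary choice)


def read_with_progress(path, log_path, report_every=256):
    """Read `path` sequentially and return the number of bytes read.

    Every `report_every` chunks the current offset is written to
    `log_path` and fsync'd, so it is still on disk after a crash.
    """
    offset = 0
    chunks = 0
    with open(path, "rb") as src, open(log_path, "w") as log:
        while True:
            try:
                data = src.read(CHUNK)
            except OSError as exc:
                # An I/O error reported to userland: record where it happened.
                log.write(f"read failed at offset {offset}: {exc}\n")
                log.flush()
                os.fsync(log.fileno())
                raise
            if not data:
                break
            offset += len(data)
            chunks += 1
            if chunks % report_every == 0:
                log.write(f"{offset}\n")
                log.flush()
                os.fsync(log.fileno())
        log.write(f"done {offset}\n")
    return offset


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 readtest.py <large file> <progress log>
    print(read_with_progress(sys.argv[1], sys.argv[2]))
```

If the box panics, the last offset in the log gives a rough idea of which region of the file is involved; if the read fails cleanly instead, the error itself is logged.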

YTC#1

Dec 14, 2021, 5:38:10 AM

And during the copy (scp to desktop), the following appeared in
/var/adm/messages at approx 9.1Gb copied. The copy stalled and then continued.
---8<
Dec 14 10:12:47 ytc1 genunix: [ID 408114 kern.notice] /pci@0,0/pci15d9,888@14/storage@7 (scsa2usb2) removed
Dec 14 10:20:14 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:14 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:14 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:14 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:14 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:20 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:20 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:20 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:20 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:20 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:25 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:25 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:25 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:25 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:25 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:30 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:30 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:30 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:30 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:30 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:35 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:35 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:35 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:35 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:36 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:41 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:41 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:41 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:41 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:41 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:41 ytc1 scsi: [ID 583609 kern.warning] WARNING: /pci@0,0/pci15d9,888@17/disk@2,0 (sd8): disk not responding to selection
---8<
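With this much repetition it is easier to summarise the messages than to read them. The following Python sketch counts the "task file error" warnings per controller and port and keeps the first and last timestamps. The regular expression is written against the message format shown above and is only an assumption about what other ahci warnings look like; the embedded `LOG` is an abridged excerpt of the lines above, and `task_file_errors` is an illustrative name. In practice you would feed it the contents of /var/adm/messages, where each entry sits on one line.

```python
import re
from collections import Counter

# Abridged excerpt of the log above, one entry per line.
LOG = """\
Dec 14 10:20:14 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:14 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:20 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:41 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ [\d:]+) .*WARNING: (ahci\d+): ahci port (\d+) has task file error"
)


def task_file_errors(text):
    """Map (controller, port) to (count, first timestamp, last timestamp)."""
    counts = Counter()
    seen = {}
    for line in text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue
        stamp, ctrl, port = m.group(1), m.group(2), int(m.group(3))
        key = (ctrl, port)
        counts[key] += 1
        first, _last = seen.get(key, (stamp, stamp))
        seen[key] = (first, stamp)
    return {key: (counts[key], *seen[key]) for key in counts}


print(task_file_errors(LOG))
```

If every hit lands on the same controller and port, as it does here, that narrows the problem to one disk, its cable, or that one port.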

sd8 is the rpool mirror, which had been attached to the PCIe SATA controller.

No issue now when copying from internal disk to internal disk (rpool to
data pool)

Recopied from server to desktop (scp); the messages did not re-appear.

So, I guess I am looking at a HDD issue, time to buy a new one. Or maybe
upgrade to SSD :-)
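The task_file_status value in the log seems to fit the HDD theory. The Python sketch below decodes it, on the assumption (not confirmed by anything in this thread) that the driver is printing the raw AHCI port task file data register, with the ATA status register in the low byte and the ATA error register in the next byte. The bit names are the standard ATA ones, and only the commonly named bits are listed. `decode_tfd` is an illustrative name.

```python
# Commonly named bits of the ATA status and error registers.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def _names(byte, table):
    """Return the names of the bits set in `byte`, highest bit first."""
    return [table[bit] for bit in sorted(table, reverse=True) if byte & (1 << bit)]


def decode_tfd(value):
    """Split a task file value into (status names, error names).

    Assumes bits 7:0 are the ATA status register and bits 15:8 the
    ATA error register, as in the AHCI port task file data register.
    """
    status = value & 0xFF
    error = (value >> 8) & 0xFF
    return _names(status, STATUS_BITS), _names(error, ERROR_BITS)


print(decode_tfd(0x4041))
```

Under that assumption, 0x4041 decodes to status DRDY and ERR with error UNC, an uncorrectable data error reported by the drive itself. An interface CRC error (ICRC) would point more towards the cable or controller.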

YTC#1

Dec 18, 2021, 6:10:03 AM

Replaced both HDDs in rpool (2Tb WD, circa 2015) with newer 2Tb HDDs.

ZFS may have let me down by not trapping a disk issue, but splitting and
replacing rpool is a doddle :-)