
Sunfire v4100M2 GRASP board CR1 blinking

leam hall

Jan 4, 2018, 1:44:52 PM
And the server won't boot. I have two and tried swapping out the PSUs, the GRASP board, and the BR 2032 battery. While doing this the top cover is off so I can see what is blinking.

So far the GRASP board CR1 comes up green, then after a bit goes to slow blinking. No error lights on the front or back, but pushing the "start her up" button does nothing.

Thoughts?

Leam

leam hall

Jan 14, 2018, 8:28:18 AM
I've gotten a Minicom session connected to the Mgmt port. Trying:

cd /SYS
start

Gives me:

start: Failed to start /SYS

Pushing the button on front gives nothing. No amber light. Both power supplies present.

How do I interpret the logs for issues?

-> show list

/SP/logs/event/list
Targets:

Properties:

Commands:
cd
show

ID Date/Time Class Type Severity
----- ------------------------ -------- -------- --------
5415 Thu Jan 4 04:02:13 2018 Audit Log minor
root : Set : object = /SYS/power_state : value = on : error
5414 Thu Jan 4 04:02:09 2018 Audit Log minor
KCS Command : Set ACPI Power State : system power state = 0x0 : device power state = no change : success
5413 Thu Jan 4 03:59:00 2018 Audit Log minor
root : Set : object = /SYS/power_state : value = on : error
5412 Thu Jan 4 03:58:55 2018 Audit Log minor
KCS Command : Set ACPI Power State : system power state = 0x0 : device power state = no change : success
5411 Thu Jan 4 03:56:21 2018 Audit Log minor
root : Set : object = /SYS/power_state : value = on : error
5410 Thu Jan 4 03:56:16 2018 Audit Log minor
KCS Command : Set ACPI Power State : system power state = 0x0 : device power state = no change : success
5409 Thu Jan 4 03:50:39 2018 Audit Log minor
root : Open Session : object = /session/type : value = shell : success
5408 Thu Jan 4 03:50:20 2018 IPMI Log critical
ID = e80 : 01/04/2018 : 03:50:20 : Entity Presence : io.id1.prsnt : Device Present
5407 Thu Jan 4 03:40:18 2018 IPMI Log critical
ID = e7d : 01/04/2018 : 03:40:18 : Entity Presence : io.id1.prsnt : Device Present
5406 Thu Jan 4 03:41:49 2018 IPMI Log critical
ID = e7a : 01/04/2018 : 03:41:49 : Power Supply : ps1.pwrok : State Deasserted
5405 Thu Jan 4 03:30:21 2018 IPMI Log critical
ID = e79 : 01/04/2018 : 03:30:21 : Entity Presence : io.id1.prsnt : Device Present
5404 Thu Jan 4 03:36:37 2018 IPMI Log critical
ID = e76 : 01/04/2018 : 03:36:37 : Power Supply : ps1.vinok : State Deasserted
5403 Thu Jan 4 03:20:32 2018 IPMI Log critical
ID = e75 : 01/04/2018 : 03:20:22 : Entity Presence : io.id1.prsnt : Device Present
5402 Thu Jan 4 03:20:22 2018 IPMI Log critical
ID = e74 : 01/04/2018 : 03:20:22 : Voltage : mb.v_bat : Lower Non-critical going low : reading 2.59 < threshold 2.62 Volts
5401 Thu Jan 4 03:22:56 2018 IPMI Log critical
ID = e71 : 01/04/2018 : 03:22:56 : Power Supply : ps1.vinok : State Deasserted
5400 Thu Jan 4 03:22:55 2018 IPMI Log critical
ID = e70 : 01/04/2018 : 03:22:55 : Power Supply : ps1.pwrok : State Deasserted
5399 Thu Jan 4 03:21:02 2018 IPMI Log critical
ID = e6f : 01/04/2018 : 03:21:02 : Power Supply : ps0.pwrok : State Asserted
5398 Thu Jan 4 03:20:58 2018 IPMI Log critical
ID = e6e : 01/04/2018 : 03:20:58 : Power Supply : ps0.vinok : State Asserted
5397 Thu Jan 4 03:20:32 2018 IPMI Log critical
ID = e6d : 01/04/2018 : 03:20:22 : Entity Presence : io.id1.prsnt : Device Present
5396 Thu Jan 4 03:20:32 2018 IPMI Log critical
ID = e6c : 01/04/2018 : 03:20:22 : Power Supply : ps0.vinok : State Deasserted
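
One way to get an overview of a listing like this is to tally the IPMI entries by sensor and state. The following is only a rough Python sketch, not anything shipped with ILOM: the function names (`unwrap`, `sensor_states`) and the regular expressions are made up for illustration, and it assumes the log was captured from an 80-column terminal, so a detail line may be wrapped mid-word and is rejoined without adding a space.

```python
import re
from collections import Counter

# Assumption: an event header starts with a numeric event ID and a weekday,
# as in "5406 Thu Jan 4 03:41:49 2018 IPMI Log critical".
HEADER = re.compile(r"^\d+\s+(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s")

# Detail line of an IPMI entry, e.g.
# "ID = e7a : 01/04/2018 : 03:41:49 : Power Supply : ps1.pwrok : State Deasserted"
IPMI_DETAIL = re.compile(
    r"ID\s*=\s*(?P<id>[0-9a-f]+)\s*:\s*(?P<date>\d{2}/\d{2}/\d{4})\s*:\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*(?P<kind>[^:]+?)\s*:\s*"
    r"(?P<sensor>[\w.]+)\s*:\s*(?P<state>.+)"
)


def unwrap(lines):
    """Group lines into (header, detail) pairs, rejoining wrapped detail text."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if HEADER.match(line):
            events.append([line, ""])
        elif events:
            # Wrapped fragments are glued back together with no separator.
            events[-1][1] += line
    return [(header, detail) for header, detail in events]


def sensor_states(text):
    """Count (sensor, state) pairs across the IPMI entries in a pasted log."""
    counts = Counter()
    for _header, detail in unwrap(text.splitlines()):
        match = IPMI_DETAIL.match(detail)
        if match:  # Audit entries have no "ID = ..." detail and are skipped
            counts[(match.group("sensor"), match.group("state").strip())] += 1
    return counts


# A few entries copied from the listing, still wrapped as the terminal showed them.
SAMPLE = """\
5406 Thu Jan 4 03:41:49 2018 IPMI Log critical
ID = e7a : 01/04/2018 : 03:41:49 : Power Supply : ps1.pwrok : State Deas
serted
5404 Thu Jan 4 03:36:37 2018 IPMI Log critical
ID = e76 : 01/04/2018 : 03:36:37 : Power Supply : ps1.vinok : State Deas
serted
5402 Thu Jan 4 03:20:22 2018 IPMI Log critical
ID = e74 : 01/04/2018 : 03:20:22 : Voltage : mb.v_bat : Lower Non-critic
al going low : reading 2.59 < threshold 2.62 Volts
5415 Thu Jan 4 04:02:13 2018 Audit Log minor
root : Set : object = /SYS/power_state : value = on : error
"""

if __name__ == "__main__":
    for (sensor, state), n in sorted(sensor_states(SAMPLE).items()):
        print(f"{n:3d}  {sensor:14s} {state}")
```

Grouping this way makes it easier to see which sensors keep reporting: in the full listing the repeat offenders are the `ps1.*` deassertions and the single `mb.v_bat` low-voltage event.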

leam hall

Jan 14, 2018, 12:20:35 PM
Reset the server and got the following in the event logs.


5420 Thu Jan 4 07:43:28 2018 Audit Log minor
root : Open Session : object = /session/type : value = shell : success
5419 Thu Jan 4 07:40:19 2018 IPMI Log critical
ID = e84 : 01/04/2018 : 07:40:18 : Entity Presence : io.id1.prsnt : Device Present
5418 Thu Jan 4 07:41:06 2018 IPMI Log critical
ID = e81 : 01/04/2018 : 07:41:06 : Power Supply : ps1.vinok : State Deasserted

Scott Packard

Jan 15, 2018, 1:55:53 PM
On another vendor’s x64 server if it can’t see RAM then it won’t start. It’s been awhile since I’ve used Sun x64-based hardware. In the old days a blade chassis needed power for about 1 hour before I could power on a blade; I’m wondering how many minutes power must be applied to an x4100 before attempting power on.
I’d use a DMM to measure voltage into the power supplies, to make sure it was the minimum spec for the power supply, then try reseating RAM.

I’d put the cover back on; docs mention a chassis intrusion switch.

You’ve Googled for sun fire x4100m2 service manual, I assume.

Verifying cause of NO chassis power:

Visually inspect each power supply for the status of the AC Present, Power OK, and Fault LEDs. If the Fault LED is illuminated on any of the PSUs then further troubleshooting will be required.

If AC Present is NOT illuminated, ensure the AC power cords are securely plugged into the server and connected to working AC power outlet(s).

If Power OK is NOT illuminated, but AC Present IS, then further troubleshooting will be required. Refer to the system Servers Service Manual and Servers Diagnostics Guide for additional troubleshooting steps.
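
As a compact restatement of those three LED checks, here is a small Python sketch. The function name and the wording of the returned strings are invented for illustration; the last branch (all LEDs normal) is an assumption, since the steps above do not cover that case.

```python
def psu_next_step(ac_present: bool, power_ok: bool, fault: bool) -> str:
    """Map one PSU's LED states to the next troubleshooting step."""
    if fault:
        return "Fault LED lit: further troubleshooting required"
    if not ac_present:
        return "AC Present dark: check power cords and the AC outlet"
    if not power_ok:
        return ("AC Present lit but Power OK dark: further troubleshooting "
                "required, see the Service Manual and Diagnostics Guide")
    # Assumption: not covered by the steps above.
    return "PSU LEDs look normal: look elsewhere"


if __name__ == "__main__":
    # Example: a supply with AC applied but no Power OK.
    print(psu_next_step(ac_present=True, power_ok=False, fault=False))
```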


Display System Event Logs, and sensor & fault indicator information:

ILOM:
show /SP/logs/event/list
show -d properties -level all /SYS
show -o table -level all /SP/faultmgmt (Not available in all ILOM versions).

Regards, Scott

DoN. Nichols

Jan 15, 2018, 6:45:57 PM
On 2018-01-15, Scott Packard <spac...@gmail.com> wrote:

> On another vendor's x64 server if it can't see RAM then it won't
> start. It's been awhile since I've used Sun x64-based hardware. In the
> old days a blade chassis needed power for about 1 hour before I could
> power on a blade; I'm wondering how many minutes power must be applied
> to an x4100 before attempting power on.

O.K. I haven't observed that with mine. It may be an artifact
of a tired configuration battery. (See man page indicated below about that.)

> I'd use a DMM to measure voltage into the power supplies, to make sure
> it was the minimum spec for the power supply, then try reseating RAM.

> I'd put the cover back on; docs mention a chassis intrusion switch.

Yes -- IIRC, it is a magnet in the cover which actuates a reed
switch along one of the sides. The top *must* be on to allow the system
to power on. IIRC, there is partial power to some diagnostic circuits
to allow LEDs to indicate bad RAM DIMMs and such.

Aside from that, one of the quoted log entries said that the
battery voltage was below threshold, so you should replace that. I find
this information (for the X4100M2) in the 819-1157-23 service manual, on
PDF page 105.

Without that, configuration settings will be lost when power
goes away.

> You've Googled for sun fire x4100m2 service manual, I assume.

A search for "819-1157-23.pdf" should lead you to it.

> Verifying cause of NO chassis power:

> Visually inspect each power supply for the status of the AC Present,
> Power OK, and Fault LEDs. If the Fault LED is illuminated on any of the
> PSUs then further troubleshooting will be required.

> If AC Present is NOT illuminated, ensure the AC power cords are
> securely plugged into the server and connected to working AC power
> outlet(s).

> If Power OK is NOT illuminated, but AC Present IS, then further
> troubleshooting will be required. Refer to the system Servers Service
> Manual and Servers Diagnostics Guide for additional troubleshooting
> steps.

Good Luck,
DoN.

--
Remove oil spill source from e-mail
Email: <BPdnic...@d-and-d.com> | (KV4PH) Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---

leam hall

Jan 24, 2018, 11:02:24 AM
On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
> On 2018-01-15, Scott Packard <> wrote:


Hadn't set the group to send mails, sorry for the delay in replying.

The servers were in the garage during the cold front so a weak battery makes sense. I swapped out the one I could with a brand new one, but no luck. There was another thing that looked like a multi-battery pack in shrinkwrap but it didn't want to come off the board. Need to study it more when warmer.

What I was looking at is Item 12 on page 21:

https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf

DoN. Nichols

Jan 24, 2018, 9:51:39 PM
On 2018-01-24, leam hall <leam...@gmail.com> wrote:
> On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
>> On 2018-01-15, Scott Packard <> wrote:
>

> Hadn't set the group to send mails, sorry for the delay in replying.

> The servers were in the garage during the cold front so a weak battery
> makes sense.

Also, the environment values on PDF page 184 of the document
below may be your problem if you were trying to run it in the cold.

======================================================================
Temperature 41 - 95 Deg F
(operating) 5 - 35 Deg C

Temperature -40 - 158 Deg F
(storage) -40 - 70 Deg C
======================================================================
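
To make the comparison against a garage thermometer concrete, here is a minimal Python sketch using only the four ranges quoted above. The function names and the status labels are made up for illustration.

```python
# Ranges from the table above, in degrees Celsius.
OPERATING_C = (5.0, 35.0)
STORAGE_C = (-40.0, 70.0)


def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def temperature_status(deg_c: float) -> str:
    """Say whether a temperature is fine for running, storage only, or out of spec."""
    low, high = OPERATING_C
    if low <= deg_c <= high:
        return "operating"
    low, high = STORAGE_C
    if low <= deg_c <= high:
        return "storage only"
    return "out of spec"


if __name__ == "__main__":
    # A garage at freezing is within the storage range but below the operating range.
    print(temperature_status(f_to_c(32.0)))
```

The Fahrenheit and Celsius columns in the table agree with each other: 41 F is 5 C and 95 F is 35 C.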


> I swapped out the one I could with a brand new, but no
> luck. There was another thing that looked like a multi-battery pack in
> shrinkwrap but it didn't want to come off the board. Need to study it
> more when warmer.

> What I was looking at is Item 12 on page 21:

> https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf

PDF page 106 has a photo of the cell in its holder and
instructions on replacing it.

You might want to look up the section on resetting the CMOS
memory (pages 81 and 87) as it may have been corrupted by the low
voltage in the previous cell.

leam hall

Jan 30, 2018, 2:36:56 PM
On Wednesday, January 24, 2018 at 9:51:39 PM UTC-5, DoN. Nichols wrote:
Thanks! I re-replaced the battery and made sure it was turned the right way this time. Don't have a jumper so I used a small screwdriver blade to short between the two poles of the jumper. Plugged her back in and still no go.

My bet is that the temperature was the issue. I *assume* the screwdriver made a good enough contact to serve as a jumper. Hmm...I wonder if I have any old hard drives lying around with jumpers on them...

<some time later>
Found a jumper. Went through four batteries, including the one from a once-working server that now doesn't want to work. :(

Couple batteries were very close to the tolerance, 2.61 measured with 2.62 minimal. The server said it wasn't a critical error but still didn't come up.

DoN. Nichols

Jan 30, 2018, 8:00:46 PM
Most modern drives use smaller jumpers, though old enough ones
might supply what you need.

><some time later>

> Found a jumper. Went through four batteries, including the one from a
> once-working server that now doesn't want to work. :(

> Couple batteries were very close to the tolerance, 2.61 measured with
> 2.62 minimal. The server said it wasn't a critical error but still
> didn't come up.

I seem to remember that you have to have the cover in place
during the power up to reset the data. You don't say whether it is
closed or not, but there is a sensor (Magnet & Reed switch) to tell
whether the cover is in place or not.

leam hall

Jan 31, 2018, 4:38:31 PM
On Tuesday, January 30, 2018 at 8:00:46 PM UTC-5, DoN. Nichols wrote:
Most of my stuff is old. Drives too. ;)

Bought a new pack of batteries and have gone through a couple on the one server.

5515 Thu Jan 4 11:23:26 2018 Audit Log minor
root : Open Session : object = /session/type : value = shell : success
5514 Thu Jan 4 11:20:33 2018 IPMI Log critical
ID = ef3 : 01/04/2018 : 11:20:32 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
5513 Thu Jan 4 11:20:33 2018 IPMI Log critical
ID = ef2 : 01/04/2018 : 11:20:27 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
5512 Thu Jan 4 11:20:33 2018 IPMI Log critical
ID = ef1 : 01/04/2018 : 11:20:22 : Entity Presence : io.id1.prsnt : Device Present


Did the "jumper on", close lid, power up thing multiple times. The other server came back around, this one didn't. Is the "mb.v_bat" the CMOS battery or the one next to it?

http://reuel.net/images/v4100_batteries.jpg

Thanks!

Leam

DoN. Nichols

Jan 31, 2018, 10:14:53 PM
On 2018-01-31, leam hall <leam...@gmail.com> wrote:

> Most of my stuff is old. Drives too. ;)
>
> Bought a new pack of batteries and have gone through a couple on the one server.
>
> 5515 Thu Jan 4 11:23:26 2018 Audit Log minor
> root : Open Session : object = /session/type : value = shell : success
> 5514 Thu Jan 4 11:20:33 2018 IPMI Log critical
> ID = ef3 : 01/04/2018 : 11:20:32 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
> 5513 Thu Jan 4 11:20:33 2018 IPMI Log critical
> ID = ef2 : 01/04/2018 : 11:20:27 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
> 5512 Thu Jan 4 11:20:33 2018 IPMI Log critical
> ID = ef1 : 01/04/2018 : 11:20:22 : Entity Presence : io.id1.prsnt : Device Present
>
>
> Did the "jumper on", close lid, power up thing multiple times. The other server came back around, this one didn't. Is the "mb.v_bat" the CMOS battery or the one next to it?
>
> http://reuel.net/images/v4100_batteries.jpg
>
> Thanks!
>
> Leam


Looking at that photo, and comparing it with PDF page 106 in
819-1157-23.pdf (printed page # 3-16), I think that you have the battery
in the holder backwards.

======================================================================
Note - Install the new battery in the holder with the same
orientation (polarity) as the battery that you removed. The positive
polarity, marked with a "+" symbol, should be facing toward the chassis
center.
======================================================================

and the '+' side is the larger flat side. Your photo shows it facing
the handle, which is near the outer edge, not the chassis center.

The black heat-shrink enclosed part I believe to be a very high
value, low voltage capacitor, to maintain info in the memory when you
are replacing the battery (if it was not so low that it had already lost
data.)

I've got an X4200, and an X4100M2, and I think that you have the
X4100. I'm not going to shut down the two systems (which are both doing
things which I need to keep running, given power) just so I can examine
them.

leam hall

Feb 1, 2018, 8:54:44 AM
Don, I really appreciate all the help!

The server self-identifies:

product_name = SUN FIRE X4100 M2
product_part_number = 602-4492-01


I did the full 'jumper, power on, close lid' reset bit with a third battery. This time I let it sit for a few minutes. The low battery seems to be during the reset time given the gap in /SP/logs/event/list.

At present "start /SYS" fails. It shows a PSU_FAULT and there are amber lights. However, both PSUs have two green lights on the rear.

Next step is to try resetting PSUs and see what happens. Will keep you updated.

Thanks!

Leam

leam hall

Feb 1, 2018, 8:59:52 AM
Hrmph. Maybe I was wrong about the battery issue just being during the reset. This is with the third brand new CR2032 battery with the positive side facing the center of the motherboard. I pulled the plugs, reseated one PSU and pulled the other. Here's the /SP/logs/event/list:


5539 Thu Jan 4 14:33:17 2018 Audit Log minor
root : Open Session : object = /session/type : value = shell : success
5538 Thu Jan 4 14:30:30 2018 IPMI Log critical
ID = f0a : 01/04/2018 : 14:30:30 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
5537 Thu Jan 4 14:30:30 2018 IPMI Log critical
ID = f09 : 01/04/2018 : 14:30:25 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
5536 Thu Jan 4 14:30:30 2018 IPMI Log critical
ID = f08 : 01/04/2018 : 14:30:19 : Entity Presence : io.id1.prsnt : Device Present
5535 Thu Jan 4 14:30:19 2018 IPMI Log critical
ID = f07 : 01/04/2018 : 14:30:19 : Voltage : mb.v_bat : Lower Non-critical going low : reading 0.62 < threshold 2.62 Volts
5534 Thu Jan 4 14:36:37 2018 IPMI Log critical
ID = f05 : 01/04/2018 : 14:36:37 : Power Supply : ps1.pwrok : State Deasserted
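
The three `mb.v_bat` entries are the same 0.62 V reading crossing three separate lower thresholds in turn. A minimal Python sketch of that classification, using only the threshold values that appear in these log entries (the function name and the "OK" label are made up for illustration):

```python
# Lower thresholds for mb.v_bat as reported in the event log, most severe first.
THRESHOLDS = [
    ("Lower Non-recoverable", 2.34),
    ("Lower Critical", 2.53),
    ("Lower Non-critical", 2.62),
]


def classify_vbat(volts: float) -> str:
    """Return the most severe lower threshold that the reading falls below."""
    for name, limit in THRESHOLDS:
        if volts < limit:  # the log reports "reading < threshold"
            return name
    return "OK"


if __name__ == "__main__":
    # Readings mentioned in the thread: SP-reported 0.62 and 2.59, measured 2.61.
    for reading in (0.62, 2.59, 2.61):
        print(f"{reading:.2f} V -> {classify_vbat(reading)}")
```

For scale, a fresh CR2032 is nominally 3.0 V, so a reading of 0.62 V is far below anything a new cell should show at the sensor.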

DoN. Nichols

Feb 1, 2018, 11:22:18 PM
On 2018-02-01, leam hall <leam...@gmail.com> wrote:
> On Thursday, February 1, 2018 at 8:54:44 AM UTC-5, leam hall wrote:
>> Don, I really appreciate all the help!
>>
>> The server self-identifies:
>>
>> product_name = SUN FIRE X4100 M2
>> product_part_number = 602-4492-01
>>

>> I did the full 'jumper, power on, close lid' reset bit with a third
>> battery. This time I let it sit for a few minutes. The low battery seems
>> to be during the reset time given the gap in /SP/logs/event/list.

>> At present "start /SYS" fails. It shows a PSU_FAULT and there are
>> amber lights. However, both PSUs have two green lights on the rear.

>> Next step is to try resetting PSUs and see what happens. Will keep
>> you updated.

>> Thanks!
>>
>> Leam

> Hrmph. Maybe I was wrong about the battery issue just being during the
> reset. This is with the third brand new CR2032 battery with the positive
> side facing the center of the motherboard. I pulled the plugs, reseated
> one PSU and pulled the other. Here's the /SP/logs/event/list:


CR2032? I seem to remember that the X4100M2 and X4200 use a
somewhat smaller cell. I went upstairs to see if I could find a saved
low cell to refresh my memory on the size used. I had to go to
Batteries Plus, and they had a few (not on display).

If I am right, the larger diameter CR2032 can't make contact to
both contacts in the holder at the same time.

Do you have the original cells which you pulled from the system?
Is it possible that someone previous to you managed to force in the
CR2032 cell where it did not belong?

But the photo in the manual does look like about CR2032 size.
Maybe the weird size was in the T2000 instead.

O.K. CR1225 in the T2000 -- that was what I was remembering.

Sorry about the mis-remembered value.

If that black heat-shrink wrapped cylinder is really a very high
value low voltage capacitor -- the reverse cell mounting (if it makes
proper contact with both sides of the cell, which it may not do) could
have put a reverse charge on it. If so, it may take hours to recharge
the capacitor.

Good Luck,
DoN.

>
> 5539 Thu Jan 4 14:33:17 2018 Audit Log minor
> root : Open Session : object = /session/type : value = shell : success
> 5538 Thu Jan 4 14:30:30 2018 IPMI Log critical
> ID = f0a : 01/04/2018 : 14:30:30 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
> 5537 Thu Jan 4 14:30:30 2018 IPMI Log critical
> ID = f09 : 01/04/2018 : 14:30:25 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
> 5536 Thu Jan 4 14:30:30 2018 IPMI Log critical
> ID = f08 : 01/04/2018 : 14:30:19 : Entity Presence : io.id1.prsnt : Device Present
> 5535 Thu Jan 4 14:30:19 2018 IPMI Log critical
> ID = f07 : 01/04/2018 : 14:30:19 : Voltage : mb.v_bat : Lower Non-critical going low : reading 0.62 < threshold 2.62 Volts
> 5534 Thu Jan 4 14:36:37 2018 IPMI Log critical
> ID = f05 : 01/04/2018 : 14:36:37 : Power Supply : ps1.pwrok : State Deasserted


leam hall

Feb 3, 2018, 12:11:27 PM
Hadn't thought about the cap discharge. Both servers are now flaky. Plugged in one and will let it sit for the rest of the day. Will let you know how it comes out.

leam hall

Feb 6, 2018, 6:12:01 AM
On Saturday, February 3, 2018 at 12:11:27 PM UTC-5, leam hall wrote:
> Hadn't thought about the cap discharge. Both servers are now flaky. Plugged in one and will let it sit for the rest of the day. Will let you know how it comes out.

Didn't help. Server powers up fine but won't start the OS.

DoN. Nichols

Feb 6, 2018, 9:48:27 PM
Hmm ... pretty much all that I can do. Is it possible that you
have some bad RAM DIMMs in there? IIRC, there are LEDs to indicate
every bad DIMM (and you need a lot of DIMMs to make a valid bank, IIRC).
Also, there are LEDs to indicate bad CPUs.

I only have the two systems (X4100M2 and X4200) and since they
are in service I can't pull them down to check other things.

All the fans are good, I hope?

Good Luck,
DoN.

leam hall

Feb 15, 2018, 6:49:43 PM
On Tuesday, February 6, 2018 at 9:48:27 PM UTC-5, DoN. Nichols wrote:
> On 2018-02-06, leam hall <l> wrote:
> > On Saturday, February 3, 2018 at 12:11:27 PM UTC-5, leam hall wrote:
> >> Hadn't thought about the cap discharge. Both servers are now flaky. Plugged in one and will let it sit for the rest of the day. Will let you know how it comes out.
> >
> > Didn't help. Server powers up fine but won't start the OS.
>
> Hmm ... pretty much all that I can do. Is it possible that you
> have some bad RAM DIMMs in there? IIRC, there are LEDs to indicate
> every bad DIMM (and you need a lot of DIMMs to make a valid bank, IIRC).
> Also, there are LEDs to indicate bad CPUs.
>
> I only have the two systems (X4100M2 and X4200) and since they
> are in service I can't pull them down to check other things.
>
> All the fans are good, I hope?
>
> Good Luck,
> DoN.

Hey Don, sorry it took me a bit to respond. Here's the funny part. I used a totally new battery, got the same results. Pulled the battery out and fired up the system, same results. So the battery doesn't matter at all.

*sigh*

Wish the RAM fit into my Dell box, at least.
