Fwd: [engaging1-users] Fwd: Some additional background on the MGHPCC outage

Daniel Kamalic

Dec 10, 2013, 9:38:39 AM
to moc-te...@googlegroups.com
FYI, this is why we couldn't connect to the MRI hardware when we tried
-- there are still power issues in the datacenter... :(


-------- Original Message --------
Subject: [engaging1-users] Fwd: Some additional background on the MGHPCC
outage
Date: Tue, 10 Dec 2013 06:47:11 -0500
From: Chris Hill <c...@mit.edu>
Reply-To: c...@mit.edu, Engaging1 Cluster Users <engagin...@mit.edu>
To: engagin...@mit.edu

Hi All,

Slight delay in getting power restored (see below).
Plan is for someone who can work on the distribution
feed switch to be onsite this morning. Apologies for the delay.


Chris

---------- Forwarded message ----------
From: John Goodhue <jtgo...@mghpcc.org>
Date: Tue, Dec 10, 2013 at 12:45 AM
Subject: Some additional background on the MGHPCC outage
To: "gl...@bu.edu" <gl...@bu.edu>, Ralph Zottola
<RZot...@umassp.edu>, "r.shr...@neu.edu" <r.shr...@neu.edu>,
James Cuff <james...@harvard.edu>, Christopher N Hill <c...@mit.edu>
Cc: MGHPCC Management List <mghpcc-m...@mghpcc.org>


Glenn, Ralph, Rajiv, James, Chris -

Here's some more detail on what's happening with the MGHPCC this evening.

After finishing all of the testing, Mark, Kevin, and a team of field
engineers started restoring power to the facility. Early in the
process, two failures occurred in an interrupter that protects the
34.5 kV side of equipment in the MGHPCC transformer yard.
Unfortunately this equipment has to work for our utility power feed to
function safely. After several hours, it became clear that we were
not going to be able to fix or work around the failures without
support from factory engineers, who would not be available until
morning.

The facility is currently running on generator power (which is not
affected by the interrupter malfunction). This enables basic lighting
along with security, fire detection, etc. In the morning, we will
pursue two possible paths to resolution: working with the
manufacturer to replace the failed components, and working with HG&E
on a possible way to temporarily bypass the failed equipment while
preserving the needed protection.

Obviously this is an extremely disappointing conclusion to an
otherwise successful maintenance exercise. We will send news as it
happens to the alerts list, with at least one update every two hours.
Feel free to call with questions any time.

With apologies to you and your user communities,

John

_______________________________________________
Engaging1-users mailing list
Engagin...@mit.edu
http://mailman.mit.edu/mailman/listinfo/engaging1-users

