Need to get SC1458 up and working


ads...@lps.umd.edu

Jun 4, 2012, 1:43:18 PM
to SiCortex Users
I have just started a new job and need to get an SC1458 machine up and
running. It was moved to this location and has not worked since the
move. I was able to boot the SSP and scboot all of the nodes. The
problem I am having is that local will not start. That is where the
script for the Lustre file system lives, and Lustre does not start
either. Does anyone have any ideas?

Lawrence Stewart

Jun 4, 2012, 2:19:37 PM
to sicorte...@googlegroups.com, ste...@serissa.com
Sounds like there is nothing much wrong with the hardware, except that the storage subsystem isn't working.

Step 1: Figure out how the storage is <supposed> to work
Step 2: Make sure that the storage <hardware> is all working
Step 3: Figure out if you really want to keep using lustre, or could do something simpler
Step 4: Reconfigure the storage subsystem to do the new thing or to debug the old
thing if you want to stay with it.

To get started, read the boot script and figure out what parts are launching the storage
system. Typically these will be somewhere like ssp:/opt/sicortex/config/local.d

The advantage of lustre is that it is a parallel file system, which enables higher performance with
many parallel readers/writers. However, if the underlying storage system is just a single rack of disks,
there really isn't much point to it, since the max bandwidth is only going to be a few hundred MB/sec.
You might be able to just NFS mount the storage and get something more robust and almost as fast.

Scalo, Albert

Jun 4, 2012, 3:08:43 PM
to sicorte...@googlegroups.com
I am looking into the storage system now. But could that also cause local not to start? Every time I boot the SSP I get the error "Failed to start local".

-------------------------------------------
Al Scalo
System Engineer
Contractor - SERCO-NA
Laboratory of Physical Sciences
Research Park
443-654-7921
--------------------------------------------

Lawrence Stewart

Jun 4, 2012, 4:57:59 PM
to sicorte...@googlegroups.com, Lawrence Stewart
The message "Failed to start local" comes from the shell script which brings the system into multiuser
mode. See

ssp:/opt/sicortex/rootfs/default/etc/runlevels/default/local

Basically this message happens when shell scripts in ssp:/opt/sicortex/config/local.d fail.

So what you have to do is look at the scripts in local.d and figure out which one or ones
are failing and why, then fix it.
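
One way to narrow this down is sketched below in Python, assuming Python 3.7 or later is available on the SSP. It runs each file in a directory under `sh -x` and reports the ones that exit non-zero, along with their traced stderr. The function name `find_failing_scripts` is made up for illustration. How the real boot script invokes the local.d scripts (arguments, environment, sourcing versus executing) is not shown in this thread, so a script may behave differently here than at boot. Note too that this actually executes the scripts, so anything they mount or start will be attempted.

```python
import subprocess
from pathlib import Path


def find_failing_scripts(script_dir, shell="sh"):
    """Run each regular file in script_dir and collect the failures.

    Returns a list of (script name, exit status, stderr text) tuples,
    one for every script that exited with a non-zero status.
    """
    failures = []
    for script in sorted(Path(script_dir).iterdir()):
        if not script.is_file():
            continue
        # -x makes the shell trace each command to stderr, so the captured
        # text shows which command ran last before the script gave up.
        result = subprocess.run(
            [shell, "-x", str(script)],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append((script.name, result.returncode, result.stderr))
    return failures
```

Calling `find_failing_scripts("/opt/sicortex/config/local.d")` on the SSP and printing each tuple should point at the script to read first.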

Typically, the only things in local.d are scripts that set up global storage systems like lustre.
You can move <all> of them somewhere else just to see if the rest of the machine is working.
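
A small Python sketch of moving everything aside, so that it can be undone later, follows. The function name `move_scripts_aside` and the holding directory are hypothetical, so pick whatever location suits your site. Running it again with the two arguments swapped puts the scripts back.

```python
import shutil
from pathlib import Path


def move_scripts_aside(local_d, holding_dir):
    """Move every entry in local_d into holding_dir; return the moved names."""
    src = Path(local_d)
    dst = Path(holding_dir)
    # Create the holding directory if it does not exist yet.
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        shutil.move(str(entry), str(dst / entry.name))
        moved.append(entry.name)
    return moved
```

For example, `move_scripts_aside("/opt/sicortex/config/local.d", "/root/local.d.disabled")` would leave local.d empty, where the second path is only a placeholder.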

(One of the things scboot does is to copy the scripts in /opt/sicortex/config/local.d into the
right place in the root file system before boot.)

See also the script

ssp:/opt/sicortex/rootfs/default/etc/conf.d/local.start

Evidently the local.d scripts are run on both the SSP and the ICE9 nodes at boot time, so
something in there is unhappy during ssp boot.

local.d ships empty from SiCortex, so anything there was added locally to your machine.
The only related stuff is in ssp:/opt/sicortex/script_examples which are starting points
for particular sites to customize these local.d scripts.

-Larry

Kem Stewart

Jun 5, 2012, 9:12:12 AM
to sicorte...@googlegroups.com
Hi Larry,

I still have a pile of clock cards, so if someone looks like they need one, or needs some diag results interpreted, write me directly to make sure you get my attention.  I'm spending too much time hanging out on 40m and redesigning/rebuilding boat anchors I've bought on eBay (and trying to restore my CW skills to something respectable) to monitor the SiCortex users' group.

I hope you're enjoying it in Kendall Sq.  Heller, Nussbaum, etc. are pretty interesting guys to work with (they were MIT classmates/labmates and TMC colleagues).

73
Kem
K5KEM

--
Kem Stewart
kem.s...@alum.mit.edu
781-674-2301
