Problem connecting to coma


Mariana Vargas Magana

Dec 17, 2014, 4:34:48 PM
to [Warp-and-Coma]
Dear all

Is anyone else having problems connecting to coma?

Best


Dr. Mariana Vargas-Magaña
Post-Doctoral Researcher
Carnegie Mellon University
email: mma...@andrew.cmu.edu, mar...@apc.univ-paris7.fr

Shadab Alam

Dec 17, 2014, 4:35:25 PM
to Mariana Vargas Magana, [Warp-and-Coma]
Yes, me too :(
--
Shadab Alam,
CMU,
Pittsburgh;
USA. 

Edward Walter

Dec 17, 2014, 4:59:39 PM
to warp_a...@googlegroups.com
It looks like coma experienced a kernel panic. I suspect this is
related to the disk volume hosting people's home directories being
filled completely. The stack trace included a number of filesystem
warnings.

We're rebooting now and will have a disk utilization report generated
shortly.

Thanks.

--

Ed Walter
Technical Manager, Unix Services
SCS Computing Facilities
Carnegie Mellon University


Edward Walter

Dec 17, 2014, 5:08:52 PM
to warp_a...@googlegroups.com
Ok, coma is back up and online.


The disk volume hosting home directories is full:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1   5.0T  5.0T   184K  100%  /export
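
For anyone who wants to watch for this from a script, here is a minimal Python sketch of the same check. The function names and the 95% threshold are assumptions made for illustration, not something that runs on coma. The percentage is computed as used / (used + available), which is close to what df prints in its Use% column, though df's rounding may differ slightly.

```python
import shutil

def percent_used(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0  # empty or pseudo filesystem; nothing to report
    return 100.0 * usage.used / denominator

def is_nearly_full(path, threshold=95.0):
    """True when usage is at or above `threshold` percent (threshold is an assumed value)."""
    return percent_used(path) >= threshold

# "." keeps the demo runnable anywhere; on coma the path of interest would be /export.
print(f"{percent_used('.'):.1f}% used")
```
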


Here's the per user space consumption for that volume (sorted by size):
User ID      Used   Soft   Hard  Warn/Grace
mgwalker     1.7T      0      0  00 [------]
anat         1.1T      0      0  00 [------]
mmagana    511.6G      0      0  00 [------]
shadaba    325.8G      0      0  00 [------]
sukhdees   177.3G      0      0  00 [------]
zonggel    176.9G      0      0  00 [------]
yfeng1     136.8G      0      0  00 [------]
nbattia    105.5G      0      0  00 [------]
tmudholk   103.4G      0      0  00 [------]
hungjinh    93.9G      0      0  00 [------]
yenchic     90.7G      0      0  00 [------]
shirleyh    86.2G      0      0  00 [------]
hongyuz     79.7G      0      0  00 [------]
rcroft      74.3G      0      0  00 [------]
kosumi      73.6G      0      0  00 [------]
rcoconne    61.3G      0      0  00 [------]
root        39.9G      0      0  00 [------]
rmebane     32.8G      0      0  00 [------]
yingzu      22.2G      0      0  00 [------]
mozbek      19.9G      0      0  00 [------]
rmandelb    19.4G      0      0  00 [------]
nishanta    16.7G      0      0  00 [------]
cmhicks     16.7G      0      0  00 [------]
andreakl    12.6G      0      0  00 [------]
plaplant    10.6G      0      0  00 [------]
kkandasa     7.6G      0      0  00 [------]
ewalter      7.2G      0      0  00 [------]
sbird        4.9G      0      0  00 [------]
hytrac       4.1G      0      0  00 [------]
kmckeoug     3.6G      0      0  00 [------]
msimet       3.3G      0      0  00 [------]
siyuh        1.1G      0      0  00 [------]
tcv          608K      0      0  00 [------]
nkhandai     368K      0      0  00 [------]
pmassey      340K      0      0  00 [------]
pmansfie     216K      0      0  00 [------]
apopstef      72K      0      0  00 [------]
tiziana       32K      0      0  00 [------]
tdessup       28K      0      0  00 [------]
dpapale       28K      0      0  00 [------]
lbartsch      24K      0      0  00 [------]
tinatin       20K      0      0  00 [------]
rsargent      20K      0      0  00 [------]
johnw1        20K      0      0  00 [------]
cdegraf       20K      0      0  00 [------]
anirbanj      20K      0      0  00 [------]
ab560         20K      0      0  00 [------]

I would expect usage to be problematic until people clean up their files
and move things to some of the large disk volumes.
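
To find out what is taking up space in a home directory before moving it, something like the following Python sketch may help. It is only an illustration: `dir_size` and `largest_subdirs` are hypothetical helper names, and the totals are apparent file sizes (the sum of `st_size`), so they can differ from the block usage that du or the quota report shows.

```python
import os
import stat

def dir_size(path):
    """Total apparent size in bytes of regular files under `path` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total

def largest_subdirs(root, count=10):
    """Return up to `count` (path, bytes) pairs for the biggest direct subdirectories of `root`."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:count]
```

Calling `largest_subdirs(os.path.expanduser("~"))` lists the biggest top-level directories in your home, which are the first candidates to move to one of the large disk volumes.
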

Thanks.

-Ed

Rachel Mandelbaum

Dec 17, 2014, 7:39:22 PM
to Edward Walter, warp_a...@googlegroups.com
Hi all -

I have a few comments and suggestions:

1) Not all coma users are on this list, so this isn’t a great way to get to all the people who need to move data.  (I personally spoke to 2 people who are on coma but not this list just today.)  Is there an announcement list that we know reliably includes all coma users?  This is not the only time we might want to get to all users, so such a list could be helpful in future.

2) Do we have a document that clearly explains the different storage locations that are available?  I haven’t seen one, and I think that might be a part of the problem here.  Not everyone knows where to store large amounts of data, what places are backed up and what are not, etc.  (Actually, something explaining the different queues would be a good idea, too, and the overall cluster structure.)  I will say that as an advisor I try to tell my students/postdocs what the situation is, but it’s easy to forget a detail about queues or storage or whatever.  A standardized location with all that info that we can send all new coma users to would be very helpful.  If there is one, then I apologize for raising a false alarm!  

3) I think it’s a good idea to have a quota on /home.  This need not be anything very onerous (e.g., it could be 100GB or 200GB per user), but just enough to avoid accidentally dumping lots of data suddenly in /home and causing cluster-wide trouble.

- Rachel

-------------------------------

Rachel Mandelbaum



Matthew Walker

Dec 17, 2014, 8:09:32 PM
to Rachel Mandelbaum, Edward Walter, warp_a...@googlegroups.com
As today's worst hog--sorry everyone!--I agree about enforcing a quota on /home.  That would also help to motivate creating/maintaining/distributing suitable documentation, as maxed-out users will be forced (before crashing the machine) to find appropriate alternative storage.
Matt





Sukhdeep Singh

Dec 17, 2014, 8:55:33 PM
to Rachel Mandelbaum, Edward Walter, warp_a...@googlegroups.com
Hi All,

On Wed, Dec 17, 2014 at 7:39 PM, Rachel Mandelbaum <rman...@andrew.cmu.edu> wrote:
Hi all -

I have a few comments and suggestions:

1) Not all coma users are on this list, so this isn’t a great way to get to all the people who need to move data.  (I personally spoke to 2 people who are on coma but not this list just today.)  Is there an announcement list that we know reliably includes all coma users?  This is not the only time we might want to get to all users, so such a list could be helpful in future.

I suggest adding the cmucosmo account to the Google group, since it has wider reach (I already sent a request). Though I can imagine that cmucosmo may be too wide.
 

2) Do we have a document that clearly explains the different storage locations that are available?  I haven’t seen one, and I think that might be a part of the problem here.  Not everyone knows where to store large amounts of data, what places are backed up and what are not, etc.  (Actually, something explaining the different queues would be a good idea, too, and the overall cluster structure.)  I will say that as an advisor I try to tell my students/postdocs what the situation is, but it’s easy to forget a detail about queues or storage or whatever.  A standardized location with all that info that we can send all new coma users to would be very helpful.  If there is one, then I apologize for raising a false alarm!  

I would really like to have access to some good documentation because my information is quite limited. To get things going, I created a Google doc on cmucosmo and shared it with the group. We can all edit it and come up with a good solution. The mail probably didn't go through due to privacy settings.

Sukhdeep

shirley ho

Dec 18, 2014, 1:15:49 AM
to Rachel Mandelbaum, Edward Walter, warp_a...@googlegroups.com
I created this list a while ago, hoping that it would organically evolve to include all users (that somehow people would tell the others to get on the list).

@Ed, do you think it is possible for you to tell new users to join the list as they come in (or request it on their behalf)?
I can give away my approval power quite easily, but so far very few people have ever requested to join the list.


3) I think it’s a good idea to have a quota on /home.  This need not be anything very onerous (e.g., it could be 100GB or 200GB per user), but just enough to avoid accidentally dumping lots of data suddenly in /home and causing cluster-wide trouble.

Yes, absolutely.

Shirley

Rachel Mandelbaum

Dec 18, 2014, 8:41:55 AM
to shirley ho, Edward Walter, warp_a...@googlegroups.com
Hi Shirley,

On Dec 18, 2014, at 1:15 AM, shirley ho <shirley...@gmail.com> wrote:

I created this list a while ago, hoping that it will organically evolve to include all users (somehow people will tell the others to get on the list). 

@Ed, do you think it is possible for you to tell the new users as they come in to join the list ? (or request it on their behalf?) 
I can give away my approval power quite easily, but so far very few people have ever requested to join the list. 

Yes, it would be useful if the welcome message people receive when they get a new Coma account says something about this list.  Would that be possible?



3) I think it’s a good idea to have a quota on /home.  This need not be anything very onerous (e.g., it could be 100GB or 200GB per user), but just enough to avoid accidentally dumping lots of data suddenly in /home and causing cluster-wide trouble.

Yes. absolutely, 

Ed, is this something that can easily be implemented?

Sukhdeep - thanks for starting that google doc.  

- Rachel

Melanie Simet

Dec 19, 2014, 3:48:41 PM
to [Warp-and-Coma]
I added some stuff to the Google doc, mostly about using a queuing system since I didn't know anything about that when I started.  Feel free to edit as needed.

Unfortunately, adding cmucosmo to this list won't really help get the word out: it's a single account, not a mailing list.  So unless somebody goes and forwards every email it gets to the mcwilliams or cosmocoffee lists, it'll just go to the single account that a few of us use...

Melanie

Edward Walter

Dec 19, 2014, 4:09:11 PM
to shirley ho, Rachel Mandelbaum, warp_a...@googlegroups.com
Hi Shirley,

I'm happy to point people to the existing list via a login message on
coma itself.

If you'd like us to manage a mailing list that includes all of the coma
and warp physics users, we'd be happy to do so using our existing
mailman server. Just let me know and we'll get it set up.

For quotas: at this point, everyone is below 300 GB utilization except
"anat", who's no longer active. I've archived his home directory to
nas-0-1 and will remove his data from /home/anat on Monday. Given that,
let's start out with a 300 GB quota (which I will also apply on Monday).

Thanks,

-Ed

ps. Here's the current utilization:

User ID      Used   Soft   Hard  Warn/Grace
anat         1.1T      0      0  00 [------]
shadaba    274.6G      0      0  00 [------]
zonggel    176.9G      0      0  00 [------]
sukhdees   129.3G      0      0  00 [------]
mmagana    121.7G      0      0  00 [------]
nbattia    105.5G      0      0  00 [------]
yfeng1     104.2G      0      0  00 [------]
tmudholk   103.4G      0      0  00 [------]
yenchic     90.7G      0      0  00 [------]
shirleyh    86.2G      0      0  00 [------]
hongyuz     79.7G      0      0  00 [------]
rcroft      74.3G      0      0  00 [------]
rcoconne    61.3G      0      0  00 [------]
root        39.9G      0      0  00 [------]
rmebane     32.8G      0      0  00 [------]
yingzu      22.2G      0      0  00 [------]
mozbek      20.7G      0      0  00 [------]
kosumi      20.1G      0      0  00 [------]
nishanta    16.9G      0      0  00 [------]
cmhicks     16.7G      0      0  00 [------]
rmandelb    15.3G      0      0  00 [------]
andreakl    12.6G      0      0  00 [------]
plaplant    11.7G      0      0  00 [------]
kkandasa     7.6G      0      0  00 [------]
ewalter      7.2G      0      0  00 [------]
sbird        4.9G      0      0  00 [------]
hytrac       4.1G      0      0  00 [------]
kmckeoug     3.6G      0      0  00 [------]
msimet       3.3G      0      0  00 [------]
mgwalker     1.3G      0      0  00 [------]
hungjinh     1.2G      0      0  00 [------]
siyuh        1.1G      0      0  00 [------]
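
The check against the proposed 300 GB quota can be sketched in Python using rows in the format of the report above. This is a rough illustration under stated assumptions: `parse_size` and `users_over` are hypothetical helpers, and the K/M/G/T suffixes are assumed to be binary multiples (powers of 1024).

```python
import re

# Assumed binary multipliers for the size suffixes that appear in the report.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Convert a size such as '274.6G' or '608K' to bytes; a bare number is taken as bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS.get(unit, 1)

def users_over(report, limit):
    """Return (user, used) pairs whose Used column exceeds `limit`."""
    limit_bytes = parse_size(limit)
    over = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows look like: user used soft hard warn [grace]; skip blanks and the header.
        if len(fields) < 2 or fields[0] == "User":
            continue
        user, used = fields[0], fields[1]
        if parse_size(used) > limit_bytes:
            over.append((user, used))
    return over

# Three rows copied from the report, plus its header line.
sample = """\
User ID Used Soft Hard Warn/Grace
anat 1.1T 0 0 00 [------]
shadaba 274.6G 0 0 00 [------]
zonggel 176.9G 0 0 00 [------]
"""
print(users_over(sample, "300G"))  # prints [('anat', '1.1T')]
```
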








shirley ho

Dec 19, 2014, 8:50:06 PM
to Edward Walter, Rachel Mandelbaum, warp_a...@googlegroups.com
Dear Ed, 

Actually, a proper mailing list would be perfect. Do you think you could also transfer everyone over from warp_and_coma if I send you a list of emails?

Shirley 

“My brother has his sword, King Robert has his warhammer and I have my mind...and a mind needs books as a sword needs a whetstone if it is to keep its edge. That's why I read so much, Jon Snow.”
― George R.R. Martin, A Game of Thrones