combinat is not responding...

Florent Hivert

Oct 3, 2012, 6:15:38 AM
to Sage Devel
Hi there,

The combinat machine seems to respond to ping but answers neither on the web
nor over ssh. Could someone local investigate and maybe restart it?
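
For anyone seeing the same symptom, here is a minimal sketch of how to confirm that a host answers ping while its TCP services do not. It only tries to open a TCP connection to each port. The function names and the choice of ports 22 (ssh) and 80 (web) are assumptions made for illustration, not something taken from the combinat setup.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_services(host, ports=(22, 80), timeout=3.0):
    """Map each port to whether a TCP connection could be opened."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling probe_services("combinat.math.washington.edu") returns a dict keyed by port. If every value is False while ping still works, a firewall dropping TCP traffic is one plausible cause, though not the only one.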

Thanks,

Florent

Dima Pasechnik

Oct 3, 2012, 6:41:39 AM
to sage-...@googlegroups.com

William Stein

Oct 3, 2012, 9:52:38 AM
to sage-...@googlegroups.com


On Wednesday, October 3, 2012, Dima Pasechnik <dim...@gmail.com> wrote:
> see
> https://groups.google.com/d/topic/sagemath-users/9gOocpRm4KE/discussion

My current hypothesis is that Ubuntu automatic security updates (or something else stupid) somehow activated overly aggressive firewall rules, since I ran into the same problem on four other servers I have running Ubuntu with automatic security updates (I think). Anyway, the machines are now way too secure! I'll find out soon enough though (I should have this fixed in about two hours).

If you are good at using Dell iDrac remote management with Linux, please email me...



>
> On Wednesday, 3 October 2012 18:15:47 UTC+8, fhivert wrote:
>>
>>      Hi there,
>>
>> The combinat machine seems to respond to ping but answers neither on the web
>> nor over ssh. Could someone local investigate and maybe restart it?
>>
>> Thanks,
>>
>> Florent
>

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

William Stein

Oct 3, 2012, 12:17:33 PM
to sage-...@googlegroups.com, sagemath-users, sagemath-admins
Hi,

combinat.math.washington.edu is now fixed. For some mysterious
reason the ufw firewall was active, with evidently *no* rules, and it
was blocking almost everything. I don't know 100% for certain why this
happened; however, I've just done:

apt-get remove unattended-upgrades ufw

so it is unlikely to happen again.
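
For machines where ufw stays installed, here is a rough sketch of how to detect the same state (firewall active, no rules) before it locks anyone out. With ufw active and no allow rules, its usual default of denying incoming connections would block ssh and web traffic. The parser assumes the usual shape of `ufw status` output: a "Status: active" line, then an optional rule table whose header is underlined with a row starting with "--". The function names are made up for this example.

```python
import subprocess


def summarize_ufw_status(text):
    """Return (active, rule_count) parsed from `ufw status` output text."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    active = any(ln.strip().lower() == "status: active" for ln in lines)
    rule_count = 0
    in_rules = False
    for ln in lines:
        if ln.startswith("--"):
            # Separator row under the "To / Action / From" header.
            in_rules = True
            continue
        if in_rules and ln.strip():
            rule_count += 1
    return active, rule_count


def locked_out_risk(text):
    """True when ufw is active but lists no rules at all."""
    active, rule_count = summarize_ufw_status(text)
    return active and rule_count == 0


def read_ufw_status():
    """Run `ufw status` (needs root and ufw installed) and return its output."""
    result = subprocess.run(["ufw", "status"], capture_output=True, text=True)
    return result.stdout
```

As root, locked_out_risk(read_ufw_status()) returning True would match what was seen here.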

Note that there was no downtime or interruption of anybody's jobs, and
this was not caused by over-use. (This is the only problem we have
ever had so far with combinat, by the way!)

root@combinat:/home/wstein# uptime
08:55:45 up 18 days, 12:58, 4 users, load average: 37.00, 37.01, 37.07

I will be scheduling a short downtime for combinat (about 2 minutes --
just the time to reboot once), since I have to reset the UPS it is
connected to in order to debug a UPS battery issue. That will
probably be in about a week, and there will be an announcement.

William

Florent Hivert

Oct 3, 2012, 3:33:15 PM
to sage-...@googlegroups.com
Hi,

> combinat.math.washington.edu is now fixed. For some mysterious
> reason the ufw firewall was active, with evidently *no* rules, and it
> was blocking almost everything. I don't know 100% for certain why this
> happened; however, I've just done:
>
> apt-get remove unattended-upgrades ufw
>
> so it is unlikely to happen again.

Thanks !!!

> Note that there was no downtime or interruption of anybody's jobs, and
> this was not caused by over-use. (This is the only problem we have
> ever had so far with combinat, by the way!)

A few weeks ago, due to a huge memory leak in code running in parallel on 32
cores, I had my computation killed by a failed memory allocation. combinat was
not very responsive for a couple of minutes, but I had the impression that
nothing except my computations suffered from it, so I didn't mention it.
Does someone know whether there is a way to find out if there were other
consequences?
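
One way to look for other consequences, sketched under assumptions: if the kernel's OOM killer stepped in, it normally logs a line containing "Out of memory: Kill process PID (name)" (or "Killed process" on newer kernels), so scanning the kernel log for that pattern shows which processes it killed. If the allocation simply failed inside my own process, the kernel log would show nothing, and this check proves little. The pattern and the function name are illustrative, not an exact specification of the log format.

```python
import re

# Assumed shape of the kernel's OOM-killer message; wording varies by kernel version.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for OOM-killer lines found in the log."""
    hits = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits
```

The input could be the lines of the dmesg output or of a saved kernel log file. An empty result only means the OOM killer left no matching trace, not that nothing else was affected.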

Cheers,

Florent

William Stein

Oct 3, 2012, 4:24:13 PM
to sage-...@googlegroups.com
I wouldn't worry about it at all.

-- William

>
> Cheers,