Re: [sage-devel] why Sage 4.3.4 is broken on Fedora and openSUSE


Dr. David Kirkby

Mar 21, 2010, 8:22:15 AM
to sage-...@googlegroups.com
Minh Nguyen wrote:
> Hi folks,
>
> On Sun, Mar 21, 2010 at 9:10 PM, Florent Hivert
> <florent...@univ-rouen.fr> wrote:
>
> <SNIP>
>
>> I don't know what to do and I've no more time to investigate (I already spent
>> half of my week-end on it together with the rebase of sage-combinat
>> queue). Also I must confess I'm far from being an expert on those issues. So
>> maybe, if there is one in the sage project, it's a good time to step up and
>> fix this quickly.
>>
>> If this doesn't happen, here is my feeling:
>> <disclaimer>
>> This is by no means an attack against the release managers. Many thanks to
>> them for their hard work, but shit happens. And we have to find ways to prevent
>> them and to solve them quickly.
>> </disclaimer>
>>
>> I think that we can't seriously make a release which is broken on half the
>> distros. Shouldn't we revert R to its previous version, remove gd and make
>> as soon as possible a fix release until this is sorted out ?
>
> I think I owe everyone an explanation why the Sage 4.3.4 release is
> broken for Fedora and openSUSE. Before I announce a release, whether
> it be an alpha release or an rc, I would build and long doctest it on
> the following machines:
>
> * sage.math: Ubuntu 8.04.4 LTS, 24 cores, Intel(R) Xeon(R) CPU X7460 @
> 2.66GHz, 132 GB RAM, GCC 4.2.4
>
> * bsd.math: Mac OS X 10.6.2, 4 cores, Intel Xeon @ 2.66 GHz, 8 GB RAM, GCC 4.2.1

I think the strategy for making releases is wrong. At the moment, it basically
works like this:

* A release candidate is created, and time is given for people to shout if it
does not work. If there are no reports of failures, or the reports are missed,
the release goes ahead. Often the time to shout is < 1 week.

A better solution would be:

1) Make the final release candidate.

2) Don't make the release until there is confirmation of a successful build, and
all doctests passing on all supported platforms.

Perhaps have a wiki page, with spaces for each successful build. When, and only
when, that shows a successful build+doctests pass on every supported platform,
is the release made.
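
The gate described above could be sketched in shell roughly as follows. The one-file-per-platform layout, the PASS marker and the function name are all assumptions made for illustration; nothing in this thread says such a script exists in Sage.

```shell
#!/bin/sh
# Hypothetical release gate: succeed only when every listed platform
# has reported a successful build and doctest run.
# Assumed layout: one file per platform in a reports directory,
# containing the single word PASS on success.

release_ready() {
    # $1 = reports directory, remaining arguments = platform names
    reports_dir=$1
    shift
    for platform in "$@"; do
        report="$reports_dir/$platform"
        if [ ! -f "$report" ]; then
            echo "no report yet for $platform" >&2
            return 1
        fi
        if [ "$(cat "$report")" != "PASS" ]; then
            echo "$platform did not pass" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as `release_ready ./reports ubuntu fedora opensuse solaris cygwin && echo "OK to release"` then refuses to proceed while any platform is missing or failing, rather than proceeding by default when nobody complains.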

If you ever watch F1 racing, you will know that at pitstops a flag is held in
front of the F1 car until the flagman (for want of a better expression) gets
confirmation that the tyres are changed and/or the car is refueled. Only then is
the driver able to drive away. F1 teams do *not* use a method where the driver
waits 20 seconds, then goes unless he gets a message that there is a problem -
it would be dangerous.

It seems we have a fundamentally flawed strategy in place.

I'm a bit busy now, but the simplest solution might be to put near the top of
the iconv package

if [ "x$UNAME" != xSunOS ] && [ "x$UNAME" != xCYGWIN ]; then
    exit 0
fi

This avoids iconv being built on anything except Solaris or Cygwin, which are
the two platforms it is known to be needed on. But there may be others: some of
the cut-down Linux versions may not have iconv, or may not have one sufficiently
powerful to build R.

I don't have time to do that until around 2100 GMT today. (9 hours from now).

It would be worth checking if there are any updates available to 'gd', as the
failure of 'gd' to honor one of its own flags (something like
'--with-iconv-path=') must, I believe, be a bug in gd.

If someone does not beat me to it, I'll get an untested hack out today by around
2100 GMT.

Had R been tested on Solaris before the package was updated, and had the
warnings issued by the configure script not been ignored, none of this mess
would have happened.

Dave

Dr. David Kirkby

Mar 21, 2010, 10:28:26 AM
to sage-...@googlegroups.com
Minh Nguyen wrote:
> Hi folks,

<snip>

> From my experience with machines on Skynet, building
> and doctesting Sage on some of the Linux machines could actually crash
> the machine. Most times for me, such a crash would result in bringing
> a machine down, or even bringing down the primary network node that is
> the gateway to the other machines on the network.

If this happens, something must be seriously screwed up on the skynet network.
There is no way a normal user should be able to crash one machine, let alone do
even more damage.

I've tried using skynet for Solaris, but the machines are not very fast, and
heavily loaded sometimes.

Dave

William Stein

Mar 21, 2010, 2:00:02 PM
to sage-devel
2010/3/21 Dr. David Kirkby <david....@onetel.net>:

> Minh Nguyen wrote:
>>
>> Hi folks,
>
> <snip>
>
>> From my experience with machines on Skynet, building
>> and doctesting Sage on some of the Linux machines could actually crash
>> the machine. Most times for me, such a crash would result in bringing
>> a machine down, or even bringing down the primary network node that is
>> the gateway to the other machines on the network.
>
> If this happens, something must be seriously screwed up on the skynet network. There is no way a normal user should be able to crash one machine, let alone do even more damage.
>

The login node -- eno -- was regularly crashing when pushed hard. The
sysadmin suspected faulty hardware and very recently replaced the
computer with a new one. Hopefully this will fix the problem.

> I've tried using skynet for Solaris, but the machines are not very fast, and heavily loaded sometimes.
>

You have to type

touch /tmp/`hostname`0 /tmp/`hostname`1 /tmp/`hostname`2 \
      /tmp/`hostname`3 /tmp/`hostname`4

to temporarily disable the ECM jobs running on a given node on skynet,
which use spare cycles. (They will stop after a certain amount of
time.)

William

Florent Hivert

Mar 21, 2010, 5:50:43 PM
to sage-...@googlegroups.com
Hi There,

Thanks to David, issue #8567 "Change iconv so it builds on Cygwin and
Solaris only" seems to have a fix, but before giving it a positive review, I
think it should be tested on Cygwin...

> I think I owe everyone an explanation why the Sage 4.3.4 release is
> broken for Fedora and openSUSE. Before I announce a release, whether
> it be an alpha release or an rc, I would build and long doctest it on
> the following machines:
>
> * sage.math: Ubuntu 8.04.4 LTS, 24 cores, Intel(R) Xeon(R) CPU X7460 @
> 2.66GHz, 132 GB RAM, GCC 4.2.4
>
> * bsd.math: Mac OS X 10.6.2, 4 cores, Intel Xeon @ 2.66 GHz, 8 GB RAM, GCC 4.2.1
>

> * winxp1 (Cygwin virtual machine on boxen.math): CYGWIN_NT-5.1, 1
> core, Intel(R) Xeon(R) CPU X7460 @ 2.66GHz, 2 GB RAM, GCC 4.3.4

If anyone has access to this machine (or any Cygwin), can you test that
sage-4.3.4 with iconv-1.13.1.spkg replaced by
http://sage.math.washington.edu/home/kirkby/iconv/iconv-1.13.1.p0.spkg
builds correctly and, if so, give the ticket a positive review? If it's ok,
what about re-releasing Sage with this fix after that?

Cheers,

Florent

Mike Hansen

Mar 21, 2010, 8:06:35 PM
to sage-...@googlegroups.com
On Sun, Mar 21, 2010 at 2:50 PM, Florent Hivert
<florent...@univ-rouen.fr> wrote:
> If anyone have access to this machine (or any Cygwin), can you test that
> sage-4.3.4 with iconv-1.13.1.spkg replaced by
>     http://sage.math.washington.edu/home/kirkby/iconv/iconv-1.13.1.p0.spkg
> correctly builds and set up a positive review. If it's ok, what about
> re-releasing sage with this fix after that.

It builds fine on Cygwin.

--Mike

David Kirkby

Mar 21, 2010, 8:30:44 PM
to sage-...@googlegroups.com

To my knowledge, a lack of iconv has only ever been a problem on Cygwin
and Solaris. The patch

http://trac.sagemath.org/sage_trac/ticket/8567

builds iconv on Cygwin and Solaris, but nothing else. It should allow
a 4.3.5 to be made, which avoids the issue seen on some Linux
distributions. William was keen on a 4.3.5 to fix the issue on Linux.

Dave

Dima Pasechnik

Mar 23, 2010, 12:16:51 AM
to sage-devel, sage-w...@googlegroups.com
I don't see why libiconv is needed on Cygwin.
One can install libiconv on Cygwin system-wide, using the Cygwin
installer.


On Mar 22, 5:50 am, Florent Hivert <florent.hiv...@univ-rouen.fr>
wrote:

Dr. David Kirkby

Mar 23, 2010, 2:40:32 AM
to sage-...@googlegroups.com
Dima Pasechnik wrote:
> I don't see why libiconv is needed on Cygwin.
> One can install libiconv on Cygwin system-wide, using the Cygwin
> installer.

I expect you can install iconv using the installer, but iconv is not installed
by default, which goes against Sage's philosophy of including all the
dependencies. A trac ticket was raised some time ago, which says iconv was
needed on Cygwin - see:

http://trac.sagemath.org/sage_trac/ticket/7319

I can't understand why having two versions of iconv causes problems on some
systems, but not on others. I would have expected that the first one in
LD_LIBRARY_PATH would have been used, and any others ignored. But that does not
necessarily seem to be the case.
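
One way to see which copies are candidates is to list every match along the search path in order. The sketch below is only an illustration: the function name is made up, and it looks at a colon-separated path alone, whereas a dynamic linker also consults other locations (for example paths embedded in the binary and the system default directories), so the first entry printed is not guaranteed to be the library actually loaded.

```shell
#!/bin/sh
# List, in search order, every file matching a pattern in the
# directories of a colon-separated path such as $LD_LIBRARY_PATH.
# This inspects the path only; it does not ask the dynamic linker.

find_in_path() {
    # $1 = colon-separated path, $2 = file name pattern, e.g. 'libiconv.so*'
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
        # Skip empty path components
        [ -n "$dir" ] || continue
        for f in "$dir"/$2; do
            # An unmatched pattern stays literal, so check it exists
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    done
}
```

For example, `find_in_path "$LD_LIBRARY_PATH" 'libiconv.so*'` prints each copy found, with the one from the earliest directory first.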

Dave

Dima Pasechnik

Mar 23, 2010, 5:11:13 AM
to sage-devel, sage-w...@googlegroups.com
A self-contained Sage package for Windows/Cygwin would anyway include
much more stuff than such a package for Unix...

If on the other hand one goes for having Sage installed on a working
installation of Cygwin (which is a much more realistic option),
I see no harm in requiring system-supplied libiconv.
The "default" installation of Cygwin is very easily modifiable, and
not very well-defined anyway. I am sure it is easy to supply a
configuration for the Cygwin package manager to tell it to download
what is needed.

Dima


On Mar 23, 2:40 pm, "Dr. David Kirkby" <david.kir...@onetel.net>
wrote:

Dr David Kirkby

Mar 23, 2010, 6:29:47 AM
to sage-devel

On Mar 23, 9:11 am, Dima Pasechnik <dimp...@gmail.com> wrote:
> a self-contained Sage package for Windows/Cygwin would anyway include
> much more stuff than such a package for Unix...
>
> If on the other hand one goes for having Sage installed on a working
> installation of Cygwin (which is much more realistic option),
> I see no harm in requiring system-supplied libiconv.

But the same could be said for several libraries that are included in
Sage. It's because Sage ships so much that it is not so easy
to get it into Debian, where they object to the fact there are many
'standard' packages in Sage. Sage ships the much more common bzip2
library, for example.

Whether this is a good/bad thing has been argued many times.

The iconv package at

http://trac.sagemath.org/sage_trac/ticket/8567

could easily be modified to build / not-build on specific platforms.

Dave


The iconv package is needed on Solaris, as the iconv shipped with Solaris
is not sufficiently powerful to build R.

Dima Pasechnik

Mar 27, 2010, 8:42:16 AM
to sage-devel, Mariah Lenox
On Mar 22, 2:00 am, William Stein <wst...@gmail.com> wrote:
[...]
>
> > I've tried using skynet for Solaris, but the machines are not very fast, and heavily loaded sometimes.
>
> You have to type
>
>  touch /tmp/`hostname`0 /tmp/`hostname`1 /tmp/`hostname`2
> /tmp/`hostname`3 /tmp/`hostname`4

I tried this on mark, and got:

bash-3.00$ touch /tmp/`hostname`
touch: cannot change times on /tmp/mark

indeed:
bash-3.00$ ls -l /tmp/`hostname`*
-rw-r--r-- 1 wbhart sage 0 Oct 15 02:43 /tmp/mark
-rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark0
-rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark1
-rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark2
-rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark3
-rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark4
so this does not seem to be working as advertised - one has to be
root, or the users wbhart and wstein should do appropriate chowns...

Dima

Dr. David Kirkby

Mar 27, 2010, 6:13:55 PM
to sage-...@googlegroups.com
Dima Pasechnik wrote:
> On Mar 22, 2:00 am, William Stein <wst...@gmail.com> wrote:
> [...]
>>> I've tried using skynet for Solaris, but the machines are not very fast, and heavily loaded sometimes.
>> You have to type
>>
>> touch /tmp/`hostname`0 /tmp/`hostname`1 /tmp/`hostname`2
>> /tmp/`hostname`3 /tmp/`hostname`4
>
> I tried this on mark, and got:
>
> bash-3.00$ touch /tmp/`hostname`
> touch: cannot change times on /tmp/mark
>
> indeed:
> bash-3.00$ ls -l /tmp/`hostname`*
> -rw-r--r-- 1 wbhart sage 0 Oct 15 02:43 /tmp/mark
> -rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark0
> -rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark1
> -rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark2
> -rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark3
> -rw-r--r-- 1 wstein sage 0 Mar 20 18:56 /tmp/mark4
> -
> so this does not seem to be working as advertised - one has to be
> root, or the users wbhart and wstein
> should do appropriate chowns...
>
> Dima

Looking at those permissions, I can understand why it does not work.

Generally it is better to run background jobs at a low scheduling priority, so
they get very little CPU time when higher priority tasks are running.
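
As a minimal sketch of that approach, the standard `nice` utility can start a job with a raised niceness value. The wrapper name below is made up for illustration, and 19 is simply a large increment; the exact range of niceness values varies between systems.

```shell
#!/bin/sh
# Run a command with its niceness raised by 19, which on typical
# systems gives it the lowest scheduling priority a normal user can set.
# The function name is hypothetical.

run_low_priority() {
    nice -n 19 "$@"
}
```

For example, `run_low_priority ./long_running_job` leaves interactive builds and doctests free to take the CPU whenever they need it, without anyone having to create or remove marker files.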

Dave

Mariah

Mar 30, 2010, 11:18:02 AM
to sage-devel
Dima, William,

On Skynet/mark, the reason why Dima could not touch /tmp/mark was that
William had already touched (written) that file and so owned the file,
and only William had write permission for his file.

While William's suggestion of

> touch /tmp/`hostname`0 /tmp/`hostname`1 /tmp/`hostname`2
> /tmp/`hostname`3 /tmp/`hostname`4

will put Skynet background jobs to sleep on most Skynet machines, it
does not work on all (in particular on mark). For a list of the correct
files to touch to put background jobs to sleep, see
/usr/local/README.background_jobs on Skynet.

All the background jobs on Skynet machines run at the lowest possible
user priority, so other user jobs should get the majority of
CPU time. Only if you really want to turn off the background jobs
should you touch the appropriate file. Each background job
periodically looks for its "sleep" file and puts itself to sleep as
long as the sleep file exists.

If you put a background job to sleep, please remember to remove your
sleep file when you are finished. (Otherwise a periodic
cron job will remove the sleep files.)
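
A small wrapper can make the clean-up automatic. This is only a sketch: the function name is made up, the file names follow the /tmp/<hostname><n> pattern quoted earlier in the thread (which is not correct on every Skynet machine), and `uname -n` is used here as the POSIX way to obtain the host name.

```shell
#!/bin/sh
# Hypothetical wrapper: create sleep files, run a command, then remove
# the sleep files again whether or not the command succeeded.

with_sleep_files() {
    # $1 = directory for the sleep files, remaining arguments = command
    sleep_dir=$1
    shift
    host=$(uname -n)
    for n in 0 1 2 3 4; do
        touch "$sleep_dir/$host$n"
    done
    # Run the real work and remember how it exited
    "$@"
    status=$?
    for n in 0 1 2 3 4; do
        rm -f "$sleep_dir/$host$n"
    done
    return "$status"
}
```

Something like `with_sleep_files /tmp make` would then leave no stale sleep files behind after the build, and still reports the build's own exit status.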

Currently it seems that the background jobs on mark look for the sleep
file every 40 minutes or so. (The time is a function of the inputs to
the background job.) If this is unacceptable, let me know and
I will change the inputs.
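
For readers unfamiliar with the mechanism, the job side of such a check could look roughly like the sketch below. How the real Skynet jobs implement it is not stated in this thread; the function name and the explicit polling interval are assumptions.

```shell
#!/bin/sh
# Hypothetical job-side check: block for as long as the sleep file
# exists, looking again every $2 seconds.

wait_while_asleep() {
    sleep_file=$1
    interval=$2
    while [ -e "$sleep_file" ]; do
        sleep "$interval"
    done
}
```

With a 40 minute interval the call would be `wait_while_asleep "/tmp/$(uname -n)0" 2400`, which also shows why a newly created sleep file can take that long to have any effect: the job only notices it at its next check.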

The primary purpose of the Skynet machines is for Sage research.
However I have a responsibility to the people who paid for these
machines (the US taxpayers) to not have the machines sitting around
idle. I hope you understand.

Mariah
