New guidelines for spkg's

Jeroen Demeyer

May 5, 2011, 10:06:35 AM
to sage-devel
Hi all,

I recently adapted the merge scripts to deal with spkg's in a new way.

Most importantly, changes inside a spkg are automatically *committed*
before merging the spkg into Sage (the spkg is extracted, hg commit is
done using a commit message coming from SPKG.txt, an hg tag is added and
the spkg is repacked). I hope this will make authoring and reviewing
spkg's slightly easier.

This implies that a merged spkg is no longer byte-for-byte identical to
the spkg made by a ticket author.

The new script also adds several sanity checks for a spkg:
1) Inside the spkg, there must be a top-level directory whose name is
the same as the spkg, but with the extension ".spkg" removed.

2) SPKG.txt must contain a line of the form
=== cliquer-1.2.p9 (Jeroen Demeyer, 4 May 2011) ===
(more precisely, it must match /^==* ${spkg_name_and_version} /)

3) There must also be such a line for the previous spkg version (e.g.
any future numpy spkg must mention "numpy-1.5.1" in its SPKG.txt, which
is the version currently in Sage). This is to ensure that a spkg is
based on the most recent version.

4) SPKG.txt and spkg-install must be under hg control.
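
Checks 2) and 3) can be sketched in Python as follows. This is only an illustration of the pattern quoted above, not the merger script's actual code; the function names are hypothetical, and unlike the raw pattern the version string is escaped so that its dots match literally.

```python
import re


def has_changelog_header(spkg_txt, name_and_version):
    """Return True if SPKG.txt has a header line for the given name-version.

    Mirrors the pattern /^==* ${spkg_name_and_version} / from the post;
    the trailing space keeps "p1" from matching "p10".
    """
    pattern = re.compile(
        r"^==* " + re.escape(name_and_version) + r" ", re.MULTILINE
    )
    return pattern.search(spkg_txt) is not None


def check_spkg_txt(spkg_txt, new_version, previous_version):
    """Collect error messages for checks 2) and 3); empty list means both pass."""
    errors = []
    if not has_changelog_header(spkg_txt, new_version):
        errors.append("no changelog header for " + new_version)
    if not has_changelog_header(spkg_txt, previous_version):
        errors.append(
            "no changelog header for previous version " + previous_version
        )
    return errors
```

Here `check_spkg_txt(text, "cliquer-1.2.p9", "cliquer-1.2.p8")` would return an empty list only if both header lines are present in `text`.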


Further ideas, suggestions, complaints are welcome.
Jeroen.

Jason Grout

May 5, 2011, 10:19:09 AM
to sage-...@googlegroups.com


I agree with the sentiment expressed elsewhere in previous threads that
the changelog should be in the hg log, and not necessarily in the
SPKG.txt file. In other words, I feel like the changes you made should
be reversed---the hg log messages should be insisted on, and the
changelog inside the SPKG.txt should be generated from the hg log. But
it doesn't matter enough to me to change what you've done.

Thanks,

Jason


kcrisman

May 5, 2011, 11:47:09 AM
to sage-devel

> > Further ideas, suggestions, complaints are welcome.
>
> I agree with the sentiment expressed elsewhere in previous threads that
> the changelog should be in the hg log, and not necessarily in the
> SPKG.txt file.  In other words, I feel like the changes you made should
> be reversed---the hg log messages should be insisted on, and the
> changelog inside the SPKG.txt should be generated from the hg log.  But
> it doesn't matter enough to me to change what you've done.

I think having the SPKG.txt is a very nice way for someone to get a
quick overview of this. For instance, they could be attached to the
main Sage page so that people can see how we've been changing/updating
to upstream. hg logs would be different. But as Jason says, this has
been hashed elsewhere.

I do agree that getting SPKG.txt to be automatically generated from
changelogs would be a nice way to get better changelogs, so maybe I am
agreeing with Jason?

- kcrisman

Robert Bradshaw

May 5, 2011, 11:57:55 AM
to sage-...@googlegroups.com
On Thu, May 5, 2011 at 8:47 AM, kcrisman <kcri...@gmail.com> wrote:
>
>> > Further ideas, suggestions, complaints are welcome.
>>
>> I agree with the sentiment expressed elsewhere in previous threads that
>> the changelog should be in the hg log, and not necessarily in the
>> SPKG.txt file.  In other words, I feel like the changes you made should
>> be reversed---the hg log messages should be insisted on, and the
>> changelog inside the SPKG.txt should be generated from the hg log.  But
>> it doesn't matter enough to me to change what you've done.
>
> I think having the SPKG.txt is a very nice way for someone to get a
> quick overview of this.  For instance, they could be attached to the
> main Sage page so that people can see how we've been changing/updating
> to upstream.  hg logs would be different.  But as Jason says, this has
> been hashed elsewhere.

With no resolution, so thanks for suggesting a solution.

> I do agree that getting SPKG.txt to be automatically generated from
> changelogs would be a nice way to get better changelogs, so maybe I am
> agreeing with Jason?

+1, I find it more natural to work with hg.

To support both workflows, another option is to support going both
ways--if there are uncommitted changes, make an hg entry based on the
spkg.txt, otherwise, update spkg.txt based on the changelog entry +
spkg filename.
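
A rough sketch of this two-way idea in Python. The header format is the one from Jeroen's example; the " * " bullet style, the function names and the direction labels are assumptions made for illustration only.

```python
import os


def sync_direction(has_uncommitted_changes):
    """Pick which way to synchronise, following the rule described above."""
    if has_uncommitted_changes:
        return "commit using SPKG.txt entry"
    return "update SPKG.txt from hg log"


def spkg_entry_from_log(spkg_filename, author, date, message):
    """Build an SPKG.txt entry from an hg log message and the spkg filename."""
    name_and_version = os.path.basename(spkg_filename)
    if name_and_version.endswith(".spkg"):
        name_and_version = name_and_version[: -len(".spkg")]
    lines = ["=== %s (%s, %s) ===" % (name_and_version, author, date)]
    # One bullet per non-empty line of the commit message (assumed style).
    for line in message.splitlines():
        if line.strip():
            lines.append(" * " + line.strip())
    return "\n".join(lines) + "\n"
```

How the log message, author and date would actually be read from hg is left out here.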

- Robert

Dr. David Kirkby

May 5, 2011, 2:20:52 PM
to sage-...@googlegroups.com
On 05/ 5/11 04:47 PM, kcrisman wrote:
>
>>> Further ideas, suggestions, complaints are welcome.
>>
>> I agree with the sentiment expressed elsewhere in previous threads that
>> the changelog should be in the hg log, and not necessarily in the
>> SPKG.txt file. In other words, I feel like the changes you made should
>> be reversed---the hg log messages should be insisted on, and the
>> changelog inside the SPKG.txt should be generated from the hg log. But
>> it doesn't matter enough to me to change what you've done.
>
> I think having the SPKG.txt is a very nice way for someone to get a
> quick overview of this. For instance, they could be attached to the
> main Sage page so that people can see how we've been changing/updating
> to upstream. hg logs would be different. But as Jason says, this has
> been hashed elsewhere.

I like SPKG.txt. Personally I would have called the file "ChangeLog", in common
with just about every other software project, but SPKG.txt does the job. I think it
summarises the changes much better than "hg log" does, where there are often
numerous changes made while a ticket gets reviewed.

> I do agree that getting SPKG.txt to be automatically generated from
> changelogs would be a nice way to get better changelogs, so maybe I am
> agreeing with Jason?
>
> - kcrisman

Automatic generation would be more accurate and more detailed. I don't feel
however it would be better. I suspect it will have a lot of details that one
does not see (or want), when trying to get a quick overview of what has happened
to a package.

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Jeroen Demeyer

May 5, 2011, 4:29:13 PM
to sage-...@googlegroups.com
On 2011-05-05 20:20, Dr. David Kirkby wrote:
> I like SPKG.txt. Personally I would have called the file "ChangeLog", in
> common with just about every other software project, but SPKG.txt does the job.
> I think it summarises the changes much better than "hg log" does, where
> there are often numerous changes made while a ticket gets reviewed.

I agree with David on this, but maybe that is partially because I'm not
very fluent with hg. My personal spkg workflow is NOT to commit changes
until at the very last moment, such that "hg diff" always gives the diff
against the last version. So I modeled the merger script to make this
workflow easier (by not having to do the last step of committing the
changes).

Jeroen.

Jason Grout

May 5, 2011, 4:34:27 PM
to sage-...@googlegroups.com

I guess this suits my workflow too--I would just make extra commits in
between versions. So for me:

1. make all the changes, committing as I go like I would normally do.

2. Make an entry in SPKG.txt which summarizes these changes, as sort of
a changelog for the version bump.

3. Upload the spkg so that Jeroen's script makes one more commit which,
in effect, tags the version number and commits a summary changelog in
SPKG.txt.

That sounds perfect! The details will still be in the hg log from my
commits as I go, and a high-level summary is in the SPKG.txt and
committed as one last commit to the repository.

Jason


Robert Bradshaw

May 5, 2011, 4:52:51 PM
to sage-...@googlegroups.com

I suppose for my spkg workflow (mostly Cython) the new spkg doesn't
usually involve anything more than swapping out the sources and
perhaps adding/removing a patch. Adding an SPKG.txt entry is entirely
redundant with the hg commit (if one is even needed).

- Robert

Dr. David Kirkby

May 5, 2011, 5:42:57 PM
to sage-...@googlegroups.com
On 05/ 5/11 03:06 PM, Jeroen Demeyer wrote:
> Hi all,
>
> I recently adapted the merge scripts to deal with spkg's in a new way.

> Further ideas, suggestions, complaints are welcome.
> Jeroen.
>

I've often wondered if it would be possible to safely remove the write
permissions from the "src" directory and everything below it, so files can't be
accidentally changed.

I believe that would reduce the chances of the "src" being corrupted.

However, there may be the odd package which would fail to build if this was
done, in which case the default permissions should be used.

Francois Bissey

May 5, 2011, 6:02:37 PM
to sage-...@googlegroups.com

I would add on top of that for consideration that SPKG.txt often contains
more info than what you would find in a normal changelog. It often has
special instructions about the package, it is much more info than just a
changelog.

Francois

Robert Bradshaw

May 5, 2011, 6:23:13 PM
to sage-...@googlegroups.com

Oh, I agree, that's what SPKG.txt was created for. That's why it
wasn't called ChangeLog to begin with, but now people have been using
it as the changelog.

- Robert

David Kirkby

May 5, 2011, 7:33:26 PM
to sage-...@googlegroups.com
On 5 May 2011 23:02, Francois Bissey <francoi...@canterbury.ac.nz> wrote:

>> I suppose for my spkg workflow (mostly Cython) the new spkg doesn't
>> usually involve anything more than swapping out the sources and
>> perhaps adding/removing a patch. Adding an SPKG.txt entry is entirely
>> redundant with the hg commit (if one is even needed).
>>
> I would add on top of that for consideration that SPKG.txt often contains
> more info than what you would find in a normal changelog. It often has
> special instructions about the package, it is much more info than just a
> changelog.
>
> Francois

Good point.

I know people criticize it, but as someone who has worked with .spkg
files a lot, I find it useful.

Dave

Jason Grout

May 5, 2011, 9:47:57 PM
to sage-...@googlegroups.com
On 5/5/11 5:02 PM, Francois Bissey wrote:
> I would add on top of that for consideration that SPKG.txt often contains
> more info than what you would find in a normal changelog. It often has
> special instructions about the package, it is much more info than just a
> changelog.

Yes, and +1 for keeping the other valuable information in SPKG.txt
updated and useful.

Jason


Jason Grout

May 5, 2011, 9:48:56 PM
to sage-...@googlegroups.com
On 5/5/11 4:42 PM, Dr. David Kirkby wrote:
> On 05/ 5/11 03:06 PM, Jeroen Demeyer wrote:
>> Hi all,
>>
>> I recently adapted the merge scripts to deal with spkg's in a new way.
>
>> Further ideas, suggestions, complaints are welcome.
>> Jeroen.
>>
>
> I've often wondered if it would be possible to safely remove the write
> permissions from the "src" directory and everything below it, so files
> can't be accidentally changed.
>
> I believe that would reduce the chances of the "src" being corrupted.
>
> However, there may be the odd package which would fail to build if this
> was done, in which case the default permissions should be used.
>

I think most packages build inside of that src directory, right?
Changing permissions would mess that up.

Jason

Keshav Kini

May 6, 2011, 3:12:05 AM
to sage-...@googlegroups.com
Hi Jeroen,

I'd suggest that SPKG authors make their final commit themselves rather than just allowing Jeroen's script to do it for them - this keeps the "blame history" intact (assuming Jeroen's script doesn't mine people's names from SPKG.txt and commit under those names!).

I sometimes worry we're not really using Mercurial for anything. http://hg.sagemath.org/ paints a very unrealistic picture of the development of Sage, for example, in comparison to how a real open source project's code repository should look - take for example matplotlib ( http://github.org/matplotlib ), ipython ( http://github.com/ipython ), Octave ( http://hg.savannah.gnu.org/hgweb/octave ), etc. To be frank, when even you, who are in ultimate charge of our source control, say you are not very fluent with the source control mechanism we use, it must mean we are all kind of stumbling around... Are we just using hg as a convenient way to generate patches and nothing more? I'm in no way an expert on Mercurial or on software development practices but these things do worry me.

-Keshav

----
Join us in #sagemath on irc.freenode.net !

Jeroen Demeyer

May 6, 2011, 5:08:44 AM
to sage-...@googlegroups.com
On 2011-05-05 23:42, Dr. David Kirkby wrote:
> I've often wondered if it would be possible to safely remove the write
> permissions from the "src" directory and everything below it, so files
> can't be accidentally changed.
>
> I believe that would reduce the chances of the "src" being corrupted.
Just wondering, is this an actual problem? Does *accidental* corruption
of src/ happen sufficiently often that we need to do something about this?

I can imagine something going wrong when src/ is updated to a new
upstream version, but permissions are not going to help that situation.
At that point, src/ is writable.


Jeroen.

Dr. David Kirkby

May 6, 2011, 9:26:53 AM
to sage-...@googlegroups.com


There have been a number of packages in which changes to the contents under "src"
have been made on purpose by people not knowing what they were doing. I've lost
count of them.

Only a week or two ago (during the 4.7 release), there was a file which got
patched in "src" when "patch" was run from spkg-install. It was related to
building Python on some Linux version - I forget the ticket.

I suspect that with the increased use of "patch" and less use of "cp" when applying
patches, it will become easier to patch the upstream source by mistake.

So, if it were possible to protect against that, I think it would be a good idea.

Jeroen Demeyer

May 6, 2011, 11:26:29 AM
to sage-...@googlegroups.com
On 2011-05-06 15:26, Dr. David Kirkby wrote:
> So, if it were possible to protect against that, I think it would be a
> good idea.
One check could be done in the merger script:
If the new and old spkgs have the same upstream version (i.e. the
version numbers are the same except for the patch level), unpack them
both and check whether the contents of src/ are equal. Of course, this
assumes that the first spkg for a given upstream version has a correct src/.
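
A minimal Python sketch of such a check, assuming both spkgs have already been unpacked into directories and that the patch level is a trailing ".pN" on the name. The helper names are hypothetical; empty directories and file permissions are ignored here.

```python
import os
import re


def upstream_version(name_and_version):
    """Strip a trailing patch level such as '.p9' (assumed naming convention)."""
    return re.sub(r"\.p\d+$", "", name_and_version)


def list_files(root):
    """Return the sorted relative paths of all files below root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full, root))
    return sorted(paths)


def same_bytes(path_a, path_b):
    """Compare two files byte for byte (reads both fully into memory)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()


def same_tree(dir_a, dir_b):
    """True if both trees contain the same files with identical contents."""
    files = list_files(dir_a)
    if files != list_files(dir_b):
        return False
    return all(
        same_bytes(os.path.join(dir_a, p), os.path.join(dir_b, p)) for p in files
    )


def src_unchanged(old_dir, new_dir, old_name, new_name):
    """Return None if the upstream versions differ (nothing to check),
    otherwise whether src/ is identical in both unpacked spkgs."""
    if upstream_version(old_name) != upstream_version(new_name):
        return None
    return same_tree(os.path.join(old_dir, "src"), os.path.join(new_dir, "src"))
```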

Jeroen.

Robert Bradshaw

May 6, 2011, 1:52:13 PM
to sage-...@googlegroups.com

Using patch is more resistant to this, because it will refuse to apply
the same patch twice.

- Robert

Robert Bradshaw

May 6, 2011, 1:53:41 PM
to sage-...@googlegroups.com

Even better would be to checksum the source in a src.md5 file, and
have sage -spkg warn/error if the checksums don't match. Thus one
couldn't accidentally modify the src directory.

- Robert

Robert Bradshaw

May 6, 2011, 2:26:31 PM
to sage-...@googlegroups.com
On Fri, May 6, 2011 at 12:12 AM, Keshav Kini <kesha...@gmail.com> wrote:
> Hi Jeroen,
>
> I'd suggest that SPKG authors make their final commit themselves rather than
> just allowing Jeroen's script to do it for them - this keeps the "blame
> history" intact (assuming Jeroen's script doesn't mine people's names from
> SPKG.txt and commit under those names!).

I think it would use the user's .hgrc file.

> I sometimes worry we're not really using Mercurial for anything.

I agree.

> http://hg.sagemath.org/ paints a very unrealistic picture of the development
> of Sage, for example, in comparison to how a real open source project's code
> repository should look - take for example matplotlib (
> http://github.org/matplotlib ), ipython ( http://github.com/ipython ),
> Octave ( http://hg.savannah.gnu.org/hgweb/octave ), etc.

Especially with the longer release cycles. At the very least it'd be
nice to have a public "devel" repo updated at every alpha release. And
I'd love to have a live head to test and rebase against. Something
like the sage merger script, but automated as every ticket on trac
with positive review + release manager approval + passing on these X
systems (on top of last previous head). It would just crawl forward
over time and always be (relatively) stable.

> To be frank, when
> even you, who are in ultimate charge of our source control, say you are not
> very fluent with the source control mechanism we use, it must mean we are
> all kind of stumbling around... Are we just using hg as a convenient way to
> generate patches and nothing more? I'm in no way an expert on Mercurial or
> on software development practices but these things do worry me.

Over the years, we've moved to using trac as our revision control
mechanism, and mercurial just to generate patches and keep track of
(limited) history. On top of that we don't have good automated/cli
interfaces for dealing with trac, so in some ways it's a step back
from just emailing patches around (though in others a step forward).
The pull-request model of google code/github I think is a much nicer
one, but momentum is hard to change. There's also the advantage of
iterating on the actual commits to produce a cleaner history (e.g.
folding patches together, making corrections after discussion), though
that could be incorporated into the fork/pull model as well (or
keeping patch queues under revision control, a la sage-combinat).

The other problem is that so much isn't under revision control (e.g.
what versions of spkgs to use), or in multiple repositories that need
to be kept in sync. Were I to design the system from scratch, I'd put
all our code (devel/scripts/...) in a single repo, along with the
top-level files, and a list of dependencies (spkgs). Building sage
would fetch (locally or remotely) the dependencies listed and build
them in such a way that changing the list of dependencies and
re-building would be easily and cheaply reversible. I would probably
still build my own Python, but may require it (flexible version) as a
bootstrapping prerequisite. Whether the non-upstream parts of an spkg
belong in the spkgs or the main repo, I'm not sure, but I'd rather
*everything* be expressed as a commit to a single repository (possibly
moving a pointer to some new, vanilla upstream source, rather than
putting all upstream sources in our repo).

If others have similar views, maybe we could move in that direction.

- Robert

Martin Albrecht

May 6, 2011, 2:41:54 PM
to sage-...@googlegroups.com
> The other problem is that so much isn't under revision control (e.g.
> what versions of spkgs to use), or in multiple repositories that need
> to be kept in sync. Were I to design the system from scratch, I'd put
> all our code (devel/scripts/...) in a single repo, along with the
> top-level files, and a list of dependencies (spkgs).

Are you thinking about something like this, a file, say singular-version which
points to the current Singular version? Applying a patch which changes this
textfile implies updating the Singular SPKG? Then, patches can depend on other
patches in a clean way?

> Building sage
> would fetch (locally or remotely) the dependencies listed and build
> them in such a way that changing the list of dependencies and
> re-building would be easily and cheaply reversible. I would probably
> still build my own Python, but may require it (flexible version) as a
> bootstrapping prerequisite. Whether the non-upstream parts of an spkg
> belong in the spkgs or the main repo, I'm not sure, but I'd rather
> *everything* be expressed as a commit to a single repository (possibly
> moving a pointer to some new, vanilla upstream source, rather than
> putting all upstream sources in our repo).
>
> If others have similar views, maybe we could move in that direction.

+1

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Robert Bradshaw

May 6, 2011, 3:05:05 PM
to sage-...@googlegroups.com
On Fri, May 6, 2011 at 11:41 AM, Martin Albrecht
<martinr...@googlemail.com> wrote:
>> The other problem is that so much isn't under revision control (e.g.
>> what versions of spkgs to use), or in multiple repositories that need
>> to be kept in sync. Were I to design the system from scratch, I'd put
>> all our code (devel/scripts/...) in a single repo, along with the
>> top-level files, and a list of dependencies (spkgs).
>
> Are you thinking about something like this, a file, say singular-version which
> points to the current Singular version? Applying a patch which changes this
> textfile implies updating the Singular SPKG? Then, patches can depend on other
> patches in a clean way?

Yes. Or perhaps a single file that lists all dependencies with
pointers, rather than a whole directory of one-line files.
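
To make the single-file variant concrete, here is a small Python sketch. The file format (one "name version" pair per line, "#" for comments) and the function names are purely hypothetical; nothing like this exists in Sage as far as this thread states.

```python
def parse_dependencies(text):
    """Parse lines of the form 'name version' into a dict (hypothetical format)."""
    deps = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("line %d: expected 'name version': %r" % (lineno, raw))
        name, version = parts
        if name in deps:
            raise ValueError("line %d: duplicate entry for %s" % (lineno, name))
        deps[name] = version
    return deps


def changed_packages(old, new):
    """Names whose pinned version was added or changed between two lists."""
    return sorted(name for name, version in new.items() if old.get(name) != version)
```

A patch that edits one line of such a file would then show up in `changed_packages` as exactly the spkg that needs rebuilding.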

>> Building sage
>> would fetch (locally or remotely) the dependencies listed and build
>> them in such a way that changing the list of dependencies and
>> re-building would be easily and cheaply reversible. I would probably
>> still build my own Python, but may require it (flexible version) as a
>> bootstrapping prerequisite. Whether the non-upstream parts of an spkg
>> belong in the spkgs or the main repo, I'm not sure, but I'd rather
>> *everything* be expressed as a commit to a single repository (possibly
>> moving a pointer to some new, vanilla upstream source, rather than
>> putting all upstream sources in our repo).
>>
>> If others have similar views, maybe we could move in that direction.
>
> +1
>
> Cheers,
> Martin
>
> --
> name: Martin Albrecht
> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
> _www: http://martinralbrecht.wordpress.com/
> _jab: martinr...@jabber.ccc.de
>

Ondrej Certik

May 6, 2011, 5:19:44 PM
to sage-...@googlegroups.com
On Fri, May 6, 2011 at 11:26 AM, Robert Bradshaw
<robe...@math.washington.edu> wrote:
[...]

> Were I to design the system from scratch, I'd put
> all our code (devel/scripts/...) in a single repo, along with the
> top-level files, and a list of dependencies (spkgs). Building sage
> would fetch (locally or remotely) the dependencies listed and build
> them in such a way that changing the list of dependencies and
> re-building would be easily and cheaply reversible. I would probably
> still build my own Python, but may require it (flexible version) as a
> bootstrapping prerequisite. Whether the non-upstream parts of an spkg
> belong in the spkgs or the main repo, I'm not sure, but I'd rather
> *everything* be expressed as a commit to a single repository (possibly
> moving a pointer to some new, vanilla upstream source, rather than
> putting all upstream sources in our repo).
>
> If others have similar views, maybe we could move in that direction.

I actually developed such a system from scratch (it's called Qsnake:
http://qsnake.com) and pretty much followed your paragraph above, so I
pushed all the repos at github:

https://github.com/qsnake

for example the Cython repo is here:

https://github.com/qsnake/cython

it is actually a fork of the official Cython repo. And then Qsnake is
clever enough, that when it is fetching the sources, and if there is
setup.py and no spkg-install, it creates spkg-install automatically
with "setup.py install" (and other default stuff), and then saves the
cython.spkg package into the spkg/standard/ directory.

Having all spkg-install scripts in the main repository (so in my case
in this repository: https://github.com/qsnake/qsnake) is a valid idea,
that I have been bouncing around too. I decided not to, to keep things
localized, as sometimes nontrivial modifications are needed to the
upstream packages, and the best way to do such modifications is to
simply create a couple of git patches. Also, for testing, if there is
an spkg-install file committed, I can check out the git repo for a
particular package and run

qsnake install .

and it will install the package. And I can debug it easily.

Ondrej

Ondrej Certik

May 6, 2011, 5:21:20 PM
to sage-...@googlegroups.com

Oh, and then people can simply send pull requests to the respective
packages directly. So I think it solves lots of the problems raised in
this thread. For sage, it would have to move to github though, so it
might not be an option.

Ondrej

Jeroen Demeyer

May 7, 2011, 5:55:36 AM
to sage-...@googlegroups.com
On 2011-05-06 20:26, Robert Bradshaw wrote:
> And
> I'd love to have a live head to test and rebase against. Something
> like the sage merger script, but automated as every ticket on trac
> with positive review + release manager approval + passing on these X
> systems (on top of last previous head).
What you describe here is almost exactly how the current system of alpha
versions works.


Jeroen.

Jeroen Demeyer

May 7, 2011, 5:59:54 AM
to sage-...@googlegroups.com
On 2011-05-06 19:53, Robert Bradshaw wrote:
> Even better would be to checksum the source in a src.md5 file, and
> have sage -spkg warn/error if the checksums don't match.
I think it is necessary and sufficient to checksum the output of "ls -lR".

David Kirkby

May 7, 2011, 9:45:26 AM
to sage-...@googlegroups.com

I would not rely on that, as a different UID or GID on different
systems will give different results.

MD5 would not be good, as different systems have either no program to
compute an md5 checksum, or have them with different names. Although
many systems have a "sum" command, the algorithm will be system
dependent, so that too can't be used.

In contrast, POSIX defines "cksum" so using that as a checksum tool is
preferable.

Something like the following might be workable

drkirkby@laptop:~/sage-4.7.rc0/spkg/standard/singular-3-1-1-4.p8$ find src -exec cksum {} \; | awk '{print $1}' | cksum | awk '{print $1}'
3766045910


That computes the checksum of each file (using cksum) and then
computes a checksum of the checksums.

Volker Braun

May 7, 2011, 10:01:29 AM
to sage-...@googlegroups.com
On Saturday, May 7, 2011 2:45:26 PM UTC+1, Dr David Kirkby wrote:

> drkirkby@laptop:~/sage-4.7.rc0/spkg/standard/singular-3-1-1-4.p8$ find
> src -exec cksum {} \; | awk '{print $1}' | cksum | awk '{print $1}'
> 3766045910


You also want to sort somewhere because find doesn't return the matches in any particular order. 

I'd prefer it if we also store the checksum for each file together with the filename, so one can find out which file was changed. Though one would have to write a bit more than just a one-liner to implement it. While we are at it, cryptographically sign everything ;-)
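
A per-file manifest along these lines could look roughly like the Python sketch below. Note that it swaps in SHA-256 from Python's hashlib rather than POSIX cksum, so its numbers will not match the shell one-liner; the function names and the manifest layout are assumptions for illustration.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_digest(full)
    return manifest


def format_manifest(manifest):
    """Serialise sorted by path, so traversal order does not matter."""
    return "".join(
        "%s  %s\n" % (digest, path) for path, digest in sorted(manifest.items())
    )


def combined_digest(manifest):
    """A single checksum of the checksums, taken over the sorted manifest."""
    return hashlib.sha256(format_manifest(manifest).encode("utf-8")).hexdigest()


def changed_files(recorded, current):
    """Sorted paths that were added, removed or modified."""
    paths = set(recorded) | set(current)
    return sorted(p for p in paths if recorded.get(p) != current.get(p))
```

Storing the output of `format_manifest` next to src/ would give both the one-number check (`combined_digest`) and the list of offending files (`changed_files`).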




Jason Grout

May 7, 2011, 10:48:13 AM
to sage-...@googlegroups.com
On 5/6/11 4:21 PM, Ondrej Certik wrote:

> Oh, and then people can simply send pull requests to the respective
> packages directly. So I think it solves lots of the problems raised in
> this thread. For sage, it would have to move to github though, so it
> might not be an option.

Bitbucket has pull requests and forks, and mercurial has submodules,
IIRC. I don't know exactly how they work, as I'm a big fan of the
git/github branching model over mercurial, but it might be possible to
do these sorts of things with bitbucket and stay with mercurial.

Jason


Dr. David Kirkby

May 7, 2011, 8:58:12 PM
to sage-...@googlegroups.com
On 05/ 7/11 03:01 PM, Volker Braun wrote:
> On Saturday, May 7, 2011 2:45:26 PM UTC+1, Dr David Kirkby wrote:
>>
>> drkirkby@laptop:~/sage-4.7.rc0/spkg/standard/singular-3-1-1-4.p8$ find
>> src -exec cksum {} \; | awk '{print $1}' | cksum | awk '{print $1}'
>> 3766045910
>>
>
> You also want to sort somewhere because find doesn't return the matches in
> any particular order.

OK:

find src -print -exec cksum {} \; | awk '{print $1}' | sort | cksum | awk '{print $1}'

gives what I assume will be the same output on all systems.

> I'd prefer it if we also store the checksum for each file together with the
> filename, so one can find out which file was changed. Though one would have
> to write a bit more than just a one-liner to implement it. While we are at
> it, cryptographically sign everything ;-)

I think once a file is known to have been changed, tracking down which one
manually should not be hard, so I don't really see the need to store the
checksums of every file.

Robert Bradshaw

May 8, 2011, 12:31:41 AM
to sage-...@googlegroups.com

But it isn't exposed as an hg repository anywhere, one only gets
snapshots now and then, and not all tests pass at every point. The
fact that so much is not under (a single) revision control makes this
harder.

- Robert

Robert Bradshaw

May 8, 2011, 12:37:16 AM
to sage-...@googlegroups.com
On Sat, May 7, 2011 at 6:45 AM, David Kirkby <drki...@gmail.com> wrote:
> On 7 May 2011 10:59, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
>> On 2011-05-06 19:53, Robert Bradshaw wrote:
>>> Even better would be to checksum the source in a src.md5 file, and
>>> have sage -spkg warn/error if the checksums don't match.
>> I think it is necessary and sufficient to checksum the output of "ls -lR".
>
> I would not rely on that, as a different UID or GID on different
> systems will give different results.

As would touching a file (e.g. to revert a patch or accidental
change), or any change keeping the length of the file the same (yes,
small patches do that sometimes).
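
Both failure modes can be reproduced with a short Python sketch. It stands in for "ls -lR" with a (size, modification time) pair, which is only an approximation of what a long listing records, and it restores the timestamps by hand to imitate a listing that does not notice the edit; the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def listing_signature(path):
    """Size and mtime: roughly the per-file data a long listing records."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def content_digest(path):
    """SHA-256 of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.c")
        with open(path, "w") as f:
            f.write("int x = 1;\n")
        st = os.stat(path)
        sig, digest = listing_signature(path), content_digest(path)

        # Same-length edit, old timestamps restored: the listing cannot tell.
        with open(path, "w") as f:
            f.write("int x = 2;\n")
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
        print("listing same:", listing_signature(path) == sig)
        print("content same:", content_digest(path) == digest)
```

The reverse case, touching a file without changing it, changes the listing signature while the content digest stays the same.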

> MD5 would not be good, as different systems have either no program to
> compute an md5 checksum, or have them with different names. Although
> many systems have a "sum" command, the algorithm will be system
> dependent, so that too can't be used.
>
> In contrast, POSIX defines "cksum" so using that as a checksum tool is
> preferable.

cksum is cryptographically weak, but strong enough for this purpose.

> Something like the following might be workable
>
> drkirkby@laptop:~/sage-4.7.rc0/spkg/standard/singular-3-1-1-4.p8$ find
> src -exec cksum {} \; | awk '{print $1}' | cksum | awk '{print $1}'
> 3766045910
>
>
> That computes the checksum of each file (using cksum) and then
> computes a checksum of the checksums.
>

Robert Bradshaw

May 8, 2011, 12:41:08 AM
to sage-...@googlegroups.com
On Fri, May 6, 2011 at 2:21 PM, Ondrej Certik <ond...@certik.cz> wrote:
> On Fri, May 6, 2011 at 2:19 PM, Ondrej Certik <ond...@certik.cz> wrote:
>> On Fri, May 6, 2011 at 11:26 AM, Robert Bradshaw
>> <robe...@math.washington.edu> wrote:
>> [...]
>>> Were I to design the system from scratch, I'd put
>>> all our code (devel/scripts/...) in a single repo, along with the
>>> top-level files, and a list of dependencies (spkgs). Building sage
>>> would fetch (locally or remotely) the dependencies listed and build
>>> them in such a way that changing the list of dependencies and
>>> re-building would be easily and cheaply reversible. I would probably
>>> still build my own Python, but may require it (flexible version) as a
>>> bootstrapping prerequisite. Whether the non-upstream parts of an spkg
>>> belong in the spkgs or the main repo, I'm not sure, but I'd rather
>>> *everything* be expressed as a commit to a single repository (possibly
>>> moving a pointer to some new, vanilla upstream source, rather than
>>> putting all upstream sources in our repo).
>>>
>>> If others have similar views, maybe we could move in that direction.
>>
>> I actually developed such a system from scratch (it's called Qsnake:
>> http://qsnake.com) and pretty much followed your paragraph above, so I
>> pushed all the repos at github:
>>
>> https://github.com/qsnake

Cool.

>> for example the Cython repo is here:
>>
>> https://github.com/qsnake/cython
>>
>> it is actually a fork of the official Cython repo. And then Qsnake is
>> clever enough, that when it is fetching the sources, and if there is
>> setup.py and no spkg-install, it creates spkg-install automatically
>> with "setup.py install" (and other default stuff), and then saves the
>> cython.spkg package into the spkg/standard/ directory.
>>
>> Having all spkg-install scripts in the main repository (so in my case
>> in this repository: https://github.com/qsnake/qsnake) is a valid idea,
>> that I have been bouncing around too. I decided not to, to keep things
>> localized, as sometimes nontrivial modifications are needed to the
>> upstream packages, and the best way to do such modifications is to
>> simply create a couple of git patches. Also, for testing, if there is
>> an spkg-install file committed, I can check out the git repo for a
>> particular package and
>>
>> qsnake install .
>>
>> and it will install the package. And I can debug it easily.

Is it a reversible install? What about concurrent versions? What about
dependencies (e.g. if the version of Cython was updated, would all the
stuff depending on Cython get re-compiled)?

> Oh, and then people can simply send pull requests to the respective
> packages directly. So I think it solves lots of the problems raised in
> this thread. For sage, it would have to move to github though, so it
> might not be an option.

Probably not. Is it really tied to github, or could any dvcs be plugged in?

- Robert

Jeroen Demeyer

May 8, 2011, 4:48:06 AM5/8/11
to sage-...@googlegroups.com
On 2011-05-08 06:31, Robert Bradshaw wrote:
> But it isn't exposed as an hg repository anywhere
I'm happy to expose whatever is needed as an hg repository if somebody
explains to me what and how.

> The
> fact that so much is not under (a single) revision control makes this
> harder.

You mean that it's harder to roll back because spkg's are not under
revision control? With the current spkg system, that is hard to solve.


Jeroen.

David Kirkby

May 8, 2011, 7:43:12 AM5/8/11
to sage-...@googlegroups.com
On 8 May 2011 05:37, Robert Bradshaw <robe...@math.washington.edu> wrote:
> On Sat, May 7, 2011 at 6:45 AM, David Kirkby <drki...@gmail.com> wrote:
>> On 7 May 2011 10:59, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
>>> On 2011-05-06 19:53, Robert Bradshaw wrote:
>>>> Even better would be to checksum the source in a src.md5 file, and
>>>> have sage -spkg warn/error if the checksums don't match.
>>> I think it is necessary and sufficient to checksum the output of "ls -lR".
>>
>> I would not rely on that, as a different UID or GID on different
>> systems will give different results.
>
> As would touching a file (e.g. to revert a patch or accidental
> change), or any change keeping the length of the file the same (yes,
> small patches do that sometimes).

You raise an interesting point. My solution with cksum would not
detect if the file has been touched whilst the contents remain the
same, since it's only checking the contents, not the date.

But if someone has touched a file, but did not change the contents, it
is not really an issue.


>> MD5 would not be good, as different systems have either no program to
>> compute an md5 checksum, or have them with different names. Although
>> many systems have a "sum" command, the algorithm will be system
>> dependent, so that too can't be used.
>>
>> In contrast, POSIX defines "cksum" so using that as a checksum tool is
>> preferable.
>
> cksum is cryptographically weak, but strong enough for this purpose.

Yes, there's a small probability of failure - I think 1 in 2^32,
though perhaps that's not true. It was not designed for cryptographic
purposes, but for just the sort of application being discussed here. I
suspect md5 is more computationally intensive, but the real problem
with md5 is there is no single command which will work on every
system.

Dave

Jeroen Demeyer

May 8, 2011, 7:52:03 AM5/8/11
to sage-...@googlegroups.com
On 2011-05-08 02:58, Dr. David Kirkby wrote:
> find src -print -exec cksum {} \; | awk '{print $1}' | sort | cksum |
> awk '{print $1}'
The most portable solution would probably be a small Python script.
Personally, I think the *dates* in the src/ directory should also be
kept, since "make" can be confused by bad dates.
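
A minimal sketch of what such a script might look like, assuming Python 3 and only regular files under src/. The name tree_checksum and the use of SHA-256 are illustrative assumptions, not something proposed in this thread. It hashes relative paths and file contents only, so UID/GID and timestamps do not influence the result; recording the dates as well would need os.stat and is left out here.

```python
import hashlib
import os


def tree_checksum(root):
    """Return a hex digest covering relative paths and file contents under root.

    Ownership, permissions and timestamps are deliberately ignored.
    Assumption: the tree contains regular files only (no broken symlinks).
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Use "/" as separator so the digest is the same on every platform.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

Calling tree_checksum("src") before packing and again after unpacking should give the same string as long as no file name or file content has changed.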

Jeroen.

Dr. David Kirkby

May 9, 2011, 5:39:18 AM5/9/11
to sage-...@googlegroups.com

Well, with a shell script, if it works now, you know it will keep working. The
same can't be said for Python, as numerous backwards-incompatible changes occur
with different versions of Python. We can't upgrade to 2.7 yet, and 3.x is well
over the horizon.

That said, I can't find a one-liner as a shell script which will check the times
and dates too.

Dave

Ondrej Certik

May 9, 2011, 6:05:21 AM5/9/11
to sage-...@googlegroups.com

No, currently it's just like in Sage. For reversible install, one
would have to redo every single package to be able to take things from
SPKG_LOCAL (=SAGE_LOCAL in Sage), but install into something like
SPKG_INSTALL, so that one can package it and keep track of files.

Hand in hand with this go "binary packages". If one can do
"uninstall", then immediately one can start creating automatic binary
packages. That would be super cool.

I am currently undecided whether to go with this or not. No doubt it
would be useful. If Sage goes this way, then surely we'll follow too.
Otherwise probably not, as compatibility is important. People who know
Sage should find it quite easy to play with Qsnake and vice versa.
Learning a completely new system and how it works is an obstacle.

Adding uninstall and binary packages will make everything more complex
(imagine installing the wrong binary into incompatible base
install....). Right now it's all just source packages, and possibly
one binary for the whole thing (for each platform), which is
manageable.

> What about concurrent versions?

Only if you rename the package. In this I have the same (or similar)
vision as Sage, that is to create a well tested scientific environment
with just one tested version of each package, that works with
everything. If there are two incompatible versions, then one should
change the name of the package, e.g. python -> python2 + python3. Then
it can live side by side.

> What about
> dependencies (e.g. if the version of Cython was updated, would all the
> stuff depending on Cython get re-compiled)?

No. Only the other way round -- if you want to install "phaml", that
uses Cython to wrap Fortran, it will also pull in "cython"
automatically (as well as all the other packages). However, since the
dependency tree is known, we can add this feature as well, to
recompile all "rdepends" (reverse dependencies, to use Debian
terminology).
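
As a rough illustration of the "rdepends" idea, here is a small Python sketch. It assumes the dependency tree is available as a plain dictionary; the function name and the data layout are hypothetical and are not Qsnake's actual format.

```python
from collections import deque


def reverse_dependencies(depends, package):
    """Return every package that directly or indirectly depends on `package`.

    `depends` maps a package name to the list of packages it requires
    (a hypothetical representation of the dependency tree).
    """
    # Invert the graph: for each package, record who requires it.
    rdepends = {}
    for pkg, requirements in depends.items():
        for req in requirements:
            rdepends.setdefault(req, set()).add(pkg)

    # Breadth-first walk over the inverted graph.
    found = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for user in rdepends.get(current, ()):
            if user not in found:
                found.add(user)
                queue.append(user)
    return found
```

With a made-up table such as {"phaml": ["cython"], "cython": ["python"], "python": []}, asking for the reverse dependencies of "cython" gives the set containing "phaml", which is the list of packages one would rebuild after a Cython update.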

>
>> Oh, and then people can simply send pull requests to the respective
>> packages directly. So I think it solves lots of the problems raised in
>> this thread. For sage, it would have to move to github though, so it
>> might not be an option.
>
> Probably not. Is it really tied to github, or could any dvcs be plugged in?

I just made two releases in the last couple of days; one can download them from here:

https://github.com/qsnake/qsnake/archives/master

and that is just one big source tarball, and installs completely
locally, no git is needed (it will actually install its own
automatically --- but it is not needed for the build system). Git is
only used to create the big source tarball using "qsnake -d". It can
be easily changed to "hg" in the build system (and then one needs to
move all the packages, which is simple but tedious).

Ondrej

Ondrej Certik

May 9, 2011, 6:22:09 AM5/9/11
to sage-...@googlegroups.com
On Mon, May 9, 2011 at 3:05 AM, Ondrej Certik <ond...@certik.cz> wrote:
> On Sat, May 7, 2011 at 9:41 PM, Robert Bradshaw
[...]

>> What about
>> dependencies (e.g. if the version of Cython was updated, would all the
>> stuff depending on Cython get re-compiled?
>
> No. Only the other way round -- if you want to install "phaml", that
> uses Cython to wrap Fortran, it will also pull in "cython"
> automatically (as well as all the other packages). However, since the
> dependency tree is known, we can add this feature as well, to
> recompile all "rdepends" (reverse dependencies, to use Debian
> terminology).

That's actually the first issue I have created about Qsnake some time ago :)

https://github.com/qsnake/qsnake/issues/1

so I just updated it with your idea to automatically recompile reverse
dependencies. That's a great idea. Once implemented, just this should
keep you up to date with the latest git versions of all packages:

qsnake update
qsnake upgrade


Ondrej

Robert Bradshaw

May 9, 2011, 12:01:24 PM5/9/11
to sage-...@googlegroups.com
On Sun, May 8, 2011 at 4:43 AM, David Kirkby <drki...@gmail.com> wrote:
> On 8 May 2011 05:37, Robert Bradshaw <robe...@math.washington.edu> wrote:
>> On Sat, May 7, 2011 at 6:45 AM, David Kirkby <drki...@gmail.com> wrote:
>>> On 7 May 2011 10:59, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
>>>> On 2011-05-06 19:53, Robert Bradshaw wrote:
>>>>> Even better would be to checksum the source in a src.md5 file, and
>>>>> have sage -spkg warn/error if the checksums don't match.
>>>> I think it is necessary and sufficient to checksum the output of "ls -lR".
>>>
>>> I would not rely on that, as a different UID or GID on different
>>> systems will give different results.
>>
>> As would touching a file (e.g. to revert a patch or accidental
>> change), or any change keeping the length of the file the same (yes,
>> small patches do that sometimes).
>
> You raise an interesting point. My solution with cksum would not
> detect if the file has been touched whilst the contents remain the
> same, since it's only checking the contents, not the date.
>
> But if someone has touched a file, but did not change the contents, it
> is not really an issue.

That was my point, +1 to checksumming the files, not the listing.
Preserving dates is nice, but that will make it much harder to work on
files (e.g. if I patch a file to test something out, then I'd have to
touch it back to the old time, if I can even remember that, or
re-unpack the sources). If Makefiles are an issue, one can touch the
entire directory to have the same timestamp.
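
For what it's worth, "touch the entire directory" could be sketched in Python as below. This is only an illustration under the assumption that a single fixed timestamp is acceptable for the whole tree; the function name is made up.

```python
import os


def set_uniform_mtime(root, timestamp):
    """Give root and everything below it the same access/modification time.

    Symlinks are skipped, since os.utime would follow them and a dangling
    link would raise an error.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.utime(path, (timestamp, timestamp))
    os.utime(root, (timestamp, timestamp))
```

With every file carrying the same timestamp, "make" has no file that looks newer than another, which is the situation the suggestion above is aiming for.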

>>> MD5 would not be good, as different systems have either no program to
>>> compute an md5 checksum, or have them with different names. Although
>>> many systems have a "sum" command, the algorithm will be system
>>> dependent, so that too can't be used.
>>>
>>> In contrast, POSIX defines "cksum" so using that as a checksum tool is
>>> preferable.
>>
>> cksum is cryptographically weak, but strong enough for this purpose.
>
> Yes, there's a small probability of failure - I think 1 in 2^32,
> though perhaps that's not true. It was not designed for cryptographic
> purposes, but for just the sort of application being discussed here.

I brought up the issue because someone mentioned signatures.

> I suspect md5 is more computationally intensive, but the real problem
> with md5 is there is no single command which will work on every
> system.

Although finding that single command can be bothersome--with shell
scripting there are a whole lot of commands that work on one (or even
most) POSIX systems and not on all of them, or even in one shell and
not another, as evidenced by the amount of work spent porting the <1MB
shell scripts we have to Solaris. With Python, if it works on my
system it has a 99.9% chance of working on yours, and
version-to-version changes, though non-trivial, are small (e.g. Cython
runs with 2.3 through 2.7 on a single codebase). 3.x excluded, but
that was 5+ years of backwards incompatible changes all pushed at
once. And of course in Sage we end up using/interfacing with a lot of
C libraries, which are not so portable.

That being said, our hands are tied as Python is not a prerequisite
for building that first spkg.

- Robert

Robert Bradshaw

May 9, 2011, 12:11:32 PM5/9/11
to sage-...@googlegroups.com

I could see this done with paths, with every spkg installing into its
own versioned directory, and the python/shell/include/library paths
set up to have the list of all currently enabled spkgs.
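
A rough sketch of the path-based variant, in Python. The directory layout (prefix/<name>-<version>/bin, lib, include) and the function name are assumptions made for illustration; PATH, LD_LIBRARY_PATH and CPATH are the usual shell, Linux linker and GCC variables. A Python search path would be handled the same way, but its location depends on the Python version, so it is omitted.

```python
import os


def build_environment(prefix, enabled):
    """Return PATH-style variables covering the enabled package directories.

    `enabled` lists versioned directory names, e.g. ["cython-0.14.1"]
    (hypothetical names); earlier entries take precedence.
    """
    subdirs = {
        "PATH": "bin",
        "LD_LIBRARY_PATH": "lib",
        "CPATH": "include",
    }
    env = {}
    for var, sub in subdirs.items():
        candidates = [os.path.join(prefix, pkg, sub) for pkg in enabled]
        # Only keep directories that the package actually provides.
        env[var] = os.pathsep.join(p for p in candidates if os.path.isdir(p))
    return env
```

Enabling or disabling a package then amounts to editing the `enabled` list and regenerating the variables; nothing inside the package directories has to change.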

> Hand in hand with this go "binary packages". If one can do
> "uninstall", then immediately one can start creating automatic binary
> packages. That would be super cool.
>
> I am currently undecided whether to go with this or not. No doubt it
> would be useful. If Sage goes this way, then surely we'll follow too.
> Otherwise probably not, as compatibility is important. People who know
> Sage should find it quite easy to play with Qsnake and vice versa.
> Learning a completely new system and how it works is an obstacle.
>
> Adding uninstall and binary packages will make everything more complex
> (imagine installing the wrong binary into incompatible base
> install....). Right now it's all just source packages, and possibly
> one binary for the whole thing (for each platform), which is
> manageable.
>
>> What about concurrent versions?
>
> Only if you rename the package. In this I have the same (or similar)
> vision as Sage, that is to create a well tested scientific environment
> with just one tested version of each package, that works with
> everything.

Yes, that's what I'm thinking, but it'd be nice to be able to, e.g.,
try out a new spkg without hosing the entire install (in a possibly
irreversible manner).

> If there are two incompatible versions, then one should
> change the name of the package, e.g. python -> python2 + python3. Then
> it can live side by side.

Or python2.6p10.

>> What about
>> dependencies (e.g. if the version of Cython was updated, would all the
>> stuff depending on Cython get re-compiled?
>
> No. Only the other way round -- if you want to install "phaml", that
> uses Cython to wrap Fortran, it will also pull in "cython"
> automatically (as well as all the other packages). However, since the
> dependency tree is known, we can add this feature as well, to
> recompile all "rdepends" (reverse dependencies, to use Debian
> terminology).

That would be useful to get the known-stable environment, regardless
of the order you install packages. That's one of the things that's so
appealing about http://nixos.org/nix/

>>> Oh, and then people can simply send pull requests to the respective
>>> packages directly. So I think it solves lots of the problems raised in
>>> this thread. For sage, it would have to move to github though, so it
>>> might not be an option.
>>
>> Probably not. Is it really tied to github, or could any dvcs be plugged in?
>
> I just made two releases last couple days, one can download it from here:
>
> https://github.com/qsnake/qsnake/archives/master
>
> and that is just one big source tarball, and installs completely
> locally, no git is needed (it will actually install its own
> automatically --- but it is not needed for the build system). Git is
> only used to create the big source tarball using "qsnake -d". It can
> be easily changed to "hg" in the build system (and then one needs to
> move all the packages, which is simple but tedious).

I was thinking of something even simpler, where the dependent packages
would just be tarballs (at least as an option), so no revision control
is needed by the build system at all.

- Robert

Jeroen Demeyer

May 9, 2011, 4:42:29 PM5/9/11
to sage-...@googlegroups.com
What do you mean? The procedure we are discussing has nothing to do
with *building* a spkg, it has to do with *creating* a spkg (sage -spkg).

>
> - Robert
>

Ondrej Certik

May 9, 2011, 5:57:05 PM5/9/11
to sage-...@googlegroups.com
On Mon, May 9, 2011 at 9:11 AM, Robert Bradshaw

I was thinking about this a lot yesterday, and there are a lot more
issues to resolve than it seems at first sight. In particular, some
packages, like Python, do some recompiling of modules (I think,
but maybe I am wrong) and some other stuff in the place where they are
installed. Some other packages (setuptools?) modify some stuff
as well (at least I read it somewhere).

Pretty much, as long as the "installation" is just copying of files,
then it should work. But if you also need to modify some stuff after
installing it (post install script in Debian/Ubuntu), then things
become more complex.

With our current approach, one is free to do any kind of necessary
tweaks in $SPKG_LOCAL (=SAGE_LOCAL) to make things work. Usually by
the build system of the package itself.

You can download tarballs from github, for example for the Qsnake's
cython package:

https://github.com/qsnake/cython/archives/master

without having git installed. So in principle the build system can use
it too. I just stick with git for now.

Ondrej

Robert Bradshaw

May 9, 2011, 6:54:57 PM5/9/11
to sage-...@googlegroups.com

I suppose I was imagining checking on both ends, but that's not really
necessary.

- Robert

Robert Bradshaw

May 9, 2011, 7:01:58 PM5/9/11
to sage-...@googlegroups.com

True, but I can't think of anything in Sage where one needs to modify
the environment any more than put a file somewhere that it's
accessible (though such a thing could be possible).

I agree a general solution is much more subtle.

Ah, yes.

> I just stick with git for now.

Nothing against github (in fact I really like it), it's just that I'm
wary of making my infrastructure heavily dependent on someone else's
for some things.

- Robert

Ondrej Certik

May 9, 2011, 7:48:55 PM5/9/11
to sage-...@googlegroups.com
On Mon, May 9, 2011 at 4:01 PM, Robert Bradshaw
<robe...@math.washington.edu> wrote:
> On Mon, May 9, 2011 at 2:57 PM, Ondrej Certik <ond...@certik.cz> wrote:
[...]

>> I was thinking about this a lot yesterday, and there are a lot more
>> issues to resolve, than it seems at first sight. In particular, some
>> packages like Python is doing some recompiling of modules (I think,
>> but maybe I am wrong) and some other stuff to the place where it is
>> installed. Some other packages (setuptools?) are modifying some stuff
>> as well (at least I read it somewhere).
>>
>> Pretty much, as long as the "installation" is just copying of files,
>> then it should work. But if you also need to modify some stuff after
>> installing it (post install script in Debian/Ubuntu), then things
>> become more complex.
>>
>> With our current approach, one is free to do any kind of necessary
>> tweaks in $SPKG_LOCAL (=SAGE_LOCAL) to make things work. Usually by
>> the build system of the package itself.
>
> True, but I can't think of anything in Sage where one needs to modify
> the environment any more than put a file somewhere that it's
> accessible (though such a thing could be possible).
>
> I agree a general solution is much more subtle.

One would have to try and see. Nice thing about the current SPKG
format is that it is extremely simple, and although it doesn't allow
uninstall, in my opinion it is completely general in terms of making
sure the result works.

My point is that in terms of both simplicity (=understandability,
maintenance, time for people to learn it, use it, reuse it, ....) and
functionality (=getting any package to work) together, it might not be
possible to beat the current system.

However, one should always try, that's for sure.


>>> [...]


>>> I was thinking of something even simpler, where the dependent packages
>>> would just be tarballs (at least as an option), so no revision control
>>> is needed by the build system at all.
>>
>> You can download tarballs from github, for example for the Qsnake's
>> cython package:
>>
>> https://github.com/qsnake/cython/archives/master
>>
>> without having git installed. So in principle the build system can use
>> it too.
>
> Ah, yes.
>
>> I just stick with git for now.
>
> Nothing against github (in fact I really like it), it's just that I'm
> wary of making my infrastructure heavily dependent on someone else's
> for some things.

No, you have a good point. Sage relies on the packages hosted here:

http://sagemath.org/packages/standard/

or (equivalently) on the packages distributed in the source tarball,
hosted here at any of these mirrors:

http://sagemath.org/download-source.html

So it's quite safe. I don't have many computers at my disposal, so I
chose github, which allows me to upload files larger than 100MB
(unlike google code).

I was thinking about this too yesterday, and I think there is a *high*
value in having a full source distribution, preferably with all the
git histories for all the subprojects (Qsnake currently strips the
.git repository from each package after downloading it from github for
space reasons, but that's trivial to change), so that if the internet
goes down, or github crashes, as long as enough people have downloads
of the sources, there is pretty much no harm done and one can happily
continue developing scientific applications, without any loss. Sage
currently also doesn't have, say, the sympy git history or the Cython
git history.

Ondrej

Ondrej Certik

May 10, 2011, 3:10:13 AM5/10/11
to sage-...@googlegroups.com

It just occurred to me, that it should be possible to keep the current
SPKG format, and implement uninstall. One just needs to keep track of
all files in SPKG_LOCAL, then see what new files were added + which
files have changed.

If a file has changed, then a warning should be produced, and we would
look at each case manually. Maybe it's possible to make the whole of Sage
(or Qsnake in my case) build without changing any files, only
adding them.

If the file was just added, we'll keep track of it. And when the
package is uninstalled, it will simply be removed. Currently we remove
the old files in the spkg-install for some packages, and that's a
hack.
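
To make the bookkeeping concrete, here is a minimal Python sketch of the "snapshot before, snapshot after" comparison. The function names and the use of SHA-256 are assumptions for illustration; neither Sage nor Qsnake is claimed to work this way.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file's path (relative to root) to a digest of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # keep the sketch simple: regular files only
            with open(path, "rb") as f:
                result[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return result


def compare(before, after):
    """Return (added, changed): new paths, and paths whose contents differ."""
    added = sorted(set(after) - set(before))
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return added, changed
```

One would call snapshot on SPKG_LOCAL before and after installing a package: the `added` list is what an uninstall would remove, and a non-empty `changed` list is the case that would trigger the warning described above.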

I'll try to implement this, I think that this feature would be really
cool. With this, one can also (trivially) create a binary package, and
store it in let's say spkg/binary in the local install.

Wow, this is exciting!

Ondrej

Dr. David Kirkby

May 10, 2011, 9:51:51 AM5/10/11
to sage-...@googlegroups.com
On 05/10/11 08:10 AM, Ondrej Certik wrote:

> It just occurred to me, that it should be possible to keep the current
> SPKG format, and implement uninstall. One just needs to keep track of
> all files in SPKG_LOCAL, then see what new files were added + which
> files have changed.

Why not something like

find local -print > foobar.preinstall

install foobar.spkg

find local -print > foobar.postinstall


then if an uninstall is required, one deletes the files in foobar.postinstall
which are not in foobar.preinstall
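
The deletion step could be sketched in Python roughly as follows, assuming the two listings are the saved output of "find local -print". The function names are made up, and the result is only meaningful if nothing else was installed into the tree between the two listings.

```python
import os


def files_to_remove(preinstall_listing, postinstall_listing):
    """Return the paths present after the install but not before it.

    Both arguments are the text of a listing with one path per line.
    Longer paths come first, so a directory's contents are always
    listed before the directory itself.
    """
    before = set(preinstall_listing.splitlines())
    after = set(postinstall_listing.splitlines())
    return sorted(after - before, key=lambda p: (-len(p), p))


def uninstall(paths):
    """Delete the given paths in order; directories must be empty by then."""
    for path in paths:
        if os.path.isdir(path) and not os.path.islink(path):
            os.rmdir(path)
        elif os.path.lexists(path):
            os.remove(path)
```

Usage would be along the lines of uninstall(files_to_remove(open("foobar.preinstall").read(), open("foobar.postinstall").read())).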

Volker Braun

May 10, 2011, 10:19:34 AM5/10/11
to sage-...@googlegroups.com
IMHO the list of installed files is an integral piece of package management and should explicitly be part of the spkg. Automatically generating it is not an option during parallel compilation. There should be a "spkg-files" or so in the spkg that lists them. During single package build one could automatically check that it is up to date, but the actual list of files needs to be distributed with the spkg. 

With that information it would be relatively easy to automatically translate spkgs into distribution source packages (e.g. srpm). So in the long run we could make use of native package management schemes...

Ondrej Certik

May 10, 2011, 5:38:38 PM5/10/11
to sage-...@googlegroups.com
On Tue, May 10, 2011 at 7:19 AM, Volker Braun <vbrau...@gmail.com> wrote:
> IMHO the list of installed files is an integral piece of package management
> and should explicitly be part of the spkg. Automatically generating it is
> not an option during parallel compilation. There should be a "spkg-files" or

That's a good point; it didn't occur to me that it won't work for
parallel compilation.

Does Sage work with parallel installation of packages? Looking at the README:

http://boxen.math.washington.edu/sage/src/README.txt

it doesn't seem to be the default way? I also started to compile Sage
4.6.2 on my computer, and it seems to be compiling in sequential mode.

> so in the spkg that lists them. During single package build one could
> automatically check that it is up to date, but the actual list of files
> needs to be distributed with the spkg.

Personally (and that is just my opinion), I don't like to maintain a
list of files in the spkg itself, I don't think that's a good
solution.

I think that a better solution is to disable uninstall if the user
uses parallel compilation of packages. Note that parallel compilation
(make -j9) inside one package is ok.

> With that information it would be relatively easy to automatically translate
> spkgs into distribution source packages (e.g. srpm). So in the long run we
> could make use of native package management schemes...

Ondrej

John H Palmieri

May 10, 2011, 6:35:31 PM5/10/11
to sage-...@googlegroups.com


On Tuesday, May 10, 2011 2:38:38 PM UTC-7, Ondrej Certik wrote:
> On Tue, May 10, 2011 at 7:19 AM, Volker Braun <vbraun.name@gmail.com> wrote:
>> IMHO the list of installed files is an integral piece of package management
>> and should explicitly be part of the spkg. Automatically generating it is
>> not an option during parallel compilation. There should be a "spkg-files" or
>
> That's a good point, didn't occur to me, that it won't work for
> parallel compilation.
>
> Does Sage work with parallel installation of packages?


Absolutely.  Do:

$ export SAGE_PARALLEL_SPKG_BUILD=yes
$ export MAKE='make -j8'
$ make

See the installation guide for information about the relevant environment variables. I think that we should set SAGE_PARALLEL_SPKG_BUILD to "yes" automatically -- it works very well, according to everyone I've talked to about it.

--
John

Robert Bradshaw

May 10, 2011, 9:32:30 PM5/10/11
to sage-...@googlegroups.com
On Tue, May 10, 2011 at 7:19 AM, Volker Braun <vbrau...@gmail.com> wrote:
> IMHO the list of installed files is an integral piece of package management
> and should explicitly be part of the spkg. Automatically generating it is
> not an option during parallel compilation. There should be a "spkg-files" or
> so in the spkg that lists them. During single package build one could
> automatically check that it is up to date, but the actual list of files
> needs to be distributed with the spkg.

This wouldn't be as painful if wildcards are allowed...
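
As an illustration of what wildcard support might mean, here is a small Python sketch that checks installed paths against a list of shell-style patterns. The "spkg-files" format is not specified anywhere in this thread, so the pattern syntax, the function name and the example file names are all assumptions. Note that with Python's fnmatch a "*" also matches "/".

```python
from fnmatch import fnmatchcase


def unlisted_files(installed, patterns):
    """Return the installed paths that match none of the manifest patterns."""
    return [
        path
        for path in installed
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

For example, with the patterns "bin/cliquer" and "include/cliquer/*.h", a file such as "lib/libcliquer.so" would be reported as missing from the manifest, while every header under include/cliquer/ is covered by a single line.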

> With that information it would be relatively easy to automatically translate
> spkgs into distribution source packages (e.g. srpm). So in the long run we
> could make use of native package management schemes...

I still like the idea of everything installing into their own
directory, and the final "view" is the union of the directories (e.g.
via PATHS or symlinks) rather than trying to track/synchronize every
package stomping over the same (set of) directories.
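
The symlink flavour of this could look roughly like the Python sketch below. It assumes each package has already been installed into its own directory; the function name is made up, and real packages would need more care (for example, files that embed their install prefix).

```python
import os


def build_view(package_dirs, view_dir):
    """Populate view_dir with symlinks to every file of every package directory.

    Raises FileExistsError if two packages provide the same relative path.
    """
    for pkg_dir in package_dirs:
        for dirpath, _dirnames, filenames in os.walk(pkg_dir):
            rel_dir = os.path.relpath(dirpath, pkg_dir)
            target_dir = os.path.normpath(os.path.join(view_dir, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                link = os.path.join(target_dir, name)
                if os.path.lexists(link):
                    raise FileExistsError("two packages provide " + link)
                os.symlink(os.path.abspath(os.path.join(dirpath, name)), link)
```

Uninstalling a package is then a matter of deleting its directory and rebuilding the view, and a clash between two packages shows up as an error instead of a silent overwrite.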

- Robert

Maarten Derickx

May 11, 2011, 2:23:14 AM5/11/11
to sage-devel


On May 10, 9:10 am, Ondrej Certik <ond...@certik.cz> wrote:
>
> It just occurred to me, that it should be possible to keep the current
> SPKG format, and implement uninstall. One just needs to keep track of
> all files in SPKG_LOCAL, then see what new files were added + which
> files have changed.
>
> If a file has changed, then a warning should be produced, and we would
> look at each case manually. Maybe it's possible to make the whole Sage
> (or Qsnake in my case) to build without changing any files, just keep
> adding them.
>

A brilliant idea. This allows us to transition smoothly to a more
distribution friendly setup. I'm interested to see a list of spkgs
which modify files (and of course the files being edited).

Is SPKG_LOCAL really an environment variable used in sage? If so the
next step might be to temporarily change it to SPKG_LOCAL/
packagename.versionnr before each install of an SPKG and see what
breaks (probably a lot). But if we get things working again then
uninstall is just as easy as deleting a directory. This would make the
step to something like nix very small.

Note that a lot of SPKG's also install stuff into something like
python*/site-packages.

And I second the idea that the list of files should not be
autogenerated during install but be a part of the SPKG. Although the
initial file lists in the SPKG can be autogenerated of course :).

Maarten Derickx

May 11, 2011, 2:48:37 AM5/11/11
to sage-devel
Maybe the following is also interesting http://pypi.python.org/pypi/zc.buildout/1.5.2

Ondrej Certik

May 11, 2011, 3:41:14 AM5/11/11
to sage-...@googlegroups.com
On Tue, May 10, 2011 at 11:23 PM, Maarten Derickx
<m.derick...@gmail.com> wrote:
>
>
> On May 10, 9:10 am, Ondrej Certik <ond...@certik.cz> wrote:
>>
>> It just occurred to me, that it should be possible to keep the current
>> SPKG format, and implement uninstall. One just needs to keep track of
>> all files in SPKG_LOCAL, then see what new files were added + which
>> files have changed.
>>
>> If a file has changed, then a warning should be produced, and we would
>> look at each case manually. Maybe it's possible to make the whole Sage
>> (or Qsnake in my case) to build without changing any files, just keep
>> adding them.
>>
>
> A brilliant idea. This allows us to transition smoothly to a more
> distribution friendly setup. I'm interested to see a list of spkgs
> which modify files (and of course the files being edited).
>
> Is SPKG_LOCAL really an environment variable used in sage? If so the
> next step might be to temporarily change it to SPKG_LOCAL/

Sage uses SAGE_LOCAL, but SPKG_LOCAL is more project neutral, so I use that.

> packagename.versionnr before each install of an SPKG and see what
> breaks (probably a lot). But if we get things working again then
> uninstall is just as easy as deleting a directory. This would make the
> step to something like nix very small.
>
> Note that a lot of SPKG's also install stuff into something like
> python*/site-packages.
>
> And I second the idea that the list of files should not be
> autogenerated during install but be a part of the SPKG. Although the
> initial file lists in the SPKG can be autogenerated of course :).

What would be the advantage of having it in the SPKG itself?

Like if you want to install two packages that overwrite the same file?

Ondrej

Dr. David Kirkby

May 11, 2011, 5:54:46 AM5/11/11
to sage-...@googlegroups.com
On 05/10/11 11:35 PM, John H Palmieri wrote:

>> Does Sage work with parallel installation of packages?
>>
>
> Absolutely. Do:
>
> $ export SAGE_PARALLEL_SPKG_BUILD=yes
> $ export MAKE='make -j8'
> $ make
>
> See the installation guide for information about the relevant environment
> variables. I think that we should set SAGE_PARALLEL_SPKG_BUILD to "yes"
> automatically -- it works very well, according to everyone I've talked to
> about it.

I agree with that too. I still think it would be wise to be able to disable it,
as one might want to build individual packages in parallel, but not have the
resources to build loads of different packages in parallel. It would certainly
be an issue on my laptop!

But I think it would be better to have this as the default, as for the vast
majority of cases it is beneficial.

Maarten Derickx

May 12, 2011, 8:29:39 AM5/12/11
to sage-devel
That it will be compatible with parallel building as mentioned earlier.

> Like if you want to install two packages that overwrite the same file?
I don't think we should ever do (or even want) such a thing. I think
every spkg should only touch its own files (or else we will get into
an unpredictable mess if we also want uninstall and parallel
building), if you really want an spkg to touch a file created by for
example foo.spkg, one should instead make a patch for the foo.spkg.
Maybe we should add some code that checks if there are spkg's breaking
this rule.
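
A rough sketch of what such a check could look like, assuming each spkg ships (or has had generated for it) a list of the files it installs. The function name `find_conflicts` and the dictionary format are made up for illustration; nothing in the thread fixes a format for these lists.

```python
from collections import defaultdict


def find_conflicts(file_lists):
    """Map each path claimed by more than one package to its claimants.

    file_lists: dict of package name -> iterable of installed paths
    (hypothetical format; the thread does not define one).
    """
    owners = defaultdict(set)
    for package, paths in file_lists.items():
        for path in paths:
            owners[path].add(package)
    # Keep only the paths that two or more packages want to install.
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}
```

Because the check works on the lists alone, it could run before anything is built, which also keeps it independent of whether the build is sequential or parallel.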


>
> Ondrej

Ondrej Certik

May 12, 2011, 6:18:29 PM
to sage-...@googlegroups.com
On Thu, May 12, 2011 at 5:29 AM, Maarten Derickx
<m.derick...@gmail.com> wrote:
[...]

>>
>> What would be the advantage of having it in the SPKG itself?
>>
> That it will be compatible with parallel building, as mentioned earlier.
>
>> Like if you want to install two packages that overwrite the same file?
> I don't think we should ever do (or even want) such a thing. I think
> every spkg should only touch its own files (or else we will get into
> an unpredictable mess if we also want uninstall and parallel
> building). If you really want an spkg to touch a file created by, for
> example, foo.spkg, you should instead make a patch for foo.spkg.
> Maybe we should add some code that checks if there are spkg's breaking
> this rule.

Definitely, two packages should not overwrite the same files.

So what you have in mind is:

* automatically generate the list of files from *sequential* builds,
store it in spkg (or possibly somewhere else)

* use this in all default builds (either parallel or sequential), Sage
would store the list of files somewhere, and use it for uninstall

Is that right?

Ondrej
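
The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not how Sage does it: the helper names `snapshot`, `record_install` and `uninstall` are invented here, and the "compare the install prefix before and after" approach is only one way to generate the list. It also shows why generation needs a sequential build: if another package writes into the prefix at the same time, its files land in the wrong list.

```python
import os


def snapshot(prefix):
    """Return the set of file paths under prefix, relative to prefix."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(prefix):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, prefix))
    return found


def record_install(prefix, install):
    """Run install() and return the sorted list of files it created.

    Only valid when nothing else writes to prefix at the same time,
    i.e. for a sequential build.
    """
    before = snapshot(prefix)
    install()
    return sorted(snapshot(prefix) - before)


def uninstall(prefix, manifest):
    """Delete every file named in manifest that still exists."""
    for relpath in manifest:
        path = os.path.join(prefix, relpath)
        if os.path.isfile(path):
            os.remove(path)
```

Note that a before/after comparison only sees newly created files; a file that already existed and was overwritten by the install would not show up in the list.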

John H Palmieri

May 12, 2011, 8:38:11 PM
to sage-...@googlegroups.com


On Thursday, May 12, 2011 3:18:29 PM UTC-7, Ondrej Certik wrote:
> On Thu, May 12, 2011 at 5:29 AM, Maarten Derickx
> <m.derick...@gmail.com> wrote:
> [...]
>>>
>>> What would be the advantage of having it in the SPKG itself?
>>>
>> That it will be compatible with parallel building, as mentioned earlier.
>>
>>> Like if you want to install two packages that overwrite the same file?
>> I don't think we should ever do (or even want) such a thing. I think
>> every spkg should only touch its own files (or else we will get into
>> an unpredictable mess if we also want uninstall and parallel
>> building). If you really want an spkg to touch a file created by, for
>> example, foo.spkg, you should instead make a patch for foo.spkg.
>> Maybe we should add some code that checks if there are spkg's breaking
>> this rule.
>
> Definitely, two packages should not overwrite the same files.

Does ATLAS sometimes (depending on the OS) overwrite files from the lapack installation?

Certainly *if* two packages can affect the same files, their dependencies should reflect it, so for example, ATLAS depends on lapack.

--
John

Francois Bissey

May 12, 2011, 8:44:15 PM
to sage-...@googlegroups.com

That's certainly a valid question! We usually first install blas, which
produces a libf77blas, then lapack, which produces liblapack. Finally it is
ATLAS's turn. ATLAS, if I am not mistaken, simply overwrites the libf77blas
from blas. Then it takes liblapack and modifies it.
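
The install order described above can be derived from declared dependencies rather than maintained by hand. The sketch below uses Python's standard `graphlib` module; the dependency edges are an assumption read off the order given in this message (blas, then lapack, then ATLAS), not taken from Sage's actual dependency file.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Package -> packages that must be installed first.
# Assumed edges, based only on the install order described above.
depends_on = {
    "blas": set(),
    "lapack": {"blas"},
    "atlas": {"blas", "lapack"},
}

# static_order() yields every package after all of its prerequisites.
build_order = list(TopologicalSorter(depends_on).static_order())
print(build_order)
```

As long as ATLAS lists both blas and lapack as prerequisites, even a parallel build cannot schedule it before the files it overwrites or modifies exist.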

Note that the current plans that we have with Volker for blas/lapack/atlas
involve getting rid of the blas spkg and of the individual lapack spkg and
building {f77,c}blas and lapack in one go in the ATLAS spkg.

Francois


Maarten Derickx

May 13, 2011, 5:05:19 AM
to sage-devel
> So what you have in mind is:
>
> * automatically generate the list of files from *sequential* builds,
> store it in spkg (or possibly somewhere else)
>
> * use this in all default builds (either parallel or sequential), Sage
> would store the list of files somewhere, and use it for uninstall
>
> Is that right?
You are right. But I don't see this as the final stage. This is just a
step that is slightly less invasive than giving each SPKG its own
version-dependent directory in which it can install its files (like
Robert Bradshaw mentioned) so we can have multiple versions of the
same SPKG installed and uninstalling is just the removal of a
directory. Of course we should also find a solution for the python
eggs so that they don't overwrite each other (maybe use virtualenv or
buildout for this).
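
To make the "uninstall is just the removal of a directory" idea concrete, here is a small sketch. The `name-version` directory layout and the helper names `package_dir` and `uninstall` are assumptions chosen for illustration; the thread does not settle on a layout.

```python
import os
import shutil


def package_dir(root, name, version):
    """Directory that belongs to exactly one version of one package."""
    return os.path.join(root, "{}-{}".format(name, version))


def uninstall(root, name, version):
    """Remove a package by deleting its private directory, if present."""
    shutil.rmtree(package_dir(root, name, version), ignore_errors=True)
```

Two versions of the same package then live side by side under `root`, and removing one leaves the other untouched. What this sketch leaves out is the hard part mentioned above: making everything else (for example the Python packages) find files in per-package directories.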

But as you can see, my idea of the final stage is much more involved
and has a lot more complications than your simple idea to just keep
track of which files belong to which SPKG and create some uninstall
based on that. So independent of whatever we might decide the best
final situation to be, we should first implement your idea in a
parallel-build-compatible way so we have uninstall.

Kind regards
Maarten Derickx
>
> Ondrej

William Stein

May 13, 2011, 11:24:11 AM
to sage-...@googlegroups.com
On Tue, May 10, 2011 at 3:35 PM, John H Palmieri <jhpalm...@gmail.com> wrote:
>
>
> On Tuesday, May 10, 2011 2:38:38 PM UTC-7, Ondrej Certik wrote:
>>
>> On Tue, May 10, 2011 at 7:19 AM, Volker Braun <vbrau...@gmail.com>
>> wrote:
>> > IMHO the list of installed files is an integral piece of package
>> > management and should explicitly be part of the spkg. Automatically
>> > generating it is not an option during parallel compilation. There
>> > should be a "spkg-files" or
>>
>> That's a good point, didn't occur to me, that it won't work for
>> parallel compilation.
>>
>> Does Sage work with parallel installation of packages?
>
> Absolutely.  Do:
>
> $ export SAGE_PARALLEL_SPKG_BUILD=yes
> $ export MAKE='make -j8'
> $ make
>
> See the installation guide for information about the relevant environment
> variables. I think that we should set SAGE_PARALLEL_SPKG_BUILD to "yes"
> automatically -- it works very well, according to everyone I've talked to
> about it.

And short of making it the default, at least it could be documented in
the README file or the Makefile (in addition to the install guide):

wstein@ubuntu:~/sage-4.7$ grep "SAGE_PARALLEL" *
wstein@ubuntu:~/sage-4.7$


>
> --
> John
>

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

kcrisman

May 13, 2011, 2:45:56 PM
to sage-devel

> > $ export SAGE_PARALLEL_SPKG_BUILD=yes
> > $ export MAKE='make -j8'
> > $ make
>
> > See the installation guide for information about the relevant environment
> > variables. I think that we should set SAGE_PARALLEL_SPKG_BUILD to "yes"
> > automatically -- it works very well, according to everyone I've talked to
> > about it.
>
> And short of making it the default, at least it could be documented in
> the README file or the Makefile (in addition to the install guide):
>
> wstein@ubuntu:~/sage-4.7$ grep "SAGE_PARALLEL" *
> wstein@ubuntu:~/sage-4.7$
>

Yeah, I always have to look it up on the internet to see if it's
SAGE_PARALLEL_SPKG_BUILD or SAGE_SPKG_PARALLEL_BUILD. Maybe both
should be allowed... or other permutations even...

- kcrisman

Dr. David Kirkby

May 13, 2011, 4:05:58 PM
to sage-...@googlegroups.com

I don't think we should permit combinations - it makes testing for the variable
more difficult. When do we stop if we allow any combination of variable names?

IMHO, we should document this in a few places, so people are more likely to find
it. Perhaps in the top level README.txt.
