can sage binaries be stripped (made smaller)

emil

Jan 6, 2011, 2:59:32 PM
to sage-devel
I created a binary package for use on a Live CD with the
commands

export SAGE_FAT_BINARY="yes"
make
./sage -bdist x.y.z-fat

Testing seems ok (I am still unsure about sage -testall --optional,
see http://groups.google.com/group/sage-support/browse_thread/thread/d3ffa3501849162b).
Now I have a question:

Is there any recommended procedure to make the resulting package
smaller? I checked the package and found, for example, that the
dynamic libraries are not stripped.

I also found duplicate files in the directory tree, e.g.
/devel/sage-main/build/lib.linux-i686-2.6 and
/devel/sage-main/build/sage
/devel/sage-main/sage

contain lots of identical files. These are 100 MB+ directories, so no
peanuts.

/devel/sage-main/build/temp.linux-i686-2.6 contains object files of
the libraries.

I assume those directories are partly needed only for development
purposes, so maybe it would make sense to split them into a separate
DEV package.

I know almost nothing about the inner workings of sage, but clearly
size was not a constraint in its design.
(no criticism, that is also a valid approach)

However, there might be situations where a smaller size would
open new possibilities. So let me ask whether there are any ideas around
to reduce the size of the final binaries (remove possibly unneeded files,
strip/upx executables), or whether such approaches are in vain because it
is not possible to shrink the binaries?

Thanks in advance for advice
emil

Eviatar

Jan 6, 2011, 7:17:46 PM
to sage-devel
I was also wondering about this. I think a lightweight binary
distribution without the entire devel directory could be useful (of
course as an alternative to the existing binary download). It would
decrease the download size dramatically.

I imagine that the reason this is not done is so development is "at
hand" to any user, and so there would be more chance of them
contributing. I think this is true. After all, most users, given a
choice of a normal version and a developer version which is double the
size, would probably download the former.

Robert Bradshaw

Jan 6, 2011, 8:43:21 PM
to sage-...@googlegroups.com
On Thu, Jan 6, 2011 at 4:17 PM, Eviatar <eviat...@gmail.com> wrote:
> I was also wondering about this. I think a lightweight binary
> distribution without the entire devel directory could be useful (of
> course as an alternative to the existing binary download). It would
> decrease the download size dramatically.

Note that you can't just get rid of the devel directory, as that's
where the compiled files sit as well as the sources. There are a lot of
hard and symbolic links lying around. Having the source around is
important for debugging and tracebacks (not just for development).

> I imagine that the reason this is not done is so development is "at
> hand" to any user, and so there would be more chance of them
> contributing. I think this is true.

+1, which has been essential to getting Sage where it is.

> After all, most users, given a
> choice of a normal version and a developer version which is double the
> size, would probably download the former.

The binaries do have empty spkgs, but I find it interesting that in
this day and age people would find, say (optimistically) 200MB small
enough but 350MB too big. Size is a concern, but mostly comes into
play when deciding whether it's worth including an spkg or some
pre-computed data.

- Robert

Justin C. Walker

Jan 7, 2011, 12:48:36 AM
to sage-...@googlegroups.com

On Jan 6, 2011, at 11:59 , emil wrote:

> Is there any recommended procedere to make the resulting package
> smaller? I checked the package and e.g. found that dynamic libraries
> are not stripped.

I think you'll find that stripping (dynamic) libraries is counter-productive. But try it and let us know how it works :-}

> Also I found duplicate file entries in the directory tree. e.g.
> /devel/sage-main/build/lib.linux-i686-2.6 and
> /devel/sage-main/build/sage
> /devel/sage-main/sage
>
> contain lots of identical files. This are 100 MB+ directories, so no
> peanuts.

I think the issue is that, without the devel directory set up as it is, actual development won't work.

I suppose it's worth considering a "really binary release" that is of use only for those that don't intend to add to the Sage project.

Justin

--
Justin C. Walker, Curmudgeon-At-Large
Institute for the Enhancement of the Director's Income
--------
When LuteFisk is outlawed,
Only outlaws will have LuteFisk
--------

emil

Jan 7, 2011, 9:53:12 AM
to sage-devel
Oh, thanks for the multiple responses. Some comments:

@ Robert Bradshaw:
> There's a lot of
> hard and symbolic links laying around. Having the source around is
> important for debugging and tracebacks (not just for development).

Right at the moment it's not just links but real files, some of them
in 3 directories.
At some point (in the future?) there might be a stable core
functionality, so debugging may become less important in the vanilla
installation.

> > I imagine that the reason this is not done is so development is "at
> > hand" to any user, and so there would be more chance of them
> > contributing. I think this is true.
>
> +1, which has been essential to getting Sage where it is.

Contribution can also involve other areas (Documentation, Promotion,
Testing, Application, Teaching). I think the vast majority of
potential users will not contribute code to the core of sage - at
least not in the beginning ...

> The binaries do have empty spkgs, but I find it interesting that in
> this day and age people would find, say (optimistically) 200MB small
> enough but 350MB too big.

Right at the moment the binary distribution produced by the default
procedure is 449 MB in a squashed filesystem for me. It was 387 MB in
version 4.3.1 - this is a significant increase in 6 months. The 350 MB
can be reached by lzma-ing (as far as I know), but that is not usable
in my install.
"Size" is also not only about "download size".

Leaving aside "too big" or "small enough", let me put it this way: maybe
it is possible to make it smaller without losing functionality for
normal operation.

> Size is a concern, but mostly comes into
> play when deciding whether it's worth including an spkg or some
> pre-computed data.

Idea: if the core is slim, one can add extra spkgs (specialised by
application field).

@ Justin C. Walker:
> I think you'll find that stripping (dynamic) libraries is counter-productive. But try it and let us know how it works :-}

I am pretty ignorant. I am using Puppy Linux, which has the character
and flair of a hobby distro, but it surely has some merit in
offering great functionality using minimal space (250 MB with a
complete set of applications and a full development environment). If I
look into the lib folders I see only stripped dynamic libraries; I
guess there is a reason for that.
What would be the disadvantage of stripped libraries? As far as I know,
debugging information etc. is omitted. What else?
And I will try it :-) !

> I suppose it's worth considering a "really binary release"

I could make good use of such a package for the live CD. I intend to
dig into it and try to make one, plus an additional "sage_dev"
package (mid-term goal: automate the procedure for future releases).
But for the uninitiated it is not obvious which
files and folders serve which purpose. I hesitate to start one of my
usual trial-and-error approaches.

Just take the mentioned directories as examples:
Intuitively I would assume that /devel/sage-main/sage contains the "real"
binaries and
everything in /devel/sage-main/build is necessary for development.
Right or wrong?

> that is of use only for those that don't intend to add to the Sage project.
As expressed above: there might be additions and contributions beyond
code ...

I would be grateful for any advice about the structure of the sage
directory tree. Is it realistic to build such a "dev" package? If yes,
which files can be moved into it?

emil

William Stein

Jan 7, 2011, 11:18:03 AM
to sage-...@googlegroups.com
On Fri, Jan 7, 2011 at 6:53 AM, emil <emil.w...@gmail.com> wrote:
> just take the mentioned directories as examples:
> Intuitively I would assume /devel/sage-main/sage contains the "real"
> binaries
> everything in /devel/sage-main/build is necessary for development.
> Right or wrong?

This is wrong, as you might discover if you delete that directory,
then try to use Sage. There is a symlink from

SAGE_ROOT/local/lib/python/site-packages/sage to
SAGE_ROOT/devel/sage/build/sage

The best solution to the problem we're discussing in the context of
SAGE_ROOT/devel/sage would be to change the Sage library to build all
.so's in place, so that the SAGE_ROOT/devel/sage/build would in fact
be redundant, and there would only ever be exactly one copy of each
file (or .so) in SAGE_ROOT/devel/sage. This is what happens with
"python setup.py develop" or "setup.py build_ext --inplace <list of
extension names>", when using setuptools instead of distutils. For
example, with the Sage Notebook and PSAGE
(http://code.google.com/p/purplesage/), which both use setuptools, we
do all of our work this way.

Another plus to this change is that if foo is a function and you type:

sage: foo??

you'll see the actual location of the file you should edit to modify
foo, rather than some highly misleading file in SAGE_ROOT/local (which
is a major source of confusion at first).

That said, it could be significant and painfully disruptive work for
somebody to change the core Sage library to work this way. I hope it
happens at some point. Confusion will be reduced overall, as will
disk space usage.

William

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

emil

Jan 7, 2011, 4:37:49 PM
to sage-devel


On Jan 7, 4:18 pm, William Stein <wst...@gmail.com> wrote:
> The best solution to the problem we're discussing in the context of
> SAGE_ROOT/devel/sage would be to change the Sage library to build all
> .so's in place, so that the SAGE_ROOT/devel/sage/build would in fact
> be redundant, and there would only ever be exactly one copy of each
> file (or .so) in SAGE_ROOT/devel/sage.  

Would that mean rewriting sage build scripts?

>This is what happens with
> "python setup.py develop" or "setup.py build_ext --inplace <list of
> extension names>", when using setuptools instead of distutils.    For
> example, with the Sage Notebook and PSAGE
> (http://code.google.com/p/purplesage/), which both use setuptools, we
> do all of our work this way.

I guess you use a core of sage for your PSAGE, does it mean
work could concentrate on the packages you have excluded from psage?

> Another plus to this change is that if foo is a function and you type:
>
>    sage: foo??
>
> you'll see the actual location of the file you should edit to modify
> foo, rather than some highly misleading file in SAGE_ROOT/local (which
> is a major source of confusion at first).
>
> That said, it could be significant and painfully disruptive work for
> somebody to change the core Sage library to work this way.
Do you have a rough estimate how long it takes for someone with no
prior knowledge of setuptools?
>  I hope it
> happens at some point.   Confusion will be reduced overall, as will
> disk space usage.
A rough estimate of the size reduction?
>
> William

Thanks for specific info!
After such a change, would there still be some files which could be
regarded as "needed for development only"? Do you think there is room
for a "binary only" version of sage, or do you prefer that every sage
ships with the sources?

William Stein

Jan 7, 2011, 5:55:17 PM
to sage-...@googlegroups.com
On Fri, Jan 7, 2011 at 1:37 PM, emil <emil.w...@gmail.com> wrote:
>
>
> On Jan 7, 4:18 pm, William Stein <wst...@gmail.com> wrote:
>> The best solution to the problem we're discussing in the context of
>> SAGE_ROOT/devel/sage would be to change the Sage library to build all
>> .so's in place, so that the SAGE_ROOT/devel/sage/build would in fact
>> be redundant, and there would only ever be exactly one copy of each
>> file (or .so) in SAGE_ROOT/devel/sage.
>
> Would that mean rewriting sage build scripts?

Yes. It would be a nontrivial project. I hope somebody does it sometime!

>
>>This is what happens with
>> "python setup.py develop" or "setup.py build_ext --inplace <list of
>> extension names>", when using setuptools instead of distutils.    For
>> example, with the Sage Notebook and PSAGE
>> (http://code.google.com/p/purplesage/), which both use setuptools, we
>> do all of our work this way.
>
> I guess you use a core of sage for your PSAGE, does it mean
> work could concentrate on the packages you have excluded from psage?

By "PSAGE" I meant http://code.google.com/p/purplesage/, which is a
separate Python package, which itself has nothing to do with anything
currently in Sage. It's just a new Python package that depends on
Sage.


>
>> Another plus to this change is that if foo is a function and you type:
>>
>>    sage: foo??
>>
>> you'll see the actual location of the file you should edit to modify
>> foo, rather than some highly misleading file in SAGE_ROOT/local (which
>> is a major source of confusion at first).
>>
>> That said, it could be significant and painfully disruptive work for
>> somebody to change the core Sage library to work this way.
>
> Do you have a rough estimate how long it takes for someone with no
> prior knowledge of setuptools?

It depends entirely on how good that person is, and what they know
about Sage. If they are sufficiently smart, maybe 1 week? It's
impossible to know what issues might arise though until one tries.


>>  I hope it
>> happens at some point.   Confusion will be reduced overall, as will
>> disk space usage.
> A Rough estimate of size reduction?

I estimate 189MB, i.e., the size of SAGE_ROOT/devel/sage/sage/.

Robert Bradshaw

Jan 8, 2011, 1:57:34 AM
to sage-...@googlegroups.com

This would of course mean no reasonable tracebacks, if the source
couldn't be found....

- Robert

William Stein

Jan 8, 2011, 2:54:28 AM
to sage-...@googlegroups.com

No, you're wrong. The source would be found, and the tracebacks would
be excellent.

Observe the following:

deep:sage wstein$ pwd
/Users/wstein/sd27/sage-4.6.1.rc1/devel/sage
deep:sage wstein$ ls -lh build/sage/rings/arith.py
-rw-r--r-- 1 wstein staff 127K Jan 5 00:22 build/sage/rings/arith.py
deep:sage wstein$ ls -lh sage/rings/arith.py
-rw-r--r-- 1 wstein staff 127K Jan 5 00:22 sage/rings/arith.py

Notice that there are two distinct but identical files. This is a
waste of disk space. This can be fixed with absolutely no reduction in
functionality or capabilities of Sage, e.g., by switching to using
setuptools and building the so modules in place.

I think I was wrong in my space saving estimate (189MB) above. The
SAGE_ROOT/devel/sage directory in sage-4.6.1.rc1 is 768MB:

deep:sage wstein$ du -sch *
4.0K MANIFEST.in
4.0K PKG-INFO
4.0K README.txt
367M build
4.0K bundle
4.5M c_lib
207M doc
4.0K export
4.0K install
68K module_list.py
40K module_list.pyc
4.0K pull
189M sage
4.0K sage-push
40K setup.py
4.0K spkg-delauto
4.0K spkg-dist
4.0K spkg-install
768M total


------

For Sage to run, I think all that's *needed* is:

SAGE_ROOT/devel/sage/build/sage

since that is exactly the thing that
SAGE_ROOT/local/lib/python/site-packages/sage is linked to, so it is
all that Python sees at runtime. This directory is 128MB:

deep:sage wstein$ du -sch build/sage
128M build/sage
128M total


-----

So one could strip 640MB away from SAGE_ROOT/devel/sage (everything
except build/sage) and still have a Sage that runs. And this would have
docstring introspection even for Cython code, since the .pyx files are in
SAGE_ROOT/devel/sage/build/sage.
The autogenerated C/C++ files are not there, so development ("sage
-br") would be painfully slow the first time, as these files would all
get regenerated. The SAGE_ROOT/devel/sage/doc/ directory would also
be missing, which is used by the notebook *only* when browsing the
manuals (not for function_name? introspection).

I didn't figure out the exact savings from just switching to
setuptools, but I think it would be at least 240MB.
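
To put numbers on savings like these, one needs a size measurement that counts hard-linked files only once. Here is a minimal Python sketch, assuming apparent file sizes are good enough (du reports disk blocks, so its totals will differ somewhat). The function name tree_size is made up for this illustration.

```python
import os
import stat


def tree_size(root):
    """Return the total apparent size in bytes of regular files under root.

    Files that share an inode (hard links) are counted once, and symbolic
    links are neither followed nor counted.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for a file already counted
            seen.add(key)
            total += st.st_size
    return total
```

For example, tree_size("devel/sage") minus tree_size("devel/sage/build/sage") would approximate the part that could be dropped, provided the two trees do not share hard links.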

-- William

François Bissey

Jan 8, 2011, 3:14:16 AM
to sage-...@googlegroups.com
I guess I am reading this and I am amused. Christopher may be amused too.
From our sage ebuild:
# rebuild in place
sed -i "s:SAGE_DEVEL + 'sage/sage/ext/interpreters':'sage/ext/interpreters':g" \
    setup.py || die "failed to patch interpreters path"

and then we just use setuptools.

Francois

Robert Bradshaw

Jan 8, 2011, 4:03:36 AM
to sage-...@googlegroups.com

I was talking about a completely stripped binary (i.e no source).

> Observe the following:
>
> deep:sage wstein$ pwd
> /Users/wstein/sd27/sage-4.6.1.rc1/devel/sage
> deep:sage wstein$ ls -lh build/sage/rings/arith.py
> -rw-r--r--  1 wstein  staff   127K Jan  5 00:22 build/sage/rings/arith.py
> deep:sage wstein$ ls -lh sage/rings/arith.py
> -rw-r--r--  1 wstein  staff   127K Jan  5 00:22 sage/rings/arith.py
>
> Notice that there are two distinct but identical files.  This is a
> waste of disk space. This can be fixed with absolutely no reduction in
> functionality or capabilities of Sage, e.g., by switching to using
> setuptools and building the so modules in place.

I agree 100% here.

I don't even think we need setuptools--just build extensions inplace
(there's an option for this) and put devel/sage in the python path
(this was what I did for enabling concurrent execution of distinct
branches, though that still needs a bit of work
http://trac.sagemath.org/sage_trac/ticket/9967). Then one wouldn't
even need to do a "sage -b" to work on (non-Cython) code.
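
The mechanism behind this is plain Python import resolution: whichever directory comes first on sys.path supplies the module, and the module's __file__ attribute then points at the file one would actually edit. Here is a small sketch of that behaviour. The throwaway directory stands in for SAGE_ROOT/devel/sage, and the package name demo_pkg and the helper import_from_tree are invented for the illustration.

```python
import importlib
import os
import sys
import tempfile


def import_from_tree(tree, package):
    """Import `package` with `tree` placed first on sys.path.

    The returned module's __file__ shows which copy of the source
    Python actually loaded.
    """
    sys.path.insert(0, tree)
    importlib.invalidate_caches()
    return importlib.import_module(package)


# Hypothetical stand-in for the source tree: one tiny package.
tree = tempfile.mkdtemp()
os.mkdir(os.path.join(tree, "demo_pkg"))
with open(os.path.join(tree, "demo_pkg", "__init__.py"), "w") as f:
    f.write("def foo():\n    return 'edited in place'\n")

mod = import_from_tree(tree, "demo_pkg")
print(mod.__file__)  # a path inside `tree`, not a copy elsewhere
print(mod.foo())
```

Compiled extensions are a separate matter: they would still have to be built next to their sources for this to work.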

- Robert

emil

Jan 8, 2011, 6:54:33 AM
to sage-devel
> > So one could strip 640MB away from SAGE_ROOT/devel/sage (everything
> > except build/sage) and still have a Sage that runs.

size -30% with almost no loss of functionality!
 
> > SAGE_ROOT/devel/sage/build/sage.
> > The autogenerated C/C++ files are not there, so development ("sage
> > -br") would be painfully slow the first time, as these files would all
> > get regenerated. The SAGE_ROOT/devel/sage/doc/ would also
> > be missing, which is used by the notebook *only* when browsing the
> > manuals (not for function_name? introspection).

Could those files be split off into a "DEV" and a "DOC" package, respectively, for
optional download?

> I don't even think we need setuptools--just build extensions inplace
> (there's an option for this) and put devel/sage in the python path
> (this was what I did for enabling concurrent execution of distinct
> branches, though that still needs a bit of work
> http://trac.sagemath.org/sage_trac/ticket/9967). Then one wouldn't
> even need to do a "sage -b" to work on (non-Cython) code.

So there is a chance for major size reduction AND improved
functionality (concurrent execution) - sounds exciting!

emil

Jan 10, 2011, 4:22:07 AM
to sage-devel
sagelive-511-46-r3.iso released
I have built the Live CD distribution in the conventional way; the
iso size is 630 MB (note: OS with applications, complete Sage with
jsmathfonts, Java and working R plotting)

release announcement is here http://groups.google.com/group/sage-support/browse_thread/thread/38f7d7172fd3113a

This is just to stress my point that in some applications size
matters. Sage makes up roughly 80% of the size of the distribution.

It is hard to estimate the size of the distribution with
the discussed changes in the build process implemented, because it is
all packed into a squashed filesystem, but I guess the saving could be
up to 50 MB. Maybe RAM allocation would be better too without the
duplicate files, i.e. it would run better on older hardware.

Harald Schilly

Jan 10, 2011, 3:40:02 PM
to sage-...@googlegroups.com
On Thursday, January 6, 2011 8:59:32 PM UTC+1, emil wrote:

contain lots of identical files....

Here is a line for bash that shows you identical files:

find . -type f -exec md5sum {} \; | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate
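
The same search can be sketched in Python, which avoids depending on GNU-specific uniq options. This is only an illustration: the function names are invented here, and it hashes with SHA-256 rather than MD5, grouping files by size first so that only potential duplicates are read.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # ignore symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Note that two names that are already hard links to one file also show up as a duplicate group here.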

H

Volker Braun

Jan 10, 2011, 5:09:19 PM
to sage-...@googlegroups.com
You can use fslint to identify duplicate files and merge them (i.e. hardlink them to a single file).

[vbraun@volker-two ~]$ /usr/share/fslint/fslint/findup --help
find dUPlicate files.
Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]

If no path(s) specified then the currrent directory is assumed.


When -m is specified any found duplicates will be merged (using hardlinks).
When -d is specified any found duplicates will be deleted (leaving just 1).
When -t is specfied, only report what -m or -d would do.

When --summary is specified change output format to include file sizes.
You can also pipe this summary format to /usr/share/fslint/fslint/fstool/dupwaste
to get a total of the wastage due to duplicates.

Examples:

search for duplicates in current directory and below
    findup or findup .
search for duplicates in all linux source directories and merge using hardlinks
    findup -m /usr/src/linux*
same as above but don't look in subdirectories
    findup -r .
search for duplicates in /usr/bin
    findup /usr/bin
search in multiple directories but not their subdirectories
    findup -r /usr/bin /bin /usr/sbin /sbin
search for duplicates in $PATH
    findup $(/usr/share/fslint/fslint/supprt/getffp)
search system for duplicate files over 100K in size
    findup / -size +100k
search only my files (that I own and are in my home dir)
    findup ~ -user $(id -u)
search system for duplicate files belonging to roger
    findup / -user $(id -u roger)

emil

Jan 11, 2011, 5:32:18 AM
to sage-devel
On Jan 10, 10:09 pm, Volker Braun <vbraun.n...@gmail.com> wrote:
> you can use fslint to identify duplicate files and merge them (i.e. hardlink
> them to a single file).

I tried to follow your path but was not able to use fslint. I wanted
to compile it from source, but it needs pygtk, and I was unable to
compile that package.

What is the recommended build procedure if I want to add Python
modules to the Python of sage? But this is another interesting
topic ...

I then used fdup and got a list of roughly 8000 files with double or
triple occurrence in the sage build tree. Is it safe to replace all
duplicate occurrences by links? I might try to write a script that uses
the fdup output to do this.
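
A rough sketch of what such a script could look like in Python, given one group of paths already reported as duplicates. The function name hardlink_group and the temporary-name suffix are invented here. One thing is certain about the result: hard-linked names share a single file, so editing one of them in place changes all of them, which matters for a tree that is still being developed.

```python
import filecmp
import os


def hardlink_group(paths):
    """Replace every file in `paths` after the first by a hard link to the first.

    Each candidate is compared byte for byte before it is replaced, and the
    link is created under a temporary name first, so a failure part-way
    does not lose the original file. Returns the sum of the sizes of the
    replaced files.
    """
    keep = paths[0]
    keep_stat = os.stat(keep)
    freed = 0
    for dup in paths[1:]:
        st = os.stat(dup)
        if (st.st_dev, st.st_ino) == (keep_stat.st_dev, keep_stat.st_ino):
            continue  # already the same file
        if not filecmp.cmp(keep, dup, shallow=False):
            continue  # contents differ, leave it alone
        tmp = dup + ".hardlink-tmp"
        os.link(keep, tmp)      # new name for the kept file
        os.replace(tmp, dup)    # atomically swap it in for the duplicate
        freed += st.st_size
    return freed
```

All paths in a group must live on the same filesystem, since hard links cannot cross filesystem boundaries.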

Robert Bradshaw

Jan 11, 2011, 4:49:12 PM
to sage-...@googlegroups.com

I'd be surprised if squashfs doesn't already do file/block deduping,
which would erase all potential gains. It'd also probably be a risky
thing to do on an actively developed Sage build, but perhaps OK for a
non-development binary.

- Robert

emil

Jan 12, 2011, 2:38:05 AM
to sage-devel


On Jan 11, 9:49 pm, Robert Bradshaw <rober...@math.washington.edu>
wrote:
In my earlier post I reported that I am unable to compile fslint. Of
course I failed horribly to make the fslint-Gtk-GUI work, but that
just reveals my ignorance, because fslint consists basically of a few
bash scripts. I hadn't looked into the subdirectories of the source.
So after I figured that out I made some stripping experiments.

I used bash scripts of package fslint, http://www.pixelbeat.org/fslint/

Summary of my experience:
---------------------------
Sage directory tree original: 1662 MB

./findup --summary SAGE_ROOT | ./fstool/dupwaste
Output: Total wasted Space 290 MB

./findup -m SAGE_ROOT
Directory tree after replacing duplicate files with links: 1429 MB
(-233 MB)
sage -testall passed 0 errors

Then stripping binaries:
cd SAGE_ROOT
find . | xargs file | grep "executable" | grep ELF | cut -f 1 -d : |
    xargs strip --strip-unneeded 2> /dev/null
this gives 1392 MB (-270 MB)
find . | xargs file | grep "shared object" | grep ELF | cut -f 1 -d : |
    xargs strip --strip-unneeded 2> /dev/null
this gives 1224 MB (-438 MB)

This is from 1662 MB to 1224 MB
-26 % in a first try.
sage -testall passed 0 errors
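
The find/file/grep pipelines above can break on file names containing spaces or colons. A hedged Python sketch of the same idea follows, with invented function names. It detects ELF files by their four magic bytes instead of parsing the output of file, so unlike the grep filters it also picks up relocatable .o files. It only calls strip when dry_run is switched off.

```python
import os
import subprocess

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """True if path is a regular file that starts with the ELF magic bytes."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def elf_files(root):
    """Yield the ELF files found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_elf(path):
                yield path


def strip_tree(root, dry_run=True):
    """Run `strip --strip-unneeded` on each ELF file; return the paths visited."""
    paths = list(elf_files(root))
    if not dry_run:
        for path in paths:
            # check=False: keep going if strip rejects an individual file
            subprocess.run(["strip", "--strip-unneeded", path], check=False)
    return paths
```

Calling strip_tree(root) with the default dry_run=True just lists what would be stripped, which is a cheap way to review the candidates first.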

I tried this on 3 different copies of the sage tree. It puzzles me
that I got slightly different sizes each time, though in the same range.

Squashed filesystems:
SFS - Size of unstripped sage tree (1662 MB original): 451 MB
SFS - Size of stripped sage tree (1224 MB original): 374 MB
This is -77 MB or - 17 %
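
For anyone who wants to redo the arithmetic with their own measurements, the percentages are simply the saving divided by the original size. A tiny sketch using the figures quoted above (the helper name reduction is made up):

```python
def reduction(before_mb, after_mb):
    """Return (saved MB, saved percent of the original size)."""
    saved = before_mb - after_mb
    return saved, 100.0 * saved / before_mb


for label, before, after in [
    ("directory tree", 1662, 1224),
    ("squashed filesystem", 451, 374),
]:
    saved, percent = reduction(before, after)
    print(f"{label}: -{saved} MB ({percent:.1f}%)")
```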

More than I expected ...
I still have to test whether the sfs package works in a fresh install.

If this proves stable, I would recommend considering
patching the binary packages this way.

emil




emil

Jan 12, 2011, 7:50:47 AM
to sage-devel
> I used bash scripts of package fslint, http://www.pixelbeat.org/fslint/
OK, I tried the small package in a fresh install (of my base
distribution). It works flawlessly (sage -testall gives 1 error with
preparser-py, but that is because I don't have the development
package installed). It also seems somewhat faster (stripped binaries):
I usually needed around 10,000 sec for testall; this time it was 8,750
sec.

Also some info for kcrisman about R-plotting:
http://trac.sagemath.org/sage_trac/ticket/8868
This binary package works for me without having xorg-dev packages/
headers installed, i.e.
sage -sh
R
demo(graphics)
produces the demo-plots. During creation of the binaries the headers
are needed, but not for execution.

Harald Schilly

Jan 12, 2011, 8:08:26 AM
to sage-...@googlegroups.com, Emil Widmann
On Wed, Jan 12, 2011 at 08:38, emil <emil.w...@gmail.com> wrote:
> If this proves stable I would recommend to consider that
> binary packages should be patched this way.

I think that's definitely something we should look into! Could you
post a bash script? Somebody also has to check whether it is still
possible to do development, i.e. apply some patches, change Cython
files, run sage -b and then check it again.

H

emil

Jan 12, 2011, 8:32:47 AM
to sage-devel
> I think that's definitely something we should look into! Could you
> post a bash script? What somebody also has to check is, if it is still
> possible to do development. i.e. apply some patches, changing cython
> files, sage -b and then checking it again.

Hi Harald,

This is the procedure suggested by Volker Braun, i.e. just replace all
duplicate files with hardlinks
+ strip --strip-unneeded for all binaries and libraries. All Python
sources are left in the tree, so tracebacks work.
I have never done sage -b or any development in sage itself, so it's
better if somebody else checks this.

I am not so good at bash scripting, but it is basically just the
4 commands above
(put the fslint scripts in the path; actually only the findup script
is needed):

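So the prototype is (untested):
#!/bin/sh
# SAGE_ROOT must be set to the top of the Sage tree
cd "$SAGE_ROOT" || exit 1
# merge duplicate files into hard links
findup -m .
# strip ELF executables, then ELF shared objects
find . | xargs file | grep "executable" | grep ELF | cut -f 1 -d : |
    xargs strip --strip-unneeded 2> /dev/null
find . | xargs file | grep "shared object" | grep ELF | cut -f 1 -d : |
    xargs strip --strip-unneeded 2> /dev/null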

and, for me in the Live CD distribution, an additional step to create an sfs
package (squashed file system):
cd ..
dir2sfs SAGE_ROOT
(basically the command mksquashfs SAGE_ROOT)

The whole procedure takes 5 - 10 minutes for me on a vanilla PC.
Greetings to Vienna
emil

Robert Bradshaw

Jan 12, 2011, 11:17:48 AM
to sage-...@googlegroups.com

+100 There would have to be *significant* savings before I would want
to make it so people would have to download a new version of Sage to
create patches and otherwise develop Sage. Also, would this make
editing files and not doing sage -b have effects?

- Robert

emil

Jan 12, 2011, 11:39:05 AM
to sage-devel
W. Stein wrote
> For Sage to run, I think all that's *needed* is:
> SAGE_ROOT/devel/sage/build/sage

So to do a thorough job I went further with this:

Stripping by the approach of W. Stein
-------------------------------------
The second approach was as W. Stein suggested: I kept
SAGE_ROOT/devel/sage-main/build/sage and
SAGE_ROOT/devel/sage-main/c_lib

and moved the other folders out.

This gives a size of 1239 MB for the Sage tree (unstripped binaries, no
duplicate file fixing).
However, sage -testall starts to run and passes some doctests, but
soon fails.
The following tests failed:

/initrd/mnt/dev_save/Sage_Work/sage/devel/sage/doc/common # File not found
/initrd/mnt/dev_save/Sage_Work/sage/devel/sage/doc/en # File not found
/initrd/mnt/dev_save/Sage_Work/sage/devel/sage/doc/fr # File not found
/initrd/mnt/dev_save/Sage_Work/sage/devel/sage/sage # File not found
sage -t -force_lib "devel/sagenb-main/sagenb/misc/sageinspect.py"
sage -t -force_lib "devel/sagenb-main/sagenb/misc/sphinxify.py"
Total time for all tests: 109.9 seconds
...

The doc failures are clear, since I moved those directories out. The
others are failures in
sagenb, which I should not have touched - hm, did I make an error?
Anyway!

I don't want to dig into that at the moment because I think everything
would be
just a (dirty?) hack, and the preferred solution is a clean rebuild
according to the proposed solution with setuptools, or the other
method mentioned by Robert Bradshaw (build extensions in place and put
devel/sage in the Python path).

Just a few more observations:
SAGE_ROOT/devel/sagenb-main
has about 36 MB of duplicate files; I think the same arguments apply to
this directory tree.

If one wanted to keep traceback capabilities, would it be an
option to strip the comments out of the
Python sources (but keep the line numbering), or would this be a big
"No, no!" idea?
(of course with the commented Python sources as a separate package)
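
To make the idea concrete, here is a small sketch of comment stripping that keeps every line in place, using Python's standard tokenize module. It is an illustration only. The function name is invented. It removes # comments but leaves docstrings alone, since those are what introspection and doctests rely on. It keeps the first two lines untouched so that a shebang or encoding declaration survives.

```python
import io
import tokenize


def strip_comments_keep_lines(source):
    """Remove # comments from Python source without changing line numbers."""
    lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start
        if row <= 2:
            continue  # keep a possible shebang / encoding declaration
        line = lines[row - 1]
        newline = line[len(line.rstrip("\r\n")):]
        # cut the comment, drop trailing blanks, keep the line ending
        lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)
```

A line that held only a comment becomes an empty line, so a traceback pointing at line N of the stripped file still matches line N of the commented original.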

If I start with the result from -77 MB in the sfs size (with the
stripping mentioned above, and confirmed to work), there would be a
possibility to build a live CD with compilers and dev packages
included (under 700 MB), so cython could work out of the box. (Only
possible if size of the sage core remains somewhat stable in the next
releases)

I am pretty excited about the results so far :D, I hope it proves
stable enough for use in the Live CD distribution.
emil

kcrisman

Jan 12, 2011, 11:44:05 AM
to sage-devel

> Also some info for kcrisman about R-plotting: http://trac.sagemath.org/sage_trac/ticket/8868
> This binary package works for me without having xorg-dev packages/
> headers installed, i.e.
> sage -sh
> R
> demo(graphics)
> produces the demo-plots. During creation of the binaries the headers
> are needed, but not for execution.

Thanks, that helps us isolate what is going on.

Emil Widmann

Jan 15, 2011, 6:51:49 PM
to sage-devel
Stripping Sage Binaries II
--------------------------

With hardlinking duplicate files and stripping executables, a size
reduction of 438 MB (-26%) was achieved. Further reduction involves
moving directories, which breaks sage -testall. The goal is to produce
a binary package of sage with adequate functionality and reduced
size. So I couldn't resist pushing on and accepting failing tests to see
the overall potential.

Preliminary results:
-----------------------------------------------
Original Directory tree (~1900 MB)
sage-binary (775 MB / squashed FS 218 MB)
That is: -60% / -88% compared to the original size!!!
sage-dev (529 MB)
sage-doc (222 MB)

The resulting binaries seem to work in a superficial test. The
stripped binaries can be tested as a live iso or in a virtual Machine
image (also easy frugal install into existing Linux desktop possible).

Download iso image (400 MB, that's Live CD base distro + stripped
binaries)
http://boxen.math.washington.edu/home/emil/sagelithe/

Stripping procedure:
--------------------
It started with building a binary distribution on sagelive-511-46-r3
(Live CD release) - 1) (see footnote). The resulting directory tree of
this build was manually split into 3 directories:
sage-binary
sage-dev
sage-doc

The bulk of what was moved out of the original binary tree consisted
of the following directories 2):
SAGE_ROOT/devel/sage (190 MB)
SAGE_ROOT/devel/sage-main/build/lib.linux-i686-2.6/sage (78 MB)
SAGE_ROOT/devel/sage-main/build/temp.linux-i686-2.6/sage (104 MB)
SAGE_ROOT/devel/sagenb-main/dist (14 MB)
SAGE_ROOT/devel/sagenb-main/build/lib/sagenb (35 MB)
hidden directories:
SAGE_ROOT/devel/sage-main/.hg (50 MB)
SAGE_ROOT/devel/sage-main/.hg (39 MB)
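
A minimal sketch of how such a split could be scripted, assuming the moved directories should keep their relative paths inside the target package. The helper name move_out and the target directory are illustrative only and not part of Sage:

```shell
#!/bin/sh
# Sketch only: move_out is a hypothetical helper, not part of Sage.
# move_out ROOT DEST REL
#   moves ROOT/REL to DEST/REL and creates missing parent directories,
#   so the relative layout of the tree is preserved in the new package.
move_out() {
    root=$1
    dest=$2
    rel=$3
    mkdir -p "$dest/$(dirname "$rel")" &&
    mv "$root/$rel" "$dest/$rel"
}

# Example call (paths are placeholders):
# move_out /opt/sage /opt/sage-devel devel/sage-main/.hg
```

Keeping the relative paths means the development package could later be copied back over the binary tree to restore the original layout.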

After that, the stripping procedure from the first attempt (hardlink
duplicate file instances, strip binaries) was applied to the binary
directory 3).
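
To check afterwards that the hardlinking step actually did something, one could count the files that share their data with another name. This is only a sketch; count_hardlinked is a made-up helper name, and it relies on nothing more than the standard find test -links:

```shell
#!/bin/sh
# Sketch only: count_hardlinked is a hypothetical helper name.
# count_hardlinked DIR
#   prints the number of regular files below DIR that have more than one
#   hard link, i.e. files whose content is shared with at least one other name.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Example call (path is a placeholder):
# count_hardlinked /opt/sage
```

Since du counts hardlinked data only once, comparing du -sm of the tree before and after the run should show the space that was saved.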

The produced binary package worked for me in a brief test 4) in a
fresh install of the base distribution. However, sage -testall no
longer works, so it is not easy to confirm which parts of Sage might
be broken. Tracebacks seemed to work, because all the Python source
code stayed in the remaining directories.

To investigate further possibilities for reduction, I also checked the
size of the source files still present in the binary tree 5):

Total file size of Python source is: 83506845 Bytes
Total file size of lisp source is 14091345 Bytes
Total file size of C source is 5444780 Bytes
Total file size of C++ source is 163105 Bytes
Total file size of C headers is 3779884 Bytes
--------------------
Total size of source code found: 106985959 Bytes

So removing the source files would gain another 100 MB.
As I understand it, the ability to show tracebacks on errors would be
lost, and right now I fear that it would break Sage completely. Another
aspect: there are lots of comments. An educated guess at the size
of the comments in the Python code is about 40 MB. This estimate
allows for preserving the original line numbering, so tracebacks
would still yield the right line numbers. If one assumes that the C
code could also be moved out, then a reduction of 50 MB would be
possible (I don't know whether Maxima's Lisp code can be moved
out).
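
One rough way to put a number on part of that guess is to count the bytes in whole-line "#" comments. The sketch below is an assumption about how one might measure it, not the method used for the 40 MB estimate: it ignores docstrings and trailing comments, and it will also count "#" lines that happen to sit inside multi-line strings. comment_bytes is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: comment_bytes is a hypothetical helper name.
# comment_bytes DIR
#   prints the number of bytes occupied by lines that consist only of
#   optional whitespace followed by a "#" comment, over all *.py files
#   below DIR (newline characters included).
comment_bytes() {
    find "$1" -type f -name '*.py' -exec cat {} + |
        grep '^[[:space:]]*#' |
        wc -c | tr -d ' '
}

# Example call (path is a placeholder):
# comment_bytes /opt/sage/devel/sage-main
```

Blanking such lines instead of deleting them would keep every remaining statement on its original line, which is what the line numbers in tracebacks depend on.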

Regarding binaries, there would be the possibility to use UPX
compression. In the Live CD this is not needed, because the files are
already in a squashed FS. But for distributions which use uncompressed
filesystems this could give a further substantial reduction.

I had no prior knowledge of the structure of the Sage package, so
the split may not be correct and some essential files may be missing
from the binaries. It is also possible that many files and
directories could still be omitted from the binary package and moved
to one of the others.

For further work I would be grateful for any input on the
following:
Testing of the binaries; suggestions on how to implement a working
"sage -testall" for similar binaries?
Feedback on the quality of the split: which files and
directories were missed, or are now in the wrong place?
Information about the doc tree: which files are needed to make
the ? command in the CLI work?
Testing of the development abilities: how does it behave if the
development package is loaded? Can --strip-unneeded binaries be used
for development? (Otherwise it would be possible to fall back to
--strip-debug for libraries.)

Summary
-------
A substantial reduction of the size of the Sage binaries was achieved
using a combined approach of manual splitting, hardlinking duplicate
files and stripping executables. The binary package was reduced to a
size of 792 MB compared to a size of over 1900 MB for the original
directory tree. This is a reduction of roughly 60%. The size of the
squashed package went from 438 MB to 222 MB (-49%). "sage -testall"
no longer works in the reduced binary, so further
testing is needed to confirm the functionality of the created binary
package.

Footnotes:
---------
1)
#!/bin/sh
# build sage binaries for sagelive, be sure that Tcl/Tk is installed
export SAGE_MATPLOTLIB_GUI="yes"
export SAGE_FAT_BINARY="yes"
make
./sage -bdist sagelive-511-4.6.1-r4-fat

comment:
In my opinion it is important that as many features of the Sage
components as possible are available. Plotting from R and pylab
(Tcl/Tk backend) is accessible in the standard way. It has not been
possible to integrate other matplotlib backends so far; I wonder how
much they would add to the total size?

Are there any additional environment variables that should be set to
generate the binaries? The idea is to use the Sage components and
libraries as the core of the distribution and to integrate them tightly.
What do other components (e.g. Maxima) need to "work out of the box"?

2)
Text files with the du -ch output for the packages are available here:
http://boxen.math.washington.edu/home/emil/sagelithe
The doc and dev packages can be loaded as packages into the live
version.
(coming soon ...)

3)
This is the procedure to hardlink duplicate files and strip
binaries:

#!/bin/sh
# script to reduce size of directory tree and binaries, uses the package fslint (http://www.pixelbeat.org/fslint/)
# be sure to have the scripts of fslint in your path, or edit line 6 so that findup is found.
cd SAGE_ROOT
# replace duplicate files with hardlinks
findup -m .
# strip executables
find . | xargs file | grep "executable" | grep ELF | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null
# Level 1 stripping for shared libraries (comment/uncomment to switch)
# find . | xargs file | grep "shared object" | grep ELF | cut -f 1 -d : | xargs strip --strip-debug 2> /dev/null
# Level 2 stripping for shared libraries (comment/uncomment to switch)
find . | xargs file | grep "shared object" | grep ELF | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null
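
One caveat with the pipelines above: xargs splits its input on whitespace, so any path containing a space is passed to file and strip in pieces. The following is a sketch of an alternative way to find the candidates, assuming it is enough to recognise ELF files by their four magic bytes. The helper names is_elf and list_elf are made up for illustration, and unlike file this does not tell executables and shared objects apart:

```shell
#!/bin/sh
# Sketch only: is_elf and list_elf are hypothetical helper names.

# is_elf FILE
#   succeeds if FILE starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    [ "$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')" = "7f454c46" ]
}

# list_elf DIR
#   prints every regular ELF file below DIR, one path per line.
#   Paths with spaces are handled; paths with newlines are not.
list_elf() {
    find "$1" -type f | while IFS= read -r f; do
        if is_elf "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call (path is a placeholder):
# list_elf /opt/sage | while IFS= read -r f; do strip --strip-debug "$f"; done
```

Because the list mixes executables and libraries, the example call uses the more conservative --strip-debug level for everything.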

4)
Sage starts up OK in the console and in the notebook.
Some quick plotting and easy equation solving work in the notebook
without flaws.

sage -sh
R
demo(graphics)
works and produces the R demo plots.

sage -python
from pylab import *
plot ([1,2],[2,1])
show()

produced a plot
(I compiled with TclTk and have this dependency included in sagelive)

Built-in help (docstrings) doesn't work in the console! E.g. plot ?
gives just a short description and then
Docstring:
< no docstring >

The same command in the notebook works well.

5)
Just a quick copy-paste hack:

#!/bin/sh
# calculates size of source files in directory tree
tsum=0
sum=0
# check python
# the pattern is quoted so that the shell does not expand it before find sees it
for k in `find . -type f -name '*.py' -exec ls -l {} \+ | awk '{print $5}'`
do
    sum=$((sum+k))
done
echo "Total file size of Python source is: $sum Bytes"
tsum=$((tsum+sum))
sum=0
# check lisp
for k in `find . -type f -name '*.lisp' -exec ls -l {} \+ | awk '{print $5}'`
...SNIP
etc ...
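
The same measurement can be written more compactly with one function per file extension. This is only a sketch and not the snipped remainder of the script above: the helper names source_bytes and report are made up, and the list of extensions (in particular cpp for C++) is an assumption.

```shell
#!/bin/sh
# Sketch only: source_bytes and report are hypothetical helper names.

# source_bytes DIR EXT
#   prints the total size in bytes of all regular files below DIR
#   whose name ends in .EXT (0 if there are none).
source_bytes() {
    find "$1" -type f -name "*.$2" -exec cat {} + | wc -c | tr -d ' '
}

# report DIR
#   prints one line per extension and a grand total.
#   The extension list is an assumption; adjust it to the tree at hand.
report() {
    total=0
    for ext in py lisp c cpp h; do
        n=$(source_bytes "$1" "$ext")
        echo "Total file size of *.$ext source is: $n Bytes"
        total=$((total + n))
    done
    echo "Total size of source code found: $total Bytes"
}

# Example call (path is a placeholder):
# report /opt/sage
```

Reading the files through cat is slower than asking ls for the sizes, but it avoids parsing ls output and copes with spaces in file names.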

Emil Widmann

Jan 16, 2011, 6:42:44 AM1/16/11
to sage-devel


On Jan 15, 11:51 pm, Emil Widmann <emil.widm...@gmail.com> wrote:
> Stripping Sage Binaries II
> --------------------------
>
> By hardlinking duplicate files and stripping executables, a size
> reduction of 438 MB (-26%) was achieved. Further reduction involves
> moving directories, which breaks sage -testall. The goal is to produce
> a binary package of Sage with adequate functionality and reduced
> size, so I couldn't resist pushing on and accepting failing tests to see
> the overall potential.
>
> Preliminary results:
> -----------------------------------------------
> Original Directory tree (~1900 MB)
> sage-binary (775 MB / squashed FS 218 MB)
> That is: -60% / -88% compared to the original size!
> sage-dev (529 MB)
> sage-doc (222 MB)
>
> The resulting binaries seem to work in a superficial test. The
> stripped binaries can be tested as a live ISO or in a virtual machine
> image (an easy frugal install into an existing Linux desktop is also
> possible).
>
> Download ISO image (400 MB, that's the Live CD base distro + stripped
> binaries):
> http://boxen.math.washington.edu/home/emil/sagelithe/
I always have to laugh when I do, say or write something silly, but at
least it's a good learning experience. At least now I begin to
grasp what docstrings are and how they work. I also checked the sage -t
command.
I used
sage

So basically the tests for the stripped binary almost work. 4 failures
remain:

sage -t "devel/sage/build/sage/misc/preparser.py"
sage -t "devel/sage/build/sage/misc/sagedoc.py"
sage -t "devel/sage/build/sage/misc/sageinspect.py"
sage -t "devel/sagenb/sagenb/misc/sageinspect.py"

Of those, the failure of sagedoc.py is clear to me. It should be
fixed once the doc package is loaded



Emil Widmann

Jan 16, 2011, 6:59:44 AM1/16/11
to sage-devel
(Sorry, somehow I managed to send an incomplete message.)

I always have to laugh when I do, say or write something silly, but at
least it's a good learning experience. At least now I begin to
grasp what docstrings are and how they work. I also checked the sage -t
command.
I used
sage -t "devel/sage/build/sage"
sage -t "devel/sage/build/sagenb"

The tests for the stripped binary almost work; 4 failures remain:

        sage -t  "devel/sage/build/sage/misc/preparser.py"
        sage -t  "devel/sage/build/sage/misc/sagedoc.py"
        sage -t  "devel/sage/build/sage/misc/sageinspect.py"
        sage -t  "devel/sagenb/sagenb/misc/sageinspect.py"

Of those, the failure of sagedoc.py is clear to me. It should be
fixed once the doc package is loaded. sageinspect.py from the sage and
sagenb directories seem to have the same error
(TypeError: 'NoneType' object is unsubscriptable), so basically there
are 2 errors left.

Parts of the errors (full error log:
http://boxen.math.washington.edu/home/emil/sagelithe/doctest_error.txt)

# sage -t "devel/sage/build/sage/misc/sageinspect.py"
...
File "<doctest __main__.example_0[21]>", line 1, in <module>
sage_getsource(sage.rings.rational.make_rational, True)[Integer(4):]###line 89:
sage: sage_getsource(sage.rings.rational.make_rational, True)[4:]
TypeError: 'NoneType' object is unsubscriptable
...

sage -t "devel/sage/build/sage/misc/preparser.py"
..
Compiling /root/.sage//temp/puppypc/11503//tmp_2.pyx...
Error compiling cython file:
Error converting /root/.sage//temp/puppypc/11503//tmp_2.pyx to C:
<BLANKLINE>
<BLANKLINE>
Error converting Pyrex file to C:
...

It seems that this error concerns the ability to compile Cython code ...
Are there other possible crucial implications of those errors?
What does "introspect" mean?

At least I think just 4 failed tests is not bad for a start,
thanks for any pointers,
emil

Jason Grout

Jan 17, 2011, 8:17:18 AM1/17/11
to sage-...@googlegroups.com
On 1/15/11 5:51 PM, Emil Widmann wrote:
> Stripping Sage Binaries II
> --------------------------


Is this up on the wiki somewhere, and maybe in a readme on the live CD?
It looks like some great information, and it would be sad if it is
only buried in a post on sage-devel.

Thanks for all of your work on this!

Jason

Emil Widmann

Jan 17, 2011, 10:17:59 AM1/17/11
to sage-devel
Hi Jason,

Right now this is at a pretty alpha stage, so the info is only
available at
http://sage.math.washington.edu/home/emil/sagelithe
I will also try to integrate it into the sagelive project homepage
with some nicer editing (and fewer bugs)
http://sage.math.washington.edu/home/emil/doc/html/en

I fear I will not have enough time to really finish this project to
release quality, but maybe I can do something more, especially in
making a small VM image. (Does anybody have experience with connecting
the VM Sage notebook to the host? I had this running 2 weeks ago, but
now it refuses to work.)

emil

Nicolas M. Thiery

Jan 27, 2011, 6:41:09 PM1/27/11
to sage-...@googlegroups.com
On Sat, Jan 08, 2011 at 01:03:36AM -0800, Robert Bradshaw wrote:
> I don't even think we need setuptools--just build extensions inplace
> (there's an option for this) and put devel/sage in the python path
> (this was what I did for enabling concurrent execution of distinct
> branches, though that still needs a bit of work
> http://trac.sagemath.org/sage_trac/ticket/9967). Then one wouldn't
> even need to do a "sage -b" to work on (non-Cython) code.

Whoever implements this will be my hero of the week! PLEASE!!!

I like the other consequences of such a change, but that one would be
such a time saver in my everyday workflow! Besides, it would simplify
the explanations to newcomers, also a good thing.

Cheers,
Nicolas
--
Nicolas M. Thiéry "Isil" <nth...@users.sf.net>
http://Nicolas.Thiery.name/

Dan Drake

Jan 27, 2011, 10:41:44 PM1/27/11
to sage-...@googlegroups.com
On Fri, 28 Jan 2011 at 12:41AM +0100, Nicolas M. Thiery wrote:
> On Sat, Jan 08, 2011 at 01:03:36AM -0800, Robert Bradshaw wrote:
> > I don't even think we need setuptools--just build extensions inplace
> > (there's an option for this) and put devel/sage in the python path
> > (this was what I did for enabling concurrent execution of distinct
> > branches, though that still needs a bit of work
> > http://trac.sagemath.org/sage_trac/ticket/9967). Then one wouldn't
> > even need to do a "sage -b" to work on (non-Cython) code.
>
> Whoever implements this will be my hero of the week! PLEASE!!!

I would also really like to get that ticket finished...it would make
things like the patchbot much easier to work with. But beware, the
ticket is complex and I got stuck trying to work out exactly why certain
bits couldn't find other bits and so on. It's pretty tricky...but I
would like to see it working.

Dan

--
--- Dan Drake
----- http://mathsci.kaist.ac.kr/~drake
-------
