Sage-3.0 should be coming up soon -- say by the end of March -- so
it's time to decide what this should *mean*.
Mhansen suggests: Macdonald polynomials and friends, k-Schur functions,
LaurentPolynomialRing, and crystals. Those should be feasible by
mid-March for me/us.
And I guess documentation in combinat/ cleaned up and doctests to 100%.
Bill Furnish suggests: Fast symbolics?
Michael Abshoff suggests: MacIntel 64 bit support; > 50% doctest
coverage; Solaris building automated
What are your thoughts? Just throw it all out there, then we'll come
up with something very reasonable as a result, and make that the goal.
-- William
--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: martinr...@jabber.ccc.de
I want multivariate factoring and gcd that isn't a joke. Well, I do want
that, but I highly doubt it's happening in 1 month! :)
--
Joel
On IRC mhansen et al. suggested genuine FLINT integration, i.e., this
should happen:
sage: R.<x> = ZZ[]
sage: type(x)
<type 'sage.rings.polynomial.polynomial_integer_dense_flint.Polynomial_integer_dense_flint'>
> - seriously improved documentation, i.e. hand written chapters for the
> reference manual
> - PolyBoRi first class citizen
For this to happen, we should
- add quotient methods to BooleanPolynomialRing (or whatever we end up calling it after the name change)
- support more ideal methods for BooleanPolynomialIdeals, at least the trivial ones in this case
All of this should come after updating to 0.2 and fixing the bugs currently outstanding in trac. I'll try to do at least the former by this weekend.
> - Gröbner bases over ZZ and ZZ_N (possibly slow)
> - libSingular for CC, RR, number fields
- Flint integration
If we can make it,
- Linear algebra over (univariate, for now) polynomial rings
By "interactive graphics", do you mean stuff like the Wolfram
Demonstrations project?
If so, I've posted a short patch that is a first alpha-level stab at
implementing a "manipulate" command. So far several HTML controls are
represented (select menus, text boxes, groups of buttons, and checkboxes).
The patch is up at http://trac.sagemath.org/sage_trac/ticket/1322. It's
the "manipulate.patch" (it's the second patch; ignore the first patch).
There are some examples down at the bottom of the ticket.
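As a rough illustration of the idea (not the code in the patch), here is a
minimal pure-Python sketch of how a "manipulate"-style decorator might pick
one of those controls from each default argument. The names control_for and
manipulate, and the mapping rules themselves, are assumptions made up for
this example.

```python
import inspect


def control_for(default):
    """Pick a control type name for one default value (hypothetical rules)."""
    if isinstance(default, bool):
        return "checkbox"
    if isinstance(default, list):
        return "select menu"
    if isinstance(default, tuple):
        return "button group"
    # Anything else (numbers, strings, ...) falls back to free-form input.
    return "text box"


def manipulate(func):
    """Attach a {argument name: control type} mapping to the function."""
    signature = inspect.signature(func)
    func.controls = {
        name: control_for(parameter.default)
        for name, parameter in signature.parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }
    return func


@manipulate
def demo(n=5, color=["red", "blue"], grid=True, style=("solid", "dashed")):
    # The defaults are only read, never mutated, so the list default is safe.
    return n, color, grid, style


for name, control in demo.controls.items():
    print(name, "->", control)
```

A real implementation would also have to render the controls as HTML and
re-run the function when a value changes; this sketch only covers the
choice of control.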
Anybody and everybody who wants to apply the patch and give feedback is
more than welcome to do so!
Thanks,
Jason
Maybe I'm missing something here, but do you want to change the allocation
function (only) for integer objects? Does Python's garbage collector support
that, i.e. you allocate some memory with your favorite slab allocator, setup
the python object, use it and once it is not referenced anymore Python's
garbage collector doesn't attempt to free it but calls your
allocator's "free"?
> more verbose cython exceptions(namely dumping information about the c code
> that caused the exception)
Does this mean exceptions Cython throws on compiler errors? If so, then it
doesn't know the C code. Does it mean Exceptions that happen during execution
like ZeroDivisionError and such, then showing the C code seems wrong to me.
Martin
So you want to replace the integer_pool by pool maintained by the slab
allocator but don't want to replace the 'fast integer creation' code
(avoiding going down the inheritance tree, trickery with GMP)? Is that
right? I.e., you want to replace the PyObject_Free, PyObject_MALLOC calls with
calls to the slab allocator?
> As for the cython,
> when an exception occurs it will now print the line number in the c
> file as well as as the line number in the pyx file, which is very
> useful when one statement in python ends up compiling to a whole slew
> of lines. C source code is not displayed.
I am not entirely convinced that this is desired. After all, Cython is
supposed to abstract the C level away. However, it probably won't do much
harm.
Compromise: This should be a command line option to Python that people
(mainly people doing debugging) can turn on, but which is not on by default.
Martin -- what would you think of that? Then Bill Furnish could turn it on
for his personal work on Sage/Cython, but when you use Sage you won't
see the C line numbers.
William
Do you really mean a command line option to _Python_, i.e. change Python? I
guess some global option for Sage should do it, given that we can control
this stuff from the Python (rather than the C) level.
I might use the see-the-C-line-number option for debugging from time to time
once it is available. However, if my C-level code dies on me, it usually
doesn't throw an exception first. I'm just concerned not to clutter up the
interface. E.g. when a ZeroDivisionError is thrown, the C line number is not
supposed to be helpful. If it is helpful, this is a bug because the exception
is not explained enough.
No! I meant Cython. Sorry about that. I had just woken up and was typing too
quickly.
> I
> guess some global option for Sage should do it, given that we can control
> this stuff from the Python (rather than the C) level.
>
> I might use the see-the-C-line-number option for debugging once it is
> available from time to time. However, usually if my C level code dies on me
> it usually doesn't throw an exception for me first. I'm just concerned not to
> clutter up the interface. E.g. when a ZeroDivisionError is thrown, the C line
> number is not supposed to be helpful. If it is helpful this is a bug because
> the exception is not explained enough.
Yes. I would also never use this feature. But I could see how it
would be useful to some people...
-- William
> On Thursday 28 February 2008, Bill Furnish wrote:
>> The integer class currently has example code on how to do this (well,
>> it still uses python malloc, but same principle).
>
> So you want to replace the integer_pool by pool maintained by the slab
> allocator but don't want to replace the 'fast integer creation' code
> (avoiding going down the inheritance tree, trickery with GMP)? Is that
> right? I.e., you want to replace the PyObject_Free, PyObject_MALLOC
> calls with calls to the slab allocator?
We don't currently have glib as a dependency, do we? I searched
around and saw a lot of headaches trying to compile it for OS X.
Replacing the integer_pool by a slab allocator would be hard to do as
it stores already-initialized integers (and certainly satisfies KISS).
PyObject_MALLOC is optimized for returning small chunks of memory, so
I don't see a big gain there either.
>> As for the cython,
>> when an exception occurs it will now print the line number in the c
>> file as well as as the line number in the pyx file, which is very
>> useful when one statement in python ends up compiling to a whole slew
>> of lines. C source code is not displayed.
>
> I am not entirely convinced that this is desired. After all, Cython is
> supposed to abstract the C level away. However, it probably won't
> do much
> harm.
I think this would be a good optional flag. Probably something
really small (e.g. "(c:2773)") would be sufficient and still
helpful.
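To make the proposed format concrete, here is a small Python sketch of an
opt-in C line suffix. The function name format_location, the show_c_lines
switch, and the file name and line numbers in the usage lines are all
hypothetical; this is not how Cython itself builds its tracebacks.

```python
def format_location(pyx_file, pyx_line, c_line=None, show_c_lines=False):
    """Return 'file:line', plus a short '(c:N)' suffix only when asked for."""
    location = "%s:%d" % (pyx_file, pyx_line)
    if show_c_lines and c_line is not None:
        location += " (c:%d)" % c_line
    return location


# Default: the C level stays hidden.
print(format_location("integer.pyx", 120, c_line=2773))
# Debugging: the C line is appended in the compact form suggested above.
print(format_location("integer.pyx", 120, c_line=2773, show_c_lines=True))
```

Keeping the suffix off by default matches the concern that ordinary users
should not see C-level details in their error messages.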
On the topic of what Sage 3.0 should be, I really like the 50%
doctest coverage goal. FLINT integration is a good one too, as is
interactive graphics (à la manipulate). OS X 64-bit and Solaris are
worthy goals too. I have to admit that I was underwhelmed by the
1.x -> 2.0 transition. Perhaps some of this is due to the fact that Sage
has such a frequent release cycle that most of the "new" stuff is
already there. We could all list little things that we'd like to see,
but I think a good question to ask is whether there are any gaping holes
keeping Sage from being a competitor to the big M's, and whether it's
feasible to fill them. (Up until January, 3D graphics would have fit this, and so
does a native Windows port (though that's not feasible for such a
short timeframe)).
- Robert
I've read through all the messages in this thread a few times.
I propose the following goals for SAGE-3.0:
-------------------------------------------------
1. DOCUMENTATION:
cd SAGE_ROOT/devel/sage/sage; sage -coverage .
should output
Overall weighted coverage score: x% [[where x >= 50]]
Moreover there should be at least one hand written paragraph
at the beginning of each file; "sage -coverage" can be
adapted to flag whether or not this is there. This will improve
the overall quality and maintainability of Sage, and make it
easier for users to find examples.
2. MANIPULATE: Usable "manipulate" functionality standard in
Sage. This has very wide applicability.
3. R: A pexpect interface to R (so, e.g., the notebook can act
as a full R notebook using 100% native R syntax). This will
matter to a lot of Sage users, and make using R from Sage
much easier in some cases (just cut and paste anything from
any R examples and it will work). It will also provide something
in Sage that one doesn't get with Python + rpy + R.
4. TIMING/BENCHMARKING: Fully integrate in Sage wjp's code
that times all doctests, and start publishing the results on
all Sage-supported platforms with every Sage release. This
will give people a much better sense about which hardware
is best for running Sage, and avoid major performance
regressions. Likewise, get the Malb/wjp/my generic
benchmarking code into Sage (this provides a doctest like
plain text format for creating benchmarks, and is already mostly
written).
-------------------------------------------------
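To illustrate the kind of check goal 1 asks for, here is a rough Python
sketch that scores one file and flags a missing introductory paragraph. It
is not the real "sage -coverage" script: the function name coverage_report
is made up, the score is a plain unweighted percentage, and a function
counts as covered simply if its docstring contains a "sage:" prompt.

```python
import ast


def coverage_report(source):
    """Return (has_intro, score) for the Python source text of one file."""
    tree = ast.parse(source)
    # A module-level docstring stands in for the "hand written paragraph".
    has_intro = ast.get_docstring(tree) is not None
    functions = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    covered = [
        node for node in functions
        if "sage:" in (ast.get_docstring(node) or "")
    ]
    # A file without functions is treated as fully covered.
    score = 100.0 * len(covered) / len(functions) if functions else 100.0
    return has_intro, score


SAMPLE = '''
"""Intro paragraph describing this file."""

def f(x):
    """
    EXAMPLES:
        sage: f(1)
        2
    """
    return x + 1

def g(x):
    return x
'''

has_intro, score = coverage_report(SAMPLE)
print("intro paragraph present:", has_intro)
print("coverage score: %.0f%%" % score)
```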
I've only proposed goals for "3.0" that are wide reaching and
will be noticed in some way by most users instead of fairly
technical optimizations in a specific area (such as FLINT,
Libsingular, or optimized integer allocation). I think changing
implementations in specific technical areas to speed things up
is more appropriate in week-to-week releases, and is also something
we should be very careful about until we have good speed
regression testing in place (we should have done step 4
above a long time ago).
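As a sketch of what such speed regression testing could look like, the
following Python compares per-test timings from two releases and reports
anything that got noticeably slower. The function names, the 25% tolerance,
and the timing values in the usage lines are assumptions for illustration;
this is not wjp's doctest timing code.

```python
import timeit


def best_time(stmt, setup="pass", repeat=3, number=10000):
    """Best-of-`repeat` time in seconds for a single execution of stmt."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number


def find_regressions(baseline, current, tolerance=0.25):
    """Names whose current time exceeds the baseline by more than tolerance."""
    slower = []
    for name, old_time in sorted(baseline.items()):
        new_time = current.get(name)
        if new_time is not None and new_time > old_time * (1.0 + tolerance):
            slower.append(name)
    return slower


# Made-up timings (seconds) from a previous and a new release.
previous = {"integer_add": 2.0e-7, "poly_mul": 5.0e-6}
latest = {"integer_add": 2.1e-7, "poly_mul": 9.0e-6}
print(find_regressions(previous, latest))
print("one addition takes about %.3g s" % best_time("3 + 5"))
```

Publishing the per-test numbers with each release, as proposed in step 4,
would supply the baseline that this kind of comparison needs.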
> I have to admit that I was underwhelmed by the
> 1.x -> 2.0 transition. Perhaps some of this is due to the fact that Sage
> has such a frequent release cycle that most of the "new" stuff is
> already there.
If we're doing our job right then the core Sage developers
(e.g., like you) should barely notice the transition from
a.x -> (a+1).x. When transitions are huge and noticeable,
they are also painful and disruptive. For example, any
patch that is huge and noticeable is likely also to be
hugely painful.
> OS X 64-bit and solaris are worthy goals too.
I don't know whether either of these is genuinely doable within a month.
I personally think OS X 64-bit will be very hard (I hope I am wrong), and
having worked on that damned Solaris port for over 2 years and never
seen it get finished at any point, I'm very dubious that it will be done in
the next month. The difficulty of porting to Solaris is hard to understand
if you haven't tried it. It will get done this year though, since Michael is
damned good. Both of these are Sage-4.0 material.
> We could all list little things that we'd like to see,
> but I think a good question to ask is whether there are any gaping holes
> keeping Sage from being a competitor to the big M's, and whether it's
> feasible to fill them. (Up until January, 3D graphics would have fit this, and so
> does a native Windows port (though that's not feasible for such a
> short timeframe)).
Regarding native Sage on Windows, this is maybe sage-4.0, if we're lucky, but
probably not even that. This is something Microsoft Research really wants,
and for them it makes a huge amount of sense. But at least Sage does run
on Windows right now; the vmware image has continued to be the top Sage
binary download for a long long time. E.g., since Sunday morning on
sagemath.org:
root@modular:/var/log/apache2# ./howmany access.log
Linux Binary    14
OS X Binary     11
Source          21
VMware          29
(this is only one of the 8 mirror sites, but it's the default)
William
> If the integer code was simple it would be easy to track down the
> double free in 1337. The integer code is in fact so complex that
> there was an incorrectly defined structure from GMP that no one
> noticed.
Excellent work tracking that one down! The simple part I was
referring to is the pool--the non-pool part is certainly not simple
at all.
> I'd actually like to replace both the integer and the gmp
> code. There is also no convincing evidence that a memcpy is needed,
> nor is there evidence that storing the initialized gmp code
> significantly helps. In fact, the expensive part of init is the
> alloc, which calls malloc. However, the integer code calls this anyway.
> This is worrying about a 3 instruction, easily pipelined, non-virtual
> function call when there is a giant optimization to be made in gmp's
> memory allocation unit that would increase the speed of gmp internals
> as well as significantly simplify the integer code.
The memcpy replaces going all the way up the inheritance tree
(a dozen virtual function calls) to set some pointers, and it skips
setting self._parent to None, doing some increfs/decrefs, and then setting
it to ZZ. This is also very nice because it lets us not worry about the
specifics of the underlying struct at all. The fact that an
initialized gmp struct was sitting there is just a bonus.
The pool allows us to reuse already initialized integers for free, no
allocation necessary (except when recycling a large integer we
realloc to 1 limb). This is by far the majority of the time, as
Integers are often ephemeral objects.
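The recycling idea can be shown with a toy free-list pool in plain Python.
This is only a sketch of the principle, not Sage's integer_pool: the class
name ObjectPool, the capacity limit, and the one-element list standing in
for an initialized integer are all assumptions for illustration.

```python
class ObjectPool:
    """Keep released objects around so the next request can reuse them."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._free = []              # already-initialized objects awaiting reuse
        self.fresh_allocations = 0   # how often a new object had to be created

    def acquire(self, value):
        if self._free:
            box = self._free.pop()   # reuse: no new allocation needed
        else:
            box = [0]                # stand-in for allocating and initializing
            self.fresh_allocations += 1
        box[0] = value
        return box

    def release(self, box):
        # Drop the object if the pool is full, so the pool cannot grow forever.
        if len(self._free) < self.capacity:
            self._free.append(box)


pool = ObjectPool(capacity=2)
first = pool.acquire(3)
pool.release(first)
second = pool.acquire(5)             # gets the recycled object back
print(second is first, pool.fresh_allocations)
```

Because short-lived objects are released almost as fast as they are
acquired, most requests in a pattern like this are served from the free
list rather than by a fresh allocation.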
So, to clarify, your main suggestion is to replace gmp's memory
allocation with a faster allocator? This would, I think, be great as
this is the main overhead associated with working with (small)
mpz_t's. I have wondered if replacing it with Python's heap allocator
would work (which is optimized for returning small chunks of memory).
- Robert
IIRC we timed every little change we made when David, Robert and I wrote the
fast integer creation. Though we don't give any evidence that all the spared
function calls speed things up, we were at least convinced at the time of
writing.
> The memcpy was to replace going all the way up the inheritance tree
> (a dozen virtual function calls) to set some pointers, and skipping
> setting self._parent to None, do some increfs/decrefs, then set it to
> ZZ. This is also very nice because it lets us not worry about the
> specifics of the underlying struct at all. The fact that an
> initialized gmp struct was sitting there is just a bonus.
>
> The pool allows us to reuse already initialized integers for free, no
> allocation necessary (except when recycling a large integer we
> realloc to 1 limb). This is by far the majority of the time, as
> Integers are often ephemeral objects.
>
> So, to clarify, your main suggestion is to replace gmp's memory
> allocation with a faster allocator? This would, I think, be great as
> this is the main overhead associated with working with (small)
> mpz_t's. I have wondered if replacing it with Python's heap allocator
> would work (which is optimized for returning small chunks of memory).
We have experimented with GMP's memory functions before, see:
http://wiki.sagemath.org/MallocReplacements
These timings were taken before the fast integer creation code existed (Sage 1.4,
ancient times!) and are for very small integers only. We use PyMem_Malloc now
(see rings.memory), which does not necessarily call malloc.
Actually,
PyMem_Malloc:
sage: x = 3; y = 5 # sage_malloc
sage: %timeit _ = x + y
1000000 loops, best of 3: 213 ns per loop
Malloc:
sage: %timeit _ = x + y # malloc
1000000 loops, best of 3: 213 ns per loop
OMalloc:
sage: x = 3; y = 5 # omalloc
sage: %timeit _ = x + y
10000000 loops, best of 3: 195 ns per loop
sage_malloc is mapped to malloc. I would like to see some timings
actually using PyMem_Malloc (especially now that allocation dominates
the creation time).
- Robert
Regarding this, there is a functional pexpect interface to R at
http://sagetrac.org/sage_trac/ticket/839 . It needs some further
polishing for things like tab completion, viewing source, handling plotting
nicely, etc. It would be good if someone who was very comfortable
using R could play around with it and find where it's lacking.
--Mike
<SNIP>
> > OS X 64-bit and solaris are worthy goals too.
>
> I don't know whether either of these is genuinely doable within a month.
> I personally think OS X 64-bit will be very hard (I hope I am wrong), and
> having worked on that damned Solaris port for over 2 years and never
> seen it get finished at any point, I'm very dubious that it will be done in
> the next month. The difficulty of porting to Solaris is hard to understand
> if you haven't tried it. It will get done this year though, since Michael is
> damned good. both of these are Sage-4.0 material.
We are much closer to a Solaris port fully working since various
patches related to new matrix classes caused a number of issues that
resulted in segfaults. Build support wise we are also in excellent
shape: the vast majority of fixes have been merged into the code.
OSX 64 bit has one large known build issue: numpy (and potentially
scipy & matplotlib) which I hope to tackle tomorrow. There are several
smaller issues where ugly workarounds exist (python) or we can stick
to 32 bit for now (clisp), so I think that is also doable. It still
leaves us with a libSingular related crash.
Since I do not have anything else to do in March I hope that I can do
my own little three week coding sprint and get the above two issues
resolved. Time will tell.
Cheers,
Michael
I don't think that these tiny speedups are worth the hassle at all. About
sage_malloc != PyMem_Malloc: Sorry, my bad. I'll post a new benchmark
soon-ish.
Martin