On Mon, Nov 26, 2012 at 11:48 PM, Burcin Erocal <bur...@erocal.org> wrote:
> On Tue, 27 Nov 2012 01:53:31 +0800
> P Purkayastha <ppu...@gmail.com> wrote:
>
> <snip Robert's post>
>> I think the Sage community could quickly expand and there could be
>> tens, if not hundreds, of git development branches once the switch to
>> git occurs. It would be quite hard to keep track of all the different
>> branches and the individual modifications that people have in their
>> forks. I looked up scipy right now, and that itself has over 500
>> watchers and 200 forks. The situation is the same for matplotlib, and
>> almost the same for mathjax. It would be nice to see how those
>> communities cope with such a huge number of forks and development
>> branches.
>
> Note that most of the research code we are talking about is either a
> single .sage file or a bunch of .py files in a directory, totaling
> at most 1500 - 2000 lines of code.
Yes, that's one end of the continuum, and there's a lot of that.
> Here is a good example:
>
> http://math.bu.edu/people/rpollack/OMS/OMS.zip
>
> from http://math.bu.edu/people/rpollack/
>
>
> I cannot imagine research mathematicians wrangling forks of the Sage
> library (IIRC, ~ 500k lines of code) just to get a small piece working.
> In most cases, these forks will contain old versions, untouched since
> the paper was published, so even merging with the latest Sage release
> will be nontrivial.
But what if one could magically sync to the version they were using +
their code? Rebasing it to the present might still be work, but the
more code gets used (even in the context of an old sage) the greater
the chances it will get improved and eventually merged in by someone.
> Especially if the research code required changes in
> some core Sage library class (add a function to number fields say), only
> a few people really familiar with Sage (and the DVCS system in use) can
> handle the merge.
>
>
> This is all to say that git forks are not going to solve the problem
> Robert brought up.
Yep.
>> What I describe below is one way I think we could have access to the
>> many individual patches and "alpha-quality" code people might have.
>
> Here is another way, which is not at all new. :) Do what William does
> with psage:
>
> If your code exceeds the "single .sage/.py file" threshold, it is
> fairly easy to create a Python package out of it. With the myriad of
> Python packaging solutions (easy_install, pip, etc.), installing such a
> package given a URL is also trivial. So just publish the URL on your
> home page, announce it to your colleagues and you're good to go.
>
> There are several problems with this approach, which I mention below.
>
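For anyone who has not done the packaging step before, here is a minimal sketch of it, assuming the research code is a flat directory of .py files. The helper name `write_setup_py` and the package name used below are made up for illustration; the generated file only uses the standard setuptools keywords `name`, `version`, and `py_modules`.

```python
from pathlib import Path

# Template for a bare-bones setup.py; the version number is a placeholder.
SETUP_TEMPLATE = """\
from setuptools import setup

setup(
    name={name!r},
    version="0.1",
    py_modules={modules!r},
)
"""


def write_setup_py(code_dir, name):
    """Write a minimal setup.py listing every top-level .py file in code_dir."""
    code_dir = Path(code_dir)
    # Each loose .py file becomes an importable module of the package;
    # an existing setup.py is skipped so the helper can be re-run safely.
    modules = sorted(
        p.stem for p in code_dir.glob("*.py") if p.name != "setup.py"
    )
    target = code_dir / "setup.py"
    target.write_text(SETUP_TEMPLATE.format(name=name, modules=modules))
    return target
```

Once a zip or tarball of that directory is on a web page, pip can install it directly from the archive URL. This says nothing about .sage files, which would need preparsing first.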
>> To encourage people to contribute back high-quality versions of their
>> research projects into Sage, one thing that could be done is to
>> enable a wikipage where the developers can mention or list their
>> current project/unpolished code. The hope is that such a model will
>> help the person get feedback for his/her code and the person can get
>> encouraged to eventually submit it to trac and include it with Sage.
>> It often happens with me that I get a bit more motivation to
>> finish/polish my work once someone asks me for it - the feedback
>> helps me know that the code might be useful for someone else too! I
>> wonder if other people here have faced similar situations.
>
> The wiki page is a good idea, but it would be filled with stale
> information quickly if it is not supported by some infrastructure to
> keep it up to date.
For sure, this would have to be automated (or at least freshness
scores assigned).
> You're right that getting feedback is a great encouragement to polish
> the code (which is a big advantage of the combinat model), but I don't
> see why it also encourages people to submit it to Sage. The review
> process can be quite painful after all (see #9016). In many cases, from
> a professional/career perspective, this is even bad for you:
>
> * once your code is readily available in Sage, people assume it's
> standard functionality and stop citing/giving credit to the
> implementation/paper
Very interesting point.
> * the time spent to polish the code is wasted according to academic
> assessment criteria, which usually only counts publications and
> citations
Totally. Even if you get some credit for producing the code, polishing
it is way down on the list of being formally accredited and
appreciated (despite the appreciation of colleagues).
It'd be nice to revisit this.
> Going back to the problems with individual packages:
>
> - code that is not tested regularly against updates in Sage bitrots
>
> This problem can be solved by a continuous integration system (like
> patchbot) that runs the tests against changes in Sage. Depending on
> hardware availability, this can happen with every beta and rc
> release, or even daily.
However long the doctests seem to take when you're waiting for them, one
surprising revelation of the patchbot is how frequently you can
actually test the entire codebase. Currently, a single computer can
keep up with every patch that's uploaded to trac, and donating cycles
is very easy should we have the need.
> The developer has to commit to fixing the problems revealed by the
> test suite. This would be in their interest, since it guarantees
> that if somebody else ever wants to use the code in question, it
> will (up to the amount of test coverage) run on the latest Sage release.
Typically, if someone finds something broken in my code, I am very
motivated to fix it. There's a question about the other way around:
what if their tests break because I changed the way polynomial rings
print? How easily can I go change their code? Do I have to wait for
them to accept my change? Are things just broken in the meantime
(which is actually really bad, because once things go red, you don't
notice or can't even detect further breaking changes)?
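To make the printing example concrete, here is a toy sketch using only Python's standard doctest module. `PolyRing`, `make_ring`, and the two printed forms are invented stand-ins, not Sage's actual classes or output; the point is only that a doctest compares printed text, so a change to a repr upstream turns a downstream test red even though nothing mathematical changed.

```python
import doctest


class PolyRing:
    """Toy stand-in for a ring whose printed form changes between releases."""

    def __init__(self, var, new_style=False):
        self.var = var
        self.new_style = new_style

    def __repr__(self):
        if self.new_style:
            return "Polynomial Ring in %s" % self.var
        return "Univariate Polynomial Ring in %s" % self.var


# A doctest as it might sit in someone else's package, written against
# the old printed form.
DOWNSTREAM_DOCTEST = """
>>> R = make_ring("x")
>>> R
Univariate Polynomial Ring in x
"""


def run_downstream(new_style):
    """Run the downstream doctest and return the number of failed examples."""
    globs = {"make_ring": lambda var: PolyRing(var, new_style)}
    parser = doctest.DocTestParser()
    test = parser.get_doctest(DOWNSTREAM_DOCTEST, globs, "downstream", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count matters here.
    results = runner.run(test, out=lambda text: None)
    return results.failed
```

Calling `run_downstream(False)` reports no failures, while `run_downstream(True)` reports one, and fixing it means editing the expected output in a repository the upstream author may not control.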
There's also the issue of cross-package changes. Maybe the code can be
refactored to minimize this (e.g. see the aspect oriented software
development thread) but I still think the coupling is quite strong
between different mathematical components, and fragmenting our library
into several packages/repositories makes development much more painful
(remember when docs and clib were their own spkgs?). It's possible that
research-y code is more leafy, but adding a method to number fields is
a prime example of something that would be a pain to put in a package
(at least if this became a regular pattern, it's not very scalable).
When, if ever, would this code get reviewed? Would there be a list of
standard (versioned) packages that are "part of" Sage? Would the
quality of a package be based on extrinsic factors (e.g. author
reputations)? We do already sweep this under the rug for the upstream
packages we include.
It's also nice to be able to say "Sage x.y.z" rather than "sage +
these packages, but not those packages" for reproducibility,
especially if behavioral changes as well as additional functionality
are involved. (Putting the list of packages under revision control,
with hermetic builds/environments, is one way to solve this issue.)
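As a rough sketch of the revision-controlled list idea: if the base release and the pinned package versions live in one manifest, a short identifier can be derived from it and cited in place of "Sage x.y.z". The manifest layout, the function name `environment_id`, and any package names passed to it are assumptions made up for this example, and the identifier only names an environment, it does not make the build hermetic.

```python
import hashlib
import json


def environment_id(base_version, packages):
    """Return a short, stable identifier for a base release plus pinned packages.

    `packages` maps a package name to the exact version that was installed.
    """
    manifest = {"base": base_version, "packages": packages}
    # sort_keys makes the serialization independent of insertion order,
    # so the same set of pins always produces the same identifier.
    canonical = json.dumps(manifest, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Two people with the same base version and the same pins get the same identifier regardless of the order in which the packages were listed, and any change to a pin changes it.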
Also, shipping with "batteries included" is a really nice feature.
Still, the package model has advantages.
> - some research code goes beyond just a bunch of python files.
>
> For example Simon's p-group cohomology package:
>
> http://sage.math.washington.edu/home/SimonKing/Cohomology/
>
> It includes MeatAxe, a C-library, and depends on an optional Sage
> package. This provides a challenge to Salvus (or wakari:
> http://continuum.io/blog/introducing-wakari) like models. Unless
> they create a virtual machine for each user to play with, I don't
> see how they can support installing arbitrary code.
>
> Package dependencies for optional/experimental packages are beyond
> the capabilities of the sage packaging system. Adding
> "sage -i <package_name>" commands to spkg-install is a very ugly
> hack.
I think lmonade, or something like that, will have to be part of the
model. I'm certainly assuming something more powerful than spkgs...
> If you've read so far, maybe some self-plugging will be tolerated.
> lmonade (http://www.lmona.de/) is designed to solve these problems,
> with
>
> - a flexible package manager that supports overlays, and
>
> - a continuous integration system (patchbot in software engineering
> speak) where people can sign up to get their code tested when the
> packages they depend on are updated.
>
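The "test my code when something I depend on is updated" part boils down to a reverse-dependency lookup. Below is a minimal sketch of that lookup, not a description of how lmonade or the patchbot actually does it; the function name `packages_to_retest` and the package names in the usage example are hypothetical.

```python
from collections import deque


def packages_to_retest(depends_on, updated):
    """Return every package that directly or transitively depends on `updated`.

    `depends_on` maps a package name to the list of packages it requires.
    """
    # Invert the edges: for each package, record who depends on it.
    dependents = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk outward from the updated package.
    seen = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for pkg in dependents.get(current, ()):
            if pkg not in seen:
                seen.add(pkg)
                queue.append(pkg)
    return sorted(seen)


# Usage with made-up package names:
example = {
    "cohomology_pkg": ["c_library", "sage"],
    "small_research_pkg": ["sage"],
    "c_library": [],
}
retest_after_sage_update = packages_to_retest(example, "sage")
```

With the example data, an update to "sage" selects both research packages for retesting, while an update to "c_library" selects only "cohomology_pkg".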
> It still lacks many features due to lack of developer time. I plan to
> change that soon (also encouraged by discussions like this one), but
> any help is much appreciated nevertheless.
>
>
> Cheers,
> Burcin
>