Citing used Sage components automatically


Andrey Novoseltsev

Jul 27, 2011, 1:08:18 PM
to sage-devel
I have just looked over the PARI citing discussion, and recently I had a
talk with a developer of a software package X who was concerned that
inclusion of X into Sage will mean that people will stop giving credit
to X (and this developer in particular ;-)). Some time ago there were
suggestions to somehow gather statistics on how many times each
function was called, which in practice does not seem like a great idea
due to performance hits and privacy issues. Figuring out manually
which components are used is somewhat boring and actually quite hard.

But how about this: suppose I have written a function f that does what
I need and I want to properly cite people and systems who made it
possible, but at the same time I am too busy/lazy to do much to
achieve it. However, I can do

sage: uc = UsedComponents("f(75)")

and then

sage: uc
Flint, PARI, Singular
sage: print uc.acknowledgement()
Computations were performed using CAS Sage~\cite{Sage}, interfacing
Flint~\cite{Flint}, PARI~\cite{PARI}, Singular~\cite{Singular}.

which I copy-paste into my paper (or better yet - use SageTeX) and

sage: print uc.BiBTeX()
<BiBTeX entries for keys Flint, PARI, Sage, Singular>

which I include in my bibliography file (or which hopefully SageTeX
can somehow take care of).
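
The acknowledgement step could look roughly like the sketch below. It is plain Python rather than anything that exists in Sage: the function name `acknowledgement` mirrors the hypothetical method in the example above, the citation keys are assumed to equal the component names, and BibTeX entries are left out because they would need a maintained database of real references.

```python
def acknowledgement(components):
    """Build a LaTeX acknowledgement sentence from component names.

    Assumption: each component's \\cite key is simply its name.
    """
    # Sort case-insensitively so the sentence is stable between runs.
    names = sorted(set(components), key=str.lower)
    sentence = "Computations were performed using CAS Sage~\\cite{Sage}"
    if names:
        cites = ", ".join("{0}~\\cite{{{0}}}".format(name) for name in names)
        sentence += ", interfacing " + cites
    return sentence + "."


print(acknowledgement(["PARI", "Flint", "Singular"]))
```

Sorting and de-duplicating up front means the text does not depend on the order in which components happened to be detected.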

To get these lists, it seems to me that one can execute the code
"f(75)" in a profiler, collect the functions used, and then look for
substrings (either modules or particular functions) from a list that
maps substrings to components. This list has to be manually
maintained, as in general this matching is probably non-trivial, but
authors of particular functions and interfaces can easily add
entries, I think. As a bonus, such an automated citer will include
proper versions for everything.
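
A minimal sketch of the profiler idea, in plain Python so that it runs without Sage. The table `PATTERNS` and the function `used_components` are hypothetical names; in Sage the patterns would be path fragments of interface or library modules, and standard-library modules stand in for them here.

```python
import cProfile
import pstats

# Hypothetical, hand-maintained table: path substring -> component name.
# Standard-library modules are used as stand-ins for Sage components.
PATTERNS = {
    "fractions.py": "fractions",
    "statistics.py": "statistics",
}


def used_components(code, patterns=PATTERNS):
    """Run `code` under the profiler and return the sorted names of all
    components whose pattern occurs in a profiled function's location."""
    namespace = {}
    profiler = cProfile.Profile()
    profiler.runctx(code, namespace, namespace)
    found = set()
    # The stats table is a dict keyed by (file name, line number, function).
    for filename, _lineno, funcname in pstats.Stats(profiler).stats:
        location = filename + ":" + funcname
        for pattern, component in patterns.items():
            if pattern in location:
                found.add(component)
    return sorted(found)


print(used_components("from fractions import Fraction\nFraction(1, 3) + Fraction(1, 6)"))
```

Only functions that are actually called during the run are seen, so the result describes one particular computation, not everything a function could ever use.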

I don't know how long such lists will typically be, and perhaps it may
be a bit weird to cite 10 papers and 20 software systems, but at the
same time if they were used - why not. As to where to stop the list of
components, I think that "is it included in the Sage distribution" is
a reasonable compromise, i.e. Linux and gcc don't have to be cited,
while Python and Cython probably do. This could also be made tunable,
leaving the final choice to the user's conscience, which is more or
less the case for regular paper citations.

Thank you,
Andrey

William Stein

Jul 27, 2011, 1:40:02 PM
to sage-...@googlegroups.com, Mike Hansen
On Wed, Jul 27, 2011 at 10:08 AM, Andrey Novoseltsev <novo...@gmail.com> wrote:
> I have just looked over PARI citing discussion and recently I had a
> talk with a developer of a software package X who was concerned that
> inclusion of X into Sage will mean that people will stop giving credit
> to X (and this developer in particular ;-)) Sometime ago there were
> suggestions to somehow gather statistics on how many times which
> function was called, which in practice does not seem like a great idea
> due to performance hits and privacy issues. Figuring out manually
> which components are used is somewhat boring and actually quite hard.
>
> But how about this: suppose I have written a function f that does what
> I need and I want to properly cite people and systems who made it
> possible, but at the same time I am too busy/lazy to do much to
> achieve it. However, I can do
>
> sage: uc = UsedComponents("f(75)")

I believe Mike Hansen implemented something that does the above (using
the profiler) already. However, I don't remember if it is in Sage or
not, or where to find it.

>
> and then
>
...


> I don't know how long such lists will be typically, and perhaps it may
> be a bit weird to cite 10 papers and 20 software systems, but at the
> same time if they were used - why not. As to where stop the list of
> components, I think that "is it included in Sage distribution" is a
> reasonable compromise, i.e. Linux and gcc don't have to be cited,
> Python and Cython probably have to. This also can be made tunable
> leaving the final choice to users conscience, which is more or less
> the case for regular paper citations.

If nothing else, it would be very nice if we had an entry in the
database for each paper listed here:

http://sagemath.org/library-publications.html

showing what systems are used. For example, it would be good if
somebody wrote a webform or something for submitting papers, and part
of what it asked for was a list of systems used (and a command like
the one you mention above would be suggested by the form).
Then we could make it so after each entry in
http://sagemath.org/library-publications.html there would be a little
list of links for the components used, or a single link to a list of
components for that paper, or maybe just a way of showing "all papers
that use a given component". Then when Karim B. of PARI complains
"you don't cite us", we can respond with a link like:

http://sagemath.org/library-publications.html?system=pari

that shows a nicely formatted list of papers all of which cite PARI.
He can then include a link to this list in his grant proposals, etc.
Many, many components of Sage (including Pari!?) don't have a page
listing publications that used their system, so we would be providing
a useful service to them.
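
The "all papers that use a given component" view amounts to inverting a paper-to-systems table. A small sketch, with placeholder records rather than real publications and a hypothetical helper name `index_by_system`:

```python
from collections import defaultdict

# Placeholder records for illustration only; not real publications.
papers = [
    {"title": "Example paper A", "systems": ["PARI", "GAP"]},
    {"title": "Example paper B", "systems": ["Maxima"]},
    {"title": "Example paper C", "systems": ["pari", "Singular"]},
]


def index_by_system(records):
    """Map each lower-cased system name to the titles that list it."""
    index = defaultdict(list)
    for record in records:
        for system in record["systems"]:
            index[system.lower()].append(record["title"])
    return dict(index)


index = index_by_system(papers)
print(index["pari"])
```

Lower-casing the keys lets a query such as `system=pari` match entries submitted as "PARI" or "Pari".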

-- William

Jason Grout

Jul 27, 2011, 1:48:26 PM
to sage-...@googlegroups.com
On 7/27/11 10:08 AM, Andrey Novoseltsev wrote:
> To get these lists, it seems to me that one can execute the code
> "f(75)" in a profiler, collect used functions, and then look for
> substrings (either modules or particular functions) from a list that
> gives matches of substrings to components. This list has to be
> manually maintained, as in general this matching is probably non-
> trivial, but authours of particular functions and interfaces can
> easily add them, I think. As a bonus such automated citer will include
> proper versions for everything.

IIRC, someone (Mike Hansen, I believe) wrote something that would track
pexpect interfaces or something to see what software was being used. I
cannot find the command name for his function, though.

Jason


Mike Hansen

Jul 27, 2011, 1:50:52 PM
to sage-...@googlegroups.com
On Wed, Jul 27, 2011 at 10:48 AM, Jason Grout
<jason...@creativetrax.com> wrote:
> IIRC, someone (Mike Hansen, I believe) wrote something that would track
> pexpect interfaces or something to see what software was being used.  I
> cannot find the command name for his function, though.
>

sage: from sage.misc.citation import get_systems

--Mike

William Stein

Jul 27, 2011, 1:51:09 PM
to sage-...@googlegroups.com
On Wed, Jul 27, 2011 at 10:48 AM, Jason Grout
<jason...@creativetrax.com> wrote:

I said *exactly* that in my message too, including not being able to
remember or find the function!

I just asked Mike, and he told me it's sage.misc.citation. And indeed:

sage: import sage.misc.citation
sage: sage.misc.citation.get_systems("expand(sin(2*x))")
['ginac']
sage: sage.misc.citation.get_systems("integrate(expand(sin(2*x)),x)")
['ginac', 'Maxima']
sage: sage.misc.citation.get_systems("SymmetricGroup(3).cardinality()")
['GAP']

This is clearly not just tracking pexpect interfaces, since ginac is
not pexpect.

-- William

Burcin Erocal

Jul 27, 2011, 1:56:29 PM
to sage-...@googlegroups.com, Michael Brickenstein, Niels Ranosch
Hi Andrey,

We already have this, without the bibtex formatting of course:

sage: from sage.misc.citation import get_systems

sage: get_systems("integrate(cos(x^2), x)")
['MPFI', 'ginac', 'GMP', 'Maxima']

There are also two trac tickets about this: #3317 and #1422.

I've heard similar complaints from many people, including the Singular
group where I'm currently employed. It's true that Sage can do much
better to provide information on what was used in the background during
a computation, but overall, I think the main problem with citations is
changing the perspective of the users.

For many years, this field was dominated by large, closed computer
algebra systems. Even though some were more transparent than others,
it became commonplace to say MMA can compute this, Magma can do that.
As pointed out in the Sage & Pari thread recently, we should give
credit to the algorithm and the implementor for a nontrivial
computation.


To improve the capabilities of Sage to provide such information, Niels
Ranosch, a student currently employed in Oberwolfach (and supervised by
Michael Brickenstein), is implementing a new citation module:

https://bitbucket.org/niels_mfo/sage-citation/

We plan to start integrating this in Sage in August. To describe the
motivation and the goals to Niels, I'd started writing this:

http://bitbucket.org/niels_mfo/sage-citation/src/e13a2151d368/citation_description.rst

This blog entry has some design details, subject to change of course:

http://sage-citation.blogspot.com/2011/07/design.html


We would appreciate any comments and suggestions on this citation
system. At this stage, it is just an experiment for us too. :)


There are also plans to make a C-library interface to such a system, so
that libraries used during a computation, for example libSingular or
Gfan, can provide the citation information directly. Unfortunately,
there is no real code for this yet, apart from a header file defining
the proposed interface.


Burcin

Burcin Erocal

Jul 27, 2011, 2:15:39 PM
to sage-...@googlegroups.com
On Wed, 27 Jul 2011 10:40:02 -0700
William Stein <wst...@gmail.com> wrote:

> On Wed, Jul 27, 2011 at 10:08 AM, Andrey Novoseltsev
> <novo...@gmail.com> wrote:
> > I have just looked over PARI citing discussion and recently I had a
> > talk with a developer of a software package X who was concerned that
> > inclusion of X into Sage will mean that people will stop giving
> > credit to X (and this developer in particular ;-)) Sometime ago
> > there were suggestions to somehow gather statistics on how many
> > times which function was called, which in practice does not seem
> > like a great idea due to performance hits and privacy issues.
> > Figuring out manually which components are used is somewhat boring
> > and actually quite hard.
> >
> > But how about this: suppose I have written a function f that does
> > what I need and I want to properly cite people and systems who made
> > it possible, but at the same time I am too busy/lazy to do much to
> > achieve it. However, I can do
> >
> > sage: uc = UsedComponents("f(75)")
>
> I believe Mike Hansen implemented something that does the above (using
> the profiler) already. However, I don't remember if it is in Sage or
> not, or where to find it.

Maybe we should export this function to the top level namespace and put
an entry in the FAQ.

sage: from sage.misc.citation import get_systems

sage: get_systems("integrate(cos(x), x)")
['ginac', 'Maxima']


> If nothing else, it would be very nice if we had an entry in the
> database for each paper listed here:
>
> http://sagemath.org/library-publications.html
>
> showing what systems are used. For example, at would be good if
> somebody wrote a webform or something for submitting papers, and part
> of what it asked is for a list of systems used (and a command like you
> mention above would be suggested by the form).
> Then we could make it so after each entry in
> http://sagemath.org/library-publications.html there would be a little
> list of links for the components used, or a single link to a list of
> components for that paper, or maybe just a way of showing "all papers
> that use a given component". Then when Karim B. of PARI complains
> "you don't cite us", we can respond with a link like:
>
> http://sagemath.org/library-publications.html?system=pari
>
> that shows a nicely formatted list of papers all of which cite PARI.
> He can then include a link to this list in his grant proposals, etc.
> Many, many components of Sage (including Pari!?) don't have a page
> listing publications that used their system, so we would be providing
> a useful service to them.

Great idea! Such statistics would also help convince package authors
that Sage provides exposure for their project as well as a separate
(mathematical) test suite, regular build tests on many platforms and
in many cases bug fixes.

Like Andrey, I heard the FUD that "inclusion of X into Sage will mean
that people will stop giving credit to X." Maybe we should be more
proactive about this issue.


How about changing the credits() function to include a list of
components of Sage with links to their web pages and an example with
the get_systems() function?


Cheers,
Burcin

William Stein

Jul 27, 2011, 2:46:43 PM
to sage-...@googlegroups.com
On Wed, Jul 27, 2011 at 11:15 AM, Burcin Erocal <bur...@erocal.org> wrote:
> Great idea! Such statistics would also help convince package authors
> that Sage provides exposure for their project as well as a separate
> (mathematical) test suite, regular build tests on many platforms and
> in many cases bug fixes.
>
> Like Andrey, I heard the FUD that "inclusion of X into Sage will mean
> that people will stop giving credit to X." Maybe we should be more
> proactive about this issue.

Big +1

> How about changing the credits() function to include a list of
> components of Sage with links to their web pages and an example with
> the get_systems() function?

+1

>
>
> Cheers,
> Burcin

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

Niles

Jul 28, 2011, 7:55:39 AM
to sage-devel


On Jul 27, 1:56 pm, Burcin Erocal <bur...@erocal.org> wrote:
> sage: from sage.misc.citation import get_systems
> sage: get_systems("integrate(cos(x^2), x)")
> ['MPFI', 'ginac', 'GMP', 'Maxima']
>

And it's fun! I have just one publication using Sage, and running
get_systems on my main function returns:

['MPFI', 'Singular', 'MPFR', 'ginac', 'GMP', 'Maxima']

I'd *love* to put this data in some web form.

Seeing such a long list (which is not alphabetized) does make me
wonder "how much" each one was used though . . . This might be
measured by number of function calls, or total time for function
calls. Both ways of measuring (or any combination thereof) certainly
have shortcomings, and I think there are some fair reasons not to
include any such measurement at all in a citation list. But seeing
the list does make me *so* curious!
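
Both measurements can be read off a profiler run. The sketch below is plain Python and only an illustration: `PATTERNS` and `usage_by_component` are hypothetical names, a standard-library module stands in for a Sage component, and time is counted as each function's own time, which is one of several possible choices.

```python
import cProfile
import pstats
from collections import defaultdict

# Hypothetical table: path substring -> component name (stand-in module).
PATTERNS = {"fractions.py": "fractions"}


def usage_by_component(code, patterns=PATTERNS):
    """Return {component: (number of calls, own time in seconds)}."""
    namespace = {}
    profiler = cProfile.Profile()
    profiler.runctx(code, namespace, namespace)
    totals = defaultdict(lambda: [0, 0.0])
    # Each value is (primitive calls, total calls, own time,
    # cumulative time, callers).
    for key, value in pstats.Stats(profiler).stats.items():
        filename = key[0]
        ncalls, own_time = value[1], value[2]
        for pattern, component in patterns.items():
            if pattern in filename:
                totals[component][0] += ncalls
                totals[component][1] += own_time
    return {name: (calls, seconds) for name, (calls, seconds) in totals.items()}


code = "from fractions import Fraction\nsum(Fraction(1, n) for n in range(1, 50))"
print(usage_by_component(code))
```

Own time is summed instead of cumulative time because cumulative times of nested calls within one component would otherwise be counted more than once.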

-Niles

p.s. As far as I can see, the ordering of the systems has no
particular meaning -- they're keys in a dictionary.

p.p.s. This "systems" dictionary could easily fall out of date . . .

John Cremona

Jul 28, 2011, 8:09:02 AM
to sage-...@googlegroups.com
Interesting! I ran this on my script which takes elliptic curves as
computed by eclib (outside Sage, that's a C++ program which already
uses gmp, NTL, pari) and got this list:

['PARI', 'mwrank', 'MPFI', 'Singular', 'FLINT', 'MPFR', 'ginac',
'GMP', 'Magma', 'NTL']

This does not include sympow (which is used for E.modular_degree()).
I also have no idea why either Singular or ginac is being used.
Since I do no symbolic algebra it is quite possible that the
appearance of ginac in this list indicates a small bug.

John

Jason Grout

Jul 28, 2011, 2:35:04 PM
to sage-...@googlegroups.com
On 7/28/11 5:09 AM, John Cremona wrote:
> Interesting! I ran this on my script which takes elliptic curves as
> computed by eclib (outside Sage, that's a C++ program which already
> uses gmp, NTL, pari) and got this list:
>
> ['PARI', 'mwrank', 'MPFI', 'Singular', 'FLINT', 'MPFR', 'ginac',
> 'GMP', 'Magma', 'NTL']


I find the two responses here very interesting. From Niles's and John's
responses, it sounds like this feature could help turn users into
developers as their curiosity about the algorithms gets the better of
them and they start poking around in the internals of Sage.

Thanks,

Jason

John Cremona

Jul 28, 2011, 2:55:42 PM
to sage-...@googlegroups.com
>
>
> I find the two responses here very interesting.  From Niles and John's
> responses, it sounds like this feature could help users turn into developers
> as their curiosity about the algorithms gets the better of them and they
> started poking around in the internals of Sage.
>

Would it be possible to have a configuration variable (default
False) which, if set to True, would show that list after *every*
command? (I seem to remember that it is possible to get the timing of
every command, but perhaps I'm thinking of Pari).

John


Niles

Jul 29, 2011, 8:50:57 AM
to sage-devel


On Jul 28, 2:35 pm, Jason Grout <jason-s...@creativetrax.com> wrote:
> I find the two responses here very interesting.  From Niles and John's
> responses, it sounds like this feature could help users turn into
> developers as their curiosity about the algorithms gets the better of
> them and they started poking around in the internals of Sage.

Indeed, one of the things that first piqued my curiosity was the
message when quitting a Sage session: "Exiting spawned Maxima
process." or "Exiting spawned Gap process." Another approach to
raising user awareness might be to have each component of Sage
"register" its use in a session, even if it doesn't require spawning a
new process. Then the list of used components could be printed when
quitting a session. But can this be done without adding tons of
overhead?
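
As a rough illustration of how cheap such registration could be, here is a plain-Python sketch with hypothetical names (`register_use`, `session_summary`); it is not an existing Sage mechanism. Each registration is a single set insertion, and the summary is printed once by an `atexit` hook when the interpreter exits.

```python
import atexit

_used_components = set()


def register_use(component):
    """Record that `component` was used; repeated calls are harmless."""
    _used_components.add(component)


def session_summary():
    """Return a one-line summary, or an empty string if nothing was used."""
    if not _used_components:
        return ""
    names = sorted(_used_components, key=str.lower)
    return "This session used: " + ", ".join(names)


@atexit.register
def _print_summary():
    # Runs once at interpreter exit.
    summary = session_summary()
    if summary:
        print(summary)


# A component would call this at its entry points, for example:
register_use("Maxima")
register_use("GAP")
register_use("Maxima")
```

The per-call cost is one hash lookup, so the overhead would mainly matter for functions called in very tight loops.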

-Niles

Benjamin Jones

Jul 30, 2011, 11:46:23 AM
to sage-...@googlegroups.com
On Fri, Jul 29, 2011 at 7:50 AM, Niles <nil...@gmail.com> wrote:
>
> Indeed, one of the things that first piqued my curiosity was the
> message when quitting a Sage session: "Exiting spawned Maxima
> process." or "Exiting spawned Gap process."  Another approach to
> raising user awareness might be to have each component of Sage
> "register" its use in a session, even if it doesn't require spawning a
> new process.  Then the list of used components could be printed when
> quitting a session.  But can this be done without adding tons of
> overhead?
>
> -Niles
>

+1

Something like "Today's Sage session brought to you by ... "

--
Benjamin Jones

Niels Ranosch

Aug 12, 2011, 9:55:26 AM
to sage-devel
Hi,

I just uploaded a few patches to sage-trac: http://trac.sagemath.org/sage_trac/ticket/3317

The design of this "new" citation system is, in short, the
following:
- "citable_items" holds citation information.
- "@cite" is the decorator that marks a function as using a citable
item.
- Collected citations are available at run-time through the object
"citations".
They are all available through "sage.citation".

Detailed information is available in the docstrings.
What do you think about it?
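
For readers who have not opened the patches, the decorator idea can be sketched in a few lines of plain Python. This is not the code from the ticket: the names `citable_items`, `cite` and `citations` are borrowed from the description above, while their types, the string keys and the toy function are assumptions made for illustration.

```python
import functools

# Assumed shape: citation key -> free-text citation information.
citable_items = {
    "PARI": "PARI/GP (bibliographic details to be filled in)",
    "GAP": "GAP (bibliographic details to be filled in)",
}

# Keys collected at run-time, whenever a decorated function is called.
citations = set()


def cite(*keys):
    """Mark a function as using the citable items named by `keys`."""
    for key in keys:
        if key not in citable_items:
            raise KeyError("unknown citable item: %r" % (key,))

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            citations.update(keys)  # record the use on every call
            return func(*args, **kwargs)
        return wrapper
    return decorator


@cite("PARI")
def nontrivial_divisors(n):
    """Toy function standing in for a call into a cited library."""
    return [d for d in range(2, n + 1) if n % d == 0]


print(sorted(citations))   # [] : nothing has been called yet
nontrivial_divisors(12)
print(sorted(citations))   # ['PARI']
```

Unlike the profiler approach, this records a citation only where a developer has added the decorator, so coverage depends on how thoroughly functions are annotated.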


You might want to take a look at what we have been experimenting with
over the past month: https://bitbucket.org/niels_mfo/sage-citation
I also have a blog about these experiments: http://sage-citation.blogspot.com

Cheers,
Niels Ranosch