Sage's references: new policy?


John H Palmieri

Sep 20, 2016, 7:03:27 PM
to sage-devel
As discussed in another thread [1]_ on sage-devel recently, I propose changing our policy toward references:

- all references should be put into a master bibliography file, and
- all references should be, insofar as possible, in a standard form: for a work by a single author "Author" published in YEAR: [AutYEAR]. For a work published by "Author" and "Coauthor" in YEAR: [ACYEAR]. The year should be four digits.

The main point of the first item is to avoid conflicting cross-references, and it also seems to make sense to list all references in one place. (The goal behind the second item is just consistency.)
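
A minimal Python sketch of the naming rule in the second item. It assumes surnames are passed as plain strings and that works with more than two authors also use initials; the proposal itself only spells out the one- and two-author cases. The function name `citation_key` is illustrative and not part of Sage.

```python
def citation_key(surnames, year):
    """Build a citation key such as Mil1958 or AC2016.

    surnames: list of author surnames, in author order.
    year: publication year as an integer (rendered with four digits).
    """
    if len(surnames) == 1:
        # Single author: first three letters of the surname.
        prefix = surnames[0][:3]
    else:
        # Several authors: one initial per author (an assumption for 3+ authors).
        prefix = "".join(name[0] for name in surnames)
    return f"{prefix}{year:04d}"


print(citation_key(["Milnor"], 1958))              # Mil1958
print(citation_key(["Author", "Coauthor"], 2016))  # AC2016
```

The brackets are left to the caller, so the same key can be used both in the bibliography entry and in a `[Mil1958]_` reference.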

This is implemented at https://trac.sagemath.org/ticket/21454.

Any comments?

--
John

REFERENCES:

.. [1] https://groups.google.com/d/msg/sage-devel/-_kszKLhICw/SjLMs4rXCAAJ

David Coudert

Sep 21, 2016, 3:41:07 AM
to sage-devel
What if we have two papers by "Author" and "Coauthor" in 2016?
How would we distinguish between a paper by, say, "R. Thomas" in 2000 and another by "C. Thomassen" in 2000?

Martin R

Sep 21, 2016, 4:38:59 AM
to sage-devel
Why not use the MR number as reference format?

Martin

David Roe

Sep 21, 2016, 4:46:13 AM
to sage-devel
Preprints won't have MR numbers.  I also find MR numbers less readable.

We could just append letters ("a", then "b", etc.) if there are collisions.
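
A small sketch of that collision rule, assuming the letter goes directly after the year and that every member of a colliding group gets a suffix (the thread does not say whether the first one stays bare). The helper name `disambiguate` is hypothetical.

```python
from collections import Counter
from string import ascii_lowercase


def disambiguate(keys):
    """Append 'a', 'b', ... to keys that occur more than once.

    keys: base citation keys in bibliography order.
    Supports at most 26 entries per colliding key.
    """
    totals = Counter(keys)   # how often each base key occurs overall
    seen = Counter()         # how many copies of each key were already emitted
    unique = []
    for key in keys:
        if totals[key] == 1:
            unique.append(key)
        else:
            unique.append(key + ascii_lowercase[seen[key]])
            seen[key] += 1
    return unique


# "R. Thomas" and "C. Thomassen" in 2000 both give the base key Tho2000.
print(disambiguate(["Tho2000", "AC2016", "Tho2000"]))
# ['Tho2000a', 'AC2016', 'Tho2000b']
```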
David

--
You received this message because you are subscribed to the Google Groups "sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscribe@googlegroups.com.
To post to this group, send email to sage-...@googlegroups.com.
Visit this group at https://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Dima Pasechnik

Sep 21, 2016, 5:10:00 AM
to sage-devel


On Wednesday, September 21, 2016 at 8:46:13 AM UTC, David Roe wrote:
Preprints won't have MR numbers.  I also find MR numbers less readable.
And not all CS-related publications make it into the MR database, either.
 

We could just append letters ("a" then "b," etc) if there are collisions.

I wonder whether it is possible to create aliases for references, i.e. make [Bla]_ and [Foo]_ both refer to [Foo].
This would allow fewer changes in the source.


 
David


Martin R

Sep 21, 2016, 5:36:06 AM
to sage-devel
Well, for preprints there is of course the arXiv number, and for sciences without a good database there is the DOI.

Concerning readability, there is a well-known justification for using sequential numbers.

I'm not making this up, I used this to organise the references for www.findstat.org, and I'm very happy with the result.

Martin

Dima Pasechnik

Sep 21, 2016, 7:36:31 AM
to sage-devel


On Wednesday, September 21, 2016 at 9:36:06 AM UTC, Martin R wrote:
well, for preprints clearly there is of course the arXiv number and for sciences without a good database, there is doi.

concerning readability, there is a well known justification for using sequential numbers

We are talking about readability of the source code, too.
IMHO one should not name variables and functions just using sequential numbers :-)

Having said this, I would again argue for an option to have aliases.

E.g. say there is a popular arXiv preprint cited 10 times in the source, which then becomes
a publication. Is it really necessary to change all 10 of these citations?
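
One way such aliases could be emulated is a preprocessing pass over the source text, sketched below. This is not an existing Sphinx feature; the alias table, the function name `resolve_aliases`, and the idea of running it before the documentation build are all assumptions made for illustration.

```python
import re

# Matches reST citation references such as [Bla]_ and captures the key.
CITATION_REF = re.compile(r"\[([^\[\]\s]+)\]_")


def resolve_aliases(text, aliases):
    """Rewrite [alias]_ to [canonical]_ using the given alias table.

    aliases: dict mapping an alias key to its canonical key.
    Keys without an alias (including numeric footnotes) are left unchanged.
    """
    def replace(match):
        key = match.group(1)
        return f"[{aliases.get(key, key)}]_"

    return CITATION_REF.sub(replace, text)


# Hypothetical alias table: the old preprint key points at the published entry.
print(resolve_aliases("See [Bla]_ and [Foo]_.", {"Bla": "Foo"}))
# See [Foo]_ and [Foo]_.
```

With a table like this, the 10 existing citations of the preprint key could stay in the source, and only one alias line would change when the paper is published.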

Johan S. H. Rosenkilde

Sep 21, 2016, 7:49:57 AM
to sage-...@googlegroups.com
With MR numbers, do you mean a link of the type [MR3352496]?

> well, for preprints clearly there is of course the arXiv number and for
> sciences without a good database, there is doi.

Neither arXiv nor DOI completely catalogues all publications. I don't
know how many such cases appear in Sage's bibliography of course.

> concerning readability, there is a well known justification for using
> sequential numbers

Can you clarify? How would sequential numbers work? The documentation of
Sage is never read in sequence but more like random access.

A reference like [Tho2000] is to me much more recognisable than
[MR1794692]. Having two or three of the latter kind of references in a
text, it takes brain-effort simply to distinguish if they are different
or not.

In articles and books, [Tho2000] is a much more popular format, and I
guess for exactly this reason. Of course, at such publication sizes
scalability problems don't show up, whereas they could for
Sage.

I just don't think they do: in the current master bibliography, there are 1130
references. There are *2* collisions with the current naming scheme
(broken by appending 'a', 'b', etc.)!

> I'm not making this up, I used this to organise the references for
> www.findstat.org, and I'm very happy with the result.

Can you elaborate? When I look at e.g.

http://www.findstat.org/GelfandTsetlinPatterns?action=diff&rev2=66&rev1=65

then the references are [KTT04], [Lo04] and [Sta99].

Best,
Johan
--

Johan S. H. Rosenkilde

Sep 21, 2016, 7:51:28 AM
to sage-...@googlegroups.com
> Having said this, I again would argue for an option to have aliases.
>
> E.g. say there is a popular arXiv preprint cited 10 times in the source,
> which then becomes
> a publication. Is it really necessary to change all 10 of these citations?

That's a good point. But does Sphinx support such aliasing out of the
box, or would we have to patch it in ourselves? If the latter is the
case, perhaps it's not *that* important?

Best,
Johan

Martin R

Sep 21, 2016, 7:52:57 AM
to sage-devel
Am Mittwoch, 21. September 2016 13:36:31 UTC+2 schrieb Dima Pasechnik:


On Wednesday, September 21, 2016 at 9:36:06 AM UTC, Martin R wrote:
well, for preprints clearly there is of course the arXiv number and for sciences without a good database, there is doi.

concerning readability, there is a well known justification for using sequential numbers

we talk about readability of the source code, too.
IMHO one should not name variables and functions just using sequential numbers :-)

Hm, I'd say that reference identifiers in docstrings and variable names are a bit different.  The argument in favour of using numeric references is that it encourages writing: "In 1783, Xin and Müller [1] have shown foo" instead of "In [XiMü1783] foo is shown".
 
Having said this, I again would argue for an option to have aliases.

E.g. say there is a popular arXiv preprint cited 10 times in the source, which then becomes
a publication. Is it really necessary to change all 10 of these citations?

I have implemented the following for findstat: as long as title and authors coincide (using a threshold for Levenshtein distance to get rid of some noise such as accents, punctuation, etc.), the two entries are merged, with preference (in the bibtex file) given to the MR entry.  (I use MathSciNet, zbMath, arXiv and DOI for citations.)
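
A rough sketch of this kind of matching, not FindStat's actual code: the normalization steps, the threshold of 3, and the entry layout (dicts with "title" and "authors" strings) are assumptions chosen for illustration.

```python
import unicodedata


def normalize(text):
    """Lowercase and drop accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = (c for c in decomposed if c.isalnum() or c.isspace())
    return "".join(kept).lower()


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def same_work(entry_a, entry_b, max_distance=3):
    """Treat two entries as the same work if title and authors nearly match."""
    return all(
        levenshtein(normalize(entry_a[field]), normalize(entry_b[field])) <= max_distance
        for field in ("title", "authors")
    )


mr = {"title": "On Müller's theorem.", "authors": "Xin, Müller"}
arxiv = {"title": "On Muller's theorem", "authors": "Xin, Muller"}
print(same_work(mr, arxiv))  # True
```

A fixed threshold is crude for very short titles; scaling it with the string length would be one possible refinement.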

Martin

Dima Pasechnik

Sep 21, 2016, 9:24:55 AM
to sage-devel


On Wednesday, September 21, 2016 at 11:52:57 AM UTC, Martin R wrote:
Am Mittwoch, 21. September 2016 13:36:31 UTC+2 schrieb Dima Pasechnik:


On Wednesday, September 21, 2016 at 9:36:06 AM UTC, Martin R wrote:
well, for preprints clearly there is of course the arXiv number and for sciences without a good database, there is doi.

concerning readability, there is a well known justification for using sequential numbers

we talk about readability of the source code, too.
IMHO one should not name variables and functions just using sequential numbers :-)

Hm, I'd say that reference identifiers in docstrings and variable names are a bit different.  The argument in favour of using numeric references is that it encourages writing: "In 1783, Xin and Müller [1] have shown foo" instead of "In [XiMü1783] foo is shown".

No, we are talking about the source, not the output. You would not want to write \cite{ref4242} in your LaTeX; you would rather use
\cite{sqlifemean}. The same holds in Python code or in rst files.
Also, writing Xin and Müller \cite{sqlifemean} 10 times is bad style, I think...

Martin R

Sep 21, 2016, 9:39:22 AM
to sage-devel
Am Mittwoch, 21. September 2016 13:49:57 UTC+2 schrieb Johan S. R. Nielsen:
With MR numbers, do you mean a link of the type [MR3352496]?
 
Yes!  (Except that in a compiled document such links could be transformed into a link such as [1].)
 
> well, for preprints clearly there is of course the arXiv number and for
> sciences without a good database, there is doi.

Neither arXiv nor DOI completely catalogues all publications. I don't
know how many such cases appear in Sage's bibliography of course. 

Exactly, that's why findstat uses all of them (and more), but identifies references with the same (up to some noise) title and authors. 
 
> concerning readability, there is a well known justification for using
> sequential numbers

Can you clarify? How would sequential numbers work? The documentation of
Sage is never read in sequence but more like random access.

What I meant is the common referencing scheme in many mathematics publications.  Of course, if you read something like

    See [1] and [2,4] for background and recent developments.

that's terrible.  However, if the author cares at all, (s)he would have written:

    See the textbook by Foo [1] and the recent papers by Harry and John [2,4] for background and recent developments.

A reference like [Tho2000] is to me much more recognisable than
[MR1794692]. Having two or three of the latter kind of references in a
text, it takes brain-effort simply to distinguish if they are different
or not.

That is very true. 

In articles and books, [Tho2000] is a much more popular format, and I
guess for exactly this reason. Of course in such publication sizes the
scalability problems don't show well, which could be the case for
Sage.

I just don't think so: in the current master bibliography, there's 1130
references. There's *2* collisions with the current naming scheme
(broken by appending 'a', 'b', etc.)!

OK. 

> I'm not making this up, I used this to organise the references for
> www.findstat.org, and I'm very happy with the result.

Can you elaborate? When I look at e.g.

http://www.findstat.org/GelfandTsetlinPatterns?action=diff&rev2=66&rev1=65

then the references are [KTT04], [Lo04] and [Sta99].

Yes, things are more complicated than advertised.


* Alice submits a statistic using http://www.findstat.org/StatisticsDatabase/NewStatistic, and in the reference field, she is (strongly) encouraged to type something like

    [[MathSciNet:3338726]] [[arXiv:1304.4309]]

* FindStat retrieves bibtex from the mathscinet and arxiv catalogues and checks that they indeed refer to the same paper (by definition: authors and title must match).  The identifier from the most preferred catalogue is then used as bibtex key in our bibtex file.  If somebody references this paper again, the bibtex file will not be modified.

Within a statistic entry, the references are simply numbered [1], [2], [3], ...  Currently, there is no automatic linking between numbers used in the description field of the statistic and the list of references.  In this case, I think it would be overkill.

* In case Alice did not use an identifier but rather free text, FindStat sends this to http://www.ams.org/mathscinet-mref and tries to obtain an MR number.  Failing that, the reference is left as Alice typed it.

* On the webpage, we show then author, title, and identifiers.
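
The identifier-selection step described above might look roughly like the following. Only the `[[Catalogue:id]]` syntax and the preference for MathSciNet come from the description; the order of the remaining catalogues, the exact catalogue spellings, and the name `preferred_key` are assumptions.

```python
import re

# Matches markers such as [[MathSciNet:3338726]] or [[arXiv:1304.4309]].
IDENTIFIER = re.compile(r"\[\[(\w+):([^\]]+)\]\]")

# MathSciNet first, as described; the order of the rest is assumed.
PREFERENCE = ["MathSciNet", "zbMath", "arXiv", "DOI"]


def preferred_key(reference_field):
    """Return the identifier from the most preferred catalogue, or None.

    None means the field contained free text only, which in the workflow
    above would be sent to a lookup service instead.
    """
    found = dict(IDENTIFIER.findall(reference_field))
    for catalogue in PREFERENCE:
        if catalogue in found:
            return f"{catalogue}:{found[catalogue]}"
    return None


print(preferred_key("[[MathSciNet:3338726]] [[arXiv:1304.4309]]"))
# MathSciNet:3338726
```

The sketch skips the consistency check that both identifiers refer to the same paper.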

I would have suggested writing "As shown by Chern et al. [[MathSciNet:3338726]] one can do this and that" because it works for me (I also use this in my LaTeX files, using reftex), but I also admit that I do not care enough to advertise it more :-)

The main advantage is that one then automatically has unique identifiers, which can (but need not) be transformed easily into whatever you want for the end user (e.g., the one reading docstrings).

Martin

Travis Scrimshaw

Sep 21, 2016, 10:30:18 AM
to sage-devel
From working on stuff that involves 100+ references, even having [1] causes problems. Then you also have essentially random numbers that can change on every new version of Sage. Also, I feel doing stuff like "Foo in [1]" can be overly verbose, even redundant, at times. So I am strongly for references in the [AC2016] format.

Best,
Travis

John H Palmieri

Sep 21, 2016, 11:46:30 AM
to sage-devel
There may be two issues here.

- How should references be written in source code?
- How should references appear in documentation output?

The default behavior in Sphinx is to use the source code citation name also in the output. I don't know how hard it would be to change that.

We can have discussions about the best way to format references purely in the documentation output, and I think it is clear that we will not come to universal agreement. More importantly, any discussion strictly about the documentation output (for example, using [1], [2], ... -- no one is suggesting that this is how the references should be named in the source code, right?) is orthogonal to the issue at hand: anyone can work on modifying Sphinx so it formats the references in another way independently of the format in the source code. Feel free to do that and propose such a change here. For now, the discussion should be on how to format references in the source (= the format in the output for now, because that is Sphinx's behavior).

So we can discuss the best way to format references in the source code. To some extent, of course, this is bikeshedding. Whether we use [AC2016] or [MR234898349] or [doi:...] or something else, there will always be arguments for doing one of the others. I personally find [Mil1958] in a discussion of the Steenrod algebra to convey information: I know that it refers to Milnor's 1958 paper. I would not recognize the MR number or the doi number for this. So I personally find the format [AC2016] a good balance between readability, brevity, and (to a large extent) unique representation. (Also, my suggested usage would be to often include more information than just the citation name: "In [Mil1958], Milnor showed ..." or "Milnor showed that ... -- see [Mil1958]" or something similar. Again, there is a balance between readability and verbosity.)

--
John

Thierry

Sep 21, 2016, 12:35:30 PM
to sage-...@googlegroups.com
Hi,

bikeshedding for bikeshedding:

- if we decide to centralize everything in a single file (but we should be
aware that a backward move (e.g. for modularization) will require some
work), why not use BibTeX (there must be some Sphinx interface
somewhere), so that we keep all information with proper fields (might
also be good for PDF rendering)?

- regarding the citation link, explicit is better than implicit, avoids
collisions, and is not that verbose: [Milnor1958], [AuthorCoauthor2016], ...

My two cents,
Thierry

VulK

Sep 21, 2016, 12:39:28 PM
to sage-...@googlegroups.com
* Thierry <sage-goo...@lma.metelu.net> [2016-09-21 18:35:25]:

>Hi,
>
>bikeshedding for bikeshedding:
>
>- if we decide to centralize everything in a single file (but we should be
> aware that a backward move (e.g. for modularization) will require some
> work), why not use BibTeX (there must be some Sphinx interface
> somewhere), so that we keep all information with proper fields (might
> also be good for PDF rendering)?

Et Voilà:
https://sphinxcontrib-bibtex.readthedocs.io/en/latest/

Travis Scrimshaw

Sep 21, 2016, 5:05:33 PM
to sage-devel


On Wednesday, September 21, 2016 at 11:35:30 AM UTC-5, Thierry (sage-googlesucks@xxx) wrote:
Hi,

bikeshedding for bikeshedding:

- if we decide to centralize everything in a single file (but we should be
  aware that a backward move (e.g. for modularization) will require some
  work), why not use BibTeX (there must be some Sphinx interface
  somewhere), so that we keep all information with proper fields (might
  also be good for PDF rendering)?

- regarding the citation link, explicit is better than implicit, avoids
  collisions, and is not that verbose: [Milnor1958], [AuthorCoauthor2016], ...

 At what point do you stop (i.e., how many authors or characters?), and what do you switch to? What about those really long names?

Generally speaking, [AC2016] will very likely not have any collisions and it saves on overall space, which is part of the reason why we have citations to references (it's a fallacious argument, but why not just put the full citation in every place and be super explicit?).

Best,
Travis

John H Palmieri

Sep 21, 2016, 6:00:27 PM
to sage-devel


On Wednesday, September 21, 2016 at 9:39:28 AM UTC-7, Salvatore Stella wrote:
* Thierry <sage-goo...@lma.metelu.net> [2016-09-21 18:35:25]:

>Hi,
>
>bikeshedding for bikeshedding:
>
>- if we decide to centralize everything in a single file (but we should be
>  aware that a backward move (e.g. for modularization) will require some
>  work), why not use BibTeX (there must be some Sphinx interface
>  somewhere), so that we keep all information with proper fields (might
>  also be good for PDF rendering)?

Et Voilà:
https://sphinxcontrib-bibtex.readthedocs.io/en/latest/


That looks interesting. If/when we switch to a single bibliography file, then we could later switch to using this interface. Converting a single ReST bibliography file to a bibtex file would be painful but not that hard, and then changing all references throughout Sage from [ABC1999]_ to :cite:`ABC1999` could be done by a script.
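
The script mentioned here could be as small as the sketch below. It assumes citation keys start with a letter and end in a four-digit year with an optional collision letter, so that numeric footnotes like `[1]_` are left alone; keys in any other format would need a broader pattern. The function name `to_cite_role` is illustrative.

```python
import re

# [ABC1999]_ or [Tho2000a]_ : letters first, then a four-digit year,
# optionally followed by one lowercase letter for collisions.
CITATION_REF = re.compile(r"\[([A-Za-z][A-Za-z0-9+-]*\d{4}[a-z]?)\]_")


def to_cite_role(text):
    """Replace reST citation references by the :cite: role."""
    return CITATION_REF.sub(r":cite:`\1`", text)


print(to_cite_role("See [ABC1999]_ and the footnote [1]_."))
# See :cite:`ABC1999` and the footnote [1]_.
```

Applied file by file over the source tree, this would handle the references; converting the bibliography entries themselves to bibtex is a separate step.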

--
John

Nils Bruin

Sep 22, 2016, 12:32:11 AM
to sage-devel
On Tuesday, September 20, 2016 at 4:03:27 PM UTC-7, John H Palmieri wrote:
As discussed in another thread [1]_ on sage-devel recently, I propose changing our policy toward references:

- all references should be put into a master bibliography file

There is one significant drawback to this: it will mean that a lot of ticket branches will be modifying this file, so merge conflicts between tickets may become more prevalent. If we can do something to ensure that resolving these merge conflicts is likely to fall within what standard merge strategies can do automatically we should probably do that.

Eric Gourgoulhon

Sep 22, 2016, 4:55:34 AM
to sage-devel
Hi John,


Le mercredi 21 septembre 2016 01:03:27 UTC+2, John H Palmieri a écrit :
As discussed in another thread [1]_ on sage-devel recently, I propose changing our policy toward references:

- all references should be put into a master bibliography file, and
 
What about the PDF documentation? At present, the reference manual at http://doc.sagemath.org/ is split into many separate PDF files and each of them has its bibliographic references at the end. Will the change to a single master file preserve this?

Best regards,

Eric.

Ralf Stephan

Sep 22, 2016, 8:48:15 AM
to sage-devel
On Wednesday, September 21, 2016 at 1:03:27 AM UTC+2, John H Palmieri wrote:
- all references should be, insofar as possible, in a standard form

There is one standard form for everything, even old papers, the Google Scholar cluster link.
Guess which one will survive longer: Google or all the other schemes?


 

Dima Pasechnik

Sep 23, 2016, 5:15:30 AM
to sage-devel
I agree - however, perhaps it's better to think of using several bibtex files (which is perfectly possible in LaTeX); e.g. one for number theory, one for group theory, one for commutative algebra, etc.

 

Johan S. H. Rosenkilde

Sep 23, 2016, 5:47:25 AM
to sage-...@googlegroups.com
>>> As discussed in another thread [1]_ on sage-devel recently, I propose
>>> changing our policy toward references:
>>>
>>> - all references should be put into a master bibliography file
>>>
>>
>> There is one significant drawback to this: it will mean that a lot of
>> ticket branches will be modifying this file, so merge conflicts between
>> tickets may become more prevalent. If we can do something to ensure that
>> resolving these merge conflicts is likely to fall within what standard
>> merge strategies can do automatically we should probably do that.

Will it really be that bad? The proposed master bibliography is sorted
alphabetically by first author, so conflicts should only occur when two
tickets insert/modify citations right next to each other (or with
perhaps 1 citation between them). With >1000 references right now,
that's not going to cause too many extra conflicts, I think:

If there are 1000 "equidistant" references and a single release has
tickets that create 10 new random references (this is high, I think),
then there's roughly a 20% chance that 1 pair of these references will be
within 1 of each other in the existing reference list.
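
This estimate can be checked with a quick Monte Carlo simulation. The model below is an assumption: each new reference lands in a uniformly random insertion slot between the existing ones, and two new references conflict when their slots differ by at most `max_gap`. The result is sensitive to how "within 1 of each other" is read, so it is worth trying `max_gap=1` and `max_gap=2`.

```python
import random


def conflict_probability(n_existing=1000, n_new=10, max_gap=1,
                         trials=20000, seed=0):
    """Estimate the chance that at least one pair of new references
    lands within max_gap slots of each other."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # n_existing references give n_existing + 1 insertion slots.
        slots = sorted(rng.randint(0, n_existing) for _ in range(n_new))
        # After sorting, the closest pair is always a neighbouring pair.
        if any(b - a <= max_gap for a, b in zip(slots, slots[1:])):
            hits += 1
    return hits / trials


for gap in (1, 2):
    print(gap, conflict_probability(max_gap=gap))
```

Real tickets probably cluster by subject area rather than landing uniformly, so this is only an order-of-magnitude check.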

Dima Pasechnik writes:
> I agree - however, perhaps it's better to think of using several bibtex
> files (which is perfectly possible in LaTeX); e.g. one for number theory,
> one for group theory, one for the commutative algebra, etc.

Hmm, that seems complicated: Wouldn't many references naturally fall
into multiple such categories, so every time you want to add a reference
you would have to grep for it across all files?

Best,
Johan

Thierry

Sep 24, 2016, 11:15:48 AM
to sage-...@googlegroups.com
Hi,

On Thu, Sep 22, 2016 at 05:48:15AM -0700, Ralf Stephan wrote:
> On Wednesday, September 21, 2016 at 1:03:27 AM UTC+2, John H Palmieri wrote:
> >
> > - all references should be, insofar as possible, in a standard form
> >
>
> There is one standard form for everything, even old papers, the Google
> Scholar cluster link.
> Guess which one will survive longer: Google or all the other schemes?

By far the other schemes, see e.g.
https://en.wikipedia.org/wiki/List_of_Google_products#Discontinued_products_and_services

We already suffered from the end of google-id support; luckily we didn't
rely on google-code. Let us just hope that google-group will not end soon,
and let us not depend on yet another closed service, which is not even a
bibliographical format (unless I missed something from the links).

Ciao,
Thierry