Tracking Citations; How?

Martin Raum

Sep 5, 2014, 9:07:25 AM
to sage-...@googlegroups.com
Dear Sage developers,

I'm writing to get an impression of the community's opinion on how citation management should be implemented.  As background, I should say that I have taken it into my head to modernize citation management in Sage.  I personally find this very important, as it signals respect for the projects we wrap.  More objectively, I figure such facilities can be a definite plus when writing the European Sage grant, as many such projects (Pari, Gap, Singular, FLINT, etc.) are developed in Europe.

Current status in Sage
======================

Mike Hansen implemented citation facilities in sage.misc.citation.  This is all we have.

sage: from sage.misc.citation import get_systems
sage: get_systems("integrate(x^2,x,0,1)")
['ginac', 'Maxima']

His implementation uses profiling:
1) run the given code under control of the profiler.
2) parse the list of functions called, extracting the list of modules called.  For example, sage.libs.pari.
3) Match this list against a certain list of projects, given in sage/misc/citation.pyx
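
The three steps above can be sketched in plain Python with the standard cProfile and pstats modules. This is only an illustration under stated assumptions, not the code in sage/misc/citation.pyx: the SYSTEMS table and the get_systems signature used here are hypothetical, and the standard-library json package stands in for a wrapped system.

```python
import cProfile
import json
import pstats

# Hypothetical lookup table: package directory name -> label to report.
# The stdlib "json" package stands in for a wrapped system such as Pari.
SYSTEMS = {"json": "json"}


def get_systems(func, *args, systems=SYSTEMS):
    """Run func(*args) under the profiler and report matching systems."""
    profiler = cProfile.Profile()
    # Step 1: run the code under control of the profiler.
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()

    used = set()
    # Step 2: every profiled function is keyed by (filename, line, name).
    for filename, _line, _name in pstats.Stats(profiler).stats:
        parts = filename.replace("\\", "/").split("/")
        # Step 3: match the path components against the table.
        for directory, label in systems.items():
            if directory in parts:
                used.add(label)
    return sorted(used)


print(get_systems(json.dumps, {"a": 1}))
```

Functions implemented in C show up in the profile without a source path, which is one way to see why matching on module paths can miss compiled code.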

Problems with the current implementation
========================================

I'm not trying to put Mike's code down. Actually, I'm really glad he implemented what we currently have. I'm just saying where we can improve.

1) Use of profiling implies that the code runs much slower.  Tracing citations for a toy computation may result in failure to pick the right ones.
2) For technical reasons, we miss functions written in Cython.
3) The subsystems themselves don't tell the user how to cite them.
4) The user is not being made aware of current functionality.
5) The naming scheme could be improved. The interface is not user friendly.

Two solutions available
=======================

We have three tickets dealing with this.  At #3317, there is old code by Niels Ranosch, Michael Brickenstein, and Burcin Erocal, which takes a completely different approach.  At #16777 and #16854, I have provided improved versions of the current method.

The issue
=========

Burcin has correctly argued at #16854 that the profiling approach is not capable of tracking decision trees inside a function. I.e., if a function decides according to some parameter to either call Pari or FLINT, we can't see this in the profiling.

On the other hand, #3317 uses decorators, which have to be applied to every function that requires citation management.  Alternatively, one can achieve the same by calling a certain function.  In any event, this means there will be a slight slow down of Sage in general.

The implementation at #3317 is really fast already, but not optimal. If we go for the decorator approach, I would speed it up.

Question
========

So, what does the community think?  Should we prefer the profiling or the decorator approach?  I'm calling for a vote, because I plan to get this into Sage by, say, the end of this year.

Best,
Martin

PS: My personal vote is +1 decorators

Vincent Delecroix

Sep 6, 2014, 5:50:19 AM
to sage-...@googlegroups.com
Hallo Martin,

I agree that it would be nice to have a proper view on what components
of Sage have been used. But for me, anything which would slow down the
code is a big -1 (in particular decorators that modify functions).
From that point of view, I like the implemented approach of Mike which
is not intrusive (and #16777, #16854 as well).

One cool thing would be to have something like

sage: start_citations()
sage: ## some lengthy computation
sage: write_citation("my_citation.bib")
sage: stop_citations()

and this would be appropriately advertised at
http://wiki.sagemath.org/Publications_using_SAGE
(and I think the citation question should also be addressed on the main
website, sagemath.org)

Vincent

Volker Braun

Sep 6, 2014, 6:54:35 AM
to sage-...@googlegroups.com
I'm also uncomfortable with sprinkling explicit citation management calls everywhere. There are many, many places that use external shared library code. Apart from the potential slowdown and the difficulty of maintaining the code, you are never going to get all of them. The source code for calling a shared library via Cython looks just like calling a Python function, and unless you are familiar with the interface it can be hard to tell them apart.

However, it would be a viable way to deal with anything called through a pipe / pexpect interface, as the call for spawning the subprocess is very localized and easy to identify. And creating a new process is so slow by comparison that the citation accounting would not matter.
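
As a rough sketch of that idea, the accounting can sit in the single helper that spawns the subprocess. The names run_external and SPAWNED_SYSTEMS are hypothetical, the standard subprocess module is used here instead of pexpect, and the Python interpreter stands in for an external system.

```python
import subprocess
import sys

# Hypothetical registry of external programs that were actually launched.
SPAWNED_SYSTEMS = set()


def run_external(system, argv):
    """Spawn a subprocess and note which system it belongs to."""
    # One set insertion is negligible next to the cost of creating a process.
    SPAWNED_SYSTEMS.add(system)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# The Python interpreter stands in for an external system such as Maxima.
output = run_external("child-python", [sys.executable, "-c", "print(6 * 7)"])
```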

First of all, define your goals. Tracking everything that a user could possibly do with Sage or one of its components is rather vague. I would suggest "tracking all calls from Cython to 3rd-party shared libraries", this seems to be where the meat is at. 

On Linux, this could be done trivially with LD_PROFILE which tracks calls into shared libraries through a modified plt. Low overhead, but it would need to be turned on before starting Sage. The situation is much more shitty on OSX where you basically can't do any profiling without root permissions. 

IMHO the cleanest way would be to modify Cython to track C calls. Accounting could then be easily (and dynamically) switched on by flipping a boolean. 
 
A slight variant of that idea would be to modify the sig_on/sig_off macros to (optionally) register from which P/Cython module they were called. Non-trivial calls into shared libraries should be wrapped in them to make them interruptible.

Bill Hart

Sep 6, 2014, 12:01:33 PM
to sage-...@googlegroups.com
I don't have any concrete suggestions, but have some random personal thoughts on the matter.

* I personally don't mind Pari being cited if flint actually performs a computation. What I mean is, I'm not sure how important it is to look all the way down the decision tree to see which package actually got called, but rather to see which packages are in general responsible for the computations being performed (even if they aren't actually used in a particular computation) and to cite them all.

(I'm not saying that I wouldn't personally find it very helpful to be able to easily tell which package is carrying out a given calculation involving a given function for a given input. That would be very useful for other reasons. Just not necessarily citation. In particular I might want to look at their source code to see how they did the calculation so fast/slow/correctly/incorrectly.)

* Perhaps it is easier to notionally divide Sage up into discrete domains and to document which packages are ultimately responsible for those domains. Power series over QQ: Packages X, Y and Z, Calculus: Packages S and T, Plotting: Packages P and Q, Something else: Just Sage itself.

* The hardest thing is meaningfully citing packages for a bunch of fairly low level stuff, e.g. integer or polynomial arithmetic, or some computations over the reals or finite fields or big multifaceted computations which cover a multitude of bits and pieces covering many areas. Even in a single area this could be hard. E.g. in doing some algebraic number theory, you might have used Pari, flint, GMP/MPIR, MPFR, NTL, Linbox, IML, Sage, and so on. At what point do you draw the line? Do you include GCC, Cython, Python, autotools, m4, etc? My personal view is one should cite whichever mathematical packages constituted a critical part of your computation. If you could have used almost anything to do your computation, it probably doesn't warrant a citation. If you used the algebraic number theory in Sage precisely because it has features X, Y and Z which you couldn't have used just about anything for, then it would be useful to figure out why Sage can do X, Y and Z and specially cite any packages that have provided that functionality through Sage.

The unfortunate side effect of doing this is packages like GMP/MPIR won't get widely cited because they are dependencies of other libraries and not often used directly in critical computations. But I'm also not sure how helpful it is for MPIR to get cited along with a dozen other packages for some algebraic number theory computation as opposed to being cited by someone working on a new FFT who compares it against the very fast FFT in MPIR.

* At the end of the day, one of the crucial reasons for citation is not to bring recognition and prestige to the people who wrote the packages, but to aid researchers who are trying to track down prior work in the literature. For example, I might be trying to work out how to compute X, and in reading your paper on X - epsilon I might note that you cite package Y as being critical to your work. I might then look into the code for package Y and see that they have solved part of the problem, and learn something about how they did so. This kind of scientific citation has to be balanced against the prestige motivation, and surely preferred, scientifically speaking.

I'm encouraged by your efforts to work on this. I guess in summary, my personal opinion is that it might be easier to start with a pragmatic approach which doesn't attempt to do things at such a fine level and which still relies on the researcher to use a good deal of discretion and understanding when citing packages written by others.

Every year or so we do a search to see who has cited flint and mpir. It's disappointing just how few citations we receive. Either people are just not using flint and MPIR in any way that is critical to their work (definitely a possibility), or writing highly performant C libraries is just not the way to get citations (also very possible). On the other hand, the situation might be improved for us if we spent more time writing papers on new, groundbreaking algorithms being implemented in flint and MPIR.
 
Sorry that was a bit rambling. It takes a lot of time and effort to write short, succinct posts. Perhaps my garbled thoughts above will trigger some better thoughts from others who have thought more about the citation problem than me.

Bill.

Bill Hart

Sep 6, 2014, 12:44:57 PM
to sage-...@googlegroups.com
One additional technical point.

The profiling approach need not slow Sage down.

The same approach is used in another project I'm aware of (not for citation though). It takes samples at regular points during the computation, figures out which function is currently being called and tallies the results.

My experience is this is not noticeably slower than running the computation without profiling....

Actually I just checked an example and it slowed down a complex computation about 15%. It took around 2400 stack traces over a 2.4s period and intercepted calls into around 44 different high level functions at around 260 different function call points and calls into around 122 distinct C functions in C libraries compiled with symbols on, plus another 500 or so interceptions in C libraries with symbols off (probably far fewer distinct functions). This resulted in a couple of hundred thousand pieces of data (each stack trace includes a complete backtrace)

One trick they use is to separate the actual collection of samples from the processing. The latter happens after the profiling itself stops.
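
A small Python sketch of this sampling idea, with collection kept separate from processing. It is not the profiler of the project mentioned above; the StackSampler class and the tally function are hypothetical, and it only sees Python-level frames of the thread that created it.

```python
import collections
import sys
import threading
import time


class StackSampler:
    """Periodically record the Python stack of the creating thread."""

    def __init__(self, interval=0.005):
        self.interval = interval
        self.samples = []                       # raw data, processed later
        self._target = threading.get_ident()    # thread to observe
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Collection only: store (filename, function) pairs, no analysis.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target)
            stack = []
            while frame is not None:
                code = frame.f_code
                stack.append((code.co_filename, code.co_name))
                frame = frame.f_back
            self.samples.append(stack)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False


def tally(samples):
    """Processing step: count in how many samples each function appears."""
    counts = collections.Counter()
    for stack in samples:
        counts.update({name for _filename, name in stack})
    return counts


def busy(seconds):
    """Stand-in for a real computation."""
    end = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total


with StackSampler(interval=0.002) as sampler:
    busy(0.3)
counts = tally(sampler.samples)
```

The overhead is governed by the sampling interval rather than by the number of function calls, which is the property that distinguishes this from a tracing profiler.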

Bill.

Volker Braun

Sep 6, 2014, 2:04:25 PM
to sage-...@googlegroups.com
On Saturday, September 6, 2014 5:44:57 PM UTC+1, Bill Hart wrote:
It takes samples at regular points during the computation

That's the aforementioned http://trac.sagemath.org/ticket/16777. See that ticket for technical obstacles of that approach.

Robert Bradshaw

Sep 6, 2014, 2:34:56 PM
to sage-devel
Note that Cython supports cProfile these days:
http://docs.cython.org/src/tutorial/profiling_tutorial.html However,
that won't help too much as the real missing pieces are the calls from
Cython into the various C libraries.

I'm also -1 to an approach that slows down all of Sage to track this
unconditionally. The decorator approach could be good for annotating
functions (e.g. attaching them to some database the citations module
would use) but recording every call could be prohibitively expensive.

As far as the question of why software isn't cited, usually the metric
of "usefulness" is #downloads. If you're established enough that
downloads typically come via a distribution of sorts (e.g. Sage) then
that is another data point for being useful. This adds to the
difficulty of getting academic credit for programming work.

Bill Hart

Sep 6, 2014, 2:39:19 PM
to sage-...@googlegroups.com
The code for the profiler in the project I mentioned seems to have been custom written for the project. You can see it here:


and here:


There looks to be fairly good separation between the C part, which gathers the actual data, and the high level part, which drives the whole thing and processes the data.

I didn't check carefully, but perhaps the C part is of some use to someone trying to do this for at least the C libraries used by Sage.

I appreciate that there are technical difficulties at the Cython level and Python stack tracing is different again. The approach can work. I'm not necessarily suggesting it is the approach that should be taken.

Bill.

Bill Hart

Sep 6, 2014, 2:52:33 PM
to sage-...@googlegroups.com


On Saturday, 6 September 2014 20:34:56 UTC+2, Robert Bradshaw wrote:
Note that Cython supports cProfile these days:
http://docs.cython.org/src/tutorial/profiling_tutorial.html However,
that won't help too much as the real missing pieces are the calls from
Cython into the various C libraries.

I'm also -1 to an approach that slows down all of Sage to track this
unconditionally.

I am also not comfortable with an approach that slows down the whole of Sage, just so people like me and my colleagues can get more credit. Especially when we are trying to speed Sage up. :-)
 
The decorator approach could be good for annotating
functions (e.g. attaching them to some database the citations module
would use) but recording every call could be prohibitively expensive.

As far as the question of why software isn't cited, usually the metric
of "usefulness" is #downloads.

That would be true of programs, not necessarily of libraries. There are many more users of programs than consumers of libraries, I would suspect.
 
We don't track downloads for flint, only unique pageviews on our download page, which come in at a little over 200 a month.

I find such a metric a little difficult to interpret.

Burcin Erocal

Sep 7, 2014, 10:55:32 AM
to sage-...@googlegroups.com
On Sat, 6 Sep 2014 11:52:33 -0700 (PDT)
Bill Hart <goodwi...@googlemail.com> wrote:

> On Saturday, 6 September 2014 20:34:56 UTC+2, Robert Bradshaw wrote:
> >
> > Note that Cython supports cProfile these days:
> > http://docs.cython.org/src/tutorial/profiling_tutorial.html
> > However, that won't help too much as the real missing pieces are
> > the calls from Cython into the various C libraries.
> >
> > I'm also -1 to an approach that slows down all of Sage to track
> > this unconditionally.
>
>
> I am also not comfortable with an approach that slows down the whole
> of Sage

Let me clarify a few points:

- the current (profiling) approach slows everything down when you turn
it on.

It simply turns on the profiler, which either logs _all_ function
calls or periodically samples the running process and logs the active
function in addition to loads of other data that is irrelevant for
citations.

- the (decorator) approach suggested in #3317 [1] notes which
implementation is being used only at the point it is used.

[1] http://trac.sagemath.org/ticket/3317

It would be simple to add a global switch to turn this logging on or
off. We didn't think this was necessary because the overhead is
incredibly small. See below for details.



Since the profiling approach is so slow, it can only be used on toy
examples which often use a completely different code path. I will copy
from my comment in #16854 [2]:

The profiling approach is broken for several reasons:

- the code used for different problem sizes is often different.

Profiling a small example will not give you the correct
information. If you are really working on the cutting edge of
what is computable, then you don't want to run the whole
computation under the profiler once more.

- you have to guess what is being used from the data obtained from
the profiler.

There is no clean way to associate citation information to
functions this way.

- it does not allow tracking more fine grained information than
function names.

If a Sage function wraps several algorithms by calling an
external package with different arguments, you cannot
differentiate these.


[2] http://trac.sagemath.org/ticket/16854#comment:6


Decorators & Speed:

We spent a lot of effort on speeding up the implementation in
#3317 and measuring the effect of adding citation information via
decorators.

IIRC, the only additional operation performed by the decorated function
is to add a string to a Python set. Compared to the overhead of calling
a function in Python, this is negligible.
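
For readers who have not looked at the ticket, the mechanism can be sketched roughly as follows. This is not the code from #3317; the CITED set and the exact shape of the cites decorator are assumptions made for illustration.

```python
import functools

# Hypothetical global registry; #3317 may organise this differently.
CITED = set()


def cites(*keys):
    """Decorator: record the given citation keys whenever f is called."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwds):
            CITED.update(keys)      # the only extra work per call
            return f(*args, **kwds)
        return wrapper
    return decorator


@cites("libsingular")
def groebner_basis(polys):
    # Stand-in body; a real implementation would call into the library.
    return sorted(polys)


groebner_basis(["y", "x"])
print(sorted(CITED))
```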

There are some benchmarks in this blog post [3]. The title may give the
wrong idea, but the numbers are quite impressive. Note that we are not
suggesting to decorate arithmetic operations like addition and
multiplication, only calls to higher level routines, like groebner
basis computation or symbolic integration.

[3] http://sage-citation.blogspot.de/2011/08/awful-benchmarks.html

Here are some numbers from the link above:

- calling a pass-function (empty Python function):

100000 loops, best of 3: 110 ns per loop

- calling the above function after decorating:

100000 loops, best of 3: 295 ns per loop

- calling a pass-function, that takes some parameters:

100000 loops, best of 3: 399 ns per loop

- calling the above function after decorating:

100000 loops, best of 3: 796 ns per loop


This 200 ns difference would be a measuring error if the function in
question did any real work.


> , just so people like me and my colleagues can get more credit.

We suggest to cite not only libraries used by Sage but papers on the
algorithms used. See the example in the ticket description here:

http://trac.sagemath.org/ticket/3317

> Especially when we are trying to speed Sage up. :-)

I sincerely hope that you are not saying I am trying to "slow Sage
down."

> > The decorator approach could be good for annotating
> > functions (e.g. attaching them to some database the citations
> > module would use) but recording every call could be prohibitively
> > expensive.

Let me emphasize again. We definitely do not want to record every
function call. The goal is to add annotations to functions that
implement / wrap relatively expensive computational routines.



In short, I vote +1 for decorators.


Cheers,
Burcin


Volker Braun

Sep 7, 2014, 12:24:48 PM
to sage-...@googlegroups.com
If you want to cite inside a decision tree then you can't do that with a decorator. Instead of *also* having a function call syntax, we should then *only* have function call syntax. Nothing good ever came out of having two ways of achieving the same outcome. The other reason for why function call syntax is better is that, in Cython code, this can then be a C macro or inline function, avoiding an unnecessary stack frame. The citation tracking code can then be inside an 

    if (unlikely(sage_citation_enabled)) {} 

which will be essentially free due to branch prediction. I think the 5% slowdown mentioned in http://sage-citation.blogspot.de/2011/08/awful-benchmarks.html is not desirable.

Also, I don't really like the syntax

    @cites(citable_items.libsingular, citable_items.foo, citable_items.bar)

how about staying a bit closer to LaTeX:

    cite(bib.libsingular, bib.foo, bib.bar)
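
To make the decision-tree point concrete, here is a pure-Python sketch of the function-call style. All names (bib, cite, cited, prime_divisors) are hypothetical, and the flag is an ordinary Python global rather than the C-level flag suggested above.

```python
import types

# Hypothetical names throughout: "bib" holds citation keys, "cite" records them.
bib = types.SimpleNamespace(pari="pari", flint="flint")
sage_citation_enabled = True
cited = set()


def cite(*keys):
    if sage_citation_enabled:
        cited.update(keys)


def prime_divisors(n, algorithm="pari"):
    """Stand-in for a function that dispatches to one of two backends."""
    if algorithm == "pari":
        cite(bib.pari)      # recorded only if this branch runs
    else:
        cite(bib.flint)
    # Placeholder computation shared by both branches (trial division).
    return [p for p in range(2, n + 1)
            if n % p == 0 and all(p % q for q in range(2, p))]


prime_divisors(12, algorithm="flint")
```

A decorator on prime_divisors could only cite both backends or neither; the call inside the branch records the one that actually ran.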

I also only skimmed #3317, can somebody convert that to a git branch?





Martin Raum

Sep 8, 2014, 4:47:51 AM
to sage-...@googlegroups.com
Thank you, Bill, for your input. You made me think of some aspects that will be important. Still, let us first focus on getting some citation management into Sage, and then work on it. As for Volker's question on what the goal is: the first step, and I think we agree that this is desirable, is to get third-party packages tracked. I personally would also like to track usage of major Sage components, so there is no penalty for developing within Sage instead of as a library.

I like how Volker thinks about the citation calls. Decorators are perhaps really not optimal compared to function calls. In Cython, due to inlining, the if (sage_citation_management_enabled) {} is truly the fastest thing. For Python, because there is no inlining, I thought of a little hack. To trigger citations call

sage.citation.cite(bib.libsingular)

Now, if citation is enabled, then cite = _cite(*bibs), and if not then cite = def _cite_pass(*bibs): pass. bib.libsingular is a function, which we do not call in the citation statement, but only in _cite(*bibs). I think in Python, you can't do better than empty calls and I really want to avoid letting the user spell out the if statement. Citation management should be easy.

To summarize: Would it be feasible to replace decorators by function calls, and to exploit Cython inlining and the above Python trick so that citations, if deactivated, consume as few resources as possible? Then sprinkle the citation calls in Sage, but keep the functionality in, say, a module sage.citation?

Volker Braun

Sep 8, 2014, 5:23:12 AM
to sage-...@googlegroups.com
On Monday, September 8, 2014 9:47:51 AM UTC+1, Martin Raum wrote:
For Python, because there is no inlining, I thought of a little hack. To trigger citations call
sage.citation.cite(bib.libsingular)
Now, if citation is enabled, then cite = _cite(*bibs), and if not then cite = def _cite_pass(*bibs): pass

Changing the definition of sage.citation.cite inside the module at runtime is going to break code that uses the "from sage.citation import cite" statement, as it will have already imported the old definition. I also think that it's not necessary: checking a cdef bool is again basically free compared to the cost of a new Python stack frame. And if that is a problem, then you should have used Cython for your function anyway.

sage: cython("""
cdef bint sage_citation_enabled = False
cpdef cite():
    if sage_citation_enabled:
        raise NotImplementedError()
""")
sage: timeit('cite()', number=10000000)
10000000 loops, best of 3: 30.6 ns per loop

sage: cython("""
cpdef empty():
    pass
""")
sage: timeit('empty()', number=10000000)
10000000 loops, best of 3: 31 ns per loop

sage: def add_with_citation(a, b):
....:     cite()
....:     return a + b
....: 
sage: def add(a, b):
....:     return a + b
....: 
sage: P.<x,y,z>=QQ[]
sage: f = x * y + z
sage: timeit('add(f, x)')
625 loops, best of 3: 1.2 µs per loop
sage: timeit('add_with_citation(f, x)')
625 loops, best of 3: 1.25 µs per loop
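
The import problem described at the top of this message can be reproduced in plain Python. The module name citation_demo is a throwaway created only for this illustration; it is not part of Sage.

```python
import sys
import types


def _cite_pass(*bibs):
    return "ignored"


def _cite(*bibs):
    return "recorded"


# Build a throwaway module (hypothetical name) whose cite starts disabled.
demo = types.ModuleType("citation_demo")
demo.cite = _cite_pass
sys.modules["citation_demo"] = demo

from citation_demo import cite   # binds the *current* object to a local name

# "Enable" citations by rebinding the module attribute at runtime.
demo.cite = _cite

via_module = demo.cite()         # sees the new definition
via_import = cite()              # still the old no-op
```

Callers that imported the name before the switch keep calling the no-op, so their citations would silently be lost.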

Nils Bruin

Sep 8, 2014, 1:32:33 PM
to sage-...@googlegroups.com
On Sunday, September 7, 2014 9:24:48 AM UTC-7, Volker Braun wrote:
If you want to cite inside a decision tree then you can't do that with a decorator. Instead of *also* having a function call syntax, we should then *only* have function call syntax. Nothing good ever came out of having two ways of achieving the same outcome. The other reason for why function call syntax is better is that,

There is an advantage to a decorator on python-level: If you're happy to configure sage_citation_enabled at start-up (which means it would have to be a command line option or an environment variable), you can make it completely *zero* penalty at runtime, with a little penalty at function definition ("import") time:

def cite(<citations>):
    if sage_citation_enabled:
        def decorator(f):
            def wrapper(*args):
                <register citations>
                return f(*args)
            return wrapper
    else:
        def decorator(f): return f
    return decorator


Bill Hart

Sep 8, 2014, 2:32:29 PM
to sage-...@googlegroups.com
I really believe on general principles that if at all possible, something like this should be zero penalty.

Burcin, I obviously don't think you are trying to slow Sage down. I'm sure you know I always appreciate your efforts. 

It sounds like this won't actually affect flint all that much, as flint is mainly used for arithmetic with (hopefully) very low cost, and it sounds like you want to focus on higher cost algorithms at this stage. But as Martin says, getting something into Sage is the first priority. Extending it can happen later on.

Obviously after Nils' post, I can see the benefit of the decorator approach. Previously I assumed that only the profiler approach could be truly zero penalty.

Bill.

Martin Raum

Sep 8, 2014, 3:28:32 PM
to sage-...@googlegroups.com
That also seems like a real option, in particular because of the zero penalty.  Do you have an idea how to handle the problem that not every citation can be decided at the function level (and hence by using decorators)?

Robert Bradshaw

Sep 9, 2014, 12:31:05 AM
to sage-devel
On Sun, Sep 7, 2014 at 7:57 AM, Burcin Erocal <bur...@erocal.org> wrote:
On Sat, 6 Sep 2014 11:52:33 -0700 (PDT)
Bill Hart <goodwi...@googlemail.com> wrote:

> On Saturday, 6 September 2014 20:34:56 UTC+2, Robert Bradshaw wrote:
> >
> > Note that Cython supports cProfile these days:
> > http://docs.cython.org/src/tutorial/profiling_tutorial.html
> > However, that won't help too much as the real missing pieces are
> > the calls from Cython into the various C libraries.
> >
> > I'm also -1 to an approach that slows down all of Sage to track
> > this unconditionally.
>
>
> I am also not comfortable with an approach that slows down the whole
> of Sage

Let me clarify a few points:

- the current (profiling) approach slows everything down when you turn
  it on.

  It simply turns on the profiler, which either logs _all_ function
  calls or periodically samples the running process and logs the active
  function in addition to loads of other data that is irrelevant for
  citations.

True, but it can be (completely) enabled/disabled at runtime. 
 
- the (decorator) approach suggested in #3317 [1] notes which
  implementation is being used only at the point it is used.

  [1] http://trac.sagemath.org/ticket/3317

  It would be simple to add a global switch to turn this logging on or
  off. We didn't think this was necessary because the overhead is
  incredibly small. See below for details.

I concur, this is small overhead if you're doing anything significant. 
I think it depends on what you're trying to track. If you're trying to track uses of, say, Pari or GAP this wouldn't work well, but if you want to track an algorithm for computing, say, Smith Normal Form, then it's perfect. 

FWIW, I actually prefer a function call like

    cite(bib.foo, bib.bar)

over a decorator; it's more flexible (e.g. if a function takes several paths) and if performance is a concern it can be an inline Cython function which will be really fast to call from Cython code (the branch can be inside the function with little penalty). 

- Robert

Volker Braun

Sep 9, 2014, 8:48:59 AM
to sage-...@googlegroups.com
On Monday, September 8, 2014 6:32:33 PM UTC+1, Nils Bruin wrote:
There is an advantage to a decorator on python-level: If you're happy to configure sage_citation_enabled at start-up (which means it would have to be a command line option or an environment variable), you can make it completely *zero* penalty at runtime, with a little penalty at function definition ("import") time:

But that doesn't work at the C level, you either compile something in or not. So you can't apply it to the (presumably) speed-critical c(p)def functions, only to plain Python functions where overhead wasn't much of an issue to start with.

Also, in terms of user experience it is much preferable to have the citation tracking switchable at runtime.

Nils Bruin

Sep 9, 2014, 11:44:33 AM
to sage-...@googlegroups.com
On Tuesday, September 9, 2014 5:48:59 AM UTC-7, Volker Braun wrote:
But that doesn't work at the C level, you either compile something in or not. So you can't apply it to the (presumably) speed-critical c(p)def functions, only to plain Python functions
True
where overhead wasn't much of an issue to start with.
 
Don't  underestimate the importance of "reasonable" performance at python level. In for instance magma, the first version of a program often has "decent" performance (at least in my experience), where "decent" means: I can easily do an example in the range of interest. In sage, I have never had that. You can use cython to get *great* performance, but at the expense of putting in significant work (unwrapping all the convenient sage-layers of things, making sure you avoid category overhead, etc). I think it's important to have decent performance from the get-go, because that often means you don't have to bother putting in 5 extra days of optimizing code to do the example you're interested in.

I'm not sure whether the tests involved with this would lead to a significant slow-down, but these things add up. With profiling sage code, I have often noticed that there's not a single bottleneck. It's just "death by a thousand cuts". It seems to me this might add another cut at python level. In python, looking up a global flag is going to be relatively slow, because, if addressed by its proper name, it involves checking several __dict__s. This adds noticeably to the overhead:

sage.misc.sageinspect.cite_enabled = False #just borrowing some spot
def citetest(a): pass
sage.misc.sageinspect.citetest = citetest #to compare symbol lookup with call overhead

def t1(a):
    if sage.misc.sageinspect.cite_enabled:
        print "cite"
    return 2*a

def t2(a):
    return 2*a
   
def t3(a):
    sage.misc.sageinspect.citetest("citation")
    return 2*a  

sage: %timeit t1(20)
1000000 loops, best of 3: 675 ns per loop
sage: %timeit t2(20)
1000000 loops, best of 3: 396 ns per loop
sage: %timeit t3(20)
1000000 loops, best of 3: 797 ns per loop

Unlike a "cdef cite_enabled" flag on cython level, testing the flag on python level is quite noticeable. I'd suspect that cythonizing the process would still be comparable, due to the symbol lookup required for finding the cythonized routine.

Volker Braun

Sep 9, 2014, 11:55:04 AM
to sage-...@googlegroups.com
On Tuesday, September 9, 2014 4:44:33 PM UTC+1, Nils Bruin wrote:
In python, looking up a global flag is going to be relatively slow, because, if addressed by its proper name, it involves checking several __dict__s

Which is why it needs to be a cdef bool as I wrote earlier.

sage: cython("""
....: cdef bint sage_citation_enabled = False
....: cpdef cite():
....:     if sage_citation_enabled:
....:         raise NotImplementedError()
....: """)
sage: def t2(a):
....:         return 2*a
....: 
sage: def t4(a):
....:     cite()
....:     return 2*a
....: 
sage: timeit('t2(20)', number=100000)
100000 loops, best of 3: 360 ns per loop
sage: timeit('t4(20)', number=100000)
100000 loops, best of 3: 377 ns per loop

Not really noticeable any more, especially considering that multiplying by 2 is a) something that really should be implemented in Cython to avoid Python overhead and b) not something that you would cite anyway.

Nils Bruin

Sep 9, 2014, 3:38:37 PM
to sage-...@googlegroups.com
On Tuesday, September 9, 2014 8:55:04 AM UTC-7, Volker Braun wrote:
On Tuesday, September 9, 2014 4:44:33 PM UTC+1, Nils Bruin wrote:
In python, looking up a global flag is going to be relatively slow, because, if addressed by its proper name, it involves checking several __dict__s

Which is why it needs to be a cdef bool as I wrote earlier.

Yes, but that's not the overhead I referred to. It's the fact that "cite" itself would live somewhere deep in the sage namespace and that the canonical way of referring to it would require several dictionary lookups. Compare:

cython("""

cdef bint sage_citation_enabled = False
cpdef cite():
    if sage_citation_enabled:
        raise NotImplementedError()
""")

sage.misc.sageinspect.cite=cite #borrowing namespace here

def t2(a):
        return 2*a

def t4(a):
    sage.misc.sageinspect.cite()
    return 2*a

sage: timeit('t2(20)', number=10000000)
10000000 loops, best of 3: 302 ns per loop
sage: timeit('t4(20)', number=10000000)
10000000 loops, best of 3: 457 ns per loop

On this trivial example it is now quite noticeable. Doing a "from sage.citation import cite" would alleviate this quite a bit as you show, but generally "from ... import ..." is discouraged in favour of leaving namespaces intact. 
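
The same comparison can be tried outside Sage with the standard timeit module. In this sketch cite is a pure-Python no-op rather than a Cython function, and pkg is a hypothetical nested namespace standing in for sage.misc.sageinspect, so the absolute numbers will differ from the ones above.

```python
import timeit
import types


def cite():
    pass


# Hypothetical nested namespace standing in for sage.misc.sageinspect.
pkg = types.SimpleNamespace(
    misc=types.SimpleNamespace(sageinspect=types.SimpleNamespace(cite=cite))
)


def t_dotted(a):
    pkg.misc.sageinspect.cite()   # one global plus three attribute lookups
    return 2 * a


def t_direct(a):
    cite()                        # a single global lookup
    return 2 * a


calls = 200000
dotted = timeit.timeit(lambda: t_dotted(20), number=calls)
direct = timeit.timeit(lambda: t_direct(20), number=calls)
print("dotted: %.0f ns per call" % (dotted / calls * 1e9))
print("direct: %.0f ns per call" % (direct / calls * 1e9))
```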