How parallel should @parallel be?

Julian Rüth

Jul 9, 2018, 12:35:22 PM
to sage-devel
Hello.

Since Sage 8.2, sage.parallel.ncpus.ncpus() returns 1 if no environment variables such as MAKE, SAGE_NUM_THREADS, or MAKEOPTS are set.

This number is used by the @parallel decorator and similar constructions to determine the number of processes to run in parallel. (Except during doctests, where I think it is set to 2.)

The question is: What is a good default for things such as @parallel when SAGE_NUM_THREADS has not been set? I think that 1 is not a good one. The actual number of cores/threads on a system probably isn't either on servers with lots of cores. At some point we had `min(8, number of threads)` which appears reasonable to me.

Please join the discussion at https://trac.sagemath.org/ticket/24937 :)

julian

Jori Mäntysalo

Jul 9, 2018, 12:55:24 PM
to sage-devel
On Mon, 9 Jul 2018, Julian Rüth wrote:

> The question is: What is a good default for things such as @parallel when
> SAGE_NUM_THREADS has not been set? I think that 1 is not a good one. The
> actual number of cores/threads on a system probably isn't either on servers
> with lots of cores. At some point we had `min(8, number of threads)` which
> appears reasonable to me.

I would suppose that by default Sage uses all cores (or one core); all
other limits are artificial.

(But the Sage documentation should contain a short page explaining how to
use the timeout, taskset and ulimit commands on some common Linux
distributions. Maybe also a link to a page about cgroups, and even about
the at command.)

--
Jori Mäntysalo

Friedrich Wiemer

Jul 9, 2018, 5:27:20 PM
to sage-devel
I would also expect it to run as many threads as my laptop has cores (+ hyperthreading if available).

Sébastien Labbé

Jul 9, 2018, 5:41:05 PM
to sage-devel


> The question is: What is a good default for things such as @parallel when SAGE_NUM_THREADS has not been set? I think that 1 is not a good one.

+1

> The actual number of cores/threads on a system probably isn't either on servers with lots of cores. At some point we had `min(8, number of threads)` which appears reasonable to me.

+1

John H Palmieri

Jul 9, 2018, 6:25:15 PM
to sage-devel


On Monday, July 9, 2018 at 2:27:20 PM UTC-7, Friedrich Wiemer wrote:
> I would also expect it to run as many threads as my laptop has cores (+ hyperthreading if available).

This makes sense for single-user machines, but the current default was implemented because it was deemed safer on machines used by multiple users.

--
John

Jori Mäntysalo

Jul 10, 2018, 12:49:12 AM
to sage-devel
On Mon, 9 Jul 2018, John H Palmieri wrote:

>> I would also expect it to run as many threads as my laptop has
>> cores (+ hyperthreading if available).

> This makes sense for single-user machines, but the current default was
> implemented because it was deemed safer on machines used by multiple
> users.

Then there is a need for a better maintainer on the machine.

Also, limiting the number of CPU cores is the least effective restriction.
Timesharing works very well in any case; much more problematic is a program
that eats memory or just runs heavy I/O.

--
Jori Mäntysalo

Nils Bruin

Jul 10, 2018, 1:32:05 AM
to sage-devel
On Monday, July 9, 2018 at 9:49:12 PM UTC-7, Jori Mäntysalo wrote:
> On Mon, 9 Jul 2018, John H Palmieri wrote:
>
> >> I would also expect it to run as many threads as my laptop has
> >> cores (+ hyperthreading if available).
>
> > This makes sense for single-user machines, but the current default was
> > implemented because it was deemed safer on machines used by multiple
> > users.
>
> Then there is a need for a better maintainer on the machine.

The reality is that there are many machines like that, though. I have also seen docker images that run on only one core, but where "ncpus" still reports all the cpus on the host machine. I don't know what the right default is, but I think there are many situations where "ncpus" isn't it.
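
To illustrate the mismatch described above, here is a minimal sketch in plain Python (not Sage code). The helper name `usable_cpu_count` is made up for this example. It assumes Linux, where `os.sched_getaffinity` reports the CPUs the current process is allowed to run on; this reflects restrictions set with taskset or a cpuset, but not CPU-time quotas (as far as I know, a container limited by a quota alone still shows every host CPU here).

```python
import os


def usable_cpu_count():
    """Best-effort count of the CPUs this process may actually run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: honours taskset and cpuset restrictions.
        return len(os.sched_getaffinity(0))
    # Fallback for other platforms; os.cpu_count() may return None.
    return os.cpu_count() or 1


print("os.cpu_count():", os.cpu_count())
print("usable CPUs:   ", usable_cpu_count())
```

On an unrestricted machine both numbers normally agree; under `taskset -c 0` the second one drops to 1 while the first does not change.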

Erik Bray

Jul 10, 2018, 6:37:49 AM
to sage-devel
For the sake of reference, the standard library's
`multiprocessing.Pool` [1] just uses `os.cpu_count()` by default, and
I don't really see a problem with that. If I want to limit
parallelization on a machine that I'm concurrently using for other
work, that's kind of a decision a human has to make. If I have 32
cores why would I want it limited to 8, by default, for example?

(No, most people's personal machines won't have 32 cores, though
that's not going to be so uncommon forever...)

[1] https://docs.python.org/3.7/library/multiprocessing.html?highlight=process#multiprocessing.pool.Pool

Jeroen Demeyer

Jul 10, 2018, 6:43:39 AM
to sage-...@googlegroups.com
On 2018-07-10 12:37, Erik Bray wrote:
> If I want to limit
> parallelization on a machine that I'm concurrently using for other
> work, that's kind of a decision a human has to make.

One could argue the other way around: if I want Sage to use all
resources on the machine that it's running on, that's kind of a decision
a human has to make. In other words: guessing is hard.

The default of 1 was certainly meant as a feature: the idea being that
parallelism is something that should be enabled explicitly. A bit like
GNU make, which uses a single process by default but optionally supports
multi-processing.


Jeroen.

Erik Bray

Jul 10, 2018, 6:47:18 AM
to sage-devel
I think this is a good argument, but it's also a bit of a problem in
that @parallel is supposed to be kind of "out-of-the-box"
parallelization, and if its default is to not provide any parallelization
I envision users saying "I put @parallel on my code and it didn't run
any faster" (if anything it's slower). So I don't think 1 is a good
low-end default for that alone. Maybe instead min(2, os.cpu_count())
if you're going to argue from that end.
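
One practical detail if someone goes this way: `os.cpu_count()` is documented to return None when the count cannot be determined, and `min(2, None)` raises a TypeError in Python 3. A minimal sketch of a guarded version follows; the function name `low_end_default` and its `cap` parameter are hypothetical, not anything that exists in Sage.

```python
import os


def low_end_default(cap=2):
    """Return min(cap, number of CPUs), falling back to 1 if the count is unknown."""
    detected = os.cpu_count()
    if detected is None:
        # The CPU count could not be determined: stay serial.
        return 1
    return min(cap, detected)


print(low_end_default())
```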

Julian Rüth

Jul 11, 2018, 10:11:01 AM
to sage-devel
Thanks for the feedback so far. It seems that there are pros and cons to all of the options.

What about the following: We go with the somewhat random min(8, number of threads) and print a warning once if "number of threads" > 8 (telling the user to export SAGE_NUM_THREADS)? Note that this won't affect doctests as SAGE_NUM_THREADS=2 in that context.
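
A rough sketch of what this proposal could look like in plain Python. This is not the actual sage.parallel.ncpus code: the function name `default_ncpus`, its parameters, and the warning text are assumptions made for illustration, and it only looks at SAGE_NUM_THREADS (not MAKE or MAKEOPTS).

```python
import os
import warnings

_CAP = 8          # the "somewhat random" upper limit from the proposal
_warned = False   # so that the warning is printed only once per session


def default_ncpus(environ=None, detected=None):
    """Hypothetical default: SAGE_NUM_THREADS if set, else min(8, CPU count)."""
    global _warned
    environ = os.environ if environ is None else environ
    value = environ.get("SAGE_NUM_THREADS")
    if value is not None:
        # An explicit setting always wins (a non-numeric value raises ValueError).
        return max(1, int(value))
    if detected is None:
        detected = os.cpu_count() or 1
    if detected > _CAP and not _warned:
        warnings.warn(
            "Using %d of %d available threads; export SAGE_NUM_THREADS "
            "to change this." % (_CAP, detected)
        )
        _warned = True
    return min(_CAP, detected)
```

The `environ` and `detected` parameters exist only so the behaviour can be tried out without touching the real environment, e.g. `default_ncpus({}, detected=32)` gives 8 and warns once, while `default_ncpus({"SAGE_NUM_THREADS": "3"}, detected=32)` gives 3 silently.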

That way, we provide a good experience for the typical laptop/desktop user and don't risk angry emails from admins after somebody convinced them to install Sage on their shiny server.

What do you think?

julian

PS: I am also fine with "number of threads" as a default. But I am opposed to "1" as that provides a poor experience for the casual user who won't dig into the documentation to find out what's going on.

Eric Gourgoulhon

Jul 11, 2018, 12:08:11 PM
to sage-devel


Le mercredi 11 juillet 2018 16:11:01 UTC+2, Julian Rüth a écrit :
> Thanks for the feedback so far. It seems that there are pros and cons to all of the options.
>
> What about the following: We go with the somewhat random min(8, number of threads) and print a warning once if "number of threads" > 8 (telling the user to export SAGE_NUM_THREADS)? Note that this won't affect doctests as SAGE_NUM_THREADS=2 in that context.


+1

Eric.

William Stein

Jul 11, 2018, 2:31:20 PM
to sage-devel
On Wed, Jul 11, 2018 at 7:11 AM, Julian Rüth <julian...@fsfe.org> wrote:
> Thanks for the feedback so far. It seems that there are pros and cons to all
> of the options.
>
> What about the following: We go with the somewhat random min(8, number of
> threads) and print a warning once if "number of threads" > 8 (telling the
> user to export SAGE_NUM_THREADS)? Note that this won't affect doctests as
> SAGE_NUM_THREADS=2 in that context.
>
> That way, we provide a good experience for the typical laptop/desktop user
> and don't risk angry emails from admins after somebody convinced them to
> install Sage on their shiny server.
>
> What do you think?

Does anything in our codebase (the sage library) use a "naked" @parallel?

sage: search_src("@parallel")

I ask because in any interactive or external code I write, I'm happy
to just be explicit and pass a parameter
to @parallel with the number of cpus I want it to use. However, if
there is code deep in Sage itself that just
uses @parallel, then this design choice we are talking about greatly
impacts how that code runs.
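
To make the distinction between a "naked" and a parameterized decorator concrete, here is a toy stand-in written in plain Python. It is not Sage's @parallel (which runs separate processes and returns results in a different form); `toy_parallel`, `default_workers` and the `workers` attribute are invented for this sketch, and threads are used only to keep it self-contained.

```python
import functools
import os
from concurrent.futures import ThreadPoolExecutor


def default_workers():
    # Hypothetical default policy; this is the choice under discussion.
    return min(8, os.cpu_count() or 1)


def toy_parallel(func=None, *, ncpus=None):
    """Usable both as @toy_parallel and as @toy_parallel(ncpus=4)."""
    def decorate(f):
        workers = ncpus if ncpus is not None else default_workers()

        @functools.wraps(f)
        def mapper(inputs):
            # Apply f to every input, using a pool of `workers` threads.
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(f, inputs))

        mapper.workers = workers
        return mapper

    if func is not None:
        return decorate(func)   # naked form: the default policy decides
    return decorate             # explicit form: the caller decides


@toy_parallel
def square(x):
    return x * x


@toy_parallel(ncpus=2)
def cube(x):
    return x ** 3


print(square([1, 2, 3]), "with", square.workers, "workers")
print(cube([1, 2, 3]), "with", cube.workers, "workers")
```

Only the naked form is affected by whatever default gets picked; code that passes ncpus explicitly behaves the same either way.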

Looking.... there are a bunch of places in SageManifolds that use
@parallel automatically. It looks like
they explicitly set ncpus, and they even provide a cool whole new
framework for setting such defaults!

sage-8.2/src/sage/parallel$ cat parallelism.py

...

class Parallelism(Singleton, SageObject):
    r"""
    Singleton class for managing the number of processes used in parallel
    computations involved in various fields.

    EXAMPLES:

    The number of processes is initialized to 1 (no parallelization) for
    each field (only tensor computations are implemented at the moment)::

        sage: Parallelism()
        Number of processes for parallelization:
        - tensor computations: 1
...


Anyway, having a framework for configuring how Sage uses multiple cpus
is a really good idea, and possibly relevant to people reading this
thread. Also, please be sure to use this framework in other code that
uses @parallel.

E.g., I long ago wrote some such code:

lfunctions/zero_sums.pyx:1339: @parallel(ncpus=NCPUS)

and of course it doesn't use this framework at all... and this code
also doesn't:

schemes/curves/zariski_vankampen.py:293:@parallel
...
@parallel
def braid_in_segment(f, x0, x1):
    """
    Return the braid formed by the `y` roots of ``f`` when `x` moves
    from ``x0`` to ``x1``.
...


What will it do? Is there any way to even impact how it runs?

William

--
William (http://wstein.org)

Samuel Lelievre

Jul 17, 2018, 12:57:04 AM
to sage-devel
There is a discussion on sage-support where Simon King
mentions the multimodular algorithm for computing determinants
of integer-valued matrices as a good use case for parallelism:
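
For readers unfamiliar with the idea, here is a rough toy sketch of a multimodular determinant in plain Python. It is neither Sage's implementation nor taken from the sage-support discussion; the function names and the starting prime 10007 are arbitrary choices for this example. The point is that the determinants modulo different primes are independent of each other, so they can be computed in parallel and then combined by Chinese remaindering. Threads are used here only to show the structure; for pure Python code they give no real speedup, and separate processes would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def det_mod_p(matrix, p):
    """Determinant of a square integer matrix modulo a prime p."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            a[r] = [(x - factor * y) % p for x, y in zip(a[r], a[col])]
    return det % p


def primes_from(start):
    """Yield primes >= start (trial division; fine for a toy example)."""
    n = start
    while True:
        if n > 1 and all(n % d for d in range(2, isqrt(n) + 1)):
            yield n
        n += 1


def multimodular_det(matrix, workers=4):
    # Hadamard bound: |det| <= product of the Euclidean norms of the rows.
    bound = 1
    for row in matrix:
        bound *= isqrt(sum(x * x for x in row)) + 1
    # Collect primes until their product exceeds 2 * bound.
    primes, modulus = [], 1
    gen = primes_from(10007)
    while modulus <= 2 * bound:
        p = next(gen)
        primes.append(p)
        modulus *= p
    # The per-prime determinants do not depend on each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        residues = list(pool.map(lambda q: det_mod_p(matrix, q), primes))
    # Chinese remaindering, one prime at a time.
    x, m = 0, 1
    for r, p in zip(residues, primes):
        t = (r - x) * pow(m, -1, p) % p
        x += m * t
        m *= p
    # Lift to the symmetric range around zero to recover the sign.
    return x if x <= m // 2 else x - m


print(multimodular_det([[2, -1, 0], [1, 3, 4], [0, 5, -6]]))
```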
