
[PATCH 0/2] cpusets,isolcpus: resolve conflict between cpusets and isolcpus


ri...@redhat.com

Feb 23, 2015, 4:45:21 PM
to linux-...@vger.kernel.org
Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.
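The effect of the fix can be modeled outside the kernel: the isolated set is masked out of every sched-domain cpumask before load balancing is set up. A toy Python sketch, with plain sets standing in for cpumasks and invented CPU numbers (not kernel code):

```python
# Model of the fix: isolated CPUs are removed from every
# sched-domain mask, so load balancing never touches them.
# CPU numbers are illustrative only.

def generate_sched_domain(effective_cpus, possible_cpus, isolated_cpus):
    # mirrors cpumask_andnot(non_isolated, cpu_possible_mask, cpu_isolated_map)
    non_isolated = possible_cpus - isolated_cpus
    # mirrors cpumask_and(doms[0], top_cpuset.effective_cpus, non_isolated_cpus)
    return effective_cpus & non_isolated

possible = {0, 1, 2, 3, 4, 5, 6, 7}
isolated = {2, 3}              # as if booted with isolcpus=2,3
effective = possible           # a fresh cpuset spans the whole system

domain = generate_sched_domain(effective, possible, isolated)
print(sorted(domain))          # isolated CPUs stay out: [0, 1, 4, 5, 6, 7]
```

Without the masking step, the freshly created cpuset's domain would span all eight CPUs, which is exactly the regression described above.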

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

ri...@redhat.com

Feb 23, 2015, 4:45:33 PM
to linux-...@vger.kernel.org, Rik van Riel, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, cgr...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
include/linux/sched.h | 2 ++
kernel/cpuset.c | 13 +++++++++++--
kernel/sched/core.c | 2 +-
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index cb5cdc777c8a..af1b32a5ddcc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1038,6 +1038,8 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new);

+extern cpumask_var_t cpu_isolated_map;
+
/* Allocate an array of sched domains, for partition_sched_domains(). */
cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 64b257f6bca2..1ad63fa37cb4 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -625,6 +625,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
int csn; /* how many cpuset ptrs in csa so far */
int i, j, k; /* indices for partition finding loops */
cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
+ cpumask_var_t non_isolated_cpus; /* load balanced CPUs */
struct sched_domain_attr *dattr; /* attributes for custom domains */
int ndoms = 0; /* number of sched domains in result */
int nslot; /* next empty doms[] struct cpumask slot */
@@ -634,6 +635,10 @@ static int generate_sched_domains(cpumask_var_t **domains,
dattr = NULL;
csa = NULL;

+ if (!alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL))
+ goto done;
+ cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
+
/* Special case for the 99% of systems with one, full, sched domain */
if (is_sched_load_balance(&top_cpuset)) {
ndoms = 1;
@@ -646,7 +651,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
*dattr = SD_ATTR_INIT;
update_domain_attr_tree(dattr, &top_cpuset);
}
- cpumask_copy(doms[0], top_cpuset.effective_cpus);
+ cpumask_and(doms[0], top_cpuset.effective_cpus,
+ non_isolated_cpus);

goto done;
}
@@ -669,7 +675,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
* the corresponding sched domain.
*/
if (!cpumask_empty(cp->cpus_allowed) &&
- !is_sched_load_balance(cp))
+ !(is_sched_load_balance(cp) &&
+ cpumask_intersects(cp->cpus_allowed, non_isolated_cpus)))
continue;

if (is_sched_load_balance(cp))
@@ -751,6 +758,7 @@ static int generate_sched_domains(cpumask_var_t **domains,

if (apn == b->pn) {
cpumask_or(dp, dp, b->effective_cpus);
+ cpumask_and(dp, dp, non_isolated_cpus);
if (dattr)
update_domain_attr_tree(dattr + nslot, b);

@@ -763,6 +771,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
BUG_ON(nslot != ndoms);

done:
+ free_cpumask_var(non_isolated_cpus);
kfree(csa);

/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97fe79cf613e..6069f3703240 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5831,7 +5831,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
}

/* cpus with isolated domains */
-static cpumask_var_t cpu_isolated_map;
+cpumask_var_t cpu_isolated_map;

/* Setup the mask of cpus configured for isolated domains */
static int __init isolated_cpu_setup(char *str)
--
1.9.3

ri...@redhat.com

Feb 23, 2015, 4:46:19 PM
to linux-...@vger.kernel.org, Rik van Riel, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, cgr...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

The previous patch makes the code skip over isolcpus when
building scheduler load balancing domains. This makes it hard
for a user to see which of the CPUs in a cpuset are participating
in load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.
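The new file emits the kernel's usual cpulist notation (the "0-2,7" style). A pure-Python illustration of that format and of what the file computes, cpus_allowed intersected with the isolated map (values invented, not kernel code):

```python
# Sketch of the cpulist format the new cpuset.isolcpus file emits,
# i.e. the "0-2,7" style the kernel's cpulist printing produces.
# Pure-Python illustration with invented CPU numbers.

def cpulist(cpus):
    cpus = sorted(cpus)
    ranges, i = [], 0
    while i < len(cpus):
        j = i
        # extend the run while CPU numbers are consecutive
        while j + 1 < len(cpus) and cpus[j + 1] == cpus[j] + 1:
            j += 1
        ranges.append(str(cpus[i]) if i == j else f"{cpus[i]}-{cpus[j]}")
        i = j + 1
    return ",".join(ranges)

cpus_allowed = {0, 1, 2, 3, 7}
cpu_isolated_map = {2, 3, 7}
# the file shows cpus_allowed & cpu_isolated_map, as a cpulist
print(cpulist(cpus_allowed & cpu_isolated_map))  # -> 2-3,7
```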

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
kernel/cpuset.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1ad63fa37cb4..19ad5d3377f8 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1563,6 +1563,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_ISOLCPUS,
} cpuset_filetype_t;

static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1704,6 +1705,23 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
return retval ?: nbytes;
}

+static size_t cpuset_sprintf_isolcpus(char *s, ssize_t pos, struct cpuset *cs)
+{
+ cpumask_var_t my_isolated_cpus;
+ ssize_t count;
+
+ if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
+ return 0;
+
+ cpumask_and(my_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
+
+ count = cpulist_scnprintf(s, pos, my_isolated_cpus);
+
+ free_cpumask_var(my_isolated_cpus);
+
+ return count;
+}
+
/*
* These ascii lists should be read in a single call, by using a user
* buffer large enough to hold the entire map. If read in smaller
@@ -1738,6 +1756,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_EFFECTIVE_MEMLIST:
s += nodelist_scnprintf(s, count, cs->effective_mems);
break;
+ case FILE_ISOLCPUS:
+ s += cpuset_sprintf_isolcpus(s, count, cs);
+ break;
default:
ret = -EINVAL;
goto out_unlock;
@@ -1906,6 +1927,12 @@ static struct cftype files[] = {
.private = FILE_MEMORY_PRESSURE_ENABLED,
},

+ {
+ .name = "isolcpus",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_ISOLCPUS,
+ },
+
{ } /* terminate */
};

--
1.9.3

Mike Galbraith

Feb 23, 2015, 9:18:56 PM
to ri...@redhat.com, linux-...@vger.kernel.org
On Mon, 2015-02-23 at 16:45 -0500, ri...@redhat.com wrote:
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.

3/3: nohz_full cpus become part of that unified isolated map?

-Mike

Rik van Riel

Feb 24, 2015, 9:13:18 AM
to Mike Galbraith, linux-...@vger.kernel.org
On 02/23/2015 09:18 PM, Mike Galbraith wrote:
> On Mon, 2015-02-23 at 16:45 -0500, ri...@redhat.com wrote:
>> Ensure that cpus specified with the isolcpus= boot commandline
>> option stay outside of the load balancing in the kernel
>> scheduler.
>>
>> Operations like load balancing can introduce unwanted latencies,
>> which is exactly what the isolcpus= commandline is there to
>> prevent.
>>
>> Previously, simply creating a new cpuset, without even touching
>> the cpuset.cpus field inside the new cpuset, would undo the
>> effects of isolcpus=, by creating a scheduler domain spanning the
>> whole system, and setting up load balancing inside that domain.
>> The cpuset root cpuset.cpus file is read-only, so there was not
>> even a way to undo that effect.
>>
>> This does not impact the majority of cpusets users, since
>> isolcpus= is a fairly specialized feature used for realtime
>> purposes.
>
> 3/3: nohz_full cpus become part of that unified isolated map?

There may be use cases where users want nohz_full, but still
want the scheduler to automatically load balance the CPU.

I am not sure whether we want nohz_full and isolcpus to always
overlap 100%.

On the other hand, any CPU that is isolated with isolcpus=
probably wants nohz_full...

--
All rights reversed

Mike Galbraith

Feb 24, 2015, 9:22:27 AM
to Rik van Riel, linux-...@vger.kernel.org
I can't imagine caring deeply about the tiny interference of the tick,
yet not caring about the massive interference of load balancing.

-Mike

Mike Galbraith

Feb 24, 2015, 9:29:40 AM
to Rik van Riel, linux-...@vger.kernel.org
On Tue, 2015-02-24 at 09:13 -0500, Rik van Riel wrote:

> On the other hand, any CPU that is isolated with isolcpus=
> probably wants nohz_full...

Not here. I isolate (via cpusets) for a 60 core rt load, but it's not
single task/core, and doesn't like the nohz_full overhead.

-Mike

David Rientjes

Feb 24, 2015, 9:11:19 PM
to Rik van Riel, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, cgr...@vger.kernel.org
On Mon, 23 Feb 2015, ri...@redhat.com wrote:

> From: Rik van Riel <ri...@redhat.com>
>
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Clark Williams <will...@redhat.com>
> Cc: Li Zefan <liz...@huawei.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Luiz Capitulino <lcapi...@redhat.com>
> Cc: cgr...@vger.kernel.org
> Signed-off-by: Rik van Riel <ri...@redhat.com>

Tested-by: David Rientjes <rien...@google.com>

David Rientjes

Feb 24, 2015, 9:15:26 PM
to Rik van Riel, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, cgr...@vger.kernel.org
On Mon, 23 Feb 2015, ri...@redhat.com wrote:

Whitespace.

> + if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
> + return 0;
> +
> + cpumask_and(my_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
> +
> + count = cpulist_scnprintf(s, pos, my_isolated_cpus);
> +
> + free_cpumask_var(my_isolated_cpus);
> +
> + return count;
> +}
> +
> /*
> * These ascii lists should be read in a single call, by using a user
> * buffer large enough to hold the entire map. If read in smaller
> @@ -1738,6 +1756,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
> case FILE_EFFECTIVE_MEMLIST:
> s += nodelist_scnprintf(s, count, cs->effective_mems);
> break;
> + case FILE_ISOLCPUS:
> + s += cpuset_sprintf_isolcpus(s, count, cs);
> + break;

This patch looks fine, and I think cpuset.effective_cpus and
cpuset.isolcpus can be used well together, but will need updating now that
commit e8e6d97c9b ("cpuset: use %*pb[l] to print bitmaps including
cpumasks and nodemasks") has been merged which reworks this function.

It's a little unfortunate, though, that the user sees Cpus_allowed,
cpuset.cpus, and cpuset.effective_cpus that include isolcpus and then have
to check another cpulist for the isolcpus to see their sched domain,
though.

> default:
> ret = -EINVAL;
> goto out_unlock;
> @@ -1906,6 +1927,12 @@ static struct cftype files[] = {
> .private = FILE_MEMORY_PRESSURE_ENABLED,
> },
>
> + {
> + .name = "isolcpus",
> + .seq_show = cpuset_common_seq_show,
> + .private = FILE_ISOLCPUS,
> + },
> +
> { } /* terminate */
> };
>
--

Rik van Riel

Feb 24, 2015, 10:31:03 PM
to David Rientjes, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, cgr...@vger.kernel.org
I will take a look at that changeset. It was not in the
tip tree I worked against.

Expect a v2 :)

> It's a little unfortunate, though, that the user sees Cpus_allowed,
> cpuset.cpus, and cpuset.effective_cpus that include isolcpus and then have
> to check another cpulist for the isolcpus to see their sched domain,
> though.

Agreed, but all the alternatives I could think of would break the
userspace API, leaving this as the best way to go.

--
All rights reversed

ri...@redhat.com

Feb 25, 2015, 11:38:27 AM
to linux-...@vger.kernel.org
-v2 addresses the conflict David Rientjes spotted between my previous
patches and commit e8e6d97c9b ("cpuset: use %*pb[l] to print bitmaps
including cpumasks and nodemasks")

Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

ri...@redhat.com

Feb 25, 2015, 11:38:33 AM
to linux-...@vger.kernel.org, Rik van Riel, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

The previous patch makes the code skip over isolcpus when
building scheduler load balancing domains. This makes it hard
for a user to see which of the CPUs in a cpuset are participating
in load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
kernel/cpuset.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b544e5229d99..94bf59588e23 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1563,6 +1563,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_ISOLCPUS,
} cpuset_filetype_t;

static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1704,6 +1705,20 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
return retval ?: nbytes;
}

+static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
+{
+ cpumask_var_t my_isolated_cpus;
+
+ if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
+ return;
+
+ cpumask_and(my_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
+
+ seq_printf(sf, "%*pbl\n", nodemask_pr_args(my_isolated_cpus));
+
+ free_cpumask_var(my_isolated_cpus);
+}
+
/*
* These ascii lists should be read in a single call, by using a user
* buffer large enough to hold the entire map. If read in smaller
@@ -1733,6 +1748,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_EFFECTIVE_MEMLIST:
seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->effective_mems));
break;
+ case FILE_ISOLCPUS:
+ cpuset_seq_print_isolcpus(sf, cs);
+ break;
default:
ret = -EINVAL;
}
@@ -1893,6 +1911,12 @@ static struct cftype files[] = {
.private = FILE_MEMORY_PRESSURE_ENABLED,
},

+ {
+ .name = "isolcpus",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_ISOLCPUS,
+ },
+
{ } /* terminate */
};

--
2.1.0

ri...@redhat.com

Feb 25, 2015, 11:38:47 AM
to linux-...@vger.kernel.org, Rik van Riel, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
Tested-by: David Rientjes <rien...@google.com>
---
include/linux/sched.h | 2 ++
kernel/cpuset.c | 13 +++++++++++--
kernel/sched/core.c | 2 +-
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..aeae02435717 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1038,6 +1038,8 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new);

+extern cpumask_var_t cpu_isolated_map;
+
/* Allocate an array of sched domains, for partition_sched_domains(). */
cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1d1fe9361d29..b544e5229d99 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..3db1beace19b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5812,7 +5812,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
}

/* cpus with isolated domains */
-static cpumask_var_t cpu_isolated_map;
+cpumask_var_t cpu_isolated_map;

/* Setup the mask of cpus configured for isolated domains */
static int __init isolated_cpu_setup(char *str)
--
2.1.0

David Rientjes

Feb 25, 2015, 4:09:36 PM
to Rik van Riel, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
On Wed, 25 Feb 2015, ri...@redhat.com wrote:

> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index b544e5229d99..94bf59588e23 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1563,6 +1563,7 @@ typedef enum {
> FILE_MEMORY_PRESSURE,
> FILE_SPREAD_PAGE,
> FILE_SPREAD_SLAB,
> + FILE_ISOLCPUS,
> } cpuset_filetype_t;
>
> static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
> @@ -1704,6 +1705,20 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
> return retval ?: nbytes;
> }
>
> +static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
> +{
> + cpumask_var_t my_isolated_cpus;
> +
> + if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
> + return;
> +
> + cpumask_and(my_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
> +
> + seq_printf(sf, "%*pbl\n", nodemask_pr_args(my_isolated_cpus));

That unfortunately won't output anything; it needs to be
cpumask_pr_args(). After that's fixed, feel free to add my

Acked-by: David Rientjes <rien...@google.com>

Rik van Riel

Feb 25, 2015, 4:22:28 PM
to David Rientjes, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
Gah. Too many things going on at once.

Let me resend a v3 of just patch 2/2 with your ack.

Rik van Riel

Feb 25, 2015, 4:33:23 PM
to linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset

The previous patch makes the code skip over isolcpus when
building scheduler load balancing domains. This makes it hard
for a user to see which of the CPUs in a cpuset are participating
in load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.

Acked-by: David Rientjes <rien...@google.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
OK, I suck. Thanks to David Rientjes for spotting the silly mistake.

kernel/cpuset.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b544e5229d99..455df101ceec 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1563,6 +1563,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_ISOLCPUS,
} cpuset_filetype_t;

static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1704,6 +1705,20 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
return retval ?: nbytes;
}

+static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
+{
+ cpumask_var_t my_isolated_cpus;
+
+ if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
+ return;
+
+ cpumask_and(my_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
+
+ seq_printf(sf, "%*pbl\n", cpumask_pr_args(my_isolated_cpus));

Zefan Li

Feb 26, 2015, 6:06:36 AM
to ri...@redhat.com, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
> +static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
> +{
> + cpumask_var_t my_isolated_cpus;
> +
> + if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
> + return;
> +

Make it return -ENOMEM ? Or make it a global variable and allocate memory for it
in cpuset_init().

Rik van Riel

Feb 26, 2015, 10:25:54 AM
to Zefan Li, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On 02/26/2015 06:05 AM, Zefan Li wrote:
>> +static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
>> +{
>> + cpumask_var_t my_isolated_cpus;
>> +
>> + if (!alloc_cpumask_var(&my_isolated_cpus, GFP_KERNEL))
>> + return;
>> +
>
> Make it return -ENOMEM ? Or make it a global variable and allocate memory for it
> in cpuset_init().

OK, can do.

I see that cpuset_common_seq_show already takes a lock, so having
one global variable for this should not introduce any additional
contention.
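The pattern described here, one preallocated scratch buffer shared by every reader because the show path already serializes on a lock, can be sketched in Python. All names here are invented stand-ins for the kernel objects (a set for the global cpumask, a mutex for the cpuset lock):

```python
import threading

# Invented names; models reusing one preallocated scratch cpumask
# (print_isolated_cpus) under the lock the show path already holds.
_show_lock = threading.Lock()
_scratch = set()                # stands in for the global cpumask buffer

CPU_ISOLATED_MAP = {2, 3}       # as if booted with isolcpus=2,3

def show_isolcpus(cpus_allowed):
    global _scratch
    # every reader takes the same lock, so one shared buffer is safe
    with _show_lock:
        _scratch = cpus_allowed & CPU_ISOLATED_MAP
        return sorted(_scratch)

print(show_isolcpus({0, 1, 2, 3}))  # [2, 3]
```

The design trade-off is the same as in the kernel version: no per-read allocation (so the read can never fail with -ENOMEM), at the cost of a small amount of memory held for the system's lifetime.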

I will send a v4.

>> @@ -1733,6 +1748,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
>> case FILE_EFFECTIVE_MEMLIST:
>> seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->effective_mems));
>> break;
>> + case FILE_ISOLCPUS:
>> + cpuset_seq_print_isolcpus(sf, cs);
>> + break;
>> default:
>> ret = -EINVAL;
>> }


--
All rights reversed

Rik van Riel

Feb 26, 2015, 12:13:17 PM
to Zefan Li, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On Thu, 26 Feb 2015 19:05:57 +0800
Zefan Li <liz...@huawei.com> wrote:

> Make it return -ENOMEM ? Or make it a global variable and allocate memory for it
> in cpuset_init().

Here you are. This addresses your concern, as well as the
issue David Rientjes found earlier.

---8<---

Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset

The previous patch makes the code skip over isolcpus when
building scheduler load balancing domains. This makes it hard
for a user to see which of the CPUs in a cpuset are participating
in load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.

Acked-by: David Rientjes <rien...@google.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
kernel/cpuset.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b544e5229d99..5462e1ca90bd 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1563,6 +1563,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_ISOLCPUS,
} cpuset_filetype_t;

static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1704,6 +1705,16 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
return retval ?: nbytes;
}

+/* protected by the lock in cpuset_common_seq_show */
+static cpumask_var_t print_isolated_cpus;
+
+static void cpuset_seq_print_isolcpus(struct seq_file *sf, struct cpuset *cs)
+{
+ cpumask_and(print_isolated_cpus, cs->cpus_allowed, cpu_isolated_map);
+
+ seq_printf(sf, "%*pbl\n", cpumask_pr_args(print_isolated_cpus));
+}
+
/*
* These ascii lists should be read in a single call, by using a user
* buffer large enough to hold the entire map. If read in smaller
@@ -1733,6 +1744,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_EFFECTIVE_MEMLIST:
seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->effective_mems));
break;
+ case FILE_ISOLCPUS:
+ cpuset_seq_print_isolcpus(sf, cs);
+ break;
default:
ret = -EINVAL;
}
@@ -1893,6 +1907,12 @@ static struct cftype files[] = {
.private = FILE_MEMORY_PRESSURE_ENABLED,
},

+ {
+ .name = "isolcpus",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_ISOLCPUS,
+ },
+
{ } /* terminate */
};

@@ -2070,6 +2090,8 @@ int __init cpuset_init(void)
BUG();
if (!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL))
BUG();
+ if (!alloc_cpumask_var(&print_isolated_cpus, GFP_KERNEL))
+ BUG();

cpumask_setall(top_cpuset.cpus_allowed);
nodes_setall(top_cpuset.mems_allowed);

Peter Zijlstra

Feb 27, 2015, 4:32:50 AM
to ri...@redhat.com, linux-...@vger.kernel.org, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
On Wed, Feb 25, 2015 at 11:38:07AM -0500, ri...@redhat.com wrote:
> From: Rik van Riel <ri...@redhat.com>
>
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Clark Williams <will...@redhat.com>
> Cc: Li Zefan <liz...@huawei.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Luiz Capitulino <lcapi...@redhat.com>
> Cc: Mike Galbraith <umgwana...@gmail.com>
> Cc: cgr...@vger.kernel.org
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> Tested-by: David Rientjes <rien...@google.com>

Might I ask you to update Documentation/cgroups/cpusets.txt with this
knowledge? While it does mention isolcpus, it does not clarify the
interaction between it and cpusets.

Other than that,

Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>

Rik van Riel

Feb 27, 2015, 12:08:43 PM
to Peter Zijlstra, linux-...@vger.kernel.org, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
Document the subtly changed relationship between cpusets and isolcpus.
Turns out the old documentation did not quite match the code...

Signed-off-by: Rik van Riel <ri...@redhat.com>
Suggested-by: Peter Zijlstra <pet...@infradead.org>
---
Documentation/cgroups/cpusets.txt | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index f2235a162529..fdf7dff3f607 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -392,8 +392,10 @@ Put simply, it costs less to balance between two smaller sched domains
than one big one, but doing so means that overloads in one of the
two domains won't be load balanced to the other one.

-By default, there is one sched domain covering all CPUs, except those
-marked isolated using the kernel boot time "isolcpus=" argument.
+By default, there is one sched domain covering all CPUs, including those
+marked isolated using the kernel boot time "isolcpus=" argument. However,
+the isolated CPUs will not participate in load balancing, and will not
+have tasks running on them unless explicitly assigned.

This default load balancing across all CPUs is not well suited for
the following two situations:
@@ -465,6 +467,10 @@ such partially load balanced cpusets, as they may be artificially
constrained to some subset of the CPUs allowed to them, for lack of
load balancing to the other CPUs.

+CPUs in "cpuset.isolcpus" were excluded from load balancing by the
+isolcpus= kernel boot option, and will never be load balanced regardless
+of the value of "cpuset.sched_load_balance" in any cpuset.
+
1.7.1 sched_load_balance implementation details.
------------------------------------------------
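The documented rule reduces to a simple predicate: a CPU is load balanced only if it sits in a cpuset with sched_load_balance enabled *and* it is not in the boot-time isolcpus set. A minimal, illustrative model of that rule (not kernel code; the CPU numbers are made up):

```python
# Model of the documented interaction between cpuset.sched_load_balance
# and the isolcpus= boot option. CPU numbers here are hypothetical.
isolcpus = {2, 3}  # e.g. from isolcpus=2,3 on the kernel command line

def load_balanced(cpu, in_balanced_cpuset):
    # isolcpus= always wins: an isolated CPU is never load balanced,
    # regardless of any cpuset's sched_load_balance setting.
    return in_balanced_cpuset and cpu not in isolcpus

print(load_balanced(1, True))   # True: ordinary CPU in a balanced cpuset
print(load_balanced(2, True))   # False: isolated, despite sched_load_balance
print(load_balanced(1, False))  # False: balancing disabled in its cpuset
```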

David Rientjes

Feb 27, 2015, 4:15:35 PM
to Rik van Riel, Peter Zijlstra, linux-...@vger.kernel.org, Clark Williams, Li Zefan, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
On Fri, 27 Feb 2015, Rik van Riel wrote:

> Document the subtly changed relationship between cpusets and isolcpus.
> Turns out the old documentation did not quite match the code...
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> Suggested-by: Peter Zijlstra <pet...@infradead.org>

Acked-by: David Rientjes <rien...@google.com>

Zefan Li

Feb 27, 2015, 10:22:02 PM
to ri...@redhat.com, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
On 2015/2/26 0:38, ri...@redhat.com wrote:
> From: Rik van Riel <ri...@redhat.com>
>
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Clark Williams <will...@redhat.com>
> Cc: Li Zefan <liz...@huawei.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Luiz Capitulino <lcapi...@redhat.com>
> Cc: Mike Galbraith <umgwana...@gmail.com>
> Cc: cgr...@vger.kernel.org
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> Tested-by: David Rientjes <rien...@google.com>

Acked-by: Zefan Li <liz...@huawei.com>

Zefan Li

Feb 27, 2015, 10:22:58 PM
to Rik van Riel, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
> Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset
>
> The previous patch makes it so the code skips over isolcpus when
> building scheduler load balancing domains. This makes it hard to
> see for a user which of the CPUs in a cpuset are participating in
> load balancing, and which ones are isolated cpus.
>
> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
> isolated CPUs.
>
> This file is read-only for now. In the future we could extend things
> so isolcpus can be changed at run time, for the root (system wide)
> cpuset only.
>
> Acked-by: David Rientjes <rien...@google.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Clark Williams <will...@redhat.com>
> Cc: Li Zefan <liz...@huawei.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Luiz Capitulino <lcapi...@redhat.com>
> Cc: David Rientjes <rien...@google.com>
> Cc: Mike Galbraith <umgwana...@gmail.com>
> Cc: cgr...@vger.kernel.org
> Signed-off-by: Rik van Riel <ri...@redhat.com>

Acked-by: Zefan Li <liz...@huawei.com>

Zefan Li

Feb 27, 2015, 10:23:40 PM
to Rik van Riel, Peter Zijlstra, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, Mike Galbraith, cgr...@vger.kernel.org
On 2015/2/28 1:08, Rik van Riel wrote:
> Document the subtly changed relationship between cpusets and isolcpus.
> Turns out the old documentation did not quite match the code...
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> Suggested-by: Peter Zijlstra <pet...@infradead.org>

Acked-by: Zefan Li <liz...@huawei.com>

Zefan Li

Mar 2, 2015, 1:16:25 AM
to Rik van Riel, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
Hi Rik,

> Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset
>
> The previous patch makes it so the code skips over isolcpus when
> building scheduler load balancing domains. This makes it hard to
> see for a user which of the CPUs in a cpuset are participating in
> load balancing, and which ones are isolated cpus.
>
> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
> isolated CPUs.
>
> This file is read-only for now. In the future we could extend things
> so isolcpus can be changed at run time, for the root (system wide)
> cpuset only.
>

One Question, why not add a /sys/devices/system/cpu/isolated instead?

Peter Zijlstra

Mar 2, 2015, 4:09:55 AM
to Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On Thu, Feb 26, 2015 at 12:12:31PM -0500, Rik van Riel wrote:
> Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset
>
> The previous patch makes it so the code skips over isolcpus when
> building scheduler load balancing domains. This makes it hard to
> see for a user which of the CPUs in a cpuset are participating in
> load balancing, and which ones are isolated cpus.
>
> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
> isolated CPUs.
>
> This file is read-only for now. In the future we could extend things
> so isolcpus can be changed at run time, for the root (system wide)
> cpuset only.
>
> Acked-by: David Rientjes <rien...@google.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Clark Williams <will...@redhat.com>
> Cc: Li Zefan <liz...@huawei.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Luiz Capitulino <lcapi...@redhat.com>
> Cc: David Rientjes <rien...@google.com>
> Cc: Mike Galbraith <umgwana...@gmail.com>
> Cc: cgr...@vger.kernel.org
> Signed-off-by: Rik van Riel <ri...@redhat.com>

So let me start off by saying I hate isolcpus ;-)

Let me further state that I had hopes we could extend cpusets to
natively provide the functionality isolcpus has, and kill isolcpus.

The 'normal' way would be to create 2 cgroups with disjoint cpus,
disable sched_load_balance on root and one of the siblings, while moving
everything into the other group.

The 'problem' is that we cannot move everything that is affected by
isolcpus, workqueues have grown a horrible 'new' interface outside of
the regular task interfaces and things like kthreadd are non-movable for
mostly good reasons.

Furthermore it appears that software like system-disease and libvirt
hard assume they're lord and master of the cgroup hierarchy and do not
expect things like this.

So while I mostly hate all of this it might be the best we can do :-(

Peter Zijlstra

Mar 2, 2015, 4:12:33 AM
to Zefan Li, Rik van Riel, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On Mon, Mar 02, 2015 at 02:15:39PM +0800, Zefan Li wrote:
> Hi Rik,
>
> > Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset
> >
> > The previous patch makes it so the code skips over isolcpus when
> > building scheduler load balancing domains. This makes it hard to
> > see for a user which of the CPUs in a cpuset are participating in
> > load balancing, and which ones are isolated cpus.
> >
> > Add a cpuset.isolcpus file with info on which cpus in a cpuset are
> > isolated CPUs.
> >
> > This file is read-only for now. In the future we could extend things
> > so isolcpus can be changed at run time, for the root (system wide)
> > cpuset only.
> >
>
> One Question, why not add a /sys/devices/system/cpu/isolated instead?

It would leave userspace to calculate the result for any one cpuset
itself. Furthermore, is that /sys thing visible for all nested
containers?

Mike Galbraith

Mar 2, 2015, 7:45:55 AM
to Peter Zijlstra, Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
+1

That's where nohz_full goop belongs too.

> The 'normal' way would be to create 2 cgroups with disjoint cpus,
> disable sched_load_balance on root and one of the siblings, while moving
> everything into the other group.

That's what cset shield does, works fine.

> The 'problem' is that we cannot move everything that is affected by
> isolcpus, workqueues have grown a horrible 'new' interface outside of
> the regular task interfaces and things like kthreadd are non-movable for
> mostly good reasons.
>
> Furthermore it appears that software like system-disease and libvirt
> hard assume they're lord and master of the cgroup hierarchy and do not
> expect things like this.
>
> So while I mostly hate all of this it might be the best we can do :-(

Hm, I'm now all system-disease-ified now (still hate the bloody thing),
and have no problem isolating cpus via cpusets, modulo workqueues
wanting a bat upside the head.

-Mike

Rik van Riel

Mar 2, 2015, 9:36:30 AM
to Mike Galbraith, Peter Zijlstra, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
Except nohz_full and isolcpus are very much global attributes of
each CPU, so I am not sure whether it would make sense to allow
configuration of this attribute anywhere other than the root
cpuset.

--
All rights reversed

Mike Galbraith

Mar 2, 2015, 9:55:01 AM
to Rik van Riel, Peter Zijlstra, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
They're attributes of exclusive sets, which excludes the root set. It'd
be kinda hard to have the root set be both ticked and tickless :)

-Mike

Tejun Heo

Mar 2, 2015, 10:29:37 AM
to Mike Galbraith, Peter Zijlstra, Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
On Mon, Mar 02, 2015 at 01:44:50PM +0100, Mike Galbraith wrote:
> Hm, I'm now all system-disease-ified now (still hate the bloody thing),
> and have no problem isolating cpus via cpusets, modulo workqueues
> wanting a bat upside the head.

It shouldn't be difficult to teach workqueue pools to follow the same
rules. This matters only for the unbound ones anyway, right?

Thanks.

--
tejun

Mike Galbraith

Mar 2, 2015, 11:03:12 AM
to Tejun Heo, Peter Zijlstra, Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
On Mon, 2015-03-02 at 10:29 -0500, Tejun Heo wrote:
> On Mon, Mar 02, 2015 at 01:44:50PM +0100, Mike Galbraith wrote:
> > Hm, I'm now all system-disease-ified now (still hate the bloody thing),
> > and have no problem isolating cpus via cpusets, modulo workqueues
> > wanting a bat upside the head.
>
> It shouldn't be difficult to teach workqueue pools to follow the same
> rules. This matters only for the unbound ones anyway, right?

Well, those are the only ones we can do anything about. Dirt simple
diddling of the workqueue default mask as sched domains are
added/removed should do it I think. Automatically moving any existing
unbound worker away from isolated cores at the same time would be a
bonus, most important is that no new threads sneak in.

-Mike

Tejun Heo

Mar 2, 2015, 11:09:17 AM
to Mike Galbraith, Peter Zijlstra, Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
On Mon, Mar 02, 2015 at 05:02:57PM +0100, Mike Galbraith wrote:
> Well, those are the only ones we can do anything about. Dirt simple
> diddling of the workqueue default mask as sched domains are
> added/removed should do it I think. Automatically moving any existing
> unbound worker away from isolated cores at the same time would be a
> bonus, most important is that no new threads sneak in.

Worker pools are immutable once created and configuration changes are
achieved by creating new pools and draining old ones but at any rate
making it follow config changes is almost trivial. Figuring out
configuration policy might take a bit of effort tho. Can you point me
to what specific configuration it should be following?

Thanks.

--
tejun

Tejun Heo

Mar 2, 2015, 12:08:28 PM
to Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On Thu, Feb 26, 2015 at 12:12:31PM -0500, Rik van Riel wrote:
Applied 1-2 to cgroup/for-4.1.

Thanks.

--
tejun

Tejun Heo

Mar 2, 2015, 12:31:35 PM
to Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On Mon, Mar 02, 2015 at 12:01:16PM -0500, Tejun Heo wrote:
> Applied 1-2 to cgroup/for-4.1.

Reverted due to build failure. Looks like UP build is broken.

Mike Galbraith

Mar 2, 2015, 12:35:50 PM
to Tejun Heo, Peter Zijlstra, Rik van Riel, Zefan Li, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, cgr...@vger.kernel.org
On Mon, 2015-03-02 at 11:09 -0500, Tejun Heo wrote:
> On Mon, Mar 02, 2015 at 05:02:57PM +0100, Mike Galbraith wrote:
> > Well, those are the only ones we can do anything about. Dirt simple
> > diddling of the workqueue default mask as sched domains are
> > added/removed should do it I think. Automatically moving any existing
> > unbound worker away from isolated cores at the same time would be a
> > bonus, most important is that no new threads sneak in.
>
> Worker pools are immutable once created and configuration changes are
> achieved by creating new pools and draining old ones but at any rate
> making it follow config changes is almost trivial. Figuring out
> configuration policy might take a bit of effort tho. Can you point me
> to what specific configuration it should be following?

For cpusets, an exclusive set should become taboo to unbound workers
when load balancing is turned off. The user making sched domains go
away is a not so subtle hint that he wants no random interference, as he
is trying to assume full responsibility for task placement therein.

In my trees, I let the user turn rt cpupri/push/pull off as well, as
that further reduces jitter.

-Mike

Zefan Li

Mar 3, 2015, 4:55:13 AM
to Peter Zijlstra, Rik van Riel, linux-...@vger.kernel.org, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith, cgr...@vger.kernel.org
On 2015/3/2 17:12, Peter Zijlstra wrote:
> On Mon, Mar 02, 2015 at 02:15:39PM +0800, Zefan Li wrote:
>> Hi Rik,
>>
>>> Subject: cpusets,isolcpus: add file to show isolated cpus in cpuset
>>>
>>> The previous patch makes it so the code skips over isolcpus when
>>> building scheduler load balancing domains. This makes it hard to
>>> see for a user which of the CPUs in a cpuset are participating in
>>> load balancing, and which ones are isolated cpus.
>>>
>>> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
>>> isolated CPUs.
>>>
>>> This file is read-only for now. In the future we could extend things
>>> so isolcpus can be changed at run time, for the root (system wide)
>>> cpuset only.
>>>
>>
>> One Question, why not add a /sys/devices/system/cpu/isolated instead?
>
> It would leave userspace to calculate the result for any one cpuset
> itself.

It's trivial. Instead of reading cpuset.isolcpus, now we read cpuset.cpus
and /sys/.../isolated.

> Furthermore, is that /sys thing visible for all nested
> containers?
> .
>

Never tried nested containers, but I think so.
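The userspace computation Zefan describes — intersect a cpuset's cpuset.cpus with the system-wide isolated set — can be sketched as below. This is illustrative only: the cpu-list strings are made-up examples, and on a real system they would be read from the cgroup and sysfs files instead.

```python
# Sketch of deriving a cpuset's isolated CPUs in userspace by
# intersecting cpuset.cpus with a system-wide isolated cpu list.

def parse_cpulist(s):
    """Parse a Linux cpu-list string like '0-3,7' into a set of ints."""
    cpus = set()
    for part in s.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

cpuset_cpus = parse_cpulist('0-5')    # hypothetical cpuset.cpus contents
isolated    = parse_cpulist('2,4-5')  # hypothetical isolated-cpus contents
print(sorted(cpuset_cpus & isolated))  # [2, 4, 5]
```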

ri...@redhat.com

Mar 3, 2015, 6:00:34 PM
to t...@kernel.org, linux-...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

The previous patch makes it so the code skips over isolcpus when
building scheduler load balancing domains. This makes it hard to
see for a user which of the CPUs in a cpuset are participating in
load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.

Acked-by: David Rientjes <rien...@google.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
2.1.0

ri...@redhat.com

Mar 3, 2015, 6:00:45 PM
to t...@kernel.org, linux-...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

Document the subtly changed relationship between cpusets and isolcpus.
Turns out the old documentation did not match the code...

Signed-off-by: Rik van Riel <ri...@redhat.com>

ri...@redhat.com

Mar 3, 2015, 6:01:01 PM
to t...@kernel.org, linux-...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

Needed by the next patch. Also makes cpu_isolated_map present
when compiled without SMP and/or with CONFIG_NR_CPUS=1, like
the other cpu masks.

At some point we may want to clean things up so cpumasks do
not exist in UP kernels. Maybe something for the CONFIG_TINY
crowd.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
include/linux/sched.h | 2 ++
kernel/sched/core.c | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..ca365d79480c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -329,6 +329,8 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
extern void init_idle(struct task_struct *idle, int cpu);
extern void init_idle_bootup_task(struct task_struct *idle);

+extern cpumask_var_t cpu_isolated_map;
+
extern int runqueue_is_locked(int cpu);

#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..b578bb23410b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -306,6 +306,9 @@ __read_mostly int scheduler_running;
*/
int sysctl_sched_rt_runtime = 950000;

+/* cpus with isolated domains */
+cpumask_var_t cpu_isolated_map;
+
/*
* this_rq_lock - lock this runqueue and disable interrupts.
*/
@@ -5811,9 +5814,6 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
update_top_cache_domain(cpu);
}

-/* cpus with isolated domains */
-static cpumask_var_t cpu_isolated_map;
-
/* Setup the mask of cpus configured for isolated domains */
static int __init isolated_cpu_setup(char *str)
{

ri...@redhat.com

Mar 3, 2015, 6:01:16 PM
to t...@kernel.org, linux-...@vger.kernel.org
Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

This version fixes the UP compilation issue, in the same way done
for the other cpumasks.

ri...@redhat.com

unread,
Mar 3, 2015, 6:01:36 PM3/3/15
to t...@kernel.org, linux-...@vger.kernel.org
From: Rik van Riel <ri...@redhat.com>

Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
Tested-by: David Rientjes <rien...@google.com>
---
kernel/cpuset.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1d1fe9361d29..b544e5229d99 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -625,6 +625,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
int csn; /* how many cpuset ptrs in csa so far */
int i, j, k; /* indices for partition finding loops */
cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
+ cpumask_var_t non_isolated_cpus; /* load balanced CPUs */
struct sched_domain_attr *dattr; /* attributes for custom domains */
int ndoms = 0; /* number of sched domains in result */
int nslot; /* next empty doms[] struct cpumask slot */
@@ -634,6 +635,10 @@ static int generate_sched_domains(cpumask_var_t **domains,
dattr = NULL;
csa = NULL;

+ if (!alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL))
+ goto done;
+ cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
+
/* Special case for the 99% of systems with one, full, sched domain */
if (is_sched_load_balance(&top_cpuset)) {
ndoms = 1;
@@ -646,7 +651,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
*dattr = SD_ATTR_INIT;
update_domain_attr_tree(dattr, &top_cpuset);
}
- cpumask_copy(doms[0], top_cpuset.effective_cpus);
+ cpumask_and(doms[0], top_cpuset.effective_cpus,
+ non_isolated_cpus);

goto done;
}
@@ -669,7 +675,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
* the corresponding sched domain.
*/
if (!cpumask_empty(cp->cpus_allowed) &&
- !is_sched_load_balance(cp))
+ !(is_sched_load_balance(cp) &&
+ cpumask_intersects(cp->cpus_allowed, non_isolated_cpus)))
continue;

if (is_sched_load_balance(cp))
@@ -751,6 +758,7 @@ static int generate_sched_domains(cpumask_var_t **domains,

if (apn == b->pn) {
cpumask_or(dp, dp, b->effective_cpus);
+ cpumask_and(dp, dp, non_isolated_cpus);
if (dattr)
update_domain_attr_tree(dattr + nslot, b);

@@ -763,6 +771,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
BUG_ON(nslot != ndoms);

done:
+ free_cpumask_var(non_isolated_cpus);
kfree(csa);

/*
--
2.1.0
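The core of the patch is plain mask arithmetic: compute the non-isolated set once, then AND it into every generated sched domain. A small illustrative model of that arithmetic on hypothetical CPU sets (in the kernel these are cpumask operations, not Python sets):

```python
# Model of the mask arithmetic added to generate_sched_domains().
# CPU numbers are hypothetical.
possible  = set(range(8))  # stands in for cpu_possible_mask
isolated  = {6, 7}         # stands in for cpu_isolated_map (isolcpus=6,7)
effective = set(range(8))  # stands in for top_cpuset.effective_cpus

# cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map)
non_isolated = possible - isolated

# Single-domain fast path:
# cpumask_and(doms[0], top_cpuset.effective_cpus, non_isolated_cpus)
dom0 = effective & non_isolated
print(sorted(dom0))  # [0, 1, 2, 3, 4, 5]: isolated CPUs 6 and 7 excluded
```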

Tejun Heo

unread,
Mar 8, 2015, 11:38:07 PM3/8/15
to ri...@redhat.com, linux-...@vger.kernel.org
On Tue, Mar 03, 2015 at 06:00:19PM -0500, ri...@redhat.com wrote:
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> This version fixes the UP compilation issue, in the same way done
> for the other cpumasks.

Can you please repost with Li Zefan <liz...@huawei.com> and
cgr...@vger.kernel.org cc'd?

Thanks.

--
tejun

ri...@redhat.com

Mar 9, 2015, 12:12:25 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com
Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

This version fixes the UP compilation issue, in the same way done
for the other cpumasks.

ri...@redhat.com

Mar 9, 2015, 12:12:38 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Rik van Riel
From: Rik van Riel <ri...@redhat.com>

Document the subtly changed relationship between cpusets and isolcpus.
Turns out the old documentation did not match the code...

Signed-off-by: Rik van Riel <ri...@redhat.com>

ri...@redhat.com

Mar 9, 2015, 12:12:43 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Rik van Riel, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
From: Rik van Riel <ri...@redhat.com>

The previous patch makes it so the code skips over isolcpus when
building scheduler load balancing domains. This makes it hard to
see for a user which of the CPUs in a cpuset are participating in
load balancing, and which ones are isolated cpus.

Add a cpuset.isolcpus file with info on which cpus in a cpuset are
isolated CPUs.

This file is read-only for now. In the future we could extend things
so isolcpus can be changed at run time, for the root (system wide)
cpuset only.

Acked-by: David Rientjes <rien...@google.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Signed-off-by: Rik van Riel <ri...@redhat.com>
---
kernel/cpuset.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b544e5229d99..5462e1ca90bd 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c

ri...@redhat.com

Mar 9, 2015, 12:13:36 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Rik van Riel, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
From: Rik van Riel <ri...@redhat.com>

Needed by the next patch. Also makes cpu_isolated_map present
when compiled without SMP and/or with CONFIG_NR_CPUS=1, like
the other cpu masks.

At some point we may want to clean things up so cpumasks do
not exist in UP kernels. Maybe something for the CONFIG_TINY
crowd.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: David Rientjes <rien...@google.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
---

ri...@redhat.com

Mar 9, 2015, 12:13:50 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Rik van Riel, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, Mike Galbraith
From: Rik van Riel <ri...@redhat.com>

Ensure that cpus specified with the isolcpus= boot commandline
option stay outside of the load balancing in the kernel scheduler.

Operations like load balancing can introduce unwanted latencies,
which is exactly what the isolcpus= commandline is there to prevent.

Previously, simply creating a new cpuset, without even touching the
cpuset.cpus field inside the new cpuset, would undo the effects of
isolcpus=, by creating a scheduler domain spanning the whole system,
and setting up load balancing inside that domain. The cpuset root
cpuset.cpus file is read-only, so there was not even a way to undo
that effect.

This does not impact the majority of cpusets users, since isolcpus=
is a fairly specialized feature used for realtime purposes.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Clark Williams <will...@redhat.com>
Cc: Li Zefan <liz...@huawei.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Luiz Capitulino <lcapi...@redhat.com>
Cc: Mike Galbraith <umgwana...@gmail.com>
Cc: cgr...@vger.kernel.org
Signed-off-by: Rik van Riel <ri...@redhat.com>
Tested-by: David Rientjes <rien...@google.com>
---
kernel/cpuset.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1d1fe9361d29..b544e5229d99 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c

Rik van Riel

Mar 18, 2015, 12:13:46 PM
to t...@kernel.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com
On 03/09/2015 12:12 PM, ri...@redhat.com wrote:
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> This version fixes the UP compilation issue, in the same way done
> for the other cpumasks.

Ping?

Tejun? Li?

Tejun Heo

Mar 18, 2015, 12:48:02 PM
to ri...@redhat.com, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
On Mon, Mar 09, 2015 at 12:12:09PM -0400, ri...@redhat.com wrote:
> From: Rik van Riel <ri...@redhat.com>
>
> The previous patch makes it so the code skips over isolcpus when
> building scheduler load balancing domains. This makes it hard for
> a user to see which of the CPUs in a cpuset are participating in
> load balancing, and which ones are isolated cpus.
>
> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
> isolated CPUs.
>
> This file is read-only for now. In the future we could extend things
> so isolcpus can be changed at run time, for the root (system wide)
> cpuset only.

Didn't Li say that this is trivially computable from userland? I'm
not sure this knob actually belongs to cpuset.

Thanks.

--
tejun
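As a sketch of the userland computation Tejun alludes to: the raw isolated set can be recovered from the boot command line, assuming /proc/cmdline is visible where the computation runs (the helper name is illustrative, not an existing API):

```python
def isolcpus_from_cmdline(cmdline):
    """Return the raw isolcpus= cpulist from a kernel command line, or ""."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return token[len("isolcpus="):]
    return ""

# In practice cmdline would come from open("/proc/cmdline").read().
print(isolcpus_from_cmdline("BOOT_IMAGE=/vmlinuz ro quiet isolcpus=2,3"))  # 2,3
```

Whether /proc/cmdline is always visible (e.g. inside containers) is exactly the doubt Rik raises below.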

Rik van Riel

Mar 18, 2015, 7:40:32 PM
to Tejun Heo, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
On 03/18/2015 12:47 PM, Tejun Heo wrote:
> On Mon, Mar 09, 2015 at 12:12:09PM -0400, ri...@redhat.com wrote:
>> From: Rik van Riel <ri...@redhat.com>
>>
>> The previous patch makes it so the code skips over isolcpus when
>> building scheduler load balancing domains. This makes it hard for
>> a user to see which of the CPUs in a cpuset are participating in
>> load balancing, and which ones are isolated cpus.
>>
>> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
>> isolated CPUs.
>>
>> This file is read-only for now. In the future we could extend things
>> so isolcpus can be changed at run time, for the root (system wide)
>> cpuset only.
>
> Didn't Li say that this is trivially computable from userland? I'm
> not sure this knob actually belongs to cpuset.

I don't know whether the information to compute this is
always visible from userland. I am happy to drop this
patch if Li prefers things that way, though.

Li?

--
All rights reversed

Rik van Riel

Mar 18, 2015, 9:46:00 PM
to Tejun Heo, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
On 03/18/2015 12:47 PM, Tejun Heo wrote:
> On Mon, Mar 09, 2015 at 12:12:09PM -0400, ri...@redhat.com wrote:
>> From: Rik van Riel <ri...@redhat.com>
>>
>> The previous patch makes it so the code skips over isolcpus when
>> building scheduler load balancing domains. This makes it hard for
>> a user to see which of the CPUs in a cpuset are participating in
>> load balancing, and which ones are isolated cpus.
>>
>> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
>> isolated CPUs.
>>
>> This file is read-only for now. In the future we could extend things
>> so isolcpus can be changed at run time, for the root (system wide)
>> cpuset only.
>
> Didn't Li say that this is trivially computable from userland? I'm
> not sure this knob actually belongs to cpuset.

Going back through my email, it looks like Li
previously acked this patch.

That explains why I'm confused about your comment :)

--
All rights reversed

Tejun Heo

Mar 18, 2015, 9:48:40 PM
to Rik van Riel, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
Hello,

On Wed, Mar 18, 2015 at 09:45:45PM -0400, Rik van Riel wrote:
> Going back through my email, it looks like Li
> previously acked this patch.

Ooh, the patch didn't have acked-by. Sorry about the confusion.

> That explains why I'm confused about your comment :)

I still don't think cpuset is the right place to put this knob. This
is a system-wide boot parameter, isn't it? There's got to be a place
better suited for this, be that a sysctl knob or a file under /sys.

Thanks.

--
tejun

Rik van Riel

Mar 18, 2015, 9:55:00 PM
to Zefan Li, Tejun Heo, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
On 03/18/2015 09:50 PM, Zefan Li wrote:
> On 2015/3/19 7:40, Rik van Riel wrote:
>> On 03/18/2015 12:47 PM, Tejun Heo wrote:
>>> On Mon, Mar 09, 2015 at 12:12:09PM -0400, ri...@redhat.com wrote:
>>>> From: Rik van Riel <ri...@redhat.com>
>>>>
>>>> The previous patch makes it so the code skips over isolcpus when
>>>> building scheduler load balancing domains. This makes it hard for
>>>> a user to see which of the CPUs in a cpuset are participating in
>>>> load balancing, and which ones are isolated cpus.
>>>>
>>>> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
>>>> isolated CPUs.
>>>>
>>>> This file is read-only for now. In the future we could extend things
>>>> so isolcpus can be changed at run time, for the root (system wide)
>>>> cpuset only.
>>>
>>> Didn't Li say that this is trivially computable from userland? I'm
>>> not sure this knob actually belongs to cpuset.
>>
>> I don't know whether the information to compute this is
>> always visible from userland. I am happy to drop this
>> patch if Li prefers things that way, though.
>>
>
> What I proposed is adding /sys/devices/system/cpu/isolated. Sysfs is
> visible in containers, unless specifically configured otherwise.

OK, are you willing to take patches 1, 2, and the first hunk of patch
4 now? I can submit a patch to add /sys/devices/system/cpu/isolated
on Friday, to the appropriate maintainer.

(taking tomorrow off to go hiking on the last full day of winter)

Zefan Li

Mar 18, 2015, 9:55:51 PM
to Rik van Riel, Tejun Heo, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
On 2015/3/19 7:40, Rik van Riel wrote:
> On 03/18/2015 12:47 PM, Tejun Heo wrote:
>> On Mon, Mar 09, 2015 at 12:12:09PM -0400, ri...@redhat.com wrote:
>>> From: Rik van Riel <ri...@redhat.com>
>>>
>>> The previous patch makes it so the code skips over isolcpus when
>>> building scheduler load balancing domains. This makes it hard for
>>> a user to see which of the CPUs in a cpuset are participating in
>>> load balancing, and which ones are isolated cpus.
>>>
>>> Add a cpuset.isolcpus file with info on which cpus in a cpuset are
>>> isolated CPUs.
>>>
>>> This file is read-only for now. In the future we could extend things
>>> so isolcpus can be changed at run time, for the root (system wide)
>>> cpuset only.
>>
>> Didn't Li say that this is trivially computable from userland? I'm
>> not sure this knob actually belongs to cpuset.
>
> I don't know whether the information to compute this is
> always visible from userland. I am happy to drop this
> patch if Li prefers things that way, though.
>

What I proposed is adding /sys/devices/system/cpu/isolated. Sysfs is
visible in containers, unless specifically configured otherwise.
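A file like the proposed /sys/devices/system/cpu/isolated would presumably present the isolated set in the usual cpulist form. A sketch of the inverse of cpulist parsing, i.e. rendering a set of CPU ids the way such a sysfs file would (format_cpulist is a hypothetical helper for illustration):

```python
def format_cpulist(cpus):
    """Render a set of CPU ids as a kernel cpulist string, e.g. {0,1,2,3,7} -> "0-3,7"."""
    ids = sorted(cpus)
    ranges = []
    start = prev = None
    for c in ids:
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c  # extend the current contiguous run
        else:
            ranges.append((start, prev))
            start = prev = c
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

print(format_cpulist({0, 1, 2, 3, 7}))  # 0-3,7
```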

Zefan Li

Mar 19, 2015, 2:28:42 AM
to Rik van Riel, Tejun Heo, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Peter Zijlstra, Clark Williams, Ingo Molnar, Luiz Capitulino, David Rientjes, Mike Galbraith
Other patches look good to me.

> (taking tomorrow off to go hiking on the last full day of winter)
>

Have a good day!

Tejun Heo

Mar 19, 2015, 2:30:21 PM
to ri...@redhat.com, linux-...@vger.kernel.org, cgr...@vger.kernel.org, liz...@huawei.com
On Mon, Mar 09, 2015 at 12:12:06PM -0400, ri...@redhat.com wrote:
> Ensure that cpus specified with the isolcpus= boot commandline
> option stay outside of the load balancing in the kernel scheduler.
>
> Operations like load balancing can introduce unwanted latencies,
> which is exactly what the isolcpus= commandline is there to prevent.
>
> Previously, simply creating a new cpuset, without even touching the
> cpuset.cpus field inside the new cpuset, would undo the effects of
> isolcpus=, by creating a scheduler domain spanning the whole system,
> and setting up load balancing inside that domain. The cpuset root
> cpuset.cpus file is read-only, so there was not even a way to undo
> that effect.
>
> This does not impact the majority of cpusets users, since isolcpus=
> is a fairly specialized feature used for realtime purposes.
>
> This version fixes the UP compilation issue, in the same way done
> for the other cpumasks.

Applied 1, 2, and 4 to cgroup/for-4.1. I collected the acks from the
previous threads. Can you please collect them when posting new
revisions in the future?

Thanks.

--
tejun