Code Review - Packed Core allocation strategy


Kevin Klues

Nov 25, 2015, 9:43:25 AM
to aka...@googlegroups.com
This change introduces the packed core allocation strategy that Valmon
and I worked on over the summer. The idea is to keep cores allocated
to the same process as tightly packed as possible, while keeping cores
allocated to different processes as far away from each other as
possible.

The existing FCFS core allocation strategy is still the default, but
the new packed strategy can be selected at compile time via a config
variable (CONFIG_COREALLOC_PACKED).

The plan is to keep the existing strategy as the default for now, but
migrate to the new strategy once we're able to concretely measure the
benefits of it.

The changes in this request can be viewed online at:

https://github.com/brho/akaros/compare/abe793d...0780d03

The following changes since commit abe793d7c7507cbd6732ed2d794916ae295523ad:

Added test for devarch MSR file (2015-11-24 15:39:05 -0500)

are available in the git repository at:

g...@github.com:klueska/akaros corealloc-packed

for you to fetch changes up to 0780d0382aad2350ef3106761d41383b0a11908b:

Add implementation of packed core alloc strategy (2015-11-25 06:27:07 -0800)

----------------------------------------------------------------
Kevin Klues (1):
Add implementation of packed core alloc strategy

Kconfig | 8 +
kern/include/corealloc_packed.h | 46 ++++
kern/include/corerequest.h | 2 +
kern/src/Kbuild | 1 +
kern/src/corealloc_packed.c | 508 ++++++++++++++++++++++++++++++++++++++++
kern/src/schedule.c | 1 +
6 files changed, 566 insertions(+)
create mode 100644 kern/include/corealloc_packed.h
create mode 100644 kern/src/corealloc_packed.c

Barret Rhoden

Dec 3, 2015, 3:34:08 PM
to aka...@googlegroups.com
Hi -

Couple minor things, listed below.


On 2015-11-25 at 06:42 Kevin Klues <klu...@gmail.com> wrote:
> This change introduces the packed core allocation strategy that Valmon
> and I worked on over the summer. The idea is to keep cores allocated
> to the same process as tightly packed as possible, while keeping cores
> allocated to different processes as far away from each other as
> possible.
>
> The existing FCFS core allocation strategy is still the default, but
> the new packed strategy can be selected at compile time via a config
> variable (CONFIG_COREALLOC_PACKED).
>
> The plan is to keep the existing strategy as the default for now, but
> migrate to the new strategy once we're able to concretely measure the
> benefits of it.
>
> The changes in this request can be viewed online at:
>
> https://github.com/brho/akaros/compare/abe793d...0780d03
>
> The following changes since commit
> abe793d7c7507cbd6732ed2d794916ae295523ad:
>
> Added test for devarch MSR file (2015-11-24 15:39:05 -0500)
>
> are available in the git repository at:
>
> g...@github.com:klueska/akaros corealloc-packed

> From eb630798aa2ac398870687ab291d1fbe04034457 Mon Sep 17 00:00:00 2001
> From: Kevin Klues <klu...@cs.berkeley.edu>
> Date: Mon, 5 Oct 2015 19:42:22 -0700
> Subject: Add implementation of packed core alloc strategy

> --- /dev/null
> +++ b/kern/src/corealloc_packed.c
> @@ -0,0 +1,508 @@

> +enum pnode_type { CORE, CPU, SOCKET, NUMA, MACHINE, NUM_NODE_TYPES };
> +static char pnode_label[5][8] = { "CORE", "CPU", "SOCKET", "NUMA", "MACHINE" };

The 5 and 8 seem a little janky. I guess 5 is NUM_NODE_TYPES and 8 is the max
of the strlen + 1?

> +/* Create a node and initialize it. This code assumes that child are created
> + * before parent nodes. */
> +static void init_nodes(int type, int num, int nchildren)
> +{
> + /* Initialize the lookup table for this node type. */
> + num_nodes[type] = num;
> + node_lookup[type] = all_nodes;
> + for (int i = CORE; i < type; i++)
> + node_lookup[type] += num_nodes[i];

This is a little mystical at first. From this, it looks like all_nodes is laid
out such that all of the nodes are: CORES, then the CPUS, then the SOCKETS, then
NUMA, then one for MACHINE. The nodes at any given layer are later broken up
into chunks of n and assigned to their parents (down below). Can yinz put
something in here that explains this layout?

> +/* Allocate a table of distances from one core to an other. Cores on the same
> + * cpu have a distance of 1; same socket have a distance of 2; same numa -> 3;
> + * same machine -> 4; */
> +static void init_core_distances(void)
> +{
> + core_distance = kzmalloc(num_cores * sizeof(int*), 0);
> + if (core_distance == NULL)
> + panic("Out of memory!\n");

Can use KMALLOC_WAIT in lieu of panicking.

> + for (int i = 0; i < num_cores; i++) {
> + core_distance[i] = kzmalloc(num_cores * sizeof(int), 0);
> + if (core_distance[i] == NULL)
> + panic("Out of memory!\n");

Can use KMALLOC_WAIT in lieu of panicking.

> + }
> + for (int i = 0; i < num_cores; i++) {
> + for (int j = 0; j < num_cores; j++) {
> + for (int k = CPU; k <= MACHINE; k++) {
> + if (i/num_descendants[k][CORE] ==
> + j/num_descendants[k][CORE]) {
> + core_distance[i][j] = k;

This took me a little bit to sort out.

Two cores have the same level ID (and are thus at distance k) if their IDs
divided by the number of cores per level are the same, due to how we number our
cores. I.e., with some suitably documented helper:

int get_level_id(int coreid, int level)
{
return coreid / num_descendants[level][CORE];
}

if (get_level_id(i, k) == get_level_id(j, k))
core_distance[i][j] = k;

Can yinz either use a helper or add a line or two to explain that?


> +/* Initialize any data assocaited with doing core allocation. */
> +void corealloc_init(void)
> +{
> + void *nodes_and_cores;
> +
> + /* Allocate a flat array of nodes. */
> + total_nodes = num_cores + num_cpus + num_sockets + num_numa + 1;
> + nodes_and_cores = kmalloc(total_nodes * sizeof(struct sched_pnode) +
> + num_cores * sizeof(struct sched_pcore), 0);

That kmalloc should take either KMALLOC_WAIT or check the return value.


> +/* Returns the sum of the distances from one core to all cores in a list. */
> +static int cumulative_core_distance(struct sched_pcore *c,
> + struct sched_pcore_tailq cl)
> +{
> + int d = 0;
> + struct sched_pcore *temp = NULL;
> +
> + TAILQ_FOREACH(temp, &cl, alloc_next) {

This assumes that the cl list is a list of allocated cores, not any generic
list. Yinz only use it on allocated lists, so that's fine. We probably need a
comment about that though.


> +/* Track the pcore properly when it is allocated to p. This code assumes that
> + * the scheduler that uses it holds a lock for the duration of the call. */
> +void __track_core_alloc(struct proc *p, uint32_t pcoreid)
> +{
> + struct sched_pcore *spc;
> +
> + assert(pcoreid < num_cores); /* catch bugs */
> + spc = pcoreid2spc(pcoreid);
> + assert(spc->alloc_proc != p); /* corruption or double-alloc */
> + spc->alloc_proc = p;
> + /* if the pcore is prov to them and now allocated, move lists */
> + if (spc->prov_proc == p) {
> + TAILQ_REMOVE(&p->ksched_data.crd.prov_not_alloc_me, spc, prov_next);
> + TAILQ_INSERT_TAIL(&p->ksched_data.crd.prov_alloc_me, spc, prov_next);
> + }
> + /* Actually allocate the core, removing it from the idle core list. */
> + TAILQ_INSERT_TAIL(&p->ksched_data.crd.alloc_me, spc, alloc_next);
> + TAILQ_REMOVE(&idlecores, spc, alloc_next);

The order of these two might be backwards. You probably want to remove it
before adding it to another list, esp since alloc_next is used for both lists.


> +/* Get an idle core from our pcore list and return its core_id. Don't
> + * consider the chosen core in the future when handing out cores to a
> + * process. This code assumes that the scheduler that uses it holds a lock
> + * for the duration of the call. This will not give out provisioned cores. */
> +int __get_any_idle_core(void)
> +{
> + struct sched_pcore *spc;
> + int ret = -1;
> +
> + for (int i = 0; i < num_cores; i++) {
> + struct sched_pcore *c = &all_pcores[i];
> +
> + if (spc->alloc_proc == NULL) {
> + spc->alloc_proc = UNNAMED_PROC;
> + ret = spc->core_info->core_id;

Does this code also need to do something with the refcnt, like yinz did when
initializing the scheduler and removing cores?

The steps for removing a core from consideration for allocation could use a
helper, which we can then use here. And we'd need one for returning it (for
put_idle_core).

Also, do these __get_ routines need to remove the core from the idle core list?


Barret Rhoden

Dec 13, 2017, 5:38:14 PM
to akaros-list
Hi -

File this under "better late than never."

I merged this patch and added a few more patches to make the changes I
mentioned in the code review and to fix a couple other bugs.

For instance, the topology code wasn't identifying NUMA nodes properly,
and NUMA was effectively turned off - at least on all of my machines. I
fixed that up and a few other topology-related things.

The most user-visible change is that 'prov' grants cores based on a list
(like perf) instead of single items, and it can spawn processes. That
required moving some perf code to parlib.

Here's an example of prov with the new 'packed' scheduler:

bash-4.3$ prov -tc -v39-46 pthread_test 1000 1000000 10 &

bash-4.3$ prov -s
...
Core 37, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
Core 38, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 39, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 40, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 41, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 42, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 43, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 44, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 45, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 46, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 47, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 48, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
...

Note the scheduler gave out the provisioned cores first (as is usual).
When it looked for more cores, it gave out cores that are hyperthreaded
to existing cores (38 is a sibling of 39, and 47 is a sibling of 46).
If it gave out any more cores, it would come from the same socket as
the existing cores.

I haven't tested it much, so we'll see. Nothing requires a toolchain
rebuild, but some of the parlib header changes actually affect it. So
rebuild if you're paranoid.


The following changes since commit ef39a1bbb55afb3e89f311e474cd8cb906cb49ca:

vmm: Provide a fast-path for IPIs in the kernel (2017-12-12 14:51:57 -0500)

are available in the Git repository at:

g...@github.com:brho/akaros.git topo

for you to fetch changes up to 3d407708b8bc4bb5f202aad7582f1a40034e37a3:

sched: Slightly fix up tests/prov (2017-12-13 17:34:46 -0500)

----------------------------------------------------------------
View this online at:
https://github.com/brho/akaros/compare/ef39a1bbb55a...3d407708b8bc

----------------------------------------------------------------
Barret Rhoden (12):
vmx: Squelch per-core startup messages
x86: Fix topology detection
sched: Remove the idle core interface
sched: Fix packed initialization
sched: When disabling SMT, turn off the odd cores
sched: Touch up the packed scheduler
parlib: Make bitmask.h more compilable
parlib: Move perf's xlib to parlib
perf: Rename the ros_ core_set code
parlib: Move core_sets to parlib
parlib: Move the provisioning of cores to a PID
sched: Slightly fix up tests/prov

Kevin Klues (1):
Add implementation of packed core alloc strategy

Kconfig | 8 +
kern/arch/x86/topology.c | 11 +-
kern/arch/x86/vmm/intel/vmx.c | 1 -
kern/include/corealloc_packed.h | 46 ++
kern/include/corerequest.h | 11 +-
kern/include/schedule.h | 10 -
kern/src/Kbuild | 1 +
kern/src/corealloc_fcfs.c | 82 +---
kern/src/corealloc_packed.c | 468 +++++++++++++++++++++
kern/src/schedule.c | 27 +-
tests/prov.c | 122 ++----
tools/dev-util/perf/Makefile | 2 +-
tools/dev-util/perf/akaros.h | 50 ---
tools/dev-util/perf/perf.c | 27 +-
tools/dev-util/perf/perf_core.c | 2 +-
tools/dev-util/perf/perf_core.h | 2 +-
tools/dev-util/perf/xlib.c | 149 -------
tools/dev-util/perf/xlib.h | 14 +-
.../perf/akaros.c => user/parlib/core_set.c | 47 ++-
user/parlib/include/parlib/bitmask.h | 21 +-
user/parlib/include/parlib/core_set.h | 43 ++
user/parlib/include/parlib/parlib.h | 2 +
user/parlib/include/parlib/x86/bitmask.h | 20 +-
user/parlib/include/parlib/xlib.h | 26 ++
user/parlib/parlib.c | 19 +
user/parlib/xlib.c | 163 +++++++
26 files changed, 894 insertions(+), 480 deletions(-)
create mode 100644 kern/include/corealloc_packed.h
create mode 100644 kern/src/corealloc_packed.c
delete mode 100644 tools/dev-util/perf/akaros.h
rename tools/dev-util/perf/akaros.c => user/parlib/core_set.c (68%)
create mode 100644 user/parlib/include/parlib/core_set.h
create mode 100644 user/parlib/include/parlib/xlib.h
create mode 100644 user/parlib/xlib.c

Thanks,
Barret

Kevin Klues

Dec 13, 2017, 7:25:58 PM
to aka...@googlegroups.com
Nice! Valmon's legacy lives on.

--
~Kevin

Barret Rhoden

Dec 14, 2017, 2:11:58 PM
to aka...@googlegroups.com
On 2017-12-14 at 00:25 Kevin Klues <klu...@gmail.com> wrote:
> Nice! Valmon's legacy lives on.

Indeed!

Merged to master at ef39a1bbb55a..3751ae416e9f (from, to]

You can see the entire diff with 'git diff' or at
https://github.com/brho/akaros/compare/ef39a1bbb55a...3751ae416e9f
