Hello everyone, some updates on the scheduler:
1) To reduce run-queue contention, we've decided to link the maches in a torus-like arrangement. Each mach is connected to Ndim neighbors, with Ndim currently set to 2. To balance load across CPUs, a mach is only allowed to push processes to its Ndim neighbors. Contrast this with a scheme where multiple idle maches try to steal a process from a busy mach: contention on that mach's run queue could be significant if the number of idle maches is large.
2) The initial load balancer has been implemented, but a lot of tweaks and measurements are still needed. As of now, the load balancer checks for "imbalance" every balance_interval. An imbalance exists when a mach has load but has idling neighbor(s), or when the load difference between two maches is greater than a certain percentage (currently 25%). When an imbalance is detected, a double run queue lock is acquired and the mach pushes a process to the target mach.
3) Lastly, tests against the standard 9atom kernel (without the load balancer and per-cpu run queues) are being run to compare performance. What I've found early on is that user time and system time have improved, but real time is lagging behind a bit. The next task is to find the bottleneck(s) and figure out why real time has slowed down.
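
To make the neighbor arrangement from item 1 a bit more concrete, here is a rough sketch of how the Ndim neighbors of a mach could be computed. This is not the code in the kernel: the row-major w-by-h grid, the choice of "next mach along each dimension" as the neighbor, and the function name neighbors are all assumptions made for illustration.

```c
enum { Ndim = 2 };	/* neighbors per mach, as in item 1 */

/*
 * Hypothetical layout: machs are numbered 0..w*h-1 on a w-by-h grid,
 * row-major. Neighbor d of a mach is the next mach along dimension d,
 * wrapping around at the edge, which is what makes the grid a torus.
 */
void
neighbors(int machno, int w, int h, int nbr[Ndim])
{
	int x, y;

	x = machno % w;
	y = machno / w;
	nbr[0] = y*w + (x+1)%w;		/* next along x, wraps to column 0 */
	nbr[1] = ((y+1)%h)*w + x;	/* next along y, wraps to row 0 */
}
```

With 8 maches on a 4-by-2 grid, mach 0 would push only to maches 1 and 4, so at most a couple of maches ever compete for any one run queue.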
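
The imbalance test from item 2 can be sketched along the same lines. Again, this is only an illustration under assumptions: load is taken to be a simple count of runnable processes, the 25% difference is measured relative to the busier mach, and the names Balancepct, imbalanced and lockorder are made up here. The post only says that a double run queue lock is acquired; taking the two locks in a fixed order (lower mach number first) is one common way to keep two maches from deadlocking, and is an assumption rather than a description of the actual implementation.

```c
enum { Balancepct = 25 };	/* threshold from item 2 */

/*
 * Should a mach with the given load push a process to a neighbor
 * with load nload? Loads are assumed to be runnable-process counts.
 */
int
imbalanced(int load, int nload)
{
	if(load <= nload)
		return 0;	/* only the busier side pushes */
	if(nload == 0)
		return 1;	/* we have load, the neighbor is idle */
	/* difference exceeds Balancepct percent of our own load */
	return (load - nload)*100 > Balancepct*load;
}

/*
 * Assumed deadlock-avoidance rule for the double run queue lock:
 * always lock the lower-numbered mach's run queue first.
 */
void
lockorder(int a, int b, int *first, int *second)
{
	if(a < b){
		*first = a;
		*second = b;
	}else{
		*first = b;
		*second = a;
	}
}
```

Under these assumptions a mach with load 5 next to a neighbor with load 3 counts as imbalanced, while 4 versus 3 does not, since the difference is exactly 25% and the test is strict.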